Many years ago, I wrote a review article in which I analyzed what was then known about the structure of antibodies. Part of my introduction was a section that I titled “Caveat,” where I mentioned the limitations of the data which I was writing about. A senior colleague, who read a draft copy of my review, warned me that because of my caveats readers might not believe the results of my analysis. I persisted. That review remains one of my most cited pieces of work.
Like most others, I am always skeptical of assertions of a “scientific truth.” The designation of “truth” in science is often arrived at by consensus. And what may be “true” today may no longer be true tomorrow. Science is moving very fast, with rapid improvements in methods, equipment, sample collection, etc., so that better data and results are more easily obtained. In many areas of science, earlier results and interpretations are soon obsolete. And old “truths” are replaced by newer “truths.” So, like most others, I try to be very cautious when I interpret my results and, when I report those results, I mention the limitations of my data and state my assumptions, and warn the reader of the possible errors and consequences of those assumptions.
Most of what I do concerns results obtained by X-ray crystallography, and many times I avail myself of the data that have been deposited in the Protein Data Bank (PDB) (http://www.rcsb.org/pdb/home/home.do). Having been trained as a protein crystallographer, I understand the limitations of the technique and am familiar with how the structures and their interpretations are reported.
When analyzing a PDB entry, I carefully read its REMARK lines. I pay particular attention to the “resolution” of the structure (how much structural detail can be inferred from the study) and the “crystallographic residual” or “R-value” (how well the structural interpretation agrees with the experimental data). Those quantities are presented in the “RESOLUTION” and “R VALUE” remark lines. In addition, the uncertainties in the interpretation of the structure are presented in the lines containing the atomic coordinates. Consider, for example, the sample line below:
[ATOM 1 N GLY A 15 36.532 0.924 -31.942 1.00 24.85 N]
The estimated confidence that the atom is positioned correctly can be inferred from the “B-value” (the “24.85” in the line). (The lower the B-value, the greater the confidence.) Further, the presence of alternative structures (evidence of structural flexibility) is conveyed in the “Occupancy” (the “1.00” in the sample line). (If alternative conformations are present, some of the atoms are listed in two or more lines bearing the coordinates of the alternative positions, each with an occupancy less than 1.00 and with the occupancies usually totaling 1.00. Atoms whose location could not be ascertained, or that are simply modeled in for completeness, are usually given occupancies of 0.00.) A PDB entry often also includes remark lines that reveal the existence of structural elements that could not be definitively located; these are listed under the heading “MISSING RESIDUES” or “MISSING ATOMS.” Other departures from norms are conveyed in other remark lines. And so on …
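To make the reading of a coordinate line concrete, here is a minimal Python sketch. The sample line above is shown with its spacing collapsed; in a PDB-format file the fields sit in fixed columns, so the sketch restores that spacing and slices by column (occupancy in columns 55-60, B-value in columns 61-66). The names `AtomRecord`, `parse_atom_line`, and `caution_flags` are hypothetical, chosen only for this illustration, and the flagging rules are my simplification of the conventions described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AtomRecord:
    name: str         # atom name, e.g. "N"
    alt_loc: str      # alternative-location indicator ("" if none)
    res_name: str     # residue name, e.g. "GLY"
    chain: str        # chain identifier
    res_seq: int      # residue sequence number
    occupancy: float
    b_value: float


# The sample line from the text, with fixed-column spacing restored.
SAMPLE_LINE = (
    "ATOM      1  N   GLY A  15      36.532   0.924 -31.942"
    "  1.00 24.85           N"
)


def parse_atom_line(line: str) -> AtomRecord:
    """Parse one fixed-column ATOM or HETATM line (columns are 1-based)."""
    if not line.startswith(("ATOM", "HETATM")) or len(line) < 66:
        raise ValueError("not a complete ATOM/HETATM line")
    return AtomRecord(
        name=line[12:16].strip(),      # columns 13-16
        alt_loc=line[16].strip(),      # column 17
        res_name=line[17:20].strip(),  # columns 18-20
        chain=line[21].strip(),        # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        occupancy=float(line[54:60]),  # columns 55-60
        b_value=float(line[60:66]),    # columns 61-66
    )


def caution_flags(atom: AtomRecord) -> List[str]:
    """Return reasons (possibly none) to treat this atom with caution."""
    flags = []
    if atom.occupancy == 0.0:
        flags.append("zero occupancy: position may not be supported by the data")
    elif atom.occupancy < 1.0 or atom.alt_loc:
        flags.append("partial occupancy or alternative conformation")
    return flags


atom = parse_atom_line(SAMPLE_LINE)
print(atom.res_name, atom.res_seq, atom.occupancy, atom.b_value, caution_flags(atom))
```

What counts as a "high" B-value depends on the structure and its resolution, so the sketch reports the number rather than judging it.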
Once, after listening to a presentation on a comparative analysis of protein structures, I asked the speaker (privately) if he had read all the “Remarks” in each of the entries he had analyzed. His response was that it would have taken a lot of time to download and read the details included in the many PDB entries that he was comparing, so he didn’t.
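When many entries are involved, a first pass over the remark lines can at least be automated. The Python sketch below is one possible way to do it, not a substitute for reading the entries. It assumes the legacy PDB text format, in which the resolution appears in REMARK 2, the refinement R-value in REMARK 3, missing residues in REMARK 465, and missing atoms in REMARK 470. The wording of the R-value line varies among refinement programs, so the pattern used here is an assumption that will not match every entry. The names `RemarkSummary` and `summarize_remarks` are hypothetical, and the sample lines are made up for illustration rather than taken from any deposited entry.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumed layouts of the REMARK 2 and REMARK 3 lines (legacy PDB text format).
RESOLUTION_RE = re.compile(r"^REMARK\s+2\s+RESOLUTION\.\s+(\d+(?:\.\d+)?)\s+ANGSTROMS")
R_VALUE_RE = re.compile(r"^REMARK\s+3\s+R VALUE\s+\(WORKING SET\)\s*:\s*(\d+(?:\.\d+)?)")


@dataclass
class RemarkSummary:
    resolution: Optional[float] = None   # Angstroms, if stated
    r_value: Optional[float] = None      # working-set R-value, if found
    has_missing_residues: bool = False   # any REMARK 465 line present
    has_missing_atoms: bool = False      # any REMARK 470 line present


def summarize_remarks(lines: Iterable[str]) -> RemarkSummary:
    """Collect a few quality indicators from the REMARK lines of one entry."""
    summary = RemarkSummary()
    for line in lines:
        match = RESOLUTION_RE.match(line)
        if match and summary.resolution is None:
            summary.resolution = float(match.group(1))
        match = R_VALUE_RE.match(line)
        if match and summary.r_value is None:
            summary.r_value = float(match.group(1))
        if line.startswith("REMARK 465"):
            summary.has_missing_residues = True
        if line.startswith("REMARK 470"):
            summary.has_missing_atoms = True
    return summary


# Made-up lines for illustration only; the values are not from a real entry.
SAMPLE_REMARKS = [
    "REMARK   2 RESOLUTION.    1.90 ANGSTROMS.",
    "REMARK   3   R VALUE            (WORKING SET) : 0.195",
    "REMARK   3   FREE R VALUE                     : 0.230",
    "REMARK 465 MISSING RESIDUES",
]

print(summarize_remarks(SAMPLE_REMARKS))
```

A field left as `None` means only that the pattern did not match, which is itself a reason to open the entry and read it.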
Indeed, it takes time to do a careful reading of every PDB entry, but the time is well spent in knowing which structures are to be believed and which should be viewed with skepticism. And data that are suspect should be revealed as such, especially if used in conjunction with data that are much more reliable. Let me show why.
The following is a rough guide for assessing a crystallographically determined structure (excerpted from http://www.chemie.de/lexikon/e/Protein_structure/, to which the reader is referred):
If the resolution is numerically greater (that is, worse) than 4.00 Angstroms, individual coordinates are unreliable; between 3.00 and 4.00 Angstroms, the fold (tertiary structure) is probably correct, but errors are very likely; between 2.00 and 3.00 Angstroms, the fold is even more likely correct, but some surface loops and side chain conformations are probably mismodeled; between 1.50 and 2.00 Angstroms, the fold is very rarely incorrect and most side chains are properly modeled; better than 1.50 Angstroms, errors in the structure are very rare.
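The guide above can be written as a small lookup, sketched here in Python. The function name `reliability_note` is hypothetical. The guide does not say to which band an exact boundary value belongs, so the sketch assumes that values of exactly 4.0, 3.0, and 2.0 Angstroms fall in the better-resolved band, and that exactly 1.5 falls in the 1.5-to-2.0 band because the last band is described as "better than 1.5".

```python
def reliability_note(resolution: float) -> str:
    """Map a resolution in Angstroms to the rough guide's assessment."""
    if resolution <= 0:
        raise ValueError("resolution must be a positive number of Angstroms")
    if resolution > 4.0:
        return "individual coordinates are unreliable"
    if resolution > 3.0:
        return "fold probably correct, but errors are very likely"
    if resolution > 2.0:
        return "fold likely correct; some surface loops and side chains probably mismodeled"
    if resolution >= 1.5:
        return "fold very rarely incorrect; most side chains properly modeled"
    return "errors in the structure are very rare"


# Example: two structures that should not be compared uncritically.
for value in (4.2, 1.8):
    print(f"{value:.2f} Angstroms: {reliability_note(value)}")
```

The bands are a rule of thumb; the R-value, B-values, occupancies, and remarks still have to be read for each entry.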
One can see how unwise it would be to compare structures obtained at 4.00-Angstrom resolution or worse with those analyzed at 2.00 Angstroms or better. Clearly, we should not simply download and use the structures without checking the details of the structure analyses. Our comparisons could be wrong, and our audience may be led to believe what could be untrue.
Limitations in the availability of adequate samples, in instrumentation, in data collection facilities, and in data analysis lead to results that cannot be perfect, and conclusions based on those results have to be regarded with caution. When we report our results, we have to make the readers aware of the limitations of our study.
Caveats are important.
* * *
Eduardo A. Padlan was a research physicist at the US National Institutes of Health until his retirement in 2000. He is currently serving as an adjunct professor in the Marine Science Institute, University of the Philippines Diliman, and is a corresponding member of the NAST. He may be contacted at eduardo.padlan@gmail.com.