Leherte, Glasgow, Baxter, Steeg and Fortier 1997: Analysis of three-dimensional protein images

http://www.jimdavies.org/summaries/

Leherte, L., Glasgow, J., Baxter, K., Steeg, E., and Fortier, S. (1997). Analysis of three-dimensional protein images. Journal of Artificial Intelligence Research, 7, 125--159.

@Article{LeherteGlasgowBaxterSteegFortier1997,
  author = 	 {Leherte, L. and Glasgow, J. and Baxter, K. and
  Steeg, E. and Fortier, S.},
  title = 	 {Analysis of three-dimensional protein images},
  journal = 	 {Journal of Artificial Intelligence Research},
  year = 	 {1997},
  volume = 	 {7},
  pages = 	 {125--159}
}

Author of the summary: Jim Davies, 2004, jim@jimdavies.org

Cite this paper for:

SYSTEM: ORCRIT

ML methods can be successfully applied to predicting secondary structure from electron-density maps.

Abstract:
A fundamental goal of research in molecular biology is to understand protein structure. Protein crystallography is currently the most successful method for determining the three-dimensional (3D) conformation of a protein, yet it remains labor intensive and relies on an expert's ability to derive and evaluate a protein scene model. In this paper, the problem of protein structure determination is formulated as an exercise in scene analysis. A computational methodology is presented in which a 3D image of a protein is segmented into a graph of critical points. Bayesian and certainty factor approaches are described and used to analyze critical point graphs and identify meaningful substructures, such as alpha-helices and beta-sheets. Results of applying the methodologies to protein images at low and medium resolution are reported. The research is related to approaches to representation, segmentation and classification in vision, as well as to top-down approaches to protein structure prediction.

This paper describes techniques for segmenting proteins and identifying secondary structure. [127]

Molecular scene analysis: the processes of reconstruction, classification and understanding of molecular images. [125] It uses rules of biochemistry and structural templates to interpret images from crystallography experiments. A protein crystal is a substance in which the protein regularly repeats. Any one repeating section of the crystal is termed a "unit cell." [126] A unit cell is used to create an "electron density map" (EDM), which is "a 3d array of real values that estimate the electron density at given locations in the unit cell." Regions of high electron density indicate where atoms are, so the rough shape of the molecules in the unit cell can be seen in the EDM. These EDMs are noisy because of the phase problem: the diffraction experiment records amplitudes but not phases, so the phases have to be estimated. Interpreting an EDM involves a biologist segmenting the image into features and then comparing them with anticipated structural motifs. These guesses yield information that allows the EDM to be refined further. The process is slow and requires an expert who can recognize motifs in the 3d representation. The eventual goal of the research this paper describes is to automate this process.
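To make the "3d array of real values" concrete, here is a minimal Python sketch of a toy EDM. Everything in it is an assumption for illustration: the grid size, the two made-up "atom" positions, and the Gaussian blob used to fake density. Real maps are computed from diffraction data, not synthesized like this.

```python
import math

def toy_density_map(size, atoms, width=1.0):
    """Return a size x size x size nested list of density estimates.

    Each hypothetical atom adds a Gaussian blob centred on its (x, y, z)
    grid position. This only illustrates the data structure.
    """
    grid = [[[0.0] * size for _ in range(size)] for _ in range(size)]
    for x in range(size):
        for y in range(size):
            for z in range(size):
                for ax, ay, az in atoms:
                    d2 = (x - ax) ** 2 + (y - ay) ** 2 + (z - az) ** 2
                    grid[x][y][z] += math.exp(-d2 / (2 * width ** 2))
    return grid

atoms = [(2, 2, 2), (5, 5, 5)]   # made-up atom positions on the grid
edm = toy_density_map(8, atoms)  # edm[x][y][z] is the density at that voxel
```

Voxels near an atom position hold high values and voxels far from both hold values near zero, which is the sense in which the shape of the molecule shows up in the map.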

Ideally, we would be able to predict global structure from the amino acid sequence, which is relatively easy to get. Because this is a difficult and unsolved problem, x-ray crystallography and nuclear magnetic resonance are the only realistic ways to determine a protein's 3d structure. [128]

The 3d annotated graph
The simple representation this work uses is the 3d annotated graph, which preserves relevant shape, connectivity, and distance information and traces the main and side chains of the protein. (A protein is made of a chain of amino acids, the main chain, with side chains attached.) The graph nodes are amino acid residues and the edges are bond interactions. This graph can be used to determine secondary structure motifs in the protein. (Secondary structures are alpha helices and beta sheets. Most proteins are made of these structures and "loops," which connect them.)
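A graph like this can be sketched in a few lines of Python. The class and field names below (Node, AnnotatedGraph, label, position) are hypothetical, chosen for this illustration rather than taken from the paper, and the coordinates are made up.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str       # annotation, e.g. a residue name (hypothetical)
    position: tuple  # (x, y, z) coordinates of the node

@dataclass
class AnnotatedGraph:
    nodes: dict = field(default_factory=dict)  # node id -> Node
    edges: dict = field(default_factory=dict)  # node id -> set of neighbour ids

    def add_node(self, node_id, label, position):
        self.nodes[node_id] = Node(label, position)
        self.edges.setdefault(node_id, set())

    def add_edge(self, a, b):
        # Undirected edge standing for a bond interaction.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def distance(self, a, b):
        # Euclidean distance between two nodes (the "distance information").
        return math.dist(self.nodes[a].position, self.nodes[b].position)

g = AnnotatedGraph()
g.add_node("r1", "ALA", (0, 0, 0))   # made-up residues and coordinates
g.add_node("r2", "GLY", (3, 4, 0))
g.add_node("r3", "SER", (3, 4, 4))
g.add_edge("r1", "r2")
g.add_edge("r2", "r3")
```

Storing coordinates on the nodes and adjacency in a separate map keeps both kinds of information the text mentions, connectivity and distance, available to a later classifier.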

Section 2 of this paper describes the basic molecular biology of proteins, which I will not summarize.

Much of vision research involves constructing a 3d model from 2d images. In contrast, crystallographic data is already in 3d voxels. It is noisy and incomplete, but shadows, shading, and occlusion are not problems. [130]

From the initial low-resolution EDM, the first goal is to locate the protein and distinguish it from water (the "solvent"). [131] From the medium-resolution EDM, the goal is to identify amino acid residues and the secondary structures (alpha helices and beta sheets). At high resolution, the goal is to identify specific residues and perhaps the locations of individual atoms.

The first step of scene analysis is to partition the image into regions, where each region ideally corresponds to some meaningful part. These parts are used as input for a classifier. Critical points define a skeleton of the protein (obtained without thinning). The Protein Data Bank (PDB), a database of protein structures, supplies the data for pattern recognition. Substructures repeat in protein shapes, which is what makes pattern recognition useful here.

This paper argues for the feasibility of topological approaches to low- and medium-resolution EDMs. [133]

There are peaks and passes along the chain of amino acids in the EDM. The peaks are generally associated with amino acid residues, and the passes with the bonds that link them. [137] Where there is ambiguity, the authors plan to have the system evaluate competing hypotheses.
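As a rough illustration of what a "peak" is, the Python sketch below marks every voxel whose value is strictly greater than all of its neighbours. This is a simplified stand-in, not the procedure ORCRIT uses, and it does not look for passes at all. The function name find_peaks and the toy grid are made up for the example.

```python
from itertools import product

def find_peaks(grid):
    """Return (x, y, z) indices whose value exceeds all neighbouring voxels.

    Up to 26 neighbours are checked; neighbours outside the grid are skipped.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    offsets = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]
    peaks = []
    for x, y, z in product(range(nx), range(ny), range(nz)):
        value = grid[x][y][z]
        is_peak = True
        for dx, dy, dz in offsets:
            i, j, k = x + dx, y + dy, z + dz
            if 0 <= i < nx and 0 <= j < ny and 0 <= k < nz:
                if grid[i][j][k] >= value:
                    is_peak = False
                    break
        if is_peak:
            peaks.append((x, y, z))
    return peaks

# Toy 4x4x4 map: zero everywhere except two made-up density maxima.
toy = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
toy[1][1][1] = 5.0
toy[3][3][3] = 2.0
peaks = find_peaks(toy)
```

On the toy grid this reports the two planted maxima. A real map would need smoothing and some treatment of noise before a test this naive gave usable peaks.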

At low resolution, linear sequences of critical points correspond to secondary structures.

The peaks and passes at 5 angstroms have a hierarchical relationship with the more detailed peaks and passes at 3 angstroms.[140]

Statistics were gathered from 63 protein structures. f(ssm|g) is the probability distribution of a secondary structure motif (ssm) given a set of geometrical constraints (g). These distributions were computed for alpha helices, beta sheets, and turns (loops). [142]
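One way to picture a distribution like f(ssm|g) is to estimate it from counts with Bayes' rule. The Python sketch below assumes the geometry g has been reduced to a single discrete bin ("short" or "long", a hypothetical feature), and the counts are invented. The paper's actual features and numbers are not reproduced here.

```python
def posterior(counts, g):
    """Estimate P(ssm | g) for every motif from a table counts[ssm][g].

    Bayes' rule: P(ssm | g) = P(g | ssm) * P(ssm) / P(g).
    """
    total_all = sum(sum(bins.values()) for bins in counts.values())
    joint = {}
    for ssm, bins in counts.items():
        motif_total = sum(bins.values())
        prior = motif_total / total_all            # P(ssm)
        likelihood = bins.get(g, 0) / motif_total  # P(g | ssm)
        joint[ssm] = prior * likelihood            # P(ssm, g)
    evidence = sum(joint.values())                 # P(g)
    if evidence == 0:
        raise ValueError("geometry bin never observed: " + repr(g))
    return {ssm: p / evidence for ssm, p in joint.items()}

# Invented counts of how often each motif shows each geometry bin.
counts = {
    "helix":  {"short": 80, "long": 20},
    "strand": {"short": 10, "long": 90},
    "turn":   {"short": 30, "long": 70},
}
probs = posterior(counts, "short")
best = max(probs, key=probs.get)
```

With these made-up counts, a "short" geometry makes the helix the most probable motif. A classifier would pick the motif with the highest posterior, or keep the whole distribution when the evidence is ambiguous.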

Both a Bayesian/Minimum Message Length (MML) approach and a MYCIN-like approach were used to try to identify secondary structure in ideal data. Neither the first Bayes method nor the MYCIN approach was uniformly better: which one did better depended on whether an alpha helix, beta strand, or turn was being identified. [148] The second Bayes approach produced many false positives. [149]
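For readers unfamiliar with MYCIN-style reasoning: evidence is scored with certainty factors between -1 and 1, and separate pieces of evidence for the same hypothesis are merged with a fixed combination rule. The Python sketch below shows the standard EMYCIN-style rule. Whether the paper uses exactly this formula is not stated in this summary, so treat it as background, and the example values are made up.

```python
def combine_cf(a, b):
    """Combine two certainty factors in [-1, 1] for the same hypothesis."""
    for cf in (a, b):
        if not -1.0 <= cf <= 1.0:
            raise ValueError("certainty factors must lie in [-1, 1]")
    if a >= 0 and b >= 0:
        # Two supporting pieces of evidence reinforce each other.
        return a + b * (1 - a)
    if a <= 0 and b <= 0:
        # Two opposing pieces of evidence reinforce the disbelief.
        return a + b * (1 + a)
    # Mixed evidence: partial cancellation.
    denom = 1 - min(abs(a), abs(b))
    if denom == 0:
        raise ValueError("cannot combine total belief with total disbelief")
    return (a + b) / denom

# Made-up example: two rules both suggest "this segment is a helix".
helix_cf = combine_cf(0.6, 0.4)
# A third rule argues against it.
helix_cf_after_conflict = combine_cf(helix_cf, -0.4)
```

The rule is commutative and keeps the result inside [-1, 1], so rules can fire in any order without changing the final score.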

The methods were also run on experimental data, using a post-processed version of the ORCRIT output. No method was much better than any other. All were "relatively successful" at identifying secondary structures.

Summary author's notes:

none

Last modified: Tue May 13 10:28:57 EDT 2003