Predicting Conformational Changes in Proteins Based on Hydrophobic Cores
Pradeep Chowriappa1, Sumeet Dua1,2, and Hilary Thompson2
1 Data Mining Research Laboratory, Louisiana Tech University, Ruston, LA 71270, USA.
2 LSU Eye Center, LSU Health Sciences Center, New Orleans, LA 70112-2234.
{pradeep, sdua}@latech.edu, hthomp2@lsuhsc.edu
Abstract. Proteins are flexible macromolecules whose backbones can change from one specific folded conformation to another. Protein folding is frequently guided by local residue interactions that form clusters in the protein core, and the interactions between these residue clusters serve as potential nucleation sites in the folding process. Evidence indicates that functionally important interactions between flexible residues are governed by the hydrophobic propensities of those residues. We hypothesize that proteins possess hydrophobic residue cores and that structural flexibility between interacting cores is vital to the function of the protein in the folded state. We propose a graph-theory-based data mining tool to extract and isolate protein structural features that remain invariant across evolutionarily related proteins, through the integrated analysis of five well-known hydrophobicity scales over the 3D structure of proteins. The tool is designed to predict physico-chemical property-flexibility relationships that have been experimentally confirmed as functionally important. We previously obtained an average accuracy of 90% in protein classification, an approach we now extend to incorporate protein flexibility.
Keywords: Protein domain, physico-chemical properties, structure prediction.
1 Introduction
The mechanisms of protein conformational change have been studied using X-ray crystallography for more than 20 years. Structures of the same protein in different conformations (e.g., with a bound ligand) are available from the Protein Data Bank. The motions of proteins down to nanosecond timescales can now be observed using time-resolved X-ray crystallography [1]. Recently, it has become possible to study large-scale protein motions using NMR [2]. From these data we can identify the mechanisms involved in protein domain motions.
A key concept in the study of protein structure is the domain: a compactly folded, relatively rigid region of a protein that has independent stability and is usually linked to other domains by a few structural elements, such as a loop or a helix. Most large proteins are built from assemblies of domains, which move as nearly rigid bodies joined by flexible inter-domain regions. The ability of different protein regions to move relative to each other with a small expenditure of energy is defined as the protein's intrinsic flexibility [3].
Two types of motion are associated with intrinsic flexibility, and both are governed by the internal packing of the interface between two regions of a protein. The first is a hinge mechanism, which occurs when there is no continuously maintained interface constraining the motion. Hinge motions usually occur in two-domain proteins, with one domain rotating about the hinge as a rigid body; the rotation is caused by a few large torsion angle changes within the hinge region. The second is a shear mechanism, which occurs when two interfaces slide across each other while maintaining a well-packed interface. Individual shear motions are typically small, so a large shear motion is composed of a number of individual shear motions.
2 Proposed Methodology
Motions of macromolecules (proteins and nucleic acids) are often the essential link between structure and function; that is, motion is frequently the way a structure carries out a particular function. Protein motions, in particular, are involved in basic functions such as catalysis, regulation of activity, transport of metabolites, formation of large assemblies, and cellular locomotion. In this tool, our data space for knowledge discovery consists of estimates of the hydrophobicity indices of individual amino acids in a protein calculated by a variety of common methods as well as other numeric indices of amino acid impact in 3D protein conformations (see Table 1).
Table 1. Ranks of amino acids based on the propensities assigned by five hydrophobicity scales from the AAindex [4].
| Rank | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Kyte and Doolittle | ARG | LYS | ASP | GLU | ASN | GLN | HIS | PRO | TYR | TRP |
| Hopp and Woods | TRP | PHE | TYR | ILE | LEU | VAL | MET | CYS | ALA | HIS |
| Jamin et al. | LYS | ARG | GLU | GLN | ASP | ASN | TYR | PRO | THR | HIS |
| Rose et al. | LYS | ASP | GLU | GLN | ASN | PRO | ARG | SER | THR | GLY |
| Eisenberg et al. | ARG | LYS | ASP | GLN | ASN | HIS | HIS | SER | THR | PRO |

| Rank | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|
| Kyte and Doolittle | SER | THR | GLY | ALA | MET | CYS | PHE | LEU | VAL | ILE |
| Hopp and Woods | THR | GLY | PRO | ASN | GLN | SER | GLU | GLU | LYS | ARG |
| Jamin et al. | SER | ALA | GLY | TRP | MET | PHE | VAL | VAL | ILE | CYS |
| Rose et al. | ALA | TYR | HIS | LEU | MET | TRP | PHE | PHE | ILE | CYS |
| Eisenberg et al. | TYR | CYS | GLY | ALA | MET | TRP | VAL | VAL | PHE | ILE |
Mapping this information into projection spaces and creating graphs of amino acid forms in this space has allowed us to predict the membership of proteins into protein structure classes with a high degree of specificity and sensitivity.
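As an illustration of how a row of Table 1 can arise, the sketch below ranks the 20 amino acids by their Kyte and Doolittle hydropathy indices, from least to most hydrophobic. The function name `rank_residues` is hypothetical and is not part of the tool described here. Residues with equal indices (ASP, GLU, ASN, GLN) are tied, so their relative order within ranks 3 to 6 is arbitrary.

```python
# Kyte and Doolittle hydropathy indices (higher value = more hydrophobic).
KYTE_DOOLITTLE = {
    "ALA": 1.8, "ARG": -4.5, "ASN": -3.5, "ASP": -3.5, "CYS": 2.5,
    "GLN": -3.5, "GLU": -3.5, "GLY": -0.4, "HIS": -3.2, "ILE": 4.5,
    "LEU": 3.8, "LYS": -3.9, "MET": 1.9, "PHE": 2.8, "PRO": -1.6,
    "SER": -0.8, "THR": -0.7, "TRP": -0.9, "TYR": -1.3, "VAL": 4.2,
}


def rank_residues(scale):
    """Return {residue: rank}, with rank 1 the least hydrophobic residue.

    Hypothetical helper for illustration only. Python's sort is stable,
    so tied residues keep the order in which they appear in `scale`.
    """
    ordered = sorted(scale, key=scale.get)
    return {residue: rank for rank, residue in enumerate(ordered, start=1)}


ranks = rank_residues(KYTE_DOOLITTLE)
for residue in sorted(ranks, key=ranks.get):
    print(f"{ranks[residue]:2d}  {residue}  {KYTE_DOOLITTLE[residue]:+.1f}")
```

Applying the same function to each of the five scales would give one rank vector per scale, which is the form of input summarized in Table 1.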
2.1 Identification of Compact Structural Hydrophobic Cores
Fig. 1 provides an overview of the methodology for identifying hydrophobic cores using the hydrophobicity scales listed in Table 1. See [5] for details.
Fig 1. The Proposed Methodology [1].
2.2 Significance of Conserved Regions
We analyze all conserved residues and compare their structural environment with that of amino acids in the naturally occurring proteins in the dataset, using packing density, hydrogen bonding, and solvent accessibility as the comparison parameters [5]. We discuss the results in Section 3.
2.3 The Identification of Flexibility Regions
The proposed graph-based tool builds on the simplicity of elastic network theory [6], in which the Cα atoms of the protein serve as nodes. The connectivity within the protein structure is represented by a Kirchhoff matrix Γ, where R_ij is the distance between the Cα atoms of residues i and j and r_c is the distance cutoff (7 Å):
\[ \Gamma_{ij} = \begin{cases} -1, & i \neq j,\ R_{ij} \le r_c \\ 0, & i \neq j,\ R_{ij} > r_c \\ -\sum_{k \neq i} \Gamma_{ik}, & i = j \end{cases} \tag{1} \]
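The construction of the Kirchhoff matrix can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the tool's implementation: the function name `kirchhoff_matrix` and the toy coordinates are hypothetical, and the only parameter taken from the text is the 7 Å cutoff.

```python
import math


def kirchhoff_matrix(ca_coords, cutoff=7.0):
    """Build a Kirchhoff (connectivity) matrix from C-alpha coordinates.

    Off-diagonal entries are -1 for residue pairs within `cutoff`
    angstroms and 0 otherwise; each diagonal entry is the number of
    contacts of that residue, so every row sums to zero.
    """
    n = len(ca_coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1
    for i in range(n):
        gamma[i][i] = -sum(gamma[i][k] for k in range(n) if k != i)
    return gamma


# Hypothetical C-alpha positions: four residues on a line, 3.8 angstroms apart.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
gamma = kirchhoff_matrix(toy_coords)
for row in gamma:
    print(row)
```

With the 7 Å cutoff only sequence neighbours are in contact in this toy chain, because residues two positions apart are 7.6 Å from each other.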
The equilibrium-correlated fluctuations between two sites are obtained from the inverse of the Kirchhoff matrix and are represented as:

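\[ \langle \Delta R_i \cdot \Delta R_j \rangle = \frac{3 k_b T}{\gamma} \left[ \Gamma^{-1} \right]_{ij} \tag{2} \]
where ΔR_i is the fluctuation of residue i from its equilibrium position, k_b is the Boltzmann constant, T is the absolute temperature, and γ is the single parameter of the harmonic potential that accounts for the fluctuations of a residue about a mean axis.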
Cross-correlated fluctuations between residues i and j are defined as:
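\[ C_{ij} = \frac{\langle \Delta R_i \cdot \Delta R_j \rangle}{\sqrt{\langle \Delta R_i^2 \rangle \langle \Delta R_j^2 \rangle}} \tag{3} \]
where ⟨ΔR_i · ΔR_j⟩ denotes the equilibrium-correlated fluctuation between sites i and j, so that C_ij ranges from -1 to 1.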
Participation in correlated movements was used to define functionally important, flexible regions. See [6] for details.
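The computation of correlated fluctuations from a Kirchhoff matrix can be sketched as follows. This is an illustrative sketch, not the implementation used in [6]: the function name `cross_correlations`, the `scale` argument (standing in for the physical prefactor of the fluctuations), and the toy matrix are assumptions. Because the rows of a Kirchhoff matrix sum to zero, the matrix is singular, so the sketch assumes that the inverse is taken as a pseudo-inverse that discards the zero eigenvalue.

```python
import numpy as np


def cross_correlations(kirchhoff, scale=1.0, tol=1e-8):
    """Return normalized cross-correlations from a Kirchhoff matrix.

    `scale` stands in for the physical prefactor of the fluctuations;
    it cancels in the normalized correlations. Eigenvalues below `tol`
    (the zero mode of the singular matrix) are discarded.
    """
    gamma = np.asarray(kirchhoff, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(gamma)
    keep = eigvals > tol
    # Pseudo-inverse: sum over non-zero modes of v v^T / eigenvalue.
    gamma_inv = (eigvecs[:, keep] / eigvals[keep]) @ eigvecs[:, keep].T
    covariance = scale * gamma_inv
    # Diagonal entries are the mean-square fluctuations of each residue.
    msf = np.diag(covariance)
    return covariance / np.sqrt(np.outer(msf, msf))


# Hypothetical Kirchhoff matrix for a three-residue chain (1-2 and 2-3 in contact).
toy_kirchhoff = [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
corr = cross_correlations(toy_kirchhoff)
print(np.round(corr, 3))
```

In this toy chain the two end residues are strongly anticorrelated, which is the kind of signal that could be thresholded to flag residues participating in correlated movements.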
3 Results
Fig 2. Composition of amino acids in conserved residues of the summary graphs compared with the entire protein representative set. The Y-axis is the percentage of amino acids and the X-axis is: a. hydrogen bonding interactions, b. Ooi number in an 8 Å radius around the amino acid, and c. solvent accessible contact area as a percentage of residue accessibility.
Download
The PC4 tutorial and executable file are available for download.
References
- Genick, U.K., Borgstahl, G.E., Ng, K., Ren, Z., Pradervand, C., Burke, P.M., Srajer, V., Teng, T.Y., Schildkamp, W., McRee, D.E., Moffat, K., Getzoff, E.D.: Structure of a protein photocycle intermediate by millisecond time-resolved crystallography. Science 275, 1471–1475 (1997).
- Volkman, B.F., Lipson, D., Wemmer, D.E., Kern, D.: Two-state allosteric behavior in a single-domain signaling protein. Science 291, 2429–2433 (2001).
- Gerstein, M., Lesk, A.M., Chothia, C.: Structural mechanisms for domain movements. Biochemistry 33, 6739–6749 (1994).
- Venkatarajan, M.S., Braun, W.: New Quantitative Descriptors of Amino Acids Based on Multi Dimensional Scaling of a Large Number of Physical-chemical Properties. Journal of Molecular Modeling 7, 445–453 (2001).
- Chowriappa, P., Dua, S., Kanno, J., Thompson, H.W.: Protein Structure Classification Based on Conserved Hydrophobic Residues. IEEE/ACM TCBB 99, 5555 (2008).
- Gu, J., Gribskov, M., Bourne, P.E.: Wiggle—Predicting Functionally Flexible Regions from Primary Sequence. PLoS Comput. Biol. 2, e90 (2006).