
Predicting Conformational Changes in Proteins Based on Hydrophobic Cores

Pradeep Chowriappa1, Sumeet Dua1,2, and Hilary Thompson2

1 Data Mining Research Laboratory, Louisiana Tech University, Ruston, LA 71270, USA. 2 LSU Eye Center, LSU Health Sciences Center, New Orleans, LA 70112-2234.
{pradeep, sdua}@latech.edu, hthomp2@lsuhsc.edu

Abstract. Proteins are flexible macromolecules whose backbones can change from one specific folded conformation to another. Protein folding is frequently guided by local residue interactions that form clusters in the protein core. The interactions between residue clusters serve as potential nucleation sites in the folding process. Evidence indicates that functionally important, flexible residue interactions in proteins are governed by the hydrophobic propensities of the residues involved. We hypothesize that proteins possess hydrophobic residue cores and that structural flexibility between interacting cores is vital to the function of the protein in the folded state. We propose a graph-theory-based data mining tool to extract and isolate protein structural features that remain invariant in evolutionarily related proteins, through the integrated analysis of five well-known hydrophobicity scales over the 3D structure of proteins. This tool is designed to predict physico-chemical property-flexibility relationships that have been experimentally confirmed as functionally important. We previously obtained an average accuracy of 90% in protein classification, an approach that we now extend to incorporate protein flexibility.

Keywords: Protein domain, physico-chemical properties, structure prediction.

1 Introduction

The mechanisms of protein conformational changes have been studied using X-ray crystallography for more than 20 years. Structures of the same protein in different conformations (e.g., with a bound ligand) are available from the Protein Data Bank. The motions of proteins down to nanosecond timescales can now be obtained using time-resolved X-ray crystallography [1]. Recently, it has become possible to study large-scale protein motions using NMR [2]. From these data we can identify the mechanisms involved in protein domain motions.

A key concept in the study of protein structure is the domain. A domain is a compactly folded region of a protein that has independent stability and is usually linked to other domains by a few structural elements, such as a loop or a helix. Domains are relatively rigid and are connected to one another by flexible inter-domain regions. Most large proteins are built from assemblies of domains, that is, regions of nearly rigid motion joined by flexible regions. The ability of different protein regions to move relative to each other with a small expenditure of energy is defined as the protein's intrinsic flexibility [3].

The two types of motion associated with intrinsic flexibility are governed by the internal packing of the interface between two regions in a protein. The first type is a hinge mechanism, which occurs when there is no continuously maintained interface constraining the motion. Hinge motions usually occur in two-domain proteins, where one domain rotates about the hinge as a rigid body. The rotation is caused by a few large torsion angle changes within the hinge region. The second type is a shear mechanism, which occurs when two closely packed regions slide across each other while maintaining a well-packed interface. Individual shear motions are typically small, so a large shear motion is composed of a number of individual shear motions.

2 Proposed Methodology

Motions of macromolecules (proteins and nucleic acids) are often the essential link between structure and function; that is, motion is frequently the way a structure carries out a particular function. Protein motions, in particular, are involved in basic functions such as catalysis, regulation of activity, transport of metabolites, formation of large assemblies, and cellular locomotion. In this tool, our data space for knowledge discovery consists of estimates of the hydrophobicity indices of individual amino acids in a protein calculated by a variety of common methods as well as other numeric indices of amino acid impact in 3D protein conformations (see Table 1).

Table 1. Ranks of amino acids based on the propensities assigned by five hydrophobicity scales from the AAindex [4].

Rank                  1    2    3    4    5    6    7    8    9    10
Kyte and Doolittle    ARG  LYS  ASP  GLU  ASN  GLN  HIS  PRO  TYR  TRP
Hopp and Woods        TRP  PHE  TYR  ILE  LEU  VAL  MET  CYS  ALA  HIS
Jamin et al.          LYS  ARG  GLU  GLN  ASP  ASN  TYR  PRO  THR  HIS
Rose et al.           LYS  ASP  GLU  GLN  ASN  PRO  ARG  SER  THR  GLY
Eisenberg et al.      ARG  LYS  ASP  GLN  ASN  HIS  HIS  SER  THR  PRO

Rank                  11   12   13   14   15   16   17   18   19   20
Kyte and Doolittle    SER  THR  GLY  ALA  MET  CYS  PHE  LEU  VAL  ILE
Hopp and Woods        THR  GLY  PRO  ASN  GLN  SER  GLU  GLU  LYS  ARG
Jamin et al.          SER  ALA  GLY  TRP  MET  PHE  VAL  VAL  ILE  CYS
Rose et al.           ALA  TYR  HIS  LEU  MET  TRP  PHE  PHE  ILE  CYS
Eisenberg et al.      TYR  CYS  GLY  ALA  MET  TRP  VAL  VAL  PHE  ILE


Mapping this information into projection spaces and creating graphs of amino acid forms in these spaces has allowed us to predict the membership of proteins in protein structure classes with a high degree of specificity and sensitivity.
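As a minimal sketch of the first step of such a mapping, the snippet below turns a list of three-letter residue codes into their ranks under one scale. The rank order is copied from the first row of Table 1 (the Kyte and Doolittle scale); the names `KYTE_DOOLITTLE_ORDER` and `rank_profile` are hypothetical and are not taken from the tool itself, and the projection and graph construction described in [5] are not reproduced here.

```python
# Rank order copied from the first scale row of Table 1 (rank 1 first).
KYTE_DOOLITTLE_ORDER = [
    "ARG", "LYS", "ASP", "GLU", "ASN", "GLN", "HIS", "PRO", "TYR", "TRP",
    "SER", "THR", "GLY", "ALA", "MET", "CYS", "PHE", "LEU", "VAL", "ILE",
]

# Lookup table: residue code -> rank (1..20).
KYTE_DOOLITTLE_RANK = {res: i + 1 for i, res in enumerate(KYTE_DOOLITTLE_ORDER)}


def rank_profile(residues, rank_table=KYTE_DOOLITTLE_RANK):
    """Map three-letter residue codes to ranks; unknown codes become None."""
    return [rank_table.get(res.upper()) for res in residues]


if __name__ == "__main__":
    # Hypothetical four-residue fragment, used only to show the call.
    print(rank_profile(["ARG", "GLY", "LEU", "ILE"]))
```

Repeating the lookup for each of the five scales would give one rank vector per scale for every residue, which is one plausible input for a projection step.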

2.1 Identification of Compact Structural Hydrophobic Cores

Fig. 1 provides an overview of the methodology for identifying hydrophobic cores based on the hydrophobicity scales in Table 1. See [5] for more information.


Fig 1. The Proposed Methodology [1].

2.2 Significance of Conserved Regions

We analyze all conserved residues and compare their structural environment with that of amino acids in the naturally occurring proteins in the dataset, using packing density, hydrogen bonding, and solvent accessibility as the parameters [5]. We discuss the results in Section 3.

2.3 The Identification of Flexibility Regions

The proposed graph-based tool builds on the simplicity of the elastic theory [6], which uses the coordinates of the Cα atoms as nodes. The connectivity within the protein structure is represented as a Kirchhoff matrix Γ, where R_ij is the distance between the Cα atoms of residues i and j and r_c is the distance cutoff radius (7 Å):

\[
\Gamma_{ij} =
\begin{cases}
-1, & i \neq j,\ R_{ij} \le r_c \\
0, & i \neq j,\ R_{ij} > r_c \\
-\sum_{k \neq i} \Gamma_{ik}, & i = j
\end{cases}
\qquad (1)
\]
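The construction of Γ can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's implementation: the function name `kirchhoff_matrix` is hypothetical, the input is assumed to be a list of Cα coordinates in Å, and contacts are assumed to count when the distance is at most the 7 Å cutoff.

```python
import math


def kirchhoff_matrix(ca_coords, cutoff=7.0):
    """Build the Kirchhoff (connectivity) matrix from C-alpha coordinates.

    Off-diagonal entries are -1 for residue pairs within `cutoff` (in Å)
    and 0 otherwise; each diagonal entry is the number of contacts of
    that residue, so every row sums to zero.
    """
    n = len(ca_coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1.0
    for i in range(n):
        gamma[i][i] = -sum(gamma[i][j] for j in range(n) if j != i)
    return gamma


if __name__ == "__main__":
    # Three hypothetical C-alpha atoms on a line, 3.8 Å apart.
    coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    for row in kirchhoff_matrix(coords):
        print(row)
```

In this toy chain the first and last atoms are 7.6 Å apart, so they are not connected, and only the middle atom has two contacts.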



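The equilibrium correlated fluctuations between two sites are obtained from the inverse of the Kirchhoff matrix:

\[
\langle \Delta R_i \cdot \Delta R_j \rangle = \frac{3 k_B T}{\gamma} \left[ \Gamma^{-1} \right]_{ij}
\qquad (2)
\]

where k_B is the Boltzmann constant, T is the absolute temperature, and γ is the force constant of the single-parameter harmonic potential that accounts for the fluctuations of a residue about a mean axis. Cross-correlated fluctuations between residues i and j are defined as:

\[
C_{ij} = \frac{\langle \Delta R_i \cdot \Delta R_j \rangle}{\left[ \langle \Delta R_i^2 \rangle \, \langle \Delta R_j^2 \rangle \right]^{1/2}}
\qquad (3)
\]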

Participation in correlated movements was used to define functionally important, flexible regions. See [6] for details.
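The normalized cross-correlations can be sketched with NumPy as follows. Two assumptions are made here that the text does not spell out: because the rows of Γ sum to zero the matrix is singular, so the Moore-Penrose pseudo-inverse (`numpy.linalg.pinv`) stands in for the inverse, and the constant prefactor involving the Boltzmann constant, T, and γ is omitted because it cancels in the normalized correlation. The function name `cross_correlations` is hypothetical, and the thresholds used in [6] to call a region flexible are not reproduced.

```python
import numpy as np


def cross_correlations(gamma):
    """Return normalized cross-correlations from a Kirchhoff matrix.

    The pseudo-inverse of the Kirchhoff matrix is proportional to the
    fluctuation covariances; dividing each entry by the square root of
    the product of the two matching diagonal entries normalizes it.
    """
    gamma = np.asarray(gamma, dtype=float)
    gamma_inv = np.linalg.pinv(gamma)   # pseudo-inverse, since gamma is singular
    diag = np.diag(gamma_inv)           # proportional to mean-square fluctuations
    return gamma_inv / np.sqrt(np.outer(diag, diag))


if __name__ == "__main__":
    # Kirchhoff matrix of a hypothetical three-residue chain.
    gamma = [[1, -1, 0],
             [-1, 2, -1],
             [0, -1, 1]]
    print(np.round(cross_correlations(gamma), 3))
```

Entries near +1 indicate residues that move together and entries near -1 indicate anti-correlated motion; each diagonal entry is 1 by construction.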

3 Results


Fig 2. Composition of amino acids in conserved residues of the summary graphs compared with the entire protein representative set. The Y-axis is the percentage of amino acids and the X-axis is: a. hydrogen bonding interactions, b. Ooi number in an 8 Å radius around the amino acid, and c. solvent accessible contact area as a percentage of residue accessibility.

Download

The PC4 tutorial and executable file are available for download.

References

  1. Genick, U.K., Borgstahl, G.E., Ng, K., Ren, Z., Pradervand, C., Burke, P.M., Srajer, V., Teng, T.Y., Schildkamp, W., McRee, D.E., Moffat, K., Getzoff, E.D.: Structure of a protein photocycle intermediate by millisecond time-resolved crystallography. Science 275, 1471–1475 (1997).
  2. Volkman, B.F., Lipson, D., Wemmer, D.E., Kern, D.: Two-state allosteric behavior in a single-domain signaling protein. Science 291, 2429–2433 (2001). 
  3. Gerstein, M., Lesk, A.M., Chothia, C.: Structural mechanisms for domain movements. Biochemistry 33, 6739–6749 (1994).
  4. Venkatarajan, M.S., Braun, W.: New Quantitative Descriptors of Amino Acids Based on Multi Dimensional Scaling of a Large Number of Physical-chemical Properties. Journal of Molecular Modeling 7, 445–453 (2001).
  5. Chowriappa, P., Dua, S., Kanno, J., Thompson, H.W.: Protein Structure Classification Based on Conserved Hydrophobic Residues. IEEE/ACM TCBB 99, 5555 (2008).
  6. Gu, J., Gribskov, M., Bourne, P.E.: Wiggle—Predicting Functionally Flexible Regions from Primary Sequence. PLoS Comput. Biol. 2, e90 (2006).
This site is maintained by the Data Mining Research Laboratory. Webmaster: Alan E. Alex & Image Master: Pradeep Chowriappa