Dr. Sumeet Dua

Max P. and Robbie L. Watson Eminent Scholar Chair

Shirin A. Lakhani (2007)


Protein Structural Classification Using Mining of Frequent Patterns in Concave Protein Surfaces; MS-CS Thesis, Student: Shirin A. Lakhani (2007).

Protein structural classification is a central problem in bioinformatics, and specifically in the in-silico functional annotation of proteins. Classifying proteins by sequential and structural features with conventional methods is known to be arduous and inaccurate, partly because those methods weakly represent the protein subunits that give a protein its discriminatory behavior. The availability of high-dimensional sequence and structure databases has created demand for computational methods that efficiently evaluate the similarity of protein structures and accurately assign them to their respective classes. In recent years, there has been growing interest in classifying proteins using their surface information. Protein surface regions, specifically concave surfaces, provide specialized regions of biological activity. Well-formed concave surface regions are therefore examined to identify any similarity relationship that might be directly related to protein function.
In this thesis, we propose a new association-rule-based technique that uses the concave residues and residue parameters of proteins to find frequent spatial arrangements of residues that are unique to a particular protein family. For every protein class, we discover association rules that satisfy minimum support and minimum confidence constraints for class-level rule discovery and appraisal. Classification Based on Associations (CBA) rule mining is used to discover the frequent patterns present on concave protein surfaces, with the aim of obtaining a small set of rules that satisfy both thresholds.
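The support and confidence thresholds described above can be illustrated with a minimal sketch. It is not the thesis implementation: it enumerates candidate item-sets by brute force rather than with CBA's Apriori-style candidate generation and pruning, and the function name `mine_class_rules`, the item labels such as `HIS:deep`, and the class names `alpha` and `beta` are all hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_class_rules(records, min_support, min_confidence, max_len=2):
    """Return rules (itemset, label, support, confidence) above both thresholds.

    records: list of (set_of_items, class_label) pairs. Each item stands in
    for a discretized concave-surface feature (hypothetical encoding).
    """
    n = len(records)
    itemset_counts = Counter()  # how many records contain the item-set
    rule_counts = Counter()     # how many records contain it AND have the label
    for items, label in records:
        for k in range(1, max_len + 1):
            for combo in combinations(sorted(items), k):
                itemset_counts[combo] += 1
                rule_counts[(combo, label)] += 1

    rules = []
    for (combo, label), count in rule_counts.items():
        support = count / n                      # P(item-set and class)
        confidence = count / itemset_counts[combo]  # P(class | item-set)
        if support >= min_support and confidence >= min_confidence:
            rules.append((combo, label, support, confidence))

    # Order by confidence, then support, then shorter antecedent first.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0]), r[0]))
    return rules


# Toy training data (hypothetical features and classes).
records = [
    ({"HIS:deep", "SER:deep"}, "alpha"),
    ({"HIS:deep", "ASP:shallow"}, "alpha"),
    ({"HIS:deep", "SER:deep", "GLY:shallow"}, "alpha"),
    ({"GLY:shallow", "ASP:shallow"}, "beta"),
    ({"GLY:shallow", "LEU:deep"}, "beta"),
    ({"SER:deep", "GLY:shallow"}, "beta"),
]

rules = mine_class_rules(records, min_support=0.3, min_confidence=0.7)
for itemset, label, support, confidence in rules:
    print(itemset, "->", label, round(support, 2), round(confidence, 2))
```

On this toy data, `SER:deep` occurs in three records but only two of them are `alpha`, so its confidence of about 0.67 falls below the 0.7 threshold and no rule is emitted for it. Raising either threshold shrinks the rule set, which is the lever the abstract refers to when it aims for a small set of rules.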
Empirically, the association rules yielded better results than the traditional techniques reviewed. We have also discovered and used the item-sets (attribute aggregates of the protein surface), or residue parameters, that are frequent for a class. Rules that satisfy the minimum thresholds are extracted and employed for classification. A query protein is processed with the same method used to extract the association rules, and the result is compared with the rules generated during the training phase. The protein is assigned to the structural class whose rules best match its features, which improves the specificity and sensitivity of protein structural classification.
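The classification step can be sketched as follows. The abstract does not say how "best match" is scored, so this example assumes one plausible reading: each class is scored by the summed confidence of its rules whose antecedent is fully contained in the query's feature set. The function `classify`, the rule list, and the feature labels are hypothetical and not taken from the thesis.

```python
from collections import defaultdict


def classify(query_items, rules, default=None):
    """Assign the class whose matching rules have the highest total confidence.

    query_items: set of features extracted from the query protein.
    rules: list of (antecedent_tuple, class_label, confidence).
    The scoring scheme is an assumption made for illustration.
    """
    scores = defaultdict(float)
    for antecedent, label, confidence in rules:
        # A rule fires only if every item in its antecedent is in the query.
        if set(antecedent) <= query_items:
            scores[label] += confidence
    if not scores:
        return default  # no rule fired
    # Iterate labels in sorted order so ties break deterministically.
    return max(sorted(scores), key=lambda lbl: scores[lbl])


# Hypothetical rules, as might be produced during a training phase.
rules = [
    (("HIS:deep",), "alpha", 1.0),
    (("HIS:deep", "SER:deep"), "alpha", 1.0),
    (("GLY:shallow",), "beta", 0.75),
]

query_a = {"HIS:deep", "SER:deep", "GLY:shallow"}
query_b = {"GLY:shallow", "LEU:deep"}
query_c = {"LEU:deep"}

print(classify(query_a, rules))
print(classify(query_b, rules))
print(classify(query_c, rules, default="unclassified"))
```

The original CBA classifier instead applies the first matching rule in a precedence ordering and falls back to a default class; either scheme fits the description above, and the choice affects how conflicting rules from different classes are resolved.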
