Dr. Sumeet Dua

Max P. and Robbie L. Watson Eminent Scholar Chair


Sridhar Reddy Alluri (2006)


Fractal-based Method for Dimensionality Reduction of Gene Expression Data; MS-CS Thesis, Student: Sridhar Reddy Alluri (2006)

Gene expression analysis based on microarray data has emerged as an active area of research in the field of bioinformatics. One particular application of microarray data is uncovering the molecular variation among cancers. A defining feature of microarray data is the small number of samples collected relative to the number of genes measured per sample. Many dimensionality reduction techniques, such as principal component analysis and various regression-based methods, have been published. However, these techniques do not take into account the data set's intrinsic distribution characteristics, which, if properly exploited together with a good clustering technique, could yield promising accuracy and performance. The main idea of our technique is to use the data set's intrinsic distribution to guide dimensionality reduction.
We applied a fractal-based clustering analysis tool to the problem of dimensionality reduction in microarray data. Using fractal analysis, we sought a critical-sized subset that preserves the intrinsic dimensionality of the data and could therefore help reveal biologically important information. Our results showed a 97% reduction in dimensionality. In addition, we were able to calculate the intrinsic dimension of the data and measure its distribution. Another advantage was the characterization of the spread of the data, which can aid other data mining tasks. We checked the accuracy of our method by clustering both the original and the reduced data sets with hierarchical clustering. Most of the class information was retained in the reduced critical-sized subset: the original data set yielded a clustering accuracy of 82%, while the reduced data set achieved 75% accuracy in keeping samples in their respective classes. Our framework not only provides an effective dimensionality reduction technique but could also help biologists uncover important biological information through the intrinsic fractal dimension.
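The abstract names fractal analysis and a critical-sized subset but does not give the exact estimator or search procedure. The following is a minimal sketch under stated assumptions, not the thesis's method: it estimates the correlation fractal dimension D2 by box counting, then greedily removes attributes whose removal barely changes D2, similar in spirit to the FDR (Fractal Dimensionality Reduction) feature-selection algorithm. The helper names `normalize`, `correlation_dimension`, and `reduce_attributes`, the grid sides 2^-1 to 2^-5, and the `tolerance` of 0.1 are all hypothetical choices.

```python
# Hypothetical sketch: box-counting D2 plus greedy attribute elimination.
# Not the thesis's implementation; grid levels and tolerance are assumptions.
import math
from collections import Counter


def normalize(data):
    """Scale every attribute (column) to [0, 1] so a single grid covers all axes."""
    cols = list(zip(*data))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # constant column -> span 1
    return [[(x - lo) / s for x, lo, s in zip(row, lows, spans)] for row in data]


def correlation_dimension(points, levels=range(1, 6)):
    """Box-counting estimate of the correlation fractal dimension D2.

    For grid side r = 1 / 2**k, S(r) is the sum of squared point counts per
    occupied cell; D2 is the least-squares slope of log S(r) against log r.
    """
    xs, ys = [], []
    for k in levels:
        cells = 2 ** k
        counts = Counter(
            tuple(min(int(v * cells), cells - 1) for v in p) for p in points
        )
        xs.append(math.log(1.0 / cells))
        ys.append(math.log(sum(c * c for c in counts.values())))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def reduce_attributes(data, tolerance=0.1):
    """Hypothetical greedy backward elimination guided by D2.

    Repeatedly drops the attribute whose removal changes D2 the least and
    stops once every remaining removal would shift D2 by more than `tolerance`.
    Returns the indices of the kept attributes and the full-data D2.
    """
    points = normalize(data)
    keep = list(range(len(points[0])))
    full = correlation_dimension(points)
    while len(keep) > 1:
        best_gap, best_attr = None, None
        for a in keep:
            trial = [i for i in keep if i != a]
            d = correlation_dimension([[p[i] for i in trial] for p in points])
            gap = abs(full - d)
            if best_gap is None or gap < best_gap:
                best_gap, best_attr = gap, a
        if best_gap > tolerance:
            break
        keep.remove(best_attr)
    return keep, full
```

On a toy data set of points on a plane embedded in three dimensions, where the third column duplicates the first, D2 comes out at 2 and one redundant column is dropped, leaving two attributes. For microarray data the attributes would be genes and the rows samples; the greedy loop is quadratic in the number of attributes, and D2 estimates become noisy when samples are few, so grid levels would need tuning.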
