Dr. Sumeet Dua

Max P. & Robbie L. Watson Eminent Scholar Chair

Afolabi Olomola (2010)

Unsupervised Similarity Mining in High Dimensional Data

With the rapid growth of high-dimensional data such as microarray (gene expression) data, researchers face the challenge of discovering hidden, useful knowledge in such data. Recent advances in microarray technology have generated large volumes of gene expression data, and data mining tools are needed to extract useful biological information from them, such as disease pathways, diagnosis, prognosis, and prediction of therapeutic responsiveness. One of the challenges in analyzing gene expression data is discovering localized structures composed of genes that show coherent expression patterns across subsets of experimental conditions. These patterns may provide valuable information about the biological processes associated with physiological states. Bi-clustering is a well-studied area of unsupervised data mining that simultaneously mines similarity among genes and experimental conditions. Its purpose is to identify candidate subsets of conditions that may be associated with cellular processes, or subsets of genes that potentially play a role in a given biological process.

In this thesis, we present an unsupervised bi-clustering method (BiEntropy) that applies information entropy and closed frequent pattern mining to identify co-expressed gene patterns that are relevant across a subset of conditions. Our goal is to discover different forms of local patterns (constant, additive, and overlapping) in gene expression data. To demonstrate its superiority over existing methods, we apply BiEntropy, using novel discretization schemes, to both synthetic and real data.
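To make the ingredients named above more concrete, the following is a minimal sketch of two of them: the Shannon entropy of a discretized expression profile, and a check for an additive pattern in a small block of genes and conditions. It is not the BiEntropy algorithm. The function names are hypothetical, and equal-width binning is an assumption used only for illustration; it stands in for the thesis's own discretization schemes, which are not described here.

```python
import math
from collections import Counter


def discretize(values, bins=3):
    """Equal-width binning (an illustrative assumption, not the thesis's scheme).

    Maps each expression value to a bin index in [0, bins - 1].
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat profile falls entirely into one bin.
        return [0] * len(values)
    width = (hi - lo) / bins
    # The maximum value would land in bin `bins`, so clamp it to the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


def shannon_entropy(symbols):
    """Shannon entropy (in bits) of a sequence of discrete symbols."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def is_additive(rows, tol=1e-9):
    """True if every row equals the first row plus a row-specific offset.

    A constant pattern (all rows identical) is the special case where
    every offset is zero.
    """
    base = rows[0]
    for row in rows[1:]:
        offset = row[0] - base[0]
        if any(abs((x - b) - offset) > tol for x, b in zip(row, base)):
            return False
    return True


# Hypothetical expression values for two genes across three conditions.
block = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],  # first row shifted by +3
]
print(is_additive(block))
print(shannon_entropy(discretize([0.0, 5.0, 10.0], bins=3)))
```

In this sketch, a profile whose discretized values all fall into one bin has zero entropy, while a profile spread evenly over the bins has the maximum entropy for that number of bins.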

The experimental results on synthetic and real data show that BiEntropy discovers patterns that are highly enriched with respect to Gene Ontology (GO) terms, functional motifs, and KEGG pathways.
