The Data Mining Research Laboratory (DMRL) was established in 2002 under the direction of Dr. Sumeet Dua. The laboratory is composed of postdoctoral researchers and graduate and undergraduate students, mostly from Computer Science and related disciplines, who work collaboratively under the expert guidance of Dr. Dua. Close collaborations with other senior researchers give students excellent opportunities to work in a multidisciplinary, multi-institutional research setting. DMRL members can draw on cutting-edge research resources and an encouraging research environment.
Research in the DMRL focuses on three main areas of Computer Science: Knowledge Discovery in High-Dimensional Data (Data Mining), Bioinformatics, and Biomedical Informatics. Because the group members come from diverse backgrounds, the lab investigates a wide range of research problems in these areas.
Knowledge Discovery in High-Dimensional Data (Data Mining): Data mining is defined as the design and development of computational frameworks for extracting interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from large amounts of data. Our interests in this area specifically include the following:
Dimensionality reduction algorithms
Association-rule discovery algorithms
Supervised and unsupervised classification algorithms
Sequential pattern discovery and applications
Design and development of spatio-temporal data structures
Bioinformatics:
A biological database is a large, unorganized body of persistent life sciences data, usually paired with computational software designed to update, query, and retrieve the data stored within the system. Protein data, gene expression data, and DNA sequences are the primary data residing in these scientific databases, and related data such as annotations, mutant information, and physicochemical characteristics are often added as well. These systems provide information retrieval mechanisms to help biologists solve problems, and in turn they pose exciting computer science problems. Bioinformatics is defined as the science of storing, extracting, analyzing, interpreting, and utilizing information from biological databases. A vast amount of biological data is being generated and deposited as semi-structured or unstructured data. Because traditional techniques were developed to analyze data organized in relational or transactional schemas, extracting useful information from this inconsistently organized biological data by manual analysis alone is intricate and often infeasible. Hence, there is a need for computational techniques that can handle vast, high-dimensional data, help discover useful information, and pave the way for further exploratory biological analysis. With this motivation, we focus on designing data mining methodologies that analyze biological data repositories to find associations and biologically significant information. Although the data under study are biological, prior knowledge of biology is not required, because our focus is on designing and developing better computational techniques for analyzing high-dimensional data, which is an alphanumeric manifestation of the genome.
Our specific interests in this area include the following:
Gene expression data analysis, including dimensionality reduction, supervised and unsupervised classification, gene-marker recognition and mining, and analysis of gene expression time series
Protein structure alignment methods, including computational discovery of embedded sequence-structure-function relationships
Biomedical Informatics:
Biomedical informatics is the scientific field that deals with the storage, retrieval, sharing, and (sub)optimal use of biomedical information, data, and knowledge for problem solving and decision making.
Data in this domain are characterized by high dimensionality and implicit, embedded knowledge that a human may fail to capture accurately. We develop computational data mining methodologies for analyzing biomedical datasets, such as clinical images (retinal images and X-rays), to extract useful features or patterns. These features aid the early detection of a disease's presence or status, provide a rationale for vital conclusions, and thus help prevent or cure the pathology through early diagnosis. Our specific interests in this area include the following:
Design and development of computational frameworks for heterogeneous distributed database integration
Design and development of content-based image retrieval algorithms
Design and development of image segmentation algorithms for clinical decision support applications
Identification of pathological features in biomedical and clinical images
More information about our projects is available on our research work page.
The content of this website is maintained solely by members of the DMRL, mainly students. For questions regarding the website, please contact our webmaster. For questions regarding our research, please contact our Director, Dr. Sumeet Dua.