Dr. Sumeet Dua

Max P. & Robbie L. Watson Eminent Scholar Chair

Vinay Manava (2003)

Integrating Image and Text for Heterogeneous Data Mining in Biomedical Informatics; MS-CS Practicum; Student: Vinay Manava (2003)

Biomedical informatics is the science of managing, mining, and interpreting information from data originating in biological and biomedical domains. Heterogeneous data mining addresses the computational challenge of searching multimedia data within a unified framework that answers similarity queries accurately and efficiently. Advances in data collection methodologies have generated large data warehouses in an assortment of application domains, including, but not limited to, biomedical and multimedia databases. Heterogeneous data indexing has proven to be a valuable tool for complex data mining in large data domains that are inherently semi-structured. We propose a solution that integrates the feature vectors of images and text by representing them together in a multidimensional spatial data structure that has previously exhibited superior search performance in image database domains. We first evaluate content-based similarity queries on the indexing schema independently in the image and text domains, and then study the effect of the choice of similarity metric on those queries. We then propose an indexing schema that integrates the feature vectors of text and images to answer integrated queries on the unified heterogeneous data space. An added advantage of the proposed methodology is that a textual feature vector can query a heterogeneous database and retrieve both text and images as results. This capability avoids the time otherwise spent querying each data domain separately and sequentially scanning the integrated database for similar items. The proposed methodology is time- and space-efficient and can answer complex heterogeneous data mining queries, making it suitable for applications in biomedical and clinical domains.
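
The idea of answering similarity queries over a unified text-and-image feature space, and of the similarity metric changing the answer, can be illustrated with a small sketch. The abstract does not name the spatial data structure or the feature extraction used, so everything here is an assumption made for illustration: a plain k-d tree stands in for the index, the 2-D vectors and the labels are invented toy data, and the function names (`minkowski`, `build_kdtree`, `knn`) are hypothetical. It also assumes that text and image items have already been mapped to fixed-length vectors in the same space.

```python
import heapq

def minkowski(a, b, p):
    """L_p distance between two vectors; p=float('inf') is the Chebyshev metric."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

def build_kdtree(records, depth=0):
    """Build a k-d tree from a list of (vector, label) records."""
    if not records:
        return None
    axis = depth % len(records[0][0])          # cycle through the dimensions
    records = sorted(records, key=lambda r: r[0][axis])
    mid = len(records) // 2                    # median record becomes the node
    return {
        "record": records[mid],
        "axis": axis,
        "left": build_kdtree(records[:mid], depth + 1),
        "right": build_kdtree(records[mid + 1:], depth + 1),
    }

def knn(node, query, k, p=2, heap=None):
    """Return the k nearest records as (distance, label), nearest first."""
    top = heap is None
    if top:
        heap = []                              # max-heap via negated distances
    if node is not None:
        vec, label = node["record"]
        heapq.heappush(heap, (-minkowski(query, vec, p), label))
        if len(heap) > k:
            heapq.heappop(heap)                # drop the current farthest
        diff = query[node["axis"]] - vec[node["axis"]]
        near, far = ((node["left"], node["right"]) if diff < 0
                     else (node["right"], node["left"]))
        knn(near, query, k, p, heap)
        # |diff| is a lower bound on any L_p distance to the far side,
        # so that subtree is visited only if it could still hold a closer item.
        if len(heap) < k or abs(diff) < -heap[0][0]:
            knn(far, query, k, p, heap)
    if top:
        return sorted((-neg_d, label) for neg_d, label in heap)
    return heap

# Toy, invented feature vectors: text and image items share one space.
records = [
    ((3, 0), "text:A"),
    ((2, 2), "image:B"),
    ((5, 5), "text:C"),
    ((-4, 1), "image:D"),
]
tree = build_kdtree(records)
query = (0, 0)  # stands in for a feature vector derived from a text query

for p in (1, 2, float("inf")):
    print(p, knn(tree, query, k=2, p=p))
```

In this toy data the nearest item to the query is a text record under the L1 metric but an image record under the L2 and Chebyshev metrics, which shows how the choice of metric can change what a single query over the combined space retrieves. The tree search also skips subtrees that cannot contain a closer item, in contrast to scanning every record in turn.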
