Dr. Sumeet Dua

Max P. and Robbie L. Watson Eminent Scholar Chair

Sunil Gokak (2004)

A Visual Data Mining Framework for Similarity Search in Large Sequential Databases; MS-CS Practicum; Student: Sunil Gokak (2004)

Time-dependent signals are common in everyday life. They may represent acoustic information, stock market data, or biological and clinical measurements. Although sampling and harmonic analysis enable efficient signal analysis in the frequency domain, similarity search for data mining can address further tasks such as value prediction, item classification, piecewise correlation estimation, and unsupervised clustering of time-dependent data sets. Efficient indexing techniques and algorithms are developed in this domain to address the curse of dimensionality evident in such data. This work builds a Visual Data Mining Framework that enables users, through a web-based interface, to analyze time-series data by interacting with efficient data mining algorithms. The application primarily targets descriptive data mining, which is useful for understanding the mechanics governing long-term and short-term fluctuations in large time-series data. An efficient web crawler is also designed to access online time-series data through an interactive, user-controlled graphical interface. Additionally, the application demonstrates the integration of Java technology with Matlab for real-time data interoperability between the two programming tools. The application can help users who are not data mining experts employ efficient similarity search algorithms in areas including inventory planning and material management, sales forecasting, demand forecasting, market research and business conditions, biomedical signal analysis and classification, protein data mining, and functional classification of genes.
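
To make the idea of similarity search in a time series concrete, the following is a minimal sketch of a brute-force subsequence search. It is not the indexing method used in this work, which the abstract does not specify. The function names `z_normalize` and `best_match`, and the choice of z-normalized Euclidean distance, are assumptions made for illustration only.

```python
import math


def z_normalize(window):
    """Rescale a window to zero mean and unit variance (flat windows map to zeros)."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in window]


def best_match(series, query):
    """Return (offset, distance) of the window in `series` most similar to `query`.

    Similarity is the Euclidean distance between z-normalized windows, so
    matches are found by shape rather than by absolute level or scale.
    This is an exhaustive scan: O(len(series) * len(query)) time.
    """
    m = len(query)
    if m == 0 or m > len(series):
        raise ValueError("query must be non-empty and no longer than series")
    q = z_normalize(query)
    best_offset, best_dist = -1, math.inf
    for offset in range(len(series) - m + 1):
        w = z_normalize(series[offset:offset + m])
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, w)))
        if dist < best_dist:
            best_offset, best_dist = offset, dist
    return best_offset, best_dist


if __name__ == "__main__":
    # Hypothetical data: the query is a scaled copy of the bump starting at index 2.
    series = [0, 0, 1, 2, 3, 2, 1, 0, 0, 4, 4, 9, 1]
    query = [10, 20, 30, 20, 10]
    print(best_match(series, query))
```

An exhaustive scan like this becomes expensive on large sequential databases, which is the kind of cost that indexing techniques for high-dimensional time-series data aim to reduce.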
