Dr. Sumeet Dua

Max P. and Robbie L. Watson Eminent Scholar Chair


Harpreet Singh (2010)


Associative Pattern Mining for Supervised Learning

The Internet era has revolutionized computational sciences and automated data collection techniques, made large amounts of previously inaccessible data available, and consequently broadened the scope of exploratory computing research. Data mining has recently gained renewed importance because of its ability to analyze and discover previously unknown, hidden, and useful associative knowledge from high-dimensional spatio-temporal datasets. Frequent pattern analysis in data mining can find associative relationships among the parts of data, thereby aiding a type of supervised discovery known as “associative learning”.

The purpose of this dissertation research is two-fold: to develop and demonstrate supervised associative learning in non-temporal data for multi-class classification, and to develop a new frequent pattern mining algorithm for temporal data that alleviates the current challenges in analyzing such data for knowledge discovery. For associative relationships to be applicable to classification, their discriminatory power has to be algorithmically calculated and calibrated. While it is well known that multiple sets of features work better for classification, we claim that the isomorphic relationships among the features work even better and, therefore, can be used as higher-order features. We exploit these relationships as input features for classification instead of using the underlying raw features. The next part of this dissertation focuses on building a new classifier that uses associative relationships as a basis for the multi-class classification problem. Most existing associative classifiers derive “class constrained rules”, which suffer from low support and significance. We argue that this class-constrained representation schema lacks critical discriminatory information that is necessary for many multi-class classification problems. Further, most existing works use either the intra-class or the inter-class importance of the association rules; each approach offers empirical benefits. We hypothesize that both intra-class and inter-class variations are important for fast and accurate multi-class classification. We also present a novel weighted association rule-based classification mechanism that uses frequent relationships among raw features from an instance as the basis for classifying the instance into one of many classes. The relationships are weighted according to both their intra-class and inter-class importance.
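To make the idea of weighting relationships by both intra-class and inter-class importance concrete, the following is a minimal sketch, not the dissertation's actual classifier. It assumes that a "relationship" is simply a co-occurring pair of items, that intra-class importance is the pair's support within a class, and that inter-class importance is that class's share of the pair's total support across classes. The function names `train` and `classify` and the weighting formula are hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def train(instances, labels):
    """Weight each co-occurring item pair per class (illustrative formula only)."""
    pair_counts = defaultdict(lambda: defaultdict(int))
    class_sizes = defaultdict(int)
    for items, label in zip(instances, labels):
        class_sizes[label] += 1
        for pair in combinations(sorted(set(items)), 2):
            pair_counts[pair][label] += 1

    weights = {}
    for pair, per_class in pair_counts.items():
        # Intra-class importance: fraction of the class's instances containing the pair.
        supports = {c: per_class.get(c, 0) / n for c, n in class_sizes.items()}
        total = sum(supports.values())  # > 0, since the pair occurred at least once
        # Inter-class importance: the class's share of the pair's total support.
        # Assumed combination: product of the two quantities.
        weights[pair] = {c: s * (s / total) for c, s in supports.items()}
    return weights, sorted(class_sizes)


def classify(items, weights, classes):
    """Sum the weights of all known pairs in the instance and pick the best class."""
    scores = {c: 0.0 for c in classes}
    for pair in combinations(sorted(set(items)), 2):
        for c, w in weights.get(pair, {}).items():
            scores[c] += w
    return max(classes, key=lambda c: scores[c])


instances = [["a", "b", "c"], ["a", "b"], ["c", "d"], ["c", "d", "a"]]
labels = ["x", "x", "y", "y"]
weights, classes = train(instances, labels)
prediction_ab = classify(["a", "b"], weights, classes)
prediction_cd = classify(["c", "d"], weights, classes)
```

In this toy data the pair `("a", "b")` occurs only in class `x` and the pair `("c", "d")` only in class `y`, so each pair carries weight solely for its own class. A pair that is frequent in every class would receive a diluted weight in each, which is the intuition behind combining the two kinds of importance.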

The final part of this dissertation concentrates on mining time-varying data and proposes a new algorithm for mining “inter-transaction association rules”. While most existing work transforms the time-varying data into a static format and then uses multiple scans over the new data to extract patterns, we present a unique index-based algorithmic framework for inter-transaction association rule mining. Our proposed technique requires only one scan of the original database and offers multi-fold gains in performance. Further, the proposed technique can also provide the location information of each extracted pattern. Mathematical induction is used to prove that the new representation scheme captures all underlying frequent relationships, thereby ensuring completeness. Results are compared with existing algorithms in the area to demonstrate significance and performance gains.
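The general idea of an index built in a single scan, which then yields both the support and the locations of inter-transaction patterns, can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm developed in the dissertation: it handles only two-item patterns of the form "item `a` at position `t`, item `b` at position `t + d`", measures support as an absolute count, and uses the hypothetical names `build_index`, `mine_pairs`, `max_span`, and `min_support`.

```python
from collections import defaultdict


def build_index(transactions):
    """Single pass over the database: map each item to the positions where it occurs."""
    index = defaultdict(set)
    for pos, items in enumerate(transactions):
        for item in items:
            index[item].add(pos)
    return index


def mine_pairs(index, max_span, min_support):
    """Find patterns (a, b, d): a occurs at position t and b at t + d, 0 <= d <= max_span.

    Works on the index alone, so the original database is not rescanned.
    Returns each frequent pattern with the sorted list of start positions t.
    """
    patterns = {}
    for a, pos_a in index.items():
        for b, pos_b in index.items():
            for d in range(max_span + 1):
                if d == 0 and a >= b:
                    continue  # skip self-pairs and mirrored duplicates within one transaction
                locations = sorted(t for t in pos_a if t + d in pos_b)
                if len(locations) >= min_support:
                    patterns[(a, b, d)] = locations
    return patterns


# Position i in the list stands for time step i.
transactions = [{"a"}, {"b"}, {"a", "c"}, {"b"}, {"c"}]
index = build_index(transactions)
patterns = mine_pairs(index, max_span=2, min_support=2)
```

Here `patterns[("a", "b", 1)]` holds the positions at which "a" is followed by "b" one step later, so the location information comes out of the index lookups without any additional pass over the data.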

 
