Dr. Sumeet Dua

Max P. & Robbie L. Watson Eminent Scholar Chair

Manish K. Gupta (2006)

A Framework for Studying the Efficacy of Parameters Accounting towards Solutions in Data Mining; MS-CS Thesis, Student: Manish K. Gupta (2006).

Classification and prediction are the primary goals of data mining. Classification is commonly hampered by missing data, feature redundancy, and high data dimensionality. These problems can severely reduce accuracy and have therefore received considerable attention in the data mining and machine learning communities. In this research, we study two effects on classifier performance: the effect of removing Gaussian attributes from a dataset, which addresses data reduction, and the effect of applying Independent Component Analysis (ICA), which addresses feature extraction.
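
The abstract does not say which statistical test is used to identify Gaussian attributes. As a minimal sketch, assuming the Jarque-Bera normality test is applied to each attribute independently, an attribute can be treated as "Gaussian" when normality is not rejected at a chosen significance level. The function names `jarque_bera_pvalue`, `gaussian_attributes`, and `drop_columns` and the default `alpha=0.05` are hypothetical choices for illustration, not the thesis's method.

```python
import math
from statistics import NormalDist


def jarque_bera_pvalue(values):
    """Asymptotic Jarque-Bera p-value for H0: the values come from a normal distribution."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        return 0.0  # a constant column is never treated as Gaussian
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Under H0, JB is asymptotically chi-square with 2 degrees of freedom,
    # whose survival function is exactly exp(-x / 2).
    return math.exp(-jb / 2.0)


def gaussian_attributes(rows, alpha=0.05):
    """Indices of columns whose normality is NOT rejected at level alpha (hypothetical helper)."""
    n_cols = len(rows[0])
    return [j for j in range(n_cols)
            if jarque_bera_pvalue([row[j] for row in rows]) >= alpha]


def drop_columns(rows, cols):
    """Return a copy of rows without the given column indices (the data-reduction step)."""
    drop = set(cols)
    return [[v for j, v in enumerate(row) if j not in drop] for row in rows]


if __name__ == "__main__":
    n = 1000
    # Column 0: normal quantiles (Gaussian-shaped); column 1: exponential quantiles (skewed).
    rows = [[NormalDist().inv_cdf((i + 0.5) / n), -math.log(1 - (i + 0.5) / n)]
            for i in range(n)]
    gauss = gaussian_attributes(rows)
    reduced = drop_columns(rows, gauss)
    print(gauss, len(reduced[0]))
```

The reduced rows would then be fed to the same classifiers as the original data, so that any change in accuracy can be attributed to removing the Gaussian attributes.
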
We run classification experiments using a set of selected classifiers on datasets chosen from the UCI data archive. First, we run the chosen classification algorithms on the original datasets. We then apply statistical tests to identify the Gaussian attributes and study how removing these attributes (data reduction) affects classifier performance. Next, we perform ICA to extract components that are as statistically independent as possible (feature extraction), run all the classification algorithms on these independent components, and study the resulting change in classification performance. Finally, treating the JPSO dataset as a special case, we classify it with all the studied classifiers and examine how removing Gaussian attributes and applying ICA affect the classification models for this dataset.
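
The abstract does not name the ICA algorithm. The sketch below assumes FastICA with a tanh nonlinearity and symmetric decorrelation, written with NumPy: the data are centered and whitened, and the unmixing matrix is then refined by fixed-point iterations. The names `fast_ica` and `_sym_decorrelate` and the defaults (`max_iter=200`, `tol=1e-6`, `seed=0`) are hypothetical illustration choices.

```python
import numpy as np


def _sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W


def fast_ica(X, n_components, max_iter=200, tol=1e-6, seed=0):
    """Estimate independent components of X (n_samples x n_features) with FastICA (tanh)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # Whitening: project onto the leading eigenvectors and scale to unit variance.
    d, E = np.linalg.eigh(np.cov(Xc, rowvar=False))
    idx = np.argsort(d)[::-1][:n_components]
    K = E[:, idx] / np.sqrt(d[idx])
    Z = Xc @ K  # whitened data: sample covariance is the identity
    rng = np.random.default_rng(seed)
    W = _sym_decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(max_iter):
        G = np.tanh(Z @ W.T)  # g(w_i^T z) for every sample and component
        # Fixed-point update: w <- E[z g(w^T z)] - E[g'(w^T z)] w, with g' = 1 - tanh^2.
        W_new = G.T @ Z / len(Z) - (1.0 - G ** 2).mean(axis=0)[:, None] * W
        W_new = _sym_decorrelate(W_new)
        # Converged when each new row is (anti)parallel to its previous direction.
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return Z @ W.T  # independent components, one per column


if __name__ == "__main__":
    t = np.linspace(0, 8, 2000)
    sources = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]
    mixed = sources @ np.array([[1.0, 0.5], [0.4, 1.0]]).T
    print(fast_ica(mixed, n_components=2).shape)
```

In a framework like the one described, the columns returned by `fast_ica` would replace the original attributes as classifier input; components are recovered only up to sign, scale, and order, which does not matter for classification.
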
We evaluated our framework by running seven different classification algorithms on six datasets chosen from the UCI data archive. Neither data reduction by removing Gaussian attributes nor feature extraction by performing ICA improves the classification performance of the classifiers: classification accuracy decreases by an average of 3.34% after removing the Gaussian attributes and by an average of 5% after performing ICA. Nevertheless, accuracies better than previous benchmarks can be obtained after successfully reducing the data and extracting features that are as independent as possible.