Dr. Sumeet Dua

Max P. & Robbie L. Watson Eminent Scholar Chair

Suyang Zhang (2004)

Fast Web Usage Mining for Automatic Web Personalization; MS-CS Thesis; Student: Suyang Zhang (2004)

The wealth of information on the World Wide Web (WWW) has spurred tremendous interest in knowledge discovery and data mining. Most web structures are large and complicated, and their data spaces are constantly updated and rapidly evolving, so web users often lose sight of the goal of their inquiry and suffer from information overload. Meanwhile, the capacity of an individual to access knowledge and digest information is mostly fixed. To help web users access information efficiently and accurately, it is now necessary to anticipate their needs. Large websites increasingly recommend personalized information to particular users by extracting models of their navigational behavior, a practice known as web personalization. Because it aims to offer users a personalized view of web services, web mining has gained great momentum in both research and commercial areas. Web mining, particularly web usage mining, is considered a main component of an effective web personalization system.
In this thesis, we describe an automatic and effective personalization system that uses web log files for data preparation and clustering. The data are mined with an offline computational methodology, and a model is then extracted to provide dynamic, real-time online recommendations. The data-preparation task processes web access log files using heuristic methods. We describe effective data mining techniques based on a distance-based similarity measure and hypergraph-based clustering to obtain a uniform representation for transaction results. The developed recommendation engine computes a recommendation set for the current user session. It then returns the personalized pages, embedding Microsoft’s Component Object Model (COM) to dynamically calculate the matching score.
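
The pipeline described above can be illustrated with a minimal sketch. It is not the thesis implementation: the 30-minute session timeout, the cosine-based matching score, the page-weight cluster profiles, and the function names `sessionize`, `matching_score`, and `recommend` are all assumptions chosen for illustration. The hypergraph-based clustering step is omitted, and the profiles are simply taken as given.

```python
from collections import defaultdict
from math import sqrt

SESSION_TIMEOUT = 30 * 60  # seconds; an assumed heuristic, not from the thesis


def sessionize(log_entries, timeout=SESSION_TIMEOUT):
    """Split (user, timestamp, page) log entries into per-user sessions.

    A new session starts whenever the gap between two consecutive
    requests from the same user exceeds `timeout` seconds.
    """
    by_user = defaultdict(list)
    for user, ts, page in sorted(log_entries, key=lambda e: (e[0], e[1])):
        by_user[user].append((ts, page))

    sessions = []
    for visits in by_user.values():
        current = [visits[0][1]]
        for (prev_ts, _), (ts, page) in zip(visits, visits[1:]):
            if ts - prev_ts > timeout:
                sessions.append(current)
                current = []
            current.append(page)
        sessions.append(current)
    return sessions


def matching_score(session_pages, profile):
    """Cosine similarity between a binary session vector and a
    weighted cluster profile (dict mapping page -> weight)."""
    pages = set(session_pages)
    norm_p = sqrt(sum(w * w for w in profile.values()))
    if not pages or norm_p == 0:
        return 0.0
    dot = sum(profile.get(p, 0.0) for p in pages)
    return dot / (sqrt(len(pages)) * norm_p)


def recommend(session_pages, profiles, top_n=3):
    """Rank unvisited pages by (matching score * page weight)."""
    seen = set(session_pages)
    scores = defaultdict(float)
    for profile in profiles:
        match = matching_score(session_pages, profile)
        for page, weight in profile.items():
            if page not in seen:
                scores[page] = max(scores[page], match * weight)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [page for page, score in ranked[:top_n] if score > 0]


# Hypothetical log entries and cluster profiles, for illustration only.
log = [("u1", 0, "/a"), ("u1", 100, "/b"), ("u1", 5000, "/c"), ("u2", 50, "/a")]
profiles = [{"/a": 1.0, "/b": 0.8, "/d": 0.6}, {"/c": 1.0, "/e": 0.9}]

sessions = sessionize(log)
recommendations = recommend(sessions[0], profiles)
```

In this sketch the offline stage would produce `profiles`, while `recommend` stands in for the online step that scores the current session against each profile and returns the unvisited pages with the highest weighted match.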