Document Type thesis Author Name El-Sayed, Maged F URN etd-0130104-115506 Title An Efficient and Incremental System to Mine Contiguous Frequent Sequences Degree MS Department Computer Science Advisors Carolina Ruiz, Advisor Elke A. Rundensteiner, Reader Michael Gennert, Department Head Keywords Frequent Patterns Traversal Patterns Date of Presentation/Defense 2004-01-30 Availability unrestricted Abstract
Mining frequent patterns is an important component of many
prediction systems. One common usage in web applications is the
mining of users' access behavior for the purpose of predicting and
hence pre-fetching the web pages that the user is likely to visit.
Frequent sequence mining approaches in the literature are often
based on the use of an Apriori-like candidate generation strategy,
which typically requires numerous scans of the potentially huge
sequence database. In this paper we instead introduce a more
efficient strategy for discovering frequent patterns in sequence
databases that requires only two scans of the database. The first
scan obtains support counts for subsequences of length two. The
second scan extracts potentially frequent sequences of any length
and represents them as a compressed frequent sequences tree
structure (FS-tree). Frequent sequence patterns are then mined
from the FS-tree. Incremental and interactive mining
functionalities are also facilitated by the FS-tree. As part of
this work, we developed the FS-Miner, an system that discovers
frequent sequences from web log files. The FS-Miner has the
ability to adapt to changes in users' behavior over time, in the
form of new input sequences, and to respond incrementally without
the need to perform full re-computation. Our system also allows
the user to change the input parameters (e.g., minimum support and
desired pattern size) interactively without requiring full
re-computation in most cases.
We have tested our system using two different data sets, comparing
it against two other algorithms from the literature. Our
experimental results show that our system scales up linearly with
the size of the input database. Furthermore, it exhibits excellent
adaptability to support threshold decreases. We also show that the
incremental update capability of the system provides significant
performance advantages over full re-computation even for
relatively large update sizes.
Files El-Sayed.pdf
Browse by Author | Browse by Department | Search all available ETDs
Questions? Email etd-questions@wpi.edu