Worcester Polytechnic Institute Electronic Theses and Dissertations Collection

Title page for ETD etd-021108-150542


Document Typethesis
Author NameMUKHERJI, ABHISHEK
Email Address abhishek.mukherji at gmail.com
URNetd-021108-150542
TitleSNIF TOOL - Sniffing for Patterns in Continuous Streams
DegreeMS
DepartmentComputer Science
Advisors
  • Professor Elke A. Rundensteiner, Advisor
  • Professor David C. Brown, Reader
  • Professor Michael A. Gennert, Department Head
  • Keywords
  • continuous queries
  • streaming time-series
  • similarity queries
  • pattern matching
  • Date of Presentation/Defense2008-01-22
    Availability unrestricted

    Abstract

    Recent technological advances in sensor networks and mobile devices give rise to new challenges in processing of live streams. In particular, time-series sequence matching, namely, the similarity matching of live streams against a set of predefined pattern sequence queries, is an important technology for a broad range of domains that include monitoring the spread of hazardous waste and administering network traffic. In this thesis, I use the time critical application of monitoring of fire growth in an intelligent building as my motivating example. Various measures and algorithms have been established in the current literature for similarity of static time-series data. Matching continuous data poses the following new challenges: 1) fluctuations in stream characteristics, 2) real-time requirements of the application, 3) limited system resources, and, 4) noisy data. Thus the matching techniques proposed for static time-series are mostly not applicable for live stream matching.

    In this thesis, I propose a new generic framework, henceforth referred to as the n-Snippet Indices Framework (in short, SNIF), for discovering the similarity between a live stream and pattern sequences. The framework is composed of two key phases: (1.) Off-line preprocessing phase: where the pattern sequences are processed offline and stored into an approximate 2-level index structure; and (2.) On-line live stream matching phase: streaming time-series (or the live stream) is on-the-fly matched against the indexed pattern sequences. I introduce the concept of n-Snippets for numeric data as the unit for matching. The insight is to match small snippets of the live stream against prefixes of the patterns and maintain them in succession. Longer the pattern prefixes identified to be similar to the live stream, better the confirmation of the match. Thus, the live stream matching is performed in two levels of matching: bag matching for matching snippets and order checking for maintaining the lengths of the match. I propose four variations of matching algorithms that allow the user the capability to choose between the two conflicting characteristics of result accuracy versus response time.

    The effectiveness of SNIF to detect patterns has been thoroughly tested through extensive experimental evaluations using the continuous query engine CAPE as platform. The evaluations made use of real datasets from multiple domains, including fire monitoring, chlorine monitoring and sensor networks. Moreover, SNIF is demonstrated to be tolerant to noisy datasets.

    Files
  • thesis.pdf

  • Browse by Author | Browse by Department | Search all available ETDs

    [WPI] [Library] [Home] [Top]

    Questions? Email etd-questions@wpi.edu
    Maintained by webmaster@wpi.edu