Worcester Polytechnic Institute Electronic Theses and Dissertations Collection

Title page for ETD etd-042208-194826


Document Typedissertation
Author NameDing, Luping
Email Address lisading at cs.wpi.edu
URNetd-042208-194826
TitleMetadata-Aware Query Processing over Data Streams
DegreePhD
DepartmentComputer Science
Advisors
  • Elke A. Rundensteiner, Advisor
  • Murali Mani, Committee Member
  • George T. Heineman, Committee Member
  • Leonidas Fegaras, Committee Member
  • Keywords
  • metadata
  • constraint
  • data stream
  • continuous query
  • optimization
  • Date of Presentation/Defense2008-04-18
    Availability unrestricted

    Abstract

    Many modern applications need to process queries over potentially infinite data streams to provide answers in real-time. This dissertation proposes novel techniques to optimize CPU and memory utilization in stream processing by exploiting metadata on streaming data or queries. It focuses on four topics: 1) exploiting stream metadata to optimize SPJ query operators via operator configuration, 2) exploiting stream metadata to optimize SPJ query plans via query-rewriting, 3) exploiting workload metadata to optimize parameterized queries via indexing, and 4) exploiting event constraints to optimize event stream processing via run-time early termination.

    The first part of this dissertation proposes algorithms for one of the most common and expensive query operators, namely join, to at runtime identify and purge no-longer-needed data from the state based on punctuations. Exploitations of the combination of punctuation and commonly-used window constraints are also studied. Extensive experimental evaluations demonstrate both reduction on memory usage and improvements on execution time due to the proposed strategies.

    The second part proposes herald-driven runtime query plan optimization techniques. We identify four query optimization techniques, design a lightweight algorithm to efficiently detect the optimization opportunities at runtime upon receiving heralds. We propose a novel execution paradigm to support multiple concurrent logical plans by maintaining one physical plan. Extensive experimental study confirms that our techniques significantly reduce query execution times.

    The third part deals with the shared execution of parameterized queries instantiated from a query template. We design a lightweight index mechanism to provide multiple access paths to data to facilitate a wide range of parameterized queries. To withstand workload fluctuations, we propose an index tuning framework to tune the index configurations in a timely manner. Extensive experimental evaluations demonstrate the effectiveness of the proposed strategies.

    The last part proposes event query optimization techniques by exploiting event constraints such as exclusiveness or ordering relationships among events extracted from workflows. Significant performance gains are shown to be achieved by our proposed constraint-aware event processing techniques.

    Files
  • dissertation.pdf

  • Browse by Author | Browse by Department | Search all available ETDs

    [WPI] [Library] [Home] [Top]

    Questions? Email etd-questions@wpi.edu
    Maintained by webmaster@wpi.edu