Worcester Polytechnic Institute Electronic Theses and Dissertations Collection

Title page for ETD etd-013006-135526


Document Typedissertation
Author NameSu, Hong
Email Address suhong at cs.wpi.edu
URNetd-013006-135526
TitleAutomaton Meet Algebra: A Hybrid Paradigm for Efficiently Processing XQuery over XML Stream
DegreePhD
DepartmentComputer Science
Advisors
  • Elke A. Rundensteiner, Advisor
  • Murali Mani, Committee Member
  • George Heineman, Committee Member
  • Mitch Cherniack, Committee Member
  • Keywords
  • Database
  • XML Stream
  • Algebra
  • Automaton
  • Query Optimization
  • Date of Presentation/Defense2006-01-20
    Availability unrestricted

    Abstract

    XML stream applications bring the challenge of efficiently processing queries on sequentially accessible token-based data streams. The automaton paradigm is naturally suited for pattern retrieval on tokenized XML streams, but requires patches for implementing the filtering or restructuring functionalities common for the XML query languages. In contrast, the algebraic paradigm is well-established for processing self-contained tuples. However, it does not traditionally support token inputs. This dissertation proposes a framework called Raindrop, which accommodates both the automaton and algebra paradigms to take advantage of both.

    First, we propose an architecture for Raindrop. Raindrop is an algebra framework that models queries at different abstraction levels. We represent the token-based automaton computations as an algebraic subplan at the high level while exposing the automaton details at the low level. The algebraic subplan modeling automaton computations can thus be integrated with the algebraic subplan modeling the non-automaton computations.

    Second, we explore a novel optimization opportunity. Other XML stream processing systems always retrieve all the patterns in a query in the automaton. In contrast, Raindrop allows a plan to retrieve some of the pattern retrieval in the automaton and some out of the automaton. This opens up an automaton-in-or-out optimization opportunity. We study this optimization in two types of run-time environments, one with stable data characteristics and one with fluctuating data characteristics. We provide search strategies catering to each environment. We also describe how to migrate from a currently running plan to a new plan at run-time.

    Third, we optimize the automaton computations using the schema knowledge. A set of criteria are established to decide what schema constraints are useful to a given query. Optimization rules utilizing different types of schema constraints are proposed based on the criteria. We design a rule application algorithm which ensures both completeness (i.e., no optimization is missed) and minimality (i.e., no redundant optimization is introduced). The experimentations on both real and synthetic data illustrate that these techniques bring significant performance improvement with little overhead.

    Files
  • hongsu.pdf

  • Browse by Author | Browse by Department | Search all available ETDs

    [WPI] [Library] [Home] [Top]

    Questions? Email etd-questions@wpi.edu
    Maintained by webmaster@wpi.edu