Worcester Polytechnic Institute Electronic Theses and Dissertations Collection

Title page for ETD etd-0505104-120617


Document Typethesis
Author NameSutherland, Timothy Michael
URNetd-0505104-120617
TitleD-CAPE: A Self-Tuning Continuous Query Plan Distribution Architecture
DegreeMS
DepartmentComputer Science
Advisors
  • Elke A. Rundensteiner, Advisor
  • Murali Mani, Reader
  • Keywords
  • data stream
  • data management
  • continuous query
  • distributed
  • Date of Presentation/Defense2004-04-29
    Availability unrestricted

    Abstract

    The study of systems for querying data streams, coined Data Stream Management Systems (DSMS), has gained in popularity over

    the last several years. This new area of research for the database community includes studies in areas such as Sensor Networks, Network Intrusion, and monitoring data such as Medicine, Stock, or Weather feeds. With this new popularity comes increased

    performance expectations, with increased data sizes and speed and

    larger more complex query plans as well as high volumes of possibly small queries. Due to the finite resources on a

    single query processor, future Data Stream Management Systems must distribute their workload to multiple query processors, working

    together in a synchronized manner.

    This thesis discusses a new Distributed Continuous Query System

    (D-CAPE) developed here at WPI that has the ability to distribute

    query plans over a large cluster of machines. We describe the architecture

    of the new system, policies for query plan distribution to improve

    overall performance, as well as techniques for self-tuning query plan

    re-distribution. D-CAPE is designed to be as

    flexible as possible for future research. We include a multi-tiered

    architecture that scales to a large number of query processors. D-CAPE has also been designed to minimize the cost of the communications network by bundling synchronization messages, thus minimizing packets sent between query processors. These messages are also incremental at run-time to aid in minimizing the communication cost of D-CAPE. The architecture allows for the flexible incorporation of different distribution algorithms and operator reallocation policies.. D-CAPE provides an operator reallocation algorithm that is able to seamlessly move an operator(s) across any query processors in our computing cluster. We do so by creating ``pipes" between query processors to allow the data streams to flow, and then filling these pipes with data streams once execution begins. Operator redistribution is accomplished by systematically reconnecting these pipes as to not interrupt the data flow.

    Experimental evaluation using our real prototype system (not just simulation) shows that

    executing a query plan distributed over multiple machines causes no

    more overhead than processing it on a single centralized query processor; even for rather

    lightly loaded machines. Further, we find that distributing a query plan among a cluster of query processors can

    boost performance up to twice that of a centralized DSMS. We conclude that

    the limitation of each query processor within the distributed

    network of cooperating processors is not primarily in the volume of the data nor

    the number of query operators, but rather the number of data connections per processor and the allocation of the stateful

    and thus most costly operators. We also find that the overhead of

    distributing query operators is very low, allowing for a potentially frequent dynamic

    redistribution of query plans during execution.

    Files
  • tims.pdf

  • Browse by Author | Browse by Department | Search all available ETDs

    [WPI] [Library] [Home] [Top]

    Questions? Email etd-questions@wpi.edu
    Maintained by webmaster@wpi.edu