Document Type: thesis
Author Name: Jbantova, Mariana G
URN: etd-050207-222839
Title: State Spill Policies for State Intensive Continuous Query Plan Evaluation
Degree: ME
Department: Computer Science
Advisors: Prof. Elke Rundensteiner, Advisor; Prof. David Finkel, Reader
Keywords: continuous query processing; adaptation policies; partitioned window join operator
Date of Presentation/Defense: 2007-04-12
Availability: unrestricted

Abstract
Modern applications such as network monitoring, telecommunications data management,
web applications, and remote medical monitoring need near-real-time results over
continuous data streams. This need has spurred the development of a new class of data
management systems called Data Stream Management Systems (DSMSs). Unlike traditional
database systems, which answer one-time user queries only after finite data has been
captured on disk, DSMSs answer user queries on the fly as data arrives at varying rates
in the form of continuous, potentially infinite streams of tuples. To meet the timeliness
requirements of applications, DSMSs aim to keep all data in main memory. Queries with
multiple stateful operators therefore put a major strain on memory.
Existing adaptation techniques designed to address this issue are ineffective when
faced with continuous bursts of high data rates. When system load exceeds system capacity,
a DSMS has three options: 1) discard some new data; 2) crash; or 3) spill data
to disk. Only the third option allows it to produce delayed, yet accurate and complete,
query results. However, this option incurs disk-access overhead and changes the natural
order of tuples flowing through the query plan tree. Because not all stream operators can
correctly process out-of-order tuples, data spilling may degrade the quality of the final
results. Moreover, since operators in a query plan are interconnected, changes in the
order of tuple flows inevitably affect the execution stages of downstream operators,
such as data purging. Data purging is necessary for processing continuous queries
composed of stateful operators. The state of such operators is divided into finite,
non-overlapping sets of tuples called windows. After all the tuples of a window have
been processed and all results output, these tuples can be discarded to free memory
for new data.
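As an illustration of the purging step described above, the following minimal Python sketch keeps operator state grouped by window and discards a window once it is complete. It is not taken from the thesis or from CAPE: the names `WindowedState`, `window_size`, and `close_windows_before` are hypothetical, tuples are assumed to carry an integer timestamp, and windows are assumed to be fixed-size and non-overlapping.

```python
from collections import defaultdict


class WindowedState:
    """Operator state split into finite, non-overlapping windows (illustrative only)."""

    def __init__(self, window_size):
        self.window_size = window_size      # assumed fixed window length
        self.windows = defaultdict(list)    # window id -> tuples in that window

    def window_id(self, timestamp):
        # Each timestamp belongs to exactly one window.
        return timestamp // self.window_size

    def insert(self, timestamp, value):
        self.windows[self.window_id(timestamp)].append((timestamp, value))

    def close_windows_before(self, timestamp):
        """Purge every window that ends at or before `timestamp`.

        Returns the number of tuples freed. This is only safe if no tuple
        for a purged window can still arrive, i.e. if tuples are in order.
        """
        current = self.window_id(timestamp)
        freed = 0
        for wid in [w for w in self.windows if w < current]:
            freed += len(self.windows.pop(wid))
        return freed
```

The in-order assumption in `close_windows_before` is exactly what spilling can violate: a tuple read back from disk may belong to a window that a downstream operator has already purged.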
To address these issues, we take three steps. First, we redesign the state structure of
continuous operators into smaller, finite, non-overlapping sets of tuples, such as
partitioned window groups, which incur less disk-access overhead. Second, we enable
continuous operators to correctly process out-of-order tuples using punctuation pointers.
Third, we design methods for downstream operators to synchronize their processing stages
with those of upstream operators to achieve optimized query plan throughput. Putting these
techniques together, we have designed a consolidated spilling adaptation strategy that
considers all aspects of operators' interconnections in a query plan when making optimal
adaptation decisions.
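To make the idea of spilling state in small groups concrete, here is a minimal Python sketch of join state that is hash-partitioned into groups and spilled one group at a time when an in-memory limit is exceeded. Everything in it is an assumption made for illustration, not the thesis's design: the class `PartitionedState`, the tuple-count `capacity`, the "largest group first" victim policy, and the use of `pickle` files as the on-disk format. It does not model windows or punctuation pointers.

```python
import os
import pickle
import tempfile


class PartitionedState:
    """Join state split into partition groups that can be spilled one at a time."""

    def __init__(self, num_groups, capacity, spill_dir):
        self.capacity = capacity            # max tuples kept in memory (assumed unit)
        self.spill_dir = spill_dir
        self.groups = {g: [] for g in range(num_groups)}
        self.spilled = set()                # ids of groups with data on disk

    def _path(self, group_id):
        return os.path.join(self.spill_dir, f"group_{group_id}.pkl")

    def in_memory(self):
        return sum(len(tuples) for tuples in self.groups.values())

    def insert(self, key, value):
        group_id = hash(key) % len(self.groups)
        self.groups[group_id].append((key, value))
        while self.in_memory() > self.capacity:
            self.spill_largest()

    def spill_largest(self):
        # Assumed victim policy: largest in-memory group first.
        victim = max(self.groups, key=lambda g: len(self.groups[g]))
        with open(self._path(victim), "ab") as f:   # append one pickle per spill
            pickle.dump(self.groups[victim], f)
        self.groups[victim] = []
        self.spilled.add(victim)

    def load_spilled(self, group_id):
        """Read back every tuple spilled for one group, in spill order."""
        tuples = []
        if group_id in self.spilled:
            with open(self._path(group_id), "rb") as f:
                while True:
                    try:
                        tuples.extend(pickle.load(f))
                    except EOFError:
                        break
        return tuples


# Usage: spill into a temporary directory (removed when the block exits).
with tempfile.TemporaryDirectory() as spill_dir:
    state = PartitionedState(num_groups=2, capacity=3, spill_dir=spill_dir)
    for key in [0, 2, 4, 1]:
        state.insert(key, "payload")
    spilled_group_0 = state.load_spilled(0)
```

Spilling a whole group at once means each disk write and later read-back touches only the tuples of that group, which is the intuition behind the lower disk-access overhead of smaller state units.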
We empirically tested the effectiveness of our integrated approach in a comparative
evaluation against several alternative spilling adaptation strategies. We conducted
our experiments on CAPE, a DSMS developed at WPI, using different types of query
plans composed of multiple partitioned window join operators. Our experiments show
that, despite the higher overhead of a more synchronized adaptation approach, our
consolidated strategy provides better query plan performance and higher plan throughput
during periods of continuous bursts of high data rates.
Files: jbantova_thesis.pdf