Department(s):

Data Science

Congratulations to Dr. Chuan Lei on a successful Ph.D. Defense.

Ph.D. Dissertation Defense

WPI - Computer Science

COMMITTEE MEMBERS:

  • Professor Elke Rundensteiner, Ph.D. Advisor, WPI - Computer Science

  • Professor Mohamed Y. Eltabakh, Ph.D. Co-Advisor, WPI - Computer Science

  • Professor Emmanuel Agu, Ph.D., WPI - Computer Science

  • Dr. Nesime Tatbul, Ph.D., Intel Lab/MIT CSAIL

ABSTRACT:

The MapReduce computing paradigm and its open-source implementation, Hadoop, are among the most popular and widely used technologies. Recurring queries, executed repeatedly over long periods of time on rapidly evolving, high-volume data, have become a bedrock component of most analytics applications.

First, I propose a novel scalable infrastructure called Redoop that treats recurring queries over big, evolving data as first-class citizens during query processing. Redoop offers innovative window-aware optimization techniques for recurring query execution, including adaptive window-aware data partitioning, window-aware task scheduling, and inter-window caching mechanisms.

Second, using this platform, I built a scalable multi-query sharing engine tailored for recurring workloads, called Helix. Helix deploys sliced window techniques to create sharing opportunities and introduces a cost/benefit model for creating a sharing plan among the recurring queries. It also features a scheduling strategy for executing these queries so as to maximize SLA satisfaction.

Third, I designed an approximate query engine for recurring workloads, called Faro. Faro provides a deadline-aware sampling strategy that builds samples of reduced size from the original data. Faro also offers adaptive resource allocation strategies that maximally improve the approximate results while meeting the queries' response time requirements.