WPI Computer Science PhD Defense: Noura Alghamdi, "Big Time Series Analytics Using a Distributed Infrastructure"

Wednesday, August 10, 2022
12:00 pm to 1:00 pm

Noura Alghamdi

Ph.D. Dissertation Defense

Time: 12:00 p.m. to 1:00 p.m.

Location: Fuller Labs, Beckett Conference Room 

Committee Members:

Dr. Elke A. Rundensteiner, Professor, WPI, Advisor

Dr. Mohamed Y. Eltabakh, Associate Professor, WPI, Co-Advisor

Dr. George T. Heineman, Associate Professor, WPI

Dr. Mirek Riedewald, Associate Professor, Northeastern University

Abstract:

Over the last few decades, the explosion of the Internet of Things (IoT) has led to unprecedented growth in time series data. This growth gives rise to three phenomena. First, data is generated at explosive speed and volume, resulting in big time series. Second, the lifespan of the generated data may span months or years, producing exceedingly long time series. Third, a device often produces a sequence of intermittent time series separated by time gaps, i.e., interconnected time series. Systems that process such complex data scalably must leverage indexing techniques. Unfortunately, state-of-the-art indexing techniques lack the functionality, scalability, and accuracy required to process such big, long, interconnected time series data. In this dissertation, we thus focus on the following open research themes.

1. Indexing and Querying Long Time Series. We propose a lightweight distributed indexing framework, called ChainLink, that supports approximate similarity search for both full and subsequence matching over TB-scale datasets of time series objects, each composed of several hundred data points. As the foundation of ChainLink, we design a novel hashing technique that, as our experiments demonstrate, ensures a compact structure, efficient search and comparison, and efficient index construction.

2. Indexing and Querying Interconnected Time Series Objects. We design a new data model, called Time Series Compound (TSC), to handle interconnected time series. We tackle the unique challenges that arise when managing, querying, and analyzing repositories of big TSC objects. Our distributed indexing infrastructure features a TSC-aware representation technique, TSC similarity semantics, and efficient processing strategies. Experimental studies demonstrate the effectiveness and efficiency of our solution.

3. Advanced Interconnected Time Series Analytics. Because the TSC model inherently captures how objects evolve over time, we propose a new class of queries, called convergence queries. We define their semantics and design novel query processing strategies for their scalable execution over TB-scale datasets. Experiments confirm that our strategies are not only significantly faster than the baseline solution on TB-scale datasets but also consistently achieve excellent accuracy.
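
The abstract does not describe ChainLink's hashing technique, so the sketch below swaps in a generic, well-known scheme purely to illustrate the idea behind the first research theme, hash-based approximate similarity search: a SAX-style symbolic hash (z-normalization, Piecewise Aggregate Approximation, then quantization against Gaussian breakpoints) used as a bucket key. The names `sax_hash` and `HashIndex` are hypothetical, the index is single-node and in-memory rather than distributed, and nothing here should be read as ChainLink's actual design.

```python
import math
from collections import defaultdict

# Breakpoints that split N(0, 1) into 4 equiprobable regions (a 4-symbol SAX alphabet).
BREAKPOINTS = [-0.6745, 0.0, 0.6745]


def znorm(series):
    """Z-normalize so that matching compares shape, not offset or scale."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n  # a flat series has no shape to normalize
    return [(x - mean) / std for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each of `segments` chunks.

    Assumes segments <= len(series), so no chunk is empty.
    """
    n = len(series)
    means = []
    for i in range(segments):
        chunk = series[i * n // segments:(i + 1) * n // segments]
        means.append(sum(chunk) / len(chunk))
    return means


def sax_hash(series, segments=4):
    """Map a series to a short symbolic key; similar shapes tend to collide."""
    # Each symbol counts how many breakpoints the segment mean exceeds (0..3).
    return tuple(sum(m > b for b in BREAKPOINTS) for m in paa(znorm(series), segments))


class HashIndex:
    """Toy single-node index (hypothetical): bucket by key, refine by exact distance."""

    def __init__(self, segments=4):
        self.segments = segments
        self.buckets = defaultdict(list)

    def insert(self, series_id, series):
        self.buckets[sax_hash(series, self.segments)].append((series_id, series))

    def query(self, q, k=1):
        """Approximate k-NN: only series sharing the query's key are compared."""
        zq = znorm(q)
        candidates = self.buckets.get(sax_hash(q, self.segments), [])
        scored = sorted(
            (math.dist(zq, znorm(s)), sid)
            for sid, s in candidates
            if len(s) == len(q)
        )
        return [sid for _, sid in scored[:k]]


index = HashIndex(segments=4)
index.insert("rising", [1, 2, 3, 4, 5, 6, 7, 8])
index.insert("falling", [8, 7, 6, 5, 4, 3, 2, 1])
print(index.query([10, 20, 30, 40, 50, 60, 70, 80]))  # ['rising']
```

Because only the query's bucket is scanned, a true nearest neighbor whose key differs by a single symbol is missed; that recall-for-speed trade-off is what makes such a search approximate. Subsequence matching could plausibly reuse the same structure by inserting each sliding window of a long series under a (series id, offset) key, though that too is an assumption rather than the dissertation's approach.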
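
The abstract likewise does not define the Time Series Compound (TSC) model from the second research theme precisely. One plausible reading of its description (intermittent time series from the same device, separated by time gaps) is sketched below: a device's readings are split into segments wherever consecutive timestamps differ by more than a threshold. The classes `Segment` and `TimeSeriesCompound`, the function `build_tsc`, and the `max_gap` parameter are all hypothetical illustrations, not the dissertation's definitions, and the sketch says nothing about TSC similarity semantics or indexing.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One contiguous time series: consecutive readings with no oversized gap."""
    timestamps: list = field(default_factory=list)
    values: list = field(default_factory=list)

    @property
    def start(self):
        return self.timestamps[0]

    @property
    def end(self):
        return self.timestamps[-1]


@dataclass
class TimeSeriesCompound:
    """Hypothetical TSC: time-ordered segments from one device, separated by gaps."""
    device_id: str
    segments: list = field(default_factory=list)

    def gaps(self):
        """Duration of each gap between consecutive segments."""
        return [b.start - a.end for a, b in zip(self.segments, self.segments[1:])]


def build_tsc(device_id, readings, max_gap):
    """Group (timestamp, value) readings into segments, starting a new segment
    whenever the time since the previous reading exceeds max_gap."""
    tsc = TimeSeriesCompound(device_id)
    prev_t = None
    for t, v in sorted(readings):
        if prev_t is None or t - prev_t > max_gap:
            tsc.segments.append(Segment())
        tsc.segments[-1].timestamps.append(t)
        tsc.segments[-1].values.append(v)
        prev_t = t
    return tsc


readings = [(0, 1.0), (1, 1.2), (2, 1.1), (10, 3.0), (11, 3.2), (30, 0.5)]
tsc = build_tsc("sensor-7", readings, max_gap=5)
print(len(tsc.segments))  # 3
print(tsc.gaps())         # [8, 19]
```

Keeping the gaps explicit, instead of concatenating segments or interpolating across them, is what separates this representation from a single long series; how the dissertation actually compares two TSC objects is not specified in the abstract.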