
WPI - Computer Science Department, PhD Proposal Defense, Noura Alghamdi, "Big Time Series Analytics Using a Distributed Infrastructure"

Thursday, May 06, 2021
3:00 pm to 4:00 pm

 

Noura Alghamdi

PhD Candidate

WPI – Computer Science


Zoom Link: https://wpi.zoom.us/j/93399539838

Committee Members:
Dr. Elke A. Rundensteiner, Professor, WPI. Advisor
Dr. Mohamed Y. Eltabakh, Associate Professor, WPI. Co-Advisor
Dr. George T. Heineman, Associate Professor, WPI.
Dr. Mirek Riedewald, Associate Professor, Northeastern University.

Abstract:

Given the prominence of big time series, scalable solutions for processing, querying, and mining complex time series on distributed compute infrastructures have become a necessity. State-of-the-art techniques for big, long, and intermittent time series data lack both the required scalability and the desired accuracy. Within this scope, we focus on the following themes:

1. Indexing and Matching Long Time Series. We demonstrated that the combination of big time series data and long query sequences introduces real unsolved challenges. Thus, we proposed ChainLink, a scalable distributed indexing framework. As the foundation of ChainLink, we designed a novel hashing technique, called SPS, that addresses these challenges. ChainLink outperforms existing techniques by orders of magnitude in terms of index construction overhead.

2. Indexing and Matching Intermittent Time Series Objects. A single object often produces a sequence of intermittent time series. We introduce the first data model, called Time Series Compound (TSC), to explicitly model intermittent time series data. We then propose Sloth, a scalable distributed indexing framework to index and query large TSC datasets. In addition, we design a novel compact representation, called Quantized SAX-Based Histogram, that offers resiliency to misalignments. Our experiments validate that Sloth substantially improves query accuracy.

3. Advanced Time Series Compounds Analytics. The goal of this theme is to extend the core Sloth infrastructure into an end-to-end system that supports analytics over big TSC datasets. We plan to conduct experimental performance studies on large-scale datasets to evaluate both the effectiveness and efficiency of this extended Sloth system.
