Document Type: dissertation
Author Name: Chen, Songting
Email Address: chenst at cs.wpi.edu
URN: etd-122005-193617
Title: Efficient Incremental View Maintenance for Data Warehousing
Degree: PhD
Department: Computer Science
Advisors: Elke A. Rundensteiner, Advisor; Carolina Ruiz, Committee Member; Murali Mani, Committee Member; Latha S. Colby, Committee Member
Keywords: View Matching; View Maintenance; Materialized View; Data Warehouse; Information Integration
Date of Presentation/Defense: 2005-09-06
Availability: unrestricted

Abstract
Data warehousing and on-line analytical processing (OLAP) are essential
elements for decision support applications. Since most OLAP queries are
complex and are often executed over huge volumes of data, the solution in
practice is to employ materialized views to improve query performance.
A key issue in using materialized views is keeping each view consistent
when its sources change. However, most prior work has focused on
simple SQL views with distributive aggregate functions, such as SUM and COUNT.
This dissertation considers broader classes of views than previous
work. First, we study views with complex aggregate functions such as variance
and regression. Such statistical functions are of great importance in practice.
We propose a workarea function model and design a generic framework to tackle
incremental view maintenance and answering queries using views for such functions.
We have implemented this approach in an IBM DB2 prototype system. An
extensive performance study shows significant performance gains from our techniques.
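
As an illustration of why a function such as variance can be maintained incrementally, consider the minimal sketch below. It is not the workarea function model of the dissertation, whose details are not given in this abstract; it only assumes that a small intermediate state (count, sum, and sum of squares) is stored with the view, so that inserts, deletes, and merges of partial results can be applied without rescanning the source. The class name `VarianceWorkarea` and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VarianceWorkarea:
    """Hypothetical intermediate state kept per group for a variance view."""
    n: int = 0        # number of values seen
    s: float = 0.0    # sum of values
    ss: float = 0.0   # sum of squared values

    def insert(self, x: float) -> None:
        # Propagate one inserted source value into the state.
        self.n += 1
        self.s += x
        self.ss += x * x

    def delete(self, x: float) -> None:
        # Propagate one deleted source value; x is assumed to be present.
        if self.n == 0:
            raise ValueError("cannot delete from an empty workarea")
        self.n -= 1
        self.s -= x
        self.ss -= x * x

    def merge(self, other: "VarianceWorkarea") -> "VarianceWorkarea":
        # Combine two partial states, e.g. when rolling up finer groups.
        return VarianceWorkarea(self.n + other.n,
                                self.s + other.s,
                                self.ss + other.ss)

    def variance(self) -> float:
        # Population variance computed from the state alone.
        if self.n == 0:
            raise ValueError("variance of an empty group is undefined")
        mean = self.s / self.n
        return self.ss / self.n - mean * mean
```

The sum-of-squares form is chosen here for brevity; it can lose precision on large or poorly scaled data, so a production system would likely use a numerically more stable state.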
Second, we consider materialized views with PIVOT and UNPIVOT operators.
Such operators are widely used for OLAP applications and for querying
sparse datasets. We demonstrate that the efficient maintenance of views
with PIVOT and UNPIVOT operators requires more generalized
operators, called GPIVOT and GUNPIVOT. We formally define and prove the
query rewriting rules and propagation rules for such
operators. We also design a novel view maintenance framework for applying
these rules to obtain an efficient maintenance plan. Extensive
performance evaluations reveal the effectiveness of our techniques.
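
To make the maintenance problem concrete, the sketch below pivots (key, attribute, value) rows into one wide row per key and then applies a batch of source inserts and deletes directly to the pivoted view. It does not implement GPIVOT, GUNPIVOT, or the propagation rules of the dissertation, which this abstract does not spell out; the functions `pivot` and `apply_delta` and the sample data layout are assumptions made only for illustration.

```python
def pivot(rows):
    """Build a pivoted view: key -> {attribute: value}."""
    view = {}
    for key, attr, value in rows:
        view.setdefault(key, {})[attr] = value
    return view


def apply_delta(view, inserts=(), deletes=()):
    """Apply source-level inserts and deletes to the pivoted view in place."""
    for key, attr, _ in deletes:
        row = view.get(key)
        if row is not None:
            row.pop(attr, None)
            if not row:
                # The last attribute of this key is gone, so the
                # pivoted row disappears as well.
                del view[key]
    for key, attr, value in inserts:
        view.setdefault(key, {})[attr] = value
    return view


base = [("p1", "color", "red"), ("p1", "size", "M"), ("p2", "color", "blue")]
view = pivot(base)
apply_delta(view,
            inserts=[("p2", "size", "L")],
            deletes=[("p1", "size", "M")])
```

In this sketch the maintained `view` should equal the result of calling `pivot` on the updated source rows, which is the consistency property that incremental maintenance has to preserve.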
Third, materialized views are often integrated from multiple data sources.
Because these sources are autonomous and dynamic, source updates may occur
concurrently with view maintenance and cause maintenance anomalies.
We propose a generic concurrency control framework to resolve such anomalies.
This solution extends previous work in that it solves the anomalies under
both source data and schema changes and thus achieves full source
autonomy. We have implemented this technique in a data warehouse
prototype developed at WPI. An extensive performance study shows that
our techniques add little overhead to existing concurrent data
update processing techniques while providing this new functionality.
Files schen.pdf