Worcester Polytechnic Institute Electronic Theses and Dissertations Collection

Title page for ETD etd-0506102-113510


Document Type: thesis
Author Name: Deschler, Kurt W
URN: etd-0506102-113510
Title: MASS: A Multi-Axis Storage Structure for Large XML Documents
Degree: MS
Department: Computer Science
Advisors
  • Elke A. Rundensteiner, Advisor
  • Carolina Ruiz, Reader
  • Micha Hofri, Department Head
Keywords
  • XML
  • path expression
  • axis
  • order
  • indexing
  • inlined
  • compression
  • XPath
  • lossless
Date of Presentation/Defense: 2002-04-29
Availability: unrestricted

    Abstract

    Due to the wide acceptance of the World Wide Web Consortium (W3C) XPath language specification, native indexing for XML is needed to support path expression queries efficiently. XPath describes the different document tree relationships that may be queried as a set of axes. Many recent proposals for XML indexing focus on accelerating only a small subset of the expressions possible using these axes. In particular, queries by ordinal position and updates that alter document structure are not well supported. A more general indexing solution is needed that not only offers efficient evaluation of all of the XPath axes, but also allows for efficient document update.

    We introduce MASS, a Multiple Axis Storage Structure, to meet the performance challenge posed by the XPath language. MASS is a storage and indexing solution for large XML documents that eliminates the need for external secondary storage. It is designed around the XPath language, providing efficient interfaces for evaluating all XPath axes. The clustered organization of MASS allows several different axes to be evaluated using the same index structure. The clustering, in conjunction with an internal compression mechanism exploiting specific XML characteristics, keeps the size of the structure small, which further aids efficiency. MASS introduces a versatile scheme for representing document node relationships that always allows for efficient updates. Finally, the integration of a ranked B+ tree allows MASS to efficiently evaluate XPath axes in large documents.

    We have implemented MASS in C++ and measured the performance of many different XPath expressions and document updates. Our experimental evaluation illustrates that MASS exhibits excellent performance characteristics for both queries and updates and scales well to large documents, making it a practical solution for XML storage. In conjunction with text indexing, MASS provides a complete solution for XML indexing.

    Files
  • deschler.pdf
