University of Tampere, School of Information Sciences
Department of Information Studies

Project: repXML – another look at XML representation in heterogeneous XML environments

Description

XML originated as a markup language for documents, which led to documents being modeled as directed labeled graphs (ordered trees). Its use then spread rapidly to other purposes, above all as a data format for exchanging and sharing data on the Web. However, modeling XML documents as directed labeled graphs has had several undesirable consequences: complex path-oriented XML query languages, difficulty in supporting both document-centric and data-centric manipulation appropriately, and a mismatch between conventional (e.g. relational) databases and XML data. To remove these disadvantages, we have developed a novel representation for XML documents. It is based on a constructor algebra that represents XML data as XML relations with the schema D(C, T, I). Here D is the name of an XML document; C is a meaningful component of D (an attribute name, an element name, an attribute or element value, or a single word of an attribute or element value that consists of several words); T is the type of that component; and I is its index, which identifies the component's exact location in D.
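To make the schema D(C, T, I) concrete, here is a minimal sketch of how a document could be flattened into such rows. It is not the project's constructor algebra, whose exact rules are not given here. The type labels ("element", "attribute", "value", "word"), the Dewey-style index (a tuple of 1-based positions, so a component's index is a prefix of the indexes below it), and the function name `to_relation` are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def to_relation(xml_text):
    """Flatten an XML document into (C, T, I) rows: component, type, index.

    Hypothetical sketch: type labels and the Dewey-style index
    (tuple of 1-based positions) are assumptions, not the project's scheme.
    """
    rows = []

    def add_value(text, idx):
        text = text.strip()
        rows.append((text, "value", idx))
        words = text.split()
        # Only values made of several words are also split into word rows.
        if len(words) > 1:
            for w, word in enumerate(words, start=1):
                rows.append((word, "word", idx + (w,)))

    def visit(elem, idx):
        rows.append((elem.tag, "element", idx))
        k = 0  # position counter shared by attributes, text and child elements
        for name, value in elem.attrib.items():
            k += 1
            rows.append((name, "attribute", idx + (k,)))
            add_value(value, idx + (k, 1))  # value sits below its attribute name
        if elem.text and elem.text.strip():
            k += 1
            add_value(elem.text, idx + (k,))
        for child in elem:
            k += 1
            visit(child, idx + (k,))
            # Simplification: child.tail (mixed-content text) is ignored.

    visit(ET.fromstring(xml_text), (1,))
    return rows


doc = '<book year="2005"><title>XML relations</title><author>Smith</author></book>'
relation = {"books.xml": to_relation(doc)}  # document name D -> its (C, T, I) rows
for row in relation["books.xml"]:
    print(row)
```

Because every row carries its own index, the original nesting can be recovered from the indexes alone, which is what lets ordinary relational operations stand in for path expressions in this sketch.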

The project also demonstrates how this representation supports the treatment of XML documents whose contents and structures are unknown to the user. The representation is further applied to several kinds of XML processing problems, such as XML information retrieval and the construction of OLAP data cubes.
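As an illustration of querying a document whose structure is unknown, the sketch below searches a hand-written XML relation for a keyword and reports which element or attribute contains it, without the caller naming any element. It assumes a Dewey-style positional index in which a component's index is a prefix of the indexes of the components below it; the example rows, the type labels and the helper `find_labels` are hypothetical and are not the project's own query facility.

```python
# A tiny XML relation D(C, T, I) written out by hand (hypothetical example data).
BOOKS = [
    ("book", "element", (1,)),
    ("year", "attribute", (1, 1)),
    ("2005", "value", (1, 1, 1)),
    ("title", "element", (1, 2)),
    ("XML relations", "value", (1, 2, 1)),
    ("XML", "word", (1, 2, 1, 1)),
    ("relations", "word", (1, 2, 1, 2)),
    ("author", "element", (1, 3)),
    ("Smith", "value", (1, 3, 1)),
]


def find_labels(relation, keyword):
    """Return (match, enclosing name, its index) for every value or word equal
    to keyword (case-insensitive), without knowing any element names."""
    names = {idx: c for c, t, idx in relation if t in ("element", "attribute")}
    hits = []
    for c, t, idx in relation:
        if t in ("value", "word") and c.lower() == keyword.lower():
            # Walk up the index prefixes to the nearest element or attribute name.
            for cut in range(len(idx) - 1, 0, -1):
                if idx[:cut] in names:
                    hits.append((c, names[idx[:cut]], idx[:cut]))
                    break
    return hits


print(find_labels(BOOKS, "relations"))  # [('relations', 'title', (1, 2))]
```

The design choice worth noting is that no path expression such as `/book/title` is needed: the enclosing label is found purely by comparing index prefixes.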

Duration

2005 - 2008

Researchers

Prof. Timo Niemi (Dept. of Computer and Information Sciences)
Prof. Kalervo Järvelin

Publications

 


Updated 11.03.2008 Responsibility for updating: KJ

