University of Tampere, Faculty of Information Sciences
Department of Information Studies

Project: Structured document management and data mining

Description

Structured document management and data mining seeks to develop representations of XML documents and other complex objects, together with query languages that give users high expressive power in an easy-to-use form. This line of research began more than 20 years ago and has produced a number of academic degrees, numerous articles, and several externally funded projects. We have shown that proper structured modeling makes it possible to offer users very high-level declarative query languages with great expressive power, which better support their search tasks. Users therefore need not master difficult programming methods (e.g., procedural or recursive definition of queries) for retrieval. We have also shown that our methods support document data mining (Järvelin et al. 2000; Niemi et al. 2003). Our research problems for the five-year period include: novel XML document representations; a high-level declarative query language for XML documents; and document data mining, both across documents (to find novel connections) and for OLAP.

A central theme in this research is information access in unknown and inconsistent data structures. Because XML information is produced autonomously and heterogeneously, the information itself is heterogeneous, and information seekers cannot be assumed to know all of the structural variety (structurally different paths to target data), the inconsistent naming conventions (element tags), or the inconsistent representations of the data themselves (e.g., 2005-12-05; Dec 5 2005; …).
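The three kinds of inconsistency named above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the project's method: the sample document, the synonym table DATE_TAGS, and the helper names normalize_date and find_dates are all hypothetical. The sketch finds date values wherever they occur in the tree, accepts several tag or attribute names for the same concept, and maps two textual date formats to one value.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical synonym table: names that different producers might use for a date.
DATE_TAGS = {"date", "published", "pubdate"}

# Month abbreviations mapped to month numbers (avoids locale-dependent parsing).
MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}


def normalize_date(text):
    """Return a date for 'YYYY-MM-DD' or 'Mon D YYYY' strings, else None."""
    text = text.strip()
    match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", text)
    if match:
        year, month, day = (int(part) for part in match.groups())
        return date(year, month, day)
    match = re.fullmatch(r"([A-Za-z]{3})\s+(\d{1,2}),?\s+(\d{4})", text)
    if match and match.group(1).lower() in MONTHS:
        return date(int(match.group(3)),
                    MONTHS[match.group(1).lower()],
                    int(match.group(2)))
    return None


def find_dates(root):
    """Collect normalized dates from elements and attributes at any depth."""
    found = []
    for elem in root.iter():  # visits every element, whatever its path
        if elem.tag.lower() in DATE_TAGS and elem.text:
            parsed = normalize_date(elem.text)
            if parsed is not None:
                found.append(parsed)
        for name, value in elem.attrib.items():
            if name.lower() in DATE_TAGS:
                parsed = normalize_date(value)
                if parsed is not None:
                    found.append(parsed)
    return found


# Hypothetical sample: the same date under different paths, names, and formats.
SAMPLE = """<library>
  <article><meta><date>2005-12-05</date></meta></article>
  <report published="Dec 5 2005"/>
</library>"""

dates = find_dates(ET.fromstring(SAMPLE))
```

In this sketch the user asks only for "dates" and does not need to know the path, the tag name, or the textual format used by each producer. A hand-written synonym table does not scale, which is part of what motivates better representations and query languages.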

In XML documents, semantically meaningful information can be found both among description components (attribute and element names) and among content components (attribute and element values). The contemporary approach to representing semi-structured XML data is based on labeled directed graphs. However, this approach supports well only those information needs that can be satisfied by extracting and selecting data whose structure and content are known to the user. It gives only weak support for manipulating unknown structures and contents. Moreover, it is incompatible with the manipulation of fully structured data (e.g., relational databases). One goal of our research is therefore to develop a representation for XML data that eliminates these disadvantages.
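To make the distinction between description components and content components concrete, the following Python sketch flattens an XML tree into rows in which names and values are both ordinary data. This is an assumption-laden illustration and not the representation developed in the project: the row layout (path, kind, name, value), the function flatten, and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def flatten(elem, path=()):
    """Yield (path, kind, name, value) rows for an element and its subtree.

    Names (description components) and values (content components)
    become plain columns, so either can be the target of a query.
    """
    here = path + (elem.tag,)
    for name, value in elem.attrib.items():
        yield (here, "attribute", name, value)
    text = (elem.text or "").strip()
    if text:
        yield (here, "element", elem.tag, text)
    for child in elem:
        yield from flatten(child, here)


# Hypothetical sample document.
SAMPLE = """<org>
  <dept city="Tampere"><name>Information Studies</name></dept>
  <location>Tampere</location>
</org>"""

rows = list(flatten(ET.fromstring(SAMPLE)))

# Query over description components: under which names does "Tampere" occur?
names_for_value = sorted({name for _, _, name, value in rows
                          if value == "Tampere"})
```

Here the query starts from a known value and returns the unknown names under which it is stored, which is the kind of request that is awkward when the user must first know the structure. Because the rows form a flat table, they could also be loaded into a relational system, though this sketch does not attempt that.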

Duration

2004 - 2009

Researchers

See the individual projects.

Publications

See the individual projects.

Updated 11.03.2008 Responsibility for updating: KJ

