University of Tampere, Faculty of Information Sciences
Department of Information Studies

Project: Semantic Information Retrieval in Unannotated Document Collections

Description

Ontologies and other conceptual models describe the structure of delimited topical areas. Conceptual models are traditional tools in Information Science, e.g., in the form of thesauri, but they have recently attracted much attention in several research disciplines because of the Semantic Web. Although ontologies closely resemble traditional thesauri, they can be richer in structural relationships. The most important difference, however, is the aim toward computational semantics (or inference), which may support more “intelligent” applications and the interoperability of IR systems. Through the use of ontologies, the searcher can avoid (or at least greatly reduce) the complexity of natural language when searching, e.g., on the Web. Annotations based on ontologies are intended to capture the semantic content of documents in a nutshell. As the history of indexing research shows, however, annotation has problems of cost, quality (consistency), exhaustiveness and specificity. For example, the most popular metadata format for the Web, Dublin Core, was used in only 0.3% of web documents in 2002.

Our approach is therefore different: we investigate ontology-based access to unannotated document collections. This line of research began more than 10 years ago and has produced several academic degrees and research articles (see the FIRE archive). We have shown that structured queries based on ontologies greatly improve the performance of ontology-based IR, at least in news article collections. Research problems for the five-year period include ontology-based query formulation for different types of unannotated document collections (news, research articles, legal documents, image collections) in various languages.
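The difference between a structured and a flat (unstructured) query can be illustrated with a minimal Python sketch. It assumes a deliberately simplified model in which each ontology concept becomes a facet whose expressions are treated as synonyms (combined with OR), and a document's score is simply the number of facets it matches. The names `Facet`, `flat_score` and `structured_score` are hypothetical; real structured queries in IR systems use weighted operators rather than plain counts, and this is not the project's own query constructor.

```python
from dataclasses import dataclass


@dataclass
class Facet:
    """One ontology concept and the expressions that may stand for it in text
    (hypothetical structure for this sketch)."""
    concept: str
    expressions: list[str]


def tokenize(text: str) -> set[str]:
    """Lower-cased set of words with surrounding punctuation stripped (naive)."""
    return {w.strip(".,;:!?\"'()").lower() for w in text.split()}


def flat_score(terms: list[str], doc: str) -> int:
    """Unstructured query: count how many query terms occur in the document."""
    words = tokenize(doc)
    return sum(1 for t in terms if t.lower() in words)


def structured_score(facets: list[Facet], doc: str) -> int:
    """Structured query: expressions of one concept are ORed (a synonym group);
    the score counts how many distinct concepts the document covers."""
    words = tokenize(doc)
    return sum(1 for f in facets if any(e.lower() in words for e in f.expressions))


facets = [
    Facet("vehicle", ["car", "automobile", "vehicle"]),
    Facet("pollution", ["emissions", "pollution", "smog"]),
]
# The flat query uses the same expressions, but without any grouping.
flat_terms = [e for f in facets for e in f.expressions]

doc_a = "Car and automobile and vehicle prices."
doc_b = "Car emissions rise."

print(flat_score(flat_terms, doc_a), flat_score(flat_terms, doc_b))  # 3 2
print(structured_score(facets, doc_a), structured_score(facets, doc_b))  # 1 2
```

Here the flat query ranks `doc_a` higher merely because it contains several synonyms of the same concept, while the structured query prefers `doc_b`, which covers both concepts of the request.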

Further research problems include methods for building ontologies and for integrating publicly available semantic sources such as Semantic Web ontologies, professional terminologies, dictionaries and lexical resources such as WordNet. Another research theme is the design of ontology-based search interfaces. An underlying theme throughout is evaluating the effectiveness of each method or tool in terms of the quality of the search results.

We have earlier developed a principle of abstraction levels (Järvelin & al., 1996; 2001), which systematically links the ontology level to the corresponding linguistic level (natural-language expressions for ontological concepts) and to the string-matching level (patterns for matching those expressions in text, including text in inflectional and compounding languages). This supports information retrieval in varying environments without requiring the user to master the details of each environment (document indexing, query languages). Based on this principle, we have developed the search ontology editor ShOE, which supports semi-automatic construction of search ontologies, and the QUCCOO query constructor, which builds queries from such ontologies. Furthermore, we are developing an ontology-based annotation tool for text documents.
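The three abstraction levels can be sketched as a small data structure: a concept (ontology level) holds natural-language expressions in one or more languages (linguistic level), and each expression holds wildcard patterns (string level) used to match words in text. This is a hypothetical illustration, not the design of ShOE or QUCCOO; the class names, the shell-style `*` pattern syntax (via Python's standard `fnmatch` module) and the Finnish examples are assumptions chosen to show how truncation can cope with inflection and compounding.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Expression:
    """Linguistic level: one natural-language expression for a concept."""
    text: str
    language: str
    patterns: list[str]  # string level: shell-style wildcard patterns


@dataclass
class Concept:
    """Ontology level: a concept with its expressions in one or more languages."""
    name: str
    expressions: list[Expression] = field(default_factory=list)

    def patterns(self, language: str) -> list[str]:
        """Collect the string-level patterns for one language."""
        return [p for e in self.expressions if e.language == language
                for p in e.patterns]


def match_concept(concept: Concept, text: str, language: str) -> list[str]:
    """Return the words of `text` that match any pattern of `concept`."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    pats = concept.patterns(language)
    return [w for w in words if any(fnmatchcase(w, p) for p in pats)]


car = Concept("Car", [
    # Finnish is inflectional and compounding: "auto*" covers inflected forms
    # such as "autoa"; "*auto*" also catches compounds such as
    # "henkilöautossa" ("in a passenger car").
    Expression("auto", "fi", ["auto*", "*auto*"]),
    Expression("car", "en", ["car", "cars"]),
])

print(match_concept(car, "Henkilöautossa oli kaksi autoa ja polkupyörä.", "fi"))
# ['henkilöautossa', 'autoa']
print(match_concept(car, "Two cars and a bicycle.", "en"))
# ['cars']
```

The user works only at the concept level; the expressions and patterns are derived from it. Truncation patterns trade precision for recall: `*auto*` would also match unrelated words such as `automaatti`.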

Duration

2003 - 2009

Researchers

Mr. Feza Baskaya – supervisor Prof. Kalervo Järvelin
Mrs. Sari Suomela – 2003–2005
Mrs. Anne Kakkonen – 2005–2008, supervisor Prof. Jaana Kekäläinen


Publications

  1. Järvelin, K., Kekäläinen, J. & Niemi, T. (2001). ExpansionTool: Concept-based query expansion and construction. Information Retrieval 4(3/4): 231-255.
  2. Airio, E., Järvelin, K., Saatsi, P., Kekäläinen, J. & Suomela, S. (2004). CIRI: An ontology-based query interface for text retrieval. In: Hyvönen, E. et al. (Eds.), Web Intelligence: STeP 2004, The 11th Finnish Artificial Intelligence Conference. Helsinki, Finland: Finnish Artificial Intelligence Society, Publications 20, pp. 73-82.
  3. Suomela, S. (2005). User test on multi-lingual ontology interface. In: Bailey, A., Ruthven, I. & Azzopardi, L. (Eds.), Proceedings of the Workshop on Evaluating User Studies in Information Access at CoLIS 5, Glasgow, Scotland, June 2005.
  4. Suomela, S. (2005). User study on ontology as query construction tool. In: Bailey, A., Ruthven, I. & Azzopardi, L. (Eds.), Proceedings of the Workshop on Evaluating User Studies in Information Access at CoLIS 5, Glasgow, Scotland, June 2005.
  5. Suomela, S. & Kekäläinen, J. (2005). Ontology as a search tool: A study of real users' query formulation with and without conceptual support. In: Losada, D.E. & Fernández Luna, J.M. (Eds.), 27th European Conference on Information Retrieval, ECIR 2005, Santiago de Compostela, Spain, March 2005. Heidelberg: Springer, Lecture Notes in Computer Science 3408, pp. 315-329.
  6. Suomela, S. & Kekäläinen, J. User study on ontology as a query construction tool. Information Retrieval 9(xxx): xxx-xxx. Accepted for publication, October 2005.

Relevant links

The project ShOE – search ontology editor
The project QUCCOO – ontology-based search interface


Updated 11.03.2008. Responsibility for updating: KJ