Project: Semantic Information Retrieval in Unannotated Document Collections

Description

Ontologies and other conceptual models describe the structure of delimited topical areas. Conceptual models, e.g., in the form of thesauri, are traditional tools in Information Science, but they have recently attracted much attention in several research disciplines because of the semantic web. Ontologies greatly resemble traditional thesauri but can be richer in structural relationships. The most important difference, however, is the aim toward computational semantics (or inference), which may support more “intelligent” applications and interoperability of IR systems. By using ontologies, the information searcher can avoid, or at least greatly reduce, the complexity of natural language when searching, e.g., on the web.

Annotations based on ontologies are supposed to capture the semantic content of documents in a nutshell. As the history of indexing research shows, however, annotation has problems of cost, quality (consistency), exhaustiveness and specificity. For example, the most popular metadata format for the Web, the Dublin Core format, was employed in 0.3 % of web documents in 2002. Our approach is therefore different: we investigate ontology-based access to unannotated document collections. This line of research began more than 10 years ago and has produced several academic degrees and research articles (see the FIRE archive). We have shown that structured queries based on ontologies greatly improve retrieval performance, at least in news article collections.

Research problems for the five-year period include ontology-based query formulation in varying types of unannotated document collections (news, research articles, legal documents, image collections) in various languages.
Further research problems include methods for building ontologies and for integrating publicly available semantic sources such as semantic web ontologies, professional terminologies, dictionaries and resources such as WordNet. Another research theme is the design of search interfaces based on ontologies. An underlying theme is evaluating the effectiveness of each method or tool with respect to the quality of the response.
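
To make the idea of ontology-based structured queries more concrete, the following is a minimal sketch, not the project's actual implementation. It assumes a toy ontology in which each concept carries a few natural-language expressions and may have narrower concepts. The `Concept` class, the `expand` and `structured_query` functions, the sample data and the generic Boolean `AND`/`OR` syntax are all hypothetical and chosen only for illustration. Each selected concept is expanded into one group of alternative expressions, and the groups are then combined, so the searcher picks concepts rather than typing every wording by hand.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """A node in a toy search ontology (hypothetical structure)."""
    name: str
    expressions: list[str]  # natural-language expressions for the concept
    narrower: list[Concept] = field(default_factory=list)


def expand(concept: Concept) -> list[str]:
    """Collect the expressions of a concept and of all its narrower concepts."""
    terms = list(concept.expressions)
    for child in concept.narrower:
        terms.extend(expand(child))
    # Remove duplicates while keeping the original order.
    return list(dict.fromkeys(terms))


def structured_query(concepts: list[Concept]) -> str:
    """Build one OR-group per concept and join the groups with AND."""
    groups = []
    for concept in concepts:
        quoted = [f'"{term}"' for term in expand(concept)]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Illustrative data only; not taken from the project's ontologies.
forest_industry = Concept(
    "forest industry",
    ["forest industry"],
    narrower=[Concept("paper industry", ["paper industry", "paper mill"])],
)
pollution = Concept("pollution", ["pollution", "emissions"])

print(structured_query([forest_industry, pollution]))
# ("forest industry" OR "paper industry" OR "paper mill") AND ("pollution" OR "emissions")
```

Grouping the expressions per concept keeps the query structure aligned with the ontology: adding a synonym or a narrower concept changes one group without affecting the others. A real retrieval system would use its own query language and matching rules in place of the generic Boolean string shown here.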
We have earlier developed a principle of abstraction levels (Järvelin et al., 1996; 2001), which systematically organizes the ontology level together with the corresponding linguistic level (natural-language expressions for ontological concepts) and the string matching level (patterns for matching expressions in text, including inflectional and compounding languages). This supports information retrieval in varying environments without requiring the user to master the details (document indexing, query languages) of those environments. Based on this principle, we have developed the search ontology editor ShOE, which supports semiautomatic construction of search ontologies, and the QUCCOO query constructor, which is based on such ontologies. Furthermore, we are developing an ontology-based annotation tool for text documents.

Duration
2003 - 2009

Researchers
Mr. Feza Baskaya (supervisor: Prof. Kalervo Järvelin)
Publications
Relevant links
The project ShOE (search ontology editor)
Updated 11.03.2008. Responsibility for updating: KJ