University of Tampere, Faculty of Information Sciences
Department of Information Studies

Project: Multi-lingual information retrieval

Description

Multi-lingual IR lets searchers pose queries in one language (e.g., their own) and retrieve information in several languages from multilingual collections such as the web. A searcher's competence may be sufficient for reading more than one language but insufficient for specifying information requests effectively in them, for instance because of orthography or special terminology. These problems become more pronounced in truly multilingual environments. To support the searcher, single-language access to multilingual environments should be provided.

This line of research began in the mid-1990s and has produced several academic degrees, articles, and externally funded projects (e.g., the EU 5th Framework Project Clarity, http://dis.shef.ac.uk/mark/clarity). We have shown that structured queries based on dictionary translation greatly improve performance in cross-language IR (CLIR), at least in news article collections, across several language pairs. Our research interests for the five-year period include: structured query formulation in varying collections over various language pairs; transitive translation through intermediate languages when direct translation is not possible; expansion of CLIR queries; corpus-based translation; fusion of translation methods; handling of proper names, other out-of-vocabulary words, and phrases; cross-cultural IR; as well as Finnish and cross-language question answering. An underlying theme is the evaluation of the effectiveness of each method or tool. A special theme in evaluation is assessing methods by the quality of the retrieved documents, especially by how well they retrieve highly relevant documents.
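As a rough illustration of structured query formulation and transitive translation, the following minimal Python sketch groups the dictionary translations of each source word into a synonym set and composes two dictionaries through an intermediate language. The toy dictionaries, the function names, and the InQuery-style operator names `#sum` and `#syn` are assumptions made for this example; they are not taken from the project's actual systems.

```python
from typing import Dict, List

# Hypothetical toy dictionaries, for illustration only.
FI_EN: Dict[str, List[str]] = {
    "kieli": ["language", "tongue"],
    "haku": ["search", "retrieval"],
}
FI_SV: Dict[str, List[str]] = {"kieli": ["språk", "tunga"]}
SV_EN: Dict[str, List[str]] = {
    "språk": ["language"],
    "tunga": ["tongue", "language"],
}


def translate(word: str, dictionary: Dict[str, List[str]]) -> List[str]:
    """Return all translations; out-of-vocabulary words pass through unchanged."""
    return dictionary.get(word, [word])


def transitive_translate(
    word: str,
    source_to_pivot: Dict[str, List[str]],
    pivot_to_target: Dict[str, List[str]],
) -> List[str]:
    """Translate via an intermediate language, dropping duplicate results."""
    targets: List[str] = []
    for pivot_word in translate(word, source_to_pivot):
        for target_word in translate(pivot_word, pivot_to_target):
            if target_word not in targets:
                targets.append(target_word)
    return targets


def structured_query(source_words: List[str], dictionary: Dict[str, List[str]]) -> str:
    """Group the translations of each source word under one synonym operator.

    Treating alternative translations of the same source word as synonyms
    keeps a word with many translations from dominating the query.
    """
    parts: List[str] = []
    for word in source_words:
        translations = translate(word, dictionary)
        if len(translations) == 1:
            parts.append(translations[0])
        else:
            parts.append("#syn(" + " ".join(translations) + ")")
    return "#sum(" + " ".join(parts) + ")"


if __name__ == "__main__":
    print(structured_query(["kieli", "haku", "Tampere"], FI_EN))
    print(transitive_translate("kieli", FI_SV, SV_EN))
```

In this sketch the proper name "Tampere" is not in the dictionary and is passed through untranslated, which is the simplest possible treatment of an out-of-vocabulary word; approximate string matching or transliteration could be substituted at that point.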

Finnish question answering presents novel problems due to the complexity of the Finnish language. Given a natural-language question, determining an answer pattern is more complex in Finnish. In the cross-language case, the answer patterns must be inferred from questions in another language, one that may be structurally very different.

Duration

2003 - 2009

Researchers

See also the individual projects.
Prof. Kalervo Järvelin
Dr. Ari Pirkola
Mrs. Eija Airio – supervised by Prof. Kal Järvelin and Prof. Jaana Kekäläinen
Mr. Heikki Keskustalo – supervised by Dr. Ari Pirkola and Prof. Kal Järvelin
Mrs. Raija Lehtokangas – supervised by Prof. Kal Järvelin
Mr. Tuomas Talvensaari – supervised by Prof. Martti Juhola and Prof. Kal Järvelin

Publications

The publications are mainly listed under individual projects. Here are some early / general ones:

  1. Hedlund, T., Airio, E., Keskustalo, H., Lehtokangas, R., Pirkola, A. & Järvelin, K. (2003). Dictionary-based cross-language information retrieval: Learning experiences from CLEF 2000-2002. Information Retrieval 7(1/2): 99-119.

  2. Cosijn, E., Keskustalo, H., Pirkola, A., de Wet, K. & Järvelin, K. (2004). Afrikaans-English cross-language information retrieval. In: Bothma, T. & Kaniki, A. (eds.) Progress in Library and Information Science in Southern Africa, Proceedings of the third biennial DISSAnet Conference, Pretoria, South Africa, October 2004, pp. 97-110.

Relevant links

The project RelFB – simulated and pseudo relevance feedback
The project multiGradeCLIR – direct and transitive cross-language IR evaluated by graded relevance assessments
The project SGRAM – approximate string matching for out-of-vocabulary words in CLIR applications
The project TRT – transliteration-based matching for out-of-vocabulary words in CLIR applications
The project MLIR – multilingual CLIR applications
The project COCOT – corpus-based CLIR methods

 

Updated 29.12.2005. Responsibility for updating: KJ

