Project: Multigrade CLIR – Direct and Transitive Cross-Language IR Evaluated by Graded Relevance Assessments
Description
Research on cross-language information retrieval (CLIR) has typically been restricted to settings using binary relevance assessments. In this project, we present evaluation results for dictionary-based CLIR using graded relevance assessments in a best-match retrieval environment. We use text databases containing newspaper articles, and test topics with graded relevance assessments on a scale from 0 (non-relevant) to 3 (highly relevant). We have such collections in Finnish and English, which therefore form the target languages in our experiments. As source languages we use Finnish, English, German and Swedish. We study both direct translations from the source languages to the target languages and transitive translations via pivot languages. Monolingual baseline queries are also considered. In our tests we employ the dictionary-based UTACLIR query translation system, together with query expansion based on pseudo-relevance feedback. We generally use target queries structured by synonym sets, which have been shown to yield better performance than bag-of-words target queries. CLIR performance is evaluated using three relevance thresholds (stringent, regular, and liberal) as well as generalized recall and precision (Kekäläinen & Järvelin 2002).
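To illustrate how the two kinds of measures differ, the following is a minimal sketch, not the project's actual evaluation code. It makes two assumptions that this page does not state: that the liberal, regular, and stringent thresholds count documents graded at least 1, at least 2, and exactly 3 as relevant, and that grades are scaled linearly to the range 0 to 1 (grade divided by 3) for the generalized measures. All function and variable names are hypothetical.

```python
from typing import Dict, List

# ASSUMPTION: threshold cut-offs on the 0-3 scale; the page names the
# thresholds but does not give their values.
THRESHOLDS = {"liberal": 1, "regular": 2, "stringent": 3}
MAX_GRADE = 3  # highest grade on the 0-3 assessment scale


def binary_precision(ranked: List[str], grades: Dict[str, int], threshold: str) -> float:
    """Precision when a document counts as relevant only at or above the cut-off."""
    if not ranked:
        return 0.0
    cutoff = THRESHOLDS[threshold]
    hits = sum(1 for doc in ranked if grades.get(doc, 0) >= cutoff)
    return hits / len(ranked)


def generalized_precision(ranked: List[str], grades: Dict[str, int]) -> float:
    """Mean scaled relevance score of the retrieved documents."""
    if not ranked:
        return 0.0
    # ASSUMPTION: linear scaling grade / MAX_GRADE; other weightings are possible.
    return sum(grades.get(doc, 0) / MAX_GRADE for doc in ranked) / len(ranked)


def generalized_recall(ranked: List[str], grades: Dict[str, int]) -> float:
    """Share of the collection's total scaled relevance that was retrieved."""
    total = sum(g / MAX_GRADE for g in grades.values())
    if total == 0:
        return 0.0
    return sum(grades.get(doc, 0) / MAX_GRADE for doc in ranked) / total


# Toy data: graded assessments for one topic and one retrieved result list.
grades = {"d1": 3, "d2": 1, "d3": 0, "d4": 2}
ranked = ["d1", "d3", "d2"]

for name in THRESHOLDS:
    print(name, round(binary_precision(ranked, grades, name), 3))
print("generalized precision", round(generalized_precision(ranked, grades), 3))
print("generalized recall", round(generalized_recall(ranked, grades), 3))
```

In this toy case the marginally relevant document `d2` counts fully under the liberal threshold and not at all under the regular or stringent ones, whereas the generalized measures credit it in proportion to its grade.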
Duration
2003 – 2007. Project finished.
Researchers
Mrs. Raija Lehtokangas – supervised by Prof. Kal Järvelin
Publications
Updated 11.03.2008. Responsibility for updating: KJ