Automatic multi-label subject indexing in a multilingual environment

Boris Lauser^[1], Andreas Hotho^[2]

^[1] FAO of the UN, Library & Documentation Systems Division, 00100 Rome, Italy [email protected] http://www.fao.org
^[2] University of Karlsruhe, Institute AIFB, 76131 Karlsruhe, Germany [email protected]

Abstract. This paper presents an approach for automatically indexing full-text documents with multiple subject labels, based on binary support vector machines (SVMs). The aim was to test the applicability of SVMs on a real-world dataset. We also explored the feasibility of incorporating multilingual background knowledge, as represented in thesauri or ontologies, into our text document representation for indexing purposes. The test set for our evaluations was compiled from an extensive document base maintained by the Food and Agriculture Organization (FAO) of the United Nations (UN). Empirical results show that SVMs are a good method for automatic multi-label classification of documents in multiple languages.

Table of Contents

1 Introduction

2 Automatic text categorization

3 Background knowledge in the form of ontologies

4 Evaluation

5 Conclusion and outlook

References