3 Evaluation Desiderata

This section discusses how a sensible evaluation of the pruning of a conceptualization can be carried out.

User Parameters

As discussed in the previous section, the user can influence the ontology pruning process through several parameters. First, the evaluation should clarify how the three main parameters affect the output: the frequency weighting measure (TF, TF/IDF), the granularity (One, All), and the ratio. Second, we may evaluate the effects of the two approaches to concept identification (Vector, Trie).
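As an illustration only, the parameter combinations to be evaluated could be enumerated as in the following sketch. The parameter names and the values for weighting, granularity and concept identification follow the text; the ratio values and the function name `parameter_grid` are hypothetical placeholders, not settings taken from the experiments.

```python
from itertools import product

# Values named in the text; the ratio values below are placeholders only.
WEIGHTINGS = ["TF", "TF/IDF"]
GRANULARITIES = ["One", "All"]
IDENTIFICATION = ["Vector", "Trie"]
RATIOS = [1, 2, 5]  # hypothetical ratios, not taken from the paper


def parameter_grid(weightings=WEIGHTINGS, granularities=GRANULARITIES,
                   ratios=RATIOS, identification=IDENTIFICATION):
    """Return every combination of the user parameters as a list of dicts."""
    keys = ("weighting", "granularity", "ratio", "identification")
    return [dict(zip(keys, combo))
            for combo in product(weightings, granularities, ratios, identification)]


if __name__ == "__main__":
    for setting in parameter_grid():
        print(setting)
```

Each dictionary describes one pruning run, so the effect of a single parameter can be studied by comparing runs that differ in that parameter alone.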

Resource Selection

Naturally, the effects of pruning depend strongly on the document collections used and on the input conceptualization. We have to ensure that the document sets contain approximately the same amount of textual data (cf. Section 4.1). This guarantees that the absolute term counts are comparable under the TF measure. With the TF/IDF measure, the absolute counts are put in relation to the size of the corpus, hence the two corpora need not be similar in size.
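The difference between the two measures can be made concrete with a minimal sketch. It assumes a textbook TF/IDF variant (term count times log(N/df), summed over documents) and documents given as token lists; the exact weighting used in the pruning process may differ, and the function names are hypothetical.

```python
import math
from collections import Counter


def term_frequencies(documents):
    """Absolute count of each term over all documents of a collection."""
    counts = Counter()
    for tokens in documents:
        counts.update(tokens)
    return counts


def tfidf_scores(documents):
    """Sum over documents of tf * log(N / df); a textbook TF/IDF variant."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    scores = Counter()
    for tokens in documents:
        for term, tf in Counter(tokens).items():
            scores[term] += tf * math.log(n_docs / doc_freq[term])
    return scores
```

Because `term_frequencies` returns raw counts, a collection with twice as much text yields roughly twice the counts, which is why the document sets must be of similar size for TF. In `tfidf_scores` the factor log(N/df) depends on the number of documents N, so the counts are weighted relative to the corpus they come from.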

Human Cross-Validation

The evaluation of pruning results cannot be based solely on measures such as size and other statistical characteristics of the output. Instead, an empirical evaluation by subject specialists who assess the output has to be carried out. Only subject experts can judge the relevance of the extracted concepts and how well they describe the specified domain. In practice, however, it is impossible to evaluate each individual output.

Therefore, we base the assessment on a comparison of the pruning output with a gold-standard ontology, which includes only those concepts from the source ontology that the subject specialists have determined to be domain relevant. This allows us to study the effects of the different parameters with respect to the overlap between the pruned and the assessed ontologies.
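One possible way to quantify this overlap is sketched below. It assumes that both ontologies are reduced to sets of concept identifiers and that overlap is measured by precision, recall and F1; this choice of measures and the function name `overlap_scores` are assumptions made for illustration, not necessarily the measures reported later.

```python
def overlap_scores(pruned, gold):
    """Precision, recall and F1 of the pruned concept set against the gold standard."""
    pruned, gold = set(pruned), set(gold)
    shared = pruned & gold
    # Precision: share of retained concepts that the specialists judged relevant.
    precision = len(shared) / len(pruned) if pruned else 0.0
    # Recall: share of relevant concepts that survived pruning.
    recall = len(shared) / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if shared else 0.0
    return precision, recall, f1
```

Computing these scores once per parameter setting gives one comparable figure per pruning run, so that the influence of each parameter on the agreement with the specialists' judgment can be read off directly.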