In this module, the system detects incorrect and inconsistently applied relationships and suggests the appropriate relationships for expert confirmation. We propose three techniques to handle this process: semantic relationship rules, noun phrase analysis, and WordNet alignment.
The outline of this algorithm is illustrated in Fig. 5, where T1, T2 and Rel denote, respectively, Term1, Term2, and the AGROVOC relationship between them.
AGROVOC Cleaning & Refinement (T1, T2, Rel)
; Return new_relationship
1. If (Rel = BT or Rel = NT)
     Then If Agree_Expert_defined_Rules (T1, T2, Rel)
            Then return new_refined_relationship.   ; following the rules
          Else If Headword-Is-Compatible (T1, T2)
            Then return subclass/superclass relationship.
          Else If Is_Wordnet_HypernymPath (T1, T2)
            Then return subclass/superclass relationship.
          Else If Agree_Revision_Rules (T1, T2, Rel)
            Then return new_relationship.   ; following the rules
          Else return U.   ; Un-refined
2. Else If (Rel = UF or Rel = USE)
     Then If Is_Wordnet_Synset (T1, T2)
            Then return synonym relationship.
          Else If Agree_Revision_Rules (T1, T2, Rel)
            Then return new_relationship.   ; following the rules
          Else return U.   ; Un-refined
3. Else If (Rel = RT)
     Then If Agree_Revision_Rules (T1, T2, Rel)
            Then return new_relationship.   ; following the rules
          Else return U.   ; Un-refined
Fig. 5. An Algorithm for Data Cleaning and Relationship Refinement
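The branch order of Fig. 5 can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the five predicates are passed in as callables (the `Checks` fields are hypothetical names standing in for Agree_Expert_defined_Rules, Headword-Is-Compatible, Is_Wordnet_HypernymPath, Is_Wordnet_Synset and Agree_Revision_Rules), the rule lookups are assumed to return the refined relationship name or None, and BT/NT pairs are assumed to be normalized to (narrower, broader) order before the taxonomic checks so that the direction of subclassOf/superclassOf can be returned.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumed signatures: rule lookups return a relationship name or None;
# pair tests take two terms and return a bool.
RuleFn = Callable[[str, str, str], Optional[str]]
PairTest = Callable[[str, str], bool]

UNREFINED = "U"  # Fig. 5 returns U when no technique refines the pair


@dataclass
class Checks:
    expert_rules: RuleFn           # stands in for Agree_Expert_defined_Rules
    headword_compatible: PairTest  # stands in for Headword-Is-Compatible
    hypernym_path: PairTest        # stands in for Is_Wordnet_HypernymPath
    same_synset: PairTest          # stands in for Is_Wordnet_Synset
    revision_rules: RuleFn         # stands in for Agree_Revision_Rules


def refine(t1: str, t2: str, rel: str, c: Checks) -> str:
    """Refine one AGROVOC triple (T1, T2, Rel) following the branch order of Fig. 5."""
    if rel in ("BT", "NT"):
        new = c.expert_rules(t1, t2, rel)
        if new is not None:
            return new
        # "T1 BT T2" means T2 is broader; "T1 NT T2" means T2 is narrower.
        narrower, broader = (t1, t2) if rel == "BT" else (t2, t1)
        if c.headword_compatible(narrower, broader) or c.hypernym_path(narrower, broader):
            # Direction is expressed from T1's point of view.
            return "subclassOf" if rel == "BT" else "superclassOf"
    elif rel in ("UF", "USE"):
        if c.same_synset(t1, t2):
            return "synonym"
    elif rel != "RT":
        return UNREFINED  # relationship types not covered by Fig. 5
    # Every branch (BT/NT, UF/USE, RT) ends with the revision rules, then U.
    new = c.revision_rules(t1, t2, rel)
    return new if new is not None else UNREFINED
```

Folding the shared "Agree_Revision_Rules, else U" tail of the three branches into one fall-through keeps the priority order of Fig. 5 (expert rules, head word, WordNet, revision rules) without repeating it in each branch.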
The relationship revision rules have been discussed in Section 4. Section 5.2 briefly describes the procedures based on noun phrase analysis and WordNet alignment, and Section 5.3 describes the verification tool.
Using noun phrase analysis
The noun phrase analysis technique examines the surface form of the head word of a compound term. If the head word of a term has the same surface form as its broader term, the system applies the 'subclassOf'/'superclassOf' relationship to the pair. The system analyzes compound nouns using the following grammar rules:
NP -> MOD NCN
MOD -> NCN | NPN | ADJ | ...
where MOD is a modifier, NCN is a common noun, NPN is a proper name, and ADJ is an adjective.
For example, consider Cow milk BT Milk and Milk fat BT Milk. Compound noun analysis shows that the head word of Cow milk is milk, which has the same surface form as Milk, the broader term of Cow milk, so the system applies the <subclassOf> relationship to Cow milk and Milk. In contrast, the head word of Milk fat is fat, which is not compatible with its broader term, Milk. In this case, other techniques must be used to refine the relationship.
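A minimal sketch of the head-word test, under stated assumptions: terms are English, right-headed compounds split on whitespace, and a tiny hand-made lexicon (the hypothetical `LEXICON`) stands in for a real part-of-speech tagger. It generalizes NP -> MOD NCN to one or more modifiers, which is an assumption rather than something the rule above specifies.

```python
from typing import Optional

# Hypothetical toy lexicon; a real system would use a POS tagger for this step.
LEXICON = {"cow": "NCN", "milk": "NCN", "fat": "NCN", "fresh": "ADJ", "holstein": "NPN"}
MOD_TAGS = {"NCN", "NPN", "ADJ"}  # MOD -> NCN | NPN | ADJ


def head_word(term: str) -> Optional[str]:
    """Return the head of a compound parsed as NP -> MOD+ NCN, or None if it does not parse."""
    tokens = term.lower().split()
    if len(tokens) < 2:
        return None  # no modifier, so not a compound noun phrase
    tags = [LEXICON.get(tok) for tok in tokens]
    if tags[-1] != "NCN" or any(tag not in MOD_TAGS for tag in tags[:-1]):
        return None
    return tokens[-1]


def headword_is_compatible(narrower: str, broader: str) -> bool:
    """True if the narrower term's head word has the same surface form as the broader term."""
    head = head_word(narrower)
    return head is not None and head == broader.lower()
```

With this lexicon, `headword_is_compatible("Cow milk", "Milk")` is True and `headword_is_compatible("Milk fat", "Milk")` is False, matching the two cases above.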
Using WordNet Relationships
In this step, the hypernym/hyponym relationships of WordNet are used to align the BT/NT relationships in AGROVOC, and the synsets of terms in WordNet are used to align the UF/USE relationships in AGROVOC. Since the relationships in WordNet are verified by experts and WordNet contains a large number of general-domain terms, including agricultural terms, WordNet is a good resource for aligning some AGROVOC relationships, such as taxonomic and synonym relationships. (Other verified sources could be used as available, individually or in combination.) This step starts with the system retrieving the WordNet synset offset numbers of the AGROVOC UF/USE terms. If the system finds both terms and they have the same synset offset number, the system applies the 'synonym' relationship to them. The system also looks up the AGROVOC broader term and narrower term in WordNet. If the broader term is an ancestor of the narrower term in the WordNet hierarchy, the system applies the 'subclassOf'/'superclassOf' relationship to them. For example,
Cabbage BT Vegetable
Query results for Cabbage and Vegetable in WordNet show that Cabbage is a hyponym of Cruciferous vegetable and Cruciferous vegetable is a hyponym of Vegetable. Fig. 6 shows the relationship of Vegetable and Cabbage in WordNet and AGROVOC.
Since Vegetable is an ancestor of Cabbage, the system defines Vegetable as superclassOf Cabbage. In the case of Milk NT Milk fat, this technique does not refine the relationship because Milk and fat lie on different hypernym paths in WordNet.
Fig. 6. The relationship between Vegetable and Cabbage in WordNet and AGROVOC
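The two WordNet checks can be sketched against a toy in-memory hierarchy. This is an illustrative assumption: `LEMMAS` and `HYPERNYMS` are hypothetical, hand-written stand-ins whose string ids play the role of synset offset numbers, and the links only mirror the Cabbage and Milk examples above rather than being copied from WordNet. A real implementation would query WordNet itself, for example through a WordNet library.

```python
from collections import deque
from typing import Dict, List

# Hypothetical toy data: lemma -> synset ids, and synset id -> hypernym ids.
LEMMAS: Dict[str, List[str]] = {
    "cabbage": ["cabbage.n.01"],
    "cruciferous_vegetable": ["cruciferous_vegetable.n.01"],
    "vegetable": ["vegetable.n.01"],
    "milk": ["milk.n.01"],
    "fat": ["fat.n.01"],
    "cattle": ["cattle.n.01"],
    "cows": ["cattle.n.01"],  # two lemmas sharing one synset
}
HYPERNYMS: Dict[str, List[str]] = {
    "cabbage.n.01": ["cruciferous_vegetable.n.01"],
    "cruciferous_vegetable.n.01": ["vegetable.n.01"],
    "milk.n.01": ["dairy_product.n.01"],
    "fat.n.01": ["lipid.n.01"],
}


def _synsets(term: str) -> List[str]:
    # WordNet-style lemma form: lower case, spaces replaced by underscores.
    return LEMMAS.get(term.strip().lower().replace(" ", "_"), [])


def is_wordnet_synset(t1: str, t2: str) -> bool:
    """UF/USE check: True if the two terms share at least one synset id."""
    return bool(set(_synsets(t1)) & set(_synsets(t2)))


def is_wordnet_hypernym_path(narrower: str, broader: str) -> bool:
    """BT/NT check: True if a synset of `broader` is a proper ancestor of a synset of `narrower`."""
    targets = set(_synsets(broader))
    frontier = deque(_synsets(narrower))
    seen = set(frontier)
    while frontier:  # breadth-first walk up the hypernym links
        for parent in HYPERNYMS.get(frontier.popleft(), []):
            if parent in targets:
                return True
            if parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return False
```

Here `is_wordnet_hypernym_path("Cabbage", "Vegetable")` is True via Cruciferous vegetable, while the reverse direction and the fat/Milk pair return False because no chain of hypernym links connects them.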
After the system has suggested new relationships for terms, the expert verifies the semantic relationship refinement results and defines the appropriate relationship for the cases that the system cannot handle. Fig. 7 shows the user interface for verifying the system's output. The expert can verify terms and relationships by querying by:
1. Term, to verify a term and its relationships to other terms (e.g., rice);
2. Semantic relationship (e.g., <containsSubstance>);
3. Rule (e.g., 'If class X is meat#1 and class Y is animal#1, and X RT Y then X <madeFrom> Y').
Fig. 7. Verification Tools
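The three query modes of the verification tool amount to filters over the system's suggestions. The sketch below assumes a hypothetical `Suggestion` record that keeps the two terms, the suggested relationship and, when one applied, the text of the rule that produced it; the sample records are invented for illustration and are not system output.

```python
from dataclasses import dataclass
from typing import List, Optional

MEAT_RULE = "If class X is meat#1 and class Y is animal#1, and X RT Y then X <madeFrom> Y"


@dataclass(frozen=True)
class Suggestion:
    term1: str
    term2: str
    relation: str               # suggested relationship, e.g. "madeFrom"
    rule: Optional[str] = None  # rule text, if the suggestion came from a rule


def query(items: List[Suggestion], term: Optional[str] = None,
          relation: Optional[str] = None, rule: Optional[str] = None) -> List[Suggestion]:
    """Return suggestions matching every given criterion (term, relationship, rule)."""
    result = []
    for s in items:
        # Query by term: whole-term, case-insensitive match on either side.
        if term is not None and term.lower() not in (s.term1.lower(), s.term2.lower()):
            continue
        if relation is not None and s.relation != relation:
            continue
        if rule is not None and s.rule != rule:
            continue
        result.append(s)
    return result


# Hypothetical sample suggestions awaiting expert verification.
SAMPLE = [
    Suggestion("Rice", "Starch", "containsSubstance"),
    Suggestion("Rice bran", "Rice", "subclassOf"),
    Suggestion("Beef", "Cattle", "madeFrom", MEAT_RULE),
]
```

For instance, `query(SAMPLE, term="rice")` returns the first two records, and `query(SAMPLE, rule=MEAT_RULE)` returns only the Beef/Cattle suggestion; combining criteria narrows the list further.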