Towards a topic model for Multi-Actor project research and dissemination outputs based on AGROVOC concepts

FAO/Luis Sánchez Díaz

A use case by Hercules Panoutsopoulos, Agricultural University of Athens

Over the seven years of the Horizon 2020 Framework Programme, the European Commission invested almost 1 billion euros in Research and Innovation projects focusing on agriculture, forestry, and rural development. These Multi-Actor projects (the concept and scope of H2020 Multi-Actor projects are described in the EIP-AGRI’s “Horizon 2020 multi-actor projects” report, 2017) have created a large number of research and dissemination outputs (so-called “digital objects”) conveying important information on research results, best practices, and innovations.

However, the uptake and re-use of these digital objects have been limited. As with many research projects, many of these outputs are no longer available after the projects’ end. As a result, agricultural stakeholders have limited access to the available knowledge for further research and development. Within this context, the EUREKA project was designed to provide a long-term solution by creating a permanent repository for collecting and further disseminating agriculture-related digital content and datasets.

EUREKA is an EU-funded H2020 project (Grant Agreement No 862790) whose primary goal is to strengthen and improve the flow of agricultural knowledge and innovation at the European, national, and regional levels. It is developing the “FarmBook”, a digital repository for permanently storing and sharing the digital objects created by the previously funded Multi-Actor projects. The repository must be able to handle heterogeneous digital objects (documents, videos, audio, images, datasets of numerical values from in-field measurements or transmitted from sensors, etc.) available in a variety of formats. To facilitate access to and re-use of the digital objects, the FarmBook draws upon semantic web standards and the FAIR data principles (Wilkinson et al., 2016).

The design work in EUREKA has involved the creation of a light-weight (in terms of semantics), graph-based structure to describe the topics of the FarmBook’s digital objects by combining the categorisation of agricultural topics in the EIP-AGRI taxonomy with the AGROVOC thesaurus. EIP-AGRI (i.e. the European Innovation Partnership for Agricultural Productivity and Sustainability) is an initiative aiming to provide farmers and foresters with the support and incentives needed to network, innovate, and share experiences and practices. In this context, the EIP-AGRI’s portal serves as a hub for the European agricultural community enabling access to a variety of resources.

The content available from the portal is tagged with the topic categories of the EIP-AGRI’s taxonomy of agricultural topics. One example is the practice abstracts: short textual summaries of practical information or recommendations that have emerged during the implementation of Multi-Actor projects and are intended for use by various actors in the value chain. The extensive use of this taxonomy by the community has led to the decision to use it as the backbone of our topic model.

A group of project partner organisations including the Agricultural University of Athens, Maastricht University, Institut de l'Elevage, Ghent University, the Institute of Agricultural Economics Nonprofit Kft, and the Leap Forward Group have worked together to develop the topic model.

Use case description

The methodology that has been followed for the topic model’s development is shown in Figure 1. 

Figure 1. Methodology followed for the creation of the topic model in the EUREKA project. Source: EUREKA project, 2021



Review of the EIP-AGRI’s taxonomy of agricultural topics – The EIP-AGRI’s topic categories have been established following an empirical, bottom-up approach, based on the issues addressed in the outputs of Research and Innovation Multi-Actor projects. No definitions have been provided for these topic categories. For this reason, a review has been undertaken to propose a definition for each EIP-AGRI topic category based on domain literature. The proposed definitions have helped identify the AGROVOC concepts to map to each topic category and develop its semantic network. Figure 2 shows the EIP-AGRI’s topic categories.

Figure 2. The EIP-AGRI’s topic categories adapted from the EIP-AGRI’s portal. Source: EUREKA project, 2021



Creation of a semantic network for each EIP-AGRI topic category – Each topic category of the EIP-AGRI’s taxonomy has been linked with concepts defined in the AGROVOC thesaurus. The aim of this activity has been to identify the AGROVOC concepts having a broader, narrower, or similar scope to each EIP-AGRI topic category. These types of relations have been encoded into our model using the skos:broader, skos:narrower, and skos:closeMatch properties of the SKOS Specification (Miles & Brickley, 2005). The links have been established by comparing the definitions of the EIP-AGRI topic categories (produced in the previous step) with those of the AGROVOC concepts. Table 1 below lists the AGROVOC concepts that the group of domain experts has identified as broader, narrower, and similar to the “Landscape/land management” topic category of the EIP-AGRI taxonomy.
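As a rough illustration of how such links might be written down, the sketch below encodes one topic category and its AGROVOC links as subject-property-object triples and prints them in N-Triples syntax. The SKOS and AGROVOC namespaces are the published ones. The EIP-AGRI topic namespace, the topic identifier, and the three concept identifiers are placeholders invented for this example; they are not the entries of Table 1, and the helper names are hypothetical.

```python
from typing import Iterable, List, NamedTuple

SKOS = "http://www.w3.org/2004/02/skos/core#"
AGROVOC = "http://aims.fao.org/aos/agrovoc/"
# Assumption: the source gives no namespace for EIP-AGRI topic categories,
# so an example.org placeholder is used here.
EIP_TOPIC = "https://example.org/eip-agri/topic/"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def link_topic(topic_id: str,
               broader: Iterable[str] = (),
               narrower: Iterable[str] = (),
               close_match: Iterable[str] = ()) -> List[Triple]:
    """Return the SKOS triples linking one topic category to AGROVOC concepts."""
    links = (("broader", broader),
             ("narrower", narrower),
             ("closeMatch", close_match))
    return [
        Triple(EIP_TOPIC + topic_id, SKOS + prop, AGROVOC + concept_id)
        for prop, concept_ids in links
        for concept_id in concept_ids
    ]


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Serialise triples whose three terms are all IRIs as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Placeholder concept identifiers, not real AGROVOC codes.
triples = link_topic(
    "landscape-land-management",
    broader=["c_PLACEHOLDER_BROADER"],
    narrower=["c_PLACEHOLDER_NARROWER"],
    close_match=["c_PLACEHOLDER_SIMILAR"],
)
print(to_ntriples(triples))
```

Note the direction of the properties: in SKOS, a statement "A skos:broader B" says that B is the broader concept, so the topic category is the subject and the wider-scoped AGROVOC concept is the object.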

 

Table 1. EIP-AGRI topic categories and the AGROVOC terms with a broader, narrower, and similar scope. Source: EUREKA project, 2021



Apart from the identification of the types of relations mentioned above, each of the EIP-AGRI topic categories has been associated with one or more agricultural sectors (“Crop Farming”, “Livestock”, and “Forestry”) and/or cross-sectoral themes (“Environment”, “Society”, and “Economics”). Aquaculture has been intentionally left out because no H2020 Multi-Actor projects have dealt with issues in that sector.

These associations have been identified based on the expertise of the group of domain experts and have been encoded into the topic model using SKOS. The concepts of the AGROVOC thesaurus, the agricultural sectors and cross-sectoral themes, as well as the SKOS properties that link them to each of the EIP-AGRI topic categories form the semantic network of each topic category. Figure 3 below shows the semantic network of the “Landscape/land management” topic category.

Figure 3. Semantic network of the EIP-AGRI “Landscape/land management” topic category. Source: EUREKA project, 2021



Assembly of the topic model – Integrating the semantic networks of all the EIP-AGRI topic categories into a single graph has produced the EUREKA FarmBook’s topic model.
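The assembly step can be pictured as a union of the per-topic networks. The sketch below is one minimal way to do that, assuming each semantic network is held as a list of (subject, property, object) triples. All identifiers are illustrative placeholders, and the use of skos:related for sector and theme links is an assumption, since the text only states that these associations are encoded "using SKOS".

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)

# Illustrative per-topic networks with placeholder identifiers.
# Assumption: skos:related stands in for the sector/theme association.
land_network: List[Triple] = [
    ("topic:landscape-land-management", "skos:broader", "agrovoc:PLACEHOLDER_A"),
    ("topic:landscape-land-management", "skos:related", "theme:environment"),
]
other_network: List[Triple] = [
    ("topic:hypothetical-topic", "skos:closeMatch", "agrovoc:PLACEHOLDER_B"),
    ("topic:hypothetical-topic", "skos:related", "theme:environment"),
]


def assemble(networks: Iterable[Iterable[Triple]]) -> Set[Triple]:
    """Merge per-topic networks into one graph; duplicate triples collapse."""
    merged: Set[Triple] = set()
    for network in networks:
        merged.update(network)
    return merged


def topics_linked_to(graph: Set[Triple], node: str) -> List[str]:
    """Return, in sorted order, every subject that points at the given node."""
    return sorted({s for s, _, o in graph if o == node})


topic_model = assemble([land_network, other_network])
shared = topics_linked_to(topic_model, "theme:environment")
print(shared)
```

Because nodes such as a cross-sectoral theme are shared between networks, the merged graph connects topic categories that were modelled separately, which is what makes navigation across topics possible.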

Benefits of using the AGROVOC thesaurus

The creation of a topic model based on AGROVOC concepts is intended to improve search and navigation in the FarmBook platform. The key benefit of using AGROVOC is that its concepts carry labels in multiple languages, which can be used for multilingual searches. According to Celli and Keizer (2016), the idea behind multilingual search is that the user can get results in languages other than the one in which the search was executed.

By associating each of the agricultural topics in the EIP-AGRI’s taxonomy with AGROVOC concepts of a broader, narrower, or similar scope, the user can receive an enhanced set of results, covering issues of a broader, narrower, or similar scope to a specific topic, in languages other than that of the search terms/query. To deliver these enhanced sets of search results, the AGROVOC concepts have also been used as keywords for the annotation of the FarmBook’s digital objects.

This way we can address both exhaustive research needs (i.e., allowing the user to find everything available on a specific search topic) and exploratory seeking needs (i.e., providing the user with some “good” results when they are not sure what they are looking for), following the categorisation of information needs described by Rosenfeld et al. (2015).
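The search behaviour described above can be sketched with a toy example. The code below matches a query against concept labels in any language, expands the match to the other concepts attached to the same topic category, and returns every digital object annotated with one of those concepts, whatever its language. The labels, concept identifiers, topic, and objects are invented for illustration; they are not taken from AGROVOC or the FarmBook, and the function is a hypothetical sketch rather than the platform's search implementation.

```python
from typing import Dict, List, Set, Tuple

# Toy data, invented for illustration only.
# Concept -> label per language code.
LABELS: Dict[str, Dict[str, str]] = {
    "concept:A": {"en": "soil", "fr": "sol"},
    "concept:B": {"en": "soil fertility", "fr": "fertilité du sol"},
}
# Topic category -> concepts linked to it (broader, narrower, or close match).
TOPIC_CONCEPTS: Dict[str, Set[str]] = {
    "topic:T": {"concept:A", "concept:B"},
}
# Digital object -> (language of the object, concept annotations).
OBJECTS: Dict[str, Tuple[str, Set[str]]] = {
    "doc-en": ("en", {"concept:A"}),
    "doc-fr": ("fr", {"concept:B"}),
    "doc-other": ("en", {"concept:Z"}),
}


def search(query: str) -> List[str]:
    """Return objects reachable from the query via labels and topic links."""
    term = query.strip().lower()
    # 1. Concepts with a label equal to the query in any language.
    matched = {
        concept
        for concept, labels in LABELS.items()
        if term in (label.lower() for label in labels.values())
    }
    # 2. Expand to all concepts sharing a topic category with a match.
    expanded = set(matched)
    for concepts in TOPIC_CONCEPTS.values():
        if concepts & matched:
            expanded |= concepts
    # 3. Objects annotated with any expanded concept, in any language.
    return sorted(obj for obj, (_, tags) in OBJECTS.items() if tags & expanded)


results = search("sol")  # a French query term
print(results)
```

In this toy setup a French query reaches an English object through the shared concept, and a second object through the topic-level expansion, which mirrors the two effects the text attributes to AGROVOC labels and to the broader, narrower, and similar links.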

Next steps

The next step will be the topic model’s evaluation, by means of a survey, by an expert group broader than the one involved in its creation. The aim of this survey will be to collect feedback that will allow for refinement and fine-tuning of the model (illustrated in Figure 1 by the arrows pointing from the evaluation step of the methodology back to previous steps). The topic model will be used in the FarmBook’s information architecture for tagging digital objects. Developing its RDF graph with appropriate software (e.g., Protégé; Brandt, 2011) will allow it to be published and made available via an open access repository (e.g., Zenodo).

Acknowledgements

The work presented in this document was undertaken under the Horizon 2020 EUREKA project, which receives funding from the EU under Grant Agreement No 862790.

References

Brandt, S. 2011. A Practical Guide to Building OWL Ontologies Using Protégé 4 and CO-ODE Tools, edition 1.3. The University of Manchester, 107.

Celli, F. and Keizer, J. 2016. Enabling multilingual search through controlled vocabularies: The AGRIS approach. In Research Conference on Metadata and Semantics Research (pp. 237-248). Springer, Cham.

European Innovation Partnership for Agricultural Productivity and Sustainability. 2017. Horizon 2020 multi-actor projects [15 June 2021]. https://ec.europa.eu/eip/agriculture/en/publications/eip-agri-brochure-horizon-2020-multi-actor

Miles, A. and Brickley, D. 2005. SKOS Core Vocabulary Specification: W3C Working Draft 2 November 2005 [23 August 2021]. https://www.w3.org/TR/2005/WD-swbp-skos-core-spec-20051102/

Rosenfeld, L., Morville, P. and Arango, J. 2015. Information Architecture: For the Web and Beyond. 4th edition. O’Reilly Media: California, United States.

Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. 2016. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018. https://doi.org/10.1038/sdata.2016.18