FAO enhances the ways in which organizations can contribute to AGRIS

In addition to submitting bibliographic records (metadata) to AGRIS via email, partner organizations (data providers) can now participate in AGRIS through standards such as the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). FAO has recently implemented a software component called a “harvester” that periodically scans a list of target OAI-PMH repositories for new records to ingest. Currently, the AGRIS harvester is scheduled to run every three months.

FAO constantly explores ways to expand and strengthen AGRIS. As Imma Subirats-Coll, the information management officer who leads the AGRIS Programme, says:

One way to improve AGRIS is by making it easier for partner organizations to contribute to it. The AGRIS harvester makes this possible! The net result of this is more knowledge being made accessible to as wide an audience as possible. Starting May 2021, FAO has been adding records to the AGRIS database by automatically harvesting bibliographic data, with 1 124 317 records added in the latest release on 14 June 2021.

What are the main advantages of harvesting?

FAO is strengthening the collaboration between AGRIS and its partner organizations by prioritizing their visibility, as well as that of the original source of any data provided by these organizations.

Harvesting metadata periodically from data providers, whether journals or repositories, enables partner organizations to keep their content in the AGRIS database up to date. Considering the high usage of AGRIS worldwide, this is critical for providing end users with accurate metadata and for increasing the discoverability of digital collections.

What is the AGRIS harvester? (For those more technically inclined!)

The AGRIS harvester is essentially a command-line Python tool for harvesting OAI-PMH endpoints. It is stateless and can run in parallel on any server, container, cloud function, etc.

Each execution requires at least an OAI-PMH endpoint (i.e. a base URL) and a target metadata format, and downloads a collection of XML files. If no “sets” are specified, the tool harvests all the data available at that endpoint. Otherwise, if one or more sets are specified, the tool is executed once per set on the endpoint, that is, N times for N desired sets.
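The per-set behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the AGRIS harvester's actual code: the function names and the example base URL are hypothetical, while the request arguments (verb, metadataPrefix, set, resumptionToken) are those defined by OAI-PMH.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_urls(base_url, metadata_prefix, sets=None):
    """Return one initial ListRecords URL per set, or a single URL if no sets are given."""
    urls = []
    for set_spec in (sets or [None]):
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            params["set"] = set_spec
        urls.append(f"{base_url}?{urlencode(params)}")
    return urls


def next_page_url(base_url, response_xml):
    """Return the URL of the next page, or None when the list is complete."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    # In OAI-PMH, resumptionToken is an exclusive argument: only the verb goes with it.
    params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and set names, for illustration only.
for url in list_records_urls("https://example.org/oai", "oai_dc", ["agri", "food"]):
    print(url)
```

With two sets, the sketch produces two independent starting requests, which mirrors the idea of running the tool once per set. Each response would then be saved as an XML file and followed page by page until no resumption token remains.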

What are the requirements for the harvester to work on the bibliographic data of a data provider?

First and foremost, the data provider must comply with OAI-PMH. Moreover, for the AGRIS harvester to work:

  • the endpoint must be open, with no password and no whitelist of IP addresses;
  • the use of CDATA sections should be considered if metadata contains characters not allowed in XML; and
  • files should be encoded as UTF-8.
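Data providers who want to check the last two points before sharing an endpoint could use something like the following Python sketch. It is an assumption-laden illustration rather than part of the AGRIS harvester: the helper names check_response and cdata are hypothetical, and the sketch reads “characters not allowed in XML” as markup characters such as & and <, which a CDATA section lets through unescaped.

```python
import xml.etree.ElementTree as ET


def check_response(raw_bytes):
    """Return a list of problems found in a raw OAI-PMH response (empty if none)."""
    problems = []
    try:
        raw_bytes.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8: {exc.reason}")
        return problems
    try:
        ET.fromstring(raw_bytes)
    except ET.ParseError as exc:
        problems.append(f"not well-formed XML: {exc}")
    return problems


def cdata(text):
    """Wrap text in a CDATA section, splitting any ']]>' so it cannot end the section early."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# A bare ampersand breaks the XML; the same text inside CDATA parses cleanly.
broken = "<title>Soil & water</title>".encode("utf-8")
fixed = ("<title>" + cdata("Soil & water") + "</title>").encode("utf-8")
print(check_response(broken))
print(check_response(fixed))
```

Note that CDATA does not make control characters or other code points forbidden by XML 1.0 acceptable; those would still need to be removed at the source.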

Another important recommendation is to use the “set” argument to organize records by category. This enables AGRIS to perform “selective harvesting,” considering only those records that belong to the AGRIS domain. The themes and topics of FAO define the scope of AGRIS.
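To illustrate how sets support selective harvesting, the Python sketch below reads a ListSets response and keeps only the sets whose names match some subject keywords. The keyword list, the sample response and the function name select_sets are all hypothetical; how AGRIS actually decides which sets are in scope is not described here.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def select_sets(list_sets_xml, keywords):
    """Return setSpec values whose setName mentions any of the given keywords."""
    root = ET.fromstring(list_sets_xml)
    wanted = [k.lower() for k in keywords]
    selected = []
    for node in root.iterfind(".//oai:set", NS):
        spec = node.findtext("oai:setSpec", default="", namespaces=NS)
        name = node.findtext("oai:setName", default="", namespaces=NS).lower()
        if spec and any(k in name for k in wanted):
            selected.append(spec)
    return selected


# Made-up ListSets response from an imaginary repository.
sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListSets>
<set><setSpec>agri</setSpec><setName>Agricultural Sciences</setName></set>
<set><setSpec>law</setSpec><setName>Law</setName></set>
<set><setSpec>fish</setSpec><setName>Fisheries and Aquaculture</setName></set>
</ListSets></OAI-PMH>"""

print(select_sets(sample, ["agricultur", "fisher"]))
```

The selected setSpec values are what would be passed as the “set” argument of each harvesting request, so records outside the chosen categories are never downloaded.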

What are the preferred metadata formats?

OAI-PMH requires, at the very minimum, support for the Dublin Core metadata format. In addition, AGRIS supports other metadata formats that provide more granularity:

  • MODS 3
  • DOAJ
  • EndNote XML
  • AGRIS AP
  • PubMed NLM
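A repository advertises the formats it offers through the OAI-PMH ListMetadataFormats request. The Python sketch below shows one way a client might pick the richest format available and fall back to Dublin Core. The preference order and the prefix "mods" are assumptions made for illustration, since prefixes other than "oai_dc" are chosen by each repository; the function name choose_prefix is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical preference order; real prefixes vary from one repository to another.
PREFERRED = ["mods", "oai_dc"]


def choose_prefix(list_formats_xml, preferred=PREFERRED):
    """Return the first preferred metadataPrefix the repository offers, or None."""
    root = ET.fromstring(list_formats_xml)
    offered = {
        node.text.strip()
        for node in root.iterfind(".//oai:metadataPrefix", NS)
        if node.text
    }
    for prefix in preferred:
        if prefix in offered:
            return prefix
    return None


# Made-up ListMetadataFormats response from an imaginary repository.
sample_formats = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListMetadataFormats>
<metadataFormat><metadataPrefix>oai_dc</metadataPrefix></metadataFormat>
<metadataFormat><metadataPrefix>mods</metadataPrefix></metadataFormat>
</ListMetadataFormats></OAI-PMH>"""

print(choose_prefix(sample_formats))
```

The chosen prefix is the “target metadata format” that each harvesting request then carries in its metadataPrefix argument.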

We invite data providers who use OAI-PMH compliant systems, such as repositories or journals in the field of food and agricultural sciences, to share their base URLs with us at [email protected]