Caliper - Statistical Classifications in a Linked Open World

Caliper is a website, a suite of tools and a methodology for the management, dissemination and use of statistical classifications.

What is Caliper?

The Food and Agriculture Organization of the United Nations (FAO) provides internationally recognized definitions, concepts and classifications to promote consistency and comparability of information at the global level.

Caliper is a tool developed by FAO primarily for the dissemination of statistical classifications. In Caliper, users can explore, download and query the contents of classifications, such as the “items” they contain or their codes. The tool helps to publish, compare and link the major statistical classification systems that define agrifood systems resources, products and services according to different scopes, and that are used in the production of FAO statistical information.

Statistical classifications

Statistical classifications are the vocabularies that provide the statistical concepts and definitions underlying data collection, analysis and dissemination. They consist of “items”, often organized into hierarchies for the purpose of data aggregation. Items always have a title (sometimes called descriptor, name or label) and a code, typically used in databases and other information systems. In addition to the title, a definition is usually given to specify the “meaning” of the label, which helps users interpret the data collected under that item and compare them with data collected through different schemes.
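As a minimal sketch of the structure described above, the following Python snippet models items with a code, a title and an optional parent, and rolls leaf-level values up the hierarchy. The item codes, titles and figures are invented for illustration and do not come from any FAO classification; the names Item and aggregate are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Item:
    code: str                     # identifier used in databases
    title: str                    # human-readable label
    parent: Optional[str] = None  # code of the parent item; None at the top level


# Invented items, for illustration only.
ITEMS: List[Item] = [
    Item("A", "Crops"),
    Item("A1", "Cereals", parent="A"),
    Item("A11", "Wheat", parent="A1"),
    Item("A12", "Rice", parent="A1"),
]


def aggregate(items: List[Item], leaf_values: Dict[str, float]) -> Dict[str, float]:
    """Add each leaf value to the leaf itself and to all of its ancestors."""
    by_code = {item.code: item for item in items}
    totals: Dict[str, float] = {}
    for code, value in leaf_values.items():
        current: Optional[str] = code
        while current is not None:
            totals[current] = totals.get(current, 0.0) + value
            current = by_code[current].parent
    return totals


totals = aggregate(ITEMS, {"A11": 10.0, "A12": 5.0})
for code in sorted(totals):
    print(code, totals[code])
```

Because every item has at most one parent, a value recorded under a detailed item contributes exactly once to each higher level, which is what makes aggregation along the hierarchy well defined.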

More formally, according to the Expert Group on International Statistical Classifications (2013) (pg. 5), a statistical classification is “a set of categories which may be assigned to one or more variables registered in statistical surveys or administrative files, and used in the production and dissemination of statistics. The categories are defined in terms of one or more characteristics of a particular population of units of observation. A statistical classification may have a flat, linear structure or may be hierarchically structured, such that all categories at lower levels are sub-categories of a category at the next level up. The categories at each level of the classification structure must be mutually exclusive and jointly exhaustive of all objects in the population of interest.” 

 

Why Caliper

The goal of Caliper is twofold: to make statistical classifications easily searchable and comparable by users, and to make them efficiently reusable in computer applications. To this end, Caliper offers an interface for querying and browsing classifications’ contents, together with tools for software developers, who can query Caliper datasets and include the results in their information systems. In this way, Caliper complements the FAO catalogue of statistical standards available at the website Methods and Standards, and supports the Organization's wider effort to promote open data practices and to improve access to data.

Caliper was originally developed with financial support from the Bill and Melinda Gates Foundation, and it is now run by FAO with technical support from the University of Tor Vergata (Rome, Italy).

Technology

Caliper builds on well-known web technologies. All classifications have been converted into RDF (Resource Description Framework), stored in a triplestore, and exposed to users for browsing and searching through ShowVoc, a web-based application that natively supports RDF and related languages. Beyond RDF, classifications are also available for download in other widely used formats, namely CSV and JSON. Classifications may also be queried online through a public SPARQL endpoint (SPARQL is the query language for RDF, just as SQL is the query language for relational databases).
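To give an idea of what querying a SPARQL endpoint involves, here is a minimal Python sketch that prepares a request URL following the SPARQL 1.1 Protocol, where the query text travels in the "query" parameter of a GET request. The endpoint address is a placeholder, not Caliper's actual endpoint, and the query assumes that item codes and titles are expressed with skos:notation and skos:prefLabel, which should be checked against the dataset being queried. No network call is made.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint, for illustration only; replace with the real address.
ENDPOINT = "https://example.org/sparql"

# Assumption: codes are stored as skos:notation and titles as skos:prefLabel.
QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?item ?code ?label
WHERE {
  ?item skos:notation ?code ;
        skos:prefLabel ?label .
}
LIMIT 10
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Return a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_request_url(ENDPOINT, QUERY)

# Decoding the parameter again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
```

The resulting URL can be passed to any HTTP client, typically together with an Accept header asking for a result format such as application/sparql-results+json.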

The language (and data model) used to express classifications in Caliper is the Resource Description Framework (RDF). RDF is a widely used language for exchanging data on the web, promoted and maintained by the W3C (the World Wide Web Consortium). We chose RDF because it is designed specifically for exchanging information over the web, it is widely adopted, and it is implemented in many tools that embrace the FAIR and open data philosophy.

RDF may be expressed using different syntaxes, from the most verbose, based on XML, to the more compact Turtle (Terse RDF Triple Language), all expressing the same data model, which is based on triple statements. It is customary to call the three components of a triple “subject”, “predicate” and “object”. Each of the three parts of a triple may be identified by a URI (Uniform Resource Identifier), while an object may also be a literal value. Triples are collected into graphs (subjects and objects are the nodes of the graph, predicates are the arcs). One or more graphs together form an RDF dataset.
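The triple model can be sketched in a few lines of plain Python, without any RDF library. In the snippet below a graph is simply a set of triples; the example.org URIs and the labels are invented for illustration, and the names Triple, Literal, match and nodes are hypothetical helpers rather than part of any standard API. The two predicates are taken from the SKOS vocabulary.

```python
from typing import List, NamedTuple, Optional, Set, Union


class Literal(NamedTuple):
    """A literal value, as opposed to a URI."""
    value: str


class Triple(NamedTuple):
    subject: str                # URI
    predicate: str              # URI
    obj: Union[str, Literal]    # URI or literal value


EX = "http://example.org/demo/"  # invented namespace, for illustration only
SKOS = "http://www.w3.org/2004/02/skos/core#"

graph: Set[Triple] = {
    Triple(EX + "A1", SKOS + "prefLabel", Literal("Cereals")),
    Triple(EX + "A11", SKOS + "prefLabel", Literal("Wheat")),
    Triple(EX + "A11", SKOS + "broader", EX + "A1"),
}


def match(g: Set[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Union[str, Literal, None] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [t for t in g
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)]


def nodes(g: Set[Triple]) -> set:
    """Subjects and objects are the nodes of the graph; predicates are the arcs."""
    found = set()
    for t in g:
        found.add(t.subject)
        found.add(t.obj)
    return found


broader_arcs = match(graph, predicate=SKOS + "broader")
print(len(nodes(graph)), "nodes;", len(graph), "arcs")
```

Pattern matching with wildcards, as in match, is also the basic idea behind SPARQL queries, which describe the shape of the triples to be retrieved.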

While producing RDF from any data is fairly easy, producing RDF that can be effectively exchanged between machines (that is, machine-actionable RDF) depends mostly on adopting standard, documented vocabularies to express the “predicate” of triples. We rely extensively on existing, well-known RDF vocabularies to express all parts of our datasets. For example, constructs to express the hierarchical structure of classifications come from SKOS, while notions that are specific to statistical classifications are taken from XKOS.

RDF relies on URIs (Uniform Resource Identifiers) to provide unique identifiers for the resources described using the RDF language. In Caliper, classifications and the items they contain are therefore given URIs that make it possible to refer to them directly. URIs are especially useful to develop computer applications