Caliper - Statistical Classifications in a Linked Open World

Can you find your questions? If not, please let us know!
General questions

Caliper is FAO's platform for the dissemination of statistical classifications. Caliper is a website and a suite of tools and, perhaps most importantly, a methodology and an approach to the management, dissemination and use of statistical classifications.

One of Caliper's main pillars is that statistical classifications are standards of public use and interest, and should be treated as such. In particular, given the importance of data (we are in a "data age"!), it is crucial that what gives meaning to a piece of data (usually referred to as "dimensions", "classifications" or "codes") is clearly understandable to users and easily reusable in information systems (interoperability is key).

If you are a data user, you are probably interested in checking out the contents of classifications (e.g., codes or definitions) or their correspondences using our browser, ShowVoc.

If you are a developer, you may be interested in the online query facilities offered by our SPARQL endpoint. We have developed a number of sample queries for your tests.

NO. The classifications published in Caliper are maintained and published by dedicated institutions, sometimes in collaboration with FAO. Go to the Classifications section of this website to see all classifications currently included in Caliper, and their custodians.

Because Caliper is part of the FAO web content, the FAO Terms and Conditions apply to it.

The statistical classifications disseminated through this website and services are all in the "public domain" (i.e., materials that are not protected by any intellectual property rights such as copyright, trademark or patent laws). To our knowledge, no specific license is defined or adopted for statistical classifications.

FAO has formalized its policies regarding the licensing and the terms of use of the statistical databases it produces; see the policy document on Open Data Licensing for Statistical Databases and the page on Statistical Database Terms of Use.

YES. Check out our Documentation page.

Open Data

According to the Open Definition, “Open data and content can be freely used, modified, and shared by anyone for any purpose”. The idea of open data has gained momentum with the rise of the Internet and, more strongly, with open data initiatives promoted by governments and other large institutions. The general understanding is that open data is distributed under an open license, such as a Creative Commons license, and expressed in standardized (as opposed to proprietary) machine-readable formats. Moreover, it should be registered in appropriate catalogs to facilitate its discovery.

Linked Data

Linked Data is structured data that is interlinked with other data, so as to become more useful. Linked data aims at making data more easily consumable by machines, for example by means of semantic queries. It relies on existing web standards such as the HTTP protocol, the RDF data model, and the notion of global identifiers over the web (URIs). 

All. Caliper supports the editing, display and search of information in all languages, including those with non-Latin scripts (such as Russian and Chinese, two of the FAO official languages) and with right-to-left orientation (such as Arabic, another FAO official language).

All tools used in Caliper are free (= formats and code open in the public domain, no software fees). The costs of Caliper reside in hosting, system administration, expertise involved in the conversion of classifications and their maintenance. All the tools adopted have a large community of users to ensure reliability.    

Questions on Classifications and Data Model

The basic data model is RDF

At the most basic level, the data model adopted in Caliper is RDF, a language for the web that expresses data as graphs, whereas a relational data model expresses data as tables. A graph data structure consists of vertices (aka nodes, or points) connected by edges (aka links or lines). The basic unit of a graph is a triple node-edge-node. A relational data model consists of rows and columns.

To understand the "conversion" between a table and a graph, consider the little table below:

ID    Name    Surname    Age
A25   Mary    Jones      89

Roughly (explanation to be refined later on), that table corresponds to a graph that can be expressed using the following 3 triples:

  • A25 "has name" Mary
  • A25 "has surname" Jones
  • A25 "has age" 89

In the RDF language, "A25" is the subject of the triple, "has name" is the predicate, and "Mary" is the object. One advantage of this approach to data modelling is that "A25" can be a global ID, i.e., unique over the web, as opposed to unique within a specific database. The other advantage is that the predicate may be defined once and for all in a public vocabulary, expressed in ways that are understandable by machines and people alike, and therefore reusable over the web.
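The conversion from a table row to triples described above can be sketched in a few lines of Python. This is an illustration only: the base URI http://example.org/, the predicate names and the function name row_to_triples are assumptions made for the example, not identifiers used by Caliper.

```python
# Minimal sketch: turn one table row into subject-predicate-object triples.
# BASE and the predicate names are hypothetical, chosen for illustration.
BASE = "http://example.org/"


def row_to_triples(row, id_column="ID"):
    """Return one (subject, predicate, object) triple per non-ID column."""
    # The row ID becomes a global identifier by prefixing it with a base URI.
    subject = BASE + "person/" + row[id_column]
    triples = []
    for column, value in row.items():
        if column == id_column:
            continue
        predicate = BASE + "has" + column  # e.g. http://example.org/hasName
        triples.append((subject, predicate, value))
    return triples


row = {"ID": "A25", "Name": "Mary", "Surname": "Jones", "Age": "89"}
triples = row_to_triples(row)
for triple in triples:
    print(triple)
```

Each column other than the ID yields one triple, so the one-row table above produces three triples that share the same subject.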

SKOS provides the construct to express the basic elements of a classification

SKOS stands for Simple Knowledge Organization System. SKOS is a vocabulary for RDF defining the constructs for expressing classification schemes, subject headings, thesauri, taxonomies and the like. We use SKOS to express hierarchies, classification entries, labels, explanatory notes, definitions, and correspondences. SKOS is a W3C specification.

Other standard vocabularies

Other vocabularies are available for unambiguously expressing pieces of information over the web. XKOS (the Extended Knowledge Organization System) is one of those, defining constructs specific to statistical classifications (such as correspondences, classification levels, etc.).

A full account of the modelling adopted in Caliper can be found in Section Documentation of this website.

SKOS

SKOS stands for Simple Knowledge Organization System. According to Wikipedia, it "is a W3C recommendation designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary. SKOS is part of the Semantic Web family of standards built upon RDF and RDFS, and its main objective is to enable easy publication and use of such vocabularies as linked data."
SKOS defines classes and properties for representing:
• a "concept scheme" (a set of terms: a classification, a code list, a list of subject headings...)
• its terms / concepts (labels in different languages, definition, notation, editorial notes...),
• the relationships between concepts (generic or hierarchical)
• subsets of concepts (collections)
It is therefore suitable for representing classifications in a semantic, machine readable way.
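To make the SKOS constructs listed above more concrete, here is a minimal Python sketch that writes one classification entry as SKOS in Turtle syntax. The scheme URI, the code "011", the labels and the function name concept_to_turtle are all hypothetical; the sketch does not escape special characters in labels and is not how Caliper generates its data.

```python
# Minimal sketch: serialize one classification entry as SKOS in Turtle syntax.
# The scheme URI, code and labels used below are hypothetical examples.
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def concept_to_turtle(uri, scheme_uri, notation, labels, broader_uri=None):
    """Build a Turtle description of a skos:Concept.

    labels maps a language tag to the preferred label in that language.
    """
    lines = [
        f"<{uri}> a skos:Concept ;",
        f"    skos:inScheme <{scheme_uri}> ;",
        f'    skos:notation "{notation}" ;',
    ]
    for lang, text in labels.items():
        lines.append(f'    skos:prefLabel "{text}"@{lang} ;')
    if broader_uri is not None:
        lines.append(f"    skos:broader <{broader_uri}> ;")
    # Replace the trailing " ;" of the last line with " ." to end the statement.
    lines[-1] = lines[-1][:-2] + " ."
    return SKOS_PREFIX + "\n\n" + "\n".join(lines)


turtle = concept_to_turtle(
    "http://example.org/scheme/011",
    "http://example.org/scheme",
    "011",
    {"en": "Cereals", "fr": "Céréales"},
    broader_uri="http://example.org/scheme/01",
)
print(turtle)
```

The concept scheme, the notation (code), the labels in different languages and the hierarchical link each map to one SKOS property: skos:inScheme, skos:notation, skos:prefLabel and skos:broader.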

SPARQL

SPARQL is the query language for RDF. 
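As an illustration, the following Python sketch prepares a SPARQL query that lists codes and English labels of SKOS concepts, and wraps it in an HTTP GET request as defined by the SPARQL protocol. The endpoint address http://example.org/sparql is a placeholder, not the address of the Caliper endpoint, and the query assumes the data uses skos:notation and skos:prefLabel. The request is built but not sent.

```python
# Minimal sketch: prepare (but do not send) a SPARQL query over HTTP.
# ENDPOINT is a placeholder, not the actual Caliper endpoint address.
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"

QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?code ?label
WHERE {
  ?concept a skos:Concept ;
           skos:notation ?code ;
           skos:prefLabel ?label .
  FILTER (lang(?label) = "en")
}
ORDER BY ?code
LIMIT 10
"""


def build_request(endpoint, query):
    """Return a GET request carrying the query in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialization of SPARQL results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.full_url)
```

To run the query against a real endpoint, replace ENDPOINT with its address and pass the request to urllib.request.urlopen.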

Vocabulary

In everyday language, a vocabulary is a set of words, possibly used by a group, individual, or work, or in a field of knowledge (see the definitions given by the Merriam-Webster dictionary). Vocabularies are therefore fundamental in shaping people's universe of discourse, and have a special role in the field of information management, especially in the form of controlled vocabularies, i.e., selected lists of words used as "tags" or "classifiers" of information units, whether numeric or textual data. Because of their role in defining the entities to measure and in codifying data, statistical classifications can be considered special types of vocabularies.
Vocabularies also play a very important role in the Semantic Web. The World Wide Web Consortium (W3C) defines vocabularies in this broad sense: "On the Semantic Web, vocabularies define the concepts and relationships (also referred to as “terms”) used to describe and represent an area of concern. Vocabularies are used to classify the terms that can be used in a particular application, characterize possible relationships, and define possible constraints on using those terms. In practice, vocabularies can be very complex (with several thousands of terms) or very simple (describing one or two concepts only)."
Moreover, the W3C usefully distinguishes two types of vocabularies:
1. value vocabularies or sets of controlled values used to categorize and classify things. These are also known as Knowledge Organization Systems (KOS) and include classifications, code lists, thesauri, even certain types of ISO standards that prescribe controlled lists of values;
2. metadata element sets that prescribe what features or properties should be used to describe things. They are also called schemas, or description vocabularies. Examples include XML Schema and RDF Schema, formal languages to describe entities in XML and RDF respectively. Other examples include ontologies, application profiles, and UML models.
The statistical classifications that are the focus of Caliper fall under the first type. SKOS, the formal language we use to express statistical classifications in a machine-readable format, is an example of the second type. Specifically, SKOS is a vocabulary for RDF, tailored to express thesauri and similar knowledge organization systems on the web.

When I open it with Excel, leading zeroes are missing

You probably opened the CSV file directly in Excel, which interprets the codes as numbers and drops their leading zeroes. If you opened the file with a text editor, you would see all leading zeroes correctly in place. To see them in Excel too, follow the instructions given by the Office Support website.
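The same effect can be reproduced outside Excel. The Python sketch below reads a small CSV sample (invented for this example) and shows that the leading zeroes are present in the file and are only lost when the codes are converted to numbers.

```python
# Minimal sketch: the leading zeroes are in the file; they are lost only when
# a program converts the codes to numbers. The sample data is invented.
import csv
import io

SAMPLE_CSV = "code,label\n0111,Wheat\n0112,Maize\n"

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

codes_as_text = [row["code"] for row in rows]          # zeroes preserved
codes_as_numbers = [int(row["code"]) for row in rows]  # zeroes lost

print(codes_as_text)
print(codes_as_numbers)
```

Classification codes are identifiers rather than quantities, so any tool that reads them should keep them as text.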

When I open it with Excel I see weird characters instead of an Arabic/Chinese/Russian name

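This is because Excel does not automatically recognize UTF-8 encoding when a CSV file is opened directly. The characters are stored correctly in the file, as you can verify by opening it with a text editor.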
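One common workaround, offered here as a suggestion rather than as an official Caliper recommendation, is to re-save the file as UTF-8 with a byte order mark (BOM), which Excel generally uses to detect the encoding. The Python sketch below shows the idea on sample labels invented for this example.

```python
# Minimal sketch: encode CSV text as UTF-8 with a byte order mark (BOM).
# The sample codes and labels are illustrative only.
import csv
import io

rows = [["code", "label"], ["0111", "Пшеница"], ["0112", "玉米"]]

# newline="" lets the csv module control line endings itself.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)

# "utf-8-sig" prepends the three BOM bytes EF BB BF to the encoded text.
data = buffer.getvalue().encode("utf-8-sig")

print(data[:3])
```

Writing the bytes in data to a file with a .csv extension gives a file that carries the BOM; decoding it again with "utf-8-sig" returns the original text.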

Caliper works well with all formats commonly used to store or pass classifications around, such as CSV, XLS, DB dump, or JSON.

Yes. The editing tool used in Caliper is VocBench. It is a powerful tool, able to support the editing of both classifications (as RDF vocabularies) and the OWL models (ontologies) they use, when this is the case. VocBench also fully supports editorial workflows, so that some users will only be able to add translations, for example, while others are allowed to approve changes and perform more complicated operations. Therefore, the level of knowledge of RDF required for the maintenance of classifications in VocBench very much depends on your role in the project.