Data collation, harmonisation and sharing

An overview of the data acquisition and sharing process is provided in the figure below.

The FAO/WHO GIFT inventory of dietary surveys, available as a map in the Data section, contains information on known individual quantitative dietary surveys worldwide. Survey details are provided for each survey in the inventory in a structured format, containing eight main sections covering key survey information. The information presented in the map was gathered through published survey results, non-systematic literature reviews, internet searches, and direct contact with data owners and other key informants, in particular the Global Dietary Database (GDD), the Institute for Health Metrics and Evaluation (IHME), and the Nutrition and Metabolism Section at the International Agency for Research on Cancer (IARC). The inventory focuses in particular on surveys from low- and middle-income countries (LMICs), for which dietary survey information is often not broadly accessible to researchers, policy makers, and other stakeholders. The inventory is continuously updated to cover new surveys and increase the accuracy of the information presented.

Only dietary surveys matching the following minimum criteria are shared through FAO/WHO GIFT:

Unit of data collection: individuals.
Methods for dietary data collection: 24-hour recalls, food records (weighed or estimated), or other quantitative methods, such as 12-hour recalls and direct food weighing. Only quantitative methods that involve recording of portion sizes are considered.
Coverage of the diet: all foods and beverages consumed (including water where possible, although this is not compulsory).
Year of data collection: 1980 or later.
Geographical coverage: national and sub-national surveys.
Sample size for which dietary data were collected: 100 or more subjects.
Other representativeness criteria: no evidence of strong selection bias, such as surveys covering only participants with medical conditions.
Additional criteria for high-income countries: nationally representative surveys.
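The criteria above can be read as a checklist applied to each survey's metadata. The following is a minimal sketch in Python, not FAO's actual procedure: the `SurveyInfo` class, its field names, and the `failed_criteria` function are hypothetical and exist only for this illustration. Geographical coverage is not checked separately because both national and sub-national surveys are accepted.

```python
from dataclasses import dataclass


@dataclass
class SurveyInfo:
    # All field names are hypothetical, chosen for this example only.
    unit: str                        # e.g. "individual" or "household"
    records_portion_sizes: bool      # quantitative method with portion sizes
    covers_whole_diet: bool          # all foods and beverages consumed
    year: int                        # year of data collection
    sample_size: int                 # subjects with dietary data
    strong_selection_bias: bool      # e.g. only participants with a medical condition
    high_income_country: bool
    nationally_representative: bool


def failed_criteria(s: SurveyInfo) -> list:
    """Return the names of the minimum criteria that the survey does not meet."""
    checks = [
        ("unit of data collection", s.unit == "individual"),
        ("quantitative method with portion sizes", s.records_portion_sizes),
        ("coverage of the diet", s.covers_whole_diet),
        ("year of data collection", s.year >= 1980),
        ("sample size", s.sample_size >= 100),
        ("no strong selection bias", not s.strong_selection_bias),
        # The extra criterion applies to high-income countries only.
        ("national representativeness (high-income countries)",
         (not s.high_income_country) or s.nationally_representative),
    ]
    return [name for name, passed in checks if not passed]
```

An empty list means the survey passes all minimum criteria; otherwise the list names the criteria that need to be discussed or that exclude the survey.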

In addition to the minimum criteria, dietary datasets that are potentially suitable for insertion in FAO/WHO GIFT are assessed for the presence of compulsory variables and the level of disaggregation of the food consumption information. Most importantly, datasets must contain a description, amount, and energy and nutrient values of all foods and drinks consumed by each survey participant on each survey day. Other compulsory variables are the sex and age of each subject, geographical location (country, and region(s) if available), type of area (e.g. rural, urban), and the number of survey days if more than one recall is available per subject or for a subset of subjects. Other highly recommended variables include anthropometrics (weight, height), and the physiological condition of women (pregnant, lactating) and infants (breastfeeding status). In addition, recipes and mixed or composite dishes should be disaggregated as far as possible into their respective ingredients (see the Recipes and mixed dishes subsection below for more details).

Data providers willing to share their dataset(s) through FAO/WHO GIFT start a collaborative process with FAO, receiving clear guidance and support throughout. Once FAO has validated a dataset as suitable for sharing, and before data preparation starts, the data provider signs a legal document called the “License to Redistribute Contribution”; a separate license is signed for each suitable dataset. It is through this document that the data provider formally grants FAO the right to share the dataset and accepts the data sharing terms and conditions.

A major challenge in analysing dietary data relates to the harmonisation of the coding of food items. Foods vary between countries and regions in terms of forms, varieties, preparation methods and many other characteristics. When using and analysing datasets from different locations, it is essential that comparability is maintained, without losing detailed information on what has been consumed. The use of a common food classification and description system among dietary surveys from different countries contributes to the global harmonisation of dietary data. For this reason, all individual quantitative dietary datasets shared through FAO/WHO GIFT are coded with the FoodEx2 system, a comprehensive and flexible food classification and description system developed by the European Food Safety Authority (EFSA). FoodEx2 was first developed to be used at the European level, and was later scaled up to the global level in collaboration with FAO and the World Health Organization (WHO) to enable the description and classification of foods consumed in other regions of the world. The harmonisation of food descriptions with FoodEx2 enables the grouping of foods according to the FAO/WHO GIFT food groups and the computation of the FAO/WHO GIFT indicators and summary statistics. In addition, the sharing of microdata mapped to FoodEx2 facilitates the use of these data in combination with other datasets coded with FoodEx2, such as chemical occurrence data and food composition data.

When assessing dietary intakes, individuals may report the consumption of single foods and beverages but also recipes. Recipes, also known as composite dishes, represent the combination of ingredients from different food sources. Ideally, detailed information on recipes should be gathered during the data collection phase of a dietary survey, including the ingredients and their quantities, the cooking method, the total amount prepared, the number of people served, and the amount of the portion consumed by the participant (or the amounts served and left over). Whenever possible, datasets available in FAO/WHO GIFT contain disaggregated information for recipes, i.e. detailed information about the ingredients used, the cooking method(s) applied, as well as the quantity consumed of each ingredient and its energy and nutrient values. The disaggregation of recipes into their ingredients is needed to correctly attribute each ingredient to its appropriate food group when calculating the FAO/WHO GIFT indicators and summary statistics. Moreover, the provision of information at the ingredient level is critical for several non-nutrient analyses, in particular for food safety, agriculture and the environment.
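As an illustration of how the recipe details listed above can yield ingredient-level amounts, the sketch below scales each ingredient quantity by the share of the prepared dish that the participant ate. It is a simplified example in Python, not the method used for any particular dataset: the function name and arguments are hypothetical, and it assumes the ingredient quantities add up to the total amount prepared, ignoring weight changes during cooking.

```python
def disaggregate_recipe(ingredients_g, total_prepared_g, portion_consumed_g):
    """Scale each ingredient amount by the share of the dish that was eaten.

    ingredients_g: dict mapping ingredient name to grams used in the recipe.
    total_prepared_g: total weight of the prepared dish, in grams.
    portion_consumed_g: weight eaten by the participant, in grams.
    """
    if total_prepared_g <= 0:
        raise ValueError("total_prepared_g must be positive")
    share = portion_consumed_g / total_prepared_g
    return {name: grams * share for name, grams in ingredients_g.items()}


# A participant eats 250 g of a 1000 g dish, i.e. one quarter of each ingredient.
consumed = disaggregate_recipe(
    {"rice": 400.0, "beans": 300.0, "water": 300.0},
    total_prepared_g=1000.0,
    portion_consumed_g=250.0,
)
# consumed == {"rice": 100.0, "beans": 75.0, "water": 75.0}
```

Each resulting ingredient amount can then be assigned to its own food group, which is not possible when only the dish as a whole is recorded.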

All datasets are screened, analysed and formatted according to the FAO/WHO GIFT microdata template and FAO/WHO GIFT codebook before inclusion in the platform. This process includes the renaming and recoding of variables according to the codebook, screening for potential errors, and identification of missing values and outliers. The data analysis and formatting are done using the R programming language and the free RStudio software.

Country names and administrative region classification

The country name and the names of the administrative regions available in the datasets are mapped to the Global Administrative Unit Layers (GAUL) initiative.

Identification of outliers

Key variables in the dataset are analysed for the presence of outliers, namely the variables for weight, height, amount consumed, and energy and nutrient values. This step is aimed at identifying potential errors or extreme values that could impact the results generated from the dataset. Extreme values are discussed with the data provider and an agreement on how to deal with outliers is sought. If an extreme value is considered implausible and likely to affect the statistics presented in the platform, and no solution is identified together with the data provider, FAO may replace the extreme value with an imputed plausible value, specifically for the purpose of calculating the indicators and summary statistics presented in FAO/WHO GIFT. For each dataset, information on how outliers were treated is available in the survey metadata (i.e. survey details). FAO does not remove or treat outliers in the microdata disseminated through the platform, unless the data provider requests it. Users downloading the microdata for further analysis should screen the dataset for potential outliers that could affect their results and treat them accordingly.
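Users screening downloaded microdata might start with a simple rule that flags extreme values for review. The sketch below uses Tukey's interquartile-range fences, written in Python for illustration. This rule is an assumption of the example: the text above does not state which method FAO applies, and the function name and the multiplier `k` are hypothetical. The function only reports positions; it does not change or remove any value.

```python
import statistics


def flag_outliers(values, k=3.0):
    """Return the indices of values outside [Q1 - k*IQR, Q3 + k*IQR].

    values: numeric observations (e.g. energy intake in kcal), at least two.
    k: fence multiplier; 1.5 is the common choice, 3.0 flags only far outliers.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Daily energy intakes in kcal; the last value looks like a data-entry error.
energy_kcal = [1800, 2000, 2100, 2200, 2300, 2500, 9500]
flagged = flag_outliers(energy_kcal)
```

Flagged values still need a plausibility judgement, since a high value can be genuine; the rule only narrows down what to inspect.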

Identification of missing values

All variables are screened for missing values. Missing values are not accepted for compulsory variables such as subject ID, food name and amount consumed. Missing values for other variables are discussed with the data provider and an agreement on how to deal with them is reached. If a nutrient variable contains missing values that could influence the indicators and summary statistics, leading to an underestimate of nutrient intake, the results are not presented for that nutrient. Information on how missing values were handled is available for each dataset in the metadata (i.e. survey details). Missing values are never imputed by FAO in the microdata disseminated through the platform unless the data provider requests it. Users downloading the microdata for further analysis should screen the dataset for missing values that could affect their results and treat them accordingly.
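A basic screen along these lines can be sketched in a few lines of Python. The column names (`subject_id`, `food_name`, `amount_g`) and the function names are hypothetical and do not come from the FAO/WHO GIFT codebook. The sketch treats `None`, empty strings and NaN as missing, and it flags any nutrient with at least one missing value, which is stricter than the judgement described above.

```python
import math

# Hypothetical names for the compulsory variables mentioned in the text.
COMPULSORY = ("subject_id", "food_name", "amount_g")


def is_missing(value):
    """Treat None, blank strings and NaN as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    return isinstance(value, float) and math.isnan(value)


def screen_missing(records, nutrient_vars):
    """Return (rows missing a compulsory value, nutrients with any missing value).

    records: list of dicts, one per food item consumed.
    nutrient_vars: names of the nutrient columns to check.
    """
    bad_rows = [
        i for i, row in enumerate(records)
        if any(is_missing(row.get(var)) for var in COMPULSORY)
    ]
    incomplete = [
        var for var in nutrient_vars
        if any(is_missing(row.get(var)) for row in records)
    ]
    return bad_rows, incomplete


records = [
    {"subject_id": "S1", "food_name": "rice", "amount_g": 150.0,
     "energy_kcal": 195.0, "iron_mg": 0.3},
    {"subject_id": "S2", "food_name": "", "amount_g": 80.0,
     "energy_kcal": 90.0, "iron_mg": None},
]
bad_rows, incomplete = screen_missing(records, ["energy_kcal", "iron_mg"])
```

Here the second record lacks a food name, and iron has a missing value, so an analyst would resolve the record and consider leaving iron out of any intake totals.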

Food composition values

Energy and nutrient values available in the datasets shared through FAO/WHO GIFT are provided by the data providers. FAO does not match food consumption datasets to food composition data for the FAO/WHO GIFT platform. If issues related to the food composition values are encountered during the data analysis process, they are discussed with the data provider and an agreement is reached on how to resolve them.

At the end of the data acquisition process, after the data provider has signed the License to Redistribute Contribution and the dataset has been analysed, harmonised, formatted and finalised, the microdata are ready to be publicly disseminated by FAO. All microdata are processed and anonymised before dissemination according to the FAO Statistical Disclosure Control (SDC) Protocol.

FAO/WHO GIFT | Global Individual Food Consumption Data Tool
