Chapter 10 Quality considerations in the compilation of a food composition database

This chapter describes the stages involved in database compilation, from the collection of data to their entry into the computerized (or published) database. In most database programmes, this is the process in which the programme's own sampling and analytical procedures for the production of values merge with indirect, literature-based operations.

The compilation process is not merely a clerical task of assembling numerical values into a suitable format. The operation includes the appraisal of all the information entering the database management system. In the process, each item of data is evaluated against a series of criteria. In many cases the compilers must consult individuals with a sound knowledge of foods and nutrients and an understanding of analytical procedures before deciding whether or not to include certain values.

The evaluation of data is an iterative process between the various stages of the database system (Chapter 1). Although the compiler reviews the data at all levels, questions will frequently be raised as the compilation proceeds that require a return to the primary data source. It is therefore essential that the evaluative process be fully documented.

Experiences of compilation within national food composition programmes have been described by many authors in the proceedings of INFOODS Regional Data Centre meetings (e.g. Aalbersberg, 1999) and in proceedings and special journal issues from national and international food data conferences (Greenfield, 1995; Food Chemistry, 1996; Journal of Food Composition and Analysis, 2000, 2001, 2002, 2003a).

Sources of data

Before outlining criteria for data scrutiny, it is necessary to consider the primary sources of the data. They can be considered to fall into four broad categories (Table 10.1), each with its own characteristics that the compiler must take into account. Although in principle all data should be evaluated against the same criteria, it should be recognized that much existing information on the composition of foods does not fully meet the ideal criteria. The four major categories of sources of data are as outlined below.

Table 10.1 Sources of compositional data
*Source*	*Description*
Primary publications	Articles in the scientific literature containing compositional data for foods
Secondary publications	Reviews or published compilations including compositional data
Unpublished reports	Reports ranging from analytical records to reports prepared for internal use within an organization, but not published in a formal sense
Analytical reports: specific	Analyses carried out specifically within a database programme
Analytical reports: non-specific	Analytical work carried out for other purposes

Primary publications

This category includes compositional data in papers published in scientific journals. In addition to journals on food science and nutrition, those concerned with the analysis of food by-products, studies of soil treatment, animal and plant husbandry, and analytical method development, among others, are included.

While these papers have usually been subject to peer review and refereeing, the work will generally have been evaluated with regard to the primary purpose of the study, and not necessarily the quality of the compositional values as such. Thus, the experimental sections of papers often contain too little detail for these values to be used without applying the formal criteria discussed below. Nevertheless, these data have a clear, unequivocal source and can usually be related directly to specific foods and analytical work.

Secondary publications

This category includes reviews, other published compilations of compositional data (including food composition tables and computerized databases) and material published in books or unrefereed journals. The values given in this category may be more difficult to evaluate against the formal criteria. For example, data from other food composition tables should ideally lead the compiler to the sources of the data, published or unpublished, but frequently the source leads only to another set of tables. When compositional values are published in unrefereed publications, the compiler may have to consult the author or the compilers of the database before the values can be properly evaluated.

Some compositional data are published in their original form in food composition tables, as in, for example, The composition of foods (McCance and Widdowson, 1940, 1946, 1960; Paul and Southgate, 1978), where primary analytical values were published. In the 1960 edition, material taken from the literature was fully referenced. The 1978 edition provided keys to the laboratories supplying analytical values specifically obtained for the edition, the methods used and references for material taken from the literature. In subsequent editions (Holland et al., 1991; Food Standards Agency, 2002) and the supplements (Holland, Unwin and Buss, 1992a, 1992b; Holland et al., 1991; Holland, Welch and Buss, 1992; Holland, Brown and Buss, 1993; Chan, Brown and Buss, 1994; Chan et al., 1995, 1997; MAFF, 1998) to the United Kingdom food composition tables, which constitute the primary United Kingdom nutritional data, the keys were no longer published, for reasons of economy, but the information is still available from the publishers. Many countries continue to publish details of their sample and analytical documentation, abridged or complete, and this is to be encouraged. Whether released in printed publications or not, all data compilation centres should be able to make documentation details available to users as required.

Unpublished reports

This category includes compositional data that have been collected into a document prepared for limited circulation, frequently for internal use in commercial companies, institutions or government departments. The application of formal criteria to these data is often difficult and depends on the nature of the document. These reports often contain original analytical data and, as such, can be valuable sources of compositional values. Alternatively, the data may be used for confirmation or to provide some indication of variation in a particular constituent. The authors should be consulted, where possible, if there is any doubt or confusion about the values.

Unpublished analytical data

This category includes two broad types of data. First are the analytical data that were not generated specifically for a nutrient database (where, for example, the collection of food samples was not designed to be representative and the analyses were not controlled or supervised by the organization or group responsible for the database). In these cases the compiler must carefully scrutinize sampling and analytical procedures, and must also be confident that appropriate quality control procedures were in use. Direct access to records of food samples and analytical notebooks is especially valuable. A proper evaluation is also easier if the compiler can discuss the values with the person responsible for sampling and analysis.

The second type is unpublished values obtained specifically for the database programme. These values should be scrutinized even though the compiling organization controlled the sampling and analytical procedures, albeit through contracts. In a strict sense, these new analytical data merely join the existing population of values, and should be compared with other sources of compositional data. Only when there is good evidence that a food has changed (for example, if a new variety has been introduced or changes in agricultural or secondary production practices have been made), or that improved analytical procedures have been used, can older values be rejected (see sections on “Changes to values” and “Obsolete foods” on pages 185–186). Differences not obviously due to these factors must be investigated, and it is often desirable to repeat the sampling and analysis in confirmation.

Table 10.2 Criteria for scrutiny of data
*Parameter*	*Criteria*
Identity	Unequivocal identification of food sampled
Sampling protocol	Collection of representative sample
Preparation of food sample	Cooking method; precautions taken; material rejected as inedible, etc.
Laboratory and analytical sample preparation	Nature of material analysed; methods used for sample preparation
Analytical procedures	Choice of method; compatibility; quality assurance procedures
Mode of expression	Compatibility with that used in the database

Criteria to be applied during data scrutiny

The bases for these criteria have been reviewed in earlier chapters. They are summarized in Table 10.2.

Identity of the food

The compiler must be certain of the identity of the food sampled for analysis. Primary plant foods may need to be identified by both species and variety, while fish and carcass meats may need to be identified by species. Age and maturity will also often be relevant to proper identification. When the food consists of a part of a plant or animal, this must be clearly identified. Proprietary products and cooked dishes are particularly difficult to identify. Foods that cannot be unambiguously identified should be flagged as such in the database. A photograph or graphical image may assist in clearer identification in the future (Burlingame et al., 1995b).

Nature of food sample

The food samples must be representative. Thus, scrutiny includes evaluation of the sampling plan used to obtain the food in terms of number/weight of items collected, date and time of collection, geographic location, mode of combination of items, etc. (Chapter 5).

Nature of material analysed

The nature of the material analysed must be clearly established: raw or cooked (with method), how prepared (e.g. with or without peel), edible portion description and weight, refuse description and weight, typical serving description (e.g. one slice for bread) and weight.

Analytical sample preparation and analytical procedures

The preparation of the analytical sample and the analytical procedures are often described together in reports. Their evaluation requires close familiarity with nutrient analyses. First, the protocol for preparation of the analytical sample should be scrutinized to see whether it meets the criteria discussed in Chapter 5. Second, the analytical methods should be evaluated; preference should be given to values obtained by means of validated methods that are compatible with methods in international use (Chapters 6 and 7) and to values whose sources indicate that appropriate quality assurance procedures were in place (Chapter 8).

Mode of expression

The compiler must be able to identify clearly the mode of expression used and, especially, the bases on which analytical values have been expressed. This is particularly important when the published values have been derived from analytical values by the use of conversion factors.

An approach to the formalization of the above criteria is given in Table 10.3.

The compilation process

Assembling data sources

The first stage is the assembling of data sources, including published tables. A rigorous search of the literature is essential. Special care is necessary in designing the search strategies when using computerized searches that are highly dependent on keywords, and some additional manual searching can be useful. Authors' abstracts should not be relied upon as sources of values; the full papers need to be examined. Literature searches usually start with the abstracting journals, and each bibliographic reference normally leads to several others. Recent papers should be sought by regular consultation of abstracting literature and databases. Journals not covered by an abstracting service must be referred to directly. It is desirable also to establish contact with sources of unpublished data: university, government and private laboratories; research institutes; commodity boards and food manufacturers.

The INFOODS Web site (2003) is especially valuable as a source of advice when seeking information on uncommon, and indeed all, foods. This site gives access to the INFOODS mailing list, which provides regular access to queries and responses, and to notices of meetings.

Discretion may be needed in obtaining and using manufacturers' data, as they may insist that the information be treated confidentially. Nevertheless, the data may be valuable for confirming information from other sources.

Where the sources give only mean values of several determinations on replicate food samples, the authors should, where possible, be asked for the individual replicate values.

Archival stage

All the relevant information obtained should be recorded systematically using one of the many computerized database management systems available. The primary requirement is that the system should be very flexible with regard to the number of fields and the facility for interchanging data with other computerized systems. International food composition data interchange formats have been proposed (Klensin, 1992; Schlotke et al., 2000), and development continues as an international effort under INFOODS.

Table 10.3 Criteria for acceptance of compositional values into a database
*Criterion*	*Clearly acceptable*	*Progressively decreasing acceptability*	*Usually unacceptable*¹
Sampling criteria
Identity of food	Unambiguous	Identity becomes less clear	Any ambiguity
Representativity	Indigenous to the database population	Less representative of the foods consumed	Not stated
Number of samples	Protocol designed to achieve defined confidence limits	Sample numbers chosen arbitrarily	Selective samples, or very limited in number
Nature of material analysed	Clearly defined	Definitions becoming less clear	Not stated or unclear
Analytical sample preparation	Described in detail and known to conserve nutrients	Described briefly, but still known to conserve nutrients	Not stated, or no evidence of need to protect nutrients in sample
Analytical criteria
Choice of analytical method	Well established and internationally compatible	Less well described, or unpublished modifications	Not stated
Performance of method	Established, validated in collaborative trials	Established, but not validated in-house	Not stated, or not known to be adequate. Possibly superseded by better method
Quality assurance	Described, or referenced. Use of proper standards and standard reference materials	No record of quality assurance, replicate analyses only	Not stated
Mode of expression	Units and methods of calculation clearly stated	Progressively less clearly described	Units and factors not given
¹ Where the values are the only ones available, it may be useful to archive the data.

Data from each source should be assessed for general quality and consistency, and entered into the system for easy access. Computer programs should be able to accommodate in specific relational tables all the data and metadata, including source details and notes on methods of analysis, sampling procedures, etc.
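The requirement that data and metadata sit together in relational tables can be illustrated with a small schema. The sketch below uses Python's built-in sqlite3 module; the table names, column names and source categories are hypothetical and chosen only to show how each value can keep its link to its source and to its method and sampling notes.

```python
import sqlite3

# Hypothetical archival schema (table and column names are illustrative only).
# Each component value keeps a link to its source and its own method and
# sampling notes, so no metadata is lost when data enter the archive.
SCHEMA = """
CREATE TABLE IF NOT EXISTS source (
    source_id     INTEGER PRIMARY KEY,
    citation      TEXT NOT NULL,
    source_type   TEXT NOT NULL      -- primary, secondary, unpublished, analytical
);
CREATE TABLE IF NOT EXISTS food (
    food_id       INTEGER PRIMARY KEY,
    description   TEXT NOT NULL      -- species, variety, part, raw or cooked, etc.
);
CREATE TABLE IF NOT EXISTS comp_value (
    value_id      INTEGER PRIMARY KEY,
    food_id       INTEGER NOT NULL REFERENCES food(food_id),
    source_id     INTEGER NOT NULL REFERENCES source(source_id),
    tagname       TEXT NOT NULL,     -- component identifier, e.g. an INFOODS tagname
    amount        REAL,
    unit          TEXT NOT NULL,
    method_note   TEXT,
    sampling_note TEXT
);
"""


def open_archive(path=":memory:"):
    """Create (or open) an archive database with the schema above."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def values_with_metadata(conn, food_id):
    """Every archived value for one food, joined to its source citation."""
    return conn.execute(
        """SELECT v.tagname, v.amount, v.unit, s.citation, v.method_note
           FROM comp_value AS v JOIN source AS s ON s.source_id = v.source_id
           WHERE v.food_id = ?
           ORDER BY v.value_id""",
        (food_id,),
    ).fetchall()
```

Because every value row carries its own source and notes, historic and recent values for the same food can be retrieved side by side when the compiler later compares them.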

Comprehensive compilation at this stage is critical for the quality of the database. This represents the archive or store of all values reported for the composition of foods. It is important to retain historic data in the collection because these provide information that helps in the assessment of whether the composition of a food is changing over time or whether it has a stable composition. The significance of methodological changes can also be assessed from comparisons of data over time. Many users are involved in the analysis of historical records of food intakes and require access to the most relevant compositional data. In the context of this account, an archival database is seen as the computerized store of all available data, both recent and historic.

All the information on food identity, sampling, analysis, quality assurance procedures and modes of expression needs to be evident for each record because it will be used in the next stage. Values recorded in the data source will need to be converted to the form in which they will be presented in the reference and user databases.

Bringing together all the values for the foods will identify discrepancies that will require the compilers to return to the original data source and to rescrutinize the data. Very commonly, transcription errors will be found, but even when these have been eliminated, discrepancies often remain. These can be due to inconsistencies in the identification of the foods, for example differences in plant varieties. Comparing the values a data source reports for other foods with the values reported for the same foods in another data source can indicate how much confidence can be placed in that source.

However, even after the strictest scrutiny, differences in the compositional data will persist; these may represent analytical artefacts or reflect the natural variations in composition. In these cases, the ideal, if resources permit, is to set up a sampling and analytical protocol to confirm the values. Failing this, one can only retain the questionable value and assign to it a lower confidence code (Exler, 1982).

Reference stage

The archival stage provides the basis for preparing the reference database. In this, all the acceptable data for each food from the different archival records are combined and presented in a common compatible format with links to the archival records and their metadata.

To do this, the compilers have to review all the available data for each food. Most data sources do not provide coverage of all the constituents required for the database as a whole and typically cover a limited range of components. The compilers must consider whether the different samples of foods are compatible. This requires comparing water and fat contents and considering whether the adjustment of values to a constant base is justified. Each stage in evaluation of the data must be documented so that the logic of any decisions taken or calculations used in the construction of the reference database can be followed later.

This review may require a return to the data sources to check points or confirm that values have been recorded correctly.

It is also necessary to consider which statistical techniques would be appropriate for evaluating the data for the food.

In this, all acceptable data from the archival records are identified and the logic of statistical combinations is recorded together with the average (if this is seen as appropriate), the median, or a selected value based on an assessment of the reliability of the sources (Paul and Southgate, 1978). This last approach may be seen as subjective, but when the number of values is too small for formal statistical combination, the compilers must make these judgements if a useful database is to be prepared. At this level, a sensible degree of disaggregation is necessary: for example, a single record for “apple” would not be appropriate when data for individual apple cultivars are available. Final checks for internal consistency are required.
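The choice between an average, a median and a selected value, together with the record of the logic used, can be sketched as below. This is only an illustration: the threshold `min_n` below which formal statistics are not attempted is a hypothetical parameter, not a published rule, and the function name is invented for the example.

```python
from statistics import mean, median


def combine_values(values, method="mean", min_n=5, selected=None, reason=""):
    """Combine acceptable archival values for one food and one component.

    Returns a dict holding the result *and* the logic used, so that the
    decision can be re-evaluated later. min_n is an illustrative threshold.
    """
    if len(values) >= min_n:
        statistic = {"mean": mean, "median": median}[method]
        return {"value": statistic(values), "method": method, "n": len(values)}
    # Too few values for formal statistics: the compiler must select a value
    # and document why (better sampling, more appropriate method, QA evidence).
    if selected is None or not reason:
        raise ValueError("too few values: supply a selected value and a reason")
    return {"value": values[selected], "method": "selected",
            "n": len(values), "reason": reason}
```

For example, five values of 1, 2, 3, 4 and 10 give a median of 3 and a mean of 4; with only two values, the call fails unless the compiler names the chosen value and records the reason.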

Preparation of the user databases

Dietitians may have requirements for a user database with certain types of food and certain forms of data presentation; agriculture and food industry professionals may require another type of user database. Several different user databases and tables can be prepared from a single, well-constructed reference database. The preparation of user databases requires examination of the food records in the reference database and their combinations (where necessary) and final checks for internal consistency. In many cases, the database for all the foods is provided in the “reference database” for the country or region. In this book we see the “user databases” as those that contain one set of data for each food item, and in which the nutrients and other constituents are given one value per food item. It may be necessary to provide two or more entries for a single food, for example where seasonal differences in composition are sufficient to justify two separate food records. The preparation of user databases should not entail actual data entry. All data to be used in preparing the user databases should have been entered during the archival and/or reference stages.

Scrutiny of values

First, values for each nutrient in each food are subject to a rescrutiny that at least equals that used at earlier stages of the database compilation. The reported values for each nutrient in each food are examined specifically for consistency. The use of objective statistical techniques is preferable where sufficient data are available. Discordant values may be statistical “outliers” that arose at sampling or analytical stages. The tests for outliers (Youden and Steiner, 1975) are designed to identify two categories: values that lie outside the measured variability of the values and those in which the measurements themselves show excessive variance. Once outliers have been identified, the mean or median statistics can be recomputed and the variance recalculated without their inclusion.

Outliers should not, however, be deleted from the database per se. They can simply be marked for exclusion from the calculated mean in a user or reference database. Upon returning to the data sources to investigate the values, the compiler may find that the outliers are methodologically distinct and may be preferred, perhaps because they are the product of a more specific procedure or because the analytical sample was better handled (e.g. a preservative was used).
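The principle of marking rather than deleting outliers can be shown with a simple screening rule. The sketch below substitutes a robust z-score based on the median absolute deviation (MAD), with the conventional 3.5 cut-off; it is not the Youden and Steiner procedure, and the function names are hypothetical. The original list of values is never modified; only a parallel list of exclusion flags is produced.

```python
from statistics import median


def flag_outliers(values, cutoff=3.5):
    """Return one flag per value: True means 'exclude from the mean', never 'delete'.

    Uses a robust z-score based on the median absolute deviation (MAD),
    substituted here for illustration; it is not the Youden and Steiner test.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread to judge against; leave the decision to the compiler.
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > cutoff for v in values]


def mean_excluding(values, flags):
    """Recompute the mean from the values that are not flagged for exclusion."""
    kept = [v for v, excluded in zip(values, flags) if not excluded]
    return sum(kept) / len(kept)
```

With values of 10, 11, 12, 11.5, 10.5 and 30, only 30 is flagged and the recomputed mean is 11.0, while all six values remain in the archive for later investigation.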

Combination of values

Because individual data sources rarely include the complete range of nutrients for a given food, it is often necessary to combine values from a variety of sources. In combining these values, it is vital to make certain that the various sources are compatible and that there is internal consistency.

Use of average values

When several values exist for the same food and nutrient, the compiler must review the procedures used in the reference records and reconsider how best to derive a single value for use in the database. When a large number of values are available, the use of an arithmetic mean value, or possibly the median, is the preferred approach.

When only a small number of values are available and the values exhibit a wide variance or range, the situation is much more complex. The variability may be due to the presence of outlying values or due to poor quality or non-representative food samples. In many cases, the compiler must judge which values have a higher level of confidence (i.e. better-documented food samples, choice of most appropriate method, or clear evidence of a quality assurance programme). In the United Kingdom food composition tables (Paul and Southgate, 1978), these were called “selected values”. In such cases the compiler must record the evidence used to select the values, so that the decisions can be re-evaluated independently.

In some instances the compiler may employ a weighting procedure. For example, if a value is required for a food with a seasonal variation in consumption or composition, a value reflecting the year-round composition can be calculated by weighting the values in relation to the consumption pattern. Again, documentation of this weighting is essential.
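Consumption weighting is a weighted mean in which each seasonal composition value is weighted by its share of consumption. A minimal sketch follows; the function name and the numbers in the example are hypothetical.

```python
def weighted_annual_value(seasonal):
    """Year-round value from (composition_value, consumption_weight) pairs.

    Weights can be any consistent measure of consumption (e.g. tonnes eaten
    per season); they need not sum to 1.
    """
    total_weight = sum(weight for _, weight in seasonal)
    if total_weight <= 0:
        raise ValueError("consumption weights must sum to a positive number")
    return sum(value * weight for value, weight in seasonal) / total_weight
```

For instance, if a hypothetical food contains 30 mg of a vitamin per 100 g in summer and 10 mg in winter, and three times as much is eaten in summer, the weighted value is (30 × 3 + 10 × 1) / 4 = 25 mg per 100 g. The weights used should be stored with the result, as the text requires.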

Calculations from analytical values

The database will include some derived values, calculated from analytical data. These have been discussed in Chapter 7. Some points, however, require further emphasis.

Energy value. The values in all nutritional databases are estimates of “metabolizable energy” calculated by applying energy conversion factors to the energy-yielding constituents in foods: proteins, fats, carbohydrates, alcohol and sometimes organic acids or other constituents. The factors commonly used are the Atwater factors (Merrill and Watt, 1955; Southgate and Durnin, 1970; Allison and Senti, 1983), in their general or specific versions. These were originally expressed as kcal but are now more commonly given as kJ. Because Atwater rounded the kcal factors (Merrill and Watt, 1955), direct use of the kJ factors is preferred so that rounding is not carried out twice. In many databases, energy is a dynamic value rather than a fixed value, which allows the compiler to prepare different energy values for different user databases. For example, a dietitian may prefer energy values calculated from specific Atwater factors, while food industry personnel may require energy calculated from general Atwater factors for food-labelling purposes. Furthermore, energy calculation recommendations can change over time, requiring recalculation of all energy values in a database; for example, the FAO/WHO Expert Consultation on Carbohydrates in Human Nutrition (1998) suggested that energy factors for dietary fibre should be used. Dealing with such situations is a straightforward data management task when the values for the energy-yielding constituents are stored, with simple algorithms programmed into the system to calculate energy as needed. Energy conversion factors should be treated in the same way as other numeric data and should be included in the reference database with their INFOODS tagnames.
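Treating energy as a dynamic value means storing the component values and the factor sets separately and computing energy on demand. The sketch below assumes two illustrative factor sets in kJ per g (general factors of 17 for protein, 37 for fat, 17 for carbohydrate and 29 for alcohol, plus a variant that adds 8 kJ per g for dietary fibre); the names and the exact factors should be checked against the conventions adopted by the database (Chapter 7).

```python
# Illustrative factor sets in kJ per g, stored as data like any other value.
# Using kJ factors directly avoids converting already-rounded kcal factors.
GENERAL_KJ = {"protein": 17.0, "fat": 37.0, "carbohydrate": 17.0, "alcohol": 29.0}
WITH_FIBRE_KJ = {**GENERAL_KJ, "fibre": 8.0}


def energy_kj(components, factors):
    """Energy (kJ/100 g) = sum of component (g/100 g) x factor (kJ/g).

    Components with no factor in the chosen set contribute nothing, so the
    same stored food record can be evaluated under different conventions.
    """
    return sum(grams * factors[name]
               for name, grams in components.items() if name in factors)
```

A food containing 3 g protein, 1 g fat, 10 g carbohydrate and 2 g fibre per 100 g gives 258 kJ with the general set and 274 kJ when the fibre factor is included; changing convention means editing the factor table, not the food records.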

Protein. Protein values are conventionally calculated by the application of conversion factors to values for total organic nitrogen. More accurate values are produced, however, if conversion factors are applied to amino acid-nitrogen values (see Chapter 9), or by summation of the amino acids. All data and factors used in calculations should be included in the reference database.
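The conventional calculation multiplies total nitrogen by a food-specific factor, and the factor should be kept alongside the result. The factor table below is an illustrative assumption (6.25 as the general default, 6.38 for milk, 5.7 for wheat flour); a real database would use its own documented table of factors.

```python
# Illustrative nitrogen-to-protein factors; replace with the database's own table.
N_FACTORS = {"default": 6.25, "milk": 6.38, "wheat_flour": 5.7}


def protein_from_nitrogen(total_n_g, food_group="default"):
    """Protein (g/100 g) = total nitrogen (g/100 g) x conversion factor.

    Returns the factor with the result so both can be stored in the
    reference database, as the text recommends.
    """
    factor = N_FACTORS.get(food_group, N_FACTORS["default"])
    return {"protein_g": total_n_g * factor, "factor": factor}
```

For example, 2.0 g nitrogen per 100 g with the default factor gives 12.5 g protein per 100 g. Food groups missing from the table fall back to 6.25, a design choice the compiler may prefer to replace with an explicit error.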

Vitamin equivalents. The recommendations to derive values for vitamin equivalents are described in the conventions on nomenclature (IUNS, 1978).

Vitamin A activity. Derived values are usually used for vitamin A activity, since values for preformed vitamin A (retinol and its derivatives) and for the provitamin A carotenoids may be combined by algorithm at the user database level. The convention is to express vitamin A activity in µg retinol equivalents, equal to the sum of µg retinol, µg β-carotene divided by the factor 6, and total µg of other provitamin A carotenoids divided by the factor 12. Other conversion systems allow for the contributions of other carotenoids. Data for retinol, all individual provitamin A carotenoids and all activity conversion factors need to be recorded in the reference database with their INFOODS tagnames. It should be noted that the conventional conversion factors are not supported by recent research (van het Hof et al., 2000) and that new factors have already been adopted in some countries for some purposes (Murphy, 2002).

Recalculation of vitamin A activity with updated factors is straightforward when the original values are given, as with energy, and preference should be given to µg values for the individual carotenoids. The international unit conversions for vitamins A and D are given in Chapter 7. The convention adopted for calculation of vitamin A activity should be included in the database documentation.

Niacin activity. Equivalent values for niacin activity are also widely used where the contribution of tryptophan is included. The convention is to express niacin activity (mg) as mg niacin (or nicotinic acid) plus mg tryptophan divided by 60.
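Both vitamin equivalents described above are simple algorithms over stored component values. In the sketch below the conversion factors are parameters, so activity can be recalculated under a newer convention without touching the stored data; the function names are hypothetical.

```python
def retinol_equivalents_ug(retinol_ug, beta_carotene_ug, other_carotenoids_ug,
                           beta_factor=6.0, other_factor=12.0):
    """Vitamin A activity (ug retinol equivalents).

    Defaults are the conventional factors of 6 (beta-carotene) and 12 (other
    provitamin A carotenoids); pass different factors to recalculate.
    """
    return (retinol_ug
            + beta_carotene_ug / beta_factor
            + other_carotenoids_ug / other_factor)


def niacin_equivalents_mg(niacin_mg, tryptophan_mg):
    """Niacin activity (mg) = mg niacin + mg tryptophan / 60."""
    return niacin_mg + tryptophan_mg / 60.0
```

With 100 µg retinol, 600 µg β-carotene and 1 200 µg other provitamin A carotenoids, the conventional factors give 300 µg retinol equivalents; the factors of 12 and 24 used for retinol activity equivalents in some systems give 200 µg for the same food.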

Fatty acids. Calculation of fatty acids per 100 g food from data for fatty acids per 100 g total fatty acids is demonstrated in Appendix 5.
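The general relationship behind that calculation can be sketched as follows; the worked example in Appendix 5 should be followed for detail. The conversion factor, which turns total fat into total fatty acids, depends on the food (see Table 9.2), so it is left as an input here, and the value 0.95 in the example below is hypothetical.

```python
def fatty_acid_per_100g_food(fa_per_100g_total_fa, total_fat_g, conversion_factor):
    """g of a fatty acid per 100 g food.

    fa_per_100g_total_fa: g of the fatty acid per 100 g total fatty acids
    total_fat_g:          g total fat per 100 g food
    conversion_factor:    g total fatty acids per g total fat (food-specific)
    """
    total_fatty_acids_g = total_fat_g * conversion_factor
    return fa_per_100g_total_fa * total_fatty_acids_g / 100.0
```

For example, a fatty acid forming 40 g per 100 g total fatty acids, in a food with 10 g fat per 100 g and a hypothetical factor of 0.95, gives about 3.8 g per 100 g food.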

Calculation of composition of composite prepared dishes

In the absence of analytical values from representative samples of prepared composite dishes, estimated compositional values for these dishes can be based on recipes and the composition of each ingredient. The yield, or change in weight during cooking (i.e. the weights of the raw dish and the cooked dish), must be known. Several authors have published guidelines on calculation procedures (Rand et al., 1991; Bognár and Piekarski, 2000). The simplest version of the calculation procedure does not acknowledge fat gained (e.g. from frying oil) or lost during cooking, because the calculation assumes that changes in weight reflect only a loss or gain of water. Estimates of vitamin losses can be made using nutrient retention factors (Bergström, 1994; USDA, 2003c), but these values must be assigned lower confidence levels than analytical values. One version of the calculation has the following stages:

From the weights of raw ingredients, calculate the amounts of water and nutrients present in total raw food before cooking.
Sum the nutrients.
Divide the nutrient sums by the cooked weight to give the composition of the cooked food per 100 g. The water content of the cooked food is calculated as the total water in the raw ingredients minus the loss of weight on cooking.
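The stages above can be sketched in a few lines, under the same simplifying assumption that the whole change in weight is water; no fat uptake or nutrient retention factors are applied. The function name and data layout are hypothetical.

```python
def recipe_per_100g(ingredients, cooked_weight_g):
    """Composition per 100 g of a cooked dish from raw-ingredient data.

    ingredients: list of (raw_weight_g, composition) pairs, where composition
    maps component name -> amount per 100 g and includes "water".
    Simplest version: the whole change in weight is treated as water.
    """
    raw_weight_g = sum(weight for weight, _ in ingredients)
    # Stages 1 and 2: amounts of water and nutrients in the raw recipe, summed.
    totals = {}
    for weight, composition in ingredients:
        for name, per_100g in composition.items():
            totals[name] = totals.get(name, 0.0) + per_100g * weight / 100.0
    # Water in the cooked dish = total raw water minus loss of weight on cooking.
    totals["water"] = totals.get("water", 0.0) - (raw_weight_g - cooked_weight_g)
    if totals["water"] < 0:
        raise ValueError("weight loss exceeds water present: check yield data")
    # Stage 3: divide by the cooked weight and express per 100 g.
    return {name: amount * 100.0 / cooked_weight_g
            for name, amount in totals.items()}
```

For a hypothetical dish of 200 g of an ingredient with 80 g water and 10 g protein per 100 g plus 100 g of one with 50 g water and 2 g protein per 100 g, cooked to 250 g, the result is 64 g water and 8.8 g protein per 100 g.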

A worked example of this calculation is given in Appendix 6. Table 3.3 on page 41 gives additional information, which may assist in the development of variations to this calculation.

Internal checks on selected values

Internal checks on the nutrient profiles developed for each food are especially important when values from several sources are used for a single food.

For the proximate composition, the sum of the components should ideally equal 100 g; in practice, a range of 97 to 103 g is permissible. If the summated values fall outside this range, one should first rescrutinize the calculation of protein values (was the appropriate factor used?) and mode of expression of starch (as g starch or as monosaccharide?). If the summated values are still outside 97 to 103 g, suspicion must fall on particular analysed values, which must be rescrutinized at the archival and data source levels.
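This check is easy to automate. The sketch below simply sums whatever proximate constituents are passed in; which names to include (for example water, protein, fat, available carbohydrate, dietary fibre, alcohol and ash) is an assumption that depends on the database. Note that starch expressed as monosaccharide rather than as starch will inflate the sum.

```python
def check_proximate_sum(components, low=97.0, high=103.0):
    """Return (within_range, total) for proximate constituents in g/100 g.

    A total outside low..high should prompt rescrutiny of the protein factor,
    the mode of expression of starch, and then individual analysed values.
    """
    total = sum(components.values())
    return low <= total <= high, total
```

A food with 80 g water, 5 g protein, 3 g fat, 10 g carbohydrate, 1 g fibre and 1 g ash per 100 g sums to 100 g and passes; the same food with 20 g carbohydrate sums to 110 g and is flagged.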

Fatty acids should not exceed 95 percent when expressed as a percentage of total fat (because of the glycerol present in triacylglycerols, or triglycerides); when expressed as g per 100 g of food, they should not exceed total fat multiplied by the appropriate factor (see Table 9.2).

Total amino acids should not exceed 6.25 g per g of nitrogen by more than the correction for the gain of water on hydrolysis (see Chapter 7). The total will be considerably less than this in foods with high levels of non-protein nitrogen or large amounts of amide. The checks on the recovery of amino acids may require rescrutiny of the data source, because many published papers do not report analytical recovery, especially of nitrogen from the ion-exchange column.
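The fatty acid and amino acid checks above can be written as simple predicates. In this sketch the fat conversion factor (Table 9.2) and the allowance for water gained on hydrolysis (Chapter 7) are inputs supplied by the compiler; the parameter `hydrolysis_allowance_g` and all function names are hypothetical.

```python
def fatty_acids_plausible(fa_g_per_100g_food, total_fat_g, conversion_factor):
    """Total fatty acids (g/100 g food) must not exceed fat x conversion factor."""
    return fa_g_per_100g_food <= total_fat_g * conversion_factor


def fatty_acid_percent_plausible(fa_g_per_100g_fat):
    """Fatty acids as a percentage of total fat must not exceed 95."""
    return fa_g_per_100g_fat <= 95.0


def amino_acids_plausible(total_aa_g, nitrogen_g, hydrolysis_allowance_g=0.0):
    """Total amino acids per g N must not exceed 6.25 g plus the allowance
    (per g N) for water gained on hydrolysis. The default of 0 is the
    strictest possible check, not a recommended value."""
    return total_aa_g / nitrogen_g <= 6.25 + hydrolysis_allowance_g
```

A failed check does not say which value is wrong; as the text notes, it signals that the archival record and the data source need rescrutiny.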

Summary of the compilation process

An overview of the compilation process is given in Table 10.4. Each stage of preparation demands detailed scrutiny of the preceding stages, and frequently requires a return to the data source level. The quality assessments become more clearly defined and established as the iterative compilation proceeds.

Table 10.4 Summary of compilation process
*Stage*	*Summary of operations*	*Type of scrutiny applied*	*Format*
Data source	Collection of sources containing compositional data	Analogous to reviewing a scientific paper; check on consistency of data; preliminary assessment of data quality	In form published: paper or electronic record
Archival record	Compilation of information from data sources	Scrutiny of data source against formal criteria; refining assessments of data quality	Database format, plus records of sampling protocols; analytical methods; common modes of expression adopted
Reference database	Compilation of data from archival records for each food	Comparison of values from different sources; rescrutiny of archival and data sources to assess inconsistencies; calculation of statistical measures	In database format, with array of all acceptable values for each food item; records of statistical analyses; formal assessments of data quality
User database	Selection and compilation of series of values for each food item in database	Combination of values to give one value for each nutrient per food item; mean, or median, plus suitable measures of variability	In format required by database users

Producing an integrated estimate of data quality

Many data users need some indication of the quality of the data in the various databases so that, when data are combined, they can be confident that data quality is comparable across the different databases. This is particularly important where data are interchanged electronically between databases.

Producing an integrated assessment of data quality involves a series of judgments about each data source and about the information it provides on the food in question.

Although both sampling and analytical criteria, in principle, need to be considered, in practice it is often best to start with the analytical aspects.

The use of a well-documented method justifies a high quality rating whereas the use of a method without description or referencing gives the data a low rating. In addition, evidence that the method was controlled by a quality assurance programme, with the use of appropriate standards or SRMs, where available, further supports a high rating, whereas the absence of such evidence leads to a lower rating.

A low rating does not in itself mean that the values with this rating are incorrect, merely that the authors (or the journal) have not presented evidence that their data should inspire confidence.

A well-designed sampling protocol planned to meet defined confidence limits, for example 95 percent (implying that 95 percent of samples would have values within 5 percent of the given value), would represent sampling of very high quality. In practice, however, such protocols are extremely rare and often apply only to a limited range of nutrients. Sampling protocols with confidence limits of 90 percent are probably the highest standard that can reasonably be expected, and for resource reasons will only be available for foods that are major components of the diet.

Most sampling protocols have lower confidence limits, and sample numbers of between 10 and 20 give a reasonable measure of confidence, except for those nutrients that are very variable or unstable such as vitamin C, folates and many trace inorganic constituents.
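Why variable nutrients need more samples can be seen from the common normal-approximation formula for the number of independent samples needed to estimate a mean to within a relative error E at a chosen confidence level: n = (z × CV / E)^2, where CV is the coefficient of variation between samples. This formula is a standard statistical approximation (simple random sampling) supplied here for illustration, not a procedure prescribed in this section; the function name and the example CVs are hypothetical.

```python
import math
from statistics import NormalDist


def samples_needed(cv_percent: float, error_percent: float,
                   confidence: float = 0.95) -> int:
    """Approximate number of independent samples so that the sample mean
    lies within +/- error_percent of the true mean at `confidence`.

    Normal approximation: n = (z * CV / E) ** 2, rounded up.
    """
    if cv_percent <= 0 or error_percent <= 0:
        raise ValueError("CV and allowable error must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must lie between 0 and 1")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return math.ceil((z * cv_percent / error_percent) ** 2)
```

With a hypothetical CV of 20 percent and a 10 percent allowable error at 95 percent confidence, `samples_needed(20, 10)` gives 16, within the 10 to 20 range mentioned above; a hypothetical CV of 50 percent, plausible for an unstable nutrient, raises this to 97.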

Analyses of single samples with no evidence of a sampling protocol other than convenience have a very low level of confidence. Nevertheless, many users consider that “any value” for a food or nutrient that forms a minor component of the diet is better than no value. Thus, one could argue, for example, that values from a few samples of caviar or champagne could be used in a database. Similarly, a few analyses of a proprietary product that is subject to rigorous quality control would merit a good level of confidence.

Quality assessments and quality codes

Quality or confidence codes are a formalized approach to the acceptance of data (see Table 10.3), originally suggested by Exler (1982) and given in Tables 10.5 and 10.6. In this approach, a numerical value is given to the data for each criterion, and the values are combined and translated into a confidence code.

Table 10.5 Confidence codes and their criteria as used by Exler (1982) and adapted
*Evaluation*	*Documentation of analytical method*	*Analytical sample handling and appropriateness of analytical method*	*Quality control*
0	None	Totally incorrect handling	No duplicate
1	Unpublished but described	No documentation	Duplicate portions
2	Published but modified, modification described	Reasonable, documented, widely used technique	Duplicate portions
3	Complete documentation, published	Extensively documented, tested and appropriate	Standard reference materials, spikes, recoveries or blind replicates

Note: The lowest value for each criterion becomes the limiting quality index for the data from each data set. Confidence codes are assigned on the basis of the sum of the quality indices as in Table 10.6.

Table 10.6 Confidence codes and their criteria as used by Exler (1982) and adapted
*Sum of quality indices*	*Confidence code*	*Meaning of confidence code*
≥6	a	The user can have confidence in the mean value
3-5	b	The user can have some confidence in the mean value; however, some questions have been raised about the value or the way it was obtained
1-2	c	Serious questions have been raised about this value. It should be considered only as the best estimate of this nutrient in this food

Like all such systems, this one is arbitrary and can be used only as a guide. The preferred approach is statistically based: an appropriate number of food samples is collected, and the analyses use well-documented methods (with defined performance characteristics) that have been subjected to collaborative trial. Holden, Bhagwat and Patterson (2002) have described the evolution of the Exler approach, in which quality is recognized as an integration of sampling and analysis, with a number of objective questions for each of the original five categories: sampling plan, number of samples, sample handling, analytical method and analytical quality control (see Box 10.1). It should be emphasized that the documentation related to the calculation of quality codes must be available in the archival and/or reference database.
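The Exler scheme in Tables 10.5 and 10.6 can be sketched as follows: take the lowest index for each criterion across the data set, sum the three limiting indices and map the sum to a code. This is a minimal illustration, not a published implementation. The criterion keys are hypothetical; the top band of Table 10.6 is read as a sum of 6 or more, and a sum of 0 is assumed to receive no code.

```python
from typing import Dict, List, Optional

# Hypothetical keys for the three criteria (columns) of Table 10.5.
CRITERIA = ("documentation", "sample_handling", "quality_control")


def exler_confidence_code(ratings: List[Dict[str, int]]) -> Optional[str]:
    """Assign an Exler-style confidence code to one data set.

    Each entry in `ratings` holds the 0-3 quality indices awarded to one
    value (or group of values) within the data set. The lowest index for
    each criterion is the limiting index; the limiting indices are summed
    and mapped to a code using the bands of Table 10.6.
    """
    if not ratings:
        raise ValueError("at least one set of ratings is required")
    total = 0
    for criterion in CRITERIA:
        indices = [r[criterion] for r in ratings]
        if any(not 0 <= i <= 3 for i in indices):
            raise ValueError(f"{criterion}: quality indices must be 0-3")
        total += min(indices)  # limiting quality index for this criterion
    if total >= 6:  # assumption: top band read as "6 or more"
        return "a"
    if total >= 3:
        return "b"
    if total >= 1:
        return "c"
    return None  # assumption: a sum of 0 receives no code
```

Returning a letter rather than the numeric sum keeps the result categorical, which is how confidence codes are meant to be used.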

Box 10.1 Evaluation categories and criteria

1. Sampling plan

Evaluation criteria:

Random selection of sampling locations
Number of regions represented
Number of cities/regions
Number of samples taken
Number of seasons covered

2. Number of samples

(Note: this is the number of individual food samples analysed independently, not the number of sample units collected.)
Evaluation criteria:

Number of independent analyses
Multiple analyses of a single composite or the same sample count as one

3. Sample handling

Evaluation criteria:

Homogenization
Equipment used
-Validation of homogeneity
Analysis of edible portion
Storage conditions
Data on moisture content

4. Analytical method

Evaluation criteria:

Validity of method
-Evaluation of the method against a set of standard criteria
Validity of the method as used by the laboratory
-Demonstration of the ability of the laboratory to use the method successfully, usually by analysis of certified reference materials

5. Analytical quality control (QC)

Evaluation criteria:

Results for QC of the material in the analytical batch
Coefficient of variation (CV) for the QC material
Frequency of use of QC material
-With each batch, daily, weekly, occasionally
Recovery results for the batch

Each evaluation category is intended to consist of clear, objective questions with Yes/No/Unknown answers. Each of the five categories is marked out of 20 on a continuous scale, giving a maximum score of 100. The methodological criteria are based on current best practice as advised by expert groups. The aggregated scores from the evaluation categories are used to assign the confidence codes.
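The scoring structure just described (five categories, each scored 0 to 20, summed to a maximum of 100, then converted to a code) can be sketched as below. This is an illustration under stated assumptions only: the rule for turning Yes/No/Unknown answers into a category score and the cut-offs in `HYPOTHETICAL_CUTOFFS` are invented for the example and are not taken from Holden, Bhagwat and Patterson (2002); the key names are also hypothetical.

```python
from typing import Mapping, Sequence, Tuple

# Five evaluation categories listed in Box 10.1 (key names are hypothetical).
CATEGORIES = ("sampling_plan", "number_of_samples", "sample_handling",
              "analytical_method", "analytical_qc")

# HYPOTHETICAL cut-offs for illustration only; not taken from this chapter.
HYPOTHETICAL_CUTOFFS: Tuple[Tuple[float, str], ...] = (
    (75.0, "A"), (50.0, "B"), (25.0, "C"), (0.0, "D"))


def category_score(answers: Sequence[str]) -> float:
    """Turn Yes/No/Unknown answers into a 0-20 score.

    Assumed rule: the share of 'yes' answers scaled to 20; 'unknown' is
    treated like 'no', so missing documentation lowers the score.
    """
    normalised = [a.strip().lower() for a in answers]
    if not normalised or any(a not in {"yes", "no", "unknown"} for a in normalised):
        raise ValueError("answers must be a non-empty list of yes/no/unknown")
    return 20.0 * normalised.count("yes") / len(normalised)


def aggregate_score(scores: Mapping[str, float]) -> float:
    """Sum the five category scores (each 0-20) to a total out of 100."""
    missing = [c for c in CATEGORIES if c not in scores]
    if missing:
        raise ValueError(f"missing categories: {missing}")
    if any(not 0.0 <= scores[c] <= 20.0 for c in CATEGORIES):
        raise ValueError("each category score must lie between 0 and 20")
    return sum(scores[c] for c in CATEGORIES)


def confidence_code(total: float,
                    cutoffs: Sequence[Tuple[float, str]] = HYPOTHETICAL_CUTOFFS) -> str:
    """Map an aggregate score to a categorical code (a label, not a number)."""
    for threshold, code in sorted(cutoffs, reverse=True):
        if total >= threshold:
            return code
    raise ValueError("score is below the lowest cut-off")
```

Treating "Unknown" as "No" is one conservative choice: it rewards proper documentation, on which the whole scheme depends.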

The development of evaluation schemes is ongoing and, as will be evident, highly dependent on the proper documentation of compositional studies. It is important to remember that confidence codes are categories, not real numbers: they are guides for the users of the data and should not be manipulated arithmetically as if they were real numbers. In the final analysis, the confidence that can be ascribed to an analytical value is determined by how accurately it predicts the value in the food; for this, statistical characterization of the composition of the food is essential.

Changes to values

Once a user database has been disseminated and the data are in use, it is important to retain a record of values even after they have been changed. Burlingame (1992) describes the importance of the “changes” feature of the New Zealand food composition database. There are at least three reasons for changing values in a database: they may be updated as more values are acquired for the calculation of a mean; they may be corrected when an incorrect value is identified; or they may need modification because of real changes in the composition of the food (e.g. arising from new fortification legislation). In all cases, it is useful to document the reason for the change and to retain the old values in a “changes” database, which provides an audit trail for the database. This is useful, for example, when national dietary surveys are conducted over time: if the nutrient intakes of the population vary from one survey to the next, the “changes” database makes it possible to distinguish true changes in intake from apparent changes that merely reflect corrections and updates to the database.
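A “changes” database of the kind described above can be sketched as an append-only log that records the old value, the new value and one of the three reasons for change. This is a minimal in-memory illustration, assuming values are keyed by food code and nutrient; all class, method and field names are hypothetical, and a real system would store the log in the database management system itself.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Dict, List, Tuple


class ChangeReason(Enum):
    UPDATE = "mean recalculated with additional values"
    CORRECTION = "incorrect value identified"
    COMPOSITION_CHANGE = "real change in the food (e.g. new fortification)"


@dataclass(frozen=True)
class ChangeRecord:
    food_code: str
    nutrient: str
    old_value: float
    new_value: float
    reason: ChangeReason
    changed_on: date
    note: str = ""


@dataclass
class UserDatabase:
    # (food_code, nutrient) -> current published value
    values: Dict[Tuple[str, str], float] = field(default_factory=dict)
    # Append-only audit trail: the "changes" database
    changes: List[ChangeRecord] = field(default_factory=list)

    def add_value(self, food_code: str, nutrient: str, value: float) -> None:
        key = (food_code, nutrient)
        if key in self.values:
            raise KeyError(f"{key} already has a value; use change_value")
        self.values[key] = value

    def change_value(self, food_code: str, nutrient: str, new_value: float,
                     reason: ChangeReason, changed_on: date, note: str = "") -> None:
        key = (food_code, nutrient)
        old_value = self.values[key]  # KeyError if never published
        self.changes.append(ChangeRecord(food_code, nutrient, old_value,
                                         new_value, reason, changed_on, note))
        self.values[key] = new_value

    def changes_between(self, start: date, end: date,
                        exclude: Tuple[ChangeReason, ...] = ()) -> List[ChangeRecord]:
        """Changes made between start and end inclusive, ignoring `exclude` reasons."""
        return [c for c in self.changes
                if start <= c.changed_on <= end and c.reason not in exclude]
```

Between two surveys, calling `changes_between(first_survey, second_survey, exclude=(ChangeReason.COMPOSITION_CHANGE,))` lists the corrections and updates that could shift calculated intakes without any real change in the foods.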

Obsolete foods

As with the “changes” feature, it is important to keep an audit trail of food records, even when a food is no longer represented in the food supply. The food code is usually used as the “key” in a relational database management system. These codes are often also used in dietary assessment projects, application software packages and other ongoing activities that use compositional data. It is therefore prudent to keep original food codes permanently and never to reuse them for other foods, even when the foods to which they were originally assigned become obsolete.
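The rule that food codes are retired but never deleted or reused can be enforced in software. The sketch below is a minimal in-memory illustration under that assumption; the class and method names are hypothetical, and in a relational database the same effect would normally come from a permanent code table with an "obsolete" flag rather than deleting rows.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FoodRecord:
    code: str
    name: str
    obsolete: bool = False


class FoodCodeRegistry:
    """Keeps every food code ever issued; codes are retired, never reused."""

    def __init__(self) -> None:
        self._records: Dict[str, FoodRecord] = {}

    def register(self, code: str, name: str) -> FoodRecord:
        if code in self._records:
            # Refuse reuse even if the earlier food is obsolete.
            raise ValueError(f"food code {code!r} was already issued to "
                             f"{self._records[code].name!r}; codes are never reused")
        record = FoodRecord(code, name)
        self._records[code] = record
        return record

    def retire(self, code: str) -> None:
        """Mark a food obsolete; its record stays for the audit trail."""
        self._records[code].obsolete = True  # KeyError if never issued

    def lookup(self, code: str) -> FoodRecord:
        return self._records[code]

    def current_foods(self) -> List[FoodRecord]:
        return [r for r in self._records.values() if not r.obsolete]
```

Old dietary records that cite a retired code can still be resolved with `lookup`, while `current_foods` lists only foods still in the supply.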