5. Troubleshooting issues

5.1 Character Sets

One of the most important characteristics of XML mark-up is its portability: XML documents can contain any Unicode character except some of the control characters. Unfortunately, many databases offer limited or no support for Unicode and require special configuration to handle non-ASCII characters. If your data contains non-ASCII characters, check whether and how both your database (or whichever system the data is extracted from) and your data transfer software handle and manipulate the encoding of these characters.

XML documents should preferably be encoded in UTF-8. Other encoding schemes, such as ISO-8859-1, are also allowed, provided that the encoding is clearly declared in the XML declaration at the start of the document, so that the content is recognized and displayed correctly in the centralized AGRIS XML repository.
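As a rough illustration of such a check, the following Python sketch reads the encoding named in the XML declaration and tests whether the bytes of the document can actually be decoded with it. The function names are hypothetical, and the sketch assumes the declaration sits at the very start of the data with no byte order mark; UTF-16 documents are not handled.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the data (assumption: no byte order mark precedes it).
_XML_DECL = re.compile(rb'^<\?xml[^>]*encoding=["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def declared_encoding(data: bytes) -> str:
    """Return the encoding named in the XML declaration, or UTF-8 if none is given."""
    match = _XML_DECL.match(data)
    return match.group(1).decode("ascii") if match else "UTF-8"


def decodes_cleanly(data: bytes) -> bool:
    """Return True if the bytes can be decoded with the declared encoding."""
    try:
        data.decode(declared_encoding(data))
    except (UnicodeDecodeError, LookupError):
        # Either the bytes do not fit the encoding, or Python does not know it.
        return False
    return True
```

Note that single-byte encodings such as ISO-8859-1 accept any byte sequence, so this check mainly catches files that declare UTF-8 but were written in another encoding.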

5.2 Predefined Entity References

XML predefines the five entity references shown below. They are used in XML documents in place of specific characters that would otherwise be interpreted as part of the mark-up.

Character	Entity Reference
&	&amp;
<	&lt;
>	&gt;
"	&quot;
'	&apos;
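When values are written into XML by a script, the replacement can be left to a library routine. The sketch below uses Python's standard xml.sax.saxutils.escape, which replaces &, < and > by default; the two quote characters are passed as extra entities. The wrapper name escape_xml is a hypothetical choice for this example.

```python
from xml.sax.saxutils import escape


def escape_xml(text: str) -> str:
    """Replace the five special characters with their predefined entity references."""
    # escape() handles &, < and > itself; the quotes are supplied as extras.
    return escape(text, {'"': "&quot;", "'": "&apos;"})
```

The ampersand is replaced first, so the entity references produced for the other characters are not escaped a second time.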

5.3 Whitespace

Four characters are treated as white space in XML data: horizontal tab, line feed, carriage return and the ASCII space character. None of these should appear as extraneous white space within an XML tag; the input should be typed exactly as it should be rendered in the AGRIS web database. Therefore, the following two situations should be avoided:

5.4 Repeatable elements

Each value of a repeatable element or qualifier should be given in its own tag, rather than several values being combined in a single tag.

For example, the following is not valid:

<ags:subjectClassification scheme="ags:ASC">E20 ; J12</ags:subjectClassification>

whereas the following is the correct encoding:

<ags:subjectClassification scheme="ags:ASC">E20</ags:subjectClassification>
<ags:subjectClassification scheme="ags:ASC">J12</ags:subjectClassification>
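If the local database stores several values in one field, the field can be split before the tags are written. The following Python sketch assumes the values are separated by a semicolon, as in the invalid example above; the function name and the separator default are assumptions for illustration only.

```python
from typing import List
from xml.sax.saxutils import escape, quoteattr


def repeat_element(tag: str, scheme: str, raw: str, sep: str = ";") -> List[str]:
    """Return one serialized element per value found in a delimited field."""
    values = [value.strip() for value in raw.split(sep)]
    return [
        f"<{tag} scheme={quoteattr(scheme)}>{escape(value)}</{tag}>"
        for value in values
        if value  # skip empty fragments so no empty tags are produced
    ]
```

Called as repeat_element("ags:subjectClassification", "ags:ASC", "E20 ; J12"), it returns the two separate elements shown in the correct encoding above.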

5.5 Size of XML documents

Avoid creating and sending large XML documents. If a file is bigger than 500 KB, split it into two or more files, and make sure to place the correct header in each of them (see 4.1). For large repositories, however, the preferred approach is to place each XML resource in its own document.
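One possible way to split a large export is sketched below in Python. It groups already serialized resources into documents that stay under a byte limit. The header and footer arguments are placeholders for whatever section 4.1 prescribes, and reading the limit as 500 * 1024 bytes is an assumption.

```python
from typing import List


def chunk_resources(resources: List[str], header: str, footer: str,
                    max_bytes: int = 500 * 1024,
                    encoding: str = "utf-8") -> List[str]:
    """Group serialized resources into documents of at most max_bytes each."""
    overhead = len(header.encode(encoding)) + len(footer.encode(encoding))
    chunks: List[str] = []
    current: List[str] = []
    size = overhead
    for resource in resources:
        resource_size = len(resource.encode(encoding))
        # Close the current document before it would exceed the limit.
        if current and size + resource_size > max_bytes:
            chunks.append(header + "".join(current) + footer)
            current, size = [], overhead
        current.append(resource)
        size += resource_size
    if current:
        chunks.append(header + "".join(current) + footer)
    return chunks
```

A single resource that is larger than the limit on its own still ends up in a document of its own, which will then exceed the limit. Setting max_bytes very low yields one resource per document.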

5.6 Scheme for xml:lang

The xml:lang attribute gives XML authors a consistent way to identify the language of the content of a particular element. The AGRIS AP uses this attribute for elements for which it was considered necessary to know the language of the content. It also allows multiple values of the same element to be given in different languages. Use the ISO 639-2 scheme (three-letter codes) for the xml:lang attribute.
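A quick way to spot two-letter codes or other slips is sketched below in Python. It only checks the shape of the value (three lowercase letters, which is an assumption about how the codes are written) and does not verify that the code is actually listed in ISO 639-2. The function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def suspicious_lang_values(xml_text: str) -> List[str]:
    """Return xml:lang values that are not three lowercase letters."""
    root = ET.fromstring(xml_text)
    found = []
    for element in root.iter():
        value = element.get(XML_LANG)
        if value is not None and not re.fullmatch(r"[a-z]{3}", value):
            found.append(value)
    return found
```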

5.7 Empty-Element Tags

No element, sub-element or attribute should be present without content. When mapping the structure of the database to an XML document, make sure that optional element types and attributes are mapped to nullable columns (that is, fields allowing null values) and vice versa. Failing to do so is likely to produce invalid documents (when transferring data from the database) and to create unwanted empty elements or qualifiers.
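Empty elements and attributes that slipped through the mapping can be located with a short script. The Python sketch below treats an element as empty when it has neither child elements nor non-whitespace text, and an attribute as empty when its value is blank; both definitions, and the function name, are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from typing import List


def find_empty(xml_text: str) -> List[str]:
    """List tags of empty elements and 'tag@attribute' for empty attributes."""
    root = ET.fromstring(xml_text)
    problems = []
    for element in root.iter():
        has_text = bool(element.text and element.text.strip())
        # len(element) counts child elements.
        if not has_text and len(element) == 0:
            problems.append(element.tag)
        for name, value in element.attrib.items():
            if not value.strip():
                problems.append(f"{element.tag}@{name}")
    return problems
```

For namespaced documents the reported tags appear in ElementTree's {namespace}name form.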

5.8 Mandatory tags

Despite the flexibility of the DTD, a few rules must be followed in order to produce a valid XML document. Some mandatory elements, qualifiers and schemes must be entered in the relevant tags. In particular, a valid XML document should contain at least the following five core elements for each ags:resource:

• <dc:title>
• <dc:date>
• <dc:subject>
• <dc:language>
• <agls:availability>
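The list above can be checked mechanically before a file is sent. The following Python sketch reports, for each resource, which of the five elements are missing. As a simplification it compares local names only (title, date, subject, language, availability) and ignores the namespaces, and it accepts the elements anywhere inside the resource; both are assumptions of this example, not rules of the AGRIS AP.

```python
import xml.etree.ElementTree as ET
from typing import List

MANDATORY = ("title", "date", "subject", "language", "availability")


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts before tag names."""
    return tag.rsplit("}", 1)[-1]


def missing_mandatory(xml_text: str) -> List[List[str]]:
    """Return, for each resource element, the mandatory names that are absent."""
    root = ET.fromstring(xml_text)
    report = []
    for element in root.iter():
        if local_name(element.tag) == "resource":
            present = {local_name(child.tag) for child in element.iter()}
            report.append([name for name in MANDATORY if name not in present])
    return report
```

An empty inner list means the corresponding resource contains all five elements.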

5.9 Inconsistency of local metadata with AGRIS AP elements

Local systems often contain fields that meet specific local requirements and have no corresponding element in the metadata structure defined by the AGRIS AP. Such metadata should be left out during the mapping and transformation process. This version of the AGRIS AP is quite flexible and is intended to accommodate varied types of information; future versions might add further key qualifiers or schemes.

5.10 Nesting, dumbing-down and DC compliance

The Dublin Core metadata set is not completely adequate to describe AGRIS resources, so in certain cases DC elements have been qualified for this purpose.

The qualification of the Dublin Core elements is guided by a rule known colloquially as the Dumb-Down Principle. According to this rule, a client should be able to ignore any qualifier and use the value as if it were unqualified. While this may result in some loss of specificity, the remaining term value (minus the qualifier) must continue to be generally correct and useful for discovery. Qualification is therefore supposed only to refine, not extend the semantic scope of an element.

Element refinements (qualifiers) share the meaning of the unqualified element, but with a more restricted scope. When necessary, these qualifiers are "dumbed down" to Dublin Core elements by means of nesting to ensure DC compliance. The following example dumbs down ags:publisherPlace and ags:publisherName into the core element dc:publisher: the element refinements are ignored and their values are used as the content of the unqualified core element dc:publisher.

With qualifiers:

<dc:publisher>
<ags:publisherPlace>Rome (Italy)</ags:publisherPlace>
<ags:publisherName>FAO</ags:publisherName>
</dc:publisher>

Dumbed down:

<dc:publisher>FAO Rome (Italy)</dc:publisher>
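A client that does not understand the qualifiers could perform this dumbing-down roughly as in the Python sketch below. It is only an illustration for the publisher example: the function names are hypothetical, the nested elements are matched by local name regardless of namespace, and the name-before-place order is taken from the dumbed-down value shown above.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts before tag names."""
    return tag.rsplit("}", 1)[-1]


def dumb_down_publisher(publisher: ET.Element) -> ET.Element:
    """Return an unqualified copy of a publisher element with flattened text."""
    parts = {local_name(child.tag): (child.text or "").strip()
             for child in publisher}
    flat = ET.Element(publisher.tag)
    # Publisher name first, then place, skipping whichever is absent.
    ordered = [parts.get("publisherName", ""), parts.get("publisherPlace", "")]
    flat.text = " ".join(part for part in ordered if part)
    return flat
```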