1. SUMMARY

The creation of a data bank for the animal genetic resources of developing countries was proposed several years ago. FAO and UNEP have initiated a study of the possibility of creating such a data bank. This study was intended to promote the design of methodologies and test them in practice. Pilot trials have been held in Asia and Latin America, and experts have been assigned to compile lists of genetic descriptions pertaining to Africa.

The consultant was required to attend the Regional Meetings for Evaluation of the Pilot Trials of Data Banks for Animal Genetic Resources, in Bangkok, Thailand (16–18 October 1984) and Maracay, Venezuela (27–29 November 1984). From these meetings a written report was to be submitted to FAO with recommendations on the equipment to be used in the regional data banks, the procedures to be followed in preparing the data for characterizations of breeds of domestic animals and birds, the office routines and staff required, capital and operating costs and any other recommendations relevant to the establishment of regional data banks for animal genetic resources.

All the recommendations are contained within this report. The consultant would, however, like to point out that no allowance has been made for non-computer personnel in estimating costs.

The consultant would also like to take this opportunity to express his thanks to FAO for having been assigned to the project. It is a most interesting undertaking and one which would not have progressed as far as it has without the considerable organization and the single-minded devotion that has been applied to it by persons both within and without FAO.

2. RESULTS OF THE PILOT TRIALS

From a systems point of view, two different approaches to the storage of data clearly became evident. Both approaches used a set of expandable codes to reference individual traits.

The Asian approach was to concentrate on fixed sets of standard traits. All such traits are predefined. Each trait could therefore have been identified by its position in the record, and the use of codes could have been avoided. The various consultants from Africa also used a predefined set of traits.

The Latin American approach was to record data on traits only as they appear in the source documents. As new traits are discovered, so they become a new entry into the data bank. This system requires codes to delineate pieces of information rather than being interpreted by position in the record.

At first glance, the two approaches might appear to be mutually exclusive. The Asian approach implies fixed length fields for every description, and all information must be “present” (even if only to say that it was not recorded in the source document), perhaps leading to much wasted computer storage space. On the other hand, with the Latin American approach, although some arbitrary limit must still be imposed on the length of description fields, the data bank contains only meaningful data; if it does not exist, it will not be in the data bank.
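The contrast between the two layouts can be sketched as follows. This is a minimal illustration in Python, not the record format used in either pilot trial; the trait names, field widths, numeric codes and the "NR" marker are all hypothetical.

```python
# Hypothetical trait list and field widths for the fixed-format layout.
FIXED_FIELDS = [("breed", 12), ("coat_colour", 10), ("adult_weight_kg", 6)]
NOT_RECORDED = "NR"  # explicit marker: every field must be "present"

# Hypothetical numeric codes for the coded layout.
TRAIT_CODES = {"breed": 1, "coat_colour": 2, "adult_weight_kg": 3}


def encode_fixed(traits):
    """Positional record: each trait occupies a fixed-width slot."""
    parts = []
    for name, width in FIXED_FIELDS:
        value = str(traits.get(name, NOT_RECORDED))
        parts.append(value[:width].ljust(width))
    return "".join(parts)


def decode_fixed(record):
    """Interpret a positional record purely by position in the record."""
    traits, start = {}, 0
    for name, width in FIXED_FIELDS:
        value = record[start:start + width].strip()
        if value != NOT_RECORDED:
            traits[name] = value
        start += width
    return traits


def encode_coded(traits):
    """Coded record: only the traits actually found are stored."""
    return [(TRAIT_CODES[name], str(value)) for name, value in traits.items()]


def decode_coded(pairs):
    """Interpret a coded record by looking up each code."""
    names = {code: name for name, code in TRAIT_CODES.items()}
    return {names[code]: value for code, value in pairs}


# A source document that gives no coat colour.
source = {"breed": "Criollo", "adult_weight_kg": 410}
fixed = encode_fixed(source)   # always 28 characters, blank slot included
coded = encode_coded(source)   # two (code, value) pairs, nothing for the gap
```

In the fixed layout the missing coat colour still occupies its ten-character slot, whereas the coded layout simply omits it at the price of storing a code beside every value.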

However, the consultant believes that the two methods can be reconciled to a large degree by making use of software packages that are currently available off-the-shelf. This concept should become clear on examining in more detail the pros and cons of the aforementioned approaches (which shall be referred to for simplicity as the Asian and Latin American approaches).

Discussion of these pros and cons is to some extent based on the premise that the Asian approach allows adoption of an off-the-shelf package whereas the Latin American approach does not. If a customized solution had to be written in conventional software regardless of which approach was adopted, then the question of which way to go would become purely a matter of personal choice rather than a matter of choosing the most efficient use of software development time. With either approach, developing a customized system to cover all required animal species would take at least twelve man-months.

NOTE: This report was produced by Dale Curtis, Agricultural Research Institute, University of New England, Armidale NSW, Australia 2351.

A general discussion of package versus customized solutions is given later.

2.1 Advantages of the Asian Approach

- There are many packages currently available for use on micros through to mainframes (many fully upgradable) to handle data entry, validation, searching and reporting of fixed format data. Databases can often be set up in a fraction of the time taken with more conventional software, by people with very little computing knowledge. Security and recovery of data are also handled automatically by many packages.

- Search types and report formats need not be predefined when using one of these packages. English-like commands can be used to select the required information, and screens can be “painted” at will to produce new printed report formats. This avoids the need to write new software each time a new report format is required.

- Statistical analyses of varying degrees of complexity are often available with these packages.

- Applications programmes can be linked into the package software to handle requirements not easily attainable directly through package commands.

- Packages are often written in more machine-efficient languages, making retrieval faster than with customized software written in a high-level language like COBOL.

- Data is positionally dependent, requiring no system of codes.

- Menus can be quickly set up as an aid to database modification, searching and reporting.

- Data entry provides a checklist of all required information.

- Packages exist where non-existent data occupy much less disc space than the actual predefined maximum length of the missing field, greatly saving on disc usage.

- Most packages allow a relatively simple reorganization of data when the initial design proves unworkable, e.g. when the length of a field needs to be increased or new fields need to be added.
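Several of the points above (ad hoc searching, storage of missing values and simple reorganization) can be illustrated with a present-day off-the-shelf database engine. The sketch below uses SQLite through Python's standard sqlite3 module purely as a stand-in; it is not one of the packages considered for the project, and the table, column names and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory data bank

# Fixed set of predefined traits, one column per trait (hypothetical names).
conn.execute(
    "CREATE TABLE breed (name TEXT PRIMARY KEY, species TEXT, "
    "coat_colour TEXT, adult_weight_kg REAL)"
)

# Traits absent from the source document are stored as NULL.
conn.executemany(
    "INSERT INTO breed VALUES (?, ?, ?, ?)",
    [
        ("Breed A", "cattle", "red", 410.0),
        ("Breed B", "cattle", None, 350.0),
        ("Breed C", "goat", "white", None),
    ],
)

# An ad hoc search: no report programme had to be written in advance.
heavy_cattle = conn.execute(
    "SELECT name FROM breed "
    "WHERE species = 'cattle' AND adult_weight_kg > 400 ORDER BY name"
).fetchall()

# Reorganization: a new descriptor is added without re-entering the data.
conn.execute("ALTER TABLE breed ADD COLUMN horn_shape TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(breed)")]
```

Here the query is composed at the time of the enquiry, and the new horn_shape column is added to the existing records with no change to the data already entered.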

2.2 Disadvantages of the Asian Approach

- A prompt for many fields in the data bank may have to be made when entering new data, even though information for many of these fields might be absent.

A further ramification of this is that, depending on the type of data bank package, much wasted space may be introduced into the data bank.

- The format of the data bank should be as well defined as possible before the initial entry of data. If new characteristics (descriptors) are desired after the data bank has been defined, a reorganization will be required and new screen formats must be produced for data entry.

- Someone knowledgeable about the mechanics of a data bank package must be readily available should the data bank require restructuring, recovery from hardware failure, monitoring of access, etc. This person may also need to be able to write software to interface with the data bank package should customization beyond that which the package can provide be required.

- Some packages require a lot of computer memory because they have been made general enough to cover a wide variety of applications.

- Large data entry forms, covering all possible traits, are required since data could rarely be entered directly from the source documents.

2.3 Advantages of the Latin American Approach

- Data need only be entered when available. There is no need to fill out large data entry forms when much of the data on these forms will not be present in the source document being considered.

- The data bank need not be fully defined before it is implemented. As new types of information come to hand it is “simply” a matter of defining new codes and formats for the new information to get it into the data bank. No restructuring of the existing data bank is required.

- The data bank contains only relevant information. There is no wasted space due to blank fields.

- Valid new genetic resource descriptors can be added as they arise, whereas predefined lists may stifle their addition.

2.4 Disadvantages of the Latin American Approach

- It is almost certain that this approach can only be catered for by customized software development. Customized solutions are inevitably dearer than application of a generalized package. Software development is an extremely labour intensive occupation (although claims are now being made to the contrary for fourth-generation programming languages, i.e. programmes which write programmes). All costs must be absorbed by the one user.

- Each new type of analysis or search of the data bank requires new software and new lead times. Ad hoc enquiries are often difficult to realize.

- Any major changes in the formats of reports, screen designs and characteristics of the data bank would require software changes and the recreation of the data bank. Even what appears externally to be a simple change may involve many hours of programme coding.

- Addition of a new type of information into the data bank, and thus use of a new code, requires software changes to the programme(s) interpreting the codes and the format of the data to which the code pertains.

- The ordering of codes must be well coordinated so that sufficient codes are available and so that one code is not used for more than one characteristic.

- New software will generally be required for each new species, even if it is only a rehash of an existing system.

- For small-valued items, the codes themselves may end up occupying a significant proportion of the data bank.

- Codes have to be known when entering and retrieving data. It is easier for the human mind to work with textual descriptions than with codes.
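Two of the points above, the coordination of codes and the space that codes occupy beside small values, can be made concrete with a short sketch. It is an illustration only: the CodeRegistry class, the trait names, the three-character code width and the limit of 999 codes are assumptions, not part of the Latin American pilot system.

```python
class CodeRegistry:
    """Central allocation of trait codes (hypothetical helper)."""

    def __init__(self, max_codes):
        self.max_codes = max_codes
        self._by_name = {}

    def register(self, name):
        """Return the code for a trait, allocating a new one if needed."""
        if name in self._by_name:
            return self._by_name[name]  # one code per characteristic
        if len(self._by_name) >= self.max_codes:
            raise ValueError("no codes left; the code space must be enlarged")
        code = len(self._by_name) + 1
        self._by_name[name] = code
        return code


def code_overhead(pairs, code_width):
    """Fraction of stored characters taken up by codes rather than values."""
    code_chars = code_width * len(pairs)
    value_chars = sum(len(value) for _, value in pairs)
    return code_chars / (code_chars + value_chars)


registry = CodeRegistry(max_codes=999)
record = [
    (registry.register("litter_size"), "2"),
    (registry.register("horned"), "Y"),
    (registry.register("teat_count"), "4"),
]
# Three 3-character codes against three 1-character values.
overhead = code_overhead(record, code_width=3)
```

With single-character values and three-character codes, the codes account for nine of the twelve stored characters in this record, which is the kind of overhead the penultimate point refers to.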