CHAPTER 3 SAMPLE DESIGN

Surveys of woodfuel consumption, supply and provision are generally conducted by means of sampling techniques. This means that by studying a small group (sample) selected at random, one obtains information on variables of interest for a larger group (universe⁶), thus permitting inferences about the behaviour of these variables within the universe. This procedure is adopted because surveying an entire universe (unless it is very small) entails high costs.

3.1 Universe

The universe must be defined in the light of the objectives of the survey. It can be expressed in geographical terms (locality, municipality, district, province, country or some intermediate category) or in sectoral terms (urban population, pottery manufacturers, fuelwood producers). It is also necessary to place time limits on the definition of the universe, because its composition and characteristics can change over time. It is recommended that the universe be given spatial limits that coincide with standard or official groupings (political, administrative, natural, etc.) in common use in countries, so that its dimensions can be estimated from information already available.

The universe is given a preliminary definition at the start of the methodological design of the survey. It is subsequently refined once its size and its spatial and temporal distribution are known from a review of existing information. The redefinition may mean extending or reducing the universe. An extension may be called for when it is realized that an area exists with sizeable woodfuel use or with real or potential supply. The universe might be reduced because information on supply and demand in a certain area is so scarce that including the area in the survey would introduce greater error than leaving it out, or because it is realized that a given locality or area does not form part of the universe as it has no major users.

3.2 Sampling frame

Once the universe has been defined, information that is as precise as possible has to be sought on its dimensions and spatial and temporal distribution in order to construct the sampling frame, which is the basis on which the sampling design is developed. The sampling frame is the information that locates and defines the dimensions of the universe and may consist of housing censuses and maps grouped by locality, district, quarter, etc.; maps of forest cover with types of vegetation or land use; or housing lists in small localities. Constructing the sampling frame is described in the sections on General Variables – Supply, Demand and Provision (Chapter 2).

3.3 Sampling unit

A basic concept in sampling theory is the sampling unit, which is the minimum unit of observation for information on the operative variables. The sampling unit must be clearly defined for constructing the sampling frame. By convention in statistics, a capital “N” is used to refer to the number of sampling units making up the universe, and a lowercase “n” for the number of sampling units in the sample itself. The sampling unit best suited for the respective sectors is shown in Table 3.1. Other sampling units can be defined as suggested by the objective of the survey.

Table 3.1: Sampling unit by thematic group and sector or branch under examination

Group	Sector/branch	Sampling unit
Demand	Residential (urban, rural)	Home
Demand	Industrial	Establishment
Demand	Commercial	Establishment
Demand	Institutional	Establishment
Supply	Direct	Plot
Supply	Indirect	Establishment
Provision	Producers	Individual producers, companies
Provision	Transport operators	Individual producers, companies
Provision	Commercial suppliers	Individual producers, companies

Once the universe and sampling unit have been defined, and once the sampling frame is ready, the sample design comprises two major stages: definition of type of sampling and determination of sample size.

3.4 Types of sampling

There are different types of sampling, but all are based on the principle of randomness. In order to be able to make valid inferences from a sample for transposition to the universe, the sample must be representative of the universe; and this is achieved by its randomness and adequate size.

The basis for statistical inference, then, is randomness. This means that all the elements making up the universe have the same chance of being selected to form the sample. If the selection is not random, there is a serious risk that the findings will not be representative of the whole population, but of a section only. This is referred to as bias. An example of bias due to non-random selection in an inventory of wood resources occurs when the plots selected are those in the vicinity of access roads, which are likely to be more heavily visited and have smaller stocks of wood. Extrapolating the results of this non-random sample to the universe would lead to an underestimation of stocks.

Sample size will depend on the variability of the phenomenon under study, the level of confidence set and acceptable error. One common mistake is to think that for a sample to be representative of a universe, it must be directly proportional in size to that of the universe, in other words, the larger the universe, the larger the sample. This is not true and details on how to arrive at the required sample size are given later.

3.4.1 Simple random sampling

This consists in randomly selecting “n” sampling units (SU) from the universe, in a way that gives all the SUs the same opportunity of being selected.

Each SU is assigned a number and the sample is selected randomly using tables of random numbers, calculators, drawing lots, etc. This technique can only be used when there is a complete sampling frame that includes all the sampling units and where these are readily recognisable and identifiable in the field, for example a telephone directory or a list of homes identified by street and number or the name of the occupant. When constructing a sample of natural resources, it is usually difficult to identify or locate the selected plots accurately, as this requires a detailed map and instruments for precise geographical location.

When simple random sampling must be used:

• When it is known that the variable of greatest interest is randomly distributed within the universe

• With small universes (not more than 200 SUs)

• With universes with little geographical dispersion

• When the pattern of distribution for the variable under study is not known
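The selection procedure described in this section can be sketched in a few lines of Python. This is only an illustration: the frame of 150 numbered homes, the sample size of 20 and the function name `simple_random_sample` are hypothetical, and the standard library's random number generator stands in for a table of random numbers.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Select n sampling units from a complete frame, without replacement.

    Every unit in the frame has the same chance of being selected.
    """
    if n > len(frame):
        raise ValueError("sample size n cannot exceed universe size N")
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(frame, n)

# Hypothetical sampling frame: a list of 150 homes numbered 1 to 150.
frame = list(range(1, 151))
sample = simple_random_sample(frame, n=20, seed=42)
print(sorted(sample))
```

Fixing the seed is optional; it is used here so that the same list of homes can be reproduced when the field plan is reviewed.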

3.4.2 Stratified random sampling

Stratified random sampling is used when the whole universe of size N is broken down into relatively homogeneous strata for the variable under study. This is advisable provided the variation between strata is greater than the variation within each stratum.

Regarding the selection of sampling units and estimation of parameters, each stratum is treated independently, as if it were a universe on its own. Within each stratum, the sampling units can be selected at random, by clusters or systematically.

Stratified sampling makes it possible to improve the precision of estimates with reduced sampling effort, to characterize each stratum separately and to facilitate field work.

It is most important to realize that the sampling units should belong to only one stratum, that the strata should be recognizable by people outside the survey group, and that the actual size of the stratum should be known. It is not advisable to form a large number of strata, because this would unnecessarily complicate field surveying and data analysis.

When it comes to deciding on a stratified sample there are general criteria that one can apply. In the group on woodfuel demand, the advisability of stratification is defined in the first instance by the patterns of saturation and consumption. In the direct supply group, stratification is done by source and type of land cover or use. For the indirect supply group and for the provision group (producers, transport operators and traders), volume of production or sales is used. Since these are variables that need to be known before the survey takes place, the relevant data can be obtained from secondary sources or from indicator variables, as described in Chapter 2.

When stratified sampling should be used:

• It is recommended for universes where it is supposed or known that distribution of the key variable(s) differs between readily identifiable sub-universes;

• Because of its low sampling efficiency, it is not recommended for small universes with fewer than 200 sampling units and variables showing normal distribution.
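A minimal sketch of stratified selection follows, assuming two hypothetical strata (urban and rural homes) and hypothetical per-stratum sample sizes. Each stratum is sampled independently, as described above. The size-weighted mean at the end is the usual stratified estimator; the text does not spell it out, but it shows why the actual size of each stratum must be known.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw an independent simple random sample within each stratum.

    strata: dict mapping stratum name -> list of sampling unit identifiers
    sizes:  dict mapping stratum name -> number of units to draw
    """
    rng = random.Random(seed)
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

def stratified_mean(stratum_means, stratum_sizes):
    """Combine stratum means, weighting each by its known stratum size."""
    total = sum(stratum_sizes.values())
    return sum(stratum_means[h] * stratum_sizes[h] for h in stratum_sizes) / total

# Hypothetical universe: 300 urban and 700 rural homes.
strata = {
    "urban": [f"U{i}" for i in range(300)],
    "rural": [f"R{i}" for i in range(700)],
}
selected = stratified_sample(strata, {"urban": 15, "rural": 25}, seed=1)

# Hypothetical stratum means of unit consumption (same units in both strata).
overall = stratified_mean({"urban": 1.0, "rural": 3.0}, {"urban": 300, "rural": 700})
print(overall)
```

Each sampling unit appears in exactly one stratum list, which mirrors the requirement that a unit belong to only one stratum.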

3.4.3 Sampling by clusters

Clusters are spatially compact groups of sampling units.

They are selected randomly and within each cluster all the sampling units are studied or subjected to further sampling.

When sampling by clusters should be used:

• when there is considerable difficulty in reaching every sampling unit in the universe, because of wide dispersion or physical barriers to access.
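The two options mentioned above (studying every unit in a selected cluster, or sampling again within it) can be sketched as follows. The village names, cluster sizes and the function name `cluster_sample` are hypothetical and serve only to illustrate the procedure.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster=None, seed=None):
    """Select clusters at random, then take all or some units in each.

    clusters:      dict mapping cluster name -> list of sampling units
    n_clusters:    number of clusters to select at random
    n_per_cluster: if None, every unit in a selected cluster is studied;
                   otherwise a second-stage random sample of this size is drawn
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    result = {}
    for name in chosen:
        units = clusters[name]
        if n_per_cluster is None or n_per_cluster >= len(units):
            result[name] = list(units)  # study the whole cluster
        else:
            result[name] = rng.sample(units, n_per_cluster)  # further sampling
    return result

# Hypothetical universe: 10 villages (clusters) of 40 homes each.
villages = {f"village_{v}": [f"v{v}_home{h}" for h in range(40)] for v in range(10)}
two_stage = cluster_sample(villages, n_clusters=3, n_per_cluster=8, seed=7)
print({name: len(units) for name, units in two_stage.items()})
```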

3.4.4 Systematic selection

This is not strictly speaking a type of sampling and is best considered as a frame for regular sample selection.

The first sampling unit is chosen at random, while the remainder are selected at regular intervals of unit, distance or time. Its theoretical limitation lies in the fact that only the first number is chosen at random and the remainder do not have the same probability of being included in the sample. Its advantage is that it facilitates location of the sampling units in areas of difficult access and permits visits to sampling units that are not included in the sampling frame.

When systematic selection should be used:

• Whenever it is not possible to identify every sampling unit within the sample frame, e.g. in large towns where lists of homes are not kept.

• When access to sampling units is difficult because of distance, lack of roads or difficult terrain, e.g. in forest inventories.
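As an illustration of selection at regular intervals, the sketch below picks a random starting point and then takes every k-th unit, where k is the universe size divided by the sample size. The ordered list of 200 homes and the function name `systematic_sample` are hypothetical; in the field the interval could equally be a distance or a time span.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select n units at a regular interval after a random start."""
    k = len(frame) // n  # sampling interval
    if k < 1:
        raise ValueError("sample size n cannot exceed the number of units")
    rng = random.Random(seed)
    start = rng.randrange(k)  # only this first position is chosen at random
    return [frame[start + i * k] for i in range(n)]

# Hypothetical ordered list of 200 homes along a route.
homes = list(range(1, 201))
visited = systematic_sample(homes, n=20, seed=3)
print(visited)
```

Because only the starting position is random, any regular pattern in the ordering of the units that coincides with the interval k would carry over into the sample.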

Combining several types of sampling

It is possible to combine different types of sampling within the same survey, depending on the characteristics of the sectors or branches concerned and the degree of acceptable trade-off between precision and cost of the exercise. For example, in a residential sector one may opt for a two-stage stratified sample in clusters, whereas for a small homogeneous and compact industrial branch, simple random sampling may be preferred.

3.5 Sample size

Sample size must be determined independently for each universe, according to three factors: the variability of the most important numerical variable, the level of confidence required and the acceptable level of error. This is summarized by the following formula:⁷

In terms of variance and absolute error:

$$ n_o = \frac{s^2 \, t_{\alpha,v}^2}{e^2} \qquad (1) $$

In terms of variation coefficient and relative error:

$$ n_o = \frac{cv^2 \, t_{\alpha,v}^2}{e^2} $$

where:

n_o = size of sample

s² = variance of the sample

t_{α,v} = critical value of Student’s ‘t’ test with significance level α and v degrees of freedom

e = acceptable error

cv = variation coefficient = standard deviation of the sample / sample mean

v = degrees of freedom = n - 1

Variance (s²) and variation coefficient (cv) indicate the degree of homogeneity of the variable under consideration in the sample. They are calculated (manually, by calculator or with Excel) from the data of a preliminary sample or an earlier survey.

Acceptable error (e) refers to the allowable difference between the sample mean and the mean of the universe. It is set in accordance with previous knowledge of the phenomenon under study, and it is advisable to keep it within 10-20%. It can also be expressed in absolute values, in the units of measurement of the variable in question.

The critical value of t is obtained from tables in statistics books or from Excel, selecting first the level of significance (α) or its complement, the level of confidence (1 - α). A level of confidence of 0.95, which is equivalent to α = 0.05, is enough for surveys of this kind. In addition, in order to define the degrees of freedom (v = n - 1), a first assessment of the number (n) of cases in the sample is needed. These two values are the entry data for the tables. Subsequently, the sample size is refined by means of an iterative process: the value of ‘n’ obtained from Formula (1) is used to look up a new value of ‘t’, which is entered in Formula (1) again, and so on until ‘n’ no longer changes.
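The iterative procedure can be sketched as follows, using the relative-error form of Formula (1). Several things here are assumptions made for illustration: the variation coefficient of 50%, the acceptable error of 10%, the first guess of 30 cases, the function names, and the rounding up of the result to a whole unit. The short table holds standard two-sided critical values of t for a confidence level of 0.95; a fuller table, Excel or a statistics package would normally be used instead.

```python
import math

# Two-sided critical values of Student's t for a confidence level of 0.95
# (alpha = 0.05), from standard statistical tables. Keys are degrees of freedom.
T_TABLE_095 = {1: 12.706, 2: 4.303, 5: 2.571, 10: 2.228,
               20: 2.086, 30: 2.042, 60: 2.000, 120: 1.980}

def t_critical(v):
    """Return the tabulated t for the largest tabulated df not exceeding v.

    Reading the table at a lower df gives a slightly larger t, so the
    resulting sample size errs on the safe side.
    """
    usable = [df for df in T_TABLE_095 if df <= v]
    return T_TABLE_095[max(usable)] if usable else T_TABLE_095[1]

def sample_size(cv, e, t):
    """Formula (1) with cv and e both in percent, rounded up (assumption)."""
    return math.ceil((cv * t / e) ** 2)

def iterate_sample_size(cv, e, n_start=30, max_iter=20):
    """Repeat Formula (1), updating t from the latest n, until n is stable."""
    n = n_start
    for _ in range(max_iter):
        t = t_critical(n - 1)          # degrees of freedom v = n - 1
        n_new = max(2, sample_size(cv, e, t))
        if n_new == n:                 # n no longer changes: stop
            break
        n = n_new
    return n                           # last value if max_iter is reached

# Hypothetical inputs: cv = 50 %, acceptable relative error = 10 %.
print(iterate_sample_size(cv=50, e=10))
```

With these inputs the first pass uses t for 20 degrees of freedom, and later passes settle once the tabulated t stops changing.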

This formula shows that the number of elements making up the sample is directly proportional to the variance and value of t², and inversely proportional to the square of the error. The sample size will be large when: (a) the element under study is highly variable (high variance or variation coefficient); (b) the level of confidence sought is high; and/or (c) the acceptable error is low. Conversely, the sample size will be small if the phenomenon shows little variance, a low level of confidence is set, and a high level of error is accepted.

From this it is clear that the size of a sample does not depend on the size of the universe. Thus, starting with an equal level of confidence and error acceptance in a tropical rainforest covering the same surface area as a temperate pine forest, the sample size will be larger for the rainforest because of its greater heterogeneity in the wood stock variable in relation to the pine forest.
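As a rough numerical illustration of this comparison, suppose (purely hypothetical figures) that the wood stock variable has a variation coefficient of 30% in the pine forest and 60% in the rainforest, that a relative error of 10% is accepted in both cases, and that t is taken as approximately 2:

$$ n_o^{\text{pine}} = \frac{30^2 \cdot 2^2}{10^2} = 36, \qquad n_o^{\text{rainforest}} = \frac{60^2 \cdot 2^2}{10^2} = 144 $$

Doubling the variation coefficient quadruples the required sample, while the surface area of the forest does not enter the calculation.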

So far no consideration has been given to the size of the universe in determining sample size.

Nevertheless, for a small universe (fewer than 120 sampling units), it is necessary to correct the value of n_o obtained from Formula (1), by using Formula (2)⁸:

$$ n = \frac{n_o}{1 + n_o / N} \qquad (2) $$

where:

n_o = sample size obtained from Formula (1)

N = size of universe

n = definitive size of sample
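A short sketch of Formula (2) follows. The input values (n_o = 100 and universes of 100, 80 and one million units) are hypothetical, and rounding the result up to a whole sampling unit is an assumption rather than something the text prescribes.

```python
import math

def finite_population_correction(n_o, N):
    """Apply Formula (2): n = n_o / (1 + n_o / N), rounded up (assumption)."""
    return math.ceil(n_o / (1 + n_o / N))

# Hypothetical case: Formula (1) gave n_o = 100.
for N in (80, 100, 1_000_000):
    print(N, finite_population_correction(100, N))
```

For the very large universe the correction leaves the sample size unchanged, which is consistent with applying it only to small universes.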

Annex III gives the calculated sample size for the estimation of fuelwood consumption in a residential sector for varying universe size and error margin and corrected for finite population. It applies for the variable “specific fuelwood consumption”, where, due to the abundance of case studies, the variation coefficient is known.

Variables to be used in calculating sample size

• To define sample size of any sector or branch of woodfuel demand, it is best to use the unit consumption variable.

• In the industrial, commercial and institutional sectors it is not always possible to find data on unit consumption, but one can use the volume of production per unit time, which is closely correlated with unit consumption.

• In the case of direct supply (from forest, plantation, etc.) the important variables may be stock or productivity, but the first is recommended as there is more secondary information and it is easier to measure in a preliminary sample. If there are no data on stock, basal area data (G) may be used.

• In sectors or branches of indirect supply (sawmills, carpentry workshops, etc.), volume of production per unit time must be used.

• For provision sectors: in the case of producers, it is best to use volume of woodfuel production; traders, volume of sales; and transport operators, transport capacity, all expressed per unit time.

The final decision on the size of the sample will depend on the agreed trade-off between desired accuracy and the availability of monetary, human and time resources for conducting the field survey. It is recommended that sectors or branches having greater importance in woodfuel demand, supply and provision be given priority in the allocation of resources for field surveys so that estimations can be more accurate. In situations where it is not possible to realize the sample size determined by statistical calculation, it is essential to survey at least ten sample units per sector, branch or stratum, and to report the resulting error of estimation, obtained by solving Formula (1) for e with the sample size actually achieved.
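Solving Formula (1) for e gives e = cv · t / √n in relative terms (or e = s · t / √n in absolute terms). The sketch below applies this to the minimum of ten sample units; the variation coefficient of 50% and the function name `achieved_error` are hypothetical, and 2.262 is the standard two-sided t value for 9 degrees of freedom at a confidence level of 0.95.

```python
import math

def achieved_error(cv, t, n):
    """Relative error obtained by solving Formula (1) for e.

    cv: variation coefficient in percent
    t:  critical value of Student's t for v = n - 1 degrees of freedom
    n:  sample size actually achieved
    """
    return cv * t / math.sqrt(n)

# Hypothetical case: only 10 units surveyed, cv = 50 %.
error_pct = achieved_error(cv=50, t=2.262, n=10)
print(round(error_pct, 1))
```

The finite population correction of Formula (2) is not taken into account in this sketch.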

⁶In statistics “universe” is also referred to as “population”.
⁷Formula used to determine the sample size needed to estimate the population mean; for hypothesis testing for differences between means and variances other formulas are available. Useful statistics reference books include Zar 1999, Cochran 1977, and Steel and Torrie 1988.
⁸ Also termed “correction for finite population”.