Large-Scale Forest Management Experiments: Advantages and Problems


Robert A. Monserud[1]


Abstract

In the past decade, several large, integrated forest management experiments have been initiated in the Pacific Northwest, partially in response to contentious resource management debates. Their goal is to use alternative silvicultural treatments to enhance wildlife habitat, biodiversity, or the conservation of aquatic resources in a manner that is socially acceptable. These randomized-block experiments have one unusual feature: treatment units are commercially operational (13-20 ha). Because the large-scale context is designed into these experiments, results can be directly interpreted at the scale of management that produced the manipulation, eliminating a change-of-scale bias common in smaller management experiments. The considerable advantages of large, operational treatments are accompanied by their own problems, however. Because of the great expense (~US$1 million per block) and size (50-200 ha) of the experimental blocks, sample size is usually small. This means that statistical power (the probability of correctly rejecting the null hypothesis) is low across blocks. With few replicates and high variability both within and among these large-scale treatments, investigators face the possibility that differences might only be detectable at untraditionally high significance levels. A second problem with large-scale experiments is pseudoreplication (lack of independence across replicates), which results in the strength of the experimental evidence being overstated. Meta-analysis (a joint hypothesis test across experiments) is proposed as an effective way to increase sample size - and therefore power - while accounting for the different degrees of variation across studies. A test of a common hypothesis about ecosystem management would greatly increase not only the power of the test but the return on investment from these rather expensive experiments.


1. Introduction: Large-scale field experiments

Even though experimentation is the heart of science, large-scale manipulative field experiments are unusual in forestry and ecology. Nevertheless, they are critically needed to improve environmental management and policies (Carpenter 1998). In such experiments, the treatment unit is large enough to include the relevant physical, chemical, and biotic context of the processes being studied. Because the large-scale context is designed into the experiment, results can be directly interpreted at the scale of management that produced the manipulation (Carpenter 1998). In contrast, comparative or retrospective studies examine context without direct manipulation or control of key factors, and small-scale experiments study mechanisms out of context (Carpenter 1998).

In all experiments the goal is to reduce the number of competing hypotheses and make correct population inferences by controlling important factors, using the classical methods of randomization, replication, and controls due to Fisher (1925). Manipulative experiments, if properly designed and executed, can identify a causal link between a manipulated variable and some measured response (Tilman 1989). Observational, correlative studies cannot, nor can chronosequences. Although the theoretical advantage of large-scale manipulative experiments is great, a plethora of problems awaits the experimenter at the large scale.

Hurlbert (1984) rattled a hornet’s nest by pointing out the prevalence of pseudoreplication in field ecology, and that it invalidated statistical inference. Replicates that are not independent are pseudoreplicates, which artificially increase sample size, bias hypothesis tests, and give a false sense of the power of the test of the experiment (Hurlbert 1984). In short, the strength of the experimental evidence is overstated (Hargrove and Pickering 1992). Although the existence of a spatial environmental gradient is a common cause, a more serious cause rests with the scientist: poor experimental design or execution, and improper statistical analysis (Hargrove and Pickering 1992). Repeated measurements over time also are pseudoreplicates because they are not independent, although sound statistical methods are available for accounting for the time dependency (e.g., Proc MIXED in SAS version 8.1; SAS Institute, Cary, NC).
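The inflation Hurlbert describes can be made concrete with a small simulation. The sketch below (Python with NumPy; all numbers are illustrative, not from any of the studies discussed here) generates many "experiments" whose units all share a common block effect, so the units are pseudoreplicates rather than independent replicates. The naive standard error, computed as if the units were independent, badly understates the true sampling variability of the mean:

```python
import numpy as np

rng = np.random.default_rng(0)
n_units = 10            # "replicates" that in fact share one block
block_var, unit_var = 1.0, 1.0   # variance of the shared block effect vs. unit noise

# Simulate many experiments; within each, every unit gets the same block effect.
n_sim = 20000
block = rng.normal(0.0, np.sqrt(block_var), size=(n_sim, 1))
units = block + rng.normal(0.0, np.sqrt(unit_var), size=(n_sim, n_units))
means = units.mean(axis=1)

# True sampling SD of the mean, observed across simulated experiments.
true_se = means.std()

# Naive SE a scientist would compute from one experiment, assuming independence.
# The shared block effect cancels within each experiment, so this only
# reflects unit-level noise and ignores the dominant block-level variation.
naive_se = units.std(axis=1, ddof=1).mean() / np.sqrt(n_units)

print(f"true SE of the mean: {true_se:.3f}   naive SE: {naive_se:.3f}")
```

With these (assumed) variance components, the naive standard error is several times too small, which is exactly the "false sense of the power of the test" that pseudoreplication produces.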

Randomized block designs (Fisher 1925) are an excellent tool for countering spatial pseudoreplication. Blocking ensures that treatment units are interspersed rather than segregated, replication increases the precision of the estimate, randomization eliminates possible bias and improves the accuracy of the estimate, and controls provide the proper treatment difference.
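A minimal sketch of such a randomized complete block layout follows (Python; the block and treatment names are hypothetical, loosely echoing the silvicultural alternatives discussed later). Each block receives every treatment exactly once, with an independent randomization per block:

```python
import random

def assign_rcbd(blocks, treatments, seed=42):
    """Randomly assign every treatment once within each block
    (a randomized complete block design)."""
    rng = random.Random(seed)
    layout = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)      # independent randomization within each block
        layout[block] = order   # treatment unit i in this block gets order[i]
    return layout

layout = assign_rcbd(
    blocks=["Block1", "Block2", "Block3"],
    treatments=["control", "light thin", "heavy thin", "clearcut"],
)
for block, order in layout.items():
    print(block, order)
```

Because every treatment, including the control, appears in every block, treatment contrasts are estimated within blocks and the block-to-block (spatial) variation is removed from the treatment comparison.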

As we move to broader scales with experimentation, control over individual variables becomes increasingly difficult, true replication becomes almost impossible, covarying factors confound the treatment effects, and the results of an experiment become difficult to interpret (Hobbs 1999; Wiens 1999). As the size of experimental units increases beyond small ecosystems to include entire landscapes, the logistics of experimental manipulation requires some separation over time, creating temporal pseudoreplication. The alternative is to closely space the treatments, which creates spatial pseudoreplication. Because of this dilemma, Hargrove and Pickering (1992) conclude that pseudoreplication is almost inevitable in landscape ecology and in experimentation at broad scales. Because of the difficulty of controlling factors over a large scale, confounding factors threaten to swamp out treatment response even in well-designed experiments.

Long-term study is unusual in science (Taylor 1989). If the hypothesis involves the new state of the system, then the experimenter must wait until the transient dynamics die out and the system stabilizes (Tilman 1989). Tilman (1989) concludes that most manipulative field experiments in ecology are much too short, with less than 2% lasting at least 5 field seasons. In systems with slow dynamics such as forests, silvicultural manipulations can trigger a long-term chain of successional dynamics, the transient response to the manipulation. Results of the experiment are strongly dependent on time, and can change or even contradict earlier results as the successional dynamics progress. Thus, a well-replicated forestry experiment could register a statistically significant short-term response to a manipulation that contradicts the eventually obtained, and equally significant, long-term response (Tilman 1989). Franklin (1989) considers features of forest systems that call for long-term studies: slow dynamics, rare events, episodic phenomena, high variability, subtle processes, and complex phenomena. Time is the answer that solves many of these problems (Franklin 1989).

2. Large-scale management experiments in the Pacific Northwest

Monserud (2002) examined several new large-scale, multidisciplinary silvicultural experiments from the Pacific Northwest:

ATC: Alternatives to Clearcutting (AK);

MASS: Montane Alternative Silvicultural Systems (BC);

OHDS: Olympic Habitat Development Study (WA);

FES: Forest Ecosystem Study (WA);

CFS: Washington DNR Capitol Forest Study (WA);

DEMO: Demonstration of Ecosystem Management (WA, OR);

DMS: Density Management Study (OR).

All are implementing silvicultural alternatives to the widely used plantation management of the previous half-century. Some are attempting to hasten the approach to old-growth structure and composition. All are multidisciplinary, examining some forest value other than wood production. All use replicated randomized block designs with controls (Fisher 1925).

One of the most unusual and important aspects of this collection of silvicultural experiments is that the treatment units are large enough to be commercially operational (size range: 6-32 ha, with most between 13-20 ha). Using large, operational units as treatment areas has several important advantages over small research plots:

(1) Easier to generalize management results to the watershed and landscape because the spatial variation is accurately represented by the experimental units (Carpenter 1998),

(2) Visual acceptance can be determined by direct observation of the treatments on the landscape,

(3) Larger units are more likely than small research plots to cover the home range of important animals (e.g., northern flying squirrel), and

(4) Demonstration that the management treatments are both economical and feasible to implement because, by definition, they are operational.

If treatment units are to be commercially operational, the scientist must design treatments that can be easily understood and applied by typical forest operators. This operational requirement generally precludes uniform research plots, and is likely to introduce considerable variation among study sites and treatment replications. Combining the operational size requirement within the context of a well-designed experiment should ensure that sound statistical inferences can be drawn from the results (Carpenter 1998), and that the results can provide a scientific basis for new forest management systems.

In the past, silvicultural experiments were rather small and rarely operational. For example, the 1925 Wind River Douglas-fir plantings were made in 1.1 ha blocks (Reukema 1979), and the Levels-of-Growing-Stock plots were 0.08 ha (Curtis and Marshall 1986). Because small plots are more uniform, experiments that use them should have a smaller variance than the large operational units discussed here. Although this small variance is useful for detecting treatment differences, it may not generalize to the forest landscape with the same precision (Hobbs 1999). Basically, this is a bias due to a change of scale. Experiments that use large, operational plots accept the tradeoff of a larger variance (both within-block and among-block) to gain greater assurance that the experimental results will generalize better to the population as a whole. Results can be directly interpreted at the scale of management that produced the manipulation because the large-scale context is designed into the experiment (Carpenter 1998).

The power of a test is the probability of correctly rejecting the null hypothesis (Cohen 1988). Sample size is a crucial determinant of power in large-scale field experiments. Only the DMS has a sample size (18 blocks) large enough to have strong power for generalizing across a geographic range of site conditions. However, some of these studies are designed to have strong power to test specific hypotheses. The DEMO study, for example, has 36 randomized replicates across 6 blocks. Because each replicate (13 ha) is larger than the home range of the vertebrates that they are investigating (Halpern and Raphael 1999), power should be high for testing the effect of silvicultural treatments on the target wildlife populations. Because the replicates are spread out over 6 widely dispersed blocks, pseudoreplication should not be a serious problem.

When blocks are located in a relatively small geographic area, potential problems with pseudoreplication are a concern. Consider the MASS, CFS, and FES studies, which use randomized block designs in one forest each. The question of independence of blocks should be carefully addressed when stating probability levels for statistical tests.

The ability to detect significant differences depends not only on the number of replicates but also on the variability of the population of interest. Harrington and Carey (1997) expected that 8-10 replications in their OHDS study would allow them to detect treatment-induced differences of 20% or more in small animal populations. Use of fewer replications would require larger differences between treatments to achieve statistical significance and thus confidence in the results.

The level we accept as significant also affects our ability to detect differences. In the face of few replicates and high variability (both within and among treatments), the field scientist must consider the possibility that differences might only be detectable at untraditionally high significance levels, such as α = 0.2 or more.
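The joint effect of replicate count and significance level on power can be sketched with a standard normal approximation for a two-sided, two-sample comparison (Python with SciPy; the effect size and replicate counts below are illustrative, not taken from the studies themselves):

```python
from scipy.stats import norm

def power_two_sample(d, n_per_group, alpha):
    """Approximate power of a two-sided two-sample z-test for a
    standardized effect size d with n_per_group replicates per treatment."""
    z_crit = norm.ppf(1 - alpha / 2)
    ncp = d * (n_per_group / 2) ** 0.5   # noncentrality of the test statistic
    # Probability the statistic falls in either rejection region.
    return norm.cdf(ncp - z_crit) + norm.cdf(-ncp - z_crit)

# A large standardized effect (d = 0.8) at replicate counts spanning
# the range of the studies reviewed here (3 blocks up to DMS's 18).
for n in (3, 6, 18):
    for alpha in (0.05, 0.2):
        print(f"n={n:2d}  alpha={alpha:.2f}  power={power_two_sample(0.8, n, alpha):.2f}")
```

The pattern the table prints is the point of the paragraph above: with only three replicates per treatment, even a large effect is unlikely to be detected at α = 0.05, while relaxing α to 0.2 roughly doubles the chance of detection.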

It is usually not possible to randomly choose block locations (sites) in forestry. It is more desirable and realistic to use stratification to locate blocks across a range of some important geographic, physiographic, or other site characteristic. The key to strong statistical inference is to randomly assign all treatments, including controls, and have sufficient replication (blocks) to allow for detecting significant differences among treatments (Fisher 1925). The randomization of treatments is a fundamental attribute of all seven designs reviewed here. All studies have at least three full replications of each treatment, and all but MASS have randomized controls for measuring treatment differences.

Cooper et al. (2000) found that the power of most large-scale manipulative experiments is low, primarily because of the expense and difficulty of establishing an experiment with a large number of replicates. They recommend standardizing methods and designs among studies so that a meta-analysis (Fernandez-Duque and Valeggia 1994) can be performed on common research hypotheses. The meta-analysis effectively increases sample size - and therefore power - while accounting for the different degrees of variation across studies. All seven of these studies are examining the effect of silvicultural treatments on both wildlife habitat and biodiversity, with an emphasis on speeding up the approach to old-growth characteristics and retaining biological legacies. This is the most promising area for a meta-analysis.
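The simplest version of such a combination is a fixed-effect, inverse-variance-weighted meta-analysis, sketched below (Python with NumPy; the study-level effects and standard errors are hypothetical, standing in for a treatment effect on, say, an abundance index estimated separately by each experiment):

```python
import numpy as np

def fixed_effect_meta(effects, ses):
    """Combine study-level effect estimates by inverse-variance weighting
    (fixed-effect meta-analysis): precise studies get more weight."""
    effects = np.asarray(effects, dtype=float)
    ses = np.asarray(ses, dtype=float)
    w = 1.0 / ses**2                       # weight = 1 / variance
    combined = np.sum(w * effects) / np.sum(w)
    combined_se = np.sqrt(1.0 / np.sum(w)) # variance of the weighted mean
    return combined, combined_se

# Hypothetical effects and standard errors from four studies.
effects = [0.30, 0.15, 0.45, 0.25]
ses = [0.20, 0.25, 0.30, 0.15]
est, se = fixed_effect_meta(effects, ses)
print(f"combined effect: {est:.2f} +/- {se:.2f}")
```

The combined standard error is smaller than that of any single study, which is the sense in which meta-analysis "effectively increases sample size." A random-effects variant would additionally estimate among-study variance, which is likely relevant here given the deliberate differences in site conditions across these experiments.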

A major challenge with large experimental plots is their enormous cost. For example, expenditures since plot establishment are approximately US$6 million (M) for ATC, $12M for DEMO, $1M for OHDS, and $4.5M for FES. One consequence is that scientists must consider a much smaller number of promising contrasts, potential treatments, and number of blocks. Jeffers (1988) emphasizes that the more costly the research, the more important it is that the experimental design should be statistically sound and efficient, and that the results obtained are capable of valid interpretation.

The long-term nature of experiments such as these (slow processes with strong transient dynamics) is a long-standing problem (Walters and Holling 1990; Carpenter 1998). Investigators are faced with institutional and academic demands for short-term results that not only are publishable but also can justify the large investments (Franklin 1989). All of these studies are recent (begun in the 1990's) and have not been completed. Only preliminary reports are available (e.g., Carey et al. 1999; Beese and Bryant 1999; Halpern and Raphael 1999; McClellan et al. 2000).

When the experimental blocks are large enough to be commercially operational (e.g., 50-200 ha), the scientist is faced with the realities of the timber-sale process. The possibility of lawsuits and appeals is perhaps one of the most serious institutional challenges, especially on federal land. An experimental block that is held up in appeal prevents the full experimental design from being implemented, reducing the power of key hypothesis tests. The OHDS and DEMO studies both have two blocks that have not been implemented for this reason.

The concept of randomization takes on serious consequences with large experimental units. The very nature of randomization means that a clearcut or a treatment unit with heavy removal could fall anywhere in the block, and some locations might be far more sensitive or visible than others to the public. Concerns over randomization of treatments were real issues for at least two of these large-scale experiments (ATC, DEMO), and contributed to the clearcut in MASS not being randomized. A related issue with randomization is the risk of wind damage, especially on coastal or exposed stands. A well-designed management experiment should not put any piece of land in a position of unacceptable risk to damage simply because of randomization of treatments. The solution is to find suitable watersheds for experimentation so that no treatment units are at high risk, regardless of the outcome of randomization.

Literature Cited

Beese, W.J. and Bryant, A.A. 1999. Effect of alternative silvicultural systems on vegetation and bird communities in coastal montane forests of British Columbia, Canada. Forest Ecology and Management 115:231-242.

Carey, A.B., Thysell, D.R., Brodie, A.W. 1999. The Forest Ecosystem Study: background, rationale, implementation, baseline conditions, and silvicultural assessment. Gen. Tech. Rep. PNW-GTR-457. Portland, OR: USDA Forest Service, Pacific Northwest Research Station. 129 p.

Carpenter, S.R. 1998. The need for large-scale experiments to assess and predict the response of ecosystems to perturbation. Pp. 287-312, In: Successes, Limitations, and Frontiers in Ecosystem Science (M. L. Pace and P. M. Groffman, eds.). Springer-Verlag, New York.

Cohen, J. 1988. Statistical power analysis for the behavioral sciences, 2nd ed. L. Erlbaum Assoc., Hillsdale, NJ.

Cooper, R.J., G.A. Gale, and L.A. Brennan. 2000. Answering questions in management and research using large-scale manipulative experiments. Pp. 220-224. In: Bonney et al. (eds.), Strategies for bird conservation: the Partners in Flight planning process. USDA Forest Service Proceedings RMRS-P-16.

Curtis, R.O. and Marshall, D.D. 1986. Levels-of-growing-stock cooperative study in Douglas-fir: Report No. 8-The LOGS study: twenty-year results. Res. Pap. PNW-356. Portland, OR: USDA Forest Service, Pacific Northwest Research Station. 113 p.

Fernandez-Duque, E. and C. Valeggia. 1994. Meta-analysis: a valuable tool in conservation research. Cons. Biol. 8:555-561.

Fisher, R.A. 1925. Statistical methods for research workers. Oliver and Boyd, London. 318 p.

Franklin, J.F. 1989. Importance and justification of long-term studies in ecology. In: Long-term studies in ecology: approaches and alternatives (G.E. Likens, ed.). Springer-Verlag, New York. Pp. 3-19.

Halpern, C.B. and Raphael, M.G. (eds.). 1999. Retention harvests in Northwestern forest ecosystems: the Demonstration of Ecosystem Management Options (DEMO) study. NW Science #73 (Spec. Iss.). 125 pp.

Hargrove, W.W., and J. Pickering. 1992. Pseudoreplication: a sine qua non for regional ecology. Landscape Ecology 6(4):251-258.

Harrington, C.A. and Carey, A.B. 1997. The Olympic Habitat Development Study: conceptual study plan. Unpublished ms. on file at USDA Forest Service, Pacific Northwest Research Station, Olympia, WA. 38 pp.

Hobbs, R.J. 1999. Clark Kent or Superman: where is the phone booth for landscape ecology? Pp. 11-23 in: Landscape Ecological Analysis: Issues and Applications (J.M. Klopatek and R.H. Gardner, eds.). Springer-Verlag, New York.

Hurlbert, S.H. 1984. Pseudoreplication and the design of ecological field experiments. Ecol. Monogr. 54(2):187-211.

Jeffers, J.N.R. 1988. Statistical and mathematical approaches to issues of scales in ecology. In: SCOPE 35: Scales and global change: Spatial And temporal variability in biospheric and geospheric processes (T. Rosswall, R.G. Woodmansee & P.G. Risser eds.)., Wiley, U.K. 376 pp.

McClellan, M.H.; Swanston, D.N.; Hennon, P.E.; Deal, R.L.; De Santo, T.L.; Wipfli, M.S. 2000. Alternatives to clearcutting in the old-growth forests of southeast Alaska: study plan and establishment report. Gen. Tech. Rep. PNW-GTR-494. Portland, OR: USDA Forest Service, Pacific Northwest Research Station. 40 p.

Monserud, R.A. 2002. Large-scale management experiments in the moist maritime forests of the Pacific Northwest. Landscape and Urban Planning 59(3): 159-180.

Reukema, D.I. 1979. Fifty-year development of Douglas-fir stands planted at various spacings. USDA For. Serv. Res. Pap. PNW-253. Pacific Northwest Forest and Range Experiment Station, Portland, Oregon. 21 pp.

Taylor, L.R. 1989. Objective and experiment in long-term research. In: Long-term studies in ecology: approaches and alternatives (G.E. Likens, ed.). Springer-Verlag, New York. Pp. 20-70.

Tilman, D. 1989. Ecological experimentation: strengths and conceptual problems. In: Long-term studies in ecology: approaches and alternatives (G.E. Likens, ed.). Springer-Verlag, New York. Pp. 136-157.

Walters, C.J., C.S. Holling. 1990. Large-scale management experiments and learning by doing. Ecology 71(6): 2060-2068.

Wiens, J.A. 1999. The science and practice of landscape ecology. Pp. 371-383 in Landscape Ecological Analysis: Issues and Applications (J.M. Klopatek and R.H. Gardner, eds.). Springer-Verlag, New York.


[1] Pacific Northwest Research Station, USDA Forest Service, P.O. Box 3890, Portland, OR 97208-3890 USA.
Fax: +1-503-808-2020; Email: [email protected]