Study design type
When thinking about study design, it helps to understand what type of study is required to meet the monitoring objectives (Table 1).
Monitoring study designs are commonly classified as:
- descriptive (including baseline assessments) studies
- assessment of change studies
- improvement of system understanding studies (e.g. cause and effect).
Table 1 Common types of study design for water quality monitoring
|Feature||Descriptive||Assessment of change||System understanding|
|Purpose||To provide a snapshot of ecosystem condition; typically undertaken at specific points in time.||To track change in water quality over space or time against water quality targets, where:
||To understand key drivers and cause–effect relationships.|
|Typical analysis techniques||
A monitoring study may combine some or all of these types, depending on the complexity of the monitoring program objective or the number of objectives. For example, long-term large-scale aquatic monitoring programs often have multiple objectives, so all 3 types of study design may be directly relevant.
How these study types relate to the 7 typical uses of the Water Quality Management Framework is presented in Table 2.
Table 2 Emphasis on common types of monitoring study designs for 7 typical uses of the Water Quality Management Framework
|Study design type||Purpose||Developing a water quality management plan||Applying for a development approval||Assessing a waste discharge||Investigating an unexpected event||Assessing a remediation study||Conducting a baseline study||Implementing a broadscale monitoring program|
|Descriptive||Summarise ecosystem condition||Weak emphasis||Weak emphasis||–||Weak emphasis||Weak emphasis||Strong emphasis||Weak emphasis|
|Descriptive||Understand baseline sources of variation||Weak emphasis||Strong emphasis||–||–||–||Strong emphasis||Weak emphasis|
|Assessment of change||Set guideline values and assess condition||Strong emphasis||–||Strong emphasis||–||Weak emphasis||–||Strong emphasis|
|Assessment of change||Analyse trend over space and time||Strong emphasis||–||Weak emphasis||Strong emphasis||Strong emphasis||–||Strong emphasis|
|System understanding||Identify drivers of and pressures on water quality||–||Weak emphasis||–||Strong emphasis||–||Weak emphasis||Weak emphasis|
Descriptive studies gather data to assess the state or condition of a system. We typically use them to quantify the spatial or temporal attributes of important variables within the system.
Examples of descriptive studies:
- reconnaissance surveys may use descriptive data to identify prevalence, abundance or other characteristics (e.g. extreme values) of variables in the study region or system, as well as variation in the data
- pilot studies often aim to quantify variation in the data to inform the selection of design parameters (where, what, when and how many) ahead of a more comprehensive study
- baseline studies may seek to quantify the background concentrations and condition at a particular point in time prior to a potential future disturbance or development to the system (e.g. for State of the Environment reporting).
In baseline studies, the aim is to collect sufficient relevant data about particular aspects of the system (e.g. water chemistry) to subsequently determine whether or not a disturbance has occurred. Sediment cores may be useful in identifying the magnitude and possibly the timing of some past disturbances.
Baseline designs are provided by some long-term water quality network programs that mainly monitor physical and chemical measurement variables. Such programs are conducted to detect or document any completely unanticipated changes in water quality. Even in these cases, it is best to decide in advance which measurement variables to monitor, and which directions and sizes of changes or trends in those variables would be important (Green 1979, ANZECC & ARMCANZ 2000).
When you know what changes to expect in the measurement variable, you can refine the study design to avoid two very common pitfalls:
- collecting insufficient data to reliably detect the trend or change, or
- collecting so much or such inappropriate data that either there is redundancy, or ecologically trivial changes may be detected through analysis.
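The first pitfall can be checked roughly before any sampling starts. The following is a minimal sketch only, assuming a simple two-sided comparison of two means with normally distributed data and a known common standard deviation. The function name and the example numbers are hypothetical, and real programs will usually need a calculation tailored to the actual design.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size, sd, alpha=0.05, power=0.8):
    """Approximate samples per group for a two-sided, two-sample z-test.

    effect_size: smallest change in the mean that is worth detecting
    sd: assumed common standard deviation (same units as effect_size)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = z.inv_cdf(power)          # quantile for the desired power
    n = 2 * ((z_alpha + z_power) * sd / effect_size) ** 2
    return ceil(n)


# Hypothetical numbers: detect a 0.5 mg/L shift when the sd is 1.0 mg/L.
print(samples_per_group(effect_size=0.5, sd=1.0))
```

Under these assumptions, halving the change that must be detected roughly quadruples the number of samples needed, which is why the size of an important change should be decided before sampling begins.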
Well-designed baseline studies — for which the likely nature of the disturbance can be anticipated — are a prerequisite of the strong designs for studies that assess change. If the focus of the study has been descriptive, it may not be possible to analyse the collected data to demonstrate causality. You should determine this need in advance, when defining the monitoring program objectives.
Assessment of change studies
When descriptive monitoring studies are repeated over time at the same locations, they can be used to assess change. Such studies require relatively detailed planning so that locations can be identified and resampled.
Data analyses for these studies can range from comparatively simple calculations of trends and correlations to more complex evaluations that qualify (and quantify) whether there has been a change of measurable significance (refer to statistical methods in data analysis).
Central to many typical uses of the Water Quality Management Framework, assessment of change studies:
- help to track waste water discharges against guideline values
- underpin many broadscale monitoring programs or water quality management plans that aim to assess trends over space and time
- help to retrospectively investigate an unexpected event (e.g. algal blooms) or track improved water quality following remediation or management intervention.
In some cases, one of the objectives of monitoring is to evaluate the effects of a particular input or disturbance (which is possibly driven by the underlying conceptual model). If the timing and location of the disturbance are known, 3 categories of design are applicable (modified after Green 1979):
- before–after, control–impact (BACI) designs
- inference from change over time
- inference from change over space.
Before–after, control–impact (BACI) designs
At its simplest, suppose that before the potential effect occurs, two types of site can be identified: those that will be subjected to the disturbance and those that will not.
In the design, the same variable is monitored at both types of site before and after the disturbance to determine whether or not the pattern of behaviour over time at the disturbed sites changes relative to the control sites. After the disturbance starts, if the variable’s pattern of behaviour in the affected areas differs from its pattern of behaviour in the control areas, the differences are relatively unlikely to be due to chance.
BACI designs have evolved in response to the common observation that the values of measurement parameters often differ naturally between any two ostensibly identical sites. The strongest versions of these designs base their inferences on interaction terms in a statistical analysis rather than on simple comparisons of means between sites.
The logic of this procedure is best demonstrated first by discussion of Green’s (1979) formulation of a BACI design, which is now regarded as the weakest of all BACI designs, and then by an outline of subsequent improvements to the basic scheme.
Green (1979) proposed that environmental change would be detected if a measurement variable were sampled from two separate sites, once before and once after a disturbance. One of the sites would be the impact site (subjected to the disturbance and potentially affected by it). The other site would be the control site, which would be similar in all relevant respects to the impact site except that it would not be subject to the disturbance.
If the impact site were affected by the disturbance, then, Green argued, this would be apparent in a significant interaction term in an analysis of variance (ANOVA), where the factors in the analysis would be ‘time’ with 2 levels (‘before’ and ‘after’) and ‘site’ with 2 levels (‘control’ and ‘impact’). In graphical terms, the behaviour of the impact site would change relative to the behaviour of the control site after the disturbance (Figure 1). The values of the measurement variable would not have to be identical in the two sites before the disturbance because the inference would be based on the interaction term in the analysis.
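The quantity that the interaction term tests can be written as (impact after minus impact before) minus (control after minus control before). The sketch below computes only this point estimate from subsample values. The function name and the turbidity numbers are hypothetical, and judging significance would still require the ANOVA described above.

```python
from statistics import mean


def baci_contrast(control_before, control_after, impact_before, impact_after):
    """Estimate the time-by-site interaction from subsample values.

    Returns (impact after - impact before) - (control after - control before).
    """
    control_change = mean(control_after) - mean(control_before)
    impact_change = mean(impact_after) - mean(impact_before)
    return impact_change - control_change


# Hypothetical turbidity subsamples (NTU); the sites differ before the disturbance.
contrast = baci_contrast(
    control_before=[4.0, 5.0, 6.0],
    control_after=[5.0, 6.0, 7.0],
    impact_before=[8.0, 9.0, 10.0],
    impact_after=[12.0, 13.0, 14.0],
)
print(contrast)
```

Because the contrast uses changes within each site, a constant offset between the two sites cancels out. If both sites change by the same amount, the contrast is zero.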
Although Green’s (1979) scheme was an important conceptual advance for environmental scientists, the notion of basing the inference of change on single sampling events from single sites of each type was criticised. The inference would be based exclusively on subsampling within each combination of site type and time (Hurlbert 1984, Stewart-Oaten et al. 1986); another site-specific disturbance event, unrelated to that being monitored, could confound the conclusions from such a design.
Improvements to BACI designs
The preferred approach to circumvent this problem is to monitor more than one control site and to use multiple sampling events before and after the disturbance, as in the so-called multiple before–after control–impact (M-BACI) designs of Keough & Mapstone (1995) and Underwood (1996) (Figure 2).
Important choices need to be discussed when adopting an M-BACI design approach, including:
- locations of sites
- number of ‘before’ sampling events
- sampling effort required to model trends and dependencies through time.
Earlier literature about M-BACI designs often focuses on ANOVA but other statistical procedures, such as generalised linear models (GLMs), may be more appropriate and more flexible for handling data that are not normally distributed.
Consider the data requirements of statistical procedures before collecting any data. Good study designs should always have the subsequent analysis in mind, but they need to be driven by design principles and constructed robustly enough to allow different analysis options. You can then respond better to the data that are ultimately collected (e.g. where there are missing values or errors).
Variants of BACI designs have been proposed and fully discussed by Stewart-Oaten et al. (1986), Underwood (1991, 1992, 1994) and Keough & Mapstone (1995, 1997).
One commonly promoted variant applies to situations where there is a pair of sites — a single control and a single impact site — sampled on many occasions before and after the disturbance, called before–after control–impact paired differences (BACIP) design by Stewart-Oaten et al. (1986) (Figure 3).
For the inference to be strong from a BACIP design, the sites must be closely matched and some restrictive assumptions must be applied to the behaviour of the measurement variable at the two sites.
For example, if the measurement variable was the abundance of fish, then it would be unlikely that the population patterns and dynamics would be identical at the two sites.
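The paired-differences idea can be sketched as follows: take the impact minus control difference on each sampling occasion, then ask whether the mean difference shifted after the disturbance. This sketch uses an exact permutation test as an illustrative choice. It is not presented as the procedure of Stewart-Oaten et al. (1986), and it assumes the differences are exchangeable (for example, that there is no serial correlation). All names and data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def bacip_differences(impact, control):
    """Impact minus control for each paired sampling occasion."""
    return [i - c for i, c in zip(impact, control)]


def exact_permutation_p(before_diffs, after_diffs):
    """Two-sided exact permutation p-value for a shift in the mean difference.

    Enumerates every relabelling, so it is only practical for small samples.
    """
    pooled = list(before_diffs) + list(after_diffs)
    n_before = len(before_diffs)
    observed = abs(mean(after_diffs) - mean(before_diffs))
    total = 0
    extreme = 0
    for idx in combinations(range(len(pooled)), n_before):
        grp_before = [pooled[k] for k in idx]
        grp_after = [pooled[k] for k in range(len(pooled)) if k not in idx]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(grp_after) - mean(grp_before)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical data: 4 sampling occasions before and 4 after the disturbance.
before = bacip_differences([5.1, 4.8, 5.3, 5.0], [4.9, 4.7, 5.0, 4.9])
after = bacip_differences([6.4, 6.0, 6.6, 6.3], [5.0, 4.8, 5.1, 4.9])
print(exact_permutation_p(before, after))
```

With only 4 occasions in each period, the smallest two-sided p-value this test can return is 2/70, which illustrates why the number of 'before' sampling events matters.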
The BACIP design approach should be used only if a single control site is all that can be found, because localised site-specific events unrelated to the disturbance of interest can become confounded with the effect of most interest.
Osenberg & Schmitt (1994) described salutary examples of the problems inherent in BACIP designs for a marine system. The term ‘randomised intervention analysis’ (RIA) has been applied to BACIP designs (Carpenter et al. 1989).
If detections of unusual substances only linked to human activity (e.g. specialised pesticides or toxicants, unusual isotopes) are made after a disturbance has occurred, then there may be sufficient — if not unequivocal — evidence to infer environmental impact without the need to collect any data from before the disturbance or from spatial control or reference sites. Very good evidence from auxiliary studies would need to be compiled to establish that concentrations of the substances below the detection level of the laboratory analysis were ecologically harmless.
M-BACI designs are a popular extension to BACI that allow multiple control sites and one or more impact sites. Typically there are several sampling times both before and after the impact. M-BACI designs are preferable to the single-control BACIP and BACI designs on account of their stronger characterisation of the natural background spatial variation. If a water quality variable does change for impact sites relative to the control sites after the impact starts, there is greater confidence that the difference is real.
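To illustrate how multiple control sites characterise background variation, the sketch below compares the before-to-after change at one impact site with the spread of changes across several control sites. It is a descriptive summary only, not a formal M-BACI analysis, and the function names, site names and values are hypothetical.

```python
from statistics import mean, stdev


def site_change(before, after):
    """Mean after the disturbance minus mean before it, for one site."""
    return mean(after) - mean(before)


def mbaci_summary(control_sites, impact_site):
    """Summarise the impact-site change against the control-site changes.

    control_sites: dict of site name -> (before_values, after_values)
    impact_site: (before_values, after_values)
    """
    control_changes = [site_change(b, a) for b, a in control_sites.values()]
    impact_change = site_change(*impact_site)
    return {
        "impact_change": impact_change,
        "control_mean_change": mean(control_changes),
        # stdev needs at least two control sites
        "control_sd_change": stdev(control_changes),
        "contrast": impact_change - mean(control_changes),
    }


# Hypothetical conductivity data from 3 control sites and 1 impact site.
controls = {
    "C1": ([1.0, 1.0], [2.0, 2.0]),
    "C2": ([3.0, 3.0], [3.0, 3.0]),
    "C3": ([2.0, 2.0], [4.0, 4.0]),
}
summary = mbaci_summary(controls, ([5.0, 5.0], [10.0, 10.0]))
print(summary)
```

A contrast that is large relative to the spread of the control-site changes is harder to attribute to natural site-to-site variation, which is the advantage a single control site cannot offer.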
Multivariate extensions of the BACI approach are considered by Faith et al. (1991, 1995) and Kedwards et al. (1999). These use multivariate methods to capture important features of the multivariate observations. An application of a MBACI(P) design in streams for annual biological monitoring (where the ‘M’ denotes multiple control sites and the ‘P’ denotes paired observations) is described in SSD (2013).
Inference from change over time
In this category of monitoring study designs, changes in water quality variables are detected by comparing data from one or more sites before and after the disturbance. Changes in water quality due to the disturbance are confounded with other changes that may occur over time.
In some circumstances, there are no suitable control areas so changes associated with a disturbance can only be inferred by comparing post-disturbance data with pre-disturbance data collected from the same site. With no spatial controls, there is a chance that an unrelated disturbance may have coincided with the disturbance that is being monitored or assessed.
Useful statistical procedures to analyse such data include (but are not restricted to):
- regression analyses
- trend analyses
- time-series analyses.
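As one illustration of a trend analysis, the sketch below computes the Mann-Kendall S statistic, its normal-approximation Z score and the Sen slope. This non-parametric method is chosen here only as an example and is not prescribed by the text above. The sketch assumes independent observations, no tied values and distinct sampling times. The function name and data are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import median


def mann_kendall(times, values):
    """Return (S, Z, Sen slope) for a monotonic trend in values over times."""
    pairs = list(combinations(range(len(values)), 2))
    # S counts later-minus-earlier increases minus decreases.
    s = sum((values[j] > values[i]) - (values[j] < values[i]) for i, j in pairs)
    n = len(values)
    var_s = n * (n - 1) * (2 * n + 5) / 18  # valid only without tied values
    if s > 0:
        z = (s - 1) / sqrt(var_s)
    elif s < 0:
        z = (s + 1) / sqrt(var_s)
    else:
        z = 0.0
    # Sen slope: median of all pairwise slopes (requires distinct times).
    slopes = [(values[j] - values[i]) / (times[j] - times[i]) for i, j in pairs]
    return s, z, median(slopes)


# Hypothetical annual median nitrate concentrations (mg/L).
years = [2015, 2016, 2017, 2018, 2019, 2020]
nitrate = [0.40, 0.43, 0.41, 0.47, 0.50, 0.52]
print(mann_kendall(years, nitrate))
```

Serial correlation between sampling events inflates the apparent significance of Z, so a result from a sketch like this should be treated as exploratory.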
Sometimes the term ‘intervention analysis’ is used when time-series analysis is applied to a defined disturbance (e.g. Welsh & Stewart 1989).
These statistical procedures constitute a large and complicated area of applied statistics, the detail of which is beyond the scope of the Water Quality Guidelines. You should seek expert statistical advice when planning and analysing these data. Pay particular attention to the modelling of interdependencies between successive sampling events and the selection of appropriate sampling intervals for the disturbance being monitored or assessed (e.g. Millard et al. 1985).
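A simple first check on interdependence between successive sampling events is the lag-1 autocorrelation. The sketch below also applies a common approximation for the effective number of independent observations, which assumes a first-order autoregressive process and evenly spaced samples. The function names and the data are hypothetical, and this is no substitute for expert advice.

```python
from statistics import mean


def lag1_autocorrelation(series):
    """Lag-1 sample autocorrelation of an evenly spaced series.

    Raises ZeroDivisionError if every value in the series is identical.
    """
    m = mean(series)
    num = sum((a - m) * (b - m) for a, b in zip(series, series[1:]))
    den = sum((x - m) ** 2 for x in series)
    return num / den


def effective_sample_size(series):
    """Approximate n_eff = n * (1 - r1) / (1 + r1), assuming an AR(1) process."""
    r1 = lag1_autocorrelation(series)
    return len(series) * (1 - r1) / (1 + r1)


# Hypothetical weekly dissolved oxygen readings (mg/L).
readings = [7.9, 8.1, 8.0, 7.6, 7.4, 7.5, 7.8, 8.2, 8.3, 8.0]
print(lag1_autocorrelation(readings), effective_sample_size(readings))
```

When the autocorrelation is positive, the effective sample size is smaller than the number of readings, so sampling more frequently adds less information than the raw count suggests.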
Often, these statistical procedures require data from a large number of sampling events and are most applicable to measurements of physical and chemical variables (e.g. Welsh & Stewart 1989), although biological measurement variables have been used in such designs (e.g. fish ventilation by Thompson et al. 1982). For such long-term designs, pay particular attention to coping with irregular sampling intervals and the inevitable missing data because classical statistical techniques are sensitive to both these occurrences (e.g. Galpin & Basson 1990).
Modern water quality sensors and instruments can often provide monitoring data at high frequencies. For instance, optical sensors may readily produce measurements of nutrient concentrations, dissolved oxygen or suspended sediment concentrations at 1-minute intervals or less. The volume and nature of these data present different challenges in making inferences. It is always important to keep the objective firmly in mind (e.g. high-frequency data may be less useful if the objective is to assess trends over years).
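If the objective concerns long-term trends, one option is to aggregate high-frequency readings to a coarser time step before analysis. The sketch below averages readings by calendar day and skips missing values. The function name and the readings are hypothetical, and the appropriate summary statistic and time step depend on the monitoring objective.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_means(readings):
    """Average (timestamp, value) readings by calendar day.

    readings: iterable of (datetime, float or None); None marks a missing value.
    Returns a dict of date -> mean value, ordered by date.
    """
    by_day = defaultdict(list)
    for timestamp, value in readings:
        if value is not None:  # skip missing readings
            by_day[timestamp.date()].append(value)
    return {day: mean(vals) for day, vals in sorted(by_day.items())}


# Hypothetical 1-minute turbidity readings (NTU) spanning two days.
sensor = [
    (datetime(2020, 1, 1, 23, 58), 4.0),
    (datetime(2020, 1, 1, 23, 59), 6.0),
    (datetime(2020, 1, 2, 0, 0), None),
    (datetime(2020, 1, 2, 0, 1), 7.0),
]
print(daily_means(sensor))
```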
Inference from change over space
In this category of study designs, we have either unaffected control sites or sites that have been affected to varying degrees by the disturbance but no valid comparable data collected before the disturbance.
In some cases, sites used for the comparison may be:
- upstream of the disturbed site
- on unaffected tributaries in river systems or estuaries
- in adjacent water bodies (e.g. wetlands, freshwater or saline lakes), or
- distributed along some disturbance gradient (e.g. increasing distance from a point source) as appropriate to the water bodies and issues of concern.
While the change over space is emphasised, there are often temporal aspects and changes to consider across the different sites.
Inferences should not be based solely on changes over time or changes over space unless there are no valid control sites or pre-disturbance data. Suitable spatial or temporal controls should always be used if they are available.
Often disturbances have already occurred or are alleged to have occurred, and scientists are required to judge the severity of impact or monitor the situation, either to assess whether or not recovery is occurring or to assess the success of remedial actions. Such studies have no useful pre-disturbance data so inferences about the disturbance rely on spatial patterns. These patterns are found either in contrasts between disturbed and undisturbed sites, or in sites chosen to represent a gradient of disturbance.
The disadvantage of this class of design is that the observed pattern may be confounded with other environmental changes that are not related to the disturbance being monitored or assessed.
In rivers, to monitor recovery or dilution of the measurement variable, you can select and sample one control site upstream of a disturbance and multiple sites downstream of the disturbance. This design, while intuitively appealing, has two challenges:
- If sites are too close together, there may be intercorrelation between them that may mask changes.
- Any considerable natural variation in the measurement variable may not be captured in a single control site so differences between the control and disturbed sites may not be due solely to the disturbance itself.
Multiple control sites — if they can be found — provide a stronger basis for inferring impacts resulting from a disturbance.
Specialists should be asked to clarify whether sites can be chosen to satisfy the assumptions of the analysis, or whether sufficient data are being collected to identify any spatial intercorrelations between the sites, which would allow valid inferences to be drawn.
Sometimes, it is not possible to find control sites that are undisturbed but resemble the disturbed site in all other important respects. Instead, reference sites are identified that are deemed to represent standards. Then values of the chosen measurement variables at the disturbed sites are compared with values of the same parameters at the reference sites. This approach has been used for macroinvertebrate community structure in the Australian River Assessment System (AUSRIVAS) procedure.
Alternatively, a gradient of disturbance (values of measurement variables that increase or decrease with distance from a point or boundary) can be identified:
- within the area surrounding the disturbed site (e.g. seabed surrounding an oil rig), or
- at a number of sites across the landscape (e.g. a series of wetlands along a salinity gradient).
For example, Fabricius et al. (2005) considered changes in algal, coral and fish assemblages along water quality gradients on the inshore Great Barrier Reef.
Gradients of disturbance (for example, in concentrations of toxicants in sediments, or in the abundance of species of benthic animals or plants) are spatial patterns that are poorly described by classical statistical techniques, such as ANOVA and regression. Spatial statistical tools (e.g. Cressie 1993, Schabenberger & Gotway 2004) are likely to be more appropriate for describing these patterns. However, classical spatial statistical tools can require very large areas to be sampled, and often large numbers of sites (Rossi et al. 1992).
In gradient analyses (spatial analyses of data from sites that lie along a gradient of disturbance), some independent measure or surrogate of the disturbance (e.g. distance from source) is correlated with values of the biological measurement parameter.
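As a minimal sketch of this kind of correlation, the code below computes a Spearman rank correlation between distance from a source and a response variable. Rank correlation is an illustrative choice because it only assumes a monotonic relationship. It ignores spatial dependence between sites, and the function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            result[k] = avg
        pos = end + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical gradient: distance from an outfall (m) and taxon richness.
distance = [10, 50, 100, 200, 400]
richness = [4, 7, 9, 12, 15]
print(spearman(distance, richness))
```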
When an aspect of community structure is being measured, a large range of multivariate techniques (e.g. ordination and clustering) can be used to relate the biological pattern to the spatial pattern. Dissimilarity measures are popular for expressing multivariate biological responses (e.g. Warwick & Clarke 1993, Anderson et al. 2008, Legendre & Legendre 2012).
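One widely used dissimilarity measure is the Bray-Curtis dissimilarity, sketched below for two samples of taxon abundances. It is shown only as an example and is not singled out by the text above. The function name and the abundances are hypothetical.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two abundance vectors.

    Both vectors list non-negative abundances for the same taxa in the same order.
    Returns 0 for identical samples and 1 for samples that share no taxa.
    """
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must cover the same taxa")
    total = sum(sample_a) + sum(sample_b)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(sample_a, sample_b)) / total


# Hypothetical macroinvertebrate counts for 3 taxa at two sites.
reference_site = [10, 0, 5]
disturbed_site = [5, 5, 5]
print(bray_curtis(reference_site, disturbed_site))
```

Computing this value for every pair of sites gives the dissimilarity matrix that ordination and clustering methods typically take as input.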
In practice, most long-term monitoring programs need to assess changes across both space and time, sometimes simultaneously (e.g. SSD 2013). The relative consideration given to the spatial and temporal dimensions will depend on the sampling frequency and duration, and the number and arrangement of sites. The resulting data can be viewed as multiple time series or multiple spatial slices.
Other studies that measure change
Investigative studies are made in response to a perception that some environmental change has occurred. Their goal is to determine the timing or nature of the change.
Examples of investigative studies include:
- studies carried out after unexpected fish kills
- research programs investigating the extent and severity of acid rain.
Most rehabilitation and restoration programs will not have reliable data collected over long time periods before the environmental impact. The main problem in setting decision-making criteria for such programs lies in defining appropriate targets for the selected indicators by which the success of a program can be judged.
If no pre-disturbance data exist, then the sampling program should include appropriate undisturbed sites that can act as control or reference sites for the disturbed area. This entails making assumptions about similarity in behaviour of the indicator over time in the affected area and the control areas in the absence of the disturbance. There is a danger that the reference sites will not represent a realistic target for the affected area (Wiens & Parker 1995). Furthermore, there are likely to be situations where there are no appropriate reference sites, and the target reference condition will need to be set by other means.
Setting targets in these situations is difficult and will often involve subjective judgements from expert panels or stakeholders.
Studies for system understanding
Some studies aim to find out more about a particular system, for example to better understand the physical, chemical and biological processes that operate in aquatic ecosystems.
A deeper understanding may reveal relationships among variables operating in the system, enabling predictions to be made about the behaviour of the system in situations beyond existing data and experience.
Monitoring for system understanding requires deeper thinking about indicators that represent the pressures, state and response of the system. Many different frameworks can be used to decide which indicators and variables to monitor in that light.
For example, the DPSIR modelling approach (EEA 2007) describes the interactions between society and the environment by accounting for driving forces–pressures–states–impacts–responses. After monitoring data and information have been collected on all the different elements in the DPSIR chain, potential interactions and causations between different aspects of the system can be considered. Hedge et al. (2013) proposed an integrated monitoring framework for the Great Barrier Reef World Heritage Area underpinned by a variant of DPSIR.
If the objective of a study is to establish cause-and-effect relationships, then the sampling program must be designed for this purpose from the start. You may need to run additional experimental studies to manipulate the system in a controlled manner and measure the system’s response. In this case, the sampling regime must be designed so that at least one of the potential outcomes is unequivocal. Manipulative experiments are routinely conducted in laboratories but in the field they can be expensive, and it may be impossible to adequately control all the confounding variables.
In studies of cause and effect, even the best experimental or survey design may be insufficient by itself. No design can completely defend against all unidentified confounding influences (Stewart-Oaten et al. 1986, Eberhardt & Thomas 1991, Underwood 1994). To establish cause and effect, you must assemble independent lines of evidence and circumvent the potential for inferential problems akin to those faced by epidemiologists.
Beyers (1998) attempted to combine epidemiological criteria (Hill 1965) with postulates for environmental toxicology (Suter 1993). Not all these criteria need to be met, but strength, consistency and specificity provide the strongest evidence for causation. Where a disturbance is chemical, indicators of exposure (e.g. contaminant concentrations in tissues) provide strong evidence for causation. Whether or not Beyers’s emphases are appropriate is likely to be debated as investigators try to formalise the ways in which they combine evidence in environmental studies.
The results from a study that measures change contribute to system understanding by demonstrating a link between a particular human activity and a specified effect in the system under consideration. But the results do not establish cause and effect. Some other unknown cause may have resulted in the effect. To establish a cause–effect relationship, some characteristic of the activity needs to be linked to the change observed.
Studies for improving system understanding are not always done to show cause and effect. For example, conceptual process models can outline a system understanding with respect to chemical and physical stressors, such as nutrients and toxicants. A study could monitor the changing significance of processes in these models, over space and time.
References
Anderson M, Gorley R & Clarke K 2008, PERMANOVA+ for PRIMER: Guide to Software and Statistical Methods, PRIMER-E Ltd, Plymouth.
ANZECC & ARMCANZ 2000, Australian Guidelines for Water Quality Monitoring and Reporting, National Water Quality Management Strategy Paper No 7, Australian and New Zealand Environment and Conservation Council & Agriculture and Resource Management Council of Australia and New Zealand, Canberra.
Beyers DW 1998, Causal inference in environmental impact studies, Journal of the North American Benthological Society 17(3): 367–373.
Carpenter SR, Frost TM, Heisey D & Kratz TK 1989, Randomized intervention analysis and the interpretation of whole-ecosystem experiments, Ecology 70: 1142–1152.
Cressie NAC 1993, Statistics for Spatial Data, Revised Edition, John Wiley and Sons, New York.
Eberhardt LL & Thomas JM 1991, Designing environmental field studies, Ecological Monographs 61: 53–73.
EEA 2007, Halting the Loss of Biodiversity by 2010: Proposal for a first set of indicators to monitor progress in Europe, EEA Technical Report no. 11/2007, European Environment Agency, Copenhagen.
Fabricius K, De’ath G, McCook L, Turak E & Williams DM 2005, Changes in algal, coral and fish assemblages along water quality gradients on the inshore Great Barrier Reef, Marine Pollution Bulletin 51: 384–398.
Faith DP, Dostine PL & Humphrey CL 1995, Detection of mining impacts on aquatic macroinvertebrate communities: Results of a disturbance experiment and the design of a multivariate BACIP monitoring programme at Coronation Hill, Northern Territory, Australian Journal of Ecology 20: 167–180.
Faith DP, Humphrey CL & Dostine PL 1991, Statistical power and BACI designs in biological monitoring: comparative evaluation of measures of community dissimilarity based on benthic macroinvertebrate communities in Rockhole Mine Creek, Northern Territory, Australia, Australian Journal of Marine and Freshwater Research 42(5): 589–602.
Galpin JS & Basson B 1990, Some aspects of analysing irregularly spaced time dependent data, South African Journal of Science 86: 458–461.
Green RH 1979, Sampling Design and Statistical Methods for Environmental Biologists, John Wiley and Sons, New York and Toronto.
Hedge P, Molloy F, Sweatman H, Hayes K, Dambacher J, Chandler J, Gooch M, Chinn A, Bax N & Walshe T 2013, An Integrated Monitoring Framework for the Great Barrier Reef World Heritage Area, Department of the Environment, Canberra.
Hill AB 1965, The environment and disease: association or causation? Proceedings of the Royal Society of Medicine 58: 295–300.
Hurlbert SH 1984, Pseudoreplication and the design of ecological field experiments, Ecological Monographs 54(2): 187–211.
Kedwards TJ, Maund SJ & Chapman PF 1999, Community-level analysis of ecotoxicological field studies: II. Replicated design studies, Environmental Toxicology and Chemistry 18(2): 158–166.
Keough MJ & Mapstone BD 1995, Protocols for Designing Marine Ecological Monitoring Programs Associated with BEK Mills, National Pulp Mills Research Program No. 11, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Canberra.
Keough MJ & Mapstone BD 1997, Designing environmental monitoring for pulp mills in Australia, Water Science and Technology 35: 397–404.
Legendre P & Legendre LFJ 2012, Numerical Ecology, Volume 24, 3rd Edition, Elsevier, Amsterdam and Oxford.
Millard SP, Yearsley JR & Lettenmaier DP 1985, Space–time correlation and its effects on methods for detecting aquatic ecological change, Canadian Journal of Fisheries and Aquatic Sciences 42: 1391–1400.
Osenberg CW & Schmitt RJ 1994, Detecting human impacts in marine habitats, Ecological Applications 4: 1–2.
Rossi RE, Mulla DJ, Journel AG & Franz EH 1992, Geostatistical tools for modeling and interpreting ecological spatial dependence, Ecological Monographs 62: 277–314.
Schabenberger O & Gotway CA 2004, Statistical Methods for Spatial Data Analysis, Chapman and Hall/CRC.
SSD 2013, Environmental monitoring protocols to assess potential impacts from Ranger minesite on aquatic ecosystems: Macroinvertebrate community structure in streams, Internal Report 591, July 2013, Supervising Scientist Division, Darwin.
Stewart-Oaten A, Murdoch WW & Parker KR 1986, Environmental impact assessment: “pseudoreplication” in time? Ecology 67: 929–940.
Suter GW 1993, Retrospective risk assessment, in: GW Suter, LW Barnthouse, SM Bartell et al. (eds), Ecological Risk Assessment, Lewis Publishers, Ann Arbor.
Thompson KW, Deaton ML, Foutz RV, Cairns J Jr & Hendricks AC 1982, Application of time-series intervention analysis to fish ventilatory response data, Canadian Journal of Fisheries and Aquatic Sciences 39(3): 518–521.
Underwood AJ 1991, Beyond BACI: experimental designs for detecting human environmental impacts on temporal variation in natural populations, Australian Journal of Marine and Freshwater Research 42(5): 569–587.
Underwood AJ 1992, Beyond BACI: the detection of environmental impact on populations in the real, but variable, world, Journal of Experimental Marine Biology and Ecology 161(2): 145–178.
Underwood AJ 1994, On beyond BACI: sampling designs that might reliably detect environmental differences, Ecological Applications 4(1): 3–15.
Underwood AJ 1996, Environmental Design and Analysis in Marine Environmental Sampling, IOC Manuals and Guides No. 34, UNESCO.
Warwick RM & Clarke KR 1993, Comparing the severity of disturbance: a meta-analysis of marine macrobenthic community data, Marine Ecology Progress Series (MEPS) 92: 221–231.
Welsh DR & Stewart DB 1989, Applications of intervention analysis to model the impact of drought and bushfires on water quality, Australian Journal of Marine and Freshwater Research 40(3): 241–257.
Wiens JA & Parker KR 1995, Analyzing the effects of accidental environmental impacts: approaches and assumptions, Ecological Applications 5: 1069–1083.