Weight of evidence

Methods and technical guidance for reaching the correct or valid conclusion in water/sediment quality assessments, together with management frameworks that support such evaluations, have steadily improved since the ‘integrated water quality assessment’ concept in the ANZECC & ARMCANZ (2000) guidelines.

Our methodology to incorporate weight of evidence in water/sediment quality assessments is consistent with recent moves internationally (e.g. USEPA 2016, Suter et al 2017).

Integrated environmental assessment models reduce risk of making poor decisions

Government jurisdictions in Australia and New Zealand are developing environmental indicator sets according to issues and the key elements of the conceptual contaminant pathway that depict causal links.

We have adapted the pressure–stressor–response (PSR) conceptual model used by the Queensland Government (DNRM 2013) and applied it to water/sediment quality assessments in the Water Quality Guidelines. The one minor refinement is that ‘response’ (R) is replaced with ‘ecosystem receptor’ (ER), giving the pressure–stressor–ecosystem receptor (PSER) model.

Adoption of the PSER model, with information from lines of evidence drawn from and integrated across each of the pressures, stressors and receptors, reduces the risk of making a wrong decision regarding the cause-and-effect linkages for a particular issue.

Weight-of-evidence evaluations minimise subjective judgements

Evaluating multiple lines of evidence requires a weight-of-evidence assessment.

Assessments may be qualitative, semi-quantitative or quantitative, which entail successively decreasing levels of best professional judgement. Ideally, an assessment should involve minimal best professional judgement, such that independent assessors of the lines of evidence will reach the same conclusions.

Be mindful that an overly complex assessment could be costly and require larger data inputs, which may be impractical to implement.

Criteria-guided judgement and logic tables useful for assessments

Suter & Cormier (2011) provided a framework for performing weight-of-evidence assessments and reviewed a number of different methods available for weighing evidence, including:

criteria-guided judgement (recommended) — evidence is strengthened if it exhibits these criteria: strength of association; temporality; dose-response; biological plausibility; consistency; coherency; specificity; experimental evidence; analogy. The best-known example of this method is the criteria for causation first developed by Hill (1965) for human epidemiology and now applied widely for environmental assessments.
independent applicability — evidence from any line of evidence (exceedance of a water quality objective, toxicity or observed field biological impact) may be sufficient grounds for concluding impaired water/sediment quality. While the approach is inherently conservative, it should not be adopted as a rationale for collection of data from just one line of evidence. Reliance on a single line of evidence without adequate baseline data for other lines of evidence precludes any ability to properly assess water/sediment quality, including identification of cause and extent of possible impact.
logic tables (recommended) — results of an assessment for each of (typically) 3 or more lines of evidence are evaluated against a set of standard conclusions derived from all possible combinations of outcomes. An example is the sediment quality triad approach to determining pollution-induced degradation, developed by Chapman (1990), in which evidence is acquired from chemistry, toxicity tests and biological surveys.
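
To make the contrast between a logic table and independent applicability concrete, the following is a minimal sketch for 3 lines of evidence in the style of the sediment quality triad. The conclusion wording is an illustrative paraphrase written for this sketch, not the standard conclusions of the Water Quality Guidelines, and the names `LINES`, `STANDARD_CONCLUSIONS`, `logic_table_conclusion` and `independent_applicability` are hypothetical.

```python
from itertools import product

# Lines of evidence, in the fixed order used for the table keys.
LINES = ("chemistry", "toxicity", "biodiversity")

# Illustrative conclusions only (an assumption for this sketch).
# True means the line of evidence indicates an adverse outcome.
STANDARD_CONCLUSIONS = {
    (True, True, True): "strong evidence of contaminant-induced degradation",
    (False, False, False): "strong evidence of no contaminant-induced degradation",
    (True, False, False): "contaminants present but probably not bioavailable",
    (False, True, False): "unmeasured contaminants or conditions may be causing toxicity",
    (False, False, True): "biological change probably not caused by toxic contaminants",
    (True, True, False): "contaminants are probably stressing the system",
    (False, True, True): "unmeasured contaminants are probably causing degradation",
    (True, False, True): "contaminants not bioavailable, or change not caused by contaminants",
}

# A logic table must cover every possible combination of outcomes.
assert set(STANDARD_CONCLUSIONS) == set(product((True, False), repeat=len(LINES)))


def logic_table_conclusion(outcomes):
    """Look up the standard conclusion for a dict of line -> adverse outcome."""
    key = tuple(bool(outcomes[line]) for line in LINES)
    return STANDARD_CONCLUSIONS[key]


def independent_applicability(outcomes):
    """Any single adverse line of evidence is treated as impairment."""
    return any(outcomes[line] for line in LINES)


example = {"chemistry": True, "toxicity": False, "biodiversity": False}
print(logic_table_conclusion(example))
print(independent_applicability(example))
```

For the same inputs, independent applicability reports impairment, whereas the logic table distinguishes a chemical exceedance with no measured biological effect from the other 7 combinations.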

Criteria-guided judgement and logic tables are commonly applied in the literature to condition, causal or predictive assessments of ecosystems. They are the 2 approaches recommended in the Water Quality Guidelines.

Logic tables are more readily adapted to the planning phase of water/sediment quality assessments. They can be used to guide the selection of indicators for measurement, not just for the evaluation of evidence, so they have broader applicability across key steps in the Water Quality Management Framework (discussed later).

Logic tables can be improved when the matrix of standard conclusions is adapted for specific cases, which Suter & Cormier (2011) termed ‘case-specific logic’. For these reasons, logic tables are the preferred method for demonstrating weight of evidence in a case-specific manner across 7 typical uses in the Water Quality Guidelines.

If necessary, you can derive even stronger conclusions by applying both logic tables and epidemiological criteria to water/sediment quality assessments, and we provide additional guidance on this approach.

There have been other advances on the ‘integrated water quality assessment’ concept in the ANZECC & ARMCANZ (2000) guidelines, and in how we apply weight of evidence.

Decision trees developed in the ANZECC & ARMCANZ (2000) guidelines for toxicants in waters and sediments were largely based on testing against guideline values and invoked additional investigations if the measured concentrations exceeded the respective guideline values. Those further studies could either:

determine the measured (or modelled) bioavailable forms of the toxicants (assuming that the guideline values were conservative and were true triggers for further investigation), or
if necessary, include site-specific investigations involving toxicity or biodiversity assessment.

The focus on chemical assessments until a problem is encountered presents risks because the information required from the subsequent receptor lines of evidence may not be available if baseline data are lacking. Such decision trees most usefully serve the role of determining the bioavailable fraction within the chemical and physical line of evidence.
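
The chemistry-first logic of those decision trees can be sketched as a short function. This is a simplified illustration that assumes a single toxicant and a single guideline value; the function name, parameters and returned step labels are hypothetical and are not taken from the ANZECC & ARMCANZ (2000) guidelines.

```python
from typing import Optional


def toxicant_decision_tree(
    measured: float,
    guideline_value: float,
    bioavailable: Optional[float] = None,
) -> str:
    """Return the next step for one toxicant (illustrative labels only)."""
    # No exceedance: the guideline value acts as a trigger and is not tripped.
    if measured <= guideline_value:
        return "no further investigation triggered"
    # Exceedance, but the bioavailable fraction has not yet been assessed.
    if bioavailable is None:
        return "measure or model the bioavailable fraction"
    # The refined (bioavailable) concentration does not exceed the value.
    if bioavailable <= guideline_value:
        return "exceedance not confirmed for the bioavailable fraction"
    # Exceedance persists: receptor lines of evidence are now required.
    return "site-specific toxicity or biodiversity assessment"


print(toxicant_decision_tree(3.0, 2.0))
print(toxicant_decision_tree(3.0, 2.0, bioavailable=2.5))
```

The final branch is the point at which receptor data are first requested, which is why a lack of baseline data for those lines of evidence becomes a problem only after an exceedance has already occurred.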

The need to anticipate the requirement for more information from additional lines of evidence in future assessments means that the weight-of-evidence process developed for multiple lines of evidence in the Water Quality Guidelines is more than a traditional assessment of water/sediment quality.

Weight-of-evidence evaluations are typically made after data have been gathered and are being assessed for guideline value exceedances (at Step 6 in the Water Quality Management Framework).

We provide you with advice on the desired lines of evidence to consider at the outset of an investigation. This advice is provided by highlighting potential outcomes in the various combinations of lines of evidence selected. This helps you to anticipate possible future assessments, either written into licensing and reporting agreements or arising from unexpected events.

Sometimes the need for additional lines of evidence may not become apparent until some measurements have been made (a decision made at Step 7 in the framework).

Making early decisions about which lines of evidence to measure ensures:

adequate baseline data are available to properly detect and assess possible future effects
management goals for aquatic ecosystems, which are often couched in terms of protection of biodiversity, have suitable surrogate indicators to assess possible change.

Real-world water and sediment quality issues will often focus on:

chemical exceedances
environmental confounding
unknown contaminant mixtures
the need for timely results
improved inference (often poor with single lines of evidence)
efficient targeting of sites for remediation.

These issues are best addressed through weight-of-evidence evaluations and this is why weight of evidence is the central platform for water/sediment quality assessments in the Water Quality Guidelines.

The strongest conclusions arising from a water/sediment quality assessment will be reached when lines of evidence are selected from across the PSER causal pathway (Figure 1).

Pressures

Pressures are external activities that affect water quality. Consideration of pressure lines of evidence, such as land-based activities, is important but is out of scope for this discussion.

Stressors

For ecosystem protection, the stressors may be:

chemical (e.g. toxicants where a guideline value is based on assessments of toxicity)
physical and chemical (stressors that are either directly or indirectly toxic where guideline values are determined from background or reference-site data), or
associated with other causes (e.g. flow).

Chemical and physical

The chemical and physical line of evidence comprises:

chemical indicators — constitute dissolved contaminant concentrations for waters or particulate or pore water concentrations for sediments. These are fundamental measures for comparison against default guideline values (DGVs) or site-specific guideline values. Ideally, they should be refined to consider chemical speciation and the bioavailable fraction of the measured concentrations.
physical indicators — associated with water quality and include turbidity and water temperature. A number of chemical and physical properties of waters and sediment are also important as co-stressors that modify bioavailability.

Non–water quality

The non–water quality line of evidence covers measures of stressors that might affect the ecosystem receptors, particularly biodiversity, such as:

stream flow
invasive species
catchment alteration.

Measurement programs for these non–water quality related stressors should be conducted if there is any chance of confounding (Suter & Cormier 2013) in the interpretation of water quality effects.

Ecosystem receptors

Measures of ecosystem receptors include biodiversity, toxicity, bioaccumulation and biomarkers. Biodiversity — the key line of evidence for ecosystem receptors — is often linked directly to management goals. Toxicity provides a supportive line of evidence for effects on ecosystems, and, where impacts are observed, toxicity, bioaccumulation, and to a lesser extent biomarkers, provide supportive lines of evidence for inference and causation. These collective lines of evidence span a range of levels of biological organisation, encompass indicators of exposure and of effect and, as a result, have either indirect or direct ecological relevance, as shown in Figure 2.

Figure 2 Relationship of ecosystem receptor lines of evidence to level of biological organisation, exposure/effect detection and ecological relevance

Biodiversity

Alterations to biodiversity (populations and communities of organisms) reflect direct ecosystem-level effects that may manifest in both the short and longer term, depending on the biotic assemblage examined. In future, effects on populations and communities will be more readily and efficiently assessed using improved ecogenomic techniques.

Toxicity

Toxicity is an important measure of an organism’s response, with acute (short-term) or chronic (long-term) endpoints measured either in the laboratory or in semi-field settings (e.g. mesocosms). The toxicity line of evidence is regarded as having direct ecological relevance because toxicity endpoints such as mortality, growth and reproduction represent measures of the health of whole organisms that link directly to population level effects. Semi-field (e.g. mesocosm) studies can also indicate effects at the community and ecosystem level.

Linking cause and effect is important when evaluating different lines of evidence. Results from single-toxicant toxicity testing can provide that direct link. For more complex waters (mixtures, effluents), direct toxicity assessment (DTA) coupled with toxicity identification and evaluation (TIE) can help to identify a cause where an adverse response is measured. For these reasons, toxicity testing is useful in helping unravel the confounding that can be associated with biodiversity assessments.

Bioaccumulation and biomarkers

Bioaccumulation and biomarkers have been grouped here largely for convenience; however, there is a common link. Both relate to responses that occur at the sub-organismal level and mostly represent indicators of exposure to toxicants (although indicators of effect also exist); consequently, their ecological relevance is indirect.

Bioaccumulation provides an indication of the extent to which contaminants are bioavailable and taken up by organisms, and is best quantified by comparison with controls. Evidence of the physiological effects of bioaccumulation comes from other lines of evidence (toxicity, biodiversity).

Biomarkers can be used as indicators of exposure or cellular/sub-cellular effects. The ecological relevance of biomarkers needs to be considered in relation to the significance of the measured effects and how reversible they are.

More detailed guidance on ecosystem receptor indicators and their selection can be found in ANZG (2021).

The weight-of-evidence process illustrated in Figure 1 depicts a chain of steps from which indicators are selected and then evaluated for their suitability, to help you determine whether or not you have achieved a water/sediment quality objective.

Select indicators for measurement (lines of evidence)

The 2 key components in the indicator selection hierarchy are:

causal pathway — comprises the key elements of the PSER conceptual contaminant pathway that depict causal links: pressure, stressor, ecosystem receptor
line of evidence — contained in each of the elements of the causal pathway:
each pressure represents a single line of evidence, although a water or sediment quality issue may involve more than one pressure (as depicted by Pressure x, Pressure y in Figure 1)
either water/sediment–quality related stressor (e.g. chemical and physical) or non–water quality stressor lines of evidence
receptor lines of evidence, including biodiversity, toxicity, bioaccumulation and biomarkers.

What’s not illustrated in Figure 1 is that each line of evidence contains broad indicator types, including indicators, from which specific parameters are selected for measurement (Table 1), where:

indicators are parameters that can be used to provide a measure of a pressure, stressor or ecosystem condition response
parameters are measurable or quantifiable characteristics.

Table 1 Examples of indicator types, indicators and parameters relevant to each line of evidence

Line of evidence	Indicator type	Indicator	Parameter
Pressure	Cropping	Pesticide use	Tonnes of insecticide applied per hectare per year
Stressor (chemical and physical)	Toxicants, or physical and chemical stressors (2 fixed types recognised)	Ammonia or dissolved oxygen	Total ammonia or dissolved oxygen in percent saturation
Stressor (non–water quality)	Altered flow or sedimentation	Stream discharge or sediment movement	Total stream volume per unit time, or sediment particle size
Ecosystem receptor (biodiversity)	Biotic assemblages or individual species	Benthic macroinvertebrate communities or species population size	Macroinvertebrate community structure or total species abundance
Ecosystem receptor (toxicity)	In situ or laboratory toxicity	Chronic toxicity to fish	14-day fish growth measurement
Ecosystem receptor (bioaccumulation & biomarkers)	Bioaccumulation, biomarkers of exposure or biomarkers of effect	Metal body burden, genetic biomarker or histopathology	Copper tissue concentration in mg/kg, DNA strand breaks or histological alterations
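
The hierarchy in Table 1 (causal pathway element, line of evidence, indicator type, indicator, parameter) can be represented as a simple record type. The sketch below is one possible representation, not a prescribed data model; the class name `Measurement`, its field names and the helper `lines_by_element` are hypothetical, and each row uses only one of the alternatives listed in the corresponding table row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One path through the indicator selection hierarchy."""
    pathway_element: str   # pressure, stressor or ecosystem receptor
    line_of_evidence: str
    indicator_type: str
    indicator: str
    parameter: str         # the measurable or quantifiable characteristic


# One example per row of Table 1 (one alternative chosen from each row).
TABLE_1_EXAMPLES = [
    Measurement("pressure", "pressure", "cropping", "pesticide use",
                "tonnes of insecticide applied per hectare per year"),
    Measurement("stressor", "chemical and physical",
                "physical and chemical stressors", "dissolved oxygen",
                "dissolved oxygen in percent saturation"),
    Measurement("stressor", "non-water quality", "altered flow",
                "stream discharge", "total stream volume per unit time"),
    Measurement("ecosystem receptor", "biodiversity", "biotic assemblages",
                "benthic macroinvertebrate communities",
                "macroinvertebrate community structure"),
    Measurement("ecosystem receptor", "toxicity", "laboratory toxicity",
                "chronic toxicity to fish", "14-day fish growth measurement"),
    Measurement("ecosystem receptor", "bioaccumulation and biomarkers",
                "bioaccumulation", "metal body burden",
                "copper tissue concentration in mg/kg"),
]


def lines_by_element(rows):
    """Group the selected lines of evidence under each causal pathway element."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row.pathway_element, []).append(row.line_of_evidence)
    return grouped


for element, lines in lines_by_element(TABLE_1_EXAMPLES).items():
    print(element, "->", lines)
```

Grouping by causal pathway element in this way makes it easy to see whether a planned program draws evidence from each of the pressure, stressor and ecosystem receptor elements.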

Collect and analyse the evidence

After indicators and respective parameters are selected for each line-of-evidence investigation, a measurement program ensues, which culminates in analyses to determine whether or not the measured responses indicate a water-quality related change.

Evaluate the results

The mix of results from the different lines of evidence assessed in this way is then evaluated against a set of standard conclusions derived from all possible combinations of outcomes — or against other criteria — to draw conclusions about water-quality related change and, if confirmed, the possible cause and extent of impact.

Work through the Water Quality Management Framework

Weight of evidence is applied at 3 key steps in the Water Quality Management Framework (as well as Steps 2 and 7):

Step 1: Examine current understanding — identify critical PSER causal pathway elements during initial conceptual modelling
Step 3: Define relevant indicators — select lines of evidence and associated indicators (key indicators from the lines of evidence representing PSER are identified to best infer impact)
Step 6: Assess if draft water/sediment quality objectives are met — compile and evaluate lines of evidence through a weight-of-evidence evaluation (evidence arising from the multiple lines of evidence selected at Step 3 is evaluated to draw conclusions about ambient water/sediment quality).

We explain how to use weight of evidence in each of these 3 steps, focusing on the protection of aquatic ecosystems. This approach is also applicable to other community values.

Complete initial steps before defining relevant indicators

For any water/sediment quality assessment, the initial step is to document the current understanding for a particular issue. This involves the identification of pressures and their associated stressors (water-quality and non–water quality related), and likely ecosystem (biological) receptors and their responses (at Step 1 of the framework). This is best achieved by creating a conceptual model that depicts the causal links along the conceptual contaminant pathway.

When defining management aims (at Step 2 of the framework), community values, management goals and levels of protection are established.

Based on your actions at Steps 1 and 2, the relevant lines of evidence are determined in Step 3 of the framework (and revisited in Step 7 of the framework).

Consider which lines of evidence to select

Your choice of lines of evidence, as determined by the PSER causal pathway, is not constrained, but it should suit the particular issue and the understanding of it captured in the conceptual model.

A single line of evidence cannot address all the desired outcomes from a weight-of-evidence evaluation, such as detecting and determining the extent of impacts, and determining the likely cause.

We have summarised some benefits and weaknesses of different lines of evidence in Table 2.

Table 2 Considerations in the selection of pressure, stressor and ecosystem receptor lines of evidence

Causal pathway element	Lines of evidence	Considerations (in isolation of other lines of evidence)
Pressure	Measures of the pressures (or surrogates) responsible may correlate with such ‘events’ and identify priorities for management	Must be linked to measurement of stressor and community value receptors
Stressor	Chemical and physical stressor: Direct measure of potential cause; significant exceedance could lead directly to management action (e.g. remediation). Sediment chemistry may record past events (archival value).	Uncommonly representative of the management goals (unless ‘no change’). Possible presence of multiple (unmeasured) toxicants responsible. Toxicant(s) or nutrients in waters may be transient or taken up in the system; less so for sediments. Observations may be unrelated to toxicants.
Stressor	Non–water quality related stressor: Eliminate confounding; identify other factors potentially responsible for observations	None
Ecosystem receptor	Biodiversity: Very often directly linked to management goals. Magnitude and extent of impact. Macrobenthos studies may capture the effects of transient toxicants (pulsed releases) that chemistry and toxicity testing may miss; respond to gradients. Some taxa have specific water quality responses (e.g. nutrients/algae) (diagnostic).	Biodiversity is affected by many non–water quality related stressors
	Toxicity: Identify a water quality problem (and thereby address biodiversity confounding). Toxicity identification and evaluation (TIE) to identify a cause of toxicity response (diagnostic). Identify the amount of toxicity in the event of a spill or other incident.	Toxicant(s) in waters may be transient in the system; less so for sediments
	Bioaccumulation & biomarkers: Measure of exposure; may capture transient toxicants (pulsed releases) that chemistry and toxicity testing miss; ecosystem, human health and other community values	Suitable organisms may not be available; stressor may not bioaccumulate

Desirable outcomes and associated lines of evidence from a weight-of-evidence evaluation include:

assessing achievement of management goals, magnitude and extent of impact — biodiversity
capturing the effects of transient toxicants (pulsed releases that chemical and physical stressors and toxicity may miss) — macrobenthos studies (biodiversity), bioaccumulation, biomarkers and, in some cases, sediment chemistry
associating an effect with a water quality cause and possibly identifying the stressor or its type (usually not possible with sole measurement of biodiversity) — toxicity and TIE, combined with chemistry.
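
The pairing of desirable outcomes with lines of evidence listed above can be sketched as a coverage check on a candidate selection. This is a simplified illustration: it treats any biodiversity measurement as equivalent to macrobenthos studies and omits the "in some cases" role of sediment chemistry. The names `OUTCOME_REQUIREMENTS` and `outcomes_addressed` are hypothetical.

```python
# Each outcome maps to alternative sets of lines of evidence; an outcome is
# addressed when the selection contains every line in at least one set.
OUTCOME_REQUIREMENTS = {
    "assess management goals, magnitude and extent of impact": [
        {"biodiversity"},
    ],
    "capture effects of transient toxicants": [
        {"biodiversity"},
        {"bioaccumulation and biomarkers"},
    ],
    "associate an effect with a water quality cause": [
        {"toxicity", "chemical and physical"},
    ],
}


def outcomes_addressed(selected):
    """List the outcomes that the selected lines of evidence can address."""
    selected = set(selected)
    return [
        outcome
        for outcome, alternatives in OUTCOME_REQUIREMENTS.items()
        if any(required <= selected for required in alternatives)
    ]


print(outcomes_addressed({"chemical and physical"}))
print(outcomes_addressed({"biodiversity", "toxicity", "chemical and physical"}))
```

Under these assumptions, a program that measures only the chemical and physical line of evidence addresses none of the listed outcomes, while adding biodiversity and toxicity addresses all 3.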

These generic benefits and limitations reinforce the need to include a number of lines of evidence in a water quality monitoring program to properly assess potential effects.

Rate the quality of evidence

Our approach to the weight-of-evidence process encourages you to consider the selection of the desired lines of evidence at the outset of an investigation. This is why the process is well suited to all types of uses and situations, including those programs in the planning and baseline data collection phases (refer to our examples).

When selecting suitable lines of evidence, we recommend that you construct a quality of evidence table that will help you to determine the number of lines of evidence needed to satisfactorily reach a conclusion.

Quality of evidence will increase as you add lines of evidence. The examples we provide later use a qualitative rating. In general, selection of just one line of evidence has potential to generate ‘low’ quality evidence in a water/sediment quality assessment. Inclusion of more lines of evidence improves the quality from moderate to high. However, in many instances, not all lines of evidence will be required.

Your choice might well be a balance between cost and the level of quality associated with the suite of lines of evidence.

Nevertheless, the risks in not acquiring sufficient baseline information for key stressors and ecosystem receptors (e.g. biodiversity) to assess potential effects were discussed earlier. If there are multiple pressures, then you may also need to choose a set of lines of evidence to address each pressure and attribute the observed responses among them.

Issues will generally have associated pressure(s). In some instances, the pressure may be the issue (e.g. acid sulfate soils) or it may relate to a number of actual or potential issues (e.g. multiple agricultural developments in a catchment).

For any matrix combination, the power of detection in field measurement programs will be increased by including more reference or control sites and/or more monitoring sites placed along any putative or probable disturbance gradients. Read about factors dictating lines of evidence and indicator selection.
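
A rough sense of why extra sites help can be had from a textbook power approximation. The sketch below assumes a simple two-group comparison (control versus potentially impacted sites), equal numbers of sites in each group, normally distributed site means with a known common standard deviation, and a two-sided test. These assumptions, and the names `approx_power`, `difference`, `sd` and `sites_per_group`, are illustrative and not part of the Water Quality Guidelines; gradient designs are not covered.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(difference, sd, sites_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two group means."""
    z = NormalDist()
    # Standard error of the difference between two group means.
    se = sd * sqrt(2.0 / sites_per_group)
    # Critical value for the chosen two-sided significance level.
    crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = difference / se
    # Probability of rejecting in either tail when the true shift applies.
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


# Power to detect a difference of one standard deviation as sites are added.
for n in (3, 6, 12):
    print(n, round(approx_power(difference=1.0, sd=1.0, sites_per_group=n), 3))
```

Under these assumptions, power rises steadily as sites are added to each group. A real monitoring design would estimate variability from baseline data and use a method suited to the actual sampling design.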

We provide examples of the quality of evidence associated with selecting different combinations of lines of evidence for the 7 typical uses of the Water Quality Management Framework.

Quality of evidence tables assume that:

matrix outcomes are based on adequate field and experimental designs
processing, analyses and reporting of water quality samples (including biological samples) and data are undertaken to high standards and in a timely manner.

Constructing and populating a quality of evidence table for each pressure will provide you with an appropriate mix of lines of evidence for the diagnostic information required for water/sediment quality evaluations.

We have provided quality of evidence tables associated with different combinations of the lines of evidence selected for the 7 typical uses of the Water Quality Management Framework.

One of our quality of evidence tables, investigating an unexpected event, has been populated with as many matrix combinations as practically conceivable. Only a few key matrix combinations are provided for the other typical uses. Therefore, when assessing the quality of evidence for these other uses it is important to apply similar logic to that used for the well-populated example.

For each of the selected lines of evidence, choose which indicator type and specific indicators to select for the issue based on:

type of pressure and associated stressors
management goals
assigned levels of protection
spatial scale (broadscale, site-specific)
water type (freshwater, marine water)
compartment (water or sediment)
ecosystem type (e.g. river or wetland)
location (e.g. wet–dry tropics, temperate coastal rivers of south-eastern Australia).

Developing a water quality management plan — quality of evidence

Developing a water quality management plan is similar to broadscale water quality monitoring, but the plan is usually prepared at a smaller spatial scale.

The defined plan will require monitoring to describe the existing condition of the ecosystem (the stressors that are exceeding accepted quality guideline values and where any exceedances have affected the ecosystem). The improvement plan will then look to manage those pressures and stressors that are identified as significant contributors to these adverse effects.

Regular monitoring is required to confirm that any impaired water quality is not a transitory event, and the provision for biological assessment provides the necessary assurance that water quality has been maintained according to the management goals.

An analysis of the quality of evidence for several key line of evidence combinations is provided in Table 3. (Our quality of evidence table for investigating an unexpected event has been populated with many more matrix combinations if you need further details.)

Table 3 Example of a quality of evidence table when developing a water quality management plan

Causal pathway elements	Lines of evidence selected	Quality of evidence rating	Reasons for quality assessment
Stressors	Chemical & physical (S)ᵃ	Low	Quantify toxicants and other stressors, taking into account effects of events (e.g. high rainfall).
Stressors and ecosystem receptors	Biodiversity (ER) Chemical & physical (S)	Moderate	Biodiversity integrates the broadscale ecosystem response, while chemical & physical line of evidence identifies stressors to be managed.
	Biodiversity (ER) Chemical & physical (S) Toxicity (ER)	High	Addition of toxicity assessment may highlight specific toxicant concerns.
	Biodiversity (ER) Bioaccumulation & biomarkers (ER) Chemical & physical (S) Toxicity (ER)	High	Bioaccumulation may indicate particular toxicant(s) of concern and integrate variable bioavailability of them.
	Biodiversity (ER) Bioaccumulation & biomarkers (ER) Chemical & physical (S) Non–water quality (S) Toxicity (ER)	Very high	Measuring other factors, such as invasive species, cyanobacteria, periphyton blooms, salination and flow, would help to clearly attribute any biological responses to the correct cause. Addition of biomarkers of effect might provide evidence of stressor exposure in sensitive biota and of their sub-organismal responses.

a. A single toxicity or biodiversity line of evidence will also have a low quality of evidence rating
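
A quality of evidence table such as Table 3 can be encoded as a lookup keyed on the set of lines of evidence selected. The sketch below transcribes only the combinations shown in Table 3 and its footnote, and returns `None` for any combination the table does not rate rather than guessing. The names `TABLE_3_RATINGS` and `quality_of_evidence` are hypothetical.

```python
# Ratings transcribed from Table 3; keys are the lines of evidence selected.
TABLE_3_RATINGS = {
    frozenset({"chemical & physical"}): "low",
    frozenset({"biodiversity", "chemical & physical"}): "moderate",
    frozenset({"biodiversity", "chemical & physical", "toxicity"}): "high",
    frozenset({"biodiversity", "bioaccumulation & biomarkers",
               "chemical & physical", "toxicity"}): "high",
    frozenset({"biodiversity", "bioaccumulation & biomarkers",
               "chemical & physical", "non-water quality",
               "toxicity"}): "very high",
    # Footnote a: a single toxicity or biodiversity line is also rated low.
    frozenset({"toxicity"}): "low",
    frozenset({"biodiversity"}): "low",
}


def quality_of_evidence(selected):
    """Return the Table 3 rating, or None if the combination is not listed."""
    return TABLE_3_RATINGS.get(frozenset(selected))


print(quality_of_evidence(["chemical & physical", "biodiversity"]))
print(quality_of_evidence(["toxicity", "chemical & physical"]))
```

Using a `frozenset` as the key makes the lookup independent of the order in which lines of evidence are listed. Combinations that return `None` need to be rated by applying logic similar to that used in the more fully populated table for investigating an unexpected event.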

Applying for a development approval — quality of evidence

This issue is closely related to our example for conducting a baseline study but it does not necessarily refer to only a greenfields (undisturbed) site. In this example, the pre-development background conditions are being established to determine the potential additive effects from contaminants that might be introduced to the water body as a consequence of the proposed development. This will usually include the development of agreed management goals and appropriate water quality objectives pertinent to the development.

An analysis of the quality of evidence for several key line of evidence combinations is provided in Table 4. (Our quality of evidence table for investigating an unexpected event has been populated with many more matrix combinations if you need further details.)

Table 4 Example of a quality of evidence table when applying for a development approval

Causal pathway elements	Lines of evidence selected	Quality of evidence rating	Reasons for quality assessment
Stressors	Chemical & physical (S)ᵃ	Low	Need background data but these are only part of the story.
Stressors and ecosystem receptors	Chemical & physical (S) Toxicity (ER)	Moderate	Measured toxicant concentrations and potential toxicity will identify the acceptability of additional inputs from the proposed development.
	Biodiversity (ER) Chemical & physical (S)	Moderate, possibly high	Background ecological condition also needed and will likely highlight key potential sensitivities of the receiving ecosystems to the proposed development.
	Biodiversity (ER) Chemical & physical (S) Toxicity (ER)	High	The level of toxic effects and potential sensitivity of the local ecosystem can be used to assess potential added stressor effects.
	Biodiversity (ER) Bioaccumulation & biomarkers (ER) Chemical & physical (S) Toxicity (ER)	Very high	Bioaccumulation provides additional background data for the toxicant(s) of concern and can highlight areas of higher background bioavailability of contaminants.

a. A single toxicity or biodiversity line of evidence will also have a low quality of evidence rating

Assessing a waste discharge — quality of evidence

This issue concerns the effects from exceedances of an agreed waste discharge licence. These typically relate to the chemical and physical line of evidence and the concentrations (or loads) of discharged toxicants or other stressors, such as nutrients, turbidity, microbial inputs or pH, and any associated detriment to biodiversity.

Direct measurement of the discharge might assess compliance with chemistry guideline values but the ultimate concern is for the effects, after dilution, on the receiving ecosystem outside of a mixing zone.

To defend an exceedance in the first instance, the chemical and physical line of evidence would determine the fraction of the licence concentration (usually dissolved filterable) that is bioavailable after mixing.

Chemistry within the chemical and physical line of evidence identifies the potential stressors (cause) but measurement of ecosystem receptor lines of evidence is needed to assess the effects, both with respect to toxicity and to ecosystem health (biodiversity, algal blooms). These are increasingly being required as part of the licence in some jurisdictions and as necessary additional information in the event of non-compliance.

This is why additional lines of evidence become critical if an exceedance occurs, for assessing the consequence of the exceedance, and hence the level of penalty or compensation and the need for remediation or modifying the licence condition.

Ecosystem receptor lines of evidence used for early detection, assessing trends or assessing ecosystem health may require multiple samples and adequate baseline data to achieve necessary statistical power. This is why pre-disturbance data and ongoing monitoring should be conducted in anticipation of potential exceedances.

An analysis of the quality of evidence for several key line of evidence combinations is provided in Table 5. (Our quality of evidence table for investigating an unexpected event has been populated with many more matrix combinations if you need further details.)

Table 5 Example of a quality of evidence table when assessing a waste discharge

Causal pathway elements	Lines of evidence selected	Quality of evidence rating	Reasons for quality assessment
Stressors	Chemical & physical (S)ᵃ	Low	May be sufficient for comparison with licence values but needs translation to the bioavailable fraction in the receiving system and may miss spikes or transient exceedances.
Stressors and ecosystem receptors	Biodiversity (ER) Chemical & physical (S)	Moderate	Longer-term effects suggest action needed. Better to have short-term effects for immediate management.
	Chemical & physical (S) Toxicity (ER)	High	Showing toxicity attributable to contaminants in discharge exceeding guideline values strengthens the identification of non-compliance. Spatiotemporal chronic effects of interest (noting discharges may be intermittent). Biodiversity is a longer-term effect.
	Biodiversity (ER) Bioaccumulation & biomarkers (ER) Chemical & physical (S) Toxicity (ER)	Very high	Add a consideration of ecosystem health for comprehensive evaluation. Bioaccumulation can also be useful to assess the potential for long-term effects.

a. A single toxicity or biodiversity line of evidence will also have a low quality of evidence rating

Investigating an unexpected event — quality of evidence

In the case of a fish kill (or a chemical spill), the initial evidence is a biological effect and the challenge is to define the cause.

Sometimes, events such as fish kills have a likely stressor source causing the effect, or toxicant concentrations or physical and chemical stressors are known to have exceeded their guideline values. In such cases, a full weight-of-evidence investigation to identify cause will not be required.

Nonetheless, a single chemical and physical line of evidence would typically be assigned ‘low’ quality of evidence because:

multiple (unmeasured) toxicants or other stressors are potentially responsible
toxicant(s) or stressor(s) may be transient in the system
observations may be caused by factors unrelated to toxicants.

Usually, characteristics of the specimens involved in the fish kill (species, age classes, locations, general morphology and laboratory pathology examination) will help to narrow down the likely types of toxicants or stressors that could have caused the kill.

The combination of 2 or more lines of evidence selected will be user- and situation-dependent.

There should also be provision in this issue to point the user directly to remediation (with no further evidence required) if exceedance of the guideline value is substantial (assumed ecological impairment of ecosystems, without quantifying this) and the source of the exceedance is known. However, the remediation process in itself would be likely to invoke a separate weight-of-evidence investigation.

Identifying the extent of the problem is likely to be required beyond simply managing the source. This could be a chemistry assessment but might also include an ecological investigation to determine the significance of any effect and to assess conditions before and after remediation, or other management action.

An analysis of the quality of evidence for as many line of evidence matrix combinations as practically conceivable is provided in Table 6. (The quality of evidence tables in our other examples are not as detailed.)

Table 6 Example of a quality of evidence table when investigating an unexpected event

Causal pathway elements	Lines of evidence selected	Quality of evidence rating	Reasons for quality assessment
Stressors	Chemical & physical (S)a	Generally low	Contaminant not bioavailable or might not be measured and/or transient (pulse) but selection of obvious source/toxicant increases quality.
Ecosystem receptors	Toxicity (ER)	Low to moderate	Source of toxicity not measured. No toxicity may indicate a (missed) pulse but if persistent in the system greater likelihood of inferring a water–quality related stressor.
Ecosystem receptors	Biodiversity (ER)	Low to moderate	No response indicates no long-term effect. Response correlating with a putative (spatial) disturbance gradient increases inference. Lack of pressure and stressor information limits conclusions. Effect could be due to unmeasured toxicant pulse.
Pressure and stressors	Pressure (P) Non-water quality (S)	Moderate to high when combined with evidence from other lines of evidence	Measures of the pressure (or surrogates) responsible may correlate with such ‘events’. Other evidence of stress could be important (e.g. weather, overfishing, freshwater inputs to marine systems, engineering works, heavy rainfall, unusual temperatures).
Stressors and ecosystem receptors	Chemical & physical (S) Toxicity (ER)	Moderate	Identification of potential toxicant but no indication of long-term ecosystem effects.
	Biodiversity (ER) Chemical & physical (S)	Moderate	Potential cause-and-effect information but limited if contaminant not bioavailable or transient (pulse). Other effects may be contributing to biodiversity response. Need to check all pressures and stressors.
	Biodiversity (ER) Chemical & physical (S) Toxicity (ER)	High	Contaminant has potential to cause ecosystem harm. May not be conclusive if contaminant transient.
	Biodiversity (ER) Bioaccumulation & biomarkers (ER) Chemical & physical (S) Toxicity (ER)	High	Bioaccumulation adds evidence of potential toxicant(s).
	Biodiversity (ER) Bioaccumulation & biomarkers (ER) Chemical & physical (S) Non-water quality (S) Toxicity (ER)	High	For fish kills, pathological assessments are also usual and assist with identification of the cause from among various candidates.

a. A single toxicity or biodiversity line of evidence will also have a low quality of evidence rating

Assessing a remediation program — quality of evidence

This issue can include evaluating contaminated sediments, defining an area requiring remediation (or dredging) and possibly assessing the success of the remediation.

Sediments are typically contaminated by multiple chemical and physical stressors. The chemical and physical line of evidence should consider bioavailability, and this could include analysis of pore waters or sediment elutriates.

Sometimes, it might be possible to define the area for remediation based on the distribution of contaminants (considering bioavailability). Ideally, this is best supported by toxicity testing of whole sediment or pore waters (or elutriates).

Additional evidence that contaminants are having a biological impact can be obtained from bioaccumulation studies.

Given the high costs of remediation, it will be essential to accurately define the area and depth where remediation is required.

Biodiversity studies would not typically be used to define a remediation need because the sites are usually degraded. Such studies might be included in a broader-scale survey to identify and delineate areas of degraded ecosystem health, or to identify recovery (recolonisation) after the remediation or at the disposal site. Chemistry and toxicity testing are important in defining remediation success.

In this example, we have followed the hierarchical framework recommended in the National Assessment Guidelines for Dredging 2009.

Otherwise, and where an accidental discharge or dumping has occurred in an ecosystem of greater ecological value, biodiversity measurement would be important to assess recovery.

An analysis of the quality of evidence for specific line of evidence combinations is provided in Table 7. (Our quality of evidence table for investigating an unexpected event has been populated with many more matrix combinations if you need further details.)

Table 7 Example of a quality of evidence table when assessing a remediation study

Causal pathway elements	Lines of evidence selected	Quality of evidence rating	Reasons for quality assessment
Stressors	Chemical & physical (S)a	Low	Unless there is an obvious toxicant whose distribution can be mapped, in which case higher.
Stressors and ecosystem receptors	Biodiversity (ER) Chemical & physical (S)	Moderate	Scale of any biodiversity effects needs to be linked to potentially toxic effects to target remediation.
	Chemical & physical (S) Toxicity (ER)	Moderate to high	Linking contaminant concentrations with toxicity better defines the need for remediation.
	Bioaccumulation & biomarkers (ER) Chemical & physical (S) Toxicity (ER)	High	Better defines the need for remediation. Bioaccumulation may be critical for human health and food-chain assessment. This is often the basis for remediation being implemented.
	Biodiversity (ER) Bioaccumulation & biomarkers (ER) Chemical & physical (S) Toxicity (ER)	Very high	Biodiversity usually does not identify the remedial actions needed pre-remediation but often is a trigger for remediation action. Can contribute to priority for remediation because of the severity of effect and ecosystem consequences (food chain) if not undertaken. Assessment of recruitment or re-establishment of biodiversity post-remediation is the most important measure of remediation success.

a. A single toxicity or biodiversity line of evidence will also have a low quality of evidence rating

Conducting a baseline study — quality of evidence

This issue relates to the gathering of baseline (pre-disturbance) data for a greenfields location in advance of future development. During initial conceptual modelling, this task should include an analysis of existing and future pressures.

Adequate baseline characterisation is essential for defining water quality and ecological health, as well as identifying potential organism sensitivities and assessing other baseline pressures. These baseline needs may also extend to pre-disturbance body-burden data in those organisms that provide a bioaccumulation or biomarker potential.

You could develop and undertake toxicity tests for local species to provide a means of deriving site-specific toxicant guideline values (water and sediment) to be used for assessing future effects.

Both sediment and water quality are important. Sediment quality typically reflects the history of contaminant inputs. Water quality represents the status at the time of sampling only.

Line of evidence indicators that vary seasonally or annually will require more extensive baselines.

Where there are existing pressures, extended baselines will better capture the variability in the indicators as those pressures vary in intensity.

An analysis of the quality of evidence for specific line of evidence combinations is provided in Table 8. (Our quality of evidence table for investigating an unexpected event has been populated with many more matrix combinations if you need further details.)

Table 8 Example of a quality of evidence table when conducting a baseline study

Causal pathway elements	Lines of evidence selected	Quality of evidence rating	Reasons for quality assessment
Stressors	Chemical & physical (S)a	Low	Only part of the story. Provides no basis of comparison for other lines of evidence should they be added to the assessment of water/sediment quality after development has occurred.
Stressors and ecosystem receptors	Biodiversity (ER) Chemical & physical (S)	Moderate	If biodiversity is similar to reference and a robust baseline can be established at reference/control and potential impact sites, and it is known that the biodiversity indicators selected respond sensitively to the chemical and physical stressor indicators. Assumes that the development is unlikely to affect other stressors and/or not affect other biodiversity indicators preferentially. Need to establish baseline for a wide suite of potential stressors to anticipate future development effects.
	Chemical & physical (S) Toxicity (ER)	Moderate	Background evidence of toxicity associated with identifiable contaminants improves the assessment.
	Biodiversity (ER) Chemical & physical (S) Toxicity (ER)	High	Important to establish baselines for all lines of evidence. Unlikely to be toxicity for greenfields site but baseline assessments will help to establish potential toxicity of the operating site and site-specific testing will help to determine the sensitivity of the local ecosystems.
	Biodiversity (ER) Bioaccumulation & biomarkers (ER) Chemical & physical (S) Non-water quality (S) Toxicity (ER)	Very high	A bioaccumulation baseline for future assessments is also desirable. Need for future flexibility rather than immediate insight. Establishing baseline habitat condition can assist in later assessment of non-chemistry pressures.

a. A single toxicity or biodiversity line of evidence will also have a low quality of evidence rating

Implementing a broadscale monitoring program — quality of evidence

This issue relates to monitoring over a broad scale to assess the collective effects of a range of receiving water inputs that might include aerial deposition, land-based catchment run-off and industrial or other discharges. It might cover individual water bodies, such as a river, estuary or coastal water, or even a catchment-to-coast assessment of water quality (e.g. for the Great Barrier Reef).

Typically, regular monitoring would be designed to identify the ecosystem responses to contaminants, including nutrients, salinity and turbidity, within specified zones. The aim is to assess the variability of water quality over a large spatial scale and thereby identify regions of management concern.

The frequency of monitoring will differ for sediments (every 1 to 3 years) compared to waters (possibly monthly).

Water quality objectives might be specified in terms of concentrations or loads of stressors entering the system. The weight-of-evidence approach to monitoring will include the incorporation of traditional lines of evidence. However, understanding causes and effects might also require evidence and quantification of management practices in a catchment (land-based and discharges).

While a comparison with water quality objectives for toxicants or physical and chemical stressors might be a first step (chemical and physical line of evidence), the broader-scale effects on the ecosystem as a whole are needed to integrate the effects and deliver the results to stakeholders in the form of State of the Environment reporting or report card summaries.

Biodiversity investigations are particularly important for broadscale assessments as condition indicators, but take care to select appropriate biologically based guideline values or water quality objectives that will provide appropriate information at the scale of the assessment and in the needed time frame.

An analysis of the quality of evidence for specific line of evidence combinations is provided in Table 9. (Our quality of evidence table for investigating an unexpected event has been populated with many more matrix combinations if you need further details.)

Table 9 Example of a quality of evidence table when implementing a broadscale monitoring program

Causal pathway elements	Lines of evidence selected	Quality of evidence rating	Reasons for quality assessment
Stressors	Chemical & physical (S)a	Low	Quantify toxicants and other stressors, taking into account the effects of events (e.g. high rainfall). May need integrating samplers. Unlikely to detect all stressors at the scale of interest.
Stressors and ecosystem receptors	Chemical & physical (S) Toxicity (ER)	Moderate	Any combination of 2 lines of evidence is likely to be insufficient to characterise the ecosystem health of a region to satisfy regulatory agencies.
	Biodiversity (ER) Chemical & physical (S) Non-water quality (S)	Moderate	Biodiversity integrates the broadscale ecosystem response and responses to stressors not measurable with chemistry alone. Chemical and physical line of evidence identifies likely stressors to be managed. For biodiversity, commonly rapid assessment protocols (e.g. AUSRIVAS) are used due to the cost of implementation at the broad scale (time and money). This may limit the sensitivity of the assessment. Could often include other factors, such as invasive species, cyanobacteria, periphyton blooms, salination and flow (at the broad scale there will rarely be a single pressure).
	Biodiversity (ER) Chemical & physical (S) Non-water quality (S) Toxicity (ER)	High	Addition of toxicity assessment may highlight additional concerns and can have better statistical power per unit of expenditure than biodiversity. Habitat alteration can be an important other factor.
	Biodiversity (ER) Bioaccumulation & biomarkers (ER) Chemical & physical (S) Toxicity (ER) Non-water quality (S)	Very high	Bioaccumulation may indicate particular toxicant(s) of concern, particularly if exposure occurs in pulses.

a. A single toxicity or biodiversity line of evidence will also have a low quality of evidence rating

Evaluating multiple lines of evidence

The measurement program for each line of evidence investigation culminates in analyses to determine whether the measured responses indicate a water–quality related change (e.g. guideline value exceedance, observed toxicity, significant statistical test for change in a field biological response). Refer to Data analysis and interpretation template.

For the interpretation of different combinations of results arising from these analyses, ‘default’ interpretative tools provided in the Water Quality Guidelines are:

qualitative tabulation
criteria based on known toxicity-based or other causal responses (e.g. derived from dose–response data from laboratory, field or mesocosm studies).

(For interpretation using criteria, various combinations of these criteria, based on strength, consistency and specificity, provide the strongest evidence for causality).

Different approaches in the literature for more complex multiple line-of-evidence evaluations within and across pressures are provided in Weight-of-evidence evaluation methods.

A weight-of-evidence evaluation of multiple lines of evidence can be undertaken in a few ways, varying from qualitative and semi-quantitative to fully quantitative approaches.

Qualitative assessments

Qualitative approaches involve the use of best professional judgement to determine how the evidence from individual lines of evidence supports a final assessment of cause and effect.

One such approach, in its simplest and most generic form, is illustrated in Table 10. Possible interpretations of the findings are based on responses recorded for the various lines of evidence (all assumed to have been measured and assessed). A response indicates, for example, guideline value exceedance, observed toxicity or significant statistical test for change in a field biological measurement.

Table 10 Interpretations of likely combinations of line of evidence responses assessed in relation to guideline values and reference-site data

Responses from chemical and physical, toxicity, biodiversity, bioaccumulation and biomarker lines of evidence	Interpretation
No responses	No guideline values exceeded and no effects on the ecosystem.
Chemical and physical response only	Contaminants present at concentrations exceeding guideline values but not bioavailable.
Toxicity response only	Toxic effects due to unmeasured contaminants or an unidentified stressor.
Biodiversity response only	Unmeasured contaminants or other factors (e.g. another pressure) contributing to ecological effects.
Chemical and physical, bioaccumulation and biomarker responses	Contaminants exceeding guideline values and bioaccumulating but not toxic.
Chemical and physical and biodiversity responses	Toxicity not seen using the test organisms but effects are still seen on biodiversity (toxicity testing may not have been representative of sensitive taxa or did not reflect higher-level ecosystem responses).
Chemical and physical and toxicity responses	Some resistance to effects on biodiversity (ecosystem resilience overwhelming toxicity to some species), or test species not representative of receiving ecosystem sensitivity.
Toxicity and biodiversity responses	Unmeasured contaminants or stressors are toxic and affecting ecosystem health.
Chemical and physical, toxicity, bioaccumulation and biomarker responses	Measured contaminants are toxic and accumulating but no significant ecological effects are observed (mitigating processes occurring, or ecosystem may have acquired tolerance).
Chemical and physical, toxicity, biodiversity, bioaccumulation and biomarker responses	Measured contaminants exceed guideline values, are toxic and bioaccumulating, and affecting ecosystem health.
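Because Table 10 is essentially a logic table, it can be encoded as a lookup keyed on the set of lines of evidence that responded. The following is a minimal sketch under stated assumptions. The identifiers (`CHEM`, `TOX`, `BIO`, `ACC`, `TABLE_10`, `interpret`) are hypothetical, bioaccumulation and biomarkers are treated as a single combined line of evidence as in the table, and any combination not listed in Table 10 is reported as not tabulated instead of being guessed.

```python
# Hypothetical short names for the lines of evidence used in Table 10.
CHEM = "chemical_physical"
TOX = "toxicity"
BIO = "biodiversity"
ACC = "bioaccumulation_biomarkers"
KNOWN_LINES = frozenset({CHEM, TOX, BIO, ACC})

# Interpretations transcribed from Table 10, keyed by the responding lines of evidence.
TABLE_10 = {
    frozenset(): "No guideline values exceeded and no effects on the ecosystem.",
    frozenset({CHEM}): "Contaminants present at concentrations exceeding guideline values but not bioavailable.",
    frozenset({TOX}): "Toxic effects due to unmeasured contaminants or an unidentified stressor.",
    frozenset({BIO}): "Unmeasured contaminants or other factors (e.g. another pressure) contributing to ecological effects.",
    frozenset({CHEM, ACC}): "Contaminants exceeding guideline values and bioaccumulating but not toxic.",
    frozenset({CHEM, BIO}): "Toxicity not seen using the test organisms but effects are still seen on biodiversity.",
    frozenset({CHEM, TOX}): "Some resistance to effects on biodiversity, or test species not representative of receiving ecosystem sensitivity.",
    frozenset({TOX, BIO}): "Unmeasured contaminants or stressors are toxic and affecting ecosystem health.",
    frozenset({CHEM, TOX, ACC}): "Measured contaminants are toxic and accumulating but no significant ecological effects are observed.",
    frozenset({CHEM, TOX, BIO, ACC}): "Measured contaminants exceed guideline values, are toxic and bioaccumulating, and affecting ecosystem health.",
}


def interpret(responses):
    """Return the Table 10 interpretation for the lines of evidence that responded.

    Assumes, as the table does, that all lines of evidence were measured."""
    key = frozenset(responses)
    unknown = key - KNOWN_LINES
    if unknown:
        raise ValueError(f"Unknown line(s) of evidence: {sorted(unknown)}")
    return TABLE_10.get(key, "Combination not tabulated in Table 10.")


print(interpret([CHEM, TOX]))
print(interpret([ACC]))
```

Keying on a `frozenset` makes the lookup independent of the order in which responses are listed, which mirrors how the table is read.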

The concept in Table 10 has been extended for the Water Quality Guidelines, to evaluate possible interpretations of the findings from the same 7 typical uses of the Water Quality Management Framework used to demonstrate quality of evidence.

Tables 11 to 17 illustrate possible interpretations of the findings based on positive responses recorded for the various lines of evidence, where a positive response might be, for example, guideline value exceedance, toxicity, contaminants present in body tissues, or change to a biodiversity indicator. Indicators representing all lines of evidence are assumed to have been measured.

As with the quality of evidence tables, our example for investigating an unexpected event (Table 14) has been populated with as many matrix combinations as practically conceivable. For the other typical uses, only several key matrix combinations are provided, so apply logic similar to that used in the well-populated example when considering them.

We again assume that the matrix outcomes are based on adequate field and experimental designs, and that processing, analyses and reporting of water quality samples (including biological samples) and data are undertaken to high standards and in a timely manner.

The greatest confidence in the interpretation of each table representing each typical use is obtained with the maximum number of similar responses. The evaluation rating (column 3 of the tables) is a judgement of the certainty in identifying the cause. This could be strengthened by introducing a consequence column. For example, the proportion of habitat affected in a sediment study was ranked as negligible, minor, moderate, major, severe or catastrophic in a study by MacDiarmid et al. (2014).
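One way to picture a consequence column alongside the evaluation rating is as two ordered scales attached to each table row. The sketch below is illustrative only. The class and function names are hypothetical, the consequence ranks are the six listed above, and sorting rows by consequence first and certainty second is an assumption about how a user might prioritise attention, not a rule from the Water Quality Guidelines.

```python
from dataclasses import dataclass
from enum import IntEnum


class Certainty(IntEnum):
    """Evaluation rating: certainty in identifying the cause."""
    LOW = 1
    MODERATE = 2
    HIGH = 3
    VERY_HIGH = 4


class Consequence(IntEnum):
    """Consequence ranks as listed for the sediment study example."""
    NEGLIGIBLE = 1
    MINOR = 2
    MODERATE = 3
    MAJOR = 4
    SEVERE = 5
    CATASTROPHIC = 6


@dataclass(frozen=True)
class EvaluationRow:
    lines_that_responded: tuple[str, ...]
    rating: Certainty
    consequence: Consequence


def rank_for_attention(rows):
    """Order rows by consequence, then by certainty, highest first (assumed priority rule)."""
    return sorted(rows, key=lambda row: (row.consequence, row.rating), reverse=True)


# Hypothetical rows: the ratings and consequences are made up for illustration.
rows = [
    EvaluationRow(("chemical_physical",), Certainty.LOW, Consequence.MINOR),
    EvaluationRow(("chemical_physical", "toxicity"), Certainty.HIGH, Consequence.MAJOR),
    EvaluationRow(("biodiversity",), Certainty.LOW, Consequence.MAJOR),
]
for row in rank_for_attention(rows):
    print(row.consequence.name, row.rating.name, row.lines_that_responded)
```

Using ordered enumerations keeps the two judgements separate, so a low-certainty finding with a severe consequence is not hidden behind a single combined score.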

Table 11 Lines-of-evidence evaluation using the weight-of-evidence process when developing a water quality management plan

Causal pathway elements	Lines of evidence that responded	Evaluation rating	Evaluation conclusion	Recommended management response
Stressors	Chemical and physical (S)a	Low	Quantify toxicants and other stressors, taking into account effects of events (e.g. high rainfall).	Limited evidence for management.
Stressors and ecosystem receptors	Biodiversity (ER) Chemical and physical (S)	Moderate	Biodiversity integrates the broadscale ecosystem response. Chemical and physical line of evidence identifies stressors to be managed.	Need detail on spatial scale, point sources but evidence for concern.
Stressors and ecosystem receptors	Biodiversity (ER) Chemical and physical (S) Toxicity (ER)	High	Addition of toxicity assessment may highlight additional concerns.	Gives greater certainty to toxicants requiring management.
Stressors and ecosystem receptors	Biodiversity (ER) Bioaccumulation & biomarkers (ER) Chemical and physical (S) Toxicity (ER)	High	Bioaccumulation may indicate particular toxicant(s) of concern and integrate variable bioavailability of them.	Strong evidence for action.
Stressors and ecosystem receptors	Biodiversity (ER) Bioaccumulation & biomarkers (ER) Chemical and physical (S) Non-water quality (S) Toxicity (ER)	Very high	Other factors, such as invasive species, cyanobacteria, periphyton blooms, salination and flow, would be useful to clearly attribute any biological responses to the correct cause. Addition of biomarkers of effects might provide evidence of stressor exposures to sensitive biota and suborganism responses by them.	A complete suite of data to manage the water body. Additional expense needs to be justifiable by the size or value of a healthy ecosystem.

a A single toxicity or biodiversity line of evidence will also have a low evaluation rating.

Table 12 Lines-of-evidence evaluation using a weight-of-evidence process to apply for a development approval

Causal pathway elements	Lines of evidence that responded	Evaluation rating	Evaluation conclusion	Recommended management response
Stressors	Chemical and physical (S)a	Low	Need background data but these are only part of the story.	Inadequate background data to allow development.
Stressors and ecosystem receptors	Biodiversity (ER) Chemical and physical (S)	Possibly high	Background ecological condition also needed and will likely highlight key potential sensitivities of the receiving ecosystems to the proposed development.	Improved information, identifying key species to be protected.
Stressors and ecosystem receptors	Biodiversity (ER) Chemical and physical (S) Toxicity (ER)	High	The level of toxic effects and potential sensitivity of the local ecosystem can be used to assess potential added stressor effects.	Good background to compare with proposed contaminant releases.
Stressors and ecosystem receptors	Biodiversity (ER) Bioaccumulation & biomarkers (ER) Chemical and physical (S) Toxicity (ER)	Very high	Bioaccumulation provides additional background data for the toxicant(s) of concern and can highlight areas of higher background bioavailability of contaminants.	Ideal evidence to decide on development approval.

a A single toxicity or biodiversity line of evidence will also have a low evaluation rating.

Table 13 Lines-of-evidence evaluation using a weight-of-evidence process to assess compliance with a waste discharge licence

Causal pathway elements	Lines of evidence that responded	Evaluation rating	Evaluation conclusion	Recommended management response
Stressors	Chemical and physical (S)a	Low	For comparison with licence values but needs translation to the bioavailable fraction in the receiving system.	If exceeding the licence, is generally sufficient for action.
Stressors and ecosystem receptors	Chemical and physical (S) Toxicity (ER)	High	Adding measure of effects might consider spatiotemporal chronic effects on biota (noting discharges may be intermittent).	Strengthens the case indicating that contaminants are indeed in a bioavailable form.
Stressors and ecosystem receptors	Biodiversity (ER) Bioaccumulation & biomarkers (ER) Chemical and physical (S) Toxicity (ER)	Very high	Add a consideration of ecosystem health for comprehensive evaluation. Bioaccumulation can also be useful to assess the potential for long-term effects.	Normally fast action is required but this provides evidence of longer-term potentially damaging effects and possible need to renegotiate the licence.

a. A single toxicity or biodiversity line of evidence will also have a low evaluation rating.

Table 14 Lines-of-evidence evaluation using a weight-of-evidence process to investigate an unexpected event (e.g. fish kill)

Causal pathway elements	Lines of evidence that responded a	Evaluation rating	Evaluation conclusion	Recommended management response
None	na	Low	Event remains unexplained although effect was only short-term (no ecosystem response).	Clean up and further research and monitoring.
Ecosystem receptors	Toxicity (ER)	Moderate	Water quality inferred but transient event (no biodiversity effect); toxicity not due to target toxicant(s); conduct TIE.	Remediation, TIE and further research and monitoring.
Ecosystem receptors	Biodiversity (ER)	Low to moderate to high	Only moderate–high if linked to any pressure or non–water quality related stressor, or spatial gradient sourced to plausible cause. Otherwise lack of pressure, stressor or gradient information limits conclusions.	Remediation and mitigation against future occurrences; possibly further research or monitoring to identify the effect pathway.
Pressure	Pressure (P)	Moderate to high	Water–quality related pressure known to result in such events measured or observed (correlated with the event) but no evidence of the contaminant(s) and no ecosystem responses, hence transient (pulse) event.	Remediation and mitigation against future occurrences, continued monitoring if necessary to identify toxicant.
Pressure and stressors	Non-water quality (S) Pressure (P)	Moderate to high	Non–water quality related pressure and associated stressor known to result in such events measured or observed (correlated with the event) (e.g. illegal fishing, dumping of dead fish) but no water quality contaminant(s) and no ecosystem responses.	Clean up and possible litigation or prosecution. Consider increased surveillance to prevent future occurrences.
Stressors	Chemical & physical (S)	Moderate to high	Plausible or likely toxicant measured but no ecosystem responses could suggest modifying (water quality) factors (e.g. low pH) transient in the system.	Remediate if necessary, and further research and monitoring.
Ecosystem receptors	Bioaccumulation & biomarkers (ER)	Moderate to high	Bioaccumulation or biomarker response but no toxicant(s) exceedance, toxicity nor biodiversity response suggests toxicant(s) transient in the system.	Further investigation to identify stressor and contaminant pathway to inform remediation and prevent future occurrences.
Stressors and ecosystem receptors	Chemical & physical (S) Toxicity (ER)	Moderate to high	Better identifies the cause of toxicity.	May be sufficient to target source of the event.
Stressors and ecosystem receptors	Biodiversity (ER) Chemical & physical (S) Toxicity (ER)	High	Biodiversity response, toxicant(s) exceedance and toxicity but no bioaccumulation indicating toxicant not one to bioaccumulate.	Remediation and mitigation against future occurrences.
Stressors and ecosystem receptors	Biodiversity (ER) Bioaccumulation & biomarkers (ER) Chemical & physical (S) Toxicity (ER)	Very high	All lines of evidence identifying likely water–quality related cause.	Focused remediation and mitigation against future occurrences.

na = not applicable; TIE = toxicity identification evaluation.
a. All lines of evidence assumed to have been measured. The event or response (e.g. dead fish) is assumed and is not indicated in the evaluation table.

Table 15 Lines-of-evidence evaluation using a weight-of-evidence process to assess a remediation study (e.g. remediation of contaminated sediments)

Causal pathway elements	Lines of evidence that responded	Evaluation rating	Evaluation conclusion	Recommended management response
Stressors	Chemical and physical (S)a	Low	Unless there is an obvious toxicant whose distribution can be mapped, in which case higher.	Insufficient information for remediation unless effects can be demonstrated.
Stressors and ecosystem receptors	Chemical and physical (S) Toxicity (ER)	Moderate to high	Better defines the need for remediation.	Possibly sufficient to justify action.
Stressors and ecosystem receptors	Bioaccumulation & biomarkers (ER) Chemical and physical (S) Toxicity (ER)	High	Better defines the need for remediation. Bioaccumulation may be critical for human health and food-chain assessment. This is often the basis for remediation being implemented.	A better selection of evidence to justify remediation.
Stressors and ecosystem receptors	Biodiversity (ER) Bioaccumulation & biomarkers (ER) Chemical and physical (S) Toxicity (ER)	Very high	Biodiversity usually does not identify the remedial actions needed pre-remediation but often is a trigger for remediation action. Can contribute to priority for remediation because of the severity of impact and ecosystem consequences (food chain) if not undertaken.	Better defines the extent of an area requiring remediation. Assessment of recruitment or re-establishment of biodiversity post-remediation is the most important measure of remediation success.

a. A single toxicity or biodiversity line of evidence will also have a low evaluation rating.

Table 16 Lines-of-evidence evaluation using a weight-of-evidence process to conduct a baseline study (e.g. greenfields location prior to development)

Causal pathway elements	Lines of evidence that responded	Evaluation rating	Evaluation conclusion	Recommended management response
Stressors	Chemical & physical (S)a	Low	Only part of the story and provides no basis of comparison for other lines of evidence should they be added to the assessment of water/sediment quality after development has occurred.	Inadequate baseline information. Need additional lines of evidence.
Stressors and ecosystem receptors	Biodiversity (ER) Chemical & physical (S)	Possibly high	If ecology is similar to reference and a robust baseline can be established at reference/control and potential impact sites, and it is known that the biodiversity indicators selected respond sensitively to the chemical and physical stressor indicators. Assumes that the development is unlikely to affect other stressors and/or not affect other biodiversity indicators preferentially. Need to establish baseline for a wide suite of potential stressors to anticipate future development effects.	Need to confirm that no background toxic effects.
Stressors and ecosystem receptors	Biodiversity (ER) Chemical and physical (S) Toxicity (ER)	High	Important to establish baselines for all lines of evidence. Unlikely to be toxicity for greenfields site but baseline assessments will help to establish potential toxicity of the operating site and site-specific testing will help to determine the sensitivity of the local ecosystems.	Good evidence to assist management but could be strengthened by baseline bioaccumulation.
Stressors and ecosystem receptors	Biodiversity (ER) Bioaccumulation & biomarkers (ER) Chemical and physical (S) Non-water quality (S) Toxicity (ER)	Very high	A bioaccumulation baseline for future assessments is also desirable. Need for future flexibility rather than immediate insight. Establishing baseline habitat condition can assist in later assessment of non-chemistry pressures.	This is the best combination that could be expected.

a A single toxicity or biodiversity line of evidence will also have a low evaluation rating.

Table 17 Lines-of-evidence evaluation using a weight-of-evidence process to assess a broadscale monitoring program

Causal pathway elements	Lines of evidence that responded	Evaluation rating	Evaluation conclusion	Recommended management response
Stressors	Chemical and physical (S)a	Lowa	Quantify toxicants and other stressors, taking into account the effects of events (e.g. high rainfall). May need integrating samplers. Unlikely to detect all stressors at the scale of interest.	Initial survey only but no management actions justifiable.
Stressors and ecosystem receptors	Chemical and physical (S) Toxicity (ER)	Moderate	Evidence of potential toxic effects can help identify sources.	Focus for initial management action.
Stressors and ecosystem receptors	Chemical and physical (S) Biodiversity (ER)	Moderate	Broadscale health is only part of the story. Needs additional short-term effects.	Healthy ecosystem might not identify localised effects.
Stressors and ecosystem receptors	Biodiversity (ER) Chemical and physical (S) Non-water quality (S)	Moderate to high	Biodiversity integrates the broadscale ecosystem response and responses to stressors not measurable with chemistry alone. Chemical and physical line of evidence identifies likely stressors to be managed. Rapid assessment protocols (e.g. AUSRIVAS) are commonly used for ecology due to cost of implementation at the broad scale in terms of both time and money. This may limit the sensitivity of the assessment. Could often include other factors, such as invasive species, cyanobacteria, periphyton blooms, salination and flow (at the broad scale, there will rarely be a single pressure).	Defines the background status of the system but may not identify point sources.
Stressors and ecosystem receptors	Biodiversity (ER) Chemical and physical (S) Non-water quality (S) Toxicity (ER)	High	Addition of toxicity assessment may highlight additional concerns and can have better statistical power per unit of expenditure than biodiversity. Habitat alteration can be an important other stressor to account for.	Good measures of overall water quality.
Stressors and ecosystem receptors	Biodiversity (ER) Bioaccumulation & biomarkers (ER) Chemical & physical (S) Non-water quality (S) Toxicity (ER)	Very high	Bioaccumulation may indicate particular toxicant(s) of concern, particularly if exposure occurs in pulses.	Complete dataset to identify changes in system status.

a A single toxicity or biodiversity line of evidence will also have a low evaluation rating.

We recommend that qualitative assessments be further strengthened by adding a criteria-based evaluation (Table 18) to address the eco-epidemiological criteria recommended by Hill (1965), as mentioned earlier under criteria-guided judgement.

Our quality of evidence and weight-of-evidence evaluation tables have been populated one pressure at a time. Land clearing and pesticides, for example, will require different PSER lines of evidence. Where multiple pressures are evident, an additional table using the same logic will be required for each pressure.

Table 18 Criteria to formalise the use of independent lines of evidence in inferring causation in effect studiesa,b

Criterion	Description	Example
Strength of association	Size of the correlation between the intensity of the disturbance and the response of the measurement parameter	Sites with high concentrations of the toxicant have lower population densities of an organism than sites with low concentrations of the toxicant
Consistency of association	The association between the disturbance and the measurement parameter has been repeatedly observed in different places, circumstances and times	The negative correlation between concentrations of the toxicant and the densities of the organism has been demonstrated in several other studies by other investigators elsewhere
Specificity of association	The observed effect is diagnostic of exposure to the disturbance	In this case, a decrease in density of the organism is not diagnostic of the disturbance because the population density of the organism may be reduced by other natural processes
Presence of stressor in tissues	Measurement parameters of exposure (e.g. residues, breakdown products)	Breakdown products of the toxicant are found in tissues of organisms
Timing	Exposure to the disturbance must precede the effect in time	Accidental spillages of the toxicant are usually followed by sharp declines in the density of the organism
Biological gradient	A dose–response relationship exists (response of measurement parameter is a function of increases in magnitude of disturbance)	Laboratory toxicology tests have established a dose–response relationship
Biological plausibility	There is a biologically plausible explanation for causality, even if the precise mechanism is unknown	The toxicant comes from a group of chemicals known to interfere with respiration in this organism
Coherence	The causal interpretation should not seriously conflict with existing knowledge about the natural history of the organism and the behaviour of any substances associated with the disturbance	The organism is usually common in sites within the study region and is present year-round; the toxicant is readily soluble and does not break down readily while in solution
Experimental evidence	A valid experiment provides strong evidence of causation	A field experiment demonstrated rapid mortality in response to the addition of known concentrations of the toxicant
Analogy	Similar disturbances cause similar effects	Other chemicals related to this toxicant have shown similar dose–response curves and responses in field experiments with different but related species

a. From ANZECC & ARMCANZ (2000) guidelines
b. A hypothetical example of the response of biological measurement parameters to a toxicant, as an illustration

Semi-quantitative assessments

Semi-quantitative (logic table) assessments have been widely adopted for use in sediment quality assessment. Our approach here uses a numerical scoring system as shown in Table 19, where scores indicate a weight-of-evidence assessment of no significant (1), moderate (2) or significant (3) adverse effects. The scores are based on defined measurement responses, as indicated.

Table 19 Proposed scoring system for lines of evidence in a sediment quality weight-of-evidence assessment

Line of evidence	Indicator type	Score 3	Score 2	Score 1
Chemical and physical	Sediment chemistry	Concentration > SQGV-high	Concentration > GV, < SQGV-high	Concentration < GV
	Pore water chemistry	Concentration > WQGV-HC10a	Concentration > WQGV-HC5a, < WQGV-HC10	Concentration < WQGV-HC5
Toxicity	Toxicity	≥ 50% effect vs control	20–50% effect vs control	< 20% effect vs control
Bioaccumulation & biomarkers	Bioaccumulation	Significantly different (p < 0.05) and > 3 × controlb	Significantly different (p < 0.05) and ≤ 3 × control	Not significantly different from control
Biodiversity	Biodiversity	Significant and high effects on abundance or diversity	Significant but moderate effects on abundance or diversity	No significant effects on abundance or diversity

GV = guideline value, SQGV = sediment quality guideline value, WQGV = water quality guideline value
a. HC5 and HC10 are the guideline values for 95% and 90% species protection, respectively.
b. For essential substances that are well regulated, significant difference from control/reference will be the most important characteristic to consider.

Based on the rankings, a score of 3 in any line of evidence is sufficient to score 3 in the overall assessment of significant adverse effects. Scores of 3 from more than one line of evidence obviously enhance the confidence in the overall assessment. An equivalent scoring system for water quality is provided in Table 20.

Table 20 Possible scoring system for lines of evidence in a water quality weight-of-evidence assessment

Line of evidence	Indicator type	Score 3	Score 2	Score 1
Chemical and physical	Chemistrya	Bioavailable concentration > HC10b	Bioavailable concentration > HC5b, < HC10	Bioavailable concentration < HC5
Toxicity	Toxicity	≥ 50% effect vs control	20–50% effect vs control	< 20% effect vs control
Bioaccumulation & biomarkers	Bioaccumulation	Significantly different (p < 0.05) and > 3 × control	Significantly different (p < 0.05) and ≤ 3 × control	Not significantly different from control
Biodiversity	Biodiversity	Significant and high effects on abundance or diversity	Significant but moderate effects on abundance or diversity	No significant effects on abundance or diversity

a. A separate chemical and physical line of evidence might consider other stressors (e.g. nutrients), using exceedance of the default guideline value as a measure.
b. HC5 and HC10 are the guideline values for 95% and 90% species protection, respectively.
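The numerical thresholds in Tables 19 and 20 can be expressed as simple rules. The following is a minimal sketch only, not a prescribed implementation: the function names are hypothetical, the treatment of values exactly equal to a guideline value (which the tables do not define) is an assumption, and the biodiversity line of evidence is omitted because its scores rest on a judgement of 'high' versus 'moderate' effects rather than a fixed threshold.

```python
def score_chemistry(concentration, lower_gv, upper_gv):
    """Score a concentration against a lower and an upper guideline value.

    Sediment: lower_gv = GV, upper_gv = SQGV-high.
    Water or pore water: lower_gv = HC5, upper_gv = HC10.
    Assumption: a value exactly equal to a guideline value takes the lower score.
    """
    if concentration > upper_gv:
        return 3
    if concentration > lower_gv:
        return 2
    return 1


def score_toxicity(effect_pct):
    """Score the percentage effect relative to control."""
    if effect_pct >= 50:
        return 3
    if effect_pct >= 20:
        return 2
    return 1


def score_bioaccumulation(p_value, ratio_to_control):
    """Score bioaccumulation from a significance test and the ratio to control."""
    if p_value >= 0.05:  # not significantly different from control
        return 1
    return 3 if ratio_to_control > 3 else 2


def line_of_evidence_score(indicator_scores):
    """Record the highest indicator score against the line of evidence."""
    return max(indicator_scores)


# Hypothetical values: metals score 2 and organics score 3, so 3 is recorded.
metals = score_chemistry(75, lower_gv=50, upper_gv=100)
organics = score_chemistry(120, lower_gv=50, upper_gv=100)
chemical_and_physical = line_of_evidence_score([metals, organics])
```

For essential substances that are well regulated, the ratio test in `score_bioaccumulation` may be less informative than the significance test alone (see footnote b of Table 19).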

An example of a semi-quantitative approach to a weight-of-evidence assessment of contaminated sediments is presented in Table 21 (Simpson et al. 2013).

For a series of hypothetical case studies, a range of lines-of-evidence score combinations has been allocated according to the proposed scoring system in Table 19. The highest scoring assessment for any line of evidence is recorded against that line of evidence. The final assessment score is dictated by the maximum scores in any line of evidence leading to an overall assessment. Table 21 helps you to assess the lines of evidence resulting from the summation of the effects of multiple pressures. It indicates the weight-of-evidence scores derived for each combination together with an overall assessment based on these scores for each case. This is an easily adaptable approach that requires a minimum of professional judgement.

Table 21 Weight-of-evidence scores and assessments for 14 examples of contaminated sediments using a semi-quantitative approach to assess different lines of evidence (LOEs) for a single pressure within an ecosystema

Example	Chemical and Physical LOE	Toxicity LOE	Biodiversity LOE	Bioaccumulation & biomarkers LOE	Score	Overall assessment
A	3	3	3	2 or 3	3	Significant adverse effects from sediment contamination
B	3	3	2	2 or 3	3	Significant adverse effects from sediment contamination
C	2 or 3	3	2	2	3	Significant adverse effects from sediment contamination
D	2 or 3	2	2	1 or 2	2	Possible adverse effects from sediment contamination
E	2	2 or 3	2	1 or 2	2	Possible adverse effects from sediment contamination
F	2	2	2 or 3	1 or 2	2	Possible adverse effects from sediment contamination
G	2 or 3	2 or 3	1	2 or 3	2	Toxic chemical stressing system but resistance may have developed at community level
H	1	2 or 3	2 or 3	1	2	Unmeasured toxic chemicals causing effects on communities is possible
I	1	2 or 3	1	1	2	Unmeasured physical or chemical causes of toxicity
J	2 or 3	1	2 or 3	1	2	Chemicals are not bioavailable or community change may not be due to chemicals
K	1	1	2 or 3	1	1	Changes probably not due to measured contaminants
L	1 or 2	1	1	1 or 2	1	No adverse effects
M	1	1	1	1	1	No adverse effects
N	2 or 3	1	1	1	1	Contaminants unavailable

a. Values listed in each LOE category are the highest scoring assessment in that category. For example, under ‘Chemical and Physical LOE’, metals may score 2 and organics may score 3 so the ‘3’ is recorded. The greater the number of 3s recorded in a category, the greater is the weight that LOE category assumes.
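Table 21 can be read as a lookup: each row lists the scores it accepts for each line of evidence, and a set of scores is matched against the rows in order. The sketch below encodes the table that way. It is illustrative only; the names `TABLE_21` and `assess` are hypothetical, and returning the first matching row is an assumption for cases where rows overlap (for example, scores of 2, 1, 1, 1 satisfy both L and N). Combinations not listed in the table return `None` and would need professional judgement.

```python
# Each row: (example, chemical and physical, toxicity, biodiversity,
#            bioaccumulation & biomarkers, score, overall assessment)
SIGNIFICANT = "Significant adverse effects from sediment contamination"
POSSIBLE = "Possible adverse effects from sediment contamination"

TABLE_21 = [
    ("A", {3}, {3}, {3}, {2, 3}, 3, SIGNIFICANT),
    ("B", {3}, {3}, {2}, {2, 3}, 3, SIGNIFICANT),
    ("C", {2, 3}, {3}, {2}, {2}, 3, SIGNIFICANT),
    ("D", {2, 3}, {2}, {2}, {1, 2}, 2, POSSIBLE),
    ("E", {2}, {2, 3}, {2}, {1, 2}, 2, POSSIBLE),
    ("F", {2}, {2}, {2, 3}, {1, 2}, 2, POSSIBLE),
    ("G", {2, 3}, {2, 3}, {1}, {2, 3}, 2,
     "Toxic chemical stressing system but resistance may have developed at community level"),
    ("H", {1}, {2, 3}, {2, 3}, {1}, 2,
     "Unmeasured toxic chemicals causing effects on communities is possible"),
    ("I", {1}, {2, 3}, {1}, {1}, 2,
     "Unmeasured physical or chemical causes of toxicity"),
    ("J", {2, 3}, {1}, {2, 3}, {1}, 2,
     "Chemicals are not bioavailable or community change may not be due to chemicals"),
    ("K", {1}, {1}, {2, 3}, {1}, 1,
     "Changes probably not due to measured contaminants"),
    ("L", {1, 2}, {1}, {1}, {1, 2}, 1, "No adverse effects"),
    ("M", {1}, {1}, {1}, {1}, 1, "No adverse effects"),
    ("N", {2, 3}, {1}, {1}, {1}, 1, "Contaminants unavailable"),
]


def assess(chem, tox, biodiversity, bioaccumulation):
    """Return (example, score, assessment) for the first matching row, else None."""
    for example, c, t, b, a, score, text in TABLE_21:
        if chem in c and tox in t and biodiversity in b and bioaccumulation in a:
            return example, score, text
    return None
```

For instance, `assess(3, 1, 1, 1)` matches example N, while `assess(3, 3, 1, 1)` matches no row.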

Quantitative assessments

Quantitative approaches to weight-of-evidence assessment largely involve more complex statistical analyses (Reynoldson et al. 2002, Smith et al. 2002).

Different approaches include using:

multivariate statistics to cluster sites into groups of similar impact (a common approach)
meta-analysis to pool empirically derived hypothesis-testing P-values
a quantitative estimation of probability of impairment derived from odds ratios (Bayesian analysis is showing some promise here but has yet to be widely adopted; refer to Smith et al. 2002 and Linkov et al. 2015).
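As an illustration of the second approach, the sketch below pools P-values from independent lines of evidence using Fisher's combined probability test. The choice of Fisher's method is an assumption made for illustration; this guidance does not prescribe a pooling method, and the test is only valid where the P-values are independent. The function name is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Pool independent P-values with Fisher's method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom under the null hypothesis. For an even number of
    degrees of freedom the upper tail has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!, so no statistics library is needed.
    """
    if not p_values:
        raise ValueError("need at least one P-value")
    if any(not 0 < p <= 1 for p in p_values):
        raise ValueError("P-values must lie in the interval (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # X / 2
    term = 1.0   # (X/2)^i / i! for i = 0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical P-values from three independent lines of evidence.
pooled = fisher_combined_p([0.04, 0.10, 0.20])
```

A single P-value is returned unchanged, which gives a quick check that the calculation behaves as expected.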

Common themes in these 3 strategies include the critical issue of defining an appropriate set of reference or control conditions.

Quantitative measures are more appropriate for large datasets of both test and reference sites, unlike more focused examples with fewer than 10 impacted sites and even fewer reference sites.

There are a number of variants on the approaches for weight-of-evidence assessment we have discussed. For example:

inclusion and weighting for the number of contaminants exceeding the guideline values
colour-coding the data presentation to provide a better visual indication of the level and extent of contamination
inclusion of a human health aspect to the bioaccumulation assessment (a major endpoint of concern for many contaminated site assessments).

The water/sediment quality management goals are deemed to be met when:

water quality objectives for those lines of evidence considered essential for informing acceptable water/sediment quality are met
results for other supportive lines of evidence are consistent with no compromise to current or future water/sediment quality according to the selected level of protection.

In this case, management should focus on maintaining or improving that quality. This will require a check of any possible improvements to management strategies at Step 8 of the framework, and then implementation at Step 10 of the framework.

Should the weight-of-evidence evaluation conclude that the objectives are not met, adverse trends are evident, or the result is inconclusive (e.g. the stressor was transient in the system), or if there is conflicting evidence from separate lines of evidence, then 3 options are available:

formulate, assess and prioritise management strategies to improve water/sediment quality (Steps 8 to 10 of the framework), or
reassess the appropriateness of the water/sediment quality guideline values (Step 7 of the framework), and/or
consider selection of additional or alternative indicators or lines of evidence (Step 7 of the framework).
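The decision logic described above can be summarised as a short sketch. The function and parameter names are hypothetical, and the returned strings simply paraphrase the framework steps named in the text; the sketch does not replace the judgement needed to decide whether an objective has been met.

```python
def next_steps(essential_objectives_met, supportive_evidence_consistent,
               adverse_trend=False, inconclusive=False, conflicting_evidence=False):
    """Return the framework steps suggested by a weight-of-evidence outcome."""
    goals_met = (
        essential_objectives_met
        and supportive_evidence_consistent
        and not (adverse_trend or inconclusive or conflicting_evidence)
    )
    if goals_met:
        # Management focuses on maintaining or improving quality.
        return [
            "Check possible improvements to management strategies (Step 8)",
            "Implement (Step 10)",
        ]
    # Otherwise the 3 options listed above are available.
    return [
        "Formulate, assess and prioritise management strategies (Steps 8 to 10)",
        "Reassess the appropriateness of the guideline values (Step 7)",
        "Consider additional or alternative indicators or lines of evidence (Step 7)",
    ]
```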

ANZECC & ARMCANZ 2000, Australian and New Zealand Guidelines for Fresh and Marine Water Quality, Australian and New Zealand Environment and Conservation Council and Agriculture and Resource Management Council of Australia and New Zealand, Canberra.

Chapman PM 1990, The sediment quality triad approach to determining pollution-induced degradation, Science of the Total Environment 97–98: 815–825.

DNRM 2013, Queensland integrated waterways monitoring framework, prepared by Water Monitoring and Reporting within the Queensland Department of Natural Resources and Mines, Brisbane.

Hill AB 1965, The environment and disease: Association or causation?, Proceedings of the Royal Society of Medicine 58(5): 295–300.

Linkov I, Massey O, Keisler J, Rusyn I & Hartung T 2015, From "weight of evidence" to quantitative data integration using multicriteria decision analysis and Bayesian methods, Alternatives to Animal Experimentation 32(1): 3–8.

MacDiarmid A, Boschen R, Bowden D, Clark M, Hadfield M, Lamarche G, Nodder S, Pinkerton M & Thompson D 2014, Environmental risk assessment of discharges of sediment during prospecting and exploration for seabed minerals, NIWA report WLG2013-66, Hamilton, page 53.

Reynoldson TB, Thompson SP & Milani D 2002, Integrating multiple toxicological endpoints in a decision-making framework for contaminated sediments, Human and Ecological Risk Assessment 8(7): 1569–1584.

Simpson SL, Batley GE & Chariton AA 2013, Revision of the ANZECC/ARMCANZ Sediment Quality Guidelines, CSIRO Land and Water Report 8/07, Canberra.

Smith EP, Lipkovich I & Ye K 2002, Weight of evidence (WOE): Quantitative estimation of probability of impairment for individual and multiple lines of evidence, Human and Ecological Risk Assessment 8(7): 1585–1596.

Suter GW & Cormier SM 2011, Why and how to combine evidence in environmental assessments: Weighing evidence and building cases, Science of the Total Environment 409: 1406–1417.

Suter GW & Cormier SM 2013, A method for assessing the potential for confounding applied to ionic strength in central Appalachian streams, Environmental Toxicology and Chemistry 32: 288–295.

Suter G, Cormier S & Barron M 2017, A weight of evidence framework for environmental assessments: Inferring qualities, Integrated Environmental Assessment and Management 13(6): 1038–1044.

USEPA 2016, Weight of Evidence in Ecological Assessment, US Environmental Protection Agency Office of Research and Development, Washington DC, EPA100R16001.