Weight of evidence

Weight of evidence describes the process to collect, analyse and evaluate a combination of different qualitative, semi-quantitative or quantitative lines of evidence to make an overall assessment of water/sediment quality and its associated management. It is the central platform for water/sediment quality assessments in the Water Quality Guidelines.

Applying a weight-of-evidence process incorporates judgements about the quality, quantity, relevance and congruence of the data contained in the different lines of evidence.

The Water Quality Guidelines recommends measuring indicators from multiple lines of evidence across the pressure–stressor–ecosystem receptor (PSER) causal pathway. This will give greater weight (or certainty) to your assessment conclusions — and subsequent management decisions to meet the water/sediment quality objective — than basing your evaluation on a single line of evidence.

Our approach for weight of evidence:

  • harmonises with existing pressure–state–response (PSR) management models that include indicator sets selected across the cause-and-effect pathway
  • encompasses a broad set of line of evidence indicators, including those with interpretative and diagnostic value (e.g. toxicity, biomarkers), as well as non–water quality related stressors
  • integrates into the Water Quality Management Framework at 3 key steps
  • adapts to many typical uses of the Water Quality Management Framework.

Strengthening conclusions from water/sediment quality assessments

Methods and technical guidance for reaching the correct or valid conclusion in water/sediment quality assessments, together with management frameworks that support such evaluations, have steadily improved since the ‘integrated water quality assessment’ concept in the ANZECC & ARMCANZ (2000) guidelines.

Our methodology to incorporate weight of evidence in water/sediment quality assessments is consistent with recent moves internationally (e.g. USEPA 2016, Suter et al 2017).

Integrated environmental assessment models reduce risk of making poor decisions

Government jurisdictions in Australia and New Zealand are developing environmental indicator sets according to issues and the key elements of the conceptual contaminant pathway that depict causal links.

We have adapted the PSR conceptual model used by the Queensland Government (DNRM 2013) and applied it to water/sediment quality assessments in the Water Quality Guidelines; a minor refinement is replacement of ‘response’ (R) with ‘ecosystem receptor’ (ER).

Adoption of the PSER model, with information from lines of evidence drawn from and integrated across each of the pressures, stressors and receptors, reduces the risk of making a wrong decision regarding the cause-and-effect linkages for a particular issue.

Weight-of-evidence evaluations minimise subjective judgements

Evaluating multiple lines of evidence requires a weight-of-evidence assessment.

Assessments may be qualitative, semi-quantitative or quantitative, which entail successively decreasing levels of best professional judgement. Ideally, an assessment should involve minimal best professional judgement, such that independent assessors of the lines of evidence will reach the same conclusions.

Be mindful that an overly complex assessment could be costly and require larger data inputs, which may be impractical to implement.

Criteria-guided judgement and logic tables useful for assessments

Suter & Cormier (2011) provided a framework for performing weight-of-evidence assessments and reviewed a number of different methods available for weighing evidence, including:

  • criteria-guided judgement (recommended) — evidence is strengthened if it exhibits these criteria: strength of association; temporality; dose-response; biological plausibility; consistency; coherency; specificity; experimental evidence; analogy. The best-known example of this method is the set of causation criteria first developed by Hill (1965) for human epidemiology and now applied widely for environmental assessments.
  • independent applicability — evidence from any line of evidence (exceedance of a water quality objective, toxicity or observed field biological impact) may be sufficient grounds for concluding impaired water/sediment quality. While the approach is inherently conservative, it should not be adopted as a rationale for collection of data from just one line of evidence. Reliance on a single line of evidence without adequate baseline data for other lines of evidence precludes any ability to properly assess water/sediment quality, including identification of cause and extent of possible impact.
  • logic tables (recommended) — results of an assessment for each of (typically) 3 or more lines of evidence are evaluated against a set of standard conclusions derived from all possible combinations of outcomes. The best-known example of logic tables is the sediment quality triad (SQT) developed by Chapman (1990), where evidence is acquired from chemistry, toxicity tests and biological surveys.
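
The difference between independent applicability and logic tables can be sketched in a few lines of Python. The example assumes that each sediment quality triad line of evidence (chemistry, toxicity tests, biological surveys) has been reduced to a yes/no "response". That encoding and the names used are illustrative assumptions, not a format defined in the Water Quality Guidelines.

```python
from itertools import product

# Illustrative encoding: True means the line of evidence showed a response.
SQT_LINES = ("chemistry", "toxicity", "biology")


def all_outcome_combinations(lines):
    """Every combination of outcomes, i.e. the rows a complete logic table must cover."""
    return [dict(zip(lines, outcome)) for outcome in product((False, True), repeat=len(lines))]


def independent_applicability(outcome):
    """Any single responding line of evidence is treated as grounds for concluding impairment."""
    return any(outcome.values())


combos = all_outcome_combinations(SQT_LINES)
flagged = [c for c in combos if independent_applicability(c)]
print(len(combos), len(flagged))  # 8 7
```

Independent applicability collapses 7 of the 8 rows into a single "impaired" verdict. A logic table instead gives each row its own standard conclusion, and that is where its diagnostic value comes from.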

Criteria-guided judgement and logic tables are commonly applied in the literature to condition, causal or predictive assessments of ecosystems. They are the 2 approaches recommended in the Water Quality Guidelines.

Logic tables are more readily adapted to the planning phase of water/sediment quality assessments. They can be used to guide the selection of indicators for measurement, not just for the evaluation of evidence, so they have broader applicability across key steps in the Water Quality Management Framework (discussed later).

Logic tables can be improved when the matrix of standard conclusions is adapted for specific cases, which Suter & Cormier (2011) termed ‘case-specific logic’. For these reasons, logic tables are the preferred method for demonstrating weight of evidence in a case-specific manner across 7 typical uses in the Water Quality Guidelines.

If necessary, you can derive even stronger conclusions by applying both logic tables and epidemiological criteria to water/sediment quality assessments, and we provide additional guidance on this approach.

Selecting multiple lines of evidence at the outset of an investigation

There have been other advances to the ‘integrated water quality assessment’ concept in the ANZECC & ARMCANZ (2000) guidelines and to how we apply weight of evidence.

Decision trees developed in the ANZECC & ARMCANZ (2000) guidelines for toxicants in waters and sediments were largely based on testing against guideline values and invoked additional investigations if the measured concentrations exceeded the respective guideline values. Those further studies could either:

  • progress to measuring (or modelling) the bioavailable forms of the toxicants (assuming that the guideline values were conservative and were true triggers for further investigation), or
  • if necessary, include site-specific investigations involving toxicity or biodiversity assessment.

Focusing on chemical assessments until a problem is encountered carries risk, because the information required from the subsequent receptor lines of evidence may not be available for lack of baseline data. Such decision trees are most useful for narrowing the chemical and physical line of evidence down to the bioavailable fraction.

The need to anticipate the requirement for more information from additional lines of evidence in future assessments means that the weight-of-evidence process developed for multiple lines of evidence in the Water Quality Guidelines is more than a traditional assessment of water/sediment quality.

Weight-of-evidence evaluations are typically made after data have been gathered and are being assessed for guideline value exceedances (at Step 6 in the Water Quality Management Framework).

We provide you with advice on the desired lines of evidence to consider at the outset of an investigation. This advice is provided by highlighting potential outcomes in the various combinations of lines of evidence selected. This helps you to anticipate possible future assessments, either written into licensing and reporting agreements or arising from unexpected events.

Sometimes the need for additional lines of evidence may not be immediately apparent until some measurements have been made (a decision made at Step 7 in the framework).

Making early decisions about which lines of evidence to measure ensures:

  • adequate baseline data are available to properly detect and assess possible future effects
  • management goals for aquatic ecosystems, which are often couched in terms of protection of biodiversity, have suitable surrogate indicators to assess possible change.

Real-world water and sediment quality issues will often focus on:

  • chemical exceedances
  • environmental confounding
  • unknown contaminant mixtures
  • the need for timely results
  • improved inference (often poor with single lines of evidence)
  • efficient targeting of sites for remediation.

These issues are best addressed through weight-of-evidence evaluations and this is why weight of evidence is the central platform for water/sediment quality assessments in the Water Quality Guidelines.

Defining components in the process

The strongest conclusions from a water/sediment quality assessment will be reached when lines of evidence are selected from across the PSER causal pathway (Figure 1).

Figure 1 Weight-of-evidence process across the pressure–stressor–ecosystem receptor (PSER) causal pathway

Pressures

Pressures are external activities that affect water quality. Considering pressure lines of evidence, such as land-based activities, is important but is out of scope for this discussion.

Stressors

For ecosystem protection, the stressors may be:

  • chemical (e.g. toxicants where a guideline value is based on assessments of toxicity)
  • physical and chemical (stressors that are either directly or indirectly toxic where guideline values are determined from background or reference-site data), or
  • associated with other causes (e.g. flow).

Chemical and physical

The chemical and physical line of evidence comprises:

  • chemical indicator types — constitute dissolved contaminant concentrations for waters or particulate or pore water concentrations for sediments. These are fundamental measures for comparison against default guideline values (DGVs) or site-specific guideline values. Ideally, they should be refined to consider chemical speciation and the bioavailable fraction of the measured concentrations.
  • physical indicator types — associated with water quality and include turbidity and water temperature. A number of chemical and physical properties of waters and sediment are also important as co-stressors that modify bioavailability.

Non–water quality

The non–water quality line of evidence measures stressors that might affect the ecosystem receptors, particularly biodiversity. Examples include:

  • stream flow
  • invasive species
  • catchment alteration.

Measurement programs for these non–water quality related stressors should be conducted if there is any chance of confounding (Suter & Cormier 2013) in the interpretation of water quality effects.

Ecosystem receptors

Measures of ecosystem receptors include biodiversity, toxicity and biomarkers. Biodiversity — the key line of evidence for ecosystem receptors — is often linked directly to management goals. Toxicity and biomarkers provide supportive lines of evidence for inference and causation.

Biodiversity

Alterations to biodiversity (populations and communities of organisms) may manifest in both the short and longer term, depending on the biotic assemblage examined. In future, effects on populations and communities will be more readily and efficiently assessed using improved ecogenomic techniques.

Toxicity

Toxicity is an important measure of short-term response (days), with acute or chronic endpoints measured either in the laboratory or in semi-field settings.

Linking cause and effect is important when evaluating different lines of evidence. For toxicity testing, toxicity identification and evaluation (TIE) can help identify a cause when an adverse laboratory response is measured. This contrasts with biodiversity observations, which are prone to confounding.

Biomarkers

Bioaccumulation indicates the extent to which contaminants are bioavailable and taken up by organisms, and is best quantified by comparison with controls. Evidence of any physiological effects of bioaccumulation has to come from other lines of evidence.

Biomarkers can be used as indicators of exposure or histopathological effects. The ecological relevance of biomarkers is an important consideration in defining the significance of the measured effects and how reversible they are.

Understanding stages of the process

The weight-of-evidence process illustrated in Figure 1 depicts a chain of steps from which indicators are selected and then evaluated for their suitability, to help you determine whether or not you have achieved a water/sediment quality objective.

Select indicators for measurement (lines of evidence)

The 2 key components in the indicator selection hierarchy are:

  • causal pathway — comprises the key elements of the PSER conceptual contaminant pathway that depict causal links: pressure, stressor, ecosystem receptor
  • line of evidence — contained in each of the elements of the causal pathway:
      • each pressure represents a single line of evidence, although a water or sediment quality issue may involve more than one pressure (as depicted by Pressure x, Pressure y in Figure 1)
      • stressor lines of evidence, either water/sediment quality related (e.g. chemical and physical) or non–water quality related
      • receptor lines of evidence, including biodiversity, toxicity and biomarkers.
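
A proposed selection of lines of evidence can be checked against the PSER causal pathway before any sampling starts. The sketch below assumes a simple mapping from each line of evidence to its causal pathway element. The labels are hypothetical shorthand for the lines of evidence described above.

```python
# Hypothetical shorthand labels for each line of evidence and its PSER causal pathway element.
PSER_ELEMENT = {
    "pressure": "pressure",
    "chemical and physical": "stressor",
    "non-water quality": "stressor",
    "biodiversity": "ecosystem receptor",
    "toxicity": "ecosystem receptor",
    "biomarkers": "ecosystem receptor",
}
ELEMENTS = ("pressure", "stressor", "ecosystem receptor")


def missing_elements(selected_lines):
    """Return the PSER elements not represented by any of the selected lines of evidence."""
    covered = {PSER_ELEMENT[line] for line in selected_lines}
    return [element for element in ELEMENTS if element not in covered]


print(missing_elements(["chemical and physical"]))  # ['pressure', 'ecosystem receptor']
print(missing_elements(["pressure", "chemical and physical", "biodiversity"]))  # []
```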

What’s not illustrated in Figure 1 is that each line of evidence contains broad indicator types. These in turn contain indicators, from which specific parameters are selected for measurement (Table 1), where:

  • indicators are parameters that can be used to provide a measure of a pressure, stressor or ecosystem condition response
  • parameters are measurable or quantifiable characteristics.

Table 1 Examples of indicator types, indicators and parameters relevant to each line of evidence

| Line of evidence | Indicator type | Indicator | Parameter |
|---|---|---|---|
| Pressure | Cropping | Pesticide use | Tonnes of insecticide applied per hectare per year |
| Stressor (chemical and physical) | Toxicants, or physical and chemical stressors (2 fixed types recognised) | Ammonia or dissolved oxygen | Total ammonia or dissolved oxygen in percent saturation |
| Stressor (non–water quality) | Altered flow or sedimentation | Stream discharge or sediment movement | Total stream volume per unit time, or sediment particle size |
| Ecosystem receptor (biodiversity) | Biotic assemblages or individual species | Benthic macroinvertebrate communities or species population size | Macroinvertebrate community structure or total species abundance |
| Ecosystem receptor (toxicity) | In situ or laboratory toxicity | Chronic toxicity to fish | 14-day fish growth measurement |
| Ecosystem receptor (biomarkers) | Bioaccumulation, biomarkers of exposure or biomarkers of effect | Metal body burden, genetic biomarker or histopathology | Copper tissue concentration in mg/kg, DNA strand breaks or histological alterations |
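
The indicator hierarchy in Table 1 maps naturally onto a small record type. This is a minimal sketch that assumes a hypothetical `SelectedParameter` record and uses rows adapted from Table 1. The Water Quality Guidelines do not prescribe any data model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SelectedParameter:
    """Hypothetical record: line of evidence > indicator type > indicator > parameter."""
    line_of_evidence: str
    indicator_type: str
    indicator: str
    parameter: str
    unit: Optional[str] = None  # Table 1 gives units for only some parameters


# Example rows adapted from Table 1.
PLAN = [
    SelectedParameter("Pressure", "Cropping", "Pesticide use",
                      "Insecticide applied", "tonnes per hectare per year"),
    SelectedParameter("Ecosystem receptor (toxicity)", "Laboratory toxicity",
                      "Chronic toxicity to fish", "14-day fish growth"),
    SelectedParameter("Ecosystem receptor (biomarkers)", "Bioaccumulation",
                      "Metal body burden", "Copper tissue concentration", "mg/kg"),
]


def parameters_by_line(plan):
    """Group parameter names under the line of evidence they belong to."""
    grouped = defaultdict(list)
    for row in plan:
        grouped[row.line_of_evidence].append(row.parameter)
    return dict(grouped)


print(parameters_by_line(PLAN))
```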

Collect and analyse the evidence

After indicators and respective parameters are selected for each line-of-evidence investigation, a measurement program ensues, which culminates in analyses to determine whether or not the measured responses indicate a water-quality related change.

Evaluate the results

The mix of results from the different lines of evidence assessed in this way is then evaluated against a set of standard conclusions derived from all possible combinations of outcomes — or against other criteria — to draw conclusions about water-quality related change and, if confirmed, the possible cause and extent of impact.

Applying the process to water/sediment quality assessments

Work through the Water Quality Management Framework

Weight of evidence is applied at 3 key steps in the Water Quality Management Framework (as well as Steps 2 and 7):

  • Step 1: Examine current understanding — identify critical PSER causal pathway elements during initial conceptual modelling
  • Step 3: Define relevant indicators — select lines of evidence and associated indicators (key indicators from the lines of evidence representing PSER are identified to best infer impact)
  • Step 6: Assess if draft water/sediment quality objectives are met — compile and evaluate lines of evidence through a weight-of-evidence evaluation (evidence arising from the multiple lines of evidence selected at Step 3 is evaluated to draw conclusions about ambient water/sediment quality).

We explain how to use weight of evidence in each of these 3 steps, focusing on the protection of aquatic ecosystems. This approach is also applicable to other community values.

Complete initial steps before defining relevant indicators

For any water/sediment quality assessment, the initial step is to document the current understanding for a particular issue. This involves the identification of pressures and their associated stressors (water-quality and non–water quality related), and likely ecosystem (biological) receptors and their responses (at Step 1 of the framework). This is best achieved by creating a conceptual model that depicts the causal links along the conceptual contaminant pathway.

When defining management aims (at Step 2 of the framework), community values, management goals and levels of protection are established.

Based on your actions at Steps 1 and 2, the relevant lines of evidence are determined in Step 3 of the framework (and revisited in Step 7 of the framework).

Consider which lines of evidence to select

Your choice of lines of evidence, as determined by the PSER causal pathway, is not constrained. It should, however, suit the particular issue and the current understanding of it, as captured in the conceptual model.

A single line of evidence cannot address all the desired outcomes from a weight-of-evidence evaluation, such as detecting and determining the extent of impacts, and determining the likely cause.

We have summarised some benefits and weaknesses of different lines of evidence in Table 2.

Table 2 Considerations in the selection of pressure, stressor and ecosystem receptor lines of evidence

| Causal pathway element | Line of evidence | Benefits | Considerations (in isolation of other lines of evidence) |
|---|---|---|---|
| Pressure | Pressure measures (or surrogates) | Measures of the pressures (or surrogates) responsible may correlate with such ‘events’ and identify priorities for management | Must be linked to measurement of stressor and community value receptors |
| Stressor | Chemical and physical stressor | • Direct measure of potential cause; significant exceedance could lead directly to management action (e.g. remediation) • Sediment chemistry may record past events (archival value) | • Uncommonly representative of the management goals (unless ‘no change’) • Possible presence of multiple (unmeasured) toxicants responsible • Toxicant(s) or nutrients in waters may be transient or taken up in the system; less so for sediments • Observations may be unrelated to toxicants |
| Stressor | Non–water quality related stressor | Eliminate confounding; identify other factors potentially responsible for observations | None |
| Ecosystem receptor | Biodiversity | • Very often directly linked to management goals • Magnitude and extent of impact • Macrobenthos studies may capture the effects of transient toxicants (pulsed releases) that chemistry and toxicity testing may miss; respond to gradients • Some taxa with specific water quality responses (e.g. nutrients/algae) (diagnostic) | Impacts on biodiversity include many non–water quality related stressors |
| Ecosystem receptor | Toxicity | • Identify a water quality problem (comparable with biodiversity confounding) • Toxicity identification and evaluation (TIE), to identify a cause of toxicity response (diagnostic) • Identify the amount of toxicity in the event of a spill or other incident | Toxicant(s) in waters may be transient in the system; less so for sediments |
| Ecosystem receptor | Biomarkers | Measure of exposure; may capture transient toxicants (pulsed releases) that chemistry and toxicity testing miss; ecosystem, human health and other community values | Suitable organisms may not be available; stressor may not bioaccumulate |

Desirable outcomes and associated lines of evidence from a weight-of-evidence evaluation include:

  • assessing achievement of management goals, magnitude and extent of impact — biodiversity
  • capturing the effects of transient toxicants (pulsed releases that chemical and physical stressors and toxicity may miss) — macrobenthos studies (biodiversity), biomarkers and, in some cases, sediment chemistry
  • associating an effect with a water quality cause and possibly identifying the stressor or its type (usually not possible with sole measurement of biodiversity) — toxicity and TIE, combined with chemistry.
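
The outcome list above can double as a planning check. The sketch below reports which desirable outcomes a proposed selection of lines of evidence cannot deliver. It uses hypothetical labels and simplifies the list: it omits the "in some cases" role of sediment chemistry for transient toxicants.

```python
# Hypothetical encoding of the desirable outcomes listed above. An outcome is met if ANY of its
# alternative sets of lines of evidence is fully included in the selection.
# Simplification: sediment chemistry's occasional role for transient toxicants is omitted.
OUTCOMES = {
    "management goals, magnitude and extent of impact": [{"biodiversity"}],
    "effects of transient toxicants": [{"biodiversity"}, {"biomarkers"}],
    "link effect to a water quality cause": [{"toxicity", "chemical and physical"}],
}


def unmet_outcomes(selected):
    """List the outcomes that the selected lines of evidence cannot deliver."""
    chosen = set(selected)
    return [
        outcome
        for outcome, alternatives in OUTCOMES.items()
        if not any(required <= chosen for required in alternatives)
    ]


print(unmet_outcomes({"chemical and physical"}))
print(unmet_outcomes({"chemical and physical", "toxicity", "biodiversity"}))  # []
```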

These generic benefits and limitations reinforce the need to include a number of lines of evidence in a water quality monitoring program to properly assess potential effects.

Rate the quality of evidence

Our approach to the weight-of-evidence process encourages you to consider the selection of the desired lines of evidence at the outset of an investigation. This is why the process is well suited to all types of uses and situations, including programs in the planning and baseline data collection phases (refer to our examples).

When selecting suitable lines of evidence, we recommend that you construct a quality of evidence table that will help you to determine the number of lines of evidence needed to satisfactorily reach a conclusion.

Quality of evidence will increase as you add lines of evidence. The examples we provide later use a qualitative rating. In general, selecting just one line of evidence has the potential to generate ‘low’ quality evidence in a water/sediment quality assessment. Including more lines of evidence raises the quality to moderate and then high. However, in many instances, not all lines of evidence will be required.

Your choice might well be a balance between cost and the level of quality associated with the suite of lines of evidence.
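
Balancing cost against quality of evidence can be framed as choosing the cheapest suite of lines of evidence that reaches a target rating. In this sketch, the candidate suites and ratings are hypothetical and loosely follow the low, moderate and high progression described above. The relative costs are invented placeholders. You would substitute your own costings and the ratings from the quality of evidence table for your use.

```python
RATING_ORDER = ["low", "moderate", "high"]

# Hypothetical quality of evidence table for one pressure: (lines of evidence, rating, relative cost).
# Ratings loosely follow the low -> moderate -> high progression; costs are placeholders.
CANDIDATES = [
    (frozenset({"chemical and physical"}), "low", 1),
    (frozenset({"chemical and physical", "biodiversity"}), "moderate", 3),
    (frozenset({"chemical and physical", "biodiversity", "toxicity"}), "high", 5),
    (frozenset({"chemical and physical", "biodiversity", "toxicity", "biomarkers"}), "high", 7),
]


def cheapest_suite(target_rating, candidates=CANDIDATES):
    """Return the lowest-cost suite whose rating meets or exceeds target_rating, or None."""
    threshold = RATING_ORDER.index(target_rating)
    eligible = [c for c in candidates if RATING_ORDER.index(c[1]) >= threshold]
    return min(eligible, key=lambda c: c[2]) if eligible else None


lines, rating, cost = cheapest_suite("high")
print(sorted(lines), rating, cost)
# ['biodiversity', 'chemical and physical', 'toxicity'] high 5
```

Ties in cost are resolved by table order, because `min` returns the first lowest-cost match.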

Nevertheless, the risks in not acquiring sufficient baseline information for key stressors and ecosystem receptors (e.g. biodiversity) to assess potential effects were discussed earlier. If there are multiple pressures, then you may also need to choose a set of lines of evidence to address each pressure and attribute the observed responses among them.

Issues will generally have associated pressure(s). In some instances, the pressure may be the issue (e.g. acid sulfate soils) or it may relate to a number of actual or potential issues (e.g. multiple agricultural developments in a catchment).

For any matrix combination, the power of detection in field measurement programs will be increased by including more reference or control sites and/or more monitoring sites placed along any putative or probable disturbance gradients. Read about factors dictating lines of evidence and indicator selection.
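
A standard normal-approximation power calculation for a two-sample comparison shows how adding reference sites affects detection power. This sketch makes several assumptions: site means are independent, and the between-site standard deviation is known and equal in both groups. The effect size of 1.0 (one standard deviation) is purely hypothetical. The sketch does not model gradient designs.

```python
from statistics import NormalDist


def approx_power(effect_size, n_reference, n_monitoring, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test for a difference in site means.

    effect_size is the difference between reference and monitoring means divided by the
    between-site standard deviation (assumed known and equal in both groups).
    The small contribution from the opposite tail is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    se_factor = (n_reference * n_monitoring / (n_reference + n_monitoring)) ** 0.5
    return z.cdf(abs(effect_size) * se_factor - z_crit)


for n_ref in (2, 4, 8):
    print(n_ref, round(approx_power(1.0, n_ref, n_monitoring=4), 2))
```

With 4 monitoring sites, power rises steadily as reference sites increase from 2 to 8. With this effect size, however, it stays well below commonly used targets such as 0.8. Power analysis therefore belongs at the design stage.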

We provide examples of the quality of evidence associated with selecting different combinations of lines of evidence for the 7 typical uses of the Water Quality Management Framework.

Quality of evidence tables assume that:

  • matrix outcomes are based on adequate field and experimental designs
  • processing, analyses and reporting of water quality samples (including biological samples) and data are undertaken to high standards and in a timely manner.

Constructing and populating a quality of evidence table for each pressure will provide you with an appropriate mix of lines of evidence for the diagnostic information required for water/sediment quality evaluations.

We have provided quality of evidence tables associated with different combinations of the lines of evidence selected for the 7 typical uses of the Water Quality Management Framework.

One of our quality of evidence tables, investigating an unexpected event, has been populated with as many matrix combinations as practically conceivable. Only a few key matrix combinations are provided for the other typical uses. Therefore, when assessing the quality of evidence for these other uses it is important to apply similar logic to that used for the well-populated example.

For each of the selected lines of evidence, choose which indicator type and specific indicators to select for the issue based on:

  • type of pressure and associated stressors
  • management goals
  • assigned levels of protection
  • spatial scale (broadscale, site-specific)
  • water type (freshwater, marine water)
  • compartment (water or sediment)
  • ecosystem type (e.g. river or wetland)
  • location (e.g. wet–dry tropics, temperate coastal rivers of south-eastern Australia).

Evaluating multiple lines of evidence

The measurement program for each line of evidence investigation culminates in analyses to determine whether the measured responses indicate a water-quality related change (e.g. guideline value exceedance, observed toxicity, significant statistical test for change in a field biological response). Refer to the Data analysis and interpretation template.
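
The per-line analyses can be reduced to the yes/no responses used by the interpretation tables that follow. This sketch assumes hypothetical decision rules:

  • exceedance of a guideline value for chemistry
  • a reduction of more than 20% relative to controls for toxicity
  • p < 0.05 for a biodiversity change test.

All thresholds and numbers are placeholders, not values from the Water Quality Guidelines.

```python
from dataclasses import dataclass


@dataclass
class LineResult:
    """Analysis outcome for one line of evidence (hypothetical structure)."""
    line: str
    responded: bool
    basis: str


def chemical_response(measured, guideline_value):
    """Flag a response when the measured summary statistic exceeds the guideline value."""
    return LineResult("chemical and physical", measured > guideline_value,
                      f"{measured} vs guideline value {guideline_value}")


def toxicity_response(test_mean, control_mean, effect_threshold=0.2):
    """Flag a response when the endpoint falls more than effect_threshold below the control.
    The 20% default is a hypothetical placeholder, not a guideline value."""
    reduction = (control_mean - test_mean) / control_mean
    return LineResult("toxicity", reduction > effect_threshold,
                      f"{reduction:.0%} reduction vs control")


def biodiversity_response(p_value, alpha=0.05):
    """Flag a response when a test for change in a field biological measure is significant."""
    return LineResult("biodiversity", p_value < alpha, f"p = {p_value}")


results = [
    chemical_response(measured=3.1, guideline_value=1.4),
    toxicity_response(test_mean=6.0, control_mean=10.0),
    biodiversity_response(p_value=0.21),
]
print({r.line: r.responded for r in results})
# {'chemical and physical': True, 'toxicity': True, 'biodiversity': False}
```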

For the interpretation of different combinations of results arising from these analyses, ‘default’ interpretative tools provided in the Water Quality Guidelines are:

  • qualitative tabulation
  • criteria based on known toxicity-based or other causal responses (e.g. derived from dose–response data from laboratory, field or mesocosm studies).

(When interpreting with criteria, combinations of criteria based on strength, consistency and specificity provide the strongest evidence for causality.)
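
Criteria-guided judgement can be supported by recording, for each causal criterion, whether the evidence supports it. This minimal sketch uses the criteria listed earlier and flags whether the combination of strength, consistency and specificity is met. It is a bookkeeping aid that assumes yes/no judgements. It is not a scoring system from the Water Quality Guidelines or from Suter & Cormier (2011).

```python
CRITERIA = (
    "strength of association", "temporality", "dose-response", "biological plausibility",
    "consistency", "coherency", "specificity", "experimental evidence", "analogy",
)
# The combination singled out as giving the strongest evidence for causality.
KEY_CRITERIA = {"strength of association", "consistency", "specificity"}


def summarise_criteria(supported):
    """Summarise which causal criteria the evidence supports (hypothetical bookkeeping only)."""
    supported = set(supported)
    unknown = supported - set(CRITERIA)
    if unknown:
        raise ValueError(f"unrecognised criteria: {sorted(unknown)}")
    return {
        "key_criteria_all_met": KEY_CRITERIA <= supported,
        "n_supported": len(supported),
        "not_supported": [c for c in CRITERIA if c not in supported],
    }


summary = summarise_criteria({"strength of association", "consistency", "specificity", "temporality"})
print(summary["key_criteria_all_met"], summary["n_supported"])  # True 4
```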

Different approaches in the literature for more complex multiple line-of-evidence evaluations within and across pressures are provided in Weight-of-evidence evaluation methods.

A weight-of-evidence evaluation of multiple lines of evidence can be undertaken in a few ways, varying from qualitative and semi-quantitative to fully quantitative approaches.

Qualitative assessments

Qualitative approaches involve the use of best professional judgement to determine how the evidence from individual lines of evidence supports a final assessment of cause and effect.

One such approach, in its simplest and most generic form, is illustrated in Table 10. Possible interpretations of the findings are based on responses recorded for the various lines of evidence (all assumed to have been measured and assessed). A response indicates, for example, guideline value exceedance, observed toxicity or significant statistical test for change in a field biological measurement.

Table 10 Interpretations of likely combinations of line of evidence responses assessed in relation to guideline values and reference-site data

| Responses from chemical and physical, toxicity, biodiversity and biomarker lines of evidence | Interpretation |
|---|---|
| No responses | No guideline values exceeded and no effects on the ecosystem. |
| Chemical and physical response only | Contaminants present at concentrations exceeding guideline values but not bioavailable. |
| Toxicity response only | Toxic effects due to unmeasured contaminants or an unidentified stressor. |
| Biodiversity response only | Unmeasured contaminants or other factors (e.g. another pressure) contributing to ecological effects. |
| Chemical and physical and biomarker responses | Contaminants exceeding guideline values and bioaccumulating but not toxic. |
| Chemical and physical and biodiversity responses | Toxicity not seen using the test organisms but effects are still seen on biodiversity (toxicity testing may not have been representative of sensitive taxa or did not reflect higher-level ecosystem responses). |
| Chemical and physical and toxicity responses | Some resistance to effects on biodiversity (ecosystem resilience overwhelming toxicity to some species), or test species not representative of receiving ecosystem sensitivity. |
| Toxicity and biodiversity responses | Unmeasured contaminants or stressors are toxic and affecting ecosystem health. |
| Chemical and physical, toxicity and biomarker responses | Measured contaminants are toxic and accumulating but no significant ecological effects are observed (mitigating processes occurring, or ecosystem may have acquired tolerance). |
| Chemical and physical, toxicity, biodiversity and biomarker responses | Measured contaminants exceed guideline values, are toxic and bioaccumulating, and affecting ecosystem health. |
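
Table 10 is, in effect, a lookup from the set of responding lines of evidence to an interpretation. The sketch below encodes it with frozensets as keys, which is an implementation choice and not part of the guidelines. Some interpretations are abridged. When a combination is not tabulated, such as a biomarker response alone, the lookup returns a prompt to apply case-specific logic.

```python
# Interpretations from Table 10 (some abridged), keyed by the lines of evidence that responded.
S, T, B, M = "chemical and physical", "toxicity", "biodiversity", "biomarkers"

TABLE_10 = {
    frozenset(): "No guideline values exceeded and no effects on the ecosystem.",
    frozenset({S}): "Contaminants present at concentrations exceeding guideline values but not bioavailable.",
    frozenset({T}): "Toxic effects due to unmeasured contaminants or an unidentified stressor.",
    frozenset({B}): "Unmeasured contaminants or other factors (e.g. another pressure) contributing to ecological effects.",
    frozenset({S, M}): "Contaminants exceeding guideline values and bioaccumulating but not toxic.",
    frozenset({S, B}): "Toxicity not seen using the test organisms but effects are still seen on biodiversity.",
    frozenset({S, T}): "Some resistance to effects on biodiversity, or test species not representative of receiving ecosystem sensitivity.",
    frozenset({T, B}): "Unmeasured contaminants or stressors are toxic and affecting ecosystem health.",
    frozenset({S, T, M}): "Measured contaminants are toxic and accumulating but no significant ecological effects are observed.",
    frozenset({S, T, B, M}): "Measured contaminants exceed guideline values, are toxic and bioaccumulating, and affecting ecosystem health.",
}


def interpret(responded):
    """Return the Table 10 interpretation for the lines of evidence that responded."""
    return TABLE_10.get(
        frozenset(responded),
        "Combination not covered by Table 10; apply case-specific logic.",
    )


print(interpret({T, B}))
print(interpret({M}))
```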

The concept in Table 10 has been extended for the Water Quality Guidelines, to evaluate possible interpretations of the findings from the same 7 typical uses of the Water Quality Management Framework used to demonstrate quality of evidence.

Tables 11 to 18 illustrate possible interpretations of the findings based on positive responses recorded for the various lines of evidence, where a positive response might be, for example, guideline value exceedance, toxicity, contaminants present in body tissues, or change to a biodiversity indicator. Indicators representing all lines of evidence are assumed to have been measured.

As with the quality of evidence tables, our example for investigating an unexpected event (Table 14) has been populated with as many matrix combinations as practically conceivable. For the other typical uses, only a few key matrix combinations are provided. When considering those uses, apply similar logic to that used for the well-populated example.

We again assume that the matrix outcomes are based on adequate field and experimental designs, and that processing, analyses and reporting of water quality samples (including biological samples) and data are undertaken to high standards and in a timely manner.

The greatest confidence in the interpretation of each table representing each typical use is obtained with the maximum number of similar responses. Column 3 of each table gives an evaluation rating, which is a judgement of the certainty in identifying the cause. This could be strengthened by introducing a consequence column. For example, in a sediment study by MacDiarmid et al. (2014), the proportion of habitat affected was ranked as negligible, minor, moderate, major, severe or catastrophic.

Table 11 Lines-of-evidence evaluation using the weight-of-evidence process when developing a water quality management plan

Causal pathway elements Lines of evidence that responded Evaluation rating Evaluation conclusion Recommended management response
Stressor
  • Chemical and physical (S)a
Low Quantify toxicants and other stressors, taking into account effects of events (e.g. high rainfall). Limited evidence for management.
Stressors and ecosystem receptors
  • Biodiversity (ER)
  • Chemical and physical (S)
Moderate Biodiversity integrates the broadscale ecosystem response. Chemical and physical line of evidence identifies stressors to be managed. Need detail on spatial scale, point sources but evidence for concern.
Stressors and ecosystem receptors
  • Biodiversity (ER)
  • Chemical and physical (S)
  • Toxicity (ER)
High Addition of toxicity assessment may highlight additional concerns. Gives greater certainty to toxicants requiring management.
Stressors and ecosystem receptors
  • Biodiversity (ER)
  • Biomarkers (ER)
  • Chemical and physical (S)
  • Toxicity (ER)
High Bioaccumulation may indicate particular toxicant(s) of concern and integrate variable bioavailability of them. Strong evidence for action.
Stressors and ecosystem receptors
  • Biodiversity (ER)
  • Biomarkers (ER)
  • Chemical and physical (S)
  • Non-water quality (S)
  • Toxicity (ER)
Very high Other factors, such as invasive species, cyanobacteria, periphyton blooms, salination and flow, would help to attribute any biological responses to the correct cause. Adding biomarkers of effect might provide evidence that sensitive biota have been exposed to stressors, and of their suborganism-level responses. A complete suite of data to manage the water body. Additional expense needs to be justifiable by the size or value of a healthy ecosystem.

a A single toxicity or biodiversity line of evidence will also have a low evaluation rating.
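Each of these tables maps the set of lines of evidence that responded to an evaluation rating, so a table can be held as a simple lookup. The sketch below encodes only the evaluation ratings of Table 11 and its footnote; the constant and function names are hypothetical, and any combination the table does not list returns None, meaning the rating has to be judged rather than looked up.

```python
"""Hypothetical lookup of the evaluation ratings in Table 11
(developing a water quality management plan)."""

# Line-of-evidence labels as used in the table: (S) = stressor, (ER) = ecosystem receptor.
CHEM = "Chemical and physical (S)"
BIODIV = "Biodiversity (ER)"
TOX = "Toxicity (ER)"
BIOMARK = "Biomarkers (ER)"
NON_WQ = "Non-water quality (S)"

TABLE_11_RATINGS = {
    frozenset({CHEM}): "Low",
    frozenset({BIODIV}): "Low",  # footnote a: a single biodiversity line is also low
    frozenset({TOX}): "Low",  # footnote a: a single toxicity line is also low
    frozenset({BIODIV, CHEM}): "Moderate",
    frozenset({BIODIV, CHEM, TOX}): "High",
    frozenset({BIODIV, BIOMARK, CHEM, TOX}): "High",
    frozenset({BIODIV, BIOMARK, CHEM, NON_WQ, TOX}): "Very high",
}


def table_11_rating(responding):
    """Return the Table 11 rating for the responding lines of evidence, or None if unlisted."""
    return TABLE_11_RATINGS.get(frozenset(responding))


print(table_11_rating([CHEM, BIODIV, TOX]))  # High
print(table_11_rating([CHEM, TOX]))  # None: not a row in Table 11
```

Using a frozenset as the key means the order in which the lines of evidence are listed does not matter, mirroring how the tables treat them.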

Table 12 Lines-of-evidence evaluation using a weight-of-evidence process to apply for a development approval

Causal pathway elements Lines of evidence that responded Evaluation rating Evaluation conclusion Recommended management response
Stressors
  • Chemical and physical (S)a
Low Need background data but these are only part of the story. Inadequate background data to allow development.
Stressors and ecosystem receptors
  • Biodiversity (ER)
  • Chemical and physical (S)
Possibly high Background ecological condition also needed and will likely highlight key potential sensitivities of the receiving ecosystems to the proposed development. Improved information, identifying key species to be protected.
Stressors and ecosystem receptors
  • Biodiversity (ER)
  • Chemical and physical (S)
  • Toxicity (ER)
High The level of toxic effects and potential sensitivity of the local ecosystem can be used to assess potential added stressor effects. Good background to compare with proposed contaminant releases.
Stressors and ecosystem receptors
  • Biodiversity (ER)
  • Biomarkers (ER)
  • Chemical and physical (S)
  • Toxicity (ER)
Very high Bioaccumulation provides additional background data for the toxicant(s) of concern and can highlight areas of higher background bioavailability of contaminants. Ideal evidence to decide on development approval.

a A single toxicity or biodiversity line of evidence will also have a low evaluation rating.

Table 13 Lines-of-evidence evaluation using a weight-of-evidence process to assess compliance with a waste discharge licence

Causal pathway elements Lines of evidence that responded Evaluation rating Evaluation conclusion Recommended management response
Stressors
  • Chemical and physical (S)a
Low For comparison with licence values, but needs translation to the bioavailable fraction in the receiving system. If the licence is exceeded, generally sufficient for action.
Stressors and ecosystem receptors
  • Chemical and physical (S)
  • Toxicity (ER)
High Adding a measure of effects may allow chronic effects on biota to be considered in space and time (noting discharges may be intermittent). Strengthens the case that contaminants are indeed in a bioavailable form.
Stressors and ecosystem receptors
  • Biodiversity (ER)
  • Biomarkers (ER)
  • Chemical and physical (S)
  • Toxicity (ER)
Very high Add a consideration of ecosystem health for a comprehensive evaluation. Bioaccumulation can also be useful to assess the potential for long-term effects. Normally fast action is required, but this provides evidence of potentially damaging longer-term effects and of a possible need to renegotiate the licence.

a A single toxicity or biodiversity line of evidence will also have a low evaluation rating.

Table 14 Lines-of-evidence evaluation using a weight-of-evidence process to investigate an unexpected event (e.g. fish kill)

Causal pathway elements Lines of evidence that responded a Evaluation rating Evaluation conclusion Recommended management response
None
  • na
Low Event remains unexplained although effect was only short-term (no ecosystem response). Clean up and further research and monitoring.
Ecosystem receptors
  • Toxicity (ER)
Moderate Water quality inferred but transient event (no biodiversity effect); toxicity not due to target toxicant(s); conduct TIE. Remediation, TIE and further research and monitoring.
Ecosystem receptors
  • Biodiversity (ER)
Low to high Only moderate to high if linked to a pressure or non–water quality related stressor, or to a spatial gradient traced to a plausible cause. Otherwise, lack of pressure, stressor or gradient information limits conclusions. Remediation and mitigation against future occurrences; possibly further research or monitoring to identify the effect pathway.
Pressure
  • Pressure (P)
Moderate to high A water quality related pressure known to result in such events was measured or observed (correlated with the event), but there is no evidence of the contaminant(s) and no ecosystem response, hence a transient (pulse) event. Remediation and mitigation against future occurrences; continued monitoring if necessary to identify the toxicant.
Pressure and stressors
  • Non-water quality (S)
  • Pressure (P)
Moderate to high A non–water quality related pressure and associated stressor known to result in such events (e.g. illegal fishing, dumping of dead fish) were measured or observed (correlated with the event), but there are no water quality contaminant(s) and no ecosystem responses. Clean up and possible litigation or prosecution. Consider increased surveillance to prevent future occurrences.
Stressors
  • Physical & chemical (S)
Moderate to high A plausible or likely toxicant was measured but there were no ecosystem responses, which could suggest that modifying (water quality) factors (e.g. low pH) were transient in the system. Remediate if necessary, and further research and monitoring.
Ecosystem receptors
  • Biomarkers (ER)
Moderate to high Bioaccumulation or biomarker response but no toxicant(s) exceedance, toxicity nor biodiversity response suggests toxicant(s) transient in the system. Further investigation to identify stressor and contaminant pathway to inform remediation and prevent future occurrences.
Stressors and ecosystem receptors
  • Chemical & physical (S)
  • Toxicity (ER)
Moderate to high Better identifies the cause of toxicity. May be sufficient to target source of the event.
Stressors and ecosystem receptors
  • Biodiversity (ER)
  • Chemical & physical (S)
  • Toxicity (ER)
High Biodiversity response, toxicant(s) exceedance and toxicity, but no bioaccumulation, indicating that the toxicant is not one that bioaccumulates. Remediation and mitigation against future occurrences.
Stressors and ecosystem receptors
  • Biodiversity (ER)
  • Biomarkers (ER)
  • Chemical & physical (S)
  • Toxicity (ER)
Very high All lines of evidence identify a likely water quality related cause. Focused remediation and mitigation against future occurrences.

na = not applicable; TIE = toxicity identification evaluation.
a. All lines of evidence assumed to have been measured. The event or response (e.g. dead fish) is assumed and is not indicated in the evaluation table.

Table 15 Lines-of-evidence evaluation using a weight-of-evidence process to assess a remediation study (e.g. remediation of contaminated sediments)

Causal pathway elements Lines of evidence that responded Evaluation rating Evaluation conclusion Recommended management response
Stressors
  • Chemical and physical (S)a
Lowa Unless there is an obvious toxicant whose distribution can be mapped, in which case higher. Insufficient information for remediation unless effects can be demonstrated.
Stressors and ecosystem receptors
  • Chemical and physical (S)
  • Toxicity (ER)
Moderate to high Better defines the need for remediation. Possibly sufficient to justify action.
Stressors and ecosystem receptors
  • Biomarkers (ER)
  • Chemical and physical (S)
  • Toxicity (ER)
High Better defines the need for remediation. Bioaccumulation may be critical for human health and food-chain assessment. This is often the basis for remediation being implemented. A better selection of evidence to justify remediation.
Stressors and ecosystem receptors
  • Biodiversity (ER)
  • Biomarkers (ER)
  • Chemical and physical (S)
  • Toxicity (ER)
Very high Biodiversity usually does not identify the remedial actions needed pre-remediation but often is a trigger for remediation action occurring. Can contribute to priority for remediation because of the severity of impact and ecosystem consequences (food chain) if not undertaken. Better defines the extent of an area requiring remediation. Assessment of recruitment or re-establishment of biodiversity post-remediation is the most important measure of remediation success.

a A single toxicity or biodiversity line of evidence will also have a low evaluation rating.

Table 16 Lines-of-evidence evaluation using a weight-of-evidence process to conduct a baseline study (e.g. greenfields location prior to development)

Causal pathway elements Lines of evidence that responded Evaluation rating Evaluation conclusion Recommended management response
Stressors
  • Chemical & physical (S)a
Lowa Only part of the story and provides no basis of comparison for other lines of evidence should they be added to the assessment of water/sediment quality after development has occurred. Inadequate baseline information. Need additional lines of evidence.
Stressors and ecosystem receptors
  • Biodiversity (ER)
  • Chemical & physical (S)
Possibly high If ecology is similar to reference and a robust baseline can be established at reference/control and potential impact sites, and it is known that the biodiversity indicators selected respond sensitively to the chemical and physical stressor indicators. Assumes that the development is unlikely to affect other stressors or to affect other biodiversity indicators preferentially. Need to establish a baseline for a wide suite of potential stressors to anticipate future development effects. Need to confirm that there are no background toxic effects.
Stressors and ecosystem receptors
  • Biodiversity (ER)
  • Chemical and physical (S)
  • Toxicity (ER)
High Important to establish baselines for all lines of evidence. Toxicity is unlikely at a greenfields site, but baseline assessments will help to establish the potential toxicity of the operating site, and site-specific testing will help to determine the sensitivity of the local ecosystems. Good evidence to assist management but could be strengthened by baseline bioaccumulation.
Stressors and ecosystem receptors
  • Biodiversity (ER)
  • Biomarkers (ER)
  • Chemical and physical (S)
  • Non-water quality (S)
  • Toxicity (ER)
Very high A bioaccumulation baseline for future assessments is also desirable. Need for future flexibility rather than immediate insight. Establishing baseline habitat condition can assist in later assessment of non-chemistry pressures. This is the best combination that could be expected.

a A single toxicity or biodiversity line of evidence will also have a low evaluation rating.

Table 17 Lines-of-evidence evaluation using a weight-of-evidence process to assess a broadscale monitoring program

Causal pathway elements Lines of evidence that responded Evaluation rating Evaluation conclusion Recommended management response
Stressors
  • Chemical and physical (S)a
Lowa Quantify toxicants and other stressors, taking into account the effects of events (e.g. high rainfall). May need integrating samplers. Unlikely to detect all stressors at the scale of interest. Initial survey only, but indicates whether management actions are justifiable.
Stressors and ecosystem receptors
  • Chemical and physical (S)
  • Toxicity (ER)
Moderate Evidence of potential toxic effects can help identify sources. Focus for initial management action.
Stressors and ecosystem receptors
  • Chemical and physical (S)
  • Biodiversity (ER)
Moderate Broadscale health is only part of the story. Needs additional measures of short-term effects. Broadscale ecosystem health measures might not identify localised effects.
Stressors and ecosystem receptors
  • Biodiversity (ER)
  • Chemical and physical (S)
  • Non-water quality (S)
Moderate to high Biodiversity integrates the broadscale ecosystem response and responses to stressors not measurable with chemistry alone. Chemical and physical line of evidence identifies likely stressors to be managed. Rapid assessment protocols (e.g. AUSRIVAS) are commonly used for ecology due to cost of implementation at the broad scale in terms of both time and money. This may limit the sensitivity of the assessment. Could often include other factors, such as invasive species, cyanobacteria, periphyton blooms, salination and flow (at the broad scale, there will rarely be a single pressure). Defines the background status of the system but may not identify point sources.
Stressors and ecosystem receptors
  • Biodiversity (ER)
  • Chemical and physical (S)
  • Non-water quality (S)
  • Toxicity (ER)
High Addition of toxicity assessment may highlight additional concerns and can have better statistical power per unit of expenditure than biodiversity. Habitat alteration can be an important other stressor to account for. Good measures of overall water quality.
Stressors and ecosystem receptors
  • Biodiversity (ER)
  • Biomarkers (ER)
  • Chemical & physical (S)
  • Non-water quality (S)
  • Toxicity (ER)
Very high Bioaccumulation may indicate particular toxicant(s) of concern, particularly if exposure occurs in pulses. Complete dataset to identify changes in system status.

a A single toxicity or biodiversity line of evidence will also have a low evaluation rating.

We recommend that qualitative assessments be further strengthened by adding a criteria-based evaluation (Table 18) to address the eco-epidemiological criteria recommended by Hill (1965), as mentioned earlier under criteria-guided judgement.

Our quality of evidence and weight-of-evidence evaluation tables have been populated one pressure at a time. Land clearing and pesticides, for example, will require different PSER lines of evidence. Where multiple pressures are evident, an additional table (using the same logic) will be required for each pressure.

Table 18 Criteria to formalise the use of independent lines of evidence in inferring causation in effect studiesa,b

Criterion | Description | Example
Strength of association | Size of the correlation between the intensity of the disturbance and the response of the measurement parameter | Sites with high concentrations of the toxicant have lower population densities of an organism than sites with low concentrations of the toxicant
Consistency of association | The association between the disturbance and the measurement parameter has been repeatedly observed in different places, circumstances and times | The negative correlation between concentrations of the toxicant and the densities of the organism has been demonstrated in several other studies by other investigators elsewhere
Specificity of association | The observed effect is diagnostic of exposure to the disturbance | In this case, a decrease in density of the organism is not diagnostic of the disturbance because the population density of the organism may be reduced by other natural processes
Presence of stressor in tissues | Measurement parameters of exposure (e.g. residues, breakdown products) | Breakdown products of the toxicant are found in tissues of organisms
Timing | Exposure to the disturbance must precede the effect in time | Accidental spillages of the toxicant are usually followed by sharp declines in the density of the organism
Biological gradient | A dose–response relationship exists (response of measurement parameter is a function of increases in magnitude of disturbance) | Laboratory toxicology tests have established a dose–response relationship
Biological plausibility | There is a biologically plausible explanation for causality, even if the precise mechanism is unknown | The toxicant comes from a group of chemicals known to interfere with respiration in this organism
Coherence | The causal interpretation should not seriously conflict with existing knowledge about the natural history of the organism and the behaviour of any substances associated with the disturbance | The organism is usually common in sites within the study region and is present year-round; the toxicant is readily soluble and does not break down readily while in solution
Experimental evidence | A valid experiment provides strong evidence of causation | A field experiment demonstrated rapid mortality in response to the addition of known concentrations of the toxicant
Analogy | Similar disturbances cause similar effects | Other chemicals related to this toxicant have shown similar dose–response curves and responses in field experiments with different but related species

a. From ANZECC & ARMCANZ (2000) guidelines
b. A hypothetical example of the response of biological measurement parameters to a toxicant, as an illustration
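Table 18 is applied by judgement rather than arithmetic, but recording a judgement against every criterion makes the reasoning auditable and shows which criteria were never addressed. A minimal sketch, assuming a hypothetical four-way judgement scale ('supports', 'weakens', 'neutral', 'not assessed') and a hypothetical `CausalChecklist` class, neither of which is part of the guidelines. It deliberately does not add the criteria up into a score.

```python
from dataclasses import dataclass, field

# The ten criteria of Table 18.
CRITERIA = (
    "Strength of association",
    "Consistency of association",
    "Specificity of association",
    "Presence of stressor in tissues",
    "Timing",
    "Biological gradient",
    "Biological plausibility",
    "Coherence",
    "Experimental evidence",
    "Analogy",
)

# Hypothetical judgement scale an assessor might use for each criterion.
JUDGEMENTS = ("supports", "weakens", "neutral", "not assessed")


@dataclass
class CausalChecklist:
    """Records a judgement and a note against each Table 18 criterion for one stressor."""

    stressor: str
    judgements: dict = field(default_factory=lambda: {c: "not assessed" for c in CRITERIA})
    notes: dict = field(default_factory=dict)

    def record(self, criterion, judgement, note=""):
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {criterion}")
        if judgement not in JUDGEMENTS:
            raise ValueError(f"unknown judgement: {judgement}")
        self.judgements[criterion] = judgement
        self.notes[criterion] = note

    def summary(self):
        """Group criteria by judgement. This documents the reasoning; it is not a score."""
        groups = {j: [] for j in JUDGEMENTS}
        for criterion in CRITERIA:
            groups[self.judgements[criterion]].append(criterion)
        return groups


checklist = CausalChecklist("hypothetical toxicant")
checklist.record("Timing", "supports", "Spills are followed by sharp declines in density")
checklist.record("Specificity of association", "neutral", "Density may also fall for natural reasons")
print(checklist.summary()["supports"])  # ['Timing']
```

The 'not assessed' group is the useful output here: it lists the criteria still to be considered before a causal conclusion is drawn.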

Semi-quantitative assessments

Semi-quantitative (logic table) assessments have been widely adopted for use in sediment quality assessment. Our approach here uses a numerical scoring system as shown in Table 19, where scores indicate a weight-of-evidence assessment of no significant (1), moderate (2) or significant (3) adverse effects. The scores are based on defined measurement responses, as indicated.

Table 19 Proposed scoring system for lines of evidence in a sediment quality weight-of-evidence assessment

Line of evidence | Indicator type | Score 3 | Score 2 | Score 1
Chemical and physical | Sediment chemistry | Concentration > SQGV-high | Concentration > GV, < SQGV-high | Concentration < GV
Chemical and physical | Pore water chemistry | Concentration > WQGV-HC10a | Concentration > WQGV-HC5a, < WQGV-HC10 | Concentration < WQGV-HC5
Toxicity |  | ≥ 50% effect vs control | 20–50% effect vs control | < 20% effect vs control
Biomarkers | Bioaccumulation | Significantly different (p < 0.05) and > 3 × controlb | Significantly different (p < 0.05) and ≤ 3 × control | Not significantly different from control
Biodiversity | Biodiversity | Significant and high effects on abundance or diversity | Significant but moderate effects on abundance or diversity | No significant effects on abundance or diversity

GV = guideline value, SQGV = sediment quality guideline value, WQGV = water quality guideline value
a. HC5 and HC10 are the guideline values for 95% and 90% species protection, respectively.
b. For essential substances that are well regulated, significant difference from control/reference will be the most important characteristic to consider.
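The chemistry rows of Table 19 are threshold rules, so they translate directly into code. A minimal sketch; the function names and the example thresholds are hypothetical, and the handling of a value exactly equal to a threshold is an assumption because the table leaves it open.

```python
def score_sediment_chemistry(concentration, gv, sqgv_high):
    """Score sediment chemistry per Table 19.

    3 if concentration > SQGV-high, 2 if > GV, otherwise 1.
    Table 19 does not say how a value exactly equal to a threshold is scored;
    here it falls into the lower score (an assumption).
    """
    if sqgv_high <= gv:
        raise ValueError("SQGV-high is expected to exceed the GV")
    if concentration > sqgv_high:
        return 3
    if concentration > gv:
        return 2
    return 1


def score_pore_water(concentration, hc5, hc10):
    """Score pore water chemistry per Table 19 against WQGV-HC5 and WQGV-HC10.

    HC5 (95% species protection) is the lower concentration and HC10
    (90% species protection) the higher one. Equality falls into the lower
    score (an assumption).
    """
    if hc10 < hc5:
        raise ValueError("HC10 is expected to be at least HC5")
    if concentration > hc10:
        return 3
    if concentration > hc5:
        return 2
    return 1


# Hypothetical concentrations and guideline values, for illustration only.
print(score_sediment_chemistry(120, gv=50, sqgv_high=200))  # 2
print(score_pore_water(3.0, hc5=1.4, hc10=2.5))  # 3
```

The input checks encode the ordering of the thresholds: SQGV-high sits above the GV, and the HC10 concentration is at least the HC5 concentration.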

Based on the rankings, a score of 3 in any line of evidence is sufficient to score 3 in the overall assessment of significant adverse effects. Scores of 3 from more than one line of evidence obviously enhance the confidence in the overall assessment. An equivalent scoring system for water quality is provided in Table 20.

Table 20 Possible scoring system for lines of evidence in a water quality weight-of-evidence assessment

Line of evidence | Indicator type | Score 3 | Score 2 | Score 1
Chemical and physical | Chemistrya | Bioavailable concentration > HC10b | Bioavailable concentration > HC5b, < HC10 | Bioavailable concentration < HC5
Toxicity |  | ≥ 50% effect vs control | 20–50% effect vs control | < 20% effect vs control
Biomarkers | Bioaccumulation | Significantly different (p < 0.05) and > 3 × control | Significantly different (p < 0.05) and ≤ 3 × control | Not significantly different from control
Biodiversity | Biodiversity | Significant and high effects on abundance or diversity | Significant but moderate effects on abundance or diversity | No significant effects on abundance or diversity

a. A separate chemical and physical line of evidence might consider other stressors (e.g. nutrients), using exceedance of the default guideline value as a measure.
b. HC5 and HC10 are the guideline values for 95% and 90% species protection, respectively.
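The toxicity and bioaccumulation rows are the same in Tables 19 and 20, so one pair of functions covers both. A minimal sketch with hypothetical function names; it assumes the p-value comes from whatever significance test the study used, and it does not implement footnote b of Table 19 for essential substances.

```python
def score_toxicity(percent_effect):
    """Score toxicity per Tables 19 and 20 (effect relative to control, in %).

    >= 50% -> 3; 20-50% -> 2; < 20% -> 1.
    """
    if percent_effect >= 50:
        return 3
    if percent_effect >= 20:
        return 2
    return 1


def score_bioaccumulation(p_value, tissue_conc, control_conc, alpha=0.05):
    """Score bioaccumulation per Tables 19 and 20.

    1: not significantly different from control (p >= alpha)
    2: significantly different and at most 3 x control
    3: significantly different and more than 3 x control
    """
    if control_conc <= 0:
        raise ValueError("control concentration must be positive to compare against 3 x control")
    if p_value >= alpha:
        return 1
    return 3 if tissue_conc > 3 * control_conc else 2


print(score_toxicity(35))  # 2
print(score_bioaccumulation(0.01, tissue_conc=12.0, control_conc=3.0))  # 3
```

The significance test is checked first because a large ratio to control does not earn a score of 2 or 3 unless the difference is significant.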

An example of a semi-quantitative approach to a weight-of-evidence assessment of contaminated sediments is presented in Table 21 (Simpson et al. 2013).

For a series of hypothetical case studies, a range of lines-of-evidence score combinations has been allocated according to the proposed scoring system in Table 19. The highest scoring assessment for any line of evidence is recorded against that line of evidence. The final assessment score is dictated by the maximum scores in any line of evidence, leading to an overall assessment. Table 21 helps you to assess the lines of evidence resulting from the summation of the effects of multiple pressures. It gives the weight-of-evidence scores derived for each combination, together with an overall assessment based on these scores for each case. This approach is easily adaptable and requires a minimum of professional judgement.

Table 21 Weight-of-evidence scores and assessments for 14 examples of contaminated sediments using a semi-quantitative approach to assess different lines of evidence (LOEs) for a single pressure within an ecosystema

Example | Chemical and Physical LOE | Toxicity LOE | Biodiversity LOE | Biomarkers LOE | Score | Overall assessment
A | 3 | 3 | 3 | 2 or 3 | 3 | Significant adverse effects from sediment contamination
B | 3 | 3 | 2 | 2 or 3 | 3 | Significant adverse effects from sediment contamination
C | 2 or 3 | 3 | 2 | 2 | 3 | Significant adverse effects from sediment contamination
D | 2 or 3 | 2 | 2 | 1 or 2 | 2 | Possible adverse effects from sediment contamination
E | 2 | 2 or 3 | 2 | 1 or 2 | 2 | Possible adverse effects from sediment contamination
F | 2 | 2 | 2 or 3 | 1 or 2 | 2 | Possible adverse effects from sediment contamination
G | 2 or 3 | 2 or 3 | 1 | 2 or 3 | 2 | Toxic chemical stressing system but resistance may have developed at community level
H | 1 | 2 or 3 | 2 or 3 | 1 | 2 | Unmeasured toxic chemicals causing effects on communities is possible
I | 1 | 2 or 3 | 1 | 1 | 2 | Unmeasured physical or chemical causes of toxicity
J | 2 or 3 | 1 | 2 or 3 | 1 | 2 | Chemicals are not bioavailable or community change may not be due to chemicals
K | 1 | 1 | 2 or 3 | 1 | 1 | Changes probably not due to measured contaminants
L | 1 or 2 | 1 | 1 | 1 or 2 | 1 | No adverse effects
M | 1 | 1 | 1 | 1 | 1 | No adverse effects
N | 2 or 3 | 1 | 1 | 1 | 1 | Contaminants unavailable

a. Values listed in each LOE category are the highest scoring assessment in that category. For example, under ‘Chemical and Physical LOE’, metals may score 2 and organics may score 3 so the ‘3’ is recorded. The greater the number of 3s recorded in a category, the greater is the weight that LOE category assumes.
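Table 21 can be read as a pattern lookup: reduce each LOE to its highest indicator score (footnote a), then find the examples whose allowed scores fit. A sketch under that reading; the function names are hypothetical, and an empty result means the combination is not covered by Table 21 and needs professional judgement. Note that the overall scores in Table 21 are not always the highest LOE score (for example, N scores 1 with a chemistry score of 2 or 3), so the sketch looks the overall score up rather than computing it.

```python
# Table 21, transcribed: for each example, the allowed score(s) for the
# chemical and physical, toxicity, biodiversity and biomarkers LOEs,
# followed by the overall score and assessment.
SIG = "Significant adverse effects from sediment contamination"
POS = "Possible adverse effects from sediment contamination"
TABLE_21 = [
    ("A", ({3}, {3}, {3}, {2, 3}), 3, SIG),
    ("B", ({3}, {3}, {2}, {2, 3}), 3, SIG),
    ("C", ({2, 3}, {3}, {2}, {2}), 3, SIG),
    ("D", ({2, 3}, {2}, {2}, {1, 2}), 2, POS),
    ("E", ({2}, {2, 3}, {2}, {1, 2}), 2, POS),
    ("F", ({2}, {2}, {2, 3}, {1, 2}), 2, POS),
    ("G", ({2, 3}, {2, 3}, {1}, {2, 3}), 2,
     "Toxic chemical stressing system but resistance may have developed at community level"),
    ("H", ({1}, {2, 3}, {2, 3}, {1}), 2,
     "Unmeasured toxic chemicals causing effects on communities is possible"),
    ("I", ({1}, {2, 3}, {1}, {1}), 2, "Unmeasured physical or chemical causes of toxicity"),
    ("J", ({2, 3}, {1}, {2, 3}, {1}), 2,
     "Chemicals are not bioavailable or community change may not be due to chemicals"),
    ("K", ({1}, {1}, {2, 3}, {1}), 1, "Changes probably not due to measured contaminants"),
    ("L", ({1, 2}, {1}, {1}, {1, 2}), 1, "No adverse effects"),
    ("M", ({1}, {1}, {1}, {1}), 1, "No adverse effects"),
    ("N", ({2, 3}, {1}, {1}, {1}), 1, "Contaminants unavailable"),
]


def loe_score(indicator_scores):
    """Footnote a: the LOE score is the highest score of any indicator in that LOE."""
    return max(indicator_scores)


def match_table_21(chem, tox, biodiv, biomarkers):
    """Return (example, score, assessment) for every Table 21 row that fits the four LOE scores."""
    observed = (chem, tox, biodiv, biomarkers)
    return [
        (example, score, assessment)
        for example, pattern, score, assessment in TABLE_21
        if all(value in allowed for value, allowed in zip(observed, pattern))
    ]


chem = loe_score([2, 3])  # e.g. metals score 2 and organics score 3, so 3 is recorded
print(match_table_21(chem, 1, 1, 1))  # [('N', 1, 'Contaminants unavailable')]
```

Some score combinations fit more than one example (for instance, all LOEs at 2 fits D, E and F), which is harmless here because those rows share the same assessment.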

Quantitative assessments

Quantitative approaches to weight-of-evidence assessment largely involve more complex statistical analyses (Reynoldson et al. 2002, Smith et al. 2002).

Different approaches include using:

  • multivariate statistics to cluster sites into groups of similar impact (a common approach)
  • meta-analysis to pool empirically derived hypothesis-testing P-values
  • a quantitative estimation of probability of impairment derived from odds ratios (Bayesian analysis is showing some promise here but has yet to be widely adopted; refer to Smith et al. 2002 and Linkov et al. 2015).
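As an illustration of the meta-analysis option, Fisher's method is one standard way to pool independent p-values. It is offered here as an assumed example only: the guidelines do not prescribe it, and the cited studies may use other pooling methods. It also assumes the tests are independent, which may not hold when lines of evidence share sites or samples. The function name is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees of
    freedom under the joint null hypothesis. For even degrees of freedom the
    upper-tail probability has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not 0 < p <= 1 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # x/2, where x is Fisher's statistic
    term = 1.0  # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, k):
        term *= half_x / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half_x) * total


# Hypothetical p-values from three lines of evidence.
print(fisher_combined_p([0.04, 0.10, 0.30]))
```

The closed form avoids a dependency on a statistics library; with a single p-value the function returns that p-value unchanged, which is a quick sanity check.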

A theme common to these 3 strategies is the critical issue of defining an appropriate set of reference or control conditions.

Quantitative measures are more appropriate for large datasets of both test and reference sites than for more focused studies with fewer than 10 impacted sites and even fewer reference sites.

There are a number of variants on the approaches for weight-of-evidence assessment we have discussed. For example:

  • inclusion and weighting for the number of contaminants exceeding the guideline values
  • colour-coding the data presentation to provide a better visual indication of the level and extent of contamination
  • inclusion of a human health aspect to the bioaccumulation assessment (a major endpoint of concern for many contaminated site assessments).
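For the first two variants, a sketch that counts the contaminants exceeding their guideline values, ranks them by how far each exceeds (concentration divided by guideline value) and maps scores to a traffic-light palette. The text does not specify a weighting or colour scheme, so the ranking, the function names, the contaminant names and the green/amber/red mapping are all assumptions.

```python
# Hypothetical colour scheme for displaying scores from Tables 19 and 20.
SCORE_COLOURS = {1: "green", 2: "amber", 3: "red"}


def exceedances(concentrations, guideline_values):
    """Return (contaminant, concentration / guideline value) for each exceedance,
    largest ratio first. Contaminants without a guideline value are skipped."""
    exceeded = {
        name: conc / guideline_values[name]
        for name, conc in concentrations.items()
        if name in guideline_values and conc > guideline_values[name]
    }
    return sorted(exceeded.items(), key=lambda item: item[1], reverse=True)


def colour_for(score):
    """Map a 1-3 weight-of-evidence score to a display colour."""
    return SCORE_COLOURS[score]


# Hypothetical concentrations and guideline values (same units), for illustration only.
concentrations = {"contaminant_a": 130.0, "contaminant_b": 150.0, "contaminant_c": 20.0}
guideline_values = {"contaminant_a": 65.0, "contaminant_b": 200.0, "contaminant_c": 50.0}
ranked = exceedances(concentrations, guideline_values)
print(len(ranked), "contaminant(s) exceed their guideline value")
for name, ratio in ranked:
    print(f"{name}: {ratio:.1f} x GV, colour {colour_for(3)}")
```

The count of exceedances and the size of each ratio are kept separate so that any weighting scheme chosen later can use either or both.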

Taking management steps after a weight-of-evidence evaluation

The water/sediment quality management goals are deemed to be met when:

  • water quality objectives for those lines of evidence considered essential for informing acceptable water/sediment quality are met
  • results for other supportive lines of evidence are consistent with no compromise to current or future water/sediment quality according to the selected level of protection.

In this case, management should focus on maintaining or improving that quality. This will require a check of any possible improvements to management strategies at Step 8 of the framework, and then implementation at Step 10 of the framework.
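The two conditions above amount to a simple decision rule. A sketch, assuming each line of evidence has already been judged pass or fail; the function name, the dictionary inputs and the 'not met' and 'conflicting' labels are hypothetical, and the text treats both of those outcomes as triggering further action rather than distinguishing between them.

```python
def evaluate_goals(essential_met, supportive_consistent):
    """Apply the two conditions above to per-line-of-evidence results.

    essential_met: {line of evidence: True if its water quality objective is met}
    supportive_consistent: {line of evidence: True if results are consistent with no
        compromise to water/sediment quality at the selected level of protection}
    Returns "met", "not met" or "conflicting" (the last two labels are assumptions).
    """
    if not essential_met:
        raise ValueError("at least one essential line of evidence is needed")
    if not all(essential_met.values()):
        return "not met"
    if not all(supportive_consistent.values()):
        # Essential objectives met, but supportive evidence disagrees.
        return "conflicting"
    return "met"  # focus on maintaining or improving quality (Steps 8 and 10)


print(evaluate_goals({"chemistry": True, "toxicity": True}, {"biodiversity": False}))  # conflicting
```

An empty set of supportive lines of evidence counts as consistent, so the rule then rests on the essential lines alone.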

Should the weight-of-evidence evaluation conclude that the objectives are not met, adverse trends are evident, or the result is inconclusive (e.g. the stressor was transient in the system), or if there is conflicting evidence from separate lines of evidence, then 3 options are available:

References

ANZECC & ARMCANZ 2000, Australian and New Zealand Guidelines for Fresh and Marine Water Quality, Australian and New Zealand Environment and Conservation Council and Agriculture and Resource Management Council of Australia and New Zealand, Canberra.

Chapman PM 1990, The sediment quality triad approach to determining pollution-induced degradation, Science of the Total Environment 97–98: 815–825.

DNRM 2013, Queensland integrated waterways monitoring framework, prepared by Water Monitoring and Reporting within the Queensland Department of Natural Resources and Mines, Brisbane.

Hill AB 1965, The Environment and Disease: Association or Causation?, Proceedings of the Royal Society of Medicine 58(5): 295–300.

Linkov I, Massey O, Keisler J, Rusyn I & Hartung T 2015, From "weight of evidence" to quantitative data integration using multicriteria decision analysis and Bayesian methods, Alternatives to Animal Experimentation 32(1): 3–8.

MacDiarmid A, Boschen R, Bowden D, Clark M, Hadfield M, Lamarche G, Nodder S, Pinkerton M & Thompson D 2014, Environmental risk assessment of discharges of sediment during prospecting and exploration for seabed minerals, NIWA report WLG2013-66, Hamilton, page 53.

Reynoldson TB, Thompson SP & Milani D 2002, Integrating multiple toxicological endpoints in a decision-making framework for contaminated sediments, Human and Ecological Risk Assessment 8(7): 1569–1584.

Simpson SL, Batley GE & Chariton AA 2013, Revision of the ANZECC/ARMCANZ Sediment Quality Guidelines, CSIRO Land and Water Report 8/07, Canberra.

Smith EP, Lipkovich I & Ye K 2002, Weight of evidence (WOE): Quantitative estimation of probability of impairment for individual and multiple lines of evidence, Human and Ecological Risk Assessment 8(7): 1585–1596.

Suter GW & Cormier SM 2011, Why and how to combine evidence in environmental assessments: Weighing evidence and building cases, Science of the Total Environment 409: 1406–1417.

Suter GW & Cormier SM 2013, A method for assessing the potential for confounding applied to ionic strength in central Appalachian streams, Environmental Toxicology and Chemistry 32: 288–295.

Suter G, Cormier S & Barron M 2017, A weight of evidence framework for environmental assessments: inferring qualities. Integrated Environmental Assessment and Management 13(6): 1038–1044.

USEPA 2016, Weight of Evidence in Ecological Assessment, US Environmental Protection Agency Office of Research and Development, Washington DC, EPA100R16001.