
Statistical FAQs
This is a list of commonly asked (and answered) statistical questions
that apply to both DUMPStat and CARStat. Answers to statistical
questions that apply only to DUMPStat appear below.
DUMPStat and CARStat
What do manual reporting limits do?
What is the difference between tolerance limits and prediction limits?
Should I use MDLs or PQLs for statistical analysis?
How is the median value of the reporting limit for nondetect samples computed when there is an even number of samples?
Is the original sample included in the verification resampling numbers?
DUMPStat
When can I use intrawell comparisons?
What can I do if I have only one upgradient well?
What is the Control chart factor?
Do I have to take four independent samples from downgradient wells per semiannual monitoring event?
Over what period of time can I take my background samples?
What is the minimum background sample size required to compute detection monitoring statistics?
If I am using intrawell comparisons should I continue to monitor the upgradient well(s)?
When are nonparametric prediction limits appropriate?
What should I do for VOCs?
How do control charts deal with multiple comparisons?
When computing tests of normality and lognormality what data should be used?
How do I adjust for seasonal variability?
Should I ever use ANOVA?
Does nonparametric ANOVA correct the limitations of its parametric counterpart?
Are different methods required for comparison to ACLs and MCLs?
If I have constituents with detection frequencies less than 25% for intrawell or less than 50% for interwell comparisons do I have to wait until I have a minimum of 13 background samples before I begin computing statistics?
Do I have to conduct a statistical analysis if VOCs are detected only in the downgradient wells?
Do I need to compute statistics when all of the background data are below the MDL/PQL/LOQ?
Do I need to compute statistics when all of the downgradient data are below the MDL/PQL/LOQ?
My regulator doesn't want to see nonparametric limits, but DUMPStat automatically uses them when the data are neither normally nor lognormally distributed. What can I do?
Why do I see trends on my intrawell control charts that aren't there on my time series graphs for the same wells and constituents?
What would be sufficient for pairs with insufficient data?
Two-sided prediction limits & pH
DUMPStat and CARStat

What do manual reporting limits do?
Manual reporting limits replace the numerical
values of all laboratory-reported detection limits, which are used for
quantifying nondetects. When enabled, they are used in calculations and
also appear on graphs in place of the laboratory results. You can enable
manual limits in Statistical Options or from Set Manual Reporting Limits
on the Constituents menu. To specify a limit for a certain constituent,
select Set Manual Reporting Limits. Choose a constituent from the Reporting
Limits list, enter the desired value in the Manual Limit box and ensure
that Enable Manual Reporting Limits is set. Then click OK to save your
changes. Constituents that do not have values in the Limit column will
continue to use the laboratory values even if Manual Reporting Limits
are selected.

What is the difference between tolerance limits and prediction limits?
Tolerance limits provide coverage of a percentage
of the total distribution of measurements (e.g., 95%) with a certain
degree of confidence (e.g., 95%). Prediction limits provide coverage
of 100% of the next k measurements with a given level of confidence
(e.g., 95%). With 95% coverage, we are 95% confident that no more than
5% of all measurements will exceed the tolerance limit, whereas we are
95% confident that none of the next k measurements will exceed the
prediction limit.

Should I use MDLs or PQLs for statistical analysis?
The detection limit is used to determine whether an analyte
is present in a sample, and the quantification limit is used to make
a quantitative determination of the amount of the analyte in the
sample. USEPA has used the terms MDL (method detection limit) and PQL
(practical quantification limit) to describe two specific approaches to
estimating the detection and quantification limits, respectively. If we
are comparing a concentration directly to a standard, then it must be
greater than the quantification limit in order to provide a reliable
estimate of whether or not the standard has actually been exceeded.
If all that we care about is whether the analyte is present or absent
in the sample, then measurements above the detection limit provide that
information. Measurements above the quantification limit can be used
directly in the previously described statistical methods; however,
measurements below the quantification limit are considered to be
censored, and the appropriate adjustments for censored data should be used.
Both DUMPStat and CARStat use Aitchison's method to adjust for nondetects in computing
normal and lognormal prediction limits. No statistical adjustment is required
for nonparametric or Poisson prediction limits. The primary advantage of Aitchison's method
over other alternatives (e.g., Cohen's method) is that it can accommodate varying reporting
limits which are quite common in practice.
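
As a rough illustration for the normal case, the sketch below uses the form of Aitchison's adjustment commonly given in USEPA guidance, which is equivalent to treating each nondetect as a zero concentration. Whether DUMPStat and CARStat compute it in exactly this way is an assumption, and the function name and sample values are hypothetical.

```python
import math
import statistics

def aitchison_adjust(detects, n_nondetects):
    """Return (adjusted mean, adjusted SD) for a normal prediction limit.

    detects      -- quantified measurements (at least two are needed)
    n_nondetects -- number of nondetects; their reporting limits are not used
    """
    n = len(detects) + n_nondetects
    d = n_nondetects
    mean_det = statistics.mean(detects)
    var_det = statistics.variance(detects)  # sample variance of the detects
    adj_mean = (1 - d / n) * mean_det
    adj_var = ((n - d - 1) / (n - 1)) * var_det \
        + (d / n) * ((n - d) / (n - 1)) * mean_det ** 2
    return adj_mean, math.sqrt(adj_var)

# Hypothetical background: three detects and two nondetects.
adj_mean, adj_sd = aitchison_adjust([2.0, 4.0, 6.0], 2)
```

Because the reporting limits themselves never enter the calculation, the adjustment is unaffected when those limits vary from sample to sample.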

How is the median value of the reporting limit for nondetect samples
computed when there is an even number of samples?
The median always works out unambiguously when
the number of values is odd. When there is an even number of values,
there is more than one definition of the median in common use. The
definition we use is that the median is the smallest value that divides
the data into two equal parts. So, for example, the set of numbers
{0, 0.5, 1, 2, 2, 5} would have 1 as its median.
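
For readers who want to reproduce this convention, the short sketch below contrasts it with the averaging definition. It uses Python's standard statistics module, which is not part of DUMPStat or CARStat; only the data set comes from the example above.

```python
import statistics

values = [0, 0.5, 1, 2, 2, 5]

# Lower median: the smaller of the two middle values (the convention above).
low = statistics.median_low(values)

# Averaging definition, shown only for contrast.
avg = statistics.median(values)

print(low, avg)
```

With an odd number of values the two definitions agree.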

Is the original sample included in the verification resampling numbers?
No. DUMPStat and CARStat use "Pass 1 of 1", "Pass 1 of 2", and
"Pass 2 of 2" (and CARStat also uses 'None') to refer only to the
number of resamples. Thus, choosing Pass 1 of 1 means that both the
initial sample AND the (single) resample must exceed the limit for the
exceedance to be verified. An alternative terminology exists where the
initial sample is included as part of the 'm' in "Pass n of m", in
which case Pass 1 of 1 would mean that no resampling is being used.
Both terminologies are widely used, and care must be taken to ensure
that you are following the strategy detailed in your permit.
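
The decision rule can be sketched as follows, using the convention in which the plan name counts resamples only. The function name is hypothetical. Three things are assumptions rather than documented behavior: that "exceed" means strictly greater than the limit, that "Pass 2 of 2" requires both resamples to pass, and that 'None' means the initial exceedance stands without resampling.

```python
# (passes needed to clear the exceedance, number of resamples taken)
PLANS = {
    "None": (0, 0),
    "Pass 1 of 1": (1, 1),
    "Pass 1 of 2": (1, 2),
    "Pass 2 of 2": (2, 2),
}

def exceedance_verified(initial, resamples, limit, plan):
    """Return True if the initial exceedance is verified under the plan."""
    passes_needed, n_resamples = PLANS[plan]
    if initial <= limit:
        return False          # no initial exceedance, nothing to verify
    if n_resamples == 0:
        return True           # no resampling: the initial result stands
    if len(resamples) != n_resamples:
        raise ValueError("plan %r expects %d resample(s)" % (plan, n_resamples))
    passes = sum(1 for r in resamples if r <= limit)
    return passes < passes_needed

# Initial result above a limit of 10, then one resample above and one below.
print(exceedance_verified(12, [11, 9], 10, "Pass 1 of 2"))
print(exceedance_verified(12, [11, 9], 10, "Pass 2 of 2"))
```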
DUMPStat

When can I use intrawell comparisons?
Intrawell comparisons should always be used
when pre-disposal data are available. When no data prior to disposal
of waste are available, the owner/operator must provide empirical
justification that use of intrawell comparisons will not mask existing
contamination at the facility. One good approach is to show that constituents
of concern (e.g., VOCs) are not present in the wells and that naturally
occurring constituents show no evidence of an increasing trend (e.g.,
using Sen's test).

What can I do if I have only one upgradient well?
With only one upgradient well, spatial variability
and potential contamination are completely confounded (i.e., you
can't tell one from the other). To perform upgradient versus downgradient
comparisons and consider spatial variability, you need a minimum of
two upgradient wells.

What is the Control chart factor?
The Control chart factor is the multiplier that
determines how many standard deviations above the mean the control chart
limit (SCL) lies: SCL = mean + (factor * SD). You can modify the Control chart
factor in the Statistical Options dialog box. There are two settings
for the Control chart factor based on the number of samples. Also, factors
that vary significantly from the default values will be highlighted
with a cautionary color and/or limited to a range.
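
A minimal sketch of the formula follows. The function name, the background values, and the factor of 4.5 are illustrative only; 4.5 is not confirmed here to be a DUMPStat default, and the use of the sample (n - 1) standard deviation is an assumption.

```python
import statistics

def control_limit(background, factor):
    """SCL = mean + (factor * SD), computed from the background samples."""
    mean = statistics.mean(background)
    sd = statistics.stdev(background)  # sample SD (n - 1 denominator), assumed
    return mean + factor * sd

# Hypothetical background data for one well and constituent.
background = [10.0, 12.0, 11.0, 13.0, 9.0, 11.0, 12.0, 10.0]
print(control_limit(background, 4.5))
```

Raising the factor moves the limit further above the background mean, which lowers the false positive rate at the cost of a higher false negative rate.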

Do I have to take four independent samples from downgradient wells
per semiannual monitoring event?
The requirement for four semiannual samples
is for ANOVA only, which is a technique that DUMPStat does not use because
it is inappropriate for groundwater monitoring. All other methods require
a single semiannual sample once the background is established.

Over what period of time can I take my background samples?
A minimum of eight background samples must be
taken for prediction limits, tolerance limits and control charts. The
samples must be independent and representative of seasonal and spatial
variability at the site. Spatial and seasonal variability apply to naturally
occurring constituents only (e.g., inorganics). Spatial variability
is addressed by using intrawell comparisons and/or having multiple
upgradient wells. Seasonal variability is addressed by collecting samples
over a period of time that includes the seasons in which downgradient
samples will be collected. For this reason, the eight background samples
should be collected over a period of no less than one year, and preferably
over a two-year period in which a constant sampling interval is used
(e.g., quarterly sampling over a two-year period for intrawell comparisons
and quarterly sampling over a one-year period from at least two upgradient
wells for interwell comparisons). However, all samples required to
establish background should be collected prior to the date of statistical
comparison as required by the regulations.

What is the minimum background sample size required to compute
detection monitoring statistics?
A minimum of eight background samples (e.g.,
eight samples in each well for intrawell comparisons or four samples
in each of two upgradient wells for interwell comparisons) is required
for a meaningful statistical evaluation.

If I am using intrawell comparisons should I continue to monitor the
upgradient well(s)?
Yes. It is always wise to perform intrawell
comparisons on both upgradient and downgradient wells. If an exceedance
is seen in both upgradient and downgradient wells, it is usually good
evidence that the potential impact is not from the site. Any data that
help in evaluating offsite and/or seasonal, regional and climatic
changes should be collected and investigated.

When are nonparametric prediction limits appropriate?
Nonparametric prediction limits are optimal
in the sense that they make no assumptions regarding the specific form
of the underlying distribution. However, as the number of wells and
constituents increases, large numbers of background measurements
(e.g., 16 or more) are required in order to have reasonable confidence.
When the sitewide confidence level is poor (i.e., lower than
90%), alternatives based on Poisson prediction limits are often useful.
Poisson prediction limits can be used regardless of detection frequency,
and their associated level of confidence is independent of the number
of background measurements. Note that Poisson prediction limits are
approximate in that many constituents will not have a Poisson
distribution. For this reason, Poisson prediction limits should only
be used when statistical power analysis reveals that there is an
insufficient number of background measurements to justify the
nonparametric approach. In addition, Poisson prediction limits should
only be used with constituents with detection frequencies of less than
50%, whereas nonparametric prediction limits are valid regardless of
detection frequency.

What should I do for VOCs?
VOCs are not naturally occurring and therefore
they should not be found in background groundwater samples. For VOCs,
verified exceedance of the appropriate quantification limit is an indication
of a significant exceedance. Do not apply the previously described statistical
methods to VOCs unless you are doing assessment or corrective action
monitoring and are attempting to determine whether a known release of these
compounds is getting better or worse or exceeds a standard. Alternatively,
if VOCs are detected in upgradient wells due to an offsite source, statistical
comparison (i.e., up vs. down) may be appropriate.

How do control charts deal with multiple comparisons?
As described, combined Shewhart-CUSUM control
charts do not explicitly adjust for multiple comparisons. The effects
of verification resampling and the increasing number of comparisons produced
by multiple wells and constituents generally balance the sitewide false
positive and false negative rates at reasonable levels; however, there
is no statistical guarantee that they will. Please note that when using
control charts it is particularly important to determine sitewide false
positive and false negative rates via simulation. Certain states (e.g.,
California) require that you select the control chart factor based on
generating a 5% sitewide false positive rate. DUMPStat allows the user
to input the factor in the Statistical Options item of the Settings
Menu, and the Intrawell Control Chart Power Analysis can be used to
determine the sitewide false positive rate for varying choices of the
control chart factor.

When computing tests of normality and lognormality what data should be used?
Tests of distributional form should only be
performed on background data or data that are known with certainty not
to be influenced by the facility. This would typically exclude use of
downgradient data.

How do I adjust for seasonal variability?
In general, you can't adjust for seasonal
variability because you typically do not have a large enough number
of samples in each season to provide a reliable estimate of the effect.
This is not a big problem because seasonal variability is incorporated
into the usual estimate of the background standard deviation, even if
it is not explicitly modeled as a separate variance component. Gibbons
(1994a) and Gilbert (1987) provide methods for seasonally adjusted trend
estimators, and this topic is also discussed in the ASTM
guidance D6312-98. Note that collecting
samples over a 12-month period is generally sufficient to incorporate
seasonal variability into the background standard deviation.

Should I ever use ANOVA?
ANOVA is an extremely useful statistical tool
for designed experiments with random sampling. Unfortunately, groundwater
monitoring data do not enjoy such luxuries. Spatial variability becomes
confounded with upgradient versus downgradient comparisons and, in general,
ANOVA can be more sensitive to spatial variability (i.e., small
but consistent differences) than to a real release (i.e., a large
but highly variable increase). The reason is that ANOVA compares
between-well variability to within-well variability. In the absence of
contamination, within-well variability is a combination of temporal
variability and analytic variability, whereas between-well variability
is due to spatial variability. Since spatial variability is invariably
large relative to the combination of temporal and analytic variability,
the ANOVA will conclude that the ratio of between-well variability to
within-well variability is significantly larger than one. Of course,
the assumption of ANOVA is that under the null hypothesis (i.e., no
contamination) all wells are drawn from the same distribution with the
same population mean. This assumption is justifiable under random
sampling. However, it is not justified in natural systems in which
initial conditions are already different, for example due to natural
spatial variability. One good application of ANOVA is in testing whether
or not the amount of spatial variability is statistically significant.
Here we simply restrict the analysis to the upgradient or background
wells (which could not be affected by a release from the site), and if
a significant F-statistic results then we can conclude that there is
significant spatial variability. However, even in the absence of a
significant ANOVA, spatial variability may still be appreciable but
simply not present in the small number of available upgradient or
background wells.

Does nonparametric ANOVA correct the limitations of its parametric counterpart?
The only difference between nonparametric and
parametric ANOVA is that the nonparametric ANOVA does not assume a specific
distributional form for the concentration measurements whereas the parametric
ANOVA assumes normality. Both models assume independence of the measurements
and homogeneity of variance and both models are severely compromised
by spatial variability.

Are different methods required for comparison to ACLs and MCLs?
When comparing measurements to a standard, the
same approach is used (e.g., a 95% upper confidence limit for the mean
of the last four measurements) regardless of how the standard was derived.
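
As an illustration of the example in parentheses, the sketch below computes a normal-theory 95% upper confidence limit for the mean of the last four measurements. The normal-theory form, the function name, and the data are assumptions; the answer above does not say which formula the software applies.

```python
import math
import statistics

# One-sided 95% Student's t quantile with 3 degrees of freedom (n = 4).
T_95_DF3 = 2.353

def ucl95_last_four(measurements):
    """95% UCL for the mean of the four most recent measurements."""
    last4 = measurements[-4:]
    if len(last4) < 4:
        raise ValueError("at least four measurements are required")
    mean = statistics.mean(last4)
    sd = statistics.stdev(last4)
    return mean + T_95_DF3 * sd / math.sqrt(4)

# Hypothetical series; only the last four values enter the calculation.
ucl = ucl95_last_four([3.0, 5.0, 2.0, 4.0, 4.0, 6.0])
```

The resulting limit is then compared with the standard in the same way whether that standard is an ACL or an MCL.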

If I have constituents with detection frequencies less than 25% for
intrawell or less than 50% for interwell comparisons do I have to wait
until I have a minimum of 13 background samples before I begin
computing statistics?
No. For interwell comparisons, remember that
the number of background samples is pooled over all upgradient wells,
so with eight samples in each of two wells you have 16 background samples.
For intrawell comparisons, 13 background samples are required for a
nonparametric prediction limit with one verification resample, but only
eight background samples are required with two verification resamples
(i.e., fail the first and pass either one of two verification
resamples). Alternatively, Poisson prediction limits can be used with
as few as four background samples regardless of detection frequency.
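
To see why more resamples allow a smaller background, the sketch below gives the confidence for a single comparison when the nonparametric limit is the largest of n background values and the comparison fails only if the initial sample and every resample exceed it. It assumes independent, identically distributed continuous measurements and ignores the adjustment for multiple wells and constituents, so it is not the sitewide confidence level and is not claimed to be the calculation DUMPStat performs. The function name is hypothetical.

```python
import math

def single_comparison_confidence(n_background, n_future):
    """Probability that at least one of n_future samples (initial sample
    plus resamples) falls at or below the maximum of n_background samples.

    All n_future samples exceed the background maximum only if they occupy
    the top n_future ranks of the combined sample, which has probability
    1 / C(n_background + n_future, n_future).
    """
    return 1 - 1 / math.comb(n_background + n_future, n_future)

# 13 background samples, initial sample plus one resample: about 0.990
one_resample = single_comparison_confidence(13, 2)

# 8 background samples, initial sample plus two resamples: about 0.994
two_resamples = single_comparison_confidence(8, 3)
```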

Do I have to conduct a statistical analysis if VOCs are detected only
in the downgradient wells?
Verified quantification of VOCs in a downgradient
well is a statistical exceedance in and of itself. No statistical comparisons
are required.

Do I need to compute statistics when all of the background data are
below the MDL/PQL/LOQ?
The LOQ and PQL are both quantification limit
estimates, whereas the MDL is an estimate of a detection limit. For statistical
purposes, the smallest measured concentration is the quantification
limit (e.g., PQL or LOQ); therefore, if all values in the upgradient wells
are nonquantifiable, the prediction limit becomes the QL. Our level
of confidence in this decision rule is based on the number of background
measurements, the number of comparisons and the verification resampling
strategy. If we have a small background sample size (e.g., the minimum
of eight background measurements) and nothing is detected, there is
still an appreciable probability that the true detection frequency is greater
than zero. Since there are typically far more downgradient wells than
upgradient wells, we will have a greater chance of detecting the constituent
in a downgradient well, therefore giving the appearance of a potential
release. For this reason, even when nothing is detected in background,
confidence levels associated with using the QL as the nonparametric
prediction limit should be determined. Note that this does not apply
to VOCs, which should not be detected in clean background wells with
any frequency.

Do I need to compute statistics when all of the downgradient data are
below the MDL/PQL/LOQ?
Statistical computations are based on background
data only. The fact that a constituent has never been detected and/or
quantified in a downgradient well is irrelevant to the statistical analysis;
however, it may indicate that the constituent adds little to the monitoring
program and should be eliminated from the suite of constituents used
for statistical analysis.

My regulator doesn't want to see nonparametric limits, but DUMPStat
automatically uses them when the data are neither normally nor
lognormally distributed. What can I do?
In the DUMPStat statistical options, the "Rare
Event Statistics" setting can be used to override the choice
of nonparametric limits, even for events with high detection frequencies.
When "Poisson" is selected, you will never get a nonparametric
limit. When computing a prediction limit, if the detection frequency
is insufficient to compute a parametric limit (a "rare event"),
you will get either a nonparametric limit or a Poisson limit, depending
on the "Rare Event Statistics" setting in your statistical
options. For interwell comparisons, if the detection frequency is
sufficient to compute a parametric limit, the background data are
tested for normality. If they pass this test, you will get a normal
limit. If they fail, the data are tested for lognormality. If the data
are found to be lognormally distributed, you will get a lognormal
limit. If both tests fail, then the "Rare Event Statistics" setting is
used, even though the detection frequency is high. If "Nonparametric"
is selected, you will get a nonparametric limit. If "Poisson" is
selected, you will get a normal limit even though the data failed the
normality test.
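
The selection logic described above for interwell comparisons can be summarized in a short sketch. The function and argument names are hypothetical, and the three boolean inputs stand in for checks that DUMPStat performs internally.

```python
def select_limit_type(detection_sufficient, passes_normality,
                      passes_lognormality, rare_event_setting):
    """Return the kind of prediction limit described in the text.

    rare_event_setting -- "Nonparametric" or "Poisson", the value of the
                          "Rare Event Statistics" option
    """
    if rare_event_setting not in ("Nonparametric", "Poisson"):
        raise ValueError("unknown setting: %r" % (rare_event_setting,))
    if not detection_sufficient:
        # Rare event: the setting directly picks the limit type.
        return rare_event_setting
    if passes_normality:
        return "Normal"
    if passes_lognormality:
        return "Lognormal"
    # Both distributional tests failed despite a high detection frequency.
    if rare_event_setting == "Nonparametric":
        return "Nonparametric"
    return "Normal"  # with "Poisson" selected, a normal limit is used

print(select_limit_type(True, False, False, "Poisson"))
```

With "Poisson" selected, no combination of inputs returns a nonparametric limit.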

Why do I see trends on my intrawell control charts that aren't there
on my time series graphs for the same wells and constituents?
The trend detection for intrawell control charts
is one-tailed; that is, only increasing trends are sought. In contrast,
trend detection in time series (implemented in DUMPStat version 2.1.1)
is two-tailed, finding both increasing and decreasing trends. In this
case the area under the curve in each tail is half of the area in a
one-tailed test, so that a trend in time series must be more pronounced
to be detected. The same Sen's test is being used for each analysis,
but results can differ based on the 'tailedness' of the detection.
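
The effect of 'tailedness' can be seen with a hypothetical standardized trend statistic. The sketch below assumes a normal approximation and a 5% significance level, neither of which is stated above, and the value 1.8 is purely illustrative.

```python
from statistics import NormalDist

def trend_p_values(z):
    """Return (one-tailed upper p-value, two-tailed p-value) for statistic z."""
    std_normal = NormalDist()
    one_tailed = 1 - std_normal.cdf(z)
    two_tailed = 2 * (1 - std_normal.cdf(abs(z)))
    return one_tailed, two_tailed

# The same moderately increasing trend, judged both ways.
one_tailed, two_tailed = trend_p_values(1.8)
alpha = 0.05
print(one_tailed < alpha)   # flagged by the one-tailed test
print(two_tailed < alpha)   # not flagged by the two-tailed test
```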

What would be sufficient for pairs with insufficient data?
Surface and air monitoring use the same minimum
number of background samples as the rest of the analyses, chosen from
the statistical options. If the number of pairs of samples for a particular
constituent/well is less than or equal to the minimum number of samples, the
UCLs for the upstream and downstream sample points cannot be computed.

Two-sided prediction limits & pH
The pH measurement differs from concentration
measurements in that there is a numerical maximum and minimum that a
result must fall within to be considered acceptable. To account for
this, the prediction limits treat the constituent pH differently from
others by computing a two-sided limit. This is one of the few places
where prediction limits are more useful than control charts. While the
Shewhart portion of the control chart does account for the two-sided
nature of pH, the CUSUM measure on the control chart is designed to
identify significant increases, and is therefore not a useful indicator
for decreases in pH.
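
The one-sided behavior of the CUSUM can be illustrated with a generic upper CUSUM on standardized values. The recursion and the reference value k = 1 are common textbook choices and are assumptions here, not a description of the exact DUMPStat computation; the function name and pH values are hypothetical.

```python
def upper_cusum(values, mean, sd, k=1.0):
    """Cumulative sum of standardized exceedances above k, floored at zero."""
    s = 0.0
    sums = []
    for x in values:
        z = (x - mean) / sd
        s = max(0.0, s + z - k)
        sums.append(s)
    return sums

# Steadily falling pH: the upper CUSUM never leaves zero.
falling = upper_cusum([7.0, 6.5, 6.0, 5.5], mean=7.0, sd=0.2)

# Steadily rising pH: the upper CUSUM accumulates.
rising = upper_cusum([7.0, 7.4, 7.8], mean=7.0, sd=0.2)
```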
DUMPStat will identify only one constituent name as 'pH'.
More specifically, amongst all data records where 'pH' occurs
as either the whole constituent name, or as the first distinct word
in a constituent, DUMPStat will designate only one name as being pH.
Thus, 'pH' or 'pH field' could be the constituent
name. However, 'phenol' would not.
If your database contains records where more than one name could be
identified as 'pH', it is important that you alias all related
names to a single choice. Otherwise, the statistical analyses will not
collect all the relevant data records. It does not matter which name
you choose to be the 'dominant' pH.
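
Before aliasing, it can help to list every constituent name in your data that could be taken for pH. The sketch below applies the rule described above ('pH' as the whole name or as the first distinct word). Case-insensitive matching and the use of a word boundary to separate words are assumptions, and the function names and sample names are hypothetical.

```python
import re

# 'ph' at the start of the name, followed by a word boundary.
_PH_NAME = re.compile(r"^\s*ph\b", re.IGNORECASE)

def could_be_ph(name):
    """True if 'pH' is the whole name or its first distinct word."""
    return bool(_PH_NAME.match(name))

def ph_candidates(names):
    """Sorted list of distinct names that could be identified as pH."""
    return sorted({n for n in names if could_be_ph(n)})

names = ["pH", "phenol", "pH field", "Chloride", "pH"]
print(ph_candidates(names))
```

If the list contains more than one name, alias them all to a single choice.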
