<pubnumber>630R98004</pubnumber>
<title>Report of the Workshop on Selecting Input Distributions for Probabilistic Assessments</title>
<pages>276</pages>
<pubyear>1998</pubyear>
<provider>NEPIS</provider>
<access>online</access>
<operator>BO</operator>
<scandate>06/08/99</scandate>
<origin>hardcopy</origin>
<type>single page tiff</type>
<keyword>data population representativeness distribution risk distributions assessment one exposure surrogate issue concern uncertainty edf expert fit workshop variability sample analysis</keyword>
EPA
United States
Environmental Protection
Agency
Office of Research and
Development
Washington DC 20460
EPA/630/R-98/004
January 1999
Report of the Workshop on
Selecting Input
Distributions for
Probabilistic Assessments
RISK ASSESSMENT FORUM
U.S. Environmental Protection Agency
New York, NY
April 21-22, 1998
Risk Assessment Forum
U.S. Environmental Protection Agency
Washington, DC 20460
Printed on Recycled Paper
NOTICE
This document has been reviewed in accordance with U.S. Environmental Protection Agency
(EPA) policy and approved for publication. Mention of trade names or commercial products does not
constitute endorsement or recommendation for use.
This report was prepared by Eastern Research Group, Inc. (ERG), an EPA contractor (Contract
No. 68-DS-0028, Work Assignment No. 98-06) as a general record of discussions during the Workshop
on Selecting Input Distributions for Probabilistic Assessments. As requested by EPA, this report
captures the main points and highlights of discussions held during plenary sessions. The report is not a
complete record of all details discussed nor does it embellish, interpret, or enlarge upon matters that were
incomplete or unclear. Statements represent the individual views of each workshop participant; none of
the statements represent analyses by or positions of the Risk Assessment Forum or the EPA.
CONTENTS
SECTION ONE INTRODUCTION
1.1 Background and Purpose
1.2 Workshop Organization
SECTION TWO CHAIRPERSON'S SUMMARY
2.1 Representativeness
2.2 Sensitivity Analysis
2.3 Making Adjustments to Improve Representation
2.4 Empirical and Parametric Distribution Functions
2.5 Goodness-of-Fit
SECTION THREE OPENING REMARKS
3.1 Welcome and Regional Perspective
3.2 Overview and Background
3.3 Workshop Structure and Objectives
SECTION FOUR ISSUE PAPER PRESENTATIONS
4.1 Issue Paper on Evaluating Representativeness of Exposure Factors Data
4.2 Issue Paper on Empirical Distribution Functions and Non-Parametric Simulation
SECTION FIVE EVALUATING REPRESENTATIVENESS OF EXPOSURE FACTORS DATA
5.1 Problem Definition
5.1.1 What information is required to specify a problem definition fully?
5.1.2 What constitutes representativeness (or lack thereof)? What is "acceptable deviation"?
5.1.3 What considerations should be included in, added to, or excluded from the checklists?
5.2 Sensitivity
5.3 Adjustment
5.4 Summary of Expert Input on Evaluating Representativeness
SECTION SIX EMPIRICAL DISTRIBUTION FUNCTIONS AND RESAMPLING VERSUS PARAMETRIC DISTRIBUTIONS
6.1 Selecting an EDF or PDF
6.2 Goodness-of-Fit (GoF)
6.3 Summary of EDF/PDF and GoF Discussions
SECTION SEVEN OBSERVER COMMENTS
SECTION EIGHT REFERENCES
APPENDICES
APPENDIX A Issue Papers
APPENDIX B List of Experts and Observers
APPENDIX C Agenda
APPENDIX D Workshop Charge
APPENDIX E Breakout Session Notes
APPENDIX F Premeeting Comments
APPENDIX G Postmeeting Comments
APPENDIX H Presentation Materials
SECTION ONE
INTRODUCTION
1.1 BACKGROUND AND PURPOSE
The U.S. Environmental Protection Agency (EPA) has long emphasized the importance of
adequately characterizing uncertainty and variability in its risk assessments, and it continuously studies
various quantitative techniques for better characterizing uncertainty and variability. Historically, Agency
risk assessments have been deterministic (i.e., based on a point estimate), and uncertainty analyses have
been largely qualitative. In May 1997, the Agency issued a policy on the use of probabilistic techniques
in characterizing uncertainty and variability. This policy recognizes that probabilistic analysis tools like
Monte Carlo analysis are acceptable provided that risk assessors present adequate supporting data and
credible assumptions. The policy also identifies several implementation activities that are designed to
help Agency assessors review and prepare probabilistic assessments.
To this end, EPA's Risk Assessment Forum (RAF) is developing a framework for selecting input
distributions for probabilistic assessment. This framework emphasizes parametric distributions,
estimations of the parameters of candidate distributions, and evaluations of the candidate distributions'
quality of fit. A technical panel, convened under the auspices of the RAF, began work on the framework
in the summer of 1997. In September 1997, EPA sought input on the framework from 12 experts from
outside the Agency. The group's recommendations included:
• Expanding the framework's discussion of exploratory data analysis and graphical
methods for assessing the quality of fit.
• Discussing distinctions between variability and uncertainty and their implications.
• Discussing empirical distributions and bootstrapping.
• Discussing correlation and its implications.
• Making the framework available to the risk assessment community as soon as possible.
In response to this input, EPA initiated a pilot program in which the Research Triangle Institute
(RTI) applied the framework for fitting distributions to data from EPA's Exposure Factors Handbook
(EFH) (US EPA, 1996a). RTI used three exposure factors (drinking water intake, inhalation rate, and
residence time) as test cases. Issues highlighted as part of this effort fall into two broad categories: (1)
issues associated with the representativeness of the data, and (2) issues associated with using the
Empirical Distribution Function (EDF) (or resampling techniques) versus using a theoretical Parametric
Distribution Function (PDF).
In April 1998, the RAF organized a 2-day workshop, "Selecting Input Distributions for
Probabilistic Assessments," to solicit expert input on these and related issues. Specific workshop goals
included:
• Discussing issues associated with the selection of probability distributions.
• Obtaining expert input on measurements, extrapolations, and adjustments.
• Discussing qualitatively how to make quantitative adjustments.
EPA developed two issue papers to serve as a focal point for discussions: "Evaluating
Representativeness of Exposure Factors Data" and "Empirical Distribution Functions and
Non-parametric Simulation." These papers, which were developed strictly to prompt discussions during the
workshop, are found in Appendix A. Discussions during the 2-day workshop focused on technical issues,
not policy. The experts discussed issues that would apply to any exposure data.
This workshop report is intended to serve as an information piece for Agency assessors who
prepare or review assessments based on the use of probabilistic techniques and who work with various
exposure data. This report does not represent Agency guidance. It simply attempts to capture the
technical rigor of the workshop discussions and will be used to support further development and
application of probabilistic analysis techniques/approaches.
1.2 WORKSHOP ORGANIZATION
The workshop was held on April 21 and 22, 1998, at the EPA Region 2 offices in New York
City. The 21 participants, experts in exposure and risk assessment, included biologists, chemists,
engineers, mathematicians, physicists, statisticians, and toxicologists, and represented industry,
academia, state agencies, EPA, and other federal agencies. A limited number of observers also attended
the workshop. The experts and observers are listed in Appendix B.
The workshop agenda is in Appendix C. Mr. McCabe (EPA Region 2), Steven Knott of the
RAF, and Dr. H. Christopher Frey, workshop facilitator, provided opening remarks. Before discussions
began, Ms. Jacqueline Moya and Dr. Timothy Barry of EPA summarized the two issue papers.
During the 2-day workshop, the technical experts exchanged ideas in plenary and four small
group breakout sessions. Discussions centered on the two issue papers distributed for review and
comment before the workshop. Detailed discussions focused primarily on the questions in the charge
(Appendix D). "Brainwriting" sessions were held within the smaller groups. Brainwriting, an interactive
technique, enabled the experts to document their thoughts on a topic and build on each others' ideas.
Each small group captured the essence of these sessions and presented the main ideas to the entire group
during plenary sessions. A compilation of notes from the breakout sessions is included in Appendix E.
Following expert input, observers were allowed to address the panel with questions or comments. In
addition to providing input at the workshop, several experts provided pre- and postmeeting comments,
which are in Appendices F and G, respectively.
Section Two of this report contains the chairperson's summary of the workshop. Section Three
highlights workshop opening remarks. Section Four summarizes Agency presentations of the two issue
papers. Sections Five and Six describe expert input on the two main topic areas: representativeness and
EDF/PDF issues. Speakers' presentation materials (overheads and supporting papers) are included in
Appendix H.
SECTION TWO
CHAIRPERSON'S SUMMARY
Prepared by: H. Christopher Frey, Ph.D.
The workshop consisted of five major sessions, three of which were devoted to the issue of
representativeness and two to issues regarding parametric versus empirical distributions and goodness-of-
fit. Each session began with a trigger question. For the three sessions on representativeness, there was
discussion in a plenary setting, as well as discussions within four breakout groups. For the two sessions
regarding selection of parametric versus empirical distributions and the use of goodness-of-fit tests, the
discussions were conducted in plenary sessions.
2.1 REPRESENTATIVENESS
The first session covered three main questions, based on the portion of the workshop charge
(Appendix D) requesting feedback on the representativeness issue paper. After some general discussion,
the following three trigger questions were formulated and posed to the group:
1. What information is required to fully specify a problem definition?
2. What constitutes (lack of) representativeness?
3. What considerations should be included in, added to, or excluded from the checklists
given in the issue paper on representativeness (Appendix A)?
The group was then divided into four breakout groups, each of which addressed all three of these
questions. Each group was asked to use an approach known as "brainwriting." Brainwriting is intended
to be a silent activity in which each member of a group at any given time puts thoughts down on paper in
response to a trigger question. After completing an idea, a group member exchanges papers with another
group member. Typically, upon reading what others have written, new ideas are generated and written
down. Thus, each person has a chance to read and respond to what others have written. The advantages
of brainwriting are that all participants can generate ideas simultaneously, there is less of a problem with
domination of the discussion by just a few people, and a written record is produced as part of the process.
A disadvantage is that there is less "interaction" with the entire group. After the brainwriting activity
was completed, a representative of each group reported the main ideas to the entire group.
The experts generally agreed that before addressing the issue of representativeness, it is
necessary to have a clear problem definition. Therefore, there was considerable discussion of what
factors must be considered to ensure a complete problem definition. The most general requirement for a
good problem definition, to which the group gave general assent, is to specify the "who, what, when,
where, why, and how." The "who" addresses the population of interest. "Where" addresses the spatial
characteristics of the assessment. "When" addresses the temporal characteristics of the assessment.
"What" relates to the specific chemicals and health effects of concern. "Why" and "how" may help
clarify the previous matters. For example, it is helpful to know that exposures occur because of a
particular behavior (e.g., fish consumption) when attempting to define an exposed population and the
spatial and temporal extent of the problem. Knowledge of "why" and "how" is also useful later for
proposing mitigation or prevention strategies. The group in general agreed upon these principles for a
problem definition, as well as the more specific suggestions detailed in Section 5.1.1 of this workshop
report.
In regard to the second trigger question, the group generally agreed that "representativeness" is
context-specific. Furthermore, there was a general trend toward finding other terminology instead of
using the term "representativeness." In particular, many in the group concurred that an objective in an
assessment is to make sure that it is "useful and informative" or "adequate" for the purpose at hand. The
adequacy of an assessment may be evaluated with respect to considerations such as "allowable error" as
well as practical matters such as the ability to make measurements that are reasonably free of major
errors or to reasonably interpret information from other sources that are used as an input to an
assessment. Adequacy may be quantified, in principle, in terms of the precision and accuracy of model
inputs and model outputs. There was some discussion of how the distinction between variability and
uncertainty relates to assessment of adequacy. For example, one may wish to have accurate predictions
of exposures for more than one percentile of the population, reflecting variability. For any given
percentile of the population, however, there may be uncertainty in the predictions of exposures. Some
individuals pointed out that, because often it is not possible to fully validate many exposure predictions
or to obtain input information that is free of error or uncertainty, there is an inherently subjective element
in assessing adequacy. The stringency of the requirement for adequacy will depend on the purpose of the
assessment. It was noted, for example, that it may typically be easier to adequately define mean values of
exposure than upper percentile values of exposure. Adequacy is also a function of the level of detail of an
assessment; the requirements for adequacy of an initial, screening-level calculation will typically be less
rigorous than those for a more detailed analysis.
Regarding the third trigger question, the group was generally complimentary of the proposed
checklists in the representativeness issue paper (see Appendix A). The group, however, had many
suggestions for improving the checklists. Some of the broader concerns were about how to make the
checklists context-specific, because the degree of usefulness of information depends on both the quality
of the information and the purpose of the assessment. Some of the specific suggestions included using
flowcharts rather than lists; avoiding overlap among the flowcharts or lists; developing an interactive
Web-based flowchart that would be flexible and context-specific; and clarifying terms used in the issue
paper (e.g., "external" versus "internal" distinction). The experts also suggested that the checklists or
flowcharts encourage additional data collection where appropriate and promote a "value of information"
approach to help prioritize additional data collection. Further discussion of the group's comments is
given in Section 5.1.3.
2.2 SENSITIVITY ANALYSIS
The second session was devoted to issues encapsulated in the following trigger questions:
How can one do sensitivity analysis to evaluate the implications of non-representativeness? In
other words, how do we assess the importance of non-representativeness?
The experts were asked to consider data, models, and methods in answering these questions.
Furthermore, the group was asked to keep in mind that the charge requested recommendations for
immediate, short-term, and long-term studies or activities that could be done to provide methods or
examples for answering these questions.
There were a variety of answers to these questions. A number of individuals shared the view that
non-representativeness may not be important in many assessments. Specifically, they argued that many
assessments and decisions consider a range of scenarios and populations. Furthermore, populations and
exposure scenarios typically change over time, so that if one were to focus on making an assessment
"representative" for one point in time or space, it could fail to be representative at other points in time or
space or even for the original population of interest as individuals enter, leave, or change within the
exposed population. Here again the notion of adequacy, rather than representativeness, was of concern to
the group.
The group reiterated that representativeness is context-specific. Furthermore, there was some
discussion of situations in which data are collected for "blue chip" distributions that are not specific to
any particular decision. The experts did recommend that, in situations where there may be a lack of
adequacy of model predictions based on available information, the sensitivity of decisions should be
evaluated under a range of plausible adjustments to the input assumptions. It was suggested that there
may be multiple tiers of analyses, each with a corresponding degree of effort and rigor regarding
sensitivity analyses. In a "first-tier" analysis, the use of bounding estimates may be sufficient to establish
sensitivity of model predictions with respect to one or more model outputs, without need for a
probabilistic analysis. After a preliminary identification of sensitive model inputs, the next step would
typically be to develop a probability distribution to represent a plausible range of outcomes for each of
the sensitive inputs. Key questions to be considered are whether to attempt to make adjustments to
improve the adequacy or representativeness of the assumptions and/or whether to collect additional data
to improve the characterization of the input assumptions.
One potentially helpful criterion for deciding whether data are adequate is to try to answer the
question: "Are the data good enough to replace an assumption?" If not, then additional data collection is
likely to be needed. One would need to assess whether the needed data can be collected. A "value of
information" approach can be useful in prioritizing data collection and in determining when sufficient
data have been collected.
There was some discussion of sensitivity analysis of uncertainty versus sensitivity analysis of
variability. The experts generally agreed that sensitivity analysis to identify key sources of uncertainty is
a useful and appropriate thing to do. There was disagreement among the experts regarding the meaning
of identifying key sources of variability. One expert argued that identifying key sources of variability is
not useful, because variability is irreducible. However, knowledge of key sources of variability can be
useful in identifying key characteristics of highly exposed subpopulations or in formulating prevention or
mitigation measures. Currently, many methods exist for doing sensitivity analysis,
including running models for alternative scenarios and input assumptions and the use of regression or
statistical methods to identify the most sensitive input distributions in a probabilistic analysis. In the
short-term and long-term, it was suggested that some efforts be devoted to the development of "blue
chip" distributions for quantities that are widely used in many exposure assessments (e.g., intake rates of
various foods). It was also suggested that new methods for sensitivity analysis might be obtained from
other fields, with specific examples based on classification schemes, time series, and "g-estimation."
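As an illustration of the statistical approach mentioned above, the following minimal sketch ranks the inputs of a probabilistic analysis by the rank correlation between each input and the model output. The exposure model (dose = concentration × intake rate / body weight) and all distribution parameters are hypothetical and are not taken from the workshop; rank correlation is only one of several measures that could be used.

```python
import numpy as np

def rank(a):
    """Return 0-based ranks of a 1-D array of continuous (tie-free) values."""
    return np.argsort(np.argsort(a))

def rank_correlations(inputs, output):
    """Spearman rank correlation between each named input and the output."""
    r_out = rank(output)
    return {name: float(np.corrcoef(rank(x), r_out)[0, 1])
            for name, x in inputs.items()}

rng = np.random.default_rng(0)
n = 5000

# Hypothetical input distributions, chosen only for illustration.
inputs = {
    "concentration": rng.lognormal(mean=0.0, sigma=1.0, size=n),  # mg/L
    "intake_rate": rng.lognormal(mean=0.3, sigma=0.3, size=n),    # L/day
    "body_weight": rng.normal(loc=70.0, scale=10.0, size=n),      # kg
}

# Hypothetical exposure model: average daily dose in mg/kg-day.
dose = inputs["concentration"] * inputs["intake_rate"] / inputs["body_weight"]

sensitivities = rank_correlations(inputs, dose)
for name, rho in sorted(sensitivities.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:15s} rank correlation = {rho:+.2f}")
```

Under these assumed distributions, the input with the widest spread (concentration) dominates the output, which is the kind of finding that would direct further data collection or refinement toward that input.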
2.3 MAKING ADJUSTMENTS TO IMPROVE REPRESENTATION
In the third session, the group responded to the following trigger question:
How can one make adjustments from the sample to better represent the population of interest?
The group was asked to consider "population," spatial, and temporal characteristics when
considering issues of representativeness and methods for making adjustments. The group was asked to
provide input regarding exemplary methods and information sources that are available now to help in
making such adjustments, as well as to consider short-term and long-term research needs.
The group clarified some of the terminology that was used in the issue paper and in the
discussions. The term "population" was defined as referring to "an identifiable group of people." The
experts noted that often one has a sample of data from a "surrogate population," which is not identical to
the "target population" of interest in a particular exposure assessment. The experts also noted that there
is a difference between the "analysis" of actual data pertaining to the target population and
"extrapolation" of information from data for a surrogate population to make inferences regarding a target
population. It was noted that extrapolation always "introduces" uncertainty.
On the temporal dimension, the experts noted that, when data are collected at one point in time
and are used in an assessment aimed at a different point in time, a potential problem may occur because
of shifts in the characteristics of populations between the two periods.
Reweighting of data was one approach that was mentioned in the plenary discussion. There was
a discussion of "general" versus mechanistic approaches for making adjustments. The distinction here
was that "general" approaches might be statistical, mathematical, or empirical in their foundations (e.g.,
regression analysis), whereas mechanistic approaches would rely on theory specific to a particular
problem area (e.g., a physical, biological, or chemical model). It was noted that temporal and spatial
issues are often problem-specific, which makes it difficult to recommend universal approaches for
making adjustments. The group generally agreed that it is desirable to include or state the uncertainties
associated with extrapolations. Several participants strongly expressed the view that "it is okay to state
what you don't know," and there was no disagreement on this point.
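A simple form of the reweighting mentioned above can be sketched as follows. The example assumes that the sample and the target population can both be divided into the same strata and that the target population shares are known; the strata, intake values, and shares are hypothetical. Each record is weighted so that the weighted stratum shares of the sample match those of the target population.

```python
import numpy as np

def poststratification_weights(sample_strata, target_shares):
    """Return normalized weights so weighted stratum shares match target_shares."""
    sample_strata = np.asarray(sample_strata)
    weights = np.zeros(len(sample_strata), dtype=float)
    for stratum, share in target_shares.items():
        mask = sample_strata == stratum
        if not mask.any():
            raise ValueError(f"no sample records in stratum {stratum!r}")
        # Ratio of the target share to the share observed in the sample.
        weights[mask] = share / mask.mean()
    return weights / weights.sum()

# Hypothetical surrogate sample in which adults are over-represented.
strata = np.array(["child", "child", "adult", "adult", "adult", "adult"])
intake = np.array([0.8, 1.0, 1.9, 2.1, 2.0, 2.0])  # L/day, illustrative values

# Hypothetical target population: half children, half adults.
target = {"child": 0.5, "adult": 0.5}

w = poststratification_weights(strata, target)
unweighted_mean = float(intake.mean())
weighted_mean = float(np.sum(w * intake))
print(f"unweighted mean = {unweighted_mean:.3f}, reweighted mean = {weighted_mean:.3f}")
```

Reweighting of this kind corrects only for differences in the stratifying variable. It cannot correct for differences between the surrogate and target populations within a stratum, so the uncertainty associated with the extrapolation still needs to be stated.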
The group recommended that the basis for making any adjustments to assumptions regarding
populations should be predicated on stakeholder input and the examination of covariates. The
group noted that methods for analyzing spatial and temporal aspects exist, if data exist. Of course, a
common problem is scarcity of data and a subsequent reliance on surrogate information. For assessment
of spatial variations, methods such as kriging and random fields were commonly suggested. For
assessment of temporal variations, time series methods were suggested.
There was a lively discussion regarding whether adjustments should be "conservative." Some
experts initially argued that, to protect public health, any adjustments to input assumptions should tend to
be biased in a conservative manner (so as not to make an error of understating a health risk, but with
some nonzero probability of making an error of overstating a particular risk). After some additional
discussion, it appeared that the experts were in agreement that one should strive primarily for accuracy
and that ideally any adjustments that introduce "conservatism" should be left to decision makers. It was
pointed out that invariably many judgments go into the development of input assumptions for an analysis
and that these judgments in reality often introduce some conservatism. Several pointed out that
"conservatism" can entail significant costs if it results in over control or misidentification of important
risks. Thus, conservatism in individual assessments may not be optimal or even conservative in a
broader sense if some sources of risk are not addressed because others receive undue attention.
Therefore, the overall recommendation of the experts regarding this issue is to strive for accuracy rather
than conservatism, leaving the latter as an explicit policy issue for decision makers to introduce, although
it is clear that individual participants had somewhat differing views.
The group's recommendations regarding measures that can be taken now include the use of
stratification to try to reduce variability and correlation among inputs in an assessment, brainstorming to
generate ideas regarding possible adjustments that might be made to input assumptions, and stakeholder
input for much the same purpose, as well as to make sure that no significant pathways or scenarios have
been overlooked. It was agreed that "plausible extrapolations" are reasonable when making adjustments
to improve representativeness or adequacy. What is "plausible" will be context-specific.
In the short term, the experts recommended that the following activities be conducted:
• Numerical Experiments. Numerical experiments can be used to test existing and new methods
for making adjustments based on factors such as averaging times or averaging areas. For
example, the precision and accuracy of the Duan-Wallace model (described in the
representativeness issue paper in Appendix A) for making adjustments from one averaging time
to another can be evaluated under a variety of conditions via numerical experiments.
• Workshop on Adjustment Methods. The experts agreed in general that there are many potentially
useful methods for analysis and adjustment but that many of these are to be found in fields
outside the risk analysis community. Therefore, it would be useful to convene a panel of experts
from other fields for the purpose of cross-disciplinary exchange of information regarding
methods applicable to risk analysis problems. For example, it was suggested that geostatistical
methods should be investigated.
• Put Data on the Web. There was a fervent plea from at least one expert that data for "blue chip"
and other commonly used distributions be placed on the Web to facilitate the dissemination and
analysis of such data. A common concern is that often data are reported in summary form, which
makes it difficult to analyze the data (e.g., to fit distributions). Thus, the recommendation
includes the placement of actual data points, and not just summary data, on publicly accessible
Web sites.
• Suggestions on How to Choose a Method. The group felt that, because of the potentially large
number of methods and the need for input from people in other fields, it was unrealistic to
provide recommendations regarding specific methods for making adjustments. However, they
did suggest that it would be possible to create a set of criteria regarding desirable features for
such methods that could help an assessor when making choices among many options.
In the longer term, the experts recommend that efforts be directed at more data collection, such
as improved national or regional surveys, to better capture variability as a function of different
populations, locations, and averaging times. Along these lines, specific studies could be focused on the
development or refinement of a select set of "blue chip" distributions, as well as targeted at updating or
extending existing data sets to improve their flexibility for use in assessments of various populations,
locations, and averaging times. The group also noted that because populations, pathways, and scenarios
change over time, there will be a continuing need to improve existing data sets.
2.4 EMPIRICAL AND PARAMETRIC DISTRIBUTION FUNCTIONS
In the fourth session, the experts began to address the second main set of issues as given in the
charge. The trigger question used to start the discussion was:
What are the primary considerations in choosing between the use of parametric distribution
functions (PDFs) and empirical distribution functions (EDFs)?
The group was asked to consider the advantages of using one versus the other, whether the
choice is merely a matter of preference, whether one is preferred, and whether there are cases when
neither should be used.
The initial discussion involved clarification of the difference between the terms EDF and
"bootstrap." Bootstrap simulation is a general technique for estimating confidence intervals and
characterizing sampling distributions for statistics, as described by Efron and Tibshirani (1993). An EDF
can be described as a stepwise cumulative distribution function or as a probability density function in
which each data point is assigned an equal probability. Non-parametric bootstrap can be used to quantify
sampling distributions or confidence intervals for statistics based upon the EDF, such as percentiles or
moments. Parametric bootstrap methods can be used to quantify sampling distributions or confidence
intervals for statistics based on PDFs. Bootstrap methods are also often referred to as "resampling"
methods. However, "bootstrap" and EDF are not the same thing.
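The distinction between an EDF and the bootstrap can be made concrete with a short sketch. The first function below builds the EDF itself, a step function that assigns probability 1/n to each data point. The second applies a non-parametric bootstrap to the same data to obtain a percentile-type confidence interval for a statistic. The data values, the choice of the median as the statistic, and the number of bootstrap replicates are illustrative assumptions only.

```python
import numpy as np

def edf(data):
    """Return the EDF: F(x) is the fraction of data points less than or equal to x."""
    sorted_data = np.sort(np.asarray(data, dtype=float))
    n = len(sorted_data)

    def F(x):
        return np.searchsorted(sorted_data, x, side="right") / n

    return F

def bootstrap_ci(data, statistic, n_boot=2000, level=0.90, seed=0):
    """Percentile-type bootstrap confidence interval for a statistic."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Each replicate resamples n points with replacement, i.e., draws from the EDF.
    reps = np.array([
        statistic(rng.choice(data, size=len(data), replace=True))
        for _ in range(n_boot)
    ])
    alpha = (1.0 - level) / 2.0
    return float(np.quantile(reps, alpha)), float(np.quantile(reps, 1.0 - alpha))

# Hypothetical data set of ten observations.
data = [1.2, 0.7, 3.4, 2.2, 0.9, 1.8, 5.1, 1.1, 2.9, 1.5]

F = edf(data)                           # the EDF describes the data
lo, hi = bootstrap_ci(data, np.median)  # the bootstrap describes sampling error
print(f"F(1.5) = {F(1.5):.2f}; 90% interval for the median: ({lo:.2f}, {hi:.2f})")
```

In this sketch the EDF is the distribution used to represent the quantity, while the bootstrap interval describes how uncertain a statistic computed from that distribution is, given the limited sample size.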
The experts generally agreed that the choice of EDF versus PDF is usually a matter of
preference, and they also expressed the general opinion that there should be no rigid guidance requiring
the use of one or the other in any particular situation. The group briefly addressed the notion of
consistency. While consistency in the use of a particular method (e.g., EDF or PDF in this case) may
offer benefits in terms of simplifying analyses and helping decision makers, there was a concern that any
strict enforcement of consistency will inhibit the development of new methods or the acquisition of new
data and may also lead to compromises from better approaches that are context-specific. Here again, it is
important to point out that the experts explicitly chose not to recommend the use of either EDF or PDF as
a single preferred approach but rather to recommend that this choice be left to the discretion of assessors
on a case-by-case basis. For example, it could be reasonable for an assessor to include EDFs for some
inputs and PDFs for others even within the same analysis.
Some participants gave examples of situations in which they might prefer to use an EDF, such as:
(a) when there are a large number of data points (e.g., 12,000); (b) when there is access to high speed data
storage and retrieval systems; (c) when there is no theoretical basis for selecting a PDF; and/or (d) when one has an
"ideal" sample. There was some discussion of preference for use of EDFs in "data-rich" situations rather
than "data-poor" situations. However, it was noted that "data poor" is context-specific. For example, a
data set may be adequate for estimating the 90th percentile but not the 99th percentile. Therefore, one
may be "data rich" in the former case and "data poor" in the latter case with the same data set.
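The point that the same data set can be "data rich" for one percentile and "data poor" for another can be illustrated with a small numerical experiment. The sketch below assumes a lognormal population and a sample size of 200, both chosen only for illustration, and compares how much the estimated 90th and 99th percentiles vary from one sample to the next.

```python
import numpy as np

def relative_spread(q, n, n_rep=2000, seed=0):
    """Coefficient of variation of the q-th percentile estimate over repeated samples of size n."""
    rng = np.random.default_rng(seed)
    # Hypothetical population: lognormal with log-mean 0 and log-sd 1.
    samples = rng.lognormal(mean=0.0, sigma=1.0, size=(n_rep, n))
    estimates = np.percentile(samples, q, axis=1)
    return float(estimates.std() / estimates.mean())

n = 200
rel90 = relative_spread(90, n)
rel99 = relative_spread(99, n)
print(f"n = {n}: relative spread of 90th percentile = {rel90:.2f}, "
      f"of 99th percentile = {rel99:.2f}")
```

With 200 observations, only about 2 fall above the 99th percentile compared with about 20 above the 90th, so the upper percentile is estimated far less precisely from the same data.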
Some experts also gave examples of when they would prefer to use PDFs. A potential limitation
of conventional EDFs is that they are restricted to the range of observed data. In contrast, PDFs typically
intuitive or theoretical appeal. PDFs are also preferred by some because they provide a compact
representation of data and can provide insight into generalizable features of a data set. Thus, in contrast
to the proponent of the use of an EDF for a data set of 12,000, another expert suggested it would be
easier to summarize the data with a PDF, as long as the fit was reasonable. At least one person suggested
that a PDF may be easier to defend in a legal setting, although there was no consensus on this point.
For both EDFs and PDFs, the issue of extrapolation beyond the range of observed data received
considerable discussion. One expert stated that "the further we go out in the tails, the less we know," to
which another responded, "when we go beyond the data, we know nothing." As a rebuttal, a third expert
asked "do we really know nothing beyond the maximum data point?" and suggested that analogies with
similar situations may provide a basis for judgments regarding extrapolation beyond the observed data.
Overall, most or all of the experts appeared to support some approach to extrapolation beyond observed
data, regardless of whether one prefers an EDF or a PDF. Some argued that one has more control over
extrapolations with EDFs, because there are a variety of functional forms that can be appended to create
a "tail" beyond the range of observed data. Examples of these are described in the issue paper. Others
argued that when there is a theoretical basis for selecting a PDF, there is also some theoretical basis for
extrapolating beyond the observed data. It was pointed out that one should not always focus on the
"upper" tail; sometimes the lower tail of a model input may lead to extreme values of a model output
(e.g., when an input appears in a denominator).
There was some discussion of situations in which neither an EDF nor a PDF may be particularly
desirable. One suggestion was that there may be situations in which explicit enumeration of all
combinations of observed data values for all model inputs, as opposed to a probabilistic resampling
scheme, may be desired. Such an approach can help, for example, in tracing combinations of input
values that produce extreme values in model outputs. One expert suggested that neither EDFs nor PDFs
are useful when there must be large extrapolations into the tails of the distributions.
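The enumeration approach mentioned above can be sketched as follows. This is a minimal illustration only, not a method presented at the workshop: the toy model (dose = concentration × intake / body weight), the input names, and all data values are hypothetical assumptions.

```python
from itertools import product

# Hypothetical observed values for three model inputs (illustrative only).
concentration = [0.5, 1.2, 3.0]   # mg/L
intake = [1.0, 1.4, 2.0]          # L/day
body_weight = [55.0, 70.0, 90.0]  # kg


def dose(c, i, bw):
    """Assumed toy exposure model: dose in mg/kg-day."""
    return c * i / bw


# Enumerate every combination of observed values instead of resampling.
results = [((c, i, bw), dose(c, i, bw))
           for c, i, bw in product(concentration, intake, body_weight)]

# Sort so the input combinations behind the extreme outputs can be traced.
results.sort(key=lambda item: item[1], reverse=True)
top_inputs, top_dose = results[0]
```

In this sketch the largest output pairs the highest concentration and intake with the lowest body weight, which also illustrates the earlier point that the lower tail of an input matters when that input appears in a denominator. The number of combinations grows multiplicatively with the number of inputs and observations, so enumeration is practical only for small data sets.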
A question that the group chose to address was, "How much information do we lose in the tails
of a model output by not knowing the tails of the model inputs?" One comment was that it may not be
necessary to accurately characterize the tails of all model inputs because the tails (or extreme values) of
model outputs may depend on a variety of other combinations of model input values. Thus, it is possible
that even if no effort is made to extrapolate beyond the range of observed data in model inputs, one may
still predict extreme values in the model outputs. The use of scenario analysis was suggested as an
alternative or supplement to probabilistic analysis in situations in which either a particular input cannot
reasonably be assigned a probability distribution or when it may be difficult to estimate the tails of an
important input distribution. In the latter case, alternative upper bounds on the distribution, or alternative
assumptions regarding extrapolation to the tails, should be considered as scenarios.
Uncertainty in EDFs and PDFs was discussed. Techniques for estimating uncertainties in the
statistics (e.g., percentiles) of various distributions, such as bootstrap simulation, are available. An
example was presented for a data set of nine measurements, illustrating how the uncertainty in the fit of a
parametric distribution was greatest at the tails. It was pointed out that when considering alternative
PDFs (e.g., Lognormal vs. Gamma) the range of uncertainty in the upper percentiles of the alternative
distributions will typically overlap; therefore, apparent differences in the fit of the tails may not be
particularly significant from a statistical perspective. Such insights are obtained from an explicit
approach to distinguishing between variability and uncertainty in a "two-dimensional" probabilistic
framework.
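A minimal sketch of such a two-dimensional analysis is given below, assuming a Lognormal model for variability and bootstrap resampling for uncertainty. The nine data values, the function names, the number of bootstrap replicates, and the 90 percent interval are hypothetical choices made for illustration; they are not the example presented at the workshop.

```python
import math
import random
import statistics

# Hypothetical data set of nine measurements (illustrative only).
data = [0.8, 1.1, 1.5, 1.9, 2.4, 3.1, 4.2, 6.0, 9.5]


def lognormal_percentile(sample, p):
    """Fit a Lognormal by maximum likelihood and return its p-th quantile."""
    logs = [math.log(x) for x in sample]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs)  # MLE uses the divisor n
    z = statistics.NormalDist().inv_cdf(p)
    return math.exp(mu + z * sigma)


def bootstrap_interval(sample, p, n_boot=2000, seed=1):
    """Approximate 90% bootstrap interval for the fitted p-th quantile."""
    rng = random.Random(seed)
    estimates = sorted(
        lognormal_percentile(rng.choices(sample, k=len(sample)), p)
        for _ in range(n_boot)
    )
    return estimates[n_boot * 5 // 100], estimates[n_boot * 95 // 100 - 1]


# Variability is carried by the fitted distribution; uncertainty by the
# spread of the fitted quantile across bootstrap replicates.
median_lo, median_hi = bootstrap_interval(data, 0.50)
p95_lo, p95_hi = bootstrap_interval(data, 0.95)
```

For these assumed data, the interval for the 95th percentile is wider on a relative scale than the interval for the median, consistent with the observation that uncertainty in the fit is greatest in the tails.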
The group discussed whether mixture distributions are useful. Some experts were clearly
proponents of using mixture distributions. A few individuals offered some cautions that it can be
difficult to know when to properly employ mixtures. One example mentioned was for radon
concentrations. One expert mentioned in passing that radon concentrations had been addressed in a
particular assessment assuming a Lognormal distribution. Another responded that the concentration may
more appropriately be described as a mixture of normal distributions. There was no firm consensus on
whether it is better to use a mixture of distributions as opposed to a "generalized" distribution that can
take on many arbitrary shapes. Those who expressed opinions tended to prefer the use of mixtures
because they could offer more insight about processes that produced the data.
Truncation of the tails of a PDF was discussed. Most of the experts seemed to view this as a last
resort fraught with imperfections. The need for truncation may be the result of an inappropriate selection
of a PDF. For example, one participant asked, "If you truncate a Lognormal, does this invalidate your
justification of the Lognormal?" It was suggested that alternative PDFs (perhaps ones that are less "tail
heavy") be explored. Some suggested that truncation is often unnecessary. Depending upon the
probability mass of the portion of the distribution that is considered for truncation, the probability of
sampling an extreme value beyond a plausible upper bound may be so low that it does not occur in a
typical Monte Carlo simulation of only a few thousand iterations. Even if an unrealistic value is sampled
for one input, it may not produce an extreme value in the model output. If one does truncate a
distribution, it can potentially affect the mean and other moments of the distribution. Thus, one expert
summarized the issue of truncation as "nitpicking" that potentially can lead to more problems than it
solves.
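The arithmetic behind these points can be sketched as follows. The Lognormal parameters, the "plausible upper bound," and the number of iterations are hypothetical values chosen only for illustration; the truncated mean uses the standard closed-form expression for a Lognormal truncated from above.

```python
import math
from statistics import NormalDist

# Hypothetical Lognormal input: log-scale mean and standard deviation.
mu, sigma = 0.0, 1.0
upper_bound = 50.0    # assumed plausible upper bound
n_iterations = 2000   # assumed size of the Monte Carlo simulation

std_normal = NormalDist()
z_bound = (math.log(upper_bound) - mu) / sigma

# Probability that a single draw exceeds the bound.
p_exceed = 1.0 - std_normal.cdf(z_bound)

# Probability that at least one of n_iterations draws exceeds the bound.
p_any_exceed = 1.0 - (1.0 - p_exceed) ** n_iterations

# Mean of the untruncated Lognormal, and of the Lognormal truncated at the bound.
mean_full = math.exp(mu + sigma ** 2 / 2)
mean_truncated = (mean_full * std_normal.cdf(z_bound - sigma)
                  / std_normal.cdf(z_bound))
```

Under these assumptions, a value above the bound is unlikely to be drawn at all in a simulation of this size, and truncation lowers the mean only slightly. Different parameters or a lower bound could change both conclusions, so the calculation would need to be repeated for the distribution at hand.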
2.5 GOODNESS-OF-FIT
The fifth and final session of the workshop was devoted to the following trigger question:
On what basis should it be decided whether a data set is adequately fitted by a parametric
distribution?
The premise of this session was the assumption that a decision had already been made to use a
PDF instead of an EDF. While not all participating experts were comfortable with this assumption, all
agreed to base the subsequent discussion on it.
The group agreed unanimously that visualization of both the data and the fitted distribution is the
most important approach for ascertaining the adequacy of fit. The group in general seemed to share a
view that conventional Goodness-of-Fit (GoF) tests have significant shortcomings and that they should
not be the only or perhaps even primary methods for determining the adequacy of fit.
One expert elaborated that any type of probability plot that allows one to transform data so that
they can be compared to a straight line, representing a perfect fit, is extremely useful. The human eye is
generally good at identifying discrepancies from the straight line perfect fit. Another pointed out that
visualization and visual inspection is routinely used in the medical community for evaluation of
information such as x-rays and CAT scans; thus, there is a credible basis for reliance on visualization as a
means for evaluating models and data.
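As a hedged sketch of the probability plot idea, the code below computes the coordinates of a Lognormal probability plot for a small data set. The data values are hypothetical, the plotting position (i - 0.5)/n is one common convention assumed here, and the correlation coefficient is offered only as a numerical companion to the plot, not as a substitute for looking at it.

```python
import math
from statistics import NormalDist, fmean

# Hypothetical data set (illustrative only).
data = [0.8, 1.1, 1.5, 1.9, 2.4, 3.1, 4.2, 6.0, 9.5]


def lognormal_plot_points(sample):
    """Return (standard normal quantile, log of ordered value) pairs.

    If the data are Lognormal, the points fall near a straight line whose
    intercept and slope estimate the log-scale mean and standard deviation.
    """
    n = len(sample)
    points = []
    for i, x in enumerate(sorted(sample), start=1):
        p = (i - 0.5) / n  # assumed plotting position
        points.append((NormalDist().inv_cdf(p), math.log(x)))
    return points


def pearson_r(points):
    """Correlation between the two plot coordinates (1.0 is a perfect line)."""
    zs = [z for z, _ in points]
    ys = [y for _, y in points]
    z_bar, y_bar = fmean(zs), fmean(ys)
    cov = sum((z - z_bar) * (y - y_bar) for z, y in points)
    var_z = sum((z - z_bar) ** 2 for z in zs)
    var_y = sum((y - y_bar) ** 2 for y in ys)
    return cov / math.sqrt(var_z * var_y)


points = lognormal_plot_points(data)
r = pearson_r(points)
```

The pairs in `points` can be passed to any plotting tool; departures from the straight line at one end of the plot show where in the distribution the fit is poor, which a single summary number does not.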
One of the potential problems with GoF tests is that they may be sensitive to imperfections in the
fit that are not of serious concern to an assessor or a decision maker. For example, if there are outliers at
the low or middle portions of the distribution, a GoF test may suggest that a particular PDF should be
rejected even though there is a good fit at the upper end of the distribution. In the absence of a visual
inspection of the fit, the assessor may have no insight as to why a particular PDF was rejected by a GoF
test.
The power of GoF tests was discussed. The group in general seemed comfortable with the
notion of overriding the results of a GoF test if what appeared to be a good fit, via visual inspection, was
rejected by the test, especially for large data sets or when the imperfections are in portions of the
distribution that are not of major concern to the assessor or decision maker. Some experts shared stories
of situations in which they found that a particular GoF test would reject a distribution due to only a few
"strange" data points in what otherwise appears to be a plausible fit. It was noted that GoF tests become
increasingly sensitive as the number of data points increases, so that even what appear to be small or
negligible "blips" in a large data set are sufficient to lead to rejection of the fit. In contrast, for small data
sets, GoF tests tend to be "weak" and may fail to reject a wide range of PDFs. One person expressed
concern that any strict requirement for the use of GoF tests might reduce incentives for data collection,
because it is relatively easy to avoid rejecting a PDF with few data.
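The dependence on sample size can be illustrated with a simplified calculation, sketched below under stated assumptions. It supposes that the data truly follow a Normal distribution with standard deviation 1.1 while the hypothesized PDF has standard deviation 1.0 (both hypothetical), and compares the largest gap between the two cumulative distribution functions with the approximate large-sample 5 percent critical value of the Kolmogorov-Smirnov test, 1.36/sqrt(n), for a fully specified distribution. Sampling noise is ignored, and critical values differ when parameters are estimated from the same data.

```python
import math
from statistics import NormalDist

hypothesized = NormalDist(0.0, 1.0)  # PDF being tested (assumed)
actual = NormalDist(0.0, 1.1)        # hypothetical true distribution

# Largest vertical gap between the two CDFs, evaluated on a fine grid.
grid = [i / 100.0 for i in range(-500, 501)]
max_gap = max(abs(actual.cdf(x) - hypothesized.cdf(x)) for x in grid)


def ks_critical_value(n, coefficient=1.36):
    """Approximate 5% critical value of the KS statistic for sample size n."""
    return coefficient / math.sqrt(n)


# A fixed, small discrepancy becomes "detectable" only once n is large.
sample_sizes = (20, 200, 2000, 20000)
detectable = {n: max_gap > ks_critical_value(n) for n in sample_sizes}
```

In this sketch the same small discrepancy falls well below the critical value for a few dozen data points but exceeds it for tens of thousands, which mirrors the observation that GoF tests are "weak" for small data sets and very sensitive for large ones.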
The basis of GoF tests sparked some discussion. The "loss functions" assumed in many tests
typically have to do with deviation of the fitted cumulative distribution function from the EDF for the
data set. Other criteria are possible and, in principle, one could create any arbitrary GoF test. One expert
asked whether minimization of the loss function used in any particular GoF test might be used as a basis
for choosing parameter values when fitting a distribution to the data. There was no specific objection,
but it was pointed out that a degree-of-freedom correction would be needed. Furthermore, other
methods, such as maximum likelihood estimation (MLE), have a stronger theoretical basis as a method
for parameter estimation.
The group discussed the role of the "significance level" and the "p-value" in GoF tests. One
expert stressed that the significance level should be determined in advance of evaluating GoF and that it
must be applied consistently in rejecting possible fits. Others, however, suggested that the appropriate
significance level would depend upon risk management objectives. One expert suggested that it is useful
to know the p-value of every fitted distribution so that one may have an indication of how good or weak
the fit may have been according to the particular GoF test.
SECTION THREE
OPENING REMARKS
At the opening session of the workshop, representatives from EPA Region 2 and the RAF
welcomed members of the expert panel and observers. Following EPA remarks, the workshop facilitator
described the overall structure and objectives of the 2-day forum, which this section summarizes.
3.1 WELCOME AND REGIONAL PERSPECTIVE
Mr. William McCabe, Deputy Director, Program Support Branch, Emergency and
Remedial Response Division, U.S. EPA Region 2
William McCabe welcomed the group to EPA Region 2 and thanked everyone for participating
in the workshop. He noted that, in addition to this workshop, Region 2 also hosted the May 1996 Monte
Carlo workshop, which ultimately led to the release of EPA's May 1997 policy document on
probabilistic assessment. He commented on how this 2-day workshop was an important followup to the
May 1996 event. Mr. McCabe stressed that continued discussions on viable approaches to probabilistic
assessments are important because site-specific decisions rest on the merit of the risk assessment. He
stated that this type of workshop is an excellent opportunity for attendees to discuss effective methods
and expressed optimism that workshop discussions would provide additional insight and answers to
probabilistic assessment issues. Resolution of key probabilistic assessment issues, he noted, will help the
region members as they review risk assessments using probabilistic techniques. He mentioned, for
example, the ongoing Hudson River PCB study for which deterministic and probabilistic assessments
will be performed. In that case, as in others, Mr. McCabe said it will be critical for Agency reviewers to
put the results into the proper context and to validate/critically review probabilistic techniques employed
by the contractor(s) for the Potentially Responsible Parties.
3.2 OVERVIEW AND BACKGROUND
Mr. Steve Knott, U.S. EPA, Office of Research and Development, Risk Assessment Forum
On behalf of the RAF, Steve Knott thanked Region 2 for hosting the workshop. Mr. Knott briefly
explained how the RAF originated in the early 1980s and comprises approximately 30 scientists from
EPA program offices, laboratories, and regions. One primary RAF function is to bring experts together
to carefully study and help foster cross-agency consensus on tough risk assessment issues.
Mr. Knott described the following activities related to probabilistic analysis in which the RAF
has been involved:
• Formation of the 1983 ad hoc technical panel on Monte Carlo analysis.
• May 1996 workshop on Monte Carlo analysis (US EPA, 1996b).
• Development of the guiding principles for Monte Carlo analysis (US EPA, 1997a).
• EPA's general probabilistic analysis policy (US EPA, 1997b).
Mr. Knott reiterated the Agency's perspective on probabilistic techniques, stating that "the use of
probabilistic techniques can be a viable statistical tool for analyzing variability and uncertainty in risk
assessment" (US EPA, 1997b). Mr. Knott highlighted Condition 5 (on which this workshop was based)
of the eight conditions for acceptance listed in EPA's policy:
Information for each input and output distribution is to be provided in the report. This includes
tabular and graphical representations of the distributions (e.g., probability density function and
cumulative distribution function plots) that indicate the location of any point estimates of interest
(e.g., mean, median, 95th percentile). The selection of distributions is to be explained and
justified. For both the input and output distributions, variability and uncertainty are to be
differentiated where possible (US EPA, 1997b).
Mr. Knott referred to the recent RTI report, "Development of Statistical Distributions for
Exposure Factors" (1998), which presents a framework for fitting distributions and applies the
framework to three case studies.
Mr. Knott explained that the Agency is seeking input from workshop participants primarily in the
following areas:
• Methods for fitting distributions to less-than-perfect data (i.e., data that are not perfectly
representative of the scenario(s) under study).
• Using the EDF (or resampling techniques) versus the PDF.
These issues were the focus of the workshop. Mr. Knott noted that the workshop will enable EPA to
receive input from experts, build on existing guidance, and provide Agency assessors additional insight.
EPA will use the information from this workshop in future activities, including (1) developing or revising
guidelines and models, (2) updating the Exposure Factors Handbook, (3) supporting modeling efforts,
and (4) applying probabilistic techniques to dose-response assessment.
3.3 WORKSHOP STRUCTURE AND OBJECTIVES
Dr. H. Christopher Frey, Workshop Chair
Dr. Frey, who served as workshop chair and facilitator, reiterated the purpose and goals of the
workshop. As facilitator, Dr. Frey noted, he would attempt to foster discussions that would further
illuminate and support probabilistic assessment activities. Dr. Frey stated that workshop discussions
would center on the two issue papers mentioned previously. He explained that the RTI report was
provided to experts for background purposes only. While the RTI report was not the review subject for
this workshop, Dr. Frey commented that it may provide pertinent examples.
The group's charge, according to Dr. Frey, was to advise EPA and the profession on
representativeness and distribution function issues. Because a slightly greater need exists for discussing
representativeness issues and developing new techniques in this area, Dr. Frey explained that this topic
would receive the greatest attention during the 2-day workshop. He reemphasized that the workshop
would focus on technical issues, not policy issues.
Dr. Frey concluded his introductory remarks by stating that the overall goal of the workshop was
to provide a framework for addressing technical issues that may be applied widely to different future
activities (e.g., development of exposure factor distributions).
Workshop Structure and Expert Charge
Dr. Frey explained that the workshop would be structured around technical questions related to
the two issue papers. Appendix D presents the charge provided to experts before the workshop,
including specific questions for consideration and comment. The workshop material, Dr. Frey noted, is
inherently technical. He, therefore, encouraged the experts to use plain language where possible. He
also noted that the workshop was not intended to be a short course or tutorial. In introducing the key
topics for workshop discussions, Dr. Frey highlighted the following, which he perceived as the most
challenging issues and questions based on experts' premeeting comments:
Representativeness. How should assessors address representativeness? What deviation is
acceptable (given uncertainty and variability in data quality, how close will we come to
answering the question)? How do assessors work representativeness into their problem
definition (e.g., What are we asking? What form will the answer take?)?
Sensitivity. How important is the potential lack of representativeness? How do we evaluate
this?
Adjustment. Are there reasonable ways to adjust or extrapolate in cases where exposure data are
not representative of the population of concern?
EDF/PDF. How do assessors choose between EDFs and theoretical PDFs? On what basis do
assessors decide whether a data set is adequately represented by a fitted analytic distribution?
Dr. Frey encouraged participants to remember the following general questions as they discussed
specific technical questions during plenary sessions, small group discussions, and brainwriting sessions:
• What do we know today that we can apply to answer the questions or provide guidance?
• What short-term studies (e.g., numerical experiments) could answer the question or
provide additional guidance?
• What long-term research (e.g., greater than 18 months) may be needed to answer the
question or provide additional guidance?
According to Dr. Frey, the answers to these questions will help guide Agency activities related to
probabilistic assessments.
Dr. Frey also encouraged the group to consider what, if anything, is not covered in the issue
papers, but is related to the key topics. He noted some of the following examples, which were
communicated in the experts' premeeting comments:
• Role of expert judgment and Bayesian methods, especially in making adjustments.
• Is model output considered representative if all the inputs to the model are considered
representative? This issue relates, in part, to whether or not correlations or
dependencies among the inputs are properly addressed.
• Role of representativeness in a default or generic assessment.
• Role of the measurement process.
Lastly, Dr. Frey explained that the activities related to the workshop are public information. The
Workshop was advertised in the Federal Register and observers were welcomed. Time was set aside on
both days of the workshop for observer questions and comments.
SECTION FOUR
ISSUE PAPER PRESENTATIONS
Two issue papers were developed to present the expert panelists with pertinent issues and to
initiate workshop discussions. Prior to the plenary and small group discussions, EPA provided an
overview of each paper. This section provides a synopsis of each presentation. The two issue papers are
presented in Appendix A. The overheads are in Appendix H.
4.1 ISSUE PAPER ON EVALUATING REPRESENTATIVENESS OF EXPOSURE
FACTORS DATA
Jacqueline Moya, U.S. EPA, NCEA, Washington, DC
Ms. Moya opened her overview by noting that, while exposure distributions are available in the
Exposure Factors Handbook, there is still a need to fit distributions for these data. Ms. Moya noted that a
joint NCEA-RTI pilot project in September 1997 was established to do this. She then discussed the
purpose of the issue paper and the main topics she planned to cover (i.e., framework for inferences,
components of representativeness, the checklists, and methods for improving representativeness). The
purpose of the issue paper, Ms. Moya reminded the group, was to introduce concepts and to prompt
discussions on how to evaluate representativeness and what to do if a sample is not representative.
Ms. Moya presented a flow chart (see Figure 1 in the issue paper) of the data-collection process
for a risk assessment. If data collection is not possible, she explained, surrogate data must be identified.
The next step is to ask whether the surrogate data represent the site or chemical. Ms. Moya pointed to
Checklist I (Assessing Internal Representativeness), which includes suggested questions for determining
whether the surrogate data are representative of the population of concern. If not, the assessor must ask,
"How do we adjust the data to make it more representative?"
Ms. Moya then briefly reviewed the key terms in the paper. Representativeness in the context of
an exposure/risk assessment refers to the comfort with which one can draw inferences from the data.
Population is defined in terms of its member characteristics (i.e., demographics, spatial and temporal
elements, behavioral patterns). The assessor's population of concern is the population for which the
assessment is being conducted. The surrogate population is the population used when data on the
population of concern are not available. The population of concern for the surrogate study is the sample
population for which the surrogate study was designed. The population sampled is a sample from the
population of concern of the surrogate study.
Ms. Moya briefly described the external and internal components of representativeness. She
explained that external components reflect how well the surrogate population represents the population
of concern. Internal components refer to the surrogate study, specifically:
1. How well do sampled individuals represent the surrogate population? This depends on
how well the study was designed. For example, was it random?
2. How well do the respondents represent the sample population? For example, if
recreational fishermen are surveyed, is someone who fishes more frequently more likely
to respond to the survey, and therefore bias the response?
3. How well does the measured value represent the true value for the measurement unit?
For example, are the recreational fishermen in the previous example accurately reporting
the sizes of the fish they catch?
Ms. Moya reviewed the four checklists in the issue paper which may serve as tools for risk
assessors trying to evaluate data representativeness. One checklist is for the population sampled versus
the population of concern for the surrogate study (internal representativeness). The other checklists refer
to the surrogate population versus the population of concern based on individual, spatial, and temporal
characteristics (external representativeness). One goal of the workshop, Ms. Moya explained, was to
solicit input from experts on the use of these checklists. Specifically, she asked whether certain
questions should be eliminated (e.g., only a subset of the questions may be needed for a screening risk
assessment).
Lastly, Ms. Moya pointed to discussions in the issue paper on attempting to improve
representativeness. One section refers to how to make adjustments for differences in population
characteristics (with discussions geared toward using weights for the sample). The second section refers
to time-unit differences and includes how to adjust for this. Ms. Moya asked the group to consider how
to evaluate the significance of population differences and how to perform extrapolations if they are
necessary.
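One simple form of weighting adjustment can be sketched as follows. This is a minimal post-stratification example under assumed conditions: the surrogate sample, the age groups, the intake values, and the target population shares are all hypothetical, and the issue paper's own adjustment methods are not reproduced here.

```python
from collections import Counter

# Hypothetical surrogate sample: (age group, daily intake in L/day).
surrogate = [
    ("child", 0.6), ("child", 0.8),
    ("adult", 1.4), ("adult", 1.6), ("adult", 1.2), ("adult", 1.8),
]

# Assumed composition of the population of concern.
target_share = {"child": 0.5, "adult": 0.5}

n = len(surrogate)
counts = Counter(group for group, _ in surrogate)

# Weight = share in the population of concern / share in the surrogate sample.
weights = {g: target_share[g] / (counts[g] / n) for g in counts}

unweighted_mean = sum(value for _, value in surrogate) / n
weighted_mean = (sum(weights[g] * value for g, value in surrogate)
                 / sum(weights[g] for g, _ in surrogate))
```

Here children are under-represented in the surrogate sample relative to the assumed population of concern, so weighting pulls the mean intake downward. The adjustment corrects only for the characteristic used to form the weights; differences in other characteristics are left unaddressed.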
4.2 ISSUE PAPER ON EMPIRICAL DISTRIBUTION FUNCTIONS AND NON-
PARAMETRIC SIMULATION
Timothy Barry, U.S. EPA, NCEA, Washington, DC
Dr. Barry reviewed the issues of concern related to selecting and evaluating distribution
functions. He explained that, assuming data are representative, the risk assessor has two methods for
representing an exposure factor in a probabilistic analysis: parametric (e.g., a Lognormal, Gamma, or
Weibull distribution) and non-parametric (i.e., use the sample data to define an EDF).
To illustrate how the EDF is generated, Dr. Barry presented equations and histograms (see
Appendix H). The basic EDF properties were defined as follows:
• Values between any two consecutive samples, $x_k$ and $x_{k+1}$, cannot be simulated, nor can
values smaller than the sample minimum, $x_1$, or larger than the sample maximum, $x_n$, be
generated (i.e., $x \ge x_1$ and $x \le x_n$).
• The mean of the EDF equals the sample mean. The variance of the EDF is always
smaller than the sample variance; it equals $(n-1)/n$ times the sample
variance.
• Expected values of simulated EDF percentiles are equal to the sample percentiles.
• If the underlying distribution is skewed to the right (as are many environmental
quantities), the EDF tends to underestimate the true mean and variance.
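Several of the basic EDF properties can be checked numerically, as in the sketch below. The data values are hypothetical. The check relies on the fact that the EDF places probability 1/n on each observation, so its variance uses the divisor n and therefore equals (n-1)/n times the usual sample variance (divisor n-1).

```python
import random
import statistics

# Hypothetical sample (illustrative only).
sample = [0.8, 1.1, 1.5, 1.9, 2.4, 3.1, 4.2, 6.0, 9.5]
n = len(sample)

# Moments of the EDF: each observation carries probability 1/n.
edf_mean = sum(sample) / n
edf_variance = sum((x - edf_mean) ** 2 for x in sample) / n

# Conventional sample variance (divisor n - 1).
sample_variance = statistics.variance(sample)

# Simulating from the basic EDF is resampling with replacement, so only
# observed values can appear and nothing falls outside the observed range.
rng = random.Random(0)
draws = rng.choices(sample, k=5000)
```

Because every simulated value is one of the observations, the basic EDF cannot generate values between data points or beyond the sample minimum and maximum, which is the limitation the variations described next are meant to address.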
In addition to the basic EDF, Dr. Barry explained, the following variations exist:
• Linearized EDF. In this case, a linearized cumulative distribution pattern results. The
linearized EDF linearly interpolates between two observations.
• Extended EDF. An extended EDF involves linearization and adds lower and upper tails
to the data to reflect a "more realistic range" of the exposure variable. Tails are added
based on expert judgment.
• Mixed Exponential. In this case, an exponential upper tail is added to the EDF. This
approach is based on extreme value theory.
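Two of these variations can be sketched as inverse-CDF (quantile) functions. The sketch is a simplified illustration under stated assumptions, not the construction given in the issue paper: the linearized EDF is assumed to run linearly through the ordered observations from probability 0 at the minimum to 1 at the maximum, and the exponential tail is assumed to start at the quantile where a chosen `tail_fraction` of probability remains, with its mean set to the average excess of the observations above that threshold. The data values and function names are hypothetical.

```python
import math
from statistics import fmean

# Hypothetical sample (illustrative only).
sample = [0.8, 1.1, 1.5, 1.9, 2.4, 3.1, 4.2, 6.0, 9.5]


def linearized_edf_quantile(data, u):
    """Quantile of a linearized EDF for u in [0, 1] (needs >= 2 points)."""
    xs = sorted(data)
    n = len(xs)
    pos = u * (n - 1)          # fractional index into the ordered data
    k = min(int(pos), n - 2)   # lower neighbor
    frac = pos - k
    return xs[k] + frac * (xs[k + 1] - xs[k])


def mixed_exponential_quantile(data, u, tail_fraction=0.1):
    """Linearized EDF body with an exponential upper tail, for u in [0, 1)."""
    body_limit = 1.0 - tail_fraction
    if u <= body_limit:
        return linearized_edf_quantile(data, u)
    threshold = linearized_edf_quantile(data, body_limit)
    # Assumed tail scale: mean excess of observations above the threshold.
    mean_excess = fmean(x - threshold for x in data if x > threshold)
    # Invert the exponential survival function within the tail.
    return threshold - mean_excess * math.log((1.0 - u) / tail_fraction)


between_points = linearized_edf_quantile(sample, 0.0625)
beyond_maximum = mixed_exponential_quantile(sample, 0.99)
```

Unlike the basic EDF, the linearized form can return values between observations, and the exponential tail can return values above the sample maximum. How far the tail extends depends entirely on the assumed threshold and scale, which is where judgment enters.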
After describing the basic concepts of EDFs, Dr. Barry provided an example in which
investigators compared and contrasted parametric and non-parametric techniques. Specifically, 90 air
exchange data points were shown to have a Weibull fit. When a basic EDF for these data is used, means
and variance reproduce well. It was concluded that if the goal is to reproduce the sample, Weibull does
well on the mean but poorly at the high end.
Dr. Barry encouraged the group to consider the following questions during the 2-day workshop:
• Is an EDF preferred over a PDF in any circumstances?
• Should an EDF not be used in certain situations?
• When an EDF is used, should the linearized, extended, or mixed version be used?
Dr. Barry briefly described the Goodness of Fit (GoF) questions the issue paper introduces. He
explained that, generally, assessors should pick the simplest analytic distribution not rejected by the data.
Because rejection depends on the chosen statistic and on an arbitrary level of statistical significance, Dr.
Barry posed the following questions to the group:
• What role should the GoF statistic and its p-value (when available) play in deciding on
the appropriate distribution?
• What role should graphical assessments of fit play?
• When none of the standard distributions fit well, should you investigate more flexible
families of distributions (e.g., four parameter gamma, four parameter F, mixtures)?
SECTION FIVE
EVALUATING REPRESENTATIVENESS OF EXPOSURE FACTORS DATA
Discussions on the first day and a half of the workshop focused on developing a framework for
characterizing and evaluating the representativeness of exposure data. The framework described in the
issue paper on representativeness (see Appendix A) is organized into three broad sets of questions: (1)
those related to differences in populations, (2) those related to differences in spatial coverage and scale,
and (3) those related to differences in temporal scale. Therefore, discussions were held in the context of
these three topic areas. The panel also discussed the strengths and weaknesses of the proposed
"checklists" in the issue paper, which were designed to help the assessor evaluate representativeness.
The last portion of the workshop session on representativeness included discussions on sensitivity
(assessing the importance of non-representativeness) and on the methods available to adjust data to better
represent the population of concern. This section describes the outcome of each of these discussions.
Initial deliberations centered on the need to define risk assessment objectives (i.e., problem
definition) before evaluating the representativeness of exposure data. Discussions on sensitivity and
adjustment followed.
5.1 PROBLEM DEFINITION
The group agreed on two points: that "representativeness" depends on the problem at hand and
that the context of the risk analysis is critical. Several experts commented that assessors will have a
difficult time defining representativeness if the problem has not been well-defined. The group therefore
spent a significant amount of time discussing problem definition and problem formulation in the context
of assessing representativeness. Several experts noted the importance of understanding the end use of the
assessment (e.g., site-specific or generic, national or regional analysis). The group agreed that the most
important step for assessors is to ask whether the data are representative enough for their intended use(s).
The group agreed that stakeholders and other data users should be involved in all phases of the
assessment process, including early brainstorming sessions. Two experts noted that problem definition
must address whether the assessment will adequately protect public health and the environment. Another
expert stressed the importance of problem formulation, because not doing so risks running analyses or
engaging resources needlessly. One participant commented that the importance of representativeness
varies with the level (or tier) of the assessment. For example, if data are to be used in a screening
manner, then conservativeness may be more important than representativeness. If data are to be used in
something other than screening assessments, the assessor must consider the value added of more complex
analyses (i.e., additional site-specific data collection, modeling). Two experts noted, however, that the
following general problem statement/question would not change with a more or less sophisticated (tiered)
assessment: Under an agreed upon set of exposure conditions, will the population of concern experience
unacceptable risks? A more sophisticated analysis would merely enable a closer look at less
conservative/more realistic conditions.
5.1.1 What information is required to specify a problem definition fully?
The group agreed that when defining any problem, the "fundamental who, what, when, where,
why, and how" questions must be answered. One individual noted that if assessors answer these
questions, they will be closer to determining if data are representative. The degree to which each basic
question is important is specific to the problem or situation. Another reiterated the importance of
remembering that the premier consideration is public health protection; he noted that if only narrow
issues are discussed, the public health impact may be overlooked.
The group concurred that the problem must be defined in terms of location (space), time (over
what duration and when in time), and population (person or unit). Some of these definitions may be
concrete (e.g., spatial locations around a site), while some, like people who live on a brownfield site, may
be more vague (e.g., because they may change with mobility and new land use). Because the problem
addresses a future context, it must be linked to observable data by a model and assumptions. The
problem definition should include these models and assumptions.
Various experts provided the following specific examples of the questions assessors should
consider at the problem formulation stage of a risk assessment.
• What is the purpose of the assessment (e.g., regulatory decision, setting cleanup
standards)?
• What is the population of interest?
• What type of assessment is being performed (site-specific or generic)?
• How is the assessment information being used? How will data be used (e.g., screening
assessment versus court room)?
• Who are the stakeholders?
• What are the budget limitations? What is the cost/benefit of performing a probabilistic
versus a deterministic assessment?
• What population is exposed, and what are its characteristics?
• How, when, and where are people exposed?
• In what activities does the exposed population engage? When does the exposed
population engage in these activities, and for how long? Why are certain activities
performed?
• What type of exposure is being evaluated (e.g., chronic/acute)?
• What is the scenario of interest (e.g., what is future land use)?
• What is the target or "acceptable" level of risk (e.g., 10⁻² versus 10⁻⁶)?
• What is the measurement error?
• What is the acceptable level of error?
• What is the geographic scale and location (e.g., city, county)?
• What is the scale for data collection (e.g., regional/city, national)?
• What are site/region-specific issues (e.g., how might a warm climate or poor-tasting
water affect drinking water consumption rates)?
• What is the temporal scale (day, year, lifetime)?
• What are the temporal characteristics of source emissions (continuous)?
• What is/are the route(s) of exposure?
• What is the dose (external, biological)?
• What is/are the statistic(s) of interest (e.g., mean, uncertainty percentile)?
• What is the plausible worst case?
• What is the overall data quality?
• What models must be used?
• What is the measurement error?
• When would results change a decision?
Many of the preceding questions are linked closely to defining representativeness. One subgroup
compiled a list of key elements that are directly related to these types of questions when defining
representativeness (see textbox on page 5-4).
5.1.2 What constitutes representativeness (or lack thereof)? What is "acceptable
deviation"?
Several of the experts commented that, fundamentally, representativeness is a function of the
quality of the data but reiterated that it depends ultimately on the overall assessment objective. Almost
all data used in risk assessment fail to be representative in one or more ways. At issue is the effect of the
lack of representativeness on the risk assessment. One expert suggested that applying the established
concepts of EPA's data quality objective/data quality assessment process would help assessors evaluate
data representativeness. Because populations are not fixed in time, one expert cautioned that if a data set
is too representative, the risk assessment may be precise for only a moment. Another stressed the
importance of taking a credible story to the risk manager. In that context, "precise representativeness"
may be less important than answering the question of whether we are being protective of public health. It
Sources of Variability and Uncertainty Related to the Assessment of Data Representativeness
EPA policy sets the standard that risk assessors should seek to characterize central tendency and plausible upper
bounds on both individual risk and population risk for the overall target population as well as for sensitive
subpopulations. To this extent, data representativeness cannot be separated from the assessment endpoint(s).
Following are some key elements that may affect data representativeness. These elements are not mutually
exclusive.
Exposed Population
General target population
Particular ethnic group
Known sensitive subgroup (e.g., children, elderly, asthmatics)
Occupational group (e.g., applicators)
Age group (e.g., infant, child, teen, adult, whole life)
Gender
Activity group (e.g., sport fishermen, subsistence fishermen)
Geographic Scale, Location
Trends (e.g., stationary, nonstationary behaviors)
Past, present, future exposures
Lifetime exposures
Less-than-lifetime exposures (e.g., hourly, daily, weekly, annually)
Temporal characteristics of source(s) (e.g., continuous, intermittent, periodic, concentrated,
random)
Exposure Route
Inhalation
Ingestion (e.g., direct, indirect)
Dermal (direct) contact (by activity; e.g., swimming)
Multiple pathways
Exposure/Risk Assessment Endpoint
Cancer risk
Noncancer risk (margin of exposure, hazard index)
Potential dose, applied dose, internal dose, biologically effective dose
Risk statistic
Mean, uncertainty percentile of mean
Percentile of a distribution (e.g., 95th percentile risk)
Uncertainty limit of variability percentile (upper confidence limit on 95th percentile
risk)
Plausible worst case, uncertainty percentile of plausible worst case
Data Quality Issues
Direct measurement, indirect measurement (surrogates)
Modeling uncertainties
Measurement error (accuracy, precision, bias)
Sampling error (sample size, non-randomness, independence)
Monitoring issues (short-term, long-term, stationary, mobile)
5-4
is important to understand whether a lack of representativeness could mean the risk assessment results
fail to protect public health or that they grossly overestimate risks.
One participant expressed concern that assessors feel deviations from representativeness can be
measured. In reality, risk assessors may more often rely on qualitative or semiquantitative ways of
describing that deviation. Another expert emphasized that assessors often have no basis on which to
judge the representativeness of surrogate data (e.g., drinking water consumption), because local data
are rarely available for comparison. Therefore, surrogate data must be accepted or modified based on some
qualitative information (e.g., the local area is hotter than the area on which the surrogate data are based).
The experts provided the following views on what constitutes representativeness and/or an
acceptable level of non-representativeness. These views were communicated during small group and
plenary discussions.
Nearly consistent with the definition in the issue paper, representativeness was defined by one
subgroup as "the degree to which a value for a given endpoint adequately describes the value of that
endpoint(s) likely seen in the target population." The term "adequately" replaces the terms "accurately
and precisely" in the issue paper definition. One expert suggested changing the word representative to
"useful and informative." The latter terms imply that one has learned something from the surrogate
population. For example, the assessor may not prove the data are the same, but can, at minimum, capture
the extent to which they differ. The term non-representativeness was defined as "important differences
between target and surrogate populations with respect to the risk assessment objectives." Like others, this
subgroup noted that the context of observation is important (e.g., what is being measured: environmental
sample [water, air, soil] versus human recall [diet] versus tissue samples in humans [e.g., blood]).
Assessors must ask about internal sample consistency, inappropriate methods, lack of descriptors (e.g.,
demographic, temporal), and inadequate sample size for targeted measure.
The group agreed, overall, that assessing adequacy or representativeness is inherently subjective.
However, differing opinions were offered in terms of how to address this subjectivity. Several
participants stressed the importance of removing subjectivity to the extent possible but without making
future guidance too rigid. Others noted, however, that expert judgment is and must remain an integral
part of the assessment process.
A common theme communicated by the experts was that representativeness depends on how
much uncertainty and variability between the population of concern and the surrogate population the
assessor is willing to accept. What is "good enough" is case specific, as is the "allowable error." Several
experts commented that it is also important for assessors to know if they are comparing data means or
tails. One expert suggested reviewing some case studies using assessments done for different purposes to
illuminate the process of defining representativeness. "With regard to exposure factors, we [EPA] need
to do a better job at specifying or providing better guidance on how to use the data that are available."
For example, the soil ingestion data for children are limited, but they may suffice to provide an estimate
of a mean. These data are not good enough to support a distribution or a good estimate of a high-end
value, however.
One subgroup described representativeness/non-representativeness as the degree of bias between
a data set and the problem. For example:
Scenario: Is a future residential scenario appropriate to the problem? For prospective risk
assessment, there are usually irreducible uncertainties about making estimates
about a future unknown population; therefore, a certain amount of modeling
must occur.
Model: Is a multiplicative, independent variable model appropriate? Uncertainties in the
model can contribute to non-representativeness (e.g., it might not apply, it may
be wrong, or calculations may be incorrect).
Variables: Is a particular study appropriate to the problem at hand (i.e., are the variables
biased, uncertain)? It may be easy to get confused about distinctions between bias
(or inaccuracies), precision/imprecision, and representativeness/non-
representativeness. It is often assumed that a "representative" data set is one that
has been obtained with a certain amount of randomization. More often, however,
data that meet this definition are not available.
The group spokesperson explained that a well-designed and controlled randomized study
yielding two results can be "representative" of the mean and dispersion but highly imprecise.
Imprecision and representativeness are therefore different, but related. The central tendency of the
distribution may be accurately estimated, but the upper percentile may not.
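
The spokesperson's point can be illustrated with a small simulation. The following is a minimal sketch, not something presented at the workshop: the lognormal population, the seed, and all names are assumptions chosen for illustration. It repeats a two-observation randomized study many times and shows that the study mean is right on average but widely scattered, and that most such studies never observe a value as high as the true 95th percentile.

```python
import math
import random
import statistics

# Assumed population for illustration only: lognormal(mu=0, sigma=1).
TRUE_MEAN = math.exp(0.5)    # exact mean of lognormal(0, 1)
TRUE_P95 = math.exp(1.645)   # approximate 95th percentile of lognormal(0, 1)


def simulate_small_studies(n_obs=2, n_studies=5000, seed=1):
    """Repeat a randomized study with n_obs observations n_studies times."""
    rng = random.Random(seed)
    means, maxima = [], []
    for _ in range(n_studies):
        sample = [rng.lognormvariate(0.0, 1.0) for _ in range(n_obs)]
        means.append(statistics.fmean(sample))
        maxima.append(max(sample))
    return means, maxima


means, maxima = simulate_small_studies()
avg_of_means = statistics.fmean(means)      # close to TRUE_MEAN: little bias
spread_of_means = statistics.stdev(means)   # large: each study is imprecise
share_missing_tail = sum(m < TRUE_P95 for m in maxima) / len(maxima)

print(f"true mean {TRUE_MEAN:.2f}, average of study means {avg_of_means:.2f}")
print(f"standard deviation of study means {spread_of_means:.2f}")
print(f"share of studies whose largest value is below the true 95th "
      f"percentile: {share_missing_tail:.0%}")
```

With two independent observations, the chance that both fall below the 95th percentile is 0.95 × 0.95, or about 90 percent, which is why the upper tail is so poorly characterized even when the sampling design is sound.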
In summary, when assessing representativeness, the group agreed that emphasis should be placed
on the adequacy of the data and how useful and informative a data set is to the defined problem. The
group agreed that these terms are more appropriate than "accuracy and precision" in defining
representative data in the context of a risk assessment. The importance of considering end use of the data
was stressed and was a recurring theme in the discussions (i.e., how much representativeness is needed to
answer the problem). Because the subject population is often a moving target with unpredictable
direction in terms of its demographics and conditions of exposure, one expert commented that, in some
cases, representativeness of a given data set may not be a relevant concept and generic models may be
more appropriate.
5.1.3 What considerations should be included in, added to, or excluded from the
checklists?
More than half the experts indicated that the checklists in Issue Paper 1 are useful for evaluating
representativeness. One expert noted that regulators are often forced to make decisions without
information. A checklist helps the assessor/risk manager evaluate the potential importance of missing
exposure data. One expert re-emphasized the importance of allowing for professional judgement and
expert elicitation when evaluating exposure data. Another panelist concurred, commenting that this type
of checklist is preferred over prescriptive guidance. Several of the experts noted, however, that
checklists could be improved and offered several recommendations.
The group agreed that the checklist should be flexible for various problems and that users should
be directed to consider the purpose of the risk assessment. The assessor must know the minimum
requirements for a screening versus a probabilistic assessment. As one expert said, the requirements for
a screening level assessment must differ from those for a full-blown risk assessment: Do I have enough
information about the population (e.g., type, space, time) to answer the questions at this tier, and is that
information complete enough to make a management decision? Do I need to go through all the checklists
before I can stop?
Instead of the binary (yes/no) and linear format of the checklists, several individuals suggested a
flowchart format centered on the critical elements of representativeness (i.e., a "conditional"
checklist)—to what extent does the representativeness of the data really matter? A flowchart would
allow for a more iterative process and would help the assessor work through problem-definition issues.
One expert suggested developing an interactive Web-based flowchart that would be flexible and context-
specific. Another agreed, adding that criteria are needed to guide the assessor on what to do if
information is not available. As one expert noted, questions should focus on the outcome of the risk
assessment. The assessor needs to evaluate whether the outcome of the assessment changes if the
populations differ.
One of the experts strongly encouraged collecting more/new data or information. Collection of
additional data, he noted, is needed to improve the utility of these checklists. Another participant
suggested that the user be alerted to the qualities of data that enable quantifying uncertainty and
reminded that the degree of representativeness cannot be defined in certain cases. When biases due to
lack of representativeness are suspected, how can assessors judge the direction of those biases?
In addition to general comments and recommendations, several individuals offered the following
specific suggestions for the checklists:
• Clarifying definitions (e.g., internal versus external).
• Recategorizing. For example, use the following five categories: (1) interpreting
measurements (more of a validity than representative issue), (2) evaluating whether
sampling bias exists, (3) evaluating statistical sampling error, (4) evaluating whether the
study measured what must be known, and (5) evaluating differences in the population.
The first three issues are sources of internal error; the latter two are sources of external
representativeness.
• Reducing the checklists. Several experts suggested combining Checklists II, III, and IV.
• Combining temporal, spatial, and individual categories. Avoid overlap in questions. For
example, when overlap exists (e.g., in some spatial and temporal characteristics), which
questions in the checklist are critical? A Web-based checklist, with the flow of questions
appropriately programmed, could be designed to avoid duplication of questions.
• Including other populations of concern (e.g., ecological receptors).
• Including worked examples that demonstrate the criteria for determining if a question is
answered adequately and appropriately. These examples should help focus the risk
assessor on the issues that are critical to representativeness.
• Separating bias and sampling quality and extrapolation from reanalysis and
reinterpretation.
• Asking the following additional questions:
— Relative to application, is there consistency in the survey instruments used to
collect the exposure data? How was measurement error addressed?
— Is the sample representative enough to bound the risk?
— Are data available on population characterization factors (e.g., age, sex)?
— What is known about the population of concern relative to the surrogate
population? (If the population of concern is inadequately characterized, then the
ability to consider the representativeness of the surrogate data is limited, and
meaningless adjustment may result.)
In summary, the group agreed on the utility of the checklists but emphasized the need to include
in them decision criteria (i.e., how do we know if we have representative/non-representative data?). A
brief discussion on the need to collect data followed. Some experts posed the following questions: How
important is it to have more data? Is the risk assessment really driving decisions? Is more information
needed to make good decisions? Is making risk assessment decisions on qualitative data acceptable?
What data must be collected, at minimum, to validate key assumptions? The results of the sensitivity
analysis, as one expert pointed out, are key to answering these questions.
5.2 SENSITIVITY
How do we assess the importance of non-representativeness?
In considering the implications of non-representativeness, the group was asked to consider how
one identifies the implications of non-representativeness in the context of the risk assessment. One
expert commented that the term "non-representativeness" may be a little misleading, and as discussed
earlier, finds the terms data adequacy or data useability more fitting to the discussions at hand. The
expert noted that, from a Superfund perspective, data representativeness is only one consideration when
assessing overall data quality or useability. Others agreed. The workshop chair encouraged everyone to
discuss the suitability of the term "representativeness" while assessing its importance during the small
group discussions.
One group described a way in which to assess the issue of non-representativeness as follows:
The assessor must check the sensitivity of decisions to be made as a result of the assessment. That is,
under a range of plausible adjustments, will the risk decision change? Representativeness is often not
that important because risk management decisions depend on a range of target populations under various
scenarios. A few of the experts expressed concern that problems will likely arise if the exposure assessor
is separated from decision makers. One person noted that an exposure assessment will often be
done in the absence of a specific decision (e.g., nonsite, non-Superfund situation). Another noted that in the
pesticide program situations occur in which an exposure assessment is done before toxicity data are
available. Such separations may be unavoidable. Another expert emphasized that any future guidance
should stress the importance of assessors being cognizant of data distribution needs even if the assessors
are removed from the decision or have limited data.
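
The decision-sensitivity check described above ("under a range of plausible adjustments, will the risk decision change?") can be sketched in a few lines. This is an illustrative sketch only: the baseline risks, the adjustment factors, and the 10⁻⁶ threshold are hypothetical values, and the function names are not drawn from any EPA tool.

```python
def decisions_under_adjustment(baseline_risk, adjustment_factors, threshold):
    """Map each plausible adjustment factor to True if the adjusted risk
    exceeds the decision threshold, False otherwise."""
    return {factor: baseline_risk * factor > threshold
            for factor in adjustment_factors}


def decision_is_robust(baseline_risk, adjustment_factors, threshold):
    """True when every plausible adjustment leads to the same decision."""
    decisions = decisions_under_adjustment(
        baseline_risk, adjustment_factors, threshold)
    return len(set(decisions.values())) == 1


# Hypothetical range of adjustments for a surrogate data set (assumed values).
FACTORS = [0.5, 0.75, 1.0, 1.5, 2.0]
THRESHOLD = 1e-6  # hypothetical target risk level

# Case A: the decision is the same across the whole range, so the lack of
# representativeness does not matter for this decision.
print(decision_is_robust(4e-7, FACTORS, THRESHOLD))

# Case B: the decision flips inside the plausible range, so the
# representativeness of the surrogate data deserves closer attention.
print(decision_is_robust(7e-7, FACTORS, THRESHOLD))
```

The design choice here is to examine the decision rather than the risk estimate itself, which mirrors the group's emphasis on the sensitivity of decisions instead of the sensitivity of intermediate quantities.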
One individual noted that examples would help. The assessor should perform context-specific
sensitivity analysis. It would help to develop case studies and see how sensitivity analysis affects
application (e.g., decision focus).
Another group discussed sensitivity analysis in the context of a tiered approach. For the first tier,
a value that is "biased high" should be selected (e.g., 95th percentile upper bound). The importance of a
parameter (as evidenced by a sensitivity analysis) is determined first, making the representativeness or
non-representativeness of the nonsensitive parameters unimportant. For the second tier (for sensitive
parameters), the assessor must consider whether averages or high end estimates are of greater
importance. This group presented an example using a corn oil scenario to illustrate when differences
between individuals (e.g., high end) and mixtures (averages) may be important. Because corn oil is a
blend with input from many ears of corn, if variability exists in the contaminant concentrations in
individual ears of corn, then corn oil will typically represent some type of average of those
concentrations. For such a mixture, representativeness is less of an issue. It is not necessary to worry
about peak concentrations in one ear of corn. Instead, one would be interested in situations which might
give rise to a relatively high average among the many ears of corn that comprise a given quantity of corn
oil. If one is considering individual ears of corn, it becomes more important to have a representative
sample; the tail of the distribution becomes of greater interest.
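
The corn oil example can be reproduced numerically. The sketch below is a hedged illustration, not the subgroup's calculation: the lognormal distribution of contaminant concentration per ear, the number of ears per batch, and the seed are all assumptions. It compares the 95th percentile of individual ears with the 95th percentile of batch averages.

```python
import random
import statistics


def percentile(values, p):
    """Empirical percentile obtained by sorting (p between 0 and 1)."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(p * len(ordered)))]


def simulate_corn_oil(n_batches=1000, ears_per_batch=100, seed=7):
    """Draw concentrations for individual ears and average them by batch.

    The lognormal(0, 1) concentration per ear is an assumed distribution.
    """
    rng = random.Random(seed)
    ears, batch_means = [], []
    for _ in range(n_batches):
        batch = [rng.lognormvariate(0.0, 1.0) for _ in range(ears_per_batch)]
        ears.extend(batch)
        batch_means.append(statistics.fmean(batch))
    return ears, batch_means


ears, batch_means = simulate_corn_oil()
ear_p95 = percentile(ears, 0.95)
batch_p95 = percentile(batch_means, 0.95)

print(f"mean of individual ears:          {statistics.fmean(ears):.2f}")
print(f"mean of batch averages:           {statistics.fmean(batch_means):.2f}")
print(f"95th percentile, individual ears: {ear_p95:.2f}")
print(f"95th percentile, batch averages:  {batch_p95:.2f}")
```

The two means agree because blending does not change the average, while the upper percentile of the blended product sits far below that of single ears. This is the sense in which the tail matters for individual ears but much less for the mixture.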
A third subgroup noted that, given a model and parameters, assessors must determine whether
enough data exist to bound the estimates. If they can bound the estimates, a sensitivity analysis is
performed with the following considerations: (1) identify the sensitive parameters in the model; (2)
focus on sensitive parameters and evaluate the distribution beyond the bounding estimate (i.e., identify
the variability of these parameters); (3) evaluate whether the
distribution is representative; and (4) evaluate whether more data should be collected or if an adjustment
is appropriate.
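
Step (1) above, identifying the sensitive parameters, might be carried out as in the following sketch. It is offered under stated assumptions: the simple multiplicative dose model, the input distributions, and the use of Spearman rank correlation as the sensitivity measure are illustrative choices, not a method prescribed by the subgroup.

```python
import math
import random
import statistics


def ranks(values):
    """Rank positions (0 = smallest). Ties are not expected for continuous draws."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


rng = random.Random(11)
n = 5000
# Assumed input distributions for a hypothetical dose model.
inputs = {
    "concentration": [rng.lognormvariate(0.0, 1.0) for _ in range(n)],
    "intake_rate": [rng.lognormvariate(0.5, 0.3) for _ in range(n)],
    "body_weight": [rng.lognormvariate(math.log(70.0), 0.2) for _ in range(n)],
}
# Hypothetical model: dose = concentration * intake rate / body weight.
dose = [c * i / bw for c, i, bw in zip(inputs["concentration"],
                                       inputs["intake_rate"],
                                       inputs["body_weight"])]

sensitivity = {name: spearman(values, dose) for name, values in inputs.items()}
ranked = sorted(sensitivity, key=lambda name: abs(sensitivity[name]), reverse=True)

for name in ranked:
    print(f"{name:14s} rank correlation with dose: {sensitivity[name]:+.2f}")
```

Inputs at the top of the ranking are the ones for which steps (2) through (4) are worth the effort; the representativeness of inputs near the bottom has little influence on the result.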
Members of the remaining subgroup noted, and others agreed, that a "perfect" risk assessment is
not possible. They reiterated that it is key to evaluate the data in the context of the decision analysis.
Again, what are the consequences of being wrong, and what difference do decision errors make in the
estimate of the parameter being evaluated? This group emphasized that the question is situation-specific.
In addition, they noted the need for placing bounds on data used.
One question asked throughout these discussions was "Are the data good enough to replace an
existing assumption and, if not, can we obtain such data?" One individual again stressed the need for
"blue chip" distributions at the national level (e.g., inhalation rate, drinking water). Another expert
suggested adding activity patterns to the list of needed data.
In summary, the group generally agreed that the sensitivity of the risk assessment decision must
be considered before non-representativeness is considered problematic. In some cases, there may not be
an immediate decision, but good distributions are still important.
How can one do sensitivity analysis to evaluate the implications of non-representativeness?
The workshop chair asked the group to consider the mechanics of a sensitivity analysis. For
example, is there a specific statistic that should be used, or is it decision dependent? One expert
responded by noting that sensitivity analysis can be equated to partial correlation coefficients (which are
internal to a model). He noted, however, that sensitivity analysis in the context of exposure assessment is
more "bottom line" sensitivity (i.e., if an assumption is changed, how does the change affect the bottom
line?). The focus here is more external—what happens when you change the inputs to the model (e.g.,
the distributions)? Another pointed to ways in which to perform internal sensitivity analysis. For
example, the sensitivity of uncertainty can be separated out from the sensitivity of the variability
component (see William Huber's premeeting comments on sensitivity). Another expert stressed,
however, that sensitivity analysis is inherently tied to uncertainty; it is not tied to variability unless the
variability is uncertain. It was noted that sensitivity analysis is an opportunity to view things that are
subjective. Variability, in contrast, is inherent in the data, unless there are too few data to estimate
variability sufficiently. One expert commented that it is useful to know which sources of variability are
most important in determining exposure and risk.
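
One common way to keep uncertainty and variability separate is a nested (two-dimensional) Monte Carlo simulation. The sketch below is an assumption-laden illustration rather than the approach in the premeeting comments: the outer loop draws an uncertain parameter (here, the log-scale mean of an exposure factor), the inner loop draws variability among individuals, and the result is an uncertainty interval around a variability percentile. All distributions and values are hypothetical.

```python
import random


def two_dimensional_mc(n_outer=200, n_inner=1000, seed=3):
    """Return (5th, 50th, 95th) uncertainty percentiles of the 95th
    variability percentile of a hypothetical exposure factor."""
    rng = random.Random(seed)
    p95_values = []
    for _ in range(n_outer):
        # Uncertainty: the log-scale mean is known only approximately (assumed).
        mu = rng.gauss(0.0, 0.2)
        sigma = 0.5  # log-scale spread, treated as known for simplicity
        # Variability: differences among individuals, given mu and sigma.
        people = sorted(rng.lognormvariate(mu, sigma) for _ in range(n_inner))
        p95_values.append(people[int(0.95 * n_inner)])
    p95_values.sort()
    lower = p95_values[int(0.05 * n_outer)]
    median = p95_values[n_outer // 2]
    upper = p95_values[int(0.95 * n_outer)]
    return lower, median, upper


lower, median, upper = two_dimensional_mc()
print(f"95th percentile of variability: {median:.2f} "
      f"(90% uncertainty interval {lower:.2f} to {upper:.2f})")
```

The width of the interval reflects uncertainty only; the percentile itself reflects variability. Changing the assumed uncertainty in `mu` changes the interval but not the shape of the variability distribution for any given draw.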
One individual voiced concern regarding how available models address sensitivity. Another
questioned whether current software (e.g., Crystal Ball® and @Risk®) covers sensitivity coefficients
adequately (i.e., does it reflect the depth and breadth of existing literature?).
Lastly, the group discussed sensitivity analysis in the context of what we know now and what we
need to know to improve the existing methodology. Individuals suggested the following:
• Add the ability to classify sample runs to available software. Classify inputs and
evaluate the effect on outputs.
• Crystal Ball® and @Risk® are reliable for many calculations, but one expert noted they
may not currently be useful for second-order estimates, nor can they use time runs. Time
series analyses are particularly important for Food Quality Protection Act (FQPA)
evaluations.
• Consider possible biases built into the model due to residuals lost during regression
analyses. This factor is important to the sensitivity of the model prediction.
One expert pointed out that regression analyses can introduce bias because residuals are often
dropped out. Others agreed that this is an important issue. For example, it can make an
order-of-magnitude difference in body weight and surface area scaling. Another expert stated that this issue is of
special interest for work under the FQPA, where use of surrogate data and regression analysis is
receiving more and more attention. Another expert noted that "g-estimation" looks at this issue. The
group revisited this issue during their discussions on adjustment.
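
The effect of dropping residuals can be shown with a small simulated regression. This is a minimal sketch under assumed values (a linear relationship with slope 0.5 and unit residual standard deviation); it is not based on the body weight and surface area data mentioned above and does not reproduce the order-of-magnitude figure. It compares predictions for a target population made from the fitted line alone with predictions that add back resampled residuals.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def percentile(values, p):
    """Empirical percentile obtained by sorting (p between 0 and 1)."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(p * len(ordered)))]


rng = random.Random(5)

# Simulated surrogate study (assumed relationship: y = 1 + 0.5 x + noise).
x_study = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y_study = [1.0 + 0.5 * x + rng.gauss(0.0, 1.0) for x in x_study]
intercept, slope = fit_line(x_study, y_study)
residuals = [y - (intercept + slope * x) for x, y in zip(x_study, y_study)]

# Predict for a target population whose covariate values are known.
x_target = [rng.gauss(0.0, 1.0) for _ in range(2000)]
line_only = [intercept + slope * x for x in x_target]
with_residuals = [intercept + slope * x + rng.choice(residuals) for x in x_target]

p95_line_only = percentile(line_only, 0.95)
p95_with_residuals = percentile(with_residuals, 0.95)

print(f"95th percentile, fitted line only:     {p95_line_only:.2f}")
print(f"95th percentile, residuals added back: {p95_with_residuals:.2f}")
```

Predictions from the fitted line alone reproduce the central tendency but compress the spread, so upper percentiles are understated. Adding residuals back restores the variability that the regression did not explain.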
5.3 ADJUSTMENT
How can one adjust the sample to better represent the population of interest?
The experts addressed adjustment in terms of population, spatial, and temporal characteristics.
The group was asked to identify currently available methods and information sources that enable the
quantitative adjustment of surrogate sample data. In addition, the group was asked to identify both short-
and long-term research needs in this area. The workshop chair noted that the issue paper only includes
discussion on adjustments to account for time-scale differences. The goal, therefore, was to generate
some discussion on spatial and population adjustments as well. Various approaches for making
adjustments were discussed, including general and mechanistic. General approaches include those that
are statistically-, mathematically-, or empirically-based (e.g., regression analysis). Mechanistic
approaches would involve applying a theory specific to a problem area (e.g., a biological, chemical, or
physical model).
Some differing opinions were provided as to how reliably we can apply available statistics to
adjust data. In time-space modeling, where primary data and multiple observations occur at different
spatial locations or in multiple measures over time, one expert noted that a fairly well-developed set of
analytic methods exists. These methods would fall under the category of mixed models, kriging studies for
spatial analysis, or random-effects models. The group agreed that extrapolating or postulating models are
less well-developed. One person noted that classical statistics fall short because they do not apply to
situations in which representativeness is a core concern. Instead, these methods focus more on the
accuracy or applicability of the model. The group agreed that statistical literature in this area is growing.
Another individual expressed concern that statistical tools and extrapolations introduce more
uncertainty to the assessment. This uncertainty may not be a problem if the assessor has good
information about the population of concern and is simply adjusting or reweighting the data, but when the
assessor is extrapolating the source term, demographics, and spatial characteristics simultaneously, more
assumptions and increasing uncertainty are introduced.
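
The simpler case described above, in which the assessor knows the makeup of the population of concern and only reweights the surrogate data, can be sketched as a post-stratification calculation. The strata, intake values, and target shares below are hypothetical and serve only to show the arithmetic.

```python
from collections import Counter


def poststratified_mean(records, target_shares):
    """Weighted mean of surrogate data, reweighted to target stratum shares.

    records: list of (stratum, value) pairs from the surrogate sample.
    target_shares: dict mapping stratum -> proportion in the target population.
    """
    counts = Counter(stratum for stratum, _ in records)
    n = len(records)
    weighted_sum = 0.0
    weight_total = 0.0
    for stratum, value in records:
        sample_share = counts[stratum] / n
        weight = target_shares[stratum] / sample_share
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical surrogate survey: 8 adults and 2 children (values are made up).
survey = [("adult", v) for v in (1.8, 2.2, 2.0, 2.0, 1.9, 2.1, 2.0, 2.0)]
survey += [("child", v) for v in (0.9, 1.1)]

# Hypothetical target population: half adults, half children.
target = {"adult": 0.5, "child": 0.5}

unweighted = sum(value for _, value in survey) / len(survey)
reweighted = poststratified_mean(survey, target)

print(f"unweighted mean: {unweighted:.2f}")
print(f"reweighted mean: {reweighted:.2f}")
```

Reweighting of this kind adds no new assumptions about behavior within each stratum; it only corrects the mix. Extrapolating source terms, demographics, and spatial characteristics at the same time would require additional modeling assumptions beyond what this sketch covers.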
In general, the group agreed that a model-based approach has merit in certain cases. The
modeled approach, as one expert noted, is a cheap and effective approach and likely to support
informed/more objective decisions. The group agreed that validated models (e.g., spatial/fate and
transport models) should be used. Because information on populations may simply be unavailable to
validate some potentially useful models, several participants reemphasized the need to collect more data,
which was a recurring workshop theme.
One expert pointed out that the assessor must ask which unit of observation is of concern. For
example, when evaluating cancer risk, temporal/spatial issues (e.g., residence time) are less important.
When evaluating developmental effects (when windows of time are important), however, the
temporal/spatial issues are more relevant. Again, assessors must consider the problem at hand before
identifying the unit of time.
From a pesticide perspective, it was noted that new data cannot always be required of registrants.
When considering the effects of pesticides, for example, crop treatment rates change over time. As a
result, bridging studies are used to link available application data to crop residues (using a multiple linear
regression model).
One expert stressed the importance and need for assessors to recognize uncertainty. Practitioners
of probabilistic assessment should be encouraged to aggressively evaluate and discuss the uncertainties in
extrapolations and their consequences. Often, probabilistic techniques can provide better information for
better management decisions. The expert pointed out that, in some cases, one may not be able to assign a
distribution, or one may choose not to do so because it would risk losing valuable information. In those
cases, multiple scenarios and results reported in a nonprobabilistic way (both for communication and
management decisions) may be appropriate.
At this point, one expert suggested that the discussion of multiple scenarios was straying from
the basic question to be answered— "If I have a data set that does not apply to my population, what do I
need to do, if anything?" Others disagreed, noting that it may make sense to run different scenarios and
evaluate the difference. If a different scenario makes a difference, more data must be collected. One
expert argued, however, that we cannot wait to observe trends; assessors must predict the future based on
a "snapshot" of today.
One expert suggested the following hierarchy when deciding on the need to refine/adjust data:
• Can the effect be bounded? If yes, no adjustment is needed.
• If the bias is conservative, no adjustment is needed.
• Use a simple model to adjust the data.
• If adjustments fail, resample/collect more data, if possible.
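
The hierarchy above can be read as an ordered set of checks. The following sketch encodes it as a function; the argument names and return strings are hypothetical, and the final fallback (when no more data can be collected) is not part of the expert's hierarchy and is marked as such.

```python
def choose_action(effect_can_be_bounded, bias_is_conservative,
                  adjustment_succeeds, can_collect_data):
    """Walk the four-step hierarchy in order and return the first action that applies."""
    if effect_can_be_bounded:
        return "no adjustment needed (effect can be bounded)"
    if bias_is_conservative:
        return "no adjustment needed (bias is conservative)"
    if adjustment_succeeds:
        return "adjust the data with a simple model"
    if can_collect_data:
        return "resample or collect more data"
    # Assumption: the hierarchy as summarized does not name a further step.
    return "unresolved: the hierarchy gives no further step"


# Example: the effect cannot be bounded and the bias is not conservative,
# but a simple model adjusts the data adequately.
print(choose_action(False, False, True, True))
```

The ordering matters: cheaper conclusions (bounding, conservative bias) are checked before any modeling or data collection is considered.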
The group then discussed the approaches and methods that are currently available to address non-
representative data, and indicated that the following approaches are viable:
1. Start with brainstorming. Obtain stakeholder input to determine how the target
population differs from the population for which you have data.
2. Look at covariates to get an idea of what adjustment might be needed. Stratify data to
see if correlation exists. Stratification is a good basis for adjustments.
3. Use "kriging" techniques (deriving information from one sample to a smaller, sparser
data set). Kriging may not fully apply to spatial, temporal, and population adjustments,
however, because it applies to the theory of random fields. Kriging may help improve
the accuracy of existing data, but it does not enable extrapolation.
4. Include time-steps in models to evaluate temporal trends.
5. Use the "plausible extrapolation" model. This model is acceptable if biased
conservatively.
6. Consider spatial estimates of covariate data (random fields).
7. Use the scenario approach instead of a probabilistic approach.
8. Bayesian statistical methods may be applicable and relevant.
One expert presented a brief case study as an example of Bayesian analysis of variability
and uncertainty and use of a covariate probability distribution model based on regression
to allow extrapolation to different target populations. The paper he summarized,
"Bayesian Analysis of Variability and Uncertainty on Arsenic Concentrations in U.S.
Public Water Supplies," and supporting overheads, are in Appendix G. The paper
describes a Bayesian methodology for estimating the distribution and its dependence on
covariates. Posterior distributions were computed using Markov Chain Monte Carlo
(MCMC). In this example, uncertainties and variability were associated with time issues
and the self-selected nature of arsenic samples. After briefly reviewing model
specifications and distributional assumptions, the results and interpretations were
presented, including a presentation of MCMC output plots and the posterior cumulative
distribution of source water. The uncertainty of fitting site-specific data to the national
distribution of arsenic concentrations was then discussed. The results suggest that
Bayesian methodology powerfully characterizes variability and uncertainty in exposure
factors. The probability distribution model with covariates provides insights and a basis
for extrapolation to other targeted populations or subpopulations. One of the main points
of presenting this methodology was to demonstrate the use of covariates. This case study
showed that you can fit a model with covariates, explicitly account for residuals (which
may be important), and apply that same model to a separate subpopulation where you
know something about the covariates. According to the presenter, such an approach helps
reveal whether national data represent local data.
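The covariate idea in the case study can be illustrated with a much simpler stand-in. The sketch below is not the Bayesian MCMC analysis from the paper in Appendix G; it swaps in an ordinary least-squares regression on the log scale, keeps the residual standard deviation, and applies the fitted model to a subpopulation with different covariate values. All data, coefficient values, and function names (`fit_line`, `simulate_target`) are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, residual standard deviation)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # n - 2 degrees of freedom because two coefficients were estimated
    s = math.sqrt(sum(r * r for r in resid) / (len(x) - 2))
    return a, b, s


def simulate_target(a, b, s, target_cov, rng, n_draws=5000):
    """Simulate log-scale values for a target population.

    Each draw picks one of the target population's covariate values and adds a
    Normal residual, so residual variability is carried along explicitly.
    """
    return [a + b * rng.choice(target_cov) + rng.gauss(0.0, s) for _ in range(n_draws)]


rng = random.Random(1)

# Hypothetical "national" data: covariate in [0, 1], log concentration linear in it.
national_cov = [rng.random() for _ in range(200)]
national_log_conc = [0.5 + 2.0 * c + rng.gauss(0.0, 0.4) for c in national_cov]
a, b, s = fit_line(national_cov, national_log_conc)

# Hypothetical "local" subpopulation whose covariate values sit at the high end.
local_cov = [0.8 + 0.2 * rng.random() for _ in range(50)]
local_log_conc = simulate_target(a, b, s, local_cov, rng)

national_median = math.exp(statistics.median(national_log_conc))
local_median = math.exp(statistics.median(local_log_conc))
print(f"slope={b:.2f} residual sd={s:.2f}")
print(f"national median={national_median:.2f} local median={local_median:.2f}")
```

Comparing the two medians gives a rough sense of whether the national data would represent the local population. A full Bayesian treatment would additionally propagate uncertainty in the fitted coefficients, which this sketch ignores.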
When evaluating research needs, one expert pointed out that assessors should identify the
minimal amount of information they need to analyze the data using available tools. The group offered
the following suggestions for both short and long-term research areas. The discussion of short-term
needs also included recommendations for actions the assessors can take now or in the short term to
address the topics discussed in this workshop.
Short-term research areas and actions
1. Design studies for data collection that are amenable to available methods for data
analysis. Some existing methods are unusable because not all available data, which were
used to support the methods, are from well-designed studies.
2. Validate existing models on population variability (e.g., the Duan-Wallace model
[Wallace et al., 1994] and models described by Buck et al. [1995]). This validation can
be achieved by collecting additional data.
3. Run numerical experiments to test existing and new methods for making adjustment
based on factors such as averaging times or area. Explore and evaluate the Duan-Wallace
model.
4. Hold a separate workshop on adjustment methods (e.g., geostatistical and time series
methods). Involve the modelers working with these techniques on a cross-disciplinary
panel to learn how particular techniques might apply to adjustment issues that are
specific to risk assessment.
5. Provide guidelines on how to evaluate or choose an available method, instead of simply
describing available techniques. These guidelines would help the assessor determine
whether a method applies to a specific problem.
6. To facilitate their access and utility, place national data on the Web (e.g., 3-day CSFII
data, 1994-1996 USDA food consumption data). Ideally, the complete data set, not just
summary data, could be placed on the Web because data in summary form is difficult to
analyze (e.g., to fit distributions).
Possible long-term research areas
1. Collect additional exposure parameter data on the national and regional levels (e.g., "blue
chip" distributions). One expert cautioned that some sampling data have been or may be
collected by field investigators working independently of risk assessment efforts.
Therefore, risk assessors should have input in methods for designing data collection.
2. Perform targeted studies (spatial/temporal characteristics) to update existing data.
Discussions of adjustment ended with emphasis on the fact that adjustment and the previously
described methods only need be considered if they impact the endpoint. One expert reiterated that when
no quantitative or objective ways exist to adjust the surrogate data, a more generalized screening
approach should be used.
As a follow-up to the adjustment discussions, a few individuals briefly discussed the issue of
"bias/loss function" to society. Because this issue is largely a policy issue, it only received brief
attention. One expert noted that overconservatism is undesirable. Another stressed that it is not in the
public interest to extrapolate in the direction of not protecting public health; assessors should apply
conservative bias but make risk managers aware of the biases. The other expert countered that blindly
applying conservative assumptions could result in suboptimal decisions, which should not be taken
lightly. In general, the group agreed on the following point: Assessors should use their best scientific
judgment and strive for accuracy when considering representativeness and uncertainty issues. Which
choice will ensure protection of public health without unreasonable loss? It was noted that the cost of
overconservatism should drive the data-collection push (e.g., encourage industry to contribute to data
collection efforts because they ultimately pay for conservative risk assessments).
5.4 SUMMARY OF EXPERT INPUT ON EVALUATING REPRESENTATIVENESS
Workshop discussions on representativeness revealed some common themes. The group
generally agreed that representativeness is context-specific. Methods must be developed to ensure
representativeness exists in cases where lack of representativeness would substantially impact a risk-
management decision. Methods, the sensitivity analysis, and the decision endpoint are closely linked.
One expert warned that once the problem is defined, the assessor must understand how to use statistical
tools properly to meet assessment goals. Blind application of these tools can result in wrong answers
(e.g., examining the tail versus the entire curve).
One or more experts raised the following issues related to evaluating the quality and
"representativeness" of exposure factors data:
• Representativeness might be better termed "adequacy" or "usefulness."
• Before evaluating representativeness, the risk assessor, with input from stakeholders,
must define the assessment problem clearly.
• No data are perfect; assessors must recognize this fact, clearly present it in their
assessments, and adjust non-representative data as necessary using available tools. The
assessors must make plausible adjustments if non-representativeness matters to the
endpoint.
• To perform a probabilistic assessment well, adequate data are necessary, even for an
assessment with a well-defined objective. In large part, current exposure distribution data
fall short of the risk assessors' needs. Barriers to collecting new data must be identified,
then removed. Cost limitations were pointed out, however. One expert, therefore,
recommended that justification and priorities be established.
• Methods must be sensitive to needs broader than the Superfund/RCRA programs (e.g.,
food quality and pesticide programs).
• When evaluating the importance of representativeness and/or adjusting for non-
representativeness, the assessor needs to make decisions that are adequately protective of
public health while still considering costs and other loss functions. Ultimately, the
assessor should strive for accuracy.
Options for the assessor when the population of concern has been shown to have different habits
than the surrogate population were summarized as follows: (1) determine that the data are clearly not
representative and cannot be used; (2) use the surrogate data and clearly state the uncertainties; or (3)
adjust the data, using what information is available to enable a reasonable adjustment.
SECTION SIX
EMPIRICAL DISTRIBUTION FUNCTIONS AND RESAMPLING
VERSUS PARAMETRIC DISTRIBUTIONS
Assessors often must understand and judge the use of parametric methods (e.g., using such
theoretical distribution functions as the Lognormal, Gamma, or Weibull distribution) versus non-
parametric methods (using an EDF) for a given assessment. The final session of the workshop was
therefore dedicated to exploring the strengths and weaknesses of EDFs and issues related to judging the
quality of fit for theoretical distributions. Discussions centered largely on the topics in Issue Paper 2 (see
Appendix A for a copy of the paper and Section 3 for the workshop presentation of the paper). This
section presents a summary of expert input on these topics.
Some of the experts thought the issue paper imposed certain constraints on discussions because it
assumed that: (1) no theoretical premise exists for assuming a parametric distribution, and (2) the data
are representative of the exposure factor in question (i.e., obtained as a simple random sample and in the
proper scale). These experts noted that many of the assumptions in the issue paper do not hold in reality.
For example, it is unlikely to find a perfectly random sample for exposure parameter data.
As a result, the discussions that followed covered the relative advantages and disadvantages of
parametric and non-parametric distributions under a broader range of conditions.
6.1 SELECTING AN EDF OR PDF
Experts were asked to consider the following questions.
What are the primary considerations in choosing between the use of EDFs and theoretical
PDFs? What are the advantages of one versus the other? Is the choice a matter of preference?
Are there situations in which one method is preferred over the other? Are there cases in which
neither method should be used?
The group agreed that selecting an EDF versus a PDF is often a matter of personal preference or
professional judgment. It is not a matter of systematically selecting either a PDF- or EDF-based
approach for every input. It was emphasized that selection of a distribution type is case- or situation-
specific. In some cases, both approaches might be used in a single assessment. The decision, as one
expert pointed out, is driven largely by data-rich versus data-poor situations. The decision is based also
on the risk assessment objective. Several experts noted that the EDF and PDF have different strengths in
different situations and encouraged the Agency not to recommend the use of one over the other or to
develop guidance that is too rigid. Some experts disputed the extent to which a consistent approach
should be encouraged. While it was recognized that a consistent approach may benefit decision making,
the overall consensus was that too many constraints would inhibit the implementation of new/innovative
approaches, from which we could learn.
Technical discussions started with the group distinguishing between "bootstrap" methods and
EDFs. One expert questioned if the methods were synonymous. EDF, as one expert explained, is a
specific type of step-wise distribution that can be used as a basis for bootstrap simulations. EDF is one
way to describe a distribution using data; bootstrapping enables assessors to resample that distribution in
a special way (e.g., setting boundaries on the distribution of the mean or percentile) (Efron and
Tibshirani, 1993). Another expert distinguished between a parametric and non-parametric bootstrap,
stating that there are good reasons for using both methods. These reasons are well-covered in the
statistical literature. One expert noted that bootstrapping enables a better evaluation of the uncertainty of
the distribution.
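The distinction drawn above between an EDF and a bootstrap can be made concrete with a short sketch. The first function builds the step-wise EDF from the data; the second resamples the data with replacement (a non-parametric bootstrap) to put an interval around a statistic such as the mean. The data values, the function names, and the choice of a simple percentile interval are assumptions for illustration, not a method prescribed by the workshop.

```python
import random
import statistics
from bisect import bisect_right


def edf(data):
    """Return the empirical distribution function F(x) = fraction of observations <= x."""
    xs = sorted(data)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


def bootstrap_interval(data, stat, rng, n_boot=2000, level=0.90):
    """Percentile bootstrap interval for stat, resampling the data with replacement."""
    n = len(data)
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lo = reps[int((1.0 - level) / 2.0 * n_boot)]
    hi = reps[int((1.0 + level) / 2.0 * n_boot) - 1]
    return lo, hi


# Ten hypothetical observations of an exposure factor.
data = [0.21, 0.35, 0.18, 0.52, 0.27, 0.44, 0.31, 0.25, 0.61, 0.38]

F = edf(data)
lo, hi = bootstrap_interval(data, statistics.fmean, random.Random(0))
print(f"F(0.31) = {F(0.31):.2f}")
print(f"mean = {statistics.fmean(data):.3f}, 90% bootstrap interval = ({lo:.3f}, {hi:.3f})")
```

Because every bootstrap replicate is drawn from the observed values, the interval can never extend beyond the smallest and largest observations, which is the range restriction on EDFs noted later in this section.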
Subsequent discussion focused on expert input on deciding which distribution to fit, if any, for a
given risk assessment problem. That is, if the assessor has a data set that must be represented, is it better
to use the data set as is and not make any assumptions or to fit the data set to a parametric distribution?
The following is a compilation of expert input.
• Use of the EDF. The use of an EDF may be preferable (1) when a large number of data
points exists, (2) when access is available to computers with high speed and storage
capabilities, (3) when no theoretical basis for selecting a PDF exists, or (4) when a
"perfect" data set is available. With small data sets, it was noted that the EDF is unlikely
to represent an upper percentile adequately; EDFs are restricted to the range of observed
data. One expert stated that while choice of distribution largely depends on sample size,
in most cases he would prefer the EDF.
When measurement or response error exists, one expert pointed out that an EDF should
not be used before looking at other options.
• Use of the PDF. One expert noted that it is easier to summarize a large data set with a
PDF as long as the fit is reasonable. Use of PDFs can provide estimates of "tails" of the
distribution beyond the range of observed data. A parametric distribution is a convenient
way to concisely summarize a data set. That is, instead of reporting the individual data
values, one can report the distribution and estimated parameter values of the distribution.
While data may not be generated exactly according to a parametric distribution,
evaluating parametric distributions may provide insight to generalizable features of a
data set, such as moments, parameter values, or other statistics. Before deciding which
distribution to use, two experts pointed out the value of trying to fit a parametric
distribution to gain some insight about the data set (e.g., how particular parameters may
be related to other aspects of the data set). These experts felt there is great value in
examining larger data sets and thinking about what tools can be used to put data into
better perspective. Another expert noted that the PDF is easier to defend at a public
meeting or in a legal setting because it has some theoretical basis.
• Assessing risk assessment outcome. The importance of understanding what the
implications of the distribution choice are to the outcome of the risk assessment was
stressed. An example of fitting soil ingestion data to a number of parametric and non-
parametric distributions yielded very different results. Depending on which distribution
was used, cleanup goals were changed by approximately 2 to 3 times. Therefore, the
choice may have cost implications.
• Assuming all data are empirical. One expert felt strongly that all distributions are
empirical. In data poor situations, why assume that the data are Lognormal? The data
could be bimodal in the tails. If a data set is assumed to be empirical, there is some
control as to how to study the tails. Another expert reiterated that using EDFs in data
poor situations (e.g., six data points) does not enable simulation above or below known
data values. One expert disagreed, providing an example that legitimizes the concern about
assuming that data fit a parametric distribution. He noted that if there is no mechanistic
basis for fitting a parametric distribution, and a small set of data points by chance are at
the lower end of the distribution, the 90th percentile estimate will be wrong.
• Evaluating uncertainty. Techniques for estimating uncertainty in EDFs and PDFs were
discussed. The workshop chair presented an example in which he fit a distribution for
variability to nine data points. He then placed uncertainty bands around the distributions
(both Normal and Lognormal curves) using parametric bootstrap simulation. (See Figure
6-1). For example, bands were produced by plotting the results of 2,000 runs of a
synthetic data set of nine points sampled randomly from the Lognormal distribution fitted
to the original data set. The wide uncertainty (probability) bands indicate the confidence
in the distribution. This is one approach for quantifying how much is known about what
is going on at the tails, based on random sampling error. When this exercise was
performed for the Normal distribution, less uncertainty was predicted in the upper tail;
however, a lower tail with negative values was predicted, which is not appropriate for a
non-negative physical quantity such as concentrations. The chair noted that, if a stepwise
EDF had been used, high and low ends would be truncated and tail concentrations would
not have been predicted. This illustrates that the estimate of uncertainty in the tails
depends on which assumption is made for the underlying distribution. Considering
uncertainty in this manner allows the assessor to evaluate alternative distributions and
gain insight on distinguishing between variability and uncertainty in a "2-dimensional
probabilistic framework." Several participants noted that this was a valuable example.
Figure 6-1: Variability and Uncertainty in the Fit of Lognormal Distribution to a Data Set of n=9
(Frey, H.C. and D.E. Burmaster, 1998). [Plot annotations: Data Set 2; Method of Matching Moments;
Bootstrap Simulation, B=2,000, n=2,000. Horizontal axis: PCB Concentration (ng/g, wet basis).]
• Extrapolating beyond the range of observable data. The purpose of the risk analysis
drives what assessors must know about the tails of the distribution. One expert
emphasized that the further assessors go into the tails, the less they know. Another
stressed that once assessors get outside the range of the data, they know nothing.
Another expert disagreed with the point that assessors know nothing beyond the highest
data point. He suggested using analogous data sets that are more data rich to help in
predicting the tails of the distribution. The primary issue becomes how much the
assessors are willing to extrapolate.
Several experts agreed that uncertainty in the tails is not always problematic. If the
assessor wants to focus on a subgroup, for example, it is not necessary to look at the tail
of the larger group. Stratification, used routinely by epidemiologists, was suggested. With
stratification, the assessor would look at the subgroup and avoid having to perform an
exhaustive assessment of the tail, especially for more preliminary calculations used in a
tiered approach. In a tiered risk assessment system, if the assessor assumes the data are
Lognormal, standard multiplicative equations can be run on a simple calculator. While
Monte Carlo-type analyses can provide valuable information in many cases, several
experts agreed that probabilistic analyses are not always appropriate or necessary. It was
suggested that, in some cases, deterministic scenario-based analyses, rather than Monte
Carlo simulation, would be a useful way to evaluate extreme values for a model output.
In a situation where a model is used to make predictions of some distribution, several
experts agreed that the absence of perfect information about the tails of the distribution
of each input does not mean that assessors will not have adequate information about the
tail of the model output. Even if all we have is good information about the central
portions of the input distributions, it may be possible to simulate an extreme value for the
output.
• Use of data in the tails of the distribution. One expert cautioned assessors to be sensitive
to potentially important data in the tails. He provided an example in which assessors
relied on the "expert judgement" of facility operators in predicting contaminant releases
from a source; they failed to adequately predict "blips" that were later shown to exist in
20 to 30 percent of the distribution. Another expert noted that he was skeptical about
adding tails (but was not skeptical about setting upper and lower bounds). It was agreed
that, in general, assessors need to carefully consider what they do know about a given
data set that could enable them to set a realistic upper bound (e.g., body weight). The
goal is to provide the risk manager with an "unbiased estimate of risk." One expert
reiterated that subjective judgments are inherent in the risk assessment process. In the
case of truncating data, such judgments must be explained clearly and justified to the risk
manager. In contrast to truncation, one expert reminded the group that the risk manager
decides upon what percentile of the tail is of interest. Because situations arise in which
the risk manager may be looking for 90th to 99th percentile values, the assessor must
know how to approach the problem and, ultimately, must clearly communicate the
approach and the possible large uncertainties.
• Scenarios. The group discussed approaches for evaluating the high ends of distributions
(e.g., the treatment blips mentioned previously or the pica child). Should the strategy for
assessing overall risks include high-end or unusual behavior? Several experts felt that
including extreme values in the overall distribution was not justified and suggested that
the upper bounds in these cases be considered "scenarios." As with upper bounds, one
expert noted that low end values also need special attention in some cases (e.g., air
exchange in a tight house).
• Generalized distributions versus mixtures. Expert opinion differed regarding the issue of
generalized versus mixture distributions. One expert was troubled by the notion of a
mixture distribution. He would rather use a more sophisticated generalized distribution.
Another expert provided an example of radon, stating that it is likely a mixture of
Normal distributions, not a Lognormal distribution. Therefore, treatment of mixtures
might be a reasonable approach. Otherwise, assessors risk grossly underestimating risk
in concentrated areas by thinking they know the parametric form of the underlying
distribution.
The same expert noted that the issue of mixtures highlights the importance of having
some theoretical basis for applying available techniques (e.g., possible Bayesian
methods). Another expert stated that he could justify using distributions that are
mixtures, because in reality many data sets are inherently mixtures.
• Truncation of distributions. Mixed opinions were voiced on this issue. One expert noted
that assessors can extend a distribution to a plausible upper bound (e.g., assessors can
predict air exchange rates because they know at a certain point they will not go higher).
Another expert noted that truncating the distribution by 2 or 3 standard deviations is not
uncommon because, for example, the assessors simply do not want to generate 1,500-
pound people. One individual questioned, however, whether truncating a Lognormal
distribution invalidates calling the distribution Lognormal. Another commented on
instances in which truncating the distribution may be problematic. For example, some
relevant data may be rejected. Also, the need to truncate suggests that the fit is very
poor. The only reason to truncate, in his opinion, is if one is concerned about getting a
zero or negative value, or perhaps an extremely high outlier value. One expert noted that
truncation clearly has a role, especially when a strong scientific or engineering basis can
be demonstrated.
• When should neither an EDF nor PDF be used? Neither an EDF nor a PDF may be
useful/appropriate when large extrapolations are needed or when the assessor is
uncomfortable with extrapolation beyond the available data points. In these cases,
scenario analyses may come into play.
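The parametric bootstrap exercise described under "Evaluating uncertainty" can be sketched as follows. The code fits a Lognormal distribution to nine points by the method of matching moments, draws 2,000 synthetic nine-point data sets from the fitted distribution, refits each one, and reports a band around selected percentiles. The nine values are hypothetical (they are not the workshop chair's data set), and the function names and the 90 percent band width are assumptions made for illustration.

```python
import math
import random
import statistics
from statistics import NormalDist


def fit_lognormal_moments(data):
    """Method of matching moments for a Lognormal: returns (mu, sigma) of ln X."""
    m = statistics.fmean(data)
    v = statistics.variance(data)
    sigma2 = math.log(1.0 + v / (m * m))
    return math.log(m) - sigma2 / 2.0, math.sqrt(sigma2)


def lognormal_quantile(mu, sigma, p):
    """Value below which a fraction p of the fitted Lognormal lies."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))


def bootstrap_bands(data, probs, rng, n_boot=2000, level=0.90):
    """Parametric bootstrap: returns {p: (lower, fitted, upper)} for each percentile p."""
    mu, sigma = fit_lognormal_moments(data)
    n = len(data)
    reps = {p: [] for p in probs}
    for _ in range(n_boot):
        # Synthetic data set of the same size, drawn from the fitted distribution.
        synthetic = [rng.lognormvariate(mu, sigma) for _ in range(n)]
        mu_b, sigma_b = fit_lognormal_moments(synthetic)
        for p in probs:
            reps[p].append(lognormal_quantile(mu_b, sigma_b, p))
    lo_i = int((1.0 - level) / 2.0 * n_boot)
    hi_i = int((1.0 + level) / 2.0 * n_boot) - 1
    bands = {}
    for p in probs:
        ordered = sorted(reps[p])
        bands[p] = (ordered[lo_i], lognormal_quantile(mu, sigma, p), ordered[hi_i])
    return bands


# Nine hypothetical concentrations (not the workshop's data set).
data = [0.12, 0.18, 0.22, 0.27, 0.31, 0.36, 0.45, 0.58, 0.74]
bands = bootstrap_bands(data, [0.05, 0.50, 0.95], random.Random(0))
for p, (lo, fit, hi) in bands.items():
    print(f"p={p:.2f}: fitted={fit:.3f}, 90% band=({lo:.3f}, {hi:.3f})")
```

With so few points the band at the 95th percentile is expected to be considerably wider than the band at the median, which is the behavior the example in the text was meant to convey. Repeating the exercise with a different assumed distribution would show how the tail uncertainty depends on that assumption.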
In their final discussions on EDF/PDF, the group widely encouraged visual or graphical
representation of data. Additional thoughts on visually plotting the data are presented in the following
discussions of goodness of fit.
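The truncation practice mentioned in this section (for example, cutting a distribution off at 2 or 3 standard deviations to avoid generating implausible body weights) can be sketched with inverse-CDF sampling. The parameter values, bounds, and function name below are hypothetical; the sketch also reports how much probability mass the bounds retain, since truncation choices need to be explained to the risk manager.

```python
import math
import random
from statistics import NormalDist


def sample_truncated_lognormal(mu, sigma, lower, upper, rng, n):
    """Inverse-CDF sampling of a Lognormal restricted to [lower, upper].

    Returns the draws and the probability mass of the untruncated distribution
    that lies inside the bounds.
    """
    log_dist = NormalDist(mu, sigma)
    p_lo = log_dist.cdf(math.log(lower))
    p_hi = log_dist.cdf(math.log(upper))
    # Uniform draws between the two cumulative probabilities map back to
    # values that fall inside the bounds.
    draws = [math.exp(log_dist.inv_cdf(rng.uniform(p_lo, p_hi))) for _ in range(n)]
    return draws, p_hi - p_lo


# Hypothetical body-weight model: median 70 kg, log-scale SD 0.2,
# truncated 3 log-scale standard deviations either side of the median.
mu, sigma = math.log(70.0), 0.2
lower, upper = math.exp(mu - 3 * sigma), math.exp(mu + 3 * sigma)
draws, kept_mass = sample_truncated_lognormal(mu, sigma, lower, upper, random.Random(7), 10000)
print(f"bounds = ({lower:.1f}, {upper:.1f}) kg, retained mass = {kept_mass:.4f}")
print(f"min draw = {min(draws):.1f}, max draw = {max(draws):.1f}")
```

If the retained mass is far below 1, a large share of the fitted distribution is being discarded, which is one sign that the fit itself may be poor, as one expert cautioned.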
6.2 GOODNESS-OF-FIT (GoF)
On what basis should it be decided whether a data set is adequately represented by a fitted
parametric distribution?
The final workshop discussions related to the appropriateness of using available GoF test
statistics in evaluating how well a data set is represented by a fitted distribution. Experts were asked to
consider what options are best suited and how one chooses among multiple tests that may provide
different answers. The following highlights the major points of these discussions.
• Interpreting poor fit. GoF in the middle of the distribution is not as important as that of
the tails (upper and lower percentiles). Poor fit may be due to outliers at the other end of
the distribution. If there are even only a few outliers, GoF tests may provide the wrong
answer.
• Graphical representation of data is key to evaluating goodness or quality of fit.
Unanimously, the experts agreed that using probability plots (e.g., EDF, QQ plots, PP
plots) or other visual techniques in evaluating goodness of fit is an acceptable and
recommended approach. In fact, the group felt that graphical methods should always be
used. Generally, it is easier to judge the quality of fit using probability plots that compare
data to a straight line. There may be cases in which a fit is rejected by a particular GoF
test but appears reasonable when using visual techniques.
The group supported the idea that GoF tests should not be the only consideration in
fitting a distribution to data. Decisions can be made based on visual inspection of the
data. It was noted that graphical presentations help to show quirks in the data (e.g.,
mixture distributions). It was also recommended that the assessor seek the consensus of a
few trained individuals when interpreting data plots (as is done in the medical
community when visually inspecting X-rays or CAT scans).
• What is the significance of failing a weak test such as chi-square? Can we justify using
data that fail a GoF test? GoF tests may be sensitive to imperfections in the fit that are
not important to the assessor or decision maker. The group therefore agreed that the
fitted distribution can be used especially if the failure of the test is due to some part of
the distribution that does not matter to the analysis (e.g., the lower end of the
distribution). The reason the test failed, however, must be explained by the assessor.
Failing a chi-square test is not problematic if the lower end of the distribution is the
reason for the failure. One expert questioned whether the assessor could defend (in
court) a failed statistical test. Another expert responded indicating that a graphical
presentation might be used to defend use of the data, showing, for example, that the poor
fit was a result of data set size, not chance.
• Considerations for risk assessors when GoF tests are used.
— The evaluation of distributions is an estimation process (e.g., PDFs). Using a
systematic testing approach based on the straight line null hypothesis may be
problematic.
— R² is a poor way to assess GoF.
— The appropriate loss function must be identified.
— The significance level must be determined before the data are analyzed.
Otherwise, it is meaningless. It is a risk management decision. The risk assessor
and risk manager must speak early in the process. The risk manager must
understand the significance level and its application.
• Should GoF tests be used for parameter estimation (e.g., objective function is to
minimize the one-tail Anderson-Darling)? A degree of freedom correction is needed
before the analysis is run. The basis for the fit must be clearly defined—are the objective
and loss functions appropriate?
• "Maximum likelihood estimation (MLE)" is a well-established statistical tool and
provides a relatively easy path for separating variability from uncertainty.
• The adequacy of Crystal Ball®'s curve-fitting capabilities was questioned. One of the
experts explained that it runs three tests, then ranks them. If the assessor takes this one
step further by calculating percentiles and setting up plots, it is an adequate tool.
• The Office of Water collects large data sets. Some of the office's efforts might provide
some useful lessons into interpreting data in the context of this workshop.
• What do we do if only summary statistics are available? Summary statistics are often all
that are available for certain data sets. The group agreed that MLE can be used to
estimate distribution parameters from summary data. In addition, one expert noted that
probability plots are somewhat useful for evaluating percentile data. Probability plots
enable assessors to evaluate the slope (standard deviation) and the intercept (mean).
Confidence intervals cannot be examined and uncertainty cannot be separated from
variability.
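The point about reading a slope (standard deviation) and an intercept (mean) from a probability plot of percentile data can be sketched as follows. Assuming a Lognormal form, the logarithm of each reported percentile is plotted against its standard Normal score and a least-squares line is fitted. This is a plotting-based estimate, not the MLE approach the group mentioned, and the reported percentile values and the function name are hypothetical.

```python
import math
from statistics import NormalDist


def lognormal_from_percentiles(percentiles):
    """Fit ln(value) = mu + sigma * z_p by least squares.

    percentiles maps a cumulative probability p (0 < p < 1) to the reported value.
    Returns (mu, sigma): the intercept and slope of the probability-plot line.
    """
    z = [NormalDist().inv_cdf(p) for p in percentiles]
    y = [math.log(v) for v in percentiles.values()]
    n = len(z)
    z_bar, y_bar = sum(z) / n, sum(y) / n
    slope = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y)) / sum(
        (zi - z_bar) ** 2 for zi in z
    )
    return y_bar - slope * z_bar, slope


# Hypothetical published summary: 10th, 50th, 90th, and 95th percentiles.
reported = {0.10: 1.4, 0.50: 2.7, 0.90: 5.2, 0.95: 6.3}
mu, sigma = lognormal_from_percentiles(reported)
print(f"mu={mu:.3f}, sigma={sigma:.3f}, implied median={math.exp(mu):.2f}")
```

As the text notes, an estimate of this kind yields point values only; with summary data alone there is no basis for confidence intervals or for separating uncertainty from variability.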
In summary, the group identified possible weaknesses associated with using statistical GoF tests
in the context described above. The experts agreed unanimously that graphical/visual techniques to
evaluate how well data fit a given distribution (alone or in combination with GoF techniques) may be
more useful than using GoF techniques alone.
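A probability plot of the kind the experts recommended needs only the sorted data and their Normal scores. The sketch below computes those plotting coordinates for two candidate forms (Normal on the raw scale, Lognormal on the log scale) using a simulated, hypothetical data set. The plotting-position formula is one common convention among several, and the correlation printed at the end is only a rough numerical companion to the plot; the experts cautioned that R² is a poor way to assess GoF, so the points are meant to be plotted and inspected, particularly in the tails.

```python
import math
import random
from statistics import NormalDist


def probability_plot_points(data, transform=lambda x: x):
    """Return (normal scores, sorted transformed data) for a Normal probability plot."""
    ys = sorted(transform(x) for x in data)
    n = len(ys)
    # Hazen plotting positions (i - 0.5) / n; other conventions exist.
    zs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return zs, ys


def straightness(zs, ys):
    """Pearson correlation of the plotted points; 1.0 means a perfectly straight line."""
    n = len(zs)
    z_bar, y_bar = sum(zs) / n, sum(ys) / n
    szy = sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
    szz = sum((z - z_bar) ** 2 for z in zs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return szy / math.sqrt(szz * syy)


rng = random.Random(42)
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(60)]  # hypothetical skewed data

r_normal = straightness(*probability_plot_points(sample))
r_lognormal = straightness(*probability_plot_points(sample, math.log))
print(f"Normal plot correlation:    {r_normal:.3f}")
print(f"Lognormal plot correlation: {r_lognormal:.3f}")
```

Passing the two coordinate lists to any plotting tool gives the straight-line comparison described in the text; curvature at the upper end of the raw-scale plot is the visual cue that a Normal form does not suit skewed data.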
6.3 SUMMARY OF EDF/PDF AND GoF DISCUSSIONS
The experts agreed, in general, that the choice of an EDF versus a PDF is a matter of personal
preference. The group recommended, therefore, that no rigid guidance be developed requiring one or the
other in a particular situation. The decision on which distribution function to use is dependent on several
factors, including the number of data points, the outcome of interest, and how interested the assessor is in
the tails of the distribution. Varied opinions were voiced on the use of mixture distributions and the
appropriateness of truncating distributions. The use of scenario analysis was suggested as an alternative
to probabilistic analysis when a particular input cannot be assigned a probability distribution or when
estimating the tails of an important input distribution may be difficult.
Regarding GoF, the group fully agreed that visualization/graphic representation of both the data
and the fitted distribution is the most appropriate and useful approach for ascertaining adequacy of fit. In
general, the group agreed that conventional GoF tests have significant shortcomings and should not be
the primary method for determining adequacy of fit.
SECTION SEVEN
OBSERVER COMMENTS
This section presents observers' comments and questions during the workshop, as well as
responses from the experts participating in the workshop.
DAY ONE: Tuesday, April 21, 1998
Comment 1
Helen Chernoff, TAMS Consultants
Helen Chernoff said that, with the release of the new policy, users are interested in guidance on
how to apply the information on data representativeness and other issues related to probabilistic risk
assessment. She had believed that the workshop would focus more on application, rather than just on the
background issues of probabilistic assessments. What methods could be used to adjust data and improve
data representativeness (e.g., the difference between past and current data usage)?
Response
The workshop chair noted that adjustment discussions during the second day of the workshop
start to explore available methods. One expert stated that, based on his impression, the workshop was
designed to gather input from experts in the field of risk assessment and probabilistic techniques. He
noted that EPA's policy on probabilistic analysis emerged only after the 1996 workshop on Monte Carlo
analysis. Similarly, EPA will use the information from this workshop to help build future guidance on
probabilistic techniques, but EPA will not release specific guidance immediately (there may be an
approximate two-year lag).
The chair noted that assessors may want to know when they can/should implement alternate
approaches. He pointed out that the representativeness issue is not specific to probabilistic assessment. It
applies to all assessments. Since EPA released its May 1997 policy on Monte Carlo analysis,
representativeness has been emphasized more, especially in exposure factor and distribution evaluations.
He noted, however, that data quality/representativeness is equally important when considering a point
estimate, although it may be less important if the point estimate is based on central tendency instead of
an upper percentile where there may be fewer data. Another expert agreed that the representativeness issue is
more important for probabilistic risk assessment than deterministic risk assessment (especially a point
estimate based on central tendency).
Comment 2
Emran Dawoud, Human Health Risk Assessor, Oak Ridge National Laboratory
Mr. Dawoud commented that the representativeness question should reflect whether additional
data must be collected. He noted that the investment (cost/benefit) should be considered. From a risk
assessment point of view, one must know how more data will affect the type or cost of remedial activity.
In his opinion, if representativeness does not change the type or cost of remedial activity, further data
collection is unwarranted.
Mr. Dawoud also commented that the risk model has three components: source, exposure, and
dose-response. Has the sensitivity of the exposure component been measured relative to the sensitivity of the
other two components? He noted the importance of the sensitivity of the source term, especially if fate
and transport are involved.
Mr. Dawoud briefly noted that, in practice, a lognormal distribution is being fit with only a few
samples. Uncertainty of the source term in these cases is not quantified or incorporated into risk
predictions. Even if standard deviation is noted, the contribution to final risk prediction is not
considered. Mr. Dawoud noted that the workshop discussions on the distribution around exposure
parameters seem to be less important than variation around the source term. Likewise, he noted the
uncertainties associated with the dose-response assessment as well (e.g., applying uncertainty factors of
10, 100, etc.).
Response
One participant noted that representativeness involves more than collecting more data.
Evaluating representativeness is often about choosing from several data sets. He agreed that additional
data are collected depending on how the collection efforts may affect the bottom line assessment answer.
He noted that if input does not affect output, then its distribution need not be described.
Relative to Mr. Dawoud's second point, it was noted that source term evaluation is part of
exposure assessment. While exposure factors (e.g., soil ingestion and exposure duration) affect the risk
assessment, one expert emphasized that the most important driving "factor" is the source term. As for
dose-response, the industry is just beginning to explore how to quantify variability and uncertainty.
The workshop chair noted that methodologically, exposure and source terms are not markedly
different. The source term has representativeness issues. There are ways to distinguish between
variability and uncertainty in the variability estimate.
Lastly, more than one expert agreed that the prediction of risk for noncancer and cancer
endpoints (based on the reference dose [RfD] and cancer slope factor [CSF], respectively) is very
uncertain. The methods discussed during this workshop cannot be directly applied to RfDs and CSFs, but
they could be used on other toxicologic data. More research is needed in this area.
Comment 3
Ed Garvey, TAMS Consultants
Mr. Garvey questioned whether examining factors of 2 or 3 on the exposure side is worthwhile,
given the level of uncertainty on the source or dose-response term, which can be orders of magnitude.
Response
It was an EPA policy choice to examine distributions looking first at exposure parameters,
according to one EPA panelist. He also reiterated that the evaluation of exposure includes the source
term (i.e., exposure = concentration × uptake/averaging time). One person noted that it was time to "step
up" on quantifying toxicity uncertainty. Exposure issues have been driven primarily by engineering
approaches (e.g., the Gaussian plume model), whereas toxicity has historically been driven by toxicologists
and statisticians and is more data oriented.
It was noted that, realistically, probabilistic risk assessments will be seen only when money is
available to support the extra effort. Otherwise, 95% UCL concentrations and point estimates will
continue to be used. Knowing that probabilistic techniques will enable better evaluations of variability
and uncertainty, risk assessors must be explicitly encouraged to perform probabilistic assessments. We
must accept that the existing approach to toxicity assessment, while lacking somewhat in scientific
integrity, is the only option at present.
Comment 4
Emran Dawoud, Human Health Risk Assessor, Oak Ridge National Laboratory
Mr. Dawoud asked whether uncertainty analysis should be performed to evaluate fate and
transport related estimates.
Response
One expert stressed that whenever direct measurements are not available, variability must be
assessed. He commented that EPA's Probabilistic Risk Assessment Work Group is preparing two
chapters for Risk Assessment Guidance for Superfund (RAGS): one on source term variability and
another on time-dependent considerations of the source term.
Comment 5
Zubair Saleem, Office of Solid Waste, U.S. EPA
Mr. Saleem stated that he would like to reinforce certain workshop discussions. He commented
that any guidance on probabilistic assessments should not be too rigid. Guidance should clearly state that
methodology is evolving and may be revised. Also, guidance users should be encouraged to collect
additional data.
Response
The workshop chair recognized Mr. Saleem's comment, but noted that the experts participating
in the workshop can only provide input and advice on methods, and are not in a position to recommend
specific guidelines to EPA.
DAY TWO: Wednesday, April 22, 1998
Comment 1
Lawrence Myers, Research Triangle Institute
Mr. Myers offered a word of caution regarding GoF tests. He agrees that many options do not
work well but he stated that in an adversarial situation (e.g., a court room) he would rather be defending
data distributions based on a quantitative model instead of a graphical representation.
Mr. Myers noted that the problem with goodness of fit is the tightness of the null hypothesis (i.e.,
it specifies that the true model is exactly a member of the particular class being examined). Mr. Myers
cited Hodges and Layman (1950s) who generalized chi-square in a way that may be meaningful to the
issues discussed in this workshop. Specifically, because exact conformity is not expected, a more
appropriate null hypothesis would be that the true distribution is "sufficiently close" to the family being
examined.
Response
One expert reiterated that when a PDF is fitted, it is recognizably an approximation, which
makes application of standard GoF statistics difficult. Another expressed concern that practitioners could
go on a "fishing expedition," especially in an adversarial situation, to find a GoF test that gives the right
answer. He did not feel this is the message we want to be giving practitioners. A third expert noted a
definite trend in the scientific community away from GoF tests and towards visualization.
SECTION EIGHT
REFERENCES
Buck, R.J., K.A. Hammerstrom, and P.B. Ryan, 1995. Estimating Long-Term Exposures from Short-
term Measurements. Journal of Exposure Analysis and Environmental Epidemiology, Vol. 5, No. 3, pp.
359-373.
Efron, B. and R.J. Tibshirani, 1993. An Introduction to the Bootstrap. Chapman and Hall. New York.
Frey, H.C. and D.E. Burmaster, "Methods for Characterizing Variability and Uncertainty: Comparison of
Bootstrap Simulation and Likelihood-Based Approaches," Risk Analysis (Accepted 1998).
RTI, 1998. Development of Statistical Distributions for Exposure Factors. Final Report. Prepared by
Research Triangle Institute. U.S. EPA Contract 68D40091, Work Assignment 97-12. March 18, 1998.
U.S. Environmental Protection Agency, 1996a. Office of Research and Development, National Center
for Environmental Assessment. Exposure Factors Handbook, SAB Review Draft (EPA/600/P-95/002Ba).
U.S. Environmental Protection Agency, 1996b. Summary Report for the Workshop on Monte Carlo
Analysis. EPA/630/R-96/010. September 1996.
U.S. Environmental Protection Agency, 1997a. Guiding Principles for Monte Carlo Analysis.
EPA/630/R-97/001. March 1997.
U.S. Environmental Protection Agency, 1997b. Policy for Use of Probabilistic Analysis in Risk
Assessment at the U.S. Environmental Protection Agency. May 15, 1997.
Wallace, L.A., N. Duan, and R. Ziegenfus, 1994. Can Long-term Exposure Distributions Be Predicted
from Short-term Measurements? Risk Analysis, Vol. 14, No. 1, pp. 75-85.
APPENDIX A
ISSUE PAPERS
Issue Paper on Evaluating Representativeness
of Exposure Factors Data
This paper is based on the Technical Memorandum dated March 4, 1998, submitted by Research
Triangle Institute under U.S. EPA contract 68D40091.
1. INTRODUCTION
The purpose of this document is to discuss the concept of representativeness as it relates
to assessing human exposures to environmental contaminants and to factors that affect exposures
and that may be used in a risk assessment. (The factors, referred to as exposure factors, consist
of measures like tapwater intake rates, or the amount of time that people spend in a given
microenvironment.) This is an extremely broad topic, but the intent of this document is to
provide a useful starting point for discussing this extremely important concept.
Section 2 furnishes some general definitions and notions of representativeness. Section 3
indicates a general framework for making inferences. Components of representativeness are
presented in Section 4, along with some checklists of questions that can help in the evaluation of
representativeness in the context of exposures and exposure factors. Section 5 presents some
techniques that may be used to improve representativeness. Section 6 provides our summary and
conclusions.
2. GENERAL DEFINITIONS/NOTIONS OF REPRESENTATIVENESS
Representativeness is defined in American National Standard: Specifications and
Guidelines for Quality Systems for Environmental Data and Environmental Technology
Programs (ANSI/ASQC E4 - 1994) as follows:
The measure of the degree to which data accurately and precisely represent a
characteristic of a population, parameter variations at a sampling point, a process
condition, or an environmental condition.
Although Kendall and Buckland (A Dictionary of Statistical Terms, 1971) do not define
representativeness, they do indicate that the term "representative sample" involves some
confusion about whether this term refers to a sample "selected by some process which gives all
samples an equal chance of appearing to represent the population" or to a sample that is "typical
in respect of certain characteristics, however chosen." Kruskal and Mosteller (1979) point out
that representativeness does not have an unambiguous definition; in a series of three papers, they
present and discuss various notions of representativeness in the scientific, statistical, and other
literature, with the intent of clarifying the technical meaning of the term.
In Chapter 1 of the Exposure Factors Handbook (EFH), the considerations for including
the particular source studies are enumerated and then these considerations are evaluated
qualitatively at the end of each chapter (i.e., for each type of exposure factor data). One of the
criteria is "representativeness of the population," although there are several other criteria that
clearly relate to various aspects of representativeness. For example, these related criteria include
the following:
EFH Study Selection Criterion | EFH Perspective
focus on factor of interest | studies with this specific focus are preferred
data pertinent to U.S. | studies of U.S. residents are preferred
current information | recent studies are preferred, especially if changes over time are expected
adequacy of data collection period | generally the goal is to characterize long-term behavior
validity of approach | direct measurements are preferred
representativeness of the population | U.S. national studies are preferred
variability in the population | studies with adequate characterizations of variability are desirable
minimal (or defined) bias in study design | studies having designs with minimal bias are preferred (or with known direction of bias)
minimal (or defined) uncertainty in the data | large studies with high ratings on the above considerations are preferred
3. A GENERAL FRAMEWORK FOR MAKING INFERENCES
Despite the lack of specificity of a definition of representativeness, it is clear in the
present context that representativeness relates to the "comfort" with which one can draw
inferences from some set(s) of extant data to the population of interest for which the assessment
is to be conducted, and in particular, to certain characteristics of that population's exposure or
exposure factor distribution. The following subsections provide some definitions of terms and
attempt to break down the overall inference into some meaningful steps.
3.1 Inferences from a Sample to a Population
In this paper, the word population refers to a set of units which may be defined in
terms of person and/or space and/or time characteristics. The population can thus be defined in
terms of its individuals' characteristics (defined by demographic and socioeconomic factors,
human behavior, and study design) (e.g., all persons aged 16 and over), the spatial characteristics
(e.g., living in Chicago) and/or the temporal characteristics (e.g., during 1997).
In conducting a risk assessment, the assessor needs to define the population of concern —
that is, the set of units for which risks are to be assessed (e.g., lifetime risks of all U.S. residents).
At a Superfund site, this population of concern is generally the population surrounding the site.
In this document, the term population of concern refers to that population for which the assessor
wishes to draw inferences. If it were practical, this is the population for which a census (a 100%
sample) would exist or for which the assessor would conduct a probability-based study of
exposures. Figure 1 provides a diagram of the exposure assessor decision process during the
selection of data for an exposure assessment.
As depicted in Figure 1, quite often it is not practical or feasible to obtain data on the
population of concern and the assessor has to rely on the use of surrogate data. These data
generally come from studies conducted by researchers for a variety of purposes. Therefore, the
assessor's population of concern may differ from the surrogate population. Note that the
population differences may be in any one (or more) of the characteristics described earlier. For
example, the surrogate population may only cover a subset of the individuals in the assessor's
population of concern (Chicago residents rather than U.S. residents). Similarly, the surrogate
data may have been collected during a short period of time (e.g., days), while the assessor may be
concerned about chronic exposures (i.e., temporal characteristics).
The studies used to derive these surrogate data are generally designed with a population
in mind. Since it may not be practical to sample everyone in that population, probability-based
sampling is often conducted. This sampling scheme allows valid statistical (i.e., non-model-based)
inferences, assuming there were no implementation difficulties (e.g., no nonresponse and
valid measurements). Ideally, the implementation difficulties would not be severe (and hence
ignored), so that these sampled individuals can be considered representative of the population. If
there are implementation difficulties, adjustments are typically made (e.g., for nonresponse) to
compensate for the population differences. Such remedies for overcoming inferential gaps are
fairly well documented in the literature in the context of probability-based survey sampling (e.g.,
see Oh and Scheuren (1983)). If probability sampling is not employed, it is unclear how the
selected individuals for whom data are sought, and the respondents for whom data are actually
acquired, relate to the population that the study was designed to address.
There are cases where probability-based sampling is used and the study design allows
some model-based inferences. For instance, food consumption data are often obtained using
surveys which ask respondents to recall food eaten over a period of a few days. These data are
usually collected throughout a one-year period to account for some seasonal variation in food
consumption. Statistical inferences can then be made for the individuals surveyed within the time
frame of study. For example, one can estimate the mean, the 90th percentile, etc. for the number
of days during which individuals were surveyed. However, if at least some of the selected
individuals are surveyed over multiple periods of time during that year, then a model-based strategy
might allow estimation of a distribution of long-term (e.g., annual) consumption patterns.
[Figure 1: Risk Assessment Data Collection Process. Flowchart of the decision boxes "Risk Assessment Data Needs"; "Are there site/chemical specific data?"; "Are the data representative and adequate? (Use checklist I)"; "Are the surrogate data representative and adequate? (Use checklists II, III, IV)"; and "Can adjustments be made to extrapolate to site/chemical of interest?", linked by Yes/No branches.]
If probability-based sampling is not used, model-based rather than statistical inferences
are needed to extend the sample results to the population for which the study was designed.
In contrast to the inferences described above, which emanate from population differences
and the sampling designs used in the study, there are two additional inferential aspects that relate
to representativeness:
• The degree to which the study design is followed during its implementation
• The degree to which a measured value represents the true value for the measured unit
Both of these are components of measurement error. The first relates to an implementation error
in which the unit selected for measurement is not precisely the one for which the measurement
actually is made. For instance, the study's sampling design may call for people to record data for
24-hr periods starting at a given time of day, but there may be some departure from this ideal in
the actual implementation. The second has to do with the inaccuracy in the measurement itself,
such as recall difficulties for activities or imprecision in a personal air monitoring device.
4. COMPONENTS OF REPRESENTATIVENESS
As described above, the evaluation of how representative a data set is begins with a clear
definition of the population of concern (the population of interest for the given assessment), with
attention to all three fundamental characteristics of the population -- individual, spatial, and
temporal characteristics. Potential inferential gaps between the data set and the population of
concern -- that is, potential sources of unrepresentativeness -- can then be partitioned along
these population characteristics and along the inferential steps. Components of representativeness
are illustrated in Table 1: the rows correspond to the inferential steps and the columns correspond
to the population characteristics. The inferential steps are distinguished as being either internal
or external to the source study.
4.1 Internal Components - Surrogate Data Versus the Study Population
After determining that a study provides information on the exposures or exposure factors
of interest, it is important that the exposure assessor evaluate the representativeness of the
surrogate study (or studies). This entails gaining an understanding of both the individuals
sampled for the study and the degree to which the study achieved valid inferences to that
population. The assessor should consider the questions in Checklist I in the appendix to help
establish the degree of representativeness inherent to this internal component. In the context of
the Exposure Factors Handbook (EFH), the representativeness issues listed in this checklist are
presumably the types of considerations that led to selection of the source studies that appear in
the EFH. As indicated previously, the focus for addressing representativeness in that context was
national and long-term, which may or may not be consistent with the assessment of current
interest.
Table 1. Elements of Representativeness
EXTERNAL TO STUDY
Component of inference: How well does the surrogate population represent the population of concern?
Individual characteristics:
• Exclusion or limited coverage of certain segments of population of concern
Spatial characteristics:
• Exclusion or inadequate coverage of certain regions or types of areas (e.g., rural areas) that make up the population of concern
Temporal characteristics:
• Lack of currency
• Limited temporal coverage, including exclusion or inadequate coverage of seasons
• Inappropriate duration for observations (e.g., short-term measurements where concern is on chronic exposures)
INTERNAL TO STUDY
Components of inference:
• How well do the individuals sampled represent the population of concern for the study?
• How well do the actual number of respondents represent the sampled population?
• How well does the measured value represent the true value for the measured unit?
Individual characteristics:
• Imposed constraints that exclude certain segments of study population
• Frame inadequacy (e.g., due to lack of current frame information)
• Non-probability sample of persons
• Excessive nonresponse
• Inadequate sample size
• Behavior changes resulting from participation in study (Hawthorne effect)
• Measurement errors associated with people's ability/desire to respond accurately to questionnaire items
• Measurement error associated with within-specimen heterogeneity
• Inability to acquire physical specimen with exact size or shape or volume desired
Spatial characteristics:
• Inadequate coverage (e.g., limited to single urban area)
• Non-probability sample of spatial units (e.g., convenience or judgmental siting of ambient monitors)
• Inaccurate identification of sampled location
Temporal characteristics:
• Limited temporal coverage (e.g., limited study duration)
• Inappropriate duration for observations
• Non-probability sample of observation times
• Deviation in times selected vs. those measured or reported (e.g., due to schedule slippage, or incomplete response)
• Measurement errors related to time (e.g., recall difficulties for foods consumed or times in microenvironments)
4.2 External Components - Population of Concern Versus Surrogate Population
In many cases, the assessor will be faced with a situation in which the population of
concern and surrogate population do not coincide in one or more aspects. To address this
external factor of representativeness, the assessor needs to:
• determine the relationship between the two populations
• judge the importance of any discrepancies between the two populations
• assess whether adjustments can be made to reconcile or reduce differences.
To address these, the assessor needs to consider all characteristics of the populations. Relevant
questions to consider are listed in Checklists II, III, and IV in the appendix for the individual,
spatial, and temporal characteristics, respectively.
Each checklist contains several questions related to each of the above bullets. For
example, the first few items of each checklist relate to the first item above (relationship of the
two populations). There are several possible ways in which the two populations may relate to
each other; these cases are listed below and can be addressed for each population dimension:
• Case 1: The population of concern and surrogate population are (essentially) the same
• Case 2: The population of concern is a subset of the surrogate population
Case 2a: The subset is a large and identifiable subset.
Case 2b: The subset is a small and/or unidentifiable subset.
• Case 3: The surrogate population is a subset of the population of concern.
• Case 4: The population of concern and surrogate population are disjoint.
Note that Case 2a implies that adequate data are available from the surrogate study to generate
separate summary statistics (e.g., means, percentiles) for the population of concern. For
example, if the population of concern was focused on children and the surrogate population was
a census or large probability study of all U.S. residents, then children-specific summaries would
be possible. In such a situation, Case 2a reverts to Case 1.
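As an illustrative sketch only (not part of the original issue paper), the case logic above can be expressed with set comparisons for a single population dimension. The function name, the representation of a population as a set of group labels, and the example age groups are assumptions made for illustration. The distinction between Cases 2a and 2b depends on the adequacy of the surrogate data and is not modeled here.

```python
def classify_relationship(concern: set, surrogate: set) -> str:
    """Label the relationship between two populations for one dimension.

    Both arguments are hypothetical sets of group labels (e.g., age groups,
    regions, or time periods) that make up each population.
    """
    if concern == surrogate:
        return "Case 1"  # populations are (essentially) the same
    if concern < surrogate:
        return "Case 2"  # population of concern is a subset of the surrogate
    if surrogate < concern:
        return "Case 3"  # surrogate is a subset of the population of concern
    if concern.isdisjoint(surrogate):
        return "Case 4"  # populations are disjoint
    # Partial overlap is not one of the listed cases; flag it explicitly.
    return "partial overlap (not one of the listed cases)"


# Hypothetical example: children as the population of concern,
# all U.S. residents (children and adults) as the surrogate population.
print(classify_relationship({"children"}, {"children", "adults"}))
```

The same comparison would be repeated separately for the individual, spatial, and temporal dimensions, since the two populations may relate differently along each one.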
Case 2b will be typical of situations in which large-scale (e.g., national or regional) data
are available but assessments are needed for local areas (or for acute exposures). As an example,
suppose raw data from the National Food Consumption Survey (NFCS) can be used to form
meaningful demographic subgroups and to estimate average tapwater consumption for such
subgroups (e.g., see Section 5.1). If a risk assessment involving exposure from copper smelters
is to be conducted for the southwestern U.S., for instance, tapwater consumption would probably
be considered to be different for that area than for the U.S. as a whole, but the NFCS data for that
area might be adequate. If so, this would be considered Case 2a. But if the risk assessment
concerned workers at copper smelters, then an even greater discrepancy between the population
of concern and the surrogate data might be expected, and the NFCS data would likely be
regarded as inadequate, and more speculative estimates would be needed.
In contrast to Case 2, Case 3 will be typical of assessments that must use local and/or
short-term data to extrapolate to regional or national scales and/or to long-term (chronic)
exposures. Table 2 presents some hypothetical examples for each case. Note that, as illustrated
here and as implied by the bulleted items in Checklist IV, the temporal characteristics have two
series of issues: one that relates to the currency and the temporal coverage (study duration) of the
source study relative to the population of concern time frame, and one that relates to the time unit
of observation associated with the study.
Since most published references to the NFCS rely on the 1977-78 survey, exposure factor
data based on that survey might well be considered as Case 4 with respect to temporal coverage,
as trends such as consumption of bottled water and organic foods may not be well represented by
20-year-old data. A possible approach in this situation would be to obtain data from several
NFCSs, to compare or test for a difference between them, and to use them to extrapolate to the
present or future. The NFCS also illustrates the other temporal aspect, dealing with a time-unit
mismatch of the data and the population of concern, since the survey involves three
consecutive days for each person, while typically a longer-term estimate would be desired, e.g., a
person-year estimate (e.g., see Section 5.2).
While determining the relationship of the two populations will generally be
straightforward (first bullet), determining the importance of discrepancies and making
adjustments (the second and third bullets) may be highly subjective and require an understanding
of what factors contribute to heterogeneity in exposure factor values and speculation as to their
influence on the exposure factor distribution. Cases 1 and 2a are the easiest, of course. In the
other cases, it will generally be easier to speculate about how the mean and variability (perhaps
expressed as a coefficient of variation (CV)) of the two populations may differ than to speculate
on changes in a given percentile. Considerations of unaffected portions of the population must
also be factored into the risk assessor's speculation. The difficulty in such speculation obviously
increases dramatically when two or more factors affect heterogeneity, especially if the factors are
anticipated to have opposite or dependent effects on the exposure factor values. Regardless of
how such speculation is ultimately reflected in the assessment (either through ignoring the
population differences or by adjusting estimated parameters of the study population), recognition
of the increased uncertainty should be incorporated into sensitivity analyses. As a part of such an
analysis, it would be instructive to determine risks, when, for each relevant factor (e.g., age
category), several assessors independently speculate on the mean (e.g., a low, best guess, and
high) and on the CV.
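To illustrate why speculating on the mean and CV can still inform statements about percentiles, the sketch below converts a speculated mean and CV into an upper percentile. It assumes, purely for illustration, that the exposure factor is lognormally distributed; that distributional choice, the function name, and the low, best-guess, and high values are all assumptions and are not taken from the issue paper.

```python
import math
from statistics import NormalDist


def lognormal_percentile(mean: float, cv: float, p: float = 0.95) -> float:
    """Percentile of a lognormal distribution with the given mean and CV.

    Assumes a lognormal shape (an illustrative choice only). The log-scale
    parameters follow from the arithmetic mean and coefficient of variation:
        sigma^2 = ln(1 + CV^2),  mu = ln(mean) - sigma^2 / 2
    """
    sigma2 = math.log(1.0 + cv ** 2)
    sigma = math.sqrt(sigma2)
    mu = math.log(mean) - 0.5 * sigma2
    z = NormalDist().inv_cdf(p)  # standard normal quantile
    return math.exp(mu + z * sigma)


# Hypothetical low / best-guess / high speculations as (mean, CV) pairs.
scenarios = {"low": (1.0, 0.5), "best guess": (1.3, 0.6), "high": (1.6, 0.8)}
for label, (mean, cv) in scenarios.items():
    p95 = lognormal_percentile(mean, cv, 0.95)
    print(f"{label}: mean={mean}, CV={cv}, 95th percentile={p95:.2f}")
```

Running each assessor's speculated values through a calculation of this kind gives a range of percentile estimates that can be carried into the sensitivity analysis.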
[Table 2: Hypothetical examples for each case, by individual, spatial, and temporal characteristics. The table body is not legible in the source; recoverable entries include "Asthmatic U.S. children," "U.S. residents," "Near hazardous waste sites," "Northeast U.S.," "summer, 1998," "one year, 1998," and "eating occasions (acute)."]
5. ATTEMPTING TO IMPROVE REPRESENTATIVENESS
5.1 Adjustments to Account for Differences in Population Characteristics or Coverage.
If there is some overlap in information available for the population of concern and the
surrogate population (e.g., age distributions), then adjustments to the sample data can be made
that attempt to reduce the bias that would result from directly applying the study results to the
population of concern. Such methods of adjustment can all be generally characterized as "direct
standardization" techniques, but the specific methodology to use depends on whether one has
access to the raw data or only to summary statistics, as is often the case when using data from the
Exposure Factors Handbook. With access to the raw data, the applicable techniques also depend
on whether one wants to standardize to a single known population of concern distribution (e.g.,
age categories), to two or more marginal distributions known for the population of concern, or
even to population of concern totals for continuous variables.
Summary Statistics Available. Suppose that the available data are summary statistics
such as the mean, standard deviation, and various percentiles for an exposure factor of interest
(e.g., daily consumption of tap water). Suppose also that these statistics are available for
subgroups based on age, say age groups g = 1, 2, ..., G, and that the age distribution of the
population of concern is known to differ from that represented by the sample data.
We can then estimate linear characteristics of the population of concern, such as the mean or the
proportion exceeding a fixed threshold, using a simple weighted average. For example, the mean
of the population of concern can be estimated as
$$\bar{x}_{ATP} = \sum_g P_g \bar{x}_g ,$$
where $\sum_g$ represents summation over the population of concern groups indexed by g, $P_g$ is the
proportion of the population of concern that belongs to group g, and $\bar{x}_g$ is the sample mean for
group g.
Unfortunately, if one is interested in estimating a non-linear statistic for the population of
concern, such as the variance or a percentile, this technique is not algebraically correct.
However, lacking any other information from the sample, calculating this type of weighted
average to estimate a non-linear population of concern characteristic is better than making no
adjustment at all for known population differences. In the case of the population variance, we
recommend calculating the weighted average of the group standard deviations, rather than their
variances, and then squaring the estimated population of concern standard deviation to get the
estimated population of concern variance.
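As an illustration of the summary-statistics adjustment just described, the following Python sketch computes the weighted-average mean and the approximate variance obtained by squaring the weighted average of the group standard deviations. The function names and the three-group tap water figures are hypothetical; they are not taken from the Exposure Factors Handbook.

```python
from math import isclose


def standardized_mean(pop_props, group_means):
    """Weighted average of group means using population-of-concern proportions."""
    if not isclose(sum(pop_props), 1.0, abs_tol=1e-9):
        raise ValueError("population proportions must sum to 1")
    return sum(p * m for p, m in zip(pop_props, group_means))


def approx_standardized_variance(pop_props, group_sds):
    """Approximation from the text: weight the group SDs, then square the result."""
    sd = sum(p * s for p, s in zip(pop_props, group_sds))
    return sd ** 2


# Hypothetical tap water intake (L/day) for three age groups.
P = [0.5, 0.3, 0.2]        # population-of-concern proportions (assumed)
means = [1.0, 1.5, 2.0]    # sample means by group (assumed)
sds = [0.4, 0.5, 0.6]      # sample standard deviations by group (assumed)

mean_poc = standardized_mean(P, means)
var_poc = approx_standardized_variance(P, sds)
```

Because the variance is a non-linear statistic, the second function is only the rough adjustment recommended above; in particular, it does not account for the spread of the group means around the overall mean.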
Raw Data Available. If one has access to the raw data, not just summary statistics,
options for standardization are more numerous and can be made more rigorously. The options
depend, in part, on whether or not the data already have statistical analysis weights, such as those
appropriate for analysis of data from a probability-based sample survey.
Suppose that one has access to the raw data from a census or from a sample in which all
units can be regarded as having been selected with equal probabilities (e.g., a simple random
sample). In this case, if one knows the number, $N_g$, of population of concern members in group
g, then the statistical analysis weight to associate with the i-th member of the g-th group is
$$W_g(i) = \frac{N_g}{n_g},$$
where the sample contains $n_g$ members of group g. Alternatively, if one knows only the
proportion of the population and sample that belong to each group, one can calculate the weights
as
$$W_g(i) = \frac{P_g}{p_g},$$
where $P_g$ is the proportion of the population of concern in group g and $p_g$ is the proportion of
the sample in group g. The latter weights differ from those above
only by a constant, the reciprocal of the sampling fraction, and will produce equivalent results for
means and proportions. However, the former weights must be used to estimate population totals.
In either case, the population of concern mean can be estimated as
$$\bar{x}_{ATP} = \frac{\sum_g \sum_i W_g(i)\, x_g(i)}{\sum_g \sum_i W_g(i)},$$
where $x_g(i)$ is the value of the characteristic of interest (e.g., daily tap water consumption) for the
i-th sample member in group g.
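The equal-probability case can be sketched in a few lines of Python. This is a minimal illustration, assuming the weight for each sample member is the population of concern count for its group divided by the sample count for that group; the group labels, intake values, and population counts are hypothetical.

```python
from collections import Counter


def equal_probability_weights(groups, pop_counts):
    """Weight for each sample member: N_g / n_g for that member's group g."""
    n = Counter(groups)  # sample count n_g per group
    return [pop_counts[g] / n[g] for g in groups]


def weighted_mean(values, weights):
    """Ratio of the weighted total to the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Hypothetical sample: group label and daily tap water intake (L/day).
groups = ["child", "child", "adult", "adult", "adult", "adult"]
intake = [0.5, 0.7, 1.0, 1.2, 1.4, 1.6]
N = {"child": 600, "adult": 400}  # assumed population-of-concern counts

w = equal_probability_weights(groups, N)
est_mean = weighted_mean(intake, w)
est_population_size = sum(w)
```

With these count-based weights, the sum of the weights reproduces the assumed population size, which is why they (rather than the proportion-based weights) are the ones suited to estimating totals.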
In general, one may have access to weighted survey data, such as results from a
probability-based sample of the surrogate population. In this case, the survey analysis weight,
w(i), for the i-th sample member is the reciprocal of that person's probability of selection with
appropriate adjustments to reduce nonresponse bias and other potential sources of bias with
respect to the surrogate population. Further adjustments for making inferences to the population
of concern are considered below. These results can also be applied to the case of equally
weighted survey data, considered above, by considering the survey analysis weight, w(i), to be
unity (1.00) for each sample member.
If one knows the distribution of the population of concern with respect to a given
characteristic (e.g., the age/race/gender distribution), then one can use the statistical technique of
poststratification to adjust the survey data to provide estimates adjusted to that same population
distribution (see, e.g., Holt and Smith, 1979).¹ In this case, the weight adjustment factor for each
member of poststratum g is calculated as
$$A_g = \frac{N_g}{\sum_{i \in g} w(i)},$$
where $N_g$ is the number of population of concern members in poststratum g and the summation
is over all sample members belonging to poststratum g. The poststratified
analysis weight for the i-th sample member belonging to poststratum g is then calculated as
$$w'(i) = A_g\, w(i).$$
Using this weight, instead of the surrogate population weight, w(i), standardizes the survey
estimates to the population of concern.
¹ Sampling variances are computed differently for standardized and poststratified estimates, but these
details are suppressed in the present discussion (see, e.g., Shah et al., 1993).
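A minimal Python sketch of the poststratification adjustment follows. It assumes the adjustment factor for a poststratum is the known population of concern total divided by the sum of the survey weights in that poststratum; the function name, weights, and totals are hypothetical.

```python
def poststratify(weights, strata, pop_totals):
    """Scale each survey weight so weights in poststratum g sum to pop_totals[g]."""
    sums = {}
    for w, g in zip(weights, strata):
        sums[g] = sums.get(g, 0.0) + w
    # Adjustment factor for poststratum g: known total / sum of survey weights.
    return [w * pop_totals[g] / sums[g] for w, g in zip(weights, strata)]


# Hypothetical survey weights and poststratum labels.
w = [10.0, 30.0, 20.0, 40.0]
strata = ["a", "a", "b", "b"]
N = {"a": 100.0, "b": 30.0}  # assumed population-of-concern totals

w_ps = poststratify(w, strata, N)
```

After the adjustment the weights in each poststratum sum to the assumed total for that poststratum, while the relative sizes of the weights within a poststratum are unchanged.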
If one knows multiple marginal distributions for the population of concern but not their
joint distribution (e.g., marginal age, race, and gender distributions), one can apply a statistical
weight adjustment procedure known as raking, or iterative proportional fitting, to standardize the
survey weights (see, e.g., Oh and Scheuren, 1983). Raking is an iterative procedure for scaling
the survey weights to known marginal totals.
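Raking can be sketched as repeated poststratification, cycling over the margins until every margin is matched. The Python below is a simplified illustration for categorical margins only, not the procedure of any particular survey package; the function names, labels, starting weights, and marginal totals are hypothetical, and the margins are assumed to be mutually consistent (they sum to the same grand total).

```python
def group_sums(weights, labels):
    """Sum of weights within each label."""
    sums = {}
    for w, lab in zip(weights, labels):
        sums[lab] = sums.get(lab, 0.0) + w
    return sums


def rake(weights, margins, max_iter=200, tol=1e-10):
    """Iterative proportional fitting.

    margins is a list of (labels, targets) pairs, one per known marginal
    distribution; targets maps each label to its known total.
    """
    w = list(weights)
    for _ in range(max_iter):
        for labels, targets in margins:
            sums = group_sums(w, labels)
            w = [wi * targets[lab] / sums[lab] for wi, lab in zip(w, labels)]
        # Largest remaining discrepancy over all margins after a full cycle.
        worst = max(
            abs(group_sums(w, labels)[lab] - target)
            for labels, targets in margins
            for lab, target in targets.items()
        )
        if worst < tol:
            break
    return w


# Hypothetical sample of four persons with two known margins.
age = ["young", "young", "old", "old"]
sex = ["m", "f", "m", "f"]
w0 = [1.0, 2.0, 3.0, 4.0]
age_totals = {"young": 60.0, "old": 40.0}
sex_totals = {"m": 70.0, "f": 30.0}

raked = rake(w0, [(age, age_totals), (sex, sex_totals)])
```

The raked weights reproduce both sets of marginal totals without requiring the joint age-by-sex distribution of the population of concern.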
If one knows population of concern subgroup totals for continuous variables, a
generalized raking procedure can be used to standardize the survey weights to known
distributions of categorical variables as well as known totals for continuous variables. The
generalized raking procedures utilize non-linear, exponential modeling (see, e.g., Folsom, 1991).
Of course, none of these standardization procedures results in inferences to the population
of concern that are as defensible as those from a well-designed sample survey selected from a
sampling frame that completely and adequately covers the population of concern.
5.2 Adjustments to Account for Time-Unit Differences.
A common way in which the surrogate population and population of concern may differ
is in the time unit of (desired) observation. Probably the most common situation occurs when the
study data represent short-term measurements but where chronic exposures are of interest. In
this case, some type of model is needed to make the time-unit inference (e.g., from the
distribution of person-day or person-week exposures to the distribution of annual or lifetime
exposures). In general, it is convenient to break down the overall inference into two components:
from the time unit of measurement to the time duration of the study (data to the surrogate
population), and from the time duration of the surrogate population to the time unit of the
population of concern. For specificity, let t denote the observation time (e.g., a day or a week);
let τ denote the duration of the study (i.e., τ is the time duration associated with the surrogate
population); and let T denote the time unit of the population of concern (e.g., a lifetime). In the
case of chronic exposure concerns, t < τ < T.
Suppose that N denotes the number of persons in the surrogate population, and assume
there are (conceptually) K disjoint time intervals of length t that span τ (i.e.,
Kt = τ). Thus a census of the surrogate population would involve NK short-term measurements
(of exposures or of exposure factors). This can be viewed as a two-way array with N rows
(persons) and K columns (time periods). Clearly, the distribution of these NK measurements,
whose mean is the grand total over the NK cells divided by NK, encompasses both variability
among people and variability among time periods within people (and in practice, measurement
error also). The average across the columns for a given row (the marginal mean) is the average
exposure for the given person over a period of length τ. Since the mean of these τ-period
"measurements" over the N rows leads to the same mean as before, it is clear that the mean of the
t-period measurements and the mean of the τ-period measurements are the same. However, unless
there is no within-person variability, the variability of the longer τ-period measurements will be
smaller than the variability of the shorter t-period measurements. If the distribution of the shorter
term measurements is right-skewed, as is common, then one would expect the longer term
distribution to exhibit less skewness. Note that the degree to which the variability shrinks
depends on the relation between the within-person and between-person components of variance,
which is related to the temporal correlation. For example, if there is little within-person
variability, then people with high (low) values will remain high (low) over time, implying that
the autocorrelation is high and that the shrinkage in variability in going from days to years (say)
will be minimal. If there is substantial within-person variation, then the autocorrelations will be
low and substantial shrinkage in the within-person variance (on the order of a t/τ decrease) will
occur.
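The two-way array argument can be checked on a small example. The Python sketch below uses a hypothetical array of N = 3 persons by K = 4 short periods and compares the distribution of all NK short-term values with the distribution of the N person-level averages; the numbers are invented for illustration.

```python
from statistics import mean, pvariance

# Rows are persons, columns are short (t-length) periods; hypothetical values.
exposures = [
    [1.0, 3.0, 2.0, 2.0],
    [4.0, 6.0, 5.0, 5.0],
    [2.0, 2.0, 4.0, 0.0],
]

all_cells = [x for row in exposures for x in row]   # the NK short-term values
person_means = [mean(row) for row in exposures]     # one long-period average per person

grand_mean = mean(all_cells)
mean_of_person_means = mean(person_means)
var_short = pvariance(all_cells)      # between-person plus within-person variability
var_long = pvariance(person_means)    # between-person variability only
```

In this example both means equal 3.0, while the variance falls from 3.0 for the short-period values to 2.0 for the person-level averages, because the within-person component has been averaged out.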
To make this t-to-τ portion of the inference, we therefore would ideally have a valid
probability-based sample of the NK person-periods, and data on the t-period exposures or
exposure factors would be available for each of these sampling units. As a part of this study
design, we would also want to ensure that at least some persons have measurements for more
than one time period, since models that allow the time extrapolation will need data that, in
essence, will support the estimation of within-person components of variability. There are
several examples of models of this sort, some of which are described below.
Wallace et al. (1994) describe a model, which we refer to as the Duan-Wallace (DW)
model, in which data over periods of length t, 2t, 3t, etc. (i.e., over any averaging period of length
Mt) are all conceptually regarded to be approximated by lognormal distributions, with parameters
that depend on a "lifetime" variance component and a short-term variance component. While
such an assumption is theoretically inconsistent if exact lognormality is required, it may
nevertheless serve well as an approximation. The basic notion of the DW method is that, while
the mean of the exposures stays constant, the variability decreases as the number of periods
averaged together increases. Hence it is assumed that the total variability for a distribution that
averages over M periods (M = 1, 2, ...) can be expressed in terms of a long-term component and a
short-term component. Let $\gamma_L$ and $\gamma_S$ denote, respectively, the log-scale variances for these two
components. Under the lognormal model, Wallace et al. show that the log-scale variance for the
M-period distribution (i.e., the distribution that averages over M periods) is given by
$$V_M = \gamma_L + \log\left[1 + \frac{\exp(\gamma_S) - 1}{M}\right].$$
Note that an implication of the DW model is that the geometric means for the various
distributions will increase as M increases. In fact, the geometric mean (gm) associated with the
average of M short-term measurements will be
$$gm(M) = \bar{X} \exp[-V_M/2],$$
where $\bar{X}$ is the overall population mean of the exposures. As a consequence, if data are adequate
for estimating the variance components (and the mean of the exposures), then an estimated
distribution for any averaging time can be inferred. In particular, the DW method can be applied
if data are available for estimating $V_M$ for (at least) two values of M, since one is then able to
determine values of the two variance components. For instance, if two observations per person
are available, one can estimate the population mean and the population log-scale variance ($V_1$) for
single measurements (M = 1), and by averaging the two short-term measurements and then taking
logs, one can estimate the population log-scale variance, $V_2$. (Sampling weights should be used
when applicable.) Substituting M = 1 and M = 2 into the above $V_M$ equation gives
$V_1 = \gamma_L + \gamma_S$ and $V_2 = \gamma_L + \log[(1 + \exp(\gamma_S))/2]$, from which the following
formulas for estimating the variance components can be determined:
$$\gamma_L = \log[2\exp(V_2) - \exp(V_1)], \qquad \gamma_S = V_1 - \gamma_L.$$
The distribution for any averaging time can then be estimated by choosing the appropriate M
(e.g., M = 365 for an annual average if the measurement time is one day) and substituting
estimates into the $V_M$ equation
above. Similarly, a "lifetime" distribution (also assumed to be lognormal) is then estimated by
letting M go to infinity (i.e., the influence of the short-term component vanishes). Wallace et
al. (1994) caution that the data collection period should encompass all major long-term trends
such as seasonality.
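The DW calculations can be sketched in Python. This is a minimal illustration, assuming the M-period log-scale variance has the form V_M = γ_L + log[1 + (exp(γ_S) − 1)/M], where γ_L and γ_S are the long-term and short-term log-scale variance components; the function names and the numerical values are hypothetical and are not taken from Wallace et al. (1994).

```python
from math import exp, log


def dw_log_variance(gamma_l, gamma_s, m):
    """Assumed DW form: V_M = gamma_L + log(1 + (exp(gamma_S) - 1) / M)."""
    return gamma_l + log(1.0 + (exp(gamma_s) - 1.0) / m)


def dw_components(v1, v2):
    """Solve the M = 1 and M = 2 equations for (gamma_L, gamma_S)."""
    inner = 2.0 * exp(v2) - exp(v1)
    if inner <= 0.0:
        raise ValueError("V_1 and V_2 are inconsistent with the assumed model")
    gamma_l = log(inner)
    return gamma_l, v1 - gamma_l


def dw_geometric_mean(mean_exposure, v_m):
    """Geometric mean of a lognormal with the given mean and log-scale variance."""
    return mean_exposure * exp(-v_m / 2.0)


# Hypothetical variance components, used to generate V_1 and V_2 and then recovered.
v1 = dw_log_variance(0.3, 0.5, 1)
v2 = dw_log_variance(0.3, 0.5, 2)
gamma_l, gamma_s = dw_components(v1, v2)

# Log-scale variance for an annual average of daily measurements (M = 365).
v_annual = dw_log_variance(gamma_l, gamma_s, 365)
```

In practice V_1 and V_2 would be estimated from data with two observations per person, and sampling variability can produce estimates for which the argument of the logarithm is not positive; the sketch raises an error in that case rather than returning a value.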
Clayton et al. (1998) describe a study of personal exposures to airborne contaminants that
employs a more sophisticated study design and model (that requires more data); the goal was to
estimate distributions of annual exposures from 3-day exposure measurements collected
throughout a 12-month period. Two measurements per person (in different months) were
available for some of the study participants. A multivariate lognormal distribution was assumed;
the lognormal parameters for each month's data were estimated, along with the correlations for
each monthly lag (assumed to depend only on the length of the lag). Simulated data were
generated from this multivariate distribution for a large number of "people;" each "person's"
exposures were then averaged over the 12 months. This approach assumes that an average
over 12 observations, one per month, produces an adequate approximation to the annual
distribution of exposures. The model results were compared to those obtained via a modification
of the DW model.
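The simulation step of this kind of approach can be sketched in Python. The sketch is a deliberate simplification and is not the model of Clayton et al.: it assumes the same log-scale mean and standard deviation in every month and a first-order autoregressive structure on the log scale, so that the correlation between months depends only on the lag (rho raised to the power of the lag). All names and parameter values are hypothetical.

```python
import random
from math import exp
from statistics import mean, pvariance


def simulate_annual_averages(n_people, mu, sigma, rho, n_months=12, seed=0):
    """Simulate monthly lognormal exposures per person and average them.

    Log-scale values follow a stationary AR(1) process, so the log-scale
    correlation between two months depends only on the lag between them.
    Returns (first-month exposures, annual-average exposures).
    """
    rng = random.Random(seed)
    first_month, annual = [], []
    for _ in range(n_people):
        z = rng.gauss(0.0, 1.0)
        values = []
        for month in range(n_months):
            if month > 0:
                z = rho * z + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
            values.append(exp(mu + sigma * z))
        first_month.append(values[0])
        annual.append(mean(values))
    return first_month, annual


first, annual = simulate_annual_averages(2000, mu=0.0, sigma=0.5, rho=0.5, seed=1)
var_monthly = pvariance(first)
var_annual = pvariance(annual)
```

Under these assumptions the simulated annual averages have about the same mean as the monthly values but a smaller variance, with the amount of shrinkage governed by the assumed lag correlation.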
Buck et al. (1995, 1997) describe some general models (e.g., lognormality is not
assumed); these, too, require multiple observations per person, and if the within-person variance
is presumed to vary by person, then a fairly large number of observations per person may be
needed. These papers give some insight into how estimated distributional parameters based on
the short-term data relate to the long-term parameters. Reports by Carriquiry et al. (1995, 1996),
Carriquiry (1996), and a paper by Nusser et al. (1996) deal with some of the same issues in
the context of estimating distributions of "usual" food intake and nutrition from short-term
dietary data.
The second part of the inference, extrapolation from the study time period (of duration τ)
to the longer time T, is likely to be much less defensible than the first part if τ and T are very
different. This part of the inference is really an issue of temporal coverage. If the study involves
person-day measurements conducted over a two-month period in the summer, and annual or
lifetime inferences are desired, then little can be said regarding the relative variability or mean
levels of the short-term and T-term data, basically because of uncertainty regarding the
stationarity of the exposure factor over seasons and years. The above-described approach of
Wallace et al., for instance, includes statements that recognize the need for a population
stationarity assumption that essentially requires that the processes underlying the exposure factor
data that occur outside the time period of the surrogate population be like those that occur within
the surrogate population. Applying some of the above methods on an age-cohort-specific basis,
and then combining the results over cohorts, offers one possible way of improving the inference
(e.g., see Hartwell et al., 1992).
6. SUMMARY AND CONCLUSIONS
Representativeness is concerned with the degree to which "good" inferences can be made
from a set of exposure factor data to the population of concern. Thus evaluating
representativeness of exposure factor data involves achieving an understanding of the source
study, making an appraisal of the appropriateness of its internal inferences, assessing how and
how much the surrogate population and population of concern differ, and evaluating the
importance of the differences. Clearly, this can be an extremely difficult and subjective task. It
is, however, very important, and sensitivity analyses should be included in the risk assessment
that reflect the uncertainties of the process.
In an attempt to ensure that all aspects of representativeness are considered by analysts,
we have partitioned the overall inferential process into components, some of which are
concerned with design and measurement features of the source study that affect the internal
inferences, and some of which are concerned with the differences between the surrogate
population and the population of concern, which affect the external portion of the inference. We
also partition the inferential process along the lines of the population characteristics
(individual, spatial, and temporal) in an attempt to assess where overlaps and gaps exist
between the data and the population of concern. In the individual and spatial characteristics,
representativeness involves consideration of bounds and coverage issues. In the temporal
characteristic, these same issues (i.e., study duration and currency) are important, but the time
unit associated with the measurements or observations is also important, since time unit
differences often occur between the data and the population of concern. Checklists are provided
to aid in assessing the various components of representativeness.
When some aspect of representativeness is lacking in the available data, assessors are
faced with the task of trying to make the data "more representative." We describe several
techniques (and cite some others) for accomplishing these types of tasks; generally making such
adjustments for known differences will reduce bias. However, it should be emphasized that these
adjustment techniques cannot guarantee representativeness in the resultant statistics. For
supporting future, large-scale (e.g., regional or national) risk assessments, one of the best avenues
for improving the exposure factors data would be to get assessors involved in the design process,
so that appropriate modifications to the survey designs of future source studies can be
considered. For example, the design might be altered to provide better coverage of certain
segments of the population that may be the focus of risk assessments (e.g., more data on children
could be sought). The use of multiple observations per person also could lead to improvement in
those assessments concerned with chronic exposures.
7. BIBLIOGRAPHY
American Society for Quality Control (1994). American National Standard: Specifications and
Guidelines for Quality Systems for Environmental Data and Environmental Technology
Programs (ANSI/ASQC E4). Milwaukee, WI.
Barton, M., A. Clayton, K. Johnson, R. Whitmore (1996). "G-5 Representativeness." Research
Triangle Institute Report (Project 91U-6342-116), prepared for U.S. EPA under Contract No.
68D40091.
Buck, R.J., K.A. Hammerstrom, and P.B. Ryan (1995). "Estimating Long-Term Exposures from
Short-Term Measurements." Journal of Exposure Analysis and Environmental Epidemiology.
Burmaster, D.E. and A.M. Wilson (1996). "An introduction to Second-Order Random Variables
in Human Health Risk Assessments." Human and Ecological Risk Assessment, Vol. 2, No. 4, pp.
892-919.
Carriquiry, A.L. (1996). "Assessing the Adequacy of Diets: A Brief Commentary" (Report
prepared under Cooperative Agreement No. 58-3198-2-006, Agricultural Research Service,
USDA, and Iowa State University).
Carriquiry, A.L., J.J. Goyeneche, and W.A. Fuller (1996). "Estimation of Bivariate Usual Intake
Distributions" (Report prepared under Cooperative Agreement No. 58-3198-2-006, Agricultural
Research Service, USDA, and Iowa State University).
Carriquiry, A.L., W.A. Fuller, J.J. Goyeneche, and H.H. Jensen (1995). "Estimated Correlations
Among Days for the Combined 1989-91 CSFII" (Dietary Assessment Research Series Report 4
under Cooperative Agreement No. 58-3198-2-006, Agricultural Research Service, USDA, and
Iowa State University).
Clayton, C.A., E.D. Pellizzari, C.E. Rodes, R.E. Mason, and L.L. Piper (1998). "Estimating
Distributions of Long-Term Particulate Matter and Manganese Exposures for Residents of
Toronto, Canada." Submitted to Atmospheric Environment.
Cohen, J.T., M.A. Lampson, and T.S. Bowers (1996). "The Use of Two-Stage Monte Carlo
Simulation Techniques to Characterize Variability and Uncertainty in Risk Analysis." Human
and Ecological Risk Assessment, Vol. 2, No. 4, pp. 939-971.
Corder, L.S., L. LaVange, M.A. Woodbury, and K.G. Manton (1990). "Longitudinal Weighting
and Analysis Issues for Nationally Representative Data Sets." Proceedings of the American
Statistical Association, Section on Survey Research, pp. 468-473.
Deville, J., Sarndal, C., and Sautory, O. (1993). "Generalized Raking Procedures in Survey
Sampling." Journal of the American Statistical Association, Vol. 88, No. 423, pp. 1013-1020.
Ferson, Scott (1996). "What Monte Carlo Methods Cannot Do." Human and Ecological Risk
Assessment, Vol. 2, No. 4, pp. 990-1007.
Folsom, R.E. (1991). "Exponential and Logistic Weight Adjustments for Sampling and
Nonresponse Error Reduction." Proceedings of the Social Statistics Section of
the American Statistical Association, 191-202.
Francis, Marcie and Paul Feder, Battelle Memorial Institute (1997). "Development of Long-Term
and Short-Term Inhalation Rate Distributions." Prepared for Research Triangle Institute.
Hartwell, T.D., C.A. Clayton, and R.W. Whitmore (1992). "Field Studies of Human Exposure to
Environmental Contaminants." Proceedings of the American Statistical Association, Section on
Statistics and the Environment, pp. 20-29.
Holt, D. and Smith, T.M.F. (1979). "Post Stratification." Journal of the Royal Statistical Society,
Vol. 142, Part 1, pp. 33-46.
Kendall, M.G. and W.R. Buckland (1971). A Dictionary of Statistical Terms. Published for the
International Statistical Institute, Third Edition, New York: Hafner Publishing Company, Inc.,
p. 129.
Kruskal, W. and F. Mosteller (1979). "Representative Sampling, I: Non-Scientific Literature."
International Statistical Review, Vol. 47, pp. 13-24.
Kruskal, W. and F. Mosteller (1979). "Representative Sampling, II: Scientific Literature,
Excluding Statistics." International Statistical Review, Vol. 47, pp. 111-127.
Kruskal, W. and F. Mosteller (1979). "Representative Sampling, III: The Current Statistical
Literature." International Statistical Review, Vol. 47, pp. 245-265.
Nusser, S.M., A.L. Carriquiry, K.W. Dodd, and W.A. Fuller (1996). "A Semiparametric
Transformation Approach to Estimating Usual Daily Intake Distributions." Journal of the
American Statistical Association.
Oh, H.L. and Scheuren, F.J. (1983). "Weighting Adjustment for Unit Nonresponse." In:
Incomplete Data in Sample Surveys, Vol. 2.
Shah, B., R. Folsom, L. LaVange, S. Wheeless, et al. (1993). Statistical Methods
and Mathematical Algorithms Used in SUDAAN.
Smith, F., S. Kulkarni, L.E. Myers, and M.J. Messner (1988). "Evaluating and Presenting Quality
Assurance Sampling Data." In Keith, L.H. (Ed.), Principles of Environmental Sampling. American
Chemical Society.
Stanek, III, E.J. (1996). "Estimating Exposure Distributions: A Caution for Monte Carlo Risk
Assessment." Human and Ecological Risk Assessment.
Wallace, L.A., N. Duan, and R. Ziegenfus (1994). "Can Long-Term Exposure Distributions Be
Predicted from Short-Term Measurements?" Risk Analysis, Vol. 14, No. 1, pp. 75-85.
image:
CHECKLIST I. ASSESSING INTERNAL REPRESENTATIVENESS: POPULATION SAMPLED VS.
POPULATION OF CONCERN FOR THE SURROGATE STUDY
• What is the study population?
• What are the individual characteristics (i.e., defined by demographic, socioeconomic
factors, human behavior and other study design factors)?
• What are the spatial characteristics?
• What are the temporal characteristics?
• What are the units of observation (e.g., person-days or person-weeks)?
• What, if any, are the population subgroups for which inferences were especially
desired?
• Are valid statistical inferences to the study population possible?
• Was the whole population sampled (i.e., was a census conducted)?
• If not, was the sample design appropriate and adequate?
• Was a probability sample used? If not, how reasonable does the method of
sample selection appear to be?
• Was the response rate satisfactory?
• Was the sample size adequate for estimating central tendency measures?
• Was the sample size adequate for estimating other types of parameters (e.g.,
upper percentiles)?
• For what population or subpopulation size was the sample size adequate for
estimating measures of central tendency?
• For what population or subpopulation size was the sample size adequate for
estimating other types of parameters (e.g., upper percentiles)?
• What biases are known or suspected as a result of the design or
implementation of the study? What is the direction of the bias?
• Does the study appear to have and use a valid measurement protocol?
• What is the likelihood of Hawthorne effects? What impact might this have on
bias or variability?
• What are other sources of measurement errors (e.g., recall difficulties)? What
impact might they have on bias or variability?
• Does the study design allow (model-based) inferences to other time units?
• What model is most appropriate?
• What assumptions are inherent to the model?
CHECKLIST II. ASSESSING EXTERNAL REPRESENTATIVENESS: SURROGATE POPULATION
VS. EXPOSURE ASSESSOR'S POPULATION OF CONCERN - INDIVIDUAL CHARACTERISTICS
How does the population of concern relate to the surrogate study population in terms of the individuals'
characteristics?
• Case 1: Are the individuals in the two populations essentially the same?
• Case 2: Are the individuals in the population of concern a subset of those in the study
population? If so, is there adequate information available to allow for the analysis of
the population of concern? (Note: If so [Case 2a], we can redefine the surrogate data
to include only persons in the population of concern and then treat this case as Case
1.)
• Case 3: Are the individuals in the surrogate study population a subset of those in the
population of concern?
• Case 4: Are the two populations disjoint -- in terms of individual characteristics?
• How important is the difference in the two populations (population of concern and surrogate
population) with regard to the individuals' characteristics? To what extent is the difference between
the individuals of the two populations expected to affect the population parameters?
• With respect to central tendency of the two populations?
• With respect to the variability of the two populations?
• With respect to the shape and/or upper percentiles of the two populations?
• Is there a reasonable way of adjusting or extrapolating from the surrogate population to the
population of concern -- in terms of the individuals' characteristics?
• What method(s) should be used?
• Is there adequate information available to implement it?
CHECKLIST III. ASSESSING EXTERNAL REPRESENTATIVENESS: SURROGATE
POPULATION VS. EXPOSURE ASSESSOR'S POPULATION OF CONCERN - SPATIAL
CHARACTERISTICS
How does the population of concern relate to surrogate population in the spatial characteristics?
• Case 1: Do they cover the same geographic area?
• Case 2: Is the geographic area of the population of concern a subset of the area of
surrogate population? If so, is there adequate information available to allow the
analysis of the population of concern? (Note: If so [Case 2a], we can redefine the
surrogate population to include only regions or types of geographic areas in the
population of concern and then treat this case as Case 1.)
• Case 3: Is the geographic area covered by the surrogate population a subset of that
covered by the population of concern?
• Case 4: Are two populations disjoint -- in the spatial characteristics?
How important is the difference in the two target populations with regard to the spatial
characteristics? To what extent is the difference in the spatial characteristics of the two populations
expected to affect the population parameters?
• With respect to central tendency of the two populations?
• With respect to the variability of the two populations?
• With respect to the shape and/or upper percentiles of the two populations?
Is there a reasonable way of adjusting or extrapolating from the surrogate-population to the
population of concern — in terms of the spatial characteristics?
• What method(s) should be used?
• Is there adequate information available to implement it?
CHECKLIST IV. ASSESSING EXTERNAL REPRESENTATIVENESS: SURROGATE
POPULATION VS. EXPOSURE ASSESSOR'S POPULATION OF CONCERN - TEMPORAL
CHARACTERISTICS
How does the population of concern relate to surrogate population in terms of currency and temporal
coverage (study duration)?
• Case 1: Are the duration and currency of the surrogate data compatible with the
population of concern needs?
• Case 2: Is the temporal coverage of the population of concern a subset of the
surrogate population? If so, is there adequate information available to allow the
analysis of the population of concern? (Note: If so [Case 2a], we can redefine the
surrogate population to include only time periods (e.g., seasons) of interest to the
assessor and then treat this case as Case 1.)
• Case 3: Is the temporal coverage of the surrogate population a subset of that covered
by the population of concern?
• Case 4: Are the two populations disjoint — in terms of study duration and currency?
• How does the population of concern relate to surrogate population in terms of the time unit (either
the observed time unit or, if appropriate, a modeled time unit)?
• Case 1: Are the time units compatible?
• Case 2: Is the time unit for the population of concern shorter than that of the surrogate
population? If so, are data available for the shorter time unit associated with the
population of concern? (If so [Case 2a], this can be treated as Case 1.)
• Case 3: Is the time unit for the population of concern longer than that of the surrogate
population?
• How important is the difference in the two populations (i.e., population of concern and surrogate
population) with regard to the temporal coverage and currency? To what extent is the difference in
the temporal coverage and currency of the two populations expected to affect the population
parameters?
• With respect to central tendency of the two populations?
• With respect to the variability of the two populations?
• With respect to the shape and/or upper percentiles of the two populations?
• Is there a reasonable way of adjusting or extrapolating from the surrogate population to the
population of concern -- to account for differences in temporal coverage or currency?
• What method(s) should be used?
• Is there adequate information available to implement it?
• How important is the difference in the two populations (i.e., population of concern and surrogate
population) with regard to the time unit of observation? To what extent is the difference in the
observation time unit of the two populations expected to affect the population parameters?
• With respect to central tendency of the two populations?
• With respect to the variability of the two populations?
• With respect to the shape and/or upper percentiles of the two populations?
• Is there a reasonable way of adjusting or extrapolating from the surrogate population to the
population of concern -- to account for differences in observation time units?
• What method(s) should be used?
• Is there adequate information available to implement it?
Issue Paper on Empirical Distribution Functions and
Non-parametric Simulation
Introduction
One of the issues facing risk assessors relates to the best use of empirical distribution
functions (EDFs) to represent stochastic variability intrinsic to an exposure factor. Generally,
one of two situations occurs. In the first situation, the risk assessor is reviewing an assessment in
which an EDF has been used. The risk assessor needs to make a judgement whether or not the
use of the EDF is appropriate for this particular analysis. In the second situation, the risk
assessor is conducting his/her own assessment and must decide whether a parametric
representation or non-parametric representation is best suited to the assessment. The objective of
this issue paper is to help focus discussion on the key issues and choices facing the assessor
under these circumstances.
We make the initial assumption that the data are sufficiently representative of the
exposure factor in question. Here, representative is taken to mean that the data were obtained as
a simple random sample of the relevant characteristic of the correct population, that the data were
measured in the proper scale (time and space), and that the data are of acceptable quality
(accuracy and precision).
We also make the assumption that the analysis involves an exposure/risk model which
includes additional exposure factors, some of which also exhibit natural variation. Ultimately,
we are interested in estimating some key aspects of the variation in predicted exposure/risk. As a
minimum, we are interested in statistical measures of central tendency (e.g., median), the mean,
and some measure of plausible upper bound or high-end exposure (e.g., 95th, 97.5th, or 99th
percentiles of exposure). Thus, how variable factors algebraically and statistically interact is
important.
Further, we assume that Monte Carlo methods will be used to investigate the variation in
exposure/risk. Obviously, other methods can be used, but it is clear from experience that
simulation-based techniques will be used in the vast majority of applications.
Conventional wisdom advises that when there is an underlying theory supporting the use
of a particular theoretical distribution function (TDF), then the data should be used to fit the
distribution and that distribution should be used in the analysis. For example, it has been argued
that repeated dilution and mixing of an environmental pollutant should eventually result in a
lognormal distribution of concentrations. While this is an agreeable concept in principle, it is
a rare situation when theory-based TDFs are available for particular exposure factors.
Furthermore, theory-based TDFs are often only valid in the asymptotic sense. Convergence
may be very slow, and, in the early stages, the data may be very poorly modeled by the
asymptotic form of the TDF. For this issue paper, we assume that no theory-based TDFs are
available.
The issue paper is written in two parts. Part I addresses the strengths and weaknesses of
empirical distribution functions; Part II addresses issues related to judging quality of fit for
theoretical distributions.
Part I. Empirical Distribution Functions
Definitions. Given representative data, $X = \{x_1, x_2, \ldots, x_n\}$, the risk assessor has two basic
techniques for representing an exposure factor in a Monte Carlo analysis:
• parametric methods, which attempt to characterize the exposure factor using a TDF. For
example, a lognormal, gamma, or Weibull distribution is used to represent the exposure factor,
and the data are used to estimate values for its intrinsic parameters.
• non-parametric methods, which use the sample data to define an empirical distribution function
(EDF) or a modified version of the EDF.
EDF. Sorted from smallest to largest, $x_1 \le x_2 \le \cdots \le x_n$, the EDF is the cumulative
distribution function defined by
$$F(x) = \frac{\text{number of } x_i \le x}{n}$$
or
$$F(x) = \frac{1}{n} \sum_{k=1}^{n} H(x - x_k)$$
where $H(u)$ is the unit step function which jumps from 0 to 1 when $u \ge 0$. The values of the EDF
are the discrete set of cumulative probabilities $(0, 1/n, 2/n, \ldots, n/n)$. Figure 1 illustrates a
basic EDF for 50 samples drawn from a lognormal distribution with a geometric mean of 100 and a
geometric standard deviation of 3, i.e., $X \sim LN(100, 3)$.
Figure 1. Example of EDF (horizontal axis: Random Variate)
In a Monte Carlo simulation, an EDF is generated by randomly sampling the raw data with
replacement (simple bootstrapping) so that each observation in the data set, $x_k$, has an equal
probability of selection, i.e., $\text{prob}(x_k) = 1/n$.
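The definition and the bootstrap step can be sketched in a few lines of Python. This is a minimal illustration, not code from the issue paper: the function names `make_edf` and `bootstrap`, the fixed seed, and the 50 simulated lognormal observations standing in for real data are all assumptions made here.

```python
import bisect
import math
import random


def make_edf(sample):
    """Return F(x) = (number of x_i <= x) / n as a right-continuous step function."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect.bisect_right(xs, x) / n


def bootstrap(sample, size, rng):
    """Simple bootstrap: every observation is selected with probability 1/n."""
    return [rng.choice(sample) for _ in range(size)]


rng = random.Random(0)  # fixed seed (assumption) so the sketch is repeatable
# 50 illustrative draws from LN(GM=100, GSD=3); mu = ln(GM), sigma = ln(GSD)
data = [rng.lognormvariate(math.log(100), math.log(3)) for _ in range(50)]
F = make_edf(data)
draws = bootstrap(data, 1000, rng)
```

Every value in `draws` is one of the original observations, which is the behavior the properties listed next rely on.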
Properties of the EDF. The following summarizes some of the basic properties of the EDF:
1. Values between any two consecutive samples, $x_k$ and $x_{k+1}$, cannot be simulated, nor can
values smaller than the sample minimum, $x_1$, or larger than the sample maximum, $x_n$, be
generated, i.e., $x \ge x_1$ and $x \le x_n$.
2. The mean of the EDF is equal to the sample mean. The variance of the EDF mean is
always smaller than the variance of the sample mean; it is equal to $(n-1)/n$ times the
variance of the sample mean.
3. The variance of the EDF is equal to $(n-1)/n$ times the sample variance.
4. Expected values of the EDF percentiles are equal to the sample percentiles.
5. If the underlying distribution is skewed to the right (as are many environmental
quantities), the EDF will tend to under-estimate the true mean and variance.
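Properties 2 and 3 can be checked numerically. The sketch below is illustrative only: it treats the EDF as a discrete distribution that puts weight 1/n on each observation, and the simulated data and seed are assumptions.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed (assumption)
data = [rng.lognormvariate(math.log(100), math.log(3)) for _ in range(100)]
n = len(data)

# Mean of the EDF: each observation carries weight 1/n, so it is the sample mean.
edf_mean = statistics.fmean(data)

# Variance of the EDF uses divisor n; the usual sample variance uses n - 1.
edf_variance = statistics.pvariance(data)
sample_variance = statistics.variance(data)

ratio = edf_variance / sample_variance  # expected to equal (n - 1) / n
```

With n = 100 the ratio is 0.99, so the shortfall in variance is small here, but it grows as the sample shrinks.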
Figures 2 and 3 below illustrate typical Monte Carlo behavior of the EDF in reproducing the
sample mean, variance, and 95th percentile of the underlying sample. Here $X \sim LN(100, 3)$ with
a sample size of N = 100, and the relative error is defined as 100 x [simulated - sample]/sample.
The oscillatory nature of the simulated 95th percentile reflects the normalized magnitude of the
difference between adjacent order statistics in the sample, $x_{(95)}$ and $x_{(96)}$, and shows the
Monte Carlo estimate flip-flopping between these two ranks.
Figure 2. Convergence of the Mean and Variance (horizontal axis: Number of Monte Carlo Simulations, 1,000 to 1,000,000)
Figure 3. Convergence of the 95th Percentile (horizontal axis: Number of Monte Carlo Simulations, 1,000 to 1,000,000)
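A convergence experiment of the kind shown in Figures 2 and 3 might be sketched as follows. The helper names, the seed, the simulated sample, and the choice of the order statistic $x_{(\lceil pn \rceil)}$ as the percentile estimator are assumptions; the figures in the issue paper may have been produced with different conventions.

```python
import math
import random
import statistics


def order_stat_percentile(values, p):
    """Empirical percentile taken as the order statistic x_(ceil(p * n))."""
    xs = sorted(values)
    k = max(1, math.ceil(p * len(xs)))
    return xs[k - 1]


def sample_variance(values):
    """Variance with divisor n - 1, computed in floating point for speed."""
    m = statistics.fmean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def relative_error(simulated, sample):
    """100 x [simulated - sample] / sample, as defined in the text."""
    return 100.0 * (simulated - sample) / sample


rng = random.Random(2)  # fixed seed (assumption)
sample = [rng.lognormvariate(math.log(100), math.log(3)) for _ in range(100)]
targets = {
    "mean": statistics.fmean(sample),
    "variance": sample_variance(sample),
    "p95": order_stat_percentile(sample, 0.95),
}

errors = {}
for m in (1_000, 10_000, 100_000):
    draws = [rng.choice(sample) for _ in range(m)]  # bootstrap from the EDF
    errors[m] = {
        "mean": relative_error(statistics.fmean(draws), targets["mean"]),
        "variance": relative_error(sample_variance(draws), targets["variance"]),
        "p95": relative_error(order_stat_percentile(draws, 0.95), targets["p95"]),
    }
```

Because the simulated 95th percentile can only take values that appear in the sample, its relative error moves in jumps rather than shrinking smoothly.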
Linearly Interpolated EDF (Linearized EDF). For continuous random variables, it may be
troubling to define the EDF as a step function, and so interpolation is often used to estimate the
probabilities of values in between sample values. Generally, for values between observations,
linear interpolation is favored, although higher order interpolation is sometimes used. Figure 4
compares a linearly interpolated EDF with the basic EDF. The linearly interpolated EDF will
tend to underestimate the sample mean and variance. It will converge to the appropriate sample
percentile, but will take longer to do so than the simple EDF. These differences tend to
diminish as the sample size increases. Table 1 illustrates differences between the EDF,
linearized EDF, and best fit TDF for residential room air exchange rates. The EDF statistics are
based on a Monte Carlo simulation with 25,000 replications. Clearly the simple EDF is best at
reproducing sample moments and sample percentiles.
Table 1. Comparison of key summary statistics

Statistic   ACH Sample (N = 90)   EDF      Linearized EDF   Best Fit Weibull PDF
mean        0.6822                0.6821   0.6747           0.6782
variance    0.2387                0.2358   0.2089           0.2479
skewness    1.4638                1.4890   1.2426           1.2329
kurtosis    6.6290                6.7845   5.6966           4.9668
5%          0.1334                0.1320   0.1307           0.0881
10%         0.1839                0.1840   0.1840           0.1452
50%         0.6020                0.6160   0.6032           0.5691
90%         1.2423                1.2390   1.2398           1.3592
95%         1.3556                1.3820   1.3600           1.6450

Figure 4. Comparison of Basic EDF and Linearly Interpolated EDF (horizontal axis: Random Variate)
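One way to sample from a linearized EDF is inverse-transform sampling from a piecewise-linear CDF drawn through the order statistics. The sketch below assumes that particular interpolation convention (the CDF runs from 0 at the sample minimum to 1 at the sample maximum, with equal probability on each of the n - 1 intervals); the issue paper does not say which convention was used for Table 1, and the simulated data are hypothetical rather than the air exchange measurements.

```python
import math
import random
import statistics


def linearized_draw(xs_sorted, rng):
    """Inverse-transform draw from a piecewise-linear CDF through the order statistics."""
    n = len(xs_sorted)
    pos = rng.random() * (n - 1)      # position along the n - 1 equal-probability intervals
    k = min(int(pos), n - 2)          # index of the interval's left end point
    frac = pos - k
    return xs_sorted[k] + frac * (xs_sorted[k + 1] - xs_sorted[k])


rng = random.Random(3)  # fixed seed (assumption)
xs = sorted(rng.lognormvariate(math.log(100), math.log(3)) for _ in range(90))
n = len(xs)

sample_mean = statistics.fmean(xs)
# Exact mean of this linearized EDF: the average of the n - 1 interval midpoints.
lin_mean = sum((a + b) / 2 for a, b in zip(xs, xs[1:])) / (n - 1)
midrange = (xs[0] + xs[-1]) / 2

draws = [linearized_draw(xs, rng) for _ in range(25_000)]
```

Under this convention the linearized mean differs from the sample mean by exactly (sample mean - midrange)/(n - 1). For right-skewed data the midrange usually exceeds the mean, so the linearized mean falls below the sample mean, and the gap shrinks as n grows.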
Extended EDF. Neither the simple EDF nor the interpolated EDF can produce values beyond
the sample minimum or maximum. This may be an unreasonable restriction in many cases. For
example, the probability that a previously observed largest value in a sample based on n
observations will be exceeded in a sample of N future observations may be estimated using the
relationship prob = 1 - n/(N+n). If the next sample size is the same as the original sample size,
there is a 50% likelihood that the new sample will have a largest value greater than the original
sample's largest value. Restricting the EDF to the smallest and largest sample values will
produce distributional tails that are too short. In order to get around this problem, one may
extend the EDF by adding plausible lower and upper bound values to the data. The actual values
are usually based on theoretical considerations or on expert judgement. For right skewed data,
adding a new minimum and maximum would tend to increase the mean and variance of the EDF.
This same sort of rationale is used when continuous, unbounded TDFs are truncated at the low
and high end to avoid generating unrealistic values during Monte Carlo simulation (e.g., 15 kg
adult males, females over 2.5 m tall, etc.).
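The exceedance relationship can be checked with a short simulation. This sketch assumes independent, identically distributed continuous data, which is the setting in which the relationship holds; uniform variates are used only for convenience, and the function names and seed are assumptions.

```python
import random


def prob_new_max_exceeds(n, N):
    """1 - n/(N + n): chance that the largest of N future values tops the old maximum."""
    return 1.0 - n / (N + n)


def simulate(n, N, trials, rng):
    """Monte Carlo estimate of the same probability for iid continuous data."""
    hits = 0
    for _ in range(trials):
        old_max = max(rng.random() for _ in range(n))
        new_max = max(rng.random() for _ in range(N))
        hits += new_max > old_max
    return hits / trials


rng = random.Random(4)  # fixed seed (assumption)
analytic = prob_new_max_exceeds(20, 20)   # equal sample sizes give 0.5
estimate = simulate(20, 20, 20_000, rng)
```

With a future sample nine times the size of the original (n = 10, N = 90), the same formula gives a 90% chance that the old maximum is exceeded.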
Mixed Empirical-Exponential Distribution. An alternative approach to extending the upper
tail of an empirical distribution beyond the sample data has been suggested by Bratley et al. In
their method, an exponential tail is fit to the last five or ten percent of the data. This method is
based on extreme value theory and the observation that extreme values for many continuous,
unbounded distributions follow an exponential distribution.
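The idea of an empirical body with an exponential upper tail can be sketched as below. This is a simplified illustration and should not be read as the construction published by Bratley et al.: the body is sampled as a plain EDF, the tail is a shifted exponential whose mean equals the average excess of the top observations over the threshold, and the names, seed, and data are all assumptions.

```python
import math
import random


def fit_exponential_tail(xs_sorted, tail_fraction=0.10):
    """Return (k, threshold, mean_excess) for the top tail_fraction of the data."""
    n = len(xs_sorted)
    k = max(1, int(round(tail_fraction * n)))    # number of tail observations
    threshold = xs_sorted[n - k - 1]             # largest value kept in the empirical body
    excess = [x - threshold for x in xs_sorted[n - k:]]
    return k, threshold, sum(excess) / k


def mixed_draw(xs_sorted, k, threshold, mean_excess, rng):
    """Draw from the empirical body with prob. (n - k)/n, else from the exponential tail."""
    n = len(xs_sorted)
    if rng.random() < k / n:
        # expovariate takes a rate, i.e. 1 / mean
        return threshold + rng.expovariate(1.0 / mean_excess)
    return rng.choice(xs_sorted[: n - k])


rng = random.Random(5)  # fixed seed (assumption)
xs = sorted(rng.lognormvariate(math.log(100), math.log(3)) for _ in range(100))
k, threshold, mean_excess = fit_exponential_tail(xs, 0.10)
draws = [mixed_draw(xs, k, threshold, mean_excess, rng) for _ in range(20_000)]
share_above_threshold = sum(d > threshold for d in draws) / len(draws)
```

Unlike the simple EDF, this sampler can return values above the sample maximum, while the lower 90% of draws still come directly from the observed data.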
Starting Points
The following table summarizes the results of an informal survey of experts who were asked to
contribute their observations and thoughts on the strengths and weaknesses of EDFs by
addressing a list of questions and issues. Based on this survey:
1. The world seems to be divided into TDF'ers and EDF'ers.
2. There are no clear-cut, unambiguous statistical reasons for choosing EDFs over TDFs or
vice versa.
3. Many of the criticisms leveled at EDFs also apply to TDFs (e.g., the data must be simple
random samples).
4. One aspect which may have important implications for our discussion is the nature of
the decision and how sensitive an outcome is to the choice of an EDF.
5. Generally, contributors did not express much support for either the linearized EDF or the
extended EDF. Why they seem to be comfortable with TDFs, which essentially
interpolate between data points as well as extrapolate beyond the data, is unclear.
Comments

Yes, but perhaps an incomplete representation [...] known about the quantity for which the
distribution [...] needed.

One has to assume a representative random [...]. As another example, advantage 2 (EDFs do [...]
on parametric assumptions) is true and is a well-known advantage. Less well known is that almost
[...]etric procedures make some strong assumptions. Technically, a parametric situation is one
wh[...] the class of possible probability distributions to a collection that can be described in a
natural [...] finite number of real numbers, or parameters. In common non-parametric situations
(such as [...] medians of two sets of data) the data are modeled [...] pairs of distributions, but
there is still a restr[...] that the members of each pair are the same [...] distribution except
for a change of location. [...] using an EDF is something entirely different than [...] set of
assumptions you make about the class [...] distributions. Usually, you use an EDF as a tool to
make an estimate: that is, as a computation[...].

Although for most well-behaved distributions [...] that the EDF converges in probability to the
underlying distribution, convergence often requires [...] amounts of data. [...] issue in risk
assessment [...] the near universal situation of having too few [...] usually means we are
nowhere near a limiting case [...] that we should beware ALL asymptotic methods [...] Maximum
Likelihood, without careful evaluation [...] their applicability to our small data set. EDFs
converge VERY slowly to the underlying distribution (especially if you're trying to characterize
e[...]). Therefore this convergence phenomenon is not comforting or useful.

EDFs are almost useless, except in very large [...]. Accuracy of any interval is driven by a
standard deviation of sqrt(n) in that interval. For eve[...] accuracy, with 20 intervals, you
would need more than [...] underlying observations.

This is useless, since "large" is unattainable [...] practical purposes, unless you're the Census
Bureau.

Yes, but the confidence limits on those estimates [...] quite wide in some cases. For example, a
small [...] set that is negatively skewed could be a ra[...] from a positively skewed population.

Maybe. Not sure how this is different than [...]ing data to a fitted parametric distribution [...]
distributions.
What you need is random representative data and to feel comfortable that your data include the
lower and upper bounds of the quantity. The number of data points in itself is not particularly
important.

How many data? Two. This somewhat flippant answer simply highlights the important fact that you
need to ask the question in the context of (a) what decision is being made and (b) what its risk
function is (how bad is it if the decision is incorrect?). If the risk function is low (it doesn't
matter much if we are wrong) and the decision is really obvious, then sometimes all you need is a
reality check. Hence the need for one datum. People make mistakes and Murphy's Law applies, so
experience dictates a second datum. I know you guys at EPA and in the states are competent and
sensible and often very good at this stuff, but there are still many people and many agencies out
there that are just too uncomfortable with common sense like this, so it pays to repeat it. (The
comment cuts both ways: sometimes I am asked by clients to gather more data to show that they
don't have a problem, when all their data point to serious contamination. Most of them back down
right away when confronted with the common-sense approach: "you obviously have a problem, so let's
talk instead about how to remedy it, since honest statistics won't make it go away.")

I would not approach the topic this way. I would ask, instead, how do I characterize an amount of
data, and given these summary characteristics, what methods are appropriate.

A [...] number [...] points per [...] interpolation of most density curves. For bimodal etc. you
double the number of intervals.

Gee - that depends. [...] not place a lot of weight on the 99th percentile, then 100 [...] you
want to know [...] good. This is similar to the "how many iterations is enough?" [...] iterations,
then I think it is pretty hard to justify NOT using an EDF. If you are going to place a lot of
weight on the 99th percentile, then [...] data points are [...] to know.

EXCEPTION: Not with much accuracy. The theory is simple and the example [...] illustrate the
issue. By definition of percentile, there is 0.99 [...] distribution does not occur in a random
sample. In a sa[...] distribution values between the 99th and 100th percentiles therefore do not
occur with probability (0.99)^N, which is extremely close to 1/e, or almost 40%. Therefore there
are almost [...] you have not even seen anything as high as the 99th percentile yet. To be fairly
sure of seeing a value that high, you need to solve (0.99)^N <= assurance value (such as 5%) for
N. [...] the example and even then you only have 95% confidence [...].
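The arithmetic in the last comment can be worked through directly. The sketch below reads the partly illegible example as a sample of N = 100, which is an assumption suggested by the "close to 1/e, or almost 40%" remark; the function names are hypothetical.

```python
import math


def prob_not_seen(p, n):
    """Chance that none of n independent draws exceeds the p-th quantile."""
    return p ** n


def min_sample_size(p, assurance):
    """Smallest n with p**n <= assurance."""
    return math.ceil(math.log(assurance) / math.log(p))


miss_100 = prob_not_seen(0.99, 100)      # about 0.37, close to 1/e
n_needed = min_sample_size(0.99, 0.05)   # 299
```

So roughly 300 observations are needed before there is a 95% chance that the sample contains even one value at or above the true 99th percentile.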
A true EDF uses step functions - this is resampling of the data in which each data point has a
probability 1/n. [...] use of linear interpolation will typically lead to lower estimates of the
standard deviation, since you are not guaranteed to sample the min and max data points.

Now you're going down a slippery slope. As soon as you linearize your EDF you are entering into
the land of semi-parametric techniques, smoothing, modeling, and assumptions. You're not using the
EDF any more. The EDF is accurately and correctly described by its cumulative distribution
function, which will be a step function.

If you aren't using a continuous distribution, why not just go with the data? The diversity of
distributions is very rich. For example, see Evans, Hastings, and Peacock, Statistical
Distributions, 2nd Ed., Wiley (1993) for 39 of them. Using some kind of test for fit of the
continuous distribution to your data, e.g., quantiles, you usually can obtain a reasonable fit.
See J.W. Tukey, Exploratory Data Analysis, Addison-Wesley (1977). If not, e.g., bimodal, you will
have to decompose or transform your data, and you already start to make important assumptions.

Smoothing EDFs within the bulk of the probability curve causes no serious errors. Extrapolation
beyond the [...] of data violates the very concept of EDF, and is intrinsically dependent on the
parameterization used.

The simple solution is to use the midpoint rule (apply prob. at the interval midpoint).
Alternatively, use the trapezoidal rule (st. line interpolation). For a continuous curve, a
straight line interpolation averages properly and improves discretization bias. I, however, would
suggest using resampling as a better approach than smoothing.

I usually use percentiles, but if you have enough data to use an EDF, then it shouldn't matter
much.

In this case, the difference between step functions and linear interpolations becomes small. Why
bin? You lose information that way. If you have large segments of the CDF that are approximately
piecewise uniform, then binning the data won't result in much loss of information.

There's a lot of literature on binning data, mostly in terms of how the perception of the
histogram can change. I would suggest, in the spirit of the response to question 1, that you
consider the effect the binning process has on the outcome of your work, since your question
really is one of computational practice, not conceptual approach. Bin the data to speed your
process (simulation, bootstrapping, whatever) but in a way in which you can demonstrate your
answers are not materially different than what you would get with a more accurate procedure. How
do you know what a material difference is? Look at your decision space and your risk function.

No! This approach causes more mischief in epidemiology than in exposure analysis, but anytime you
summarize the data, you lose information. If the data set is large, feel grateful.

The intervals or bins used are mathematical estimators of the underlying density or distribution
curve. This is a numerical integration or interpolation issue. Typically 10-20 intervals gives
good performance on a unimodal density function. Particularly if linear interpolation is used.

No.
[...] the data talk to you [...] you wan[...].

[...] a tail problem [...] judgment on wi[...] I punt. This [...] you base yo[...].

I don't like this method [...] as described above. I don't see what [...] offers in co[...]. [...]
would seem to open the analyst up for excessive criticism.
Probably nothing needs to be done [...] assume the data [...] representative random [...].

[...] look noisy or jumpy due to the gaps [...] interpolations can lead to different estimates of
standard [...] and other statistics.

You have partially answered this qu[...] fitting suggestion.

[...] interpolating and fitting curves to yo[...] purely parametr[...] of the EDF advantages you
so carel[...] can't trust using ju[...] parameterizing your [...].

As I understand bootstrap, you mus[...] step takes care of interpolation.

You could bootstrap from percentile[...] take percentiles [...] bootstraps. I wouldn't [...]
difference.
Part II. Issues Related to Fitting Theoretical Distributions
Suppose the following set of circumstances:
(1) that we have a random sample of an exposure parameter which exhibits natural variation.
(2) that the collected data are representative of the exposure parameter of interest (i.e., the
data measure the right population, in the right time and spatial scales, etc.).
(3) that estimates of measurement error are available.
(4) that there is no available physical model to describe the distribution of the data (i.e., there
is no theoretical basis to say that the data are lognormal, gamma, Weibull, etc.).
(5) that we wish to characterize and account for the variation in the parameter in an analysis
of environmental exposures.
(6) we run the data through our favorite distribution-fitting software and get goodness of fit
statistics (e.g., chi-square, Kolmogorov-Smirnov, Cramer-von Mises, Anderson-Darling,
Watson, etc.) and their statistical significance.
(7) rankings based on the goodness of fit results are mixed, depending on the statistic and p-
values.
(8) graphical examination of the quality of fit (QQ plots, PP plots, histogram overlays,
residual plots, etc) presents a mixed picture, reinforcing the differences observed in the
goodness of fit statistics.
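To make circumstances (6) and (7) concrete, the sketch below fits two candidate distributions to the same data and ranks them by one goodness of fit statistic, the Kolmogorov-Smirnov distance. It is an illustration only: the data are simulated, the two candidates (lognormal and normal, both fitted by the method of moments on the appropriate scale) and all names are assumptions, and only the standard library is used.

```python
import math
import random
import statistics
from statistics import NormalDist


def ks_statistic(xs_sorted, cdf):
    """Largest vertical gap between the EDF step function and a fitted CDF."""
    n = len(xs_sorted)
    d = 0.0
    for i, x in enumerate(xs_sorted, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


rng = random.Random(6)  # fixed seed (assumption)
xs = sorted(rng.lognormvariate(math.log(100), math.log(3)) for _ in range(100))
logs = [math.log(x) for x in xs]

log_fit = NormalDist(statistics.fmean(logs), statistics.stdev(logs))   # lognormal candidate
norm_fit = NormalDist(statistics.fmean(xs), statistics.stdev(xs))      # normal candidate

ks = {
    "lognormal": ks_statistic(xs, lambda x: log_fit.cdf(math.log(x))),
    "normal": ks_statistic(xs, norm_fit.cdf),
}
ranking = sorted(ks, key=ks.get)  # best (smallest distance) first
```

A different statistic, such as Anderson-Darling, weights the tails more heavily and can reorder close candidates, which is the mixed-ranking situation described in (7). Note also that standard p-value tables for these statistics assume the parameters were not estimated from the same data.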
Questions
1). A statistician might say that one should pick the simplest distribution not rejected by the
data. But what does that mean when rejection is dependent on the statistic chosen and an
arbitrary level of statistical significance?
2). On what basis should it be decided whether or not a data set is adequately represented by a
fitted analytic distribution?
3). Specifically, what role should the p-value of the goodness of fit statistic play in that
judgment?
4). What role should graphical examination of fit play?
Respondent #1
All distributions are, in fact, empirical. Parametric distributions are merely theoretical constructs.
There is no reason to believe that any given distribution is, in fact, log-normal (or any other
specific parametric type). That we agree to call a distribution log-normal is (or at least should
be) merely a shorthand by which we mean that it looks sufficiently like a theoretical log-normal
distribution to save ourselves the extra work involved in specifying the empirical distribution.
Other than analyses where we are dealing strictly with hypothetical constructs (e.g., what if we
say that such-and-such distribution is lognormal and such-and-such distribution is normal....), I
can see no theoretical justification for a parametric distribution other than the convenience
gained. When the empirical data are sparse in the tails, we, of course, run into trouble in needing
to specify an arbitrary maximum and minimum to the empirical distribution. While this may
introduce considerable uncertainty, it is not necessarily a more uncertain practice than allowing
the parametric construct to dictate the shape of the tails, or for that matter arbitrarily truncating
the upper tail of a parametric distribution. This becomes less of a problem if the analyst's goal in
constructing an input distribution is to describe the existing data with as little extrapolation as
necessary rather than to predict the "theoretical" underlying distribution. This distinction gets us
close to the frequentist/subjectivist schism where many, if not all, MC roads eventually seem to
lead.
Respondent #2
...if you use p-bounds you don't have to choose a single distribution. You can use the entire
equivalence class of distributions (be it a large or small class). I mean, if you can't discriminate
between them on the basis of goodness of fit, maybe you do the problem a disservice to try. And
operationalizing the criterion for "simplest" distribution is no picnic either.
Respondent #3
Why not try the KISS method: Keep It Simple & Sound. The Ranked Order Data assuming
uniform probability intervals is a method that makes no assumptions as to the nature of the
distribution. It also tends to the true distribution function as the number of data points increases.
If you have replicate measurements (on each random sample) then the mean of these should be
used.
The method yields simple rapid random number generators and one can obtain any desired
statistical parameter of the distribution. However, use of the distribution function in any estimate
is advised. Given the high level of approximation and/or bias in most risk assessment data and
models, any approximation to the true PDF should be adequate.
There is one occasion when the theoretical PDF may be better than the empirical PDF. That is
when it comes from the solution of equations based on fundamental laws constraining the
solution to a specified form. Even in this case agreement with data is required. This is not
usually the case in risk assessment PDFs.
Respondent #4
Since I am blessed not to be a statistician, I have no problem disputing their "statement" about
the "simplest" distribution. I don't know what they mean either. What really matters physically
is picking a distribution that has the fewest variables and that is easy to apply, given the kind of
analysis you want to do. You want one that does not make assumptions in its construction that
contradict processes operating in your data. If you are generating equally bad fits with a variety
of the usual distributions anyway, by all means choose the one that is easiest to use. For time
sliced exposure data, the "right" distribution almost always means a lognormal distribution. A
physical basis for the lognormal does exist for exposure data, and empirically, most exposure
data fit lognormals. [Your assumption "A" does not hold for typical exposure processes.]
Wayne Ott, who probably does not even remember it, taught me this one afternoon in the back of
a meeting room. See "A Probabilistic methodology for analyzing water quality effects of urban
runoff on rivers and streams," Office of Water, February 15, 1984. Just tell people that you have
used a lognormal distribution for convenience, although it does not fit particularly well, then
provide some summary statistics that describe the poorness of fit.
Problems begin when you get a poor fit to a lognormal distribution but a good fit with a different
distribution. Say you get a better fit to the Cauchy distribution, because the tails of your pdf
have more density. Now things get more fun. Statisticians would say that you should use the
Cauchy distribution, because it is a better fit. I say that you should still use the lognormal,
because you can interpret manipulations of the data more easily, and just note that the lognormal
fit is poor. Problems will arise, however, if you want to reach conclusions that rely on the tails of
the distribution, and you use the lognormal pdf formulation, instead of your actual data. I
somewhat anticipated your dilemma in my previous E-mail to you: "If you don't need to use a
continuous distribution, just go with the data!"
For time dependent exposure data, the situation gets much more complex. I prefer to work with
Weibull distributions, but I see lots of studies that use Box-Jenkins models.
And you also asked: On what basis do I decide whether my data are adequately represented by a
fitted analytic distribution? Specifically, what role should the p-value of the goodness of fit
statistic play in my choice? What role should graphical examination of fit play?
To me, the data are adequately represented, when the analytical distribution adequately fills the
role you intend it to have. In other words, if you substitute a lognormal distribution for your
data, as a surrogate, then carry out some operations and obtain a result, the lognormal is
adequate, unless it leads to a different conclusion than the actual data would support. The same
statement is true of any continuous distribution.
Similarly, as a Bayesian, I think that the proper role of a p-value is the role you believe it should
play. I don't think that p-values have much meaning in these kinds of analyses, but if you think
they should, you should state the desired value before beginning to analyze the data, and not
proceed until you obtain this degree of fittedness or better. If small differences in p-value make
much difference in your analysis, your conclusions are probably too evanescent to have much
usefulness. The quantiles approach that I previously commended to you is a graphical method.
[See J.W. Tukey, Exploratory Data Analysis. Addison-Wesley (1977)]. In it, you would display
the distribution of your data, mapped against the prediction from the continuous distribution you
have chosen, with both displayed as order statistics. If your data fit your distribution well, the
points (data quantiles versus distribution quantiles) will fall along a straight (x=y) line.
Systematic differences in location, spread, and/or shape will show up fairly dramatically. Such
visual inspection is much more informative than perusing summary statistics. No "statistical
fitting" is involved. [Also see J.M. Chambers et al., Graphical Methods for Data Analysis. Cole
Publishing (1983)].
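The quantile comparison described by this respondent can be prepared numerically before plotting. The sketch below pairs each ordered observation with the corresponding quantile of a fitted lognormal; the simulated data, the lognormal choice, and the plotting positions (k - 0.5)/n are assumptions made for illustration, and the pairs would normally be passed to a plotting tool.

```python
import math
import random
import statistics
from statistics import NormalDist

rng = random.Random(7)  # fixed seed (assumption)
xs = sorted(rng.lognormvariate(math.log(100), math.log(3)) for _ in range(100))
n = len(xs)

logs = [math.log(x) for x in xs]
fit = NormalDist(statistics.fmean(logs), statistics.stdev(logs))  # lognormal on the log scale

# Plotting positions (k - 0.5)/n are one common convention (an assumption here).
qq_pairs = [
    (math.exp(fit.inv_cdf((k - 0.5) / n)), x)   # (distribution quantile, data quantile)
    for k, x in enumerate(xs, start=1)
]

# Largest departure from the x = y line, measured on the log scale.
max_log_gap = max(abs(math.log(q) - math.log(x)) for q, x in qq_pairs)
```

Plotting `qq_pairs` on log axes and looking at where the points leave the x = y line shows in which region, and in which direction, the fitted distribution misses the data.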
Respondent #5
I have several thoughts on the goodness of fit question. First, visual examination of the data is
likely to yield more insight into the REASONS for the mixed behavior of the various statistics;
i.e., in what regions of the variable of interest does a particular theoretical distribution not fit
well, and in what direction is the error? Then choosing a particular parametric distribution can
be influenced by the purpose of the analysis. For example, if you are interested in tail
probabilities, then fitting well in the tails will be more important than fitting well in the central
region of the distribution, and vice versa.
A good understanding of the theoretical properties of the various distributions is also handy. For
example, the heavy tails of the lognormal mean that the moments can be very strongly influenced
by relatively low-probability tails. If that seems appropriate, fine; if not, the analyst should be
aware of that, etc. I don't think there is a simple answer; it all depends on what you are trying to
do and why!
Respondent #6
In broad overview, I have these suggestions - all of which are subject to modification,
depending on the situation.
1. Professional judgment is **unavoidable** and is **always** a major part of every statistical
analysis and/or risk assessment. Even a (dumb) decision to rely **exclusively** on one
particular GOF statistic is an act of professional judgment. There is no way to make any decision
based exclusively on "objective information" because the decision on what is considered
objective contains unavoidable subjective components. There is no way out of any problem
except to use and to celebrate professional judgment. As a profession, we risk assessors need to
get over this hang up and move ahead.
2. It is **always** necessary and appropriate to fit several different parametric distributions to a
data set. We make choices on the adequacy of a fit by comparison to alternatives. Sometimes we
decide that one 2-parameter distribution fits well enough (and better than the reasonable
alternatives) so that we will use this distribution. Sometimes we decide that it is necessary to use
a more complicated parametric distribution (e.g., a 5-parameter "mixture" distribution) to fit the
data well (and better than the reasonable alternatives). And sometimes, we decide that no
parametric distribution can do the job adequately well, hence the need for bootstrapping and
other methods.
3. The human eye is far, far better at **judging** the overall match (or lack thereof) between a
fitted distribution and the data under analysis than any statistical test ever devised. GOF tests are
"blind" to the data! We need to visualize, visualize, and visualize the data - as compared to the
alternative fitted distributions - to **see** how the various fits compare to the data. Mosteller,
Tukey, and Cleveland, three of the most distinguished statisticians of the last 50 years, have all
stressed the **essential** nature of visualization and human judgment relying thereon (in lieu of
GOF tests). BTW, these graphs and visualizations *must* be published for all to see and
understand.
4. In situations where no single parametric distribution provides an **adequate** fit to the data,
there are several possible approaches to keep moving ahead. Here are my favorites.
A. (standard approach) Fit a "mixture" distribution to the data.
B. Use the two or three or four parametric distributions that offer the most appealing fit in a
sensitivity analysis to see if the differences among the candidate distributions really make a
difference in the decision at hand. Get the computer to simulate the results of choosing
among the different candidate distributions. This leads to keen insights as to the "value of
information."
C. (see references below, and references cited therein) By extension of the previous idea,
analysts can fit and use "second-order" distributions that contain both **Variability** and
**Uncertainty**. These second-order distributions have many appealing properties,
especially the property that they allow the analyst to propagate Variability and Uncertainty
**separately** so the risk assessor, the risk manager, and the public can all see how the Var
and Unc combine throughout the computation / simulation into the final answer.
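The separate propagation of Variability and Uncertainty mentioned in item C is often implemented as a nested (two-dimensional) Monte Carlo simulation. The sketch below is a minimal illustration, not the respondent's method: the lognormal variability model, the distributions assigned to its parameters, and the function name second_order_p95 are all assumptions chosen for the example.

```python
import random
from statistics import quantiles

def second_order_p95(n_outer=200, n_inner=500, seed=1):
    """Sorted 95th percentiles of variability, one per uncertainty realization."""
    rng = random.Random(seed)
    p95_values = []
    for _ in range(n_outer):
        # Outer loop (Uncertainty): draw one plausible parameter set.
        # These parameter distributions are hypothetical.
        mu = rng.gauss(0.0, 0.1)
        sigma = rng.uniform(0.4, 0.6)
        # Inner loop (Variability): simulate individuals under that parameter set.
        individuals = [rng.lognormvariate(mu, sigma) for _ in range(n_inner)]
        # quantiles(..., n=20) returns 19 cut points; the last is the 95th percentile.
        p95_values.append(quantiles(individuals, n=20)[-1])
    return sorted(p95_values)

p95 = second_order_p95()
cuts = quantiles(p95, n=20)  # cut points at 5%, 10%, ..., 95%
print(f"95th percentile of variability: median {cuts[9]:.2f}, "
      f"90% uncertainty interval {cuts[0]:.2f} to {cuts[-1]:.2f}")
```

Keeping the two loops separate is what lets the final summary report a variability percentile together with an uncertainty interval around it, rather than a single blended distribution.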
Respondent #7
[RE comments #1, #3, respondent #6].... the motivation behind having standardized methods:
Professional judgment does not always produce the same result. Your professional judgment
does not necessarily coincide with someone else's professional judgment. Surely, you've noticed
this. The problem isn't that no one is celebrating their professional judgement - the problem is
that we have more than one party.
The bigger and more unique the problem, the less standardization matters. But if you are trying
to compare, say, the risk from thousands of Superfund sites, you can't very well reinvent risk
analysis for every one and expect to get comparable results - whatever you do for one you must
do for all.
Have you tried to produce a GOF statistic that matches your visual preference? I have. For
instance, I think fitting predicted percentiles produces better looking fits than fitting observed
values (e.g., maximum likelihood) - because this naturally gives deviations at extreme values less
weight - where 'extreme value' is model dependent.
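One possible reading of "fitting predicted percentiles" is to choose parameters that minimize the squared gaps between fitted and empirical cumulative probabilities, instead of maximizing likelihood. Because cumulative probabilities are bounded between 0 and 1, an extreme observation cannot contribute a large deviation. The sketch below compares the two under that assumed interpretation; the normal model for log data, the crude grid search, the function names, and the data are all hypothetical.

```python
from statistics import NormalDist, fmean, pstdev

def plotting_positions(n):
    """Assumed empirical cumulative probabilities (i - 0.5)/n."""
    return [(i - 0.5) / n for i in range(1, n + 1)]

def percentile_sse(ordered_logs, mu, sigma):
    """Sum of squared gaps between fitted and empirical cumulative probabilities."""
    dist = NormalDist(mu, sigma)
    probs = plotting_positions(len(ordered_logs))
    return sum((dist.cdf(x) - p) ** 2 for x, p in zip(ordered_logs, probs))

def fit_mle(logs):
    """Maximum likelihood estimates for a normal model of the log data."""
    return fmean(logs), pstdev(logs)

def fit_percentiles(logs, steps=41, span=0.5):
    """Grid search around the MLE for the parameters minimizing percentile_sse."""
    ordered = sorted(logs)
    mu0, s0 = fit_mle(logs)
    best = (percentile_sse(ordered, mu0, s0), mu0, s0)
    for a in range(steps):
        mu = mu0 + span * s0 * (2 * a / (steps - 1) - 1)
        for b in range(steps):
            sigma = s0 * (1 + span * (2 * b / (steps - 1) - 1))
            sse = percentile_sse(ordered, mu, sigma)
            if sse < best[0]:
                best = (sse, mu, sigma)
    return best[1], best[2]

# Hypothetical log-scale data with one extreme value.
logs = [-0.2, 0.1, 0.3, 0.6, 0.9, 1.1, 1.4, 1.9, 2.3, 4.5]
print("maximum likelihood:", fit_mle(logs))
print("percentile fit:    ", fit_percentiles(logs))
```

A grid search is used only to keep the example dependency-free; any general-purpose optimizer could take its place.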
APPENDIX B
LIST OF EXPERTS AND OBSERVERS
United States
Environmental Protection Agency
Risk Assessment Forum
Workshop on Selecting Input
Distributions for Probabilistic Assessment
U.S. Environmental Protection Agency
New York, NY
April 21-22, 1998
List of Experts
Sheila Abraham
Environmental Specialist
Risk Assessment/Management
Northeast District Office
Ohio Environmental
Protection Agency
2110 East Aurora Road
Twinsburg, OH 44087
330-963-1290
Fax: 330-487-0769
E-mail:sabraham@epa.state.oh.us
Hans Allender
U.S. Environmental
Protection Agency
401 M Street, SW (7509)
Washington, DC 20460
703-305-7883
E-mail: allender.hans@
epamail.epa.gov
Timothy Barry
Office of Science Policy,
Planning, and Evaluation
U.S. Environmental
Protection Agency
401 M Street, SW(2174)
Washington, DC 20460
202-260-2038
E-mail: barry.timothy@
epamail.epa.gov
Robert Blaisdell
Associate Toxicologist
California Office of Environmental
Health Hazard Assessment
2151 Berkeley Way
Annex 11 - 2nd Floor
Berkeley, CA 94704
510-540-3487
Fax: 510-540-2923
E-mail: bblaisde@
berkeley.cahwnet.gov
David Burmaster
President
Alceon Corporation
P.O. Box 382669
Cambridge, MA 02238-2669
617-864-4300
Fax:617-864-9954
E-mail: deb@alceon.com
Christopher Frey
Assistant Professor
Department of Civil Engineering
North Carolina State University
P.O. Box 7908
Raleigh, NC 27695-7908
919-515-1155
Fax: 919-515-7908
E-mail: frey@eos.ncsu.edu
Susan Griffin
Environmental Scientist
Superfund Remedial Branch
Hazardous Waste
Management Division
U.S. Environmental
Protection Agency
999 18th Street (8EPR-PS)
Suite 500
Denver, CO 80202-2466
303-312-6651
Fax: 303-312-6065
E-mail: griffin.susan@
epamail.epa.gov
Bruce Hope
Environmental Toxicologist
Oregon Department of
Environmental Quality
811 Southwest 6th Avenue
Portland, OR 97204
503-229-6251
Fax: 503-229-6977
E-mail: hope.bruce@deq.state.or.us
William Huber
President
Quantitative Decisions
539 Valley View Road
Merion, PA 19066
610-771-0606
Fax:610-771-0607
E-mail: whuber@quantdec.com
Robert Lee
Risk Analyst
Golder Associates, Inc.
4104 148th Avenue, NW
Redmond, WA 98052
206-367-2673
Fax:206-616-4875
E-mail: rclee@u.washington.edu
David Miller
Chemist
Office of Pesticide Programs
Health Effects Division
U.S. Environmental
Protection Agency
401 M Street, SW (7509)
Washington, DC 20460
703-305-5352
Fax: 703-305-5147
E-mail: miller.david@
epamail.epa.gov
Samuel Morris
Environmental Scientist
Deputy Division Head
Brookhaven National Laboratory
Building 815
815 Rutherford Avenue
Upton, NY 11973
516-344-2018
Fax: 516-344-7905
E-mail: morris3@bnl.gov
Jacqueline Moya
Environmental Engineer
National Center for
Environmental Assessment
Office of Research and Development
U.S. Environmental
Protection Agency
401 M Street, SW (8623D)
Washington, DC 20460
202-664-3245
Fax: 202-565-0052
E-mail: moya.jacqueline@
epamail.epa.gov
Christopher Portier
Chief
Laboratory of Computational
Biology and Risk Analysis
National Institute of
Environmental Health Sciences
P.O. Box 12233 (MD-A306)
Research Triangle Park, NC 27709
919-541-4999
Fax: 919-541-1479
E-mail: portier@niehs.nih.gov
P. Barry Ryan
Professor
Exposure Assessment and
Environmental Chemistry
Rollins School of Public Health
Emory University
1518 Clifton Road, NE
Atlanta, GA 30322
404-727-3826
Fax: 404-727-8744
E-mail: bryan@sph.emory.edu
Brian Sassaman
Bioenvironmental Engineer
U.S. Air Force
DET1.HSC/OEMH
2402 E Drive
Brooks Air Force Base, TX 78235-5114
210-536-6122
Fax:210-536-1130
E-mail: brian.sassaman@
guardian.brooks.af.mil
Ted Simon
Toxicologist
Federal Facilities Branch
Waste Management Division
U.S. Environmental Protection Agency
Atlanta Federal Center
61 Forsyth Street, SW
Atlanta, GA 30303-3415
404-562-8642
Fax: 404-562-8566
E-mail: simon.ted@epamail.epa.gov
Mitchell J. Small
Professor
Departments of Civil &
Environmental Engineering and
Engineering
& Public Policy
Carnegie Mellon University
Porter Hall 119, Frew Street
Pittsburgh, PA 15213-3890
412-268-8782
Fax:412-268-7813
E-mail: ms35@andrew.cmu.edu
Edward Stanek
Professor of Biostatistics
Department of Biostatistics
and Epidemiology
University of Massachusetts
404 Arnold Hall
Amherst, MA 01003-0430
413-545-4603
Fax:413-545-1645
E-mail:
stanek@schoolph.umass.edu
Alan Stern
Acting Chief
Bureau of Risk Analysis
Division of Science and Research
New Jersey Department of
Environmental Protection
401 East State Street
P.O. Box 409
Trenton, NJ 08625
609-633-2374
Fax: 609-292-7340
E-mail: astern@dep.state.nj.us
Paul White
Environmental Engineer
National Center for
Environmental Assessment
Office of Research and Development
U.S. Environmental
Protection Agency
401 M Street, SW (8623D)
Washington, DC 20460
202-564-3289
Fax: 202-565-0078
E-mail: white.paul@epamail.epa.gov
Final List of Observers
Samantha Bates
Graduate Student/
Research Assistant
Department of Statistics
University of Washington
Box 354322
Seattle, WA 98195
206-543-8484
Fax: 206-685-7419
E-mail: sam@stat.washington.edu
Steve Chang
Environmental Engineer
Office of Emergency and
Remedial Response
U.S. Environmental
Protection Agency
401 M Street, SW (5204G)
Washington, DC 20460
703-603-9017
Fax: 703-603-9103
E-mail: chang.steve@
epamail.epa.gov
Helen Chernoff
Senior Scientist
TAMS Consultants, Inc.
655 Third Avenue
New York, NY 10017
212-867-1777
Fax: 212-697-6354
E-mail: hchernoff@
tamsconsultants.com
Christine Daily
Health Physicist
Radiation Protection &
Health Effects Branch
Division of Regulatory Applications
U.S. Nuclear Regulatory Commission
(T-9C24)
Washington, DC 20555
301-415-6026
Fax: 301-415-5385
E-mail: cxd@nrc.gov
Emran Dawoud
Human Health Risk Assessor
Toxicology and Risk Analysis Section
Life Science Division
Oak Ridge National Laboratory
1060 Commerce Park Drive (MS-6480)
Oak Ridge, TN 37830
423-241-4739
Fax: 423-574-0004
E-mail: dawoudea@ornl.gov
Audrey Galizia
Environmental Scientist
Program Support Branch
Emergency and Remedial
Response Division
U.S. Environmental Protection Agency
290 Broadway
New York, NY 10007
212-637-4352
Fax: 212-637-4360
E-mail: galizia.audrey@
epamail.epa.gov
Ed Garvey
TAMS Consultants
300 Broadacres Drive
Bloomfield, NJ 07003
973-338-6680
Fax: 973-338-1052
E-mail: egarvey@
tamsconsultants.com
Gerry Harris
UMDNJ-RWJMS
UMDNJ-EOSHI
Rutgers University
170 Frelinghuysen Road - Room 234
Piscataway, NJ 08855-1179
732-235-5069
E-mail: gharris@gpph.rutgers.edu
David Hohreiter
Senior Scientist
BBL
6723 Towpath Road
P.O. Box 66
Syracuse, NY 13214
315-446-9120
Fax:315-446-7485
E-mail: dh%bbl@mcimail.com
Nancy Jafolla
Environmental Scientist
Hazardous Waste
Management Division
U.S. Environmental
Protection Agency
841 Chestnut Building (3H541)
Philadelphia, PA 19107
215-556-3324
E-mail: jafolla.nancy@
epamail.epa.gov
Alan Kao
Senior Science Advisor
ENVIRON Corporation
4350 North Fairfax Drive - Suite 300
Arlington, VA 22203
703-516-2308
Fax: 703-516-2393
Steve Knott
Executive Director, Risk
Assessment Forum
Office of Research and Development
National Center for
Environmental Assessment
U.S. Environmental
Protection Agency
401 M Street, SW (8601-D)
Washington, DC 20460
202-564-3359
Fax: 202-565-0062
E-mail:
knott.steve@epamail.epa.gov
Stephen Kroner
Environmental Scientist
Office of Solid Waste
U.S. Environmental
Protection Agency
401 M Street, SW (5307W)
Washington, DC 20460
703-308-0468
E-mail: kroner.stephen@
epamail.epa.gov
Anne LeHuray
Regional Risk Assessment Lead
Foster-Wheeler
Environmental Corporation
8100 Professional Place - Suite 308
Lanham, MD 20785
301-429-2116
Fax:301-429-2111
E-mail: alehuray@fwenc.com
Toby Levin
Attorney
Advertising Practices
Federal Trade Commission
601 Pennsylvania Avenue, NW
Suite 4110
Washington, DC 20852
202-326-3156
Fax:202-326-3259
Lawrence Myers
Statistician
Research Triangle Institute
P.O. Box 12194
Research Triangle Park, NC 27709
919-541-6932
Fax: 919-541-5966
E-mail: lem@rti.org
Marian Olsen
Environmental Scientist
Technical Support Section
Program Support Branch
Emergency and Remedial
Response Division
U.S. Environmental Protection Agency
290 Broadway
New York, NY 10007
212-637-4313
Fax:212-637-4360
E-mail:
olsen.marion@epamail.epa.gov
Lenwood Owens
President
Boiler Servicing
1 Laguardia Road
Chester, NY 10918
Zubair Saleem
Office of Solid Waste
U.S. Environmental Protection Agency
401 M Street, SW (5307W)
Washington, DC 20460
703-308-0467
Fax: 703-308-0511
E-mail: saleem.zubair@
epamail.epa.gov
Swati Tappin
Research Scientist
New Jersey Department of
Environmental Protection
401 East State Street
P.O. Box 413
Trenton, NJ 08625
609-633-1348
Joan Tell
Senior Environmental Scientist
Exxon Biomedical
Mettlers Road (CN 2350)
East Millston, NJ 08875
732-873-6304
Fax:732-873-6009
E-mail: joan.tell@exxon.sprint.com
Bill Wood
Executive Director, Risk
Assessment Forum
Office of Research and Development
National Center for
Environmental Assessment
U.S. Environmental
Protection Agency
401 M Street, SW (8601-D)
Washington, DC 20460
202-564-3358
Fax:202-565-0062
E-mail: wood.bill@epamail.epa.gov
APPENDIX C
AGENDA
Agenda
Workshop Chair:
Christopher Frey
North Carolina State University
TUESDAY, APRIL 21, 1998
8:00AM Registration/Check-In
9:00AM Welcome Remarks
Representative from Region 2, U.S. Environmental Protection Agency (U.S. EPA), New York, NY
9:10AM Overview and Background
Steve Knott, U.S. EPA, Office of Research and Development (ORD), Risk Assessment Forum,
Washington, DC
9:30AM Workshop Structure and Objectives
Christopher Frey, Workshop Chair
9:45AM Introduction of Invited Experts
10:00AM Presentation: Issue Paper #1 - Evaluating Representativeness of Exposure
Factors Data
Jacqueline Moya, U.S. EPA, National Center for Environmental Assessment (NCEA),
Washington, DC
10:15AM Presentation: Issue Paper #2 - Empirical Distribution Functions and
Non-Parametric Simulation
Tim Barry, U.S. EPA, NCEA, Washington, DC
10:30AM BREAK
10:45AM Charge to the Panel
Christopher Frey, Workshop Chair
11:00AM Discussion on Issue #1: Representativeness
12:00PM LUNCH
TUESDAY, APRIL 21, 1998 (continued)
1:30PM Discussion on Issue #1 Continues
3:00PM BREAK
3:15PM Discussion on Issue #1 Continues
Christopher Frey, Workshop Chair
4:15PM Observer Comments
4:45PM Review of Charge for Day Two
Christopher Frey, Workshop Chair
- Writing Assignments
5:00PM ADJOURN
WEDNESDAY, APRIL 22, 1998
8:30AM Planning and Logistics
Christopher Frey, Workshop Chair
8:40AM Summary of Discussion on Issue #1
10:00AM BREAK
10:15AM Discussion on Issue #2: Empirical Distribution Functions and Resampling
Versus Parametric Distributions
12:00PM LUNCH
1:30PM Discussion on Issue #2 Continues
3:00PM BREAK
3:15PM Summary of Discussion on Issue #2
Christopher Frey, Workshop Chair
- Writing Assignments/Session
4:15PM Observer Comments
4:45PM Closing Remarks
5:00PM ADJOURN
APPENDIX D
WORKSHOP CHARGE
Workshop on Selecting Input Distributions for
Probabilistic Assessment
U.S. Environmental Protection Agency
New York, NY
April 21-22, 1998
Charge to Experts/Discussion Issues
This workshop is being held to discuss issues associated with the selection of probability
distributions to represent exposure factors in a probabilistic risk assessment. The workshop
discussions will focus on generic technical issues applicable to any exposure data. It is not the
intent of this workshop to formulate decisions specific to any particular exposure factors. Rather,
the goal of the workshop is to capture a discussion of generic issues that will be informative to
Agency assessors working with a variety of exposure data.
On May 15, 1997, the U.S. Environmental Protection Agency (EPA) Deputy
Administrator signed the Agency's "Policy for Use of Probabilistic Analysis in Risk
Assessment." This policy establishes the Agency's position that "such probabilistic analysis
techniques as Monte Carlo Analysis, given adequate supporting data and credible assumptions,
can be viable statistical tools for analyzing variability and uncertainty in risk assessments." The
policy also identifies several implementation activities designed to assist Agency assessors with
their review and preparation of probabilistic assessments. These activities include a commitment
by the EPA Risk Assessment Forum (RAF) to organize workshops or colloquia to facilitate the
development of distributions for exposure factors.
In the summer of 1997, a technical panel, convened under the auspices of the RAF, began
work on a framework for selecting input distributions for use in Monte Carlo analyses. The
framework emphasized parametric methods and was organized around three fundamental
activities: selecting candidate theoretical distributions, estimating the parameters of the
candidate distributions, and evaluating the quality of the fit of the candidate distributions. In
September of 1997, input on the framework was sought from a 12 member panel of experts from
outside of the EPA. The recommendations of this panel include:
• expanding the framework's discussion of exploratory data analysis and graphical methods
for assessing the quality of fit,
• discussing distinctions between variability and uncertainty and their implications,
• discussing empirical distributions and bootstrapping,
• discussing correlation and its implications,
• making the framework available to the risk assessment community as soon as possible.
Subsequent to receiving this input, some changes were made to the framework and it was
applied to selecting distributions for three exposure factors: water intake per body weight,
inhalation rate, and residence time. The results of this work are presented in the attached report
entitled "Development of Statistical Distributions for Exposure Factors."
Applying the framework to the three exposure factors highlighted several issues. These
issues resolved into two broad categories: issues associated with the representativeness of the
data, and issues associated with using the empirical distribution function (or resampling
techniques) versus using a theoretical parametric distribution function. Summaries for these
issues are presented in the attached issue papers. These issues will be the focal point for
discussions during this workshop. The following questions are intended to help structure and
guide these discussions. In addressing these questions, workshop participants are asked to
consider: what do we know today that can be applied to answering the question or providing
additional guidance on the topic; what short term studies (e.g., numerical experiments) could be
conducted to answer the question or provide additional guidance; and what longer term research
may be needed to answer the question or provide additional guidance.
Representativeness (Issues Paper #1)
1) The Issue Paper
Checklists I through IV in the issue paper present a framework for characterizing and evaluating
the representativeness of exposure data. This framework is organized into three broad sets of
questions: questions related to differences in populations, questions related to differences in
spatial coverage and scale, and questions related to differences in temporal scale. Do these issues
cover the most important considerations for representativeness? Are the lists of questions
associated with each issue complete? If not, what questions should be added?
In a tiered approach to risk assessment (e.g., a progression from simpler screening level
assessments to more complex assessments), how might the framework be tailored to each tier?
For example, is there a subset of questions that adequately addresses our concerns about
representativeness for a screening level risk assessment?
2) Sensitivity
The framework asks how important are (or how sensitive is the analysis to) population, spatial,
and temporal differences between the sample (for which you have the data) and the population of
interest. For example, to what extent do these differences affect our estimates of the mean and
variance of the population and what is the magnitude and direction of these effects?
What guidance can be provided to help answer these questions? What sources of information
exist to help with these questions? Having answered these questions what are the implications
for the use of the data (e.g., use of the data may be restricted to screening level assessments in
certain circumstances)? What differences could be considered critical (i.e., what differences
could lead to the conclusion that the assessment can't be done without the collection of
additional information)?
3) Adjustments
The framework asks, is there a reasonable way of adjusting or extrapolating from the sample (for
which you have data) to the population of interest in terms of the population, spatial, and
temporal characteristics? If so, what methods should be used? Is there adequate information
available to implement these methods?
What guidance can be provided to help answer these questions? Can exemplary methods for
making adjustments be proposed? What sources of information exist to help with these
questions? What research could address some of these issues?
Section 5 of the issue paper on representativeness describes methods for adjustments to account
for differences in population and temporal scales. What other methods exist? What methods are
available for spatial scales? Are there short-term studies that can be done to develop these
methods further? Are there data available to develop these methods further? Are there
numerical experiments (e.g., simulations) that can be done to explore these methods further?
Empirical Distribution Functions and Resampling Versus Parametric Distributions
(Issues Paper #2)
1) Selecting the EDF or PDF
What are the primary considerations for assessors in choosing between the use of theoretical
parametric distribution functions (PDFs) and empirical distribution functions (EDFs) to represent
an exposure factor? Do the advantages of one method significantly outweigh the advantages of
the other? Is the choice inherently one of preference? Are there situations in which one method
is clearly preferred over the other? Are there circumstances in which either method of
representation should not be used?
2) Goodness of Fit
On what basis should it be decided whether or not a data set is adequately represented by a fitted
analytic distribution? What role should the goodness-of-fit test statistic play (e.g., chi-square,
Kolmogorov-Smirnov, Anderson-Darling, Cramer-von Mises, etc.)? How should the level of
significance, i.e., p-value, of the goodness of fit statistic be chosen? What are the implications or
consequences for exposure assessors when acceptance/rejection is dependent on the goodness of
fit statistic chosen and an arbitrary level of statistical significance? What role should graphical
examination of the quality of fit play in the decision as to whether a fit is acceptable or not?
When the only data readily available are summary statistics (e.g., selected percentiles, mean, and
variance), are fits to analytic distributions based on those summary statistics acceptable? Should
any limitations or restrictions be placed in these situations?
When the better known theoretical distributions (e.g., lognormal, gamma, Weibull, log-logistic,
etc.) cannot provide an acceptable fit to a particular set of data, is there value in testing the fit of
the more flexible generalized distributions (e.g., the generalized gamma and generalized F
distributions) even though they are considerably more complicated and difficult to work with?
3) Uncertainty
Are there preferred methods for assessing uncertainty in the fitted parameters (e.g., methods
based on maximum likelihood and asymptotic normality, bootstrapping, etc.)?
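Of the methods named in this question, bootstrapping is the simplest to illustrate. The sketch below resamples a data set with replacement and refits a lognormal distribution each time, giving an empirical picture of the uncertainty in the fitted parameters. It is a minimal illustration under stated assumptions, not a recommended procedure: the lognormal model, the function names, the 90% percentile interval, and the sample values are all hypothetical.

```python
import math
import random
from statistics import fmean, stdev, quantiles

def bootstrap_lognormal_params(data, n_boot=1000, seed=0):
    """Refit log-scale mean and standard deviation on resamples drawn with replacement."""
    rng = random.Random(seed)
    logs = [math.log(x) for x in data]
    mus, sigmas = [], []
    for _ in range(n_boot):
        resample = rng.choices(logs, k=len(logs))  # same size as the original sample
        mus.append(fmean(resample))
        sigmas.append(stdev(resample))
    return mus, sigmas

def interval_90(values):
    """5th and 95th percentiles of the bootstrap replicates."""
    cuts = quantiles(values, n=20)  # cut points at 5%, 10%, ..., 95%
    return cuts[0], cuts[-1]

# Hypothetical exposure-factor sample, not real data.
sample = [0.8, 1.1, 1.3, 1.9, 2.4, 3.0, 4.2, 6.5, 9.8, 15.0]
mus, sigmas = bootstrap_lognormal_params(sample)
print("log-scale mean, 90% interval:", interval_90(mus))
print("log-scale sd,   90% interval:", interval_90(sigmas))
```

With a sample this small the intervals are wide, which is itself useful information about how much weight the fitted parameters can bear.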
APPENDIX E
BREAKOUT SESSION NOTES
APPENDIX E
SMALL GROUP DISCUSSIONS/BRAINWRITING SESSIONS
During the workshop, the experts worked at times in smaller groups to discuss specific technical
questions. Some of these sessions involved open discussions. Other sessions involved "brainwriting,"
during which individuals captured their thoughts on paper, in sequence, and then discussed similar and/or
opposing views within each group. The outcomes of these sessions were captured by group rapporteurs
and individual group members and are summarized below. This summary is a transcription of
handwritten notes and should, as such, be considered rough working notes. Information from these smaller
group discussions was presented and deliberated in the plenary session, and partially forms the basis of
the points presented in the main text of this report.
What information is required to fully specify a problem definition?
• Population at risk
• Sample under study (include biases)
• Spatial extent of exposure—micro, meso, macro scale
• Exposure-dose relationship
• Dose-response-risk relationship
• Temporal extent (hours, days, months, years)
• Temporal variability about trend
• What is the "acceptable error"?
— yes/no
— categorization
— continuous
— quantitative
• Variability/uncertainty partitioning
— not needed
— desirable
— mandatory
• User of output
— scientific community
— regulatory community
— general public
One expert noted that the "previous problem definition" forces the blurring of the boundaries between
modeling and problem description—for example, many may not consider the dose-exposure-risk
relationship to be part of the problem definition.
Another expert asked, "How much information do we have to translate from measured value to
population of concern?" He described the population of concern, surrogate population, individuals
sampled from the surrogate population, and how well measured value represents true value. Another
agreed, emphasizing the importance of temporal, spatial, and temporal-spatial representativeness (e.g.,
Idaho potatoes versus Maine potatoes).
Other issues in problem definition include:
* In the context of environmental remediation, a problem is defined in terms of what level of
residual risk can be left on the site. The degree of representativeness needed is dependent on the
land use scenario.
This might dictate limits on future land use and the need for evaluation.
A problem needs to be specified in space (location), time (over what duration), and whom
(person or unit). Some of these definitions may be concrete (e.g., in terms of spatial locations
around a site) while some may be more vague, such as persons who live on a brownfield site
(which may change over time with mobility, new land use, etc.). The problem addresses a future
context, and must therefore be linked to observable data by a model/set of assumptions. The
problem definition should include these models (no population change over time) or assumptions
(exposure calculated over 50-year duration/time frame).
One must define the health outcome being targeted (e.g., acute vs. cancer vs. developmental).
Define how you will link the exposure measure to a model for hazard and/or risk (margin of
exposure has different data needs from an estimate of population risk). Also, one should
consider the type of observation being evaluated (blood measurements vs. dietary vs. ecological).
This is more likely to have an impact on the representativeness of the data sample than anything
else.
Define the target risk level; this will dictate what kind of data will be necessary.
Another panelist agreed these are important points but questioned, however, whether these
factors were part of problem definition.
Specify the scope and purpose of the assessment (e.g., regulatory decision, set cleanup standards,
etc.)
Determining how much error we are willing to live with will determine how representative the
data are.
Specify the population of concern (who they are, where they live, what kinds of activities they
are involved with).
Problem definition is the most critical part of the process, and all stakeholders should be
involved as much as possible. If the stakeholders come to a common understanding of the
objectives of the process, the situation becomes focused.
Although EPA has provided much guidance for problem definition (DQOs, DQAs, etc.), what
data are necessary (and to what extent it must be representative) is a function of each individual
problem. Certain basic questions are common to all problem definitions (who, what, when,
how); the degree to which each basic question is important is a function of the actual
problem/situation.
Decision performance requirements: What is acceptable at a specific site for a specific problem
(i.e., what is the degree of decision error)? An answer to this question should be decided up
front as much as possible to alleviate "bias" concerns.
Attributes of the exposed population are key issues:
— Who are they?
— What are their activities/behaviors?
— Where are they?
— When do they engage in activities and for how long?
— Why are certain activities performed?
The potential imprecision of "national" populations seems significant. Scale is important; maybe
regional is as large as it gets.
If representativeness is a property of the population, then we should focus on methods for
collecting more specific data.
Variability within a super-population (e.g., a national study) provides useful, quantifiable bounds
to potential bias and gives an upper bound on the variability that could be found in a
subpopulation. This suggests that there are quantitative ways to guide the use "reduce
sparingly."
The assessor needs to ask the following questions: Is a risk assessment necessary? What is the
level of detail needed for the decision at hand? What is the scope of the problem? For example,
— Who is at risk?
— Who has standing [e.g., stakeholders]?
— Who has special concerns?
— What is of concern?
— When are people exposed? (timeframe [frequency and duration], chronic vs. acute, level
of time steps needed)
— Where are people exposed—spatial considerations; scope of the problem (national,
regional, site?)
— How are people exposed?
The time step used in the model must be specified. The assessor must distinguish between the
distribution needed for a one-day time step and the distribution needed for a one-year time step.
Some models may run at different time steps (e.g., drinking water at a one-week time step to include
seasonal variation; body weight at a one-year time step to include growth of a child).
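The effect of the time step on an input distribution can be illustrated with a small simulation. The sketch below is an illustration only: the lognormal shapes, the parameter values, and the names (`simulate_person_days`, `N_PEOPLE`, `N_DAYS`) are assumptions made for the example and do not come from the workshop discussion. Each simulated person has a long-term level plus independent day-to-day variation, and the spread of one-day values is compared with the spread of one-year averages.

```python
import random
import statistics

random.seed(1)

N_PEOPLE = 500   # hypothetical number of simulated individuals
N_DAYS = 365     # length of the one-year averaging period

def simulate_person_days(n_days):
    """Return daily intakes (L/day) for one hypothetical person."""
    # Between-person variability: each person has an individual long-term median.
    person_median = random.lognormvariate(0.0, 0.3)
    # Within-person variability: day-to-day fluctuation around that median.
    return [person_median * random.lognormvariate(0.0, 0.6) for _ in range(n_days)]

people = [simulate_person_days(N_DAYS) for _ in range(N_PEOPLE)]

one_day_values = [days[0] for days in people]                  # one-day time step
one_year_means = [statistics.fmean(days) for days in people]   # one-year time step

sd_one_day = statistics.stdev(one_day_values)
sd_one_year = statistics.stdev(one_year_means)

print(f"SD of one-day intakes:   {sd_one_day:.3f}")
print(f"SD of one-year averages: {sd_one_year:.3f}")
```

Under these assumptions the one-year averages are less spread out than the one-day values, because day-to-day variation averages out within each person. A distribution fitted to one-day data would therefore not be appropriate for a model that runs at a one-year time step.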
Consideration of a tiered approach is important in problem formulation. How are data to be
used? If data are to be used in a screening manner, then conservativeness is even more important
than representativeness. If more than a screening assessment is proposed, the assessor should
consider what is the value added from more complex analyses (site-specific data collection,
modeling, etc.).
As probabilistic methods continue to be developed, it will become increasingly important to
specify constraints in distribution. Boundaries exist. For example, no person can eat multiple
food groups at the 95th percentile.
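A short simulation shows why this boundary matters. The sketch below assumes three independent, lognormally distributed food-group intakes; the food groups, the parameter values, and the helper `percentile` are hypothetical choices made for the example. It compares the sum of the three individual 95th percentiles with the 95th percentile of the simulated total intake.

```python
import random

random.seed(2)

N = 20_000  # number of simulated individuals (illustrative)

def percentile(values, p):
    """Empirical percentile by the nearest-rank method (p between 0 and 100)."""
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

# Hypothetical, independent intakes (g/day) for three food groups.
groups = {
    "vegetables": [random.lognormvariate(5.0, 0.5) for _ in range(N)],
    "fruit":      [random.lognormvariate(4.8, 0.6) for _ in range(N)],
    "fish":       [random.lognormvariate(3.0, 0.9) for _ in range(N)],
}

# Total intake for each simulated person.
totals = [sum(person) for person in zip(*groups.values())]

sum_of_p95 = sum(percentile(values, 95) for values in groups.values())
p95_of_sum = percentile(totals, 95)

print(f"Sum of the three 95th percentiles: {sum_of_p95:.0f} g/day")
print(f"95th percentile of total intake:   {p95_of_sum:.0f} g/day")
```

With these assumed inputs, adding the separate 95th percentiles gives a total well above the 95th percentile of the simulated total, which is the kind of implausible combination a constraint on the joint distribution is meant to exclude.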
Two panelists noted that tiered approaches would not change the problem definition. Generally,
the problem is: Under an agreed set of exposure conditions, will the population of concern
experience unacceptable risks? This question would not change with a more or less sophisticated
(tiered) assessment.
When evaluating unknown future population characteristics, we are dealing with essentially
unknown conditions. It is not feasible, therefore, to have as a criterion that additional
information will not significantly change the outcome of the analysis. Instead, the problem needs
to be defined in terms of a precise definition of population (in time and space) which is to be
protected. To the extent that this is uncertain, it needs to be defined in a generalized, generic
manner.
Consideration of the "external" representativeness of the data to the population of concern is
absolutely critical for "on the ground" risk assessments. The "internal" validity of the data is
often a statistical question. It seems more important to ensure that the outcome of the assessment
will not change based on the consideration of "external" representativeness of the data set to the
population of concern.
What constitutes (lack of) representativeness?
General
The issue of data representativeness begs the question "representative of what?" In many (most?) cases,
we are working backwards, using data in hand for purposes that may or may not be directly related to the
reason the data were collected in the first place. Ideally, we would have a well-posed assessment
problem with well-defined assessment endpoints. From that starting point, we would collect the relevant
data necessary for good statistical characterization of the key exposure factors.
More generally, we are faced with the question, "Can I use these data in my analysis?" To make that
judgment fairly, we would have to go through a series of questions related to the data itself and to the use
we intend to make of the data. We usually ignore many of these questions, either explicitly or implicitly.
The following is an attempt at listing the issues that ought to affect our judgment of data relevance.
Sources of Variability and Uncertainty Related to the Assessment of Data Representativeness
EPA policy sets the standard that risk assessors should seek to characterize central tendency and
plausible upper bounds on both individual risk and population risk for the overall target population as
well as for sensitive subpopulations. To this extent, data representativeness cannot be separated from the
assessment endpoint(s). The following outlines some of the key elements affecting data
representativeness. The elements are not mutually exclusive.
Exposed Population
general target population
particular ethnic group
known sensitive subgroup (children, elderly, asthmatics, etc.)
occupational group (applicators, etc.)
age group (infant, child, teen, adult, whole life)
sex
activity group (sport fishermen, subsistence fishermen, etc.)
Geographic Scale, Location
trends (stationary, non-stationary behaviors)
past, present, future exposures
lifetime exposures
less-than-lifetime exposures (hourly, daily, weekly, annually, etc.)
temporal characteristics of source(s), continuous, intermittent, periodic, concentrated (spike),
random
Exposure Route
inhalation
ingestion (direct, indirect)
dermal (direct) contact (by activity, e.g., swimming)
multiple pathways
Exposure/Risk Assessment Endpoint
cancer risk
non-cancer risk (margin of exposure, hazard index)
potential dose, applied dose, internal dose, biologically effective dose
risk statistic
mean, uncertainty percentile of mean
percentile of a distribution (e.g., 95th percentile risk)
uncertainty percentile of variability percentile (upper credibility limit on 95th percentile risk)
plausible worst case, uncertainty percentile of plausible worst case
Data Quality Issues
direct measurement, indirect measurement (surrogates)
modeling uncertainties
measurement error (accuracy, precision, bias)
sampling error (sample size, non-randomness, independence)
monitoring issues (short-term, long-term, stationary, mobile)
• Almost all data used in risk assessment is not representative in one or more ways. What is
important is the effect the lack of representativeness has on the risk assessment in question. If
the water pathway, for example, is of minor concern, it will not matter if the water-consumption
rate distribution is not representative.
A lack of representativeness could mean the risk assessment results fail to be protective of public
health or grossly overestimate risks.
The Issue Paper is helpful in describing the ways in which distributions can be nonrepresentative.
It can guide the selection of the input distributions.
Representativeness needs to be considered in the context of the decision performance
requirements. Factors that could have a major impact in terms of one problem/site need not have
the same impact across all problems/sites. Decision performance requirements should therefore
be considered with problem-site-specific goals and objectives factored into the process.
The definition of representativeness depends on how much error we are willing to live with.
What is "good enough" will be case specific. Going through some case studies using
assessments done for different purposes can shed some light on defining representativeness.
"With regard to exposure factors, we [EPA] need to do a better job at specifying or providing
better guidance on how to use the data that are available." For example, the soil ingestion data
for children are limited, but may be good enough to provide an estimate of a mean. The data are
not good enough to support a distribution or a good estimate of a high-end value.
Representativeness measures the degree to which a sample of values for a given endpoint
accurately and precisely (adequately) describes the value(s) of that endpoint likely to be seen in a
target population.
A number of issues relate to the lack of representativeness, and one can use them to decide whether
to use a sample in a given case. The context of the observation is important. In addition to those
mentioned in the Issues Paper (demographic, technical, social), other concerns include what is
being measured: environmental samples (water, air, soil) versus human recall (diet) versus tissue
samples in humans (e.g., blood). In most cases, provided good demographic and social
information is available on key issues associated with the exposure, adjustments can be made to
make a sample representative for a new population. Technical issues (key issues like different or
poor analytic techniques, altered consumption rates, etc.) sometimes must be "guessed" from one
sample to another.
A sample should not be used if it is flawed due to one of the following factors:
1) inappropriate methods (sample design and technical methods)
2) lack of descriptors (demographic, technical, social) to make adjustments
3) inadequate size for target measure
The above applies to the internal analysis of a sample. Human recall includes behavioral
activities (e.g., time spent outdoors or indoors, number of days away from site).
Identifying differences (as defined by the final objective) between characteristics of the subject
population and the surrogate population will generally be subjective because there is usually no
data for the subject population. Differences might be due to socioeconomic differences, race, or
climate. Lack of representativeness should not be "too rigid" partly due to uncertainties and
partly because the subject population usually includes a future population that is even less well
defined than the current population.
The surrogate population may overlap (as in age/sex distribution) with the target population. A
context is needed to determine what constitutes "lack of representativeness." For example, if soil
ingestion is not related to gender, then while the surrogate population may be all female, it may
not imply that the estimates from the surrogate population cannot be used for a target population
(including males and females). Bottom line: the factor being represented (such as gender) needs
to be related to the outcome (soil ingestion) before the non-representativeness is important. Lack
of representativeness "depends" in this sense on the association.
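The dependence on the association can be made concrete with a simple reweighting (post-stratification) calculation. The sketch below is a hypothetical illustration: the values, the 50/50 target shares, and the function names are assumptions, and a mostly female surrogate sample is used because a sample with no males at all cannot be reweighted by gender. The surrogate mean is recomputed with gender strata weighted to the target population, once when the outcome is unrelated to gender and once when it is related.

```python
import statistics

def pooled_mean(values_by_stratum):
    """Unadjusted mean of the surrogate sample, ignoring strata."""
    pooled = [v for values in values_by_stratum.values() for v in values]
    return statistics.fmean(pooled)

def reweighted_mean(values_by_stratum, target_shares):
    """Mean after weighting each stratum by its share of the target population."""
    return sum(
        target_shares[stratum] * statistics.fmean(values)
        for stratum, values in values_by_stratum.items()
    )

# Hypothetical target population: half female, half male.
target_shares = {"female": 0.5, "male": 0.5}

# Hypothetical soil ingestion values (mg/day); 9 of 10 surrogate subjects are female.
# Case 1: outcome unrelated to gender.
unrelated = {
    "female": [40, 50, 60, 50, 40, 60, 50, 50, 50],
    "male": [50],
}
# Case 2: outcome related to gender (males systematically higher).
related = {
    "female": [40, 50, 60, 50, 40, 60, 50, 50, 50],
    "male": [90],
}

for label, data in (("unrelated", unrelated), ("related", related)):
    raw = pooled_mean(data)
    adjusted = reweighted_mean(data, target_shares)
    print(f"{label}: unadjusted mean = {raw:.1f}, reweighted mean = {adjusted:.1f}")
```

In the unrelated case both means are 50.0, so the gender mismatch is harmless. In the related case the unadjusted mean is 54.0 and the reweighted mean is 70.0, so the mismatch biases the estimate and an adjustment matters.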
Another panelist expanded on the above, noting that the outcome determines the
representativeness of the surrogate data set. If in the eyes of the "beholder" the data are
"equivalent" they represent the actual population well. Defining representativeness is like
defining art. One cannot describe it well; it is easily recognized but recognition is observer-
dependent. We should strive to remove subjectivity as best as possible without making inflexible
choices.
Representativeness suggests that our exposure/risk model results are a reasonable approximation
of reality. At minimum, they pass a straight-face test. Representativeness could therefore be
assessed via model calibrations and validation.
Representativeness often cannot be addressed unless an expert-judgment-based approach is used.
It requires brainstorming based upon some knowledge of how the target population may differ
from the surrogate one. In the long run, collection of more data is needed to reduce the non-
representativeness of those distributions upon which decisions are based.
Define the characteristics to be examined, define the population to be evaluated, select a
statistically significant sample that reflects defined characteristics of the population (another
expert noted that statistical significance has little relevance to the problem of
representativeness—the issue is the degree of uncertainty or bias). Ensure randomness of a
sample to capture the entire range of population characteristics. (Another noted that the problem
is that we usually don't have such a sample but have to make a decision or take action now. If we
can quantitatively evaluate representativeness, then we can at least make objective
determinations of whether this lack of representativeness will materially affect the decisions.)
The degree of bias that exists between a data set or sample and the problem at hand—is the
sample even relevant to the problem? Types:
Scenario: Is a "future residential" scenario appropriate to the problem at hand?
Model: Is a multiplicative, independent-variable model appropriate?
Variables: Is a particular study appropriate to the problem? Is it biased? Uncertain?
Two experts agreed that statistical significance has little relevance to the problem of
representativeness. A well-designed controlled randomized study yielding two results can be
"representative" of the mean and dispersion, albeit highly imprecise.
Representativeness exists when the data sample is drawn at random from the population
(including temporal and spatial characteristics) of concern, or is a census in the absence of
measurement error. This condition is potentially lacking when using surrogate data that are for a
population that differs in any way from the population of concern. Important differences include:
— characteristics of individuals (e.g., age, sex, etc.)
— geographic locations
— averaging time
— dynamics of population characteristics over the time frame needed in the study
— large measurement errors
Non-representativeness poses a problem if we have biases in any statistic of interest (i.e., lack of
representativeness can lead to biases in the mean, standard deviation, 95th percentile, etc).
Bias, or lack of accuracy, is typically more important than lack of precision. For example, we
can expect some imprecision in our estimate of the 95th percentile of a population characteristic
(e.g., intake rate) due to lack of relevant "census" data, but we hope that on average our
assessment methods do not produce a bias or systematic error.
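The imprecision side of this distinction can be quantified by resampling. The sketch below is a minimal illustration under stated assumptions: the lognormal intake-rate data are simulated, and the sample sizes, the percentile bootstrap, and the helper names are choices made for the example. It estimates the 95th percentile from a small and a large sample and attaches a bootstrap interval to each.

```python
import random

random.seed(3)

def percentile(values, p):
    """Empirical percentile by the nearest-rank method (p between 0 and 100)."""
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

def p95(values):
    """95th percentile of a sample."""
    return percentile(values, 95)

def bootstrap_interval(sample, stat, n_boot=2000, level=90):
    """Percentile-bootstrap interval for stat(sample)."""
    estimates = []
    for _ in range(n_boot):
        resample = random.choices(sample, k=len(sample))  # sample with replacement
        estimates.append(stat(resample))
    tail = (100 - level) / 2
    return percentile(estimates, tail), percentile(estimates, 100 - tail)

# Hypothetical intake-rate samples drawn from the same lognormal distribution.
samples = {
    "n=30": [random.lognormvariate(0.0, 0.7) for _ in range(30)],
    "n=1000": [random.lognormvariate(0.0, 0.7) for _ in range(1000)],
}

intervals = {}
for name, sample in samples.items():
    intervals[name] = bootstrap_interval(sample, p95)
    low, high = intervals[name]
    print(f"{name}: 95th percentile = {p95(sample):.2f}, "
          f"90% bootstrap interval = ({low:.2f}, {high:.2f})")
```

The interval for the small sample is much wider, which reflects imprecision only. Resampling cannot reveal bias: if the sample comes from a population that differs from the population of concern, every bootstrap replicate inherits the same systematic error.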
Conversely, if we have a large amount of uncertainty in our estimates for a sample distribution,
then it is harder to claim non-representativeness than when a particular distribution for a
surrogate is estimated.
In the following example, the distribution for the surrogate population is non-representative of
the target population since it has too wide a variance. However, the uncertainty in the surrogate
encompasses outcomes which could include the target population. Thus, in this case it may be
difficult to conclude, based upon the wide range of uncertainty, that the surrogate is non-
representative.
[Figure: distribution for target population; nominal distribution for surrogate population; range of uncertainty on surrogate population distribution due to measurement error, small sample size, etc.]
Representativeness in a given exposure variable is determined by how well a given data set
reflects the characteristics of the population of concern. Known characteristics of the data that
distinguish the data set from the population of concern may indicate a need for adjustment.
Areas of ignorance regarding the data set and the population of concern should be considered
uncertainties. Representativeness or lack thereof should be determined in a brainstorming
session; stakeholders, toxicologists, statisticians, engineers, and others may all have
information that bears on the representativeness of the data. Known or suspected differences
between the data set and the population of concern diminish representativeness.
The question as to what constitutes representativeness is contingent on the problem
definition—that is, who is to be represented, at what point in time, etc. If the goal is to represent
a well-characterized population in the present, representativeness for a given parameter (e.g.,
drinking water consumption) should be evaluated based on the match of the surrogate data to the
data for the population of concern relative to key correlates of the parameter (e.g., for drinking
water volume, age, average ambient temperature, etc.). If, on the other hand, the population of
concern is not well characterized in the present, or if the intent of the risk assessment is to
address risk into the indefinite future, representativeness does not appear to have a clear
meaning. The goal in such cases should be to define reasonable screening characteristics of a
population at an indefinite point in time (e.g., maximum value, minimum value, estimated 10th
percentile, estimated 90th percentile) and select such values from a semi-quantitative analysis of
the available surrogate data.
A representative surrogate sample is one that adds information to the assessment beyond the
current state of knowledge. However, both the degree to which it adds information and the
remaining uncertainty in the risk characterization must be identified.
Suggestion: Replace the word representative with "useful and informative."
A data set is representative of a characteristic of the population if it can be shown that
differences between the data set and the population of concern will not change the outcome of
the assessment. In practice, a data set should be considered in terms of its similarity and
difference to the population of concern and expectations as to how the differences might change
the outcome. Of course, these expectations may lead to adjustments in the data set which would
make it potentially more representative of the population.
In part, what degree of comfort the risk assessor/reviewer needs to have for the population under
consideration determines how representative data have to be. Also of concern is where in the
population of concern observations will take place. Are we comparing data mean or tails
(outliers)? What degree of uncertainty and variability between the population of concern and the
surrogate data is the assessor willing to live with?
We may be using the term "representativeness" too broadly. Many of the issues seem to address
the "validity" of the study being evaluated. However, keeping with the broad definition, the
following apply to internal representativeness:
— Measurement reliability. Measurement reliability refers to whether the study correctly
measures what it set out to measure and provides some basis for evaluating the error in
measurement.
— Bias in sampling. Bias in sampling presupposes that there is a "population" that was
sampled and not just a haphazard collection of observations and measurements.
— Statistical sampling error.
The following issues apply to external representativeness:
— Did the study measure what we need to know (e.g., short-term vs. long-term studies)? If
there is a statistical procedure for translating measurements into an estimate of the
needed values, the validity and errors involved must be considered.
— "Representativeness" implies that the sample data is appropriate to another population in
an assessment.
What considerations should be included in, added to, or excluded from the checklists?
Expand to include other populations of concern (e.g., ecological, produce). The issue paper and
checklist seem to presuppose that the population of concern is the human population.
Include more discussion on criteria for determining whether a question is adequately and
appropriately answered.
Clarify definitions (e.g., internal versus external)
Include "worked" examples:
— Superfund-type risk assessment
— Source-exposure-dose-effect-risk example
— Include effect of bias, misclassification, and other problems
Ask if factors are known or suspected of being associated with the outcome measured? Was the
distribution of factors known or suspected to be associated with the outcome spanned by the
sample data? Focus on outcome of risk assessments (if populations are different, does it make
any real difference in the outcome of the assessment?).
How will the exposures be used in risk assessment? For example, is the sample representative
enough to bound the risk?
In judging the quality of a sample, especially with questionnaire-based data, determine whether a
consistency check was put in the forms and the degree to which individual samples are
consistent. Risk assessors must be able to review the survey instrument.
Internal and external lists may each need some reorganization (for example, measurement issues
vs. statistical bias and sampling issues for "internal;" extrapolation to a different population vs.
reanalysis/reinterpretation of measurement data for "external").
Is a good set of subject descriptors (covariates such as age, ethnicity, income, education, or other
factors that can affect behavior or response) available for both the population sampled and
population of concern to allow for correlations and adjustments based on these?
How valuable would some new or additional data collection be for the population of concern to
confirm the degree of representativeness of the surrogate population and better identify and
estimate the adjustment procedure?
What is the endpoint of concern and what decision will be based on the information that is
gathered? Since risk assessment involves a tiered approach, checklist should focus around the
following type of question: Do I have enough information about the population (type, space, time)
to answer the questions at this tier, and is my information complete enough that I can
make a management decision? Do I need to go through all of the checklists before I can stop?
(Questioning application/implementation)
The checklists should address how much is known about the population of concern relative to the
adaptation of the surrogate data. If the population of concern is inadequately characterized, then
the ability to consider the representativeness of the surrogate data is limited, and meaningless
adjustment will result.
One consideration that is missing from the checklists is the fact that risk assessments are done for
a variety of purposes. A screening level assessment may not need the level of detail that the
checklists include. The checklists should be kept as simple and short as possible, trying to avoid
redundancy.
The checklist should be flexible enough to cover a variety of different problems and should be
only a guide on how to approach the problem. The more considerations included the better.
Guidance is needed on how to address overlap of the checklists. For example, when overlap
exists (e.g., in some spatial and temporal characteristics), which questions in the checklist are
critical? The guidance could use real life case studies to help focus the risk assessor on the
issues that are critical to representativeness.
Move from a linear checklist format to a flowchart/framework centered around the "critical"
elements of representativeness.
Fold in nature of tiered analysis. The requirements of a screening level assessment must be
different from those of a full-blown risk assessment.
Identify threshold (make or break) issues to the extent possible (i.e., minimum requirements).
When biases due to lack of representativeness are suspected, how can we judge which direction
those biases take (high or low?).
Include a "box" describing cases when "nonrepresentative" and "inadequate" will need to be
used in a risk assessment (which is common)....Figure 1?
Define ambiguous terms, such as "reasonable" and "important."
Make checklist more than binary (yes, no)—allow for qualitative evaluation of data.
Key questions: Can data be used at all? If so, do we have a great deal of confidence in it or not?
Is data biased high or low? Can data be used in a quantitative, semi-quantitative, or only a
qualitative manner? Standards according to which checklist items are evaluated should be
consistent with stated objective (e.g., a screening assessment will require less stringent
evaluation of data set than a site assessment where community concerns or economic costs are
critical issues).
Allow for professional judgement and expert elicitation.
What are the representativeness decision criteria? Data only have to be good enough for the
problem at hand; there are no perfect data. List some considerations pertaining to the
acceptance/rejection criteria.
• The 95th percentile of each input distribution is not needed to forecast risk at the 95th percentile
with high accuracy and low uncertainty.
• What is the study population doing? (i.e., were the sample population and study population
engaged in similar activities?) Consider how their behavior affects ability to represent.
• Combine Checklists II, III, and IV into one.
• Distinguish between marginal distributions vs. joint distributions vs. functional relationships.
• Distinguish variability from uncertainty. Add a crisp definition of each (e.g., Burmaster's
premeeting comments).
• Add explicit encouragement and positive incentives to collect and analyze new data.
• Add an explicit statement that the agency encourages the development and use of new methods
and that nothing in this guidance should be interpreted as blocking the use of alternative or new
methods.
• Add an explicit statement that it is always appropriate to combine information from several
studies to develop a distribution for an exposure factor. (This also applies to toxicology and the
development of distributions for reference doses and cancer slope factors.)
How can one perform a sensitivity analysis to evaluate the implications of non-representativeness?
How do we assess the importance of non-representativeness?
The assessor should ask, "under a range of plausible adjustments from the surrogate population
to the population of concern, does (or can) the risk management decision change?" That is, do
these particular assumptions and their uncertainty matter? (among all others)
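One way to pose this question numerically is to rerun the assessment over a range of adjustment factors and see whether the decision flips. The sketch below is a hypothetical example: the hazard-quotient model, the concentration, body weight, reference dose, decision limit, and the set of adjustment factors are all assumed values chosen for illustration, not figures from the workshop.

```python
import random

random.seed(4)

N = 10_000
DECISION_LIMIT = 1.0      # hypothetical hazard-quotient action level
CONCENTRATION = 0.004     # mg/L, assumed
BODY_WEIGHT = 70.0        # kg, assumed
REFERENCE_DOSE = 0.0003   # mg/kg-day, assumed

def percentile(values, p):
    """Empirical percentile by the nearest-rank method (p between 0 and 100)."""
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

# Hypothetical surrogate distribution of drinking water intake (L/day).
surrogate_intake = [random.lognormvariate(0.3, 0.5) for _ in range(N)]

def p95_hazard_quotient(adjustment):
    """95th percentile hazard quotient after scaling intake by `adjustment`."""
    hq = [adjustment * intake * CONCENTRATION / BODY_WEIGHT / REFERENCE_DOSE
          for intake in surrogate_intake]
    return percentile(hq, 95)

# Range of plausible surrogate-to-target adjustments (assumed for illustration).
results = {}
for adjustment in (0.5, 0.75, 1.0, 1.5, 2.0):
    hq95 = p95_hazard_quotient(adjustment)
    results[adjustment] = hq95 > DECISION_LIMIT
    print(f"adjustment {adjustment:>4}: 95th percentile HQ = {hq95:.2f}, "
          f"exceeds limit: {results[adjustment]}")

decision_changes = len(set(results.values())) > 1
print("Decision depends on the adjustment:", decision_changes)
```

If the exceedance flag is the same for every plausible adjustment, the lack of representativeness does not matter for this decision. If it flips within the plausible range, as it does for these assumed values, the adjustment deserves further attention.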
Representativeness is often not that important, because risk management decisions are usually
not designed to protect just the current population at a particular location, but a range of possible
target populations (e.g., future site or product users) under different possible scenarios.
Theoretically, we can come up with a "perfect" risk assessment in terms of representativeness,
but if the factor(s) being evaluated is not important, then the utility of this perfectly
representative data is limited. The important question to ask is: If one is wrong, what are the
consequences, and what difference do the decision errors make in the estimate of the parameter
being evaluated?
The question of data representativeness can be asked absent the context/model/parameter or it
can be asked in the context of a decision or analysis (are the data adequate?).
The key is placing bounds on the use of the data. Assessments should be put in context and the
level at which surrogate data may be representative. It should be defined in the context of the
purpose of the original study. Two other factors are critical: sensitivity and cost/resource
allocation. The question, therefore, is situation-specific.
A sensitivity analysis can be conducted in the context of the following tiered approach: The
importance of a parameter (as evidenced by a sensitivity analysis) is determined first, making the
representativeness or non-representativeness of the non-sensitive parameters unimportant.
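A first-tier ranking of parameter importance can be sketched with rank correlations between each simulated input and the model output. The example below is an assumption-laden illustration: the dose model (concentration times intake rate divided by body weight), the input distributions, and the use of Spearman rank correlation as the sensitivity measure are choices made here, not a method prescribed by the panel.

```python
import random
import statistics

random.seed(5)

N = 5_000

def ranks(values):
    """Rank of each value (1 = smallest); ties are not expected for continuous draws."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical input distributions.
inputs = {
    "concentration": [random.lognormvariate(0.0, 1.0) for _ in range(N)],
    "intake_rate": [random.lognormvariate(0.0, 0.4) for _ in range(N)],
    "body_weight": [random.normalvariate(70.0, 10.0) for _ in range(N)],
}

dose = [c * i / bw for c, i, bw in zip(inputs["concentration"],
                                       inputs["intake_rate"],
                                       inputs["body_weight"])]

sensitivity = {name: spearman(values, dose) for name, values in inputs.items()}
ranked = sorted(sensitivity, key=lambda name: abs(sensitivity[name]), reverse=True)

for name in ranked:
    print(f"{name:>13}: rank correlation with dose = {sensitivity[name]:+.2f}")
```

Inputs near the bottom of the ranking contribute little to the spread of the output, so effort spent on judging or improving their representativeness is unlikely to change the result.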
Representativeness is not a standard statistical term. Statistical terms that may be preferable
include bias and consistency.
When evaluating the importance of non-representativeness, one needs to evaluate the uncertainty
on the data set and on the individual. At the first level the assessor may choose a value biased
high (could be a point value or a distribution that is shifted up). At the second level, the assessor can use an
average, but must still be sensitive to whether acute or chronic effects are being evaluated. When
looking at the individual sample it is more important to have a representative sample because the
relevant data are in the tails (more important for acute toxicity). When using a mixture,
representativeness is less of a problem.
Adjustments
Take more human tissue samples to back calculate; this makes the local population happier.
Determine the need for cleanup based on tissue sample findings.
Re-do large samples (e.g., food consumption, tapwater consumption).
Look at demographics, etc. and determine the most sensitive factor(s).
[Flowchart: Given: Model, Parameters; Enough Data for Sensitivity Analysis?; Sensitivity Analysis; Enough Data to Characterize Parameter Variability?; Representative of Population?; Risk Analysis; Adjustment; Enough Data to Bound Parameter Estimate?; Bounding Estimate; Collect More Data If Possible. Decision boxes branch on YES/NO.]
Use a general model. Discuss with stakeholders the degree of inclusion in general. Adjust the
model with survey data if it is not applicable to stakeholders. Use a special model for
subpopulations if necessary.
"Change of support" analysis; time-series analysis — non-CERCLA, important to the Food
Quality Protection Act.
Conduct three-day surveys with year-long adjustments.
Hypothesis methods will work, but need to be tested.
The group recommended holding a workshop for experts in related fields to share existing theory
and methods on adjustment (across fields).
General guidelines for adjustments will be acceptable, but often site-specific needs dictate what
adjustments must be made.
Example adjustment:
Fish consumption: If you collect data 3 days per week, you may miss those who might eat
less—a case of inter- versus intra-individual variability.
Adjustment is often difficult because of site specifics and evaluator bias or professional
judgement.
Sometimes it is not possible to adjust. Using an alternate surrogate data set makes it possible to
set some plausible bounds to perform a screening risk assessment.
Stratify data to see if any correlation exists.
Start with brainstorming.
Regression relationship versus threshold.
Covariance; good statistical power to sample population.
Correlation is equivalent to regression analysis as long as you keep the residual (Bayesian
presentation).
Instead of looking at the population, look at the individual (e.g., breathing rates or body weight
for individuals from ages 0 to 30) to establish correlations.
What if the population was misrepresented? For example, population of concern is sport
fishermen but the national data represent other types of fishermen.
Set up a hierarchy:
— do nothing (may fall out when bounded)
— conservative/plausible upper bound
— use simple model to adjust the data (may be worth the effort if credibility issues
are dealt with)
— resample/collect more data
Before considering a bounding approach (model development), consider if refining is necessary
or cost/beneficial.
Are there situations in which "g-estimates" are worthwhile?
What is gained by making adjustments?
Short-term studies overestimate variability because they do not account for interindividual
variability (upper tail is overstated).
Can we estimate the direction of biases when populations are mismatched?
If the bias is conservative, then we are being protective. But what if the bias is nonconservative
(e.g., drinking water in the Mojave Desert or by construction workers)?
Appropriate models
Simplistic:
How speculative? Identify potential damage due to credibility issues.
Complex:
Identify the bias: high (conservative); or low (different scenario used than plausible
bounding analysis)?
Unless one has a sense of the likelihood of the scenario, what does one do?
— Risk management can address it.
— Present qualitative statements about uncertainty.
— Value of information approaches (e.g., does weather change drinking water data?).
Short-term Research:
Evaluate short-term data sets: make assumptions, devise models on population variability (Ryan paper)
(Wallace and Buck). Look at behavior patterns, information biases. Flesh out Chris Portier's suggestion
on extrapolating 3-day data to 6 months, years. This would give the assessor some confidence in
for interindividual variability.
Long-term Research:
Collect more data! Possible ORD funding? Look at breathing rates, soil ingestion, infrequently
consumed items, frequently consumed items.
APPENDIX F
PREMEETING COMMENTS
Workshop on Selecting Input Distributions
for Probabilistic Assessments
Premeeting Comments
New York, New York
April 21-22, 1998
Compiled by:
Eastern Research Group, Inc.
110 Hartwell Avenue
Lexington, MA 02173
Table of Contents
Reviewer Comments
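Sheila Abraham ............ F-3
Robert Blaisdell .......... F-10
David Burmaster ........... F-17
Bruce Hope ................ F-22
William Huber ............. F-25
Robert Lee ................ F-33
Samuel Morris ............. F-38
P. Barry Ryan ............. F-43
Mitchell Small ............ F-50
Edward Stanek ............. F-56
Alan Stern ................ F-63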
Sheila Abraham
EPA Probability Workshop
COMMENTS ON THE ISSUE PAPERS / DISCUSSION ISSUES FOR THE EPA
WORKSHOP ON SELECTING INPUT DISTRIBUTIONS FOR PROBABILISTIC
ASSESSMENT
Probabilistic analysis techniques are, as stated in EPA's May 1997 "Guiding Principles
for Monte Carlo Analysis", viable tools in the risk assessment process provided they are
supported by adequate data and credible assumptions. In this context, the risk
assessor (or risk assessment reviewer) needs to be sensitive to the real-life implications
on the receptors of site-specific decisions based on the analysis of variability and
uncertainty. The focus should be on the site, in a holistic manner, and all components
of the risk assessment should be recognized as tools and techniques used to arrive at
appropriate site-specific decisions.
Preliminary (generalized) comments from a risk assessment perspective on the issue
papers are provided below, as requested.
Evaluating Representativeness of Exposure Factors Data (Issue Paper #1)
1) The Issue Paper (Framework/ Checklists):
Overall, the issue paper provides a structured framework for a systematic approach for
characterizing and evaluating the representativeness of exposure data. However, one
of the clarifications that could be provided (in the narrative, checklists and figure) relates
to the explicit delineation of the objectives of the exercise of evaluating data
representativeness. The purpose of the original study should also be evaluated in the
context of the population of concern. In other words, factoring the Data Quality
Objectives (DQOs) and the Data Quality Assessment (DQA) premises into the process
could help define decision performance requirements. It could also help to evaluate
sampling design performance over a wide range of possible outcomes, and address the
necessity for multi-staged assessment of representativeness. As stated in the DQA
Guidance (1997), data quality (including representativeness) is meaningful only when it
relates to the intended use of the data.
On the query related to the tiered approach to ("forward") risk assessment: site-specific
screening risk assessments typically tend to be deterministic and have been conducted
using conservative default assumptions; the screening level tables provided by certain
U.S. EPA regions have to this point also been deterministic. Therefore the utility of the
checklists at this type of screening level might be extremely limited. As one progresses
through increasing levels of analytical sophistication, the screening numbers generated
from probabilistic assessment may require a subset of the checklists to be developed;
the specificity of the checklists should be a function of the critical exposure parameters
identified through a sensitivity analysis. Such analyses might also help refine the
protocol (criteria and hierarchy) for assessing data set representativeness in the event
of overlap of the individual, population and temporal characteristics (for example, inhalation
activity in elementary school students in the Columbus area exposed to contaminants at
a school ballfield).
2) Sensitivity:
The utility of a sensitivity analysis cannot be overemphasized. Currently, there appears
to be a tendency to use readily available software to generate these analyses; guidance
on this in the context of project/ site-specific risk assessments should be provided.
Providing examples as done in the Region VIII guidance on Monte Carlo simulations
facilitates the process.
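
As a minimal sketch of what such a sensitivity analysis involves, the Python example below ranks the inputs of a hypothetical drinking-water dose model by their Spearman rank correlation with the simulated output. The model form (dose = concentration x intake / body weight), the lognormal distributions, and every parameter value are illustrative assumptions; none of them comes from the issue papers or from the Region VIII guidance.

```python
import math
import random


def ranks(values):
    """Return the rank (1..n) of each value; ties are not expected for continuous draws."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    center = (len(x) + 1) / 2.0
    covariance = sum((a - center) * (b - center) for a, b in zip(rx, ry))
    variance = sum((a - center) ** 2 for a in rx)  # same for rx and ry without ties
    return covariance / variance


def simulate(n=5000, seed=1):
    """Draw hypothetical inputs and compute the dose for each simulated individual."""
    rng = random.Random(seed)
    inputs = {"conc_mg_per_L": [], "intake_L_per_day": [], "body_weight_kg": []}
    dose = []
    for _ in range(n):
        conc = rng.lognormvariate(math.log(0.01), 1.0)    # assumed distribution
        intake = rng.lognormvariate(math.log(1.3), 0.5)   # assumed distribution
        weight = rng.lognormvariate(math.log(70.0), 0.2)  # assumed distribution
        inputs["conc_mg_per_L"].append(conc)
        inputs["intake_L_per_day"].append(intake)
        inputs["body_weight_kg"].append(weight)
        dose.append(conc * intake / weight)
    return inputs, dose


inputs, dose = simulate()
sensitivity = {name: spearman(values, dose) for name, values in inputs.items()}
for name, rho in sorted(sensitivity.items(), key=lambda item: -abs(item[1])):
    print(f"{name:18s} rank correlation with dose: {rho:+.2f}")
```

Under these assumptions the input with the widest spread dominates the ranking, which is the kind of result that could be used to decide which checklist items deserve the most attention.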
On the issue of representativeness in making inferences from a sample to a population
and the ambiguity of the term "representative sample", process-driven selection might
be appropriate for homogeneous populations, but for the risk assessor, sampling that
captures the characteristics of the population might be more relevant in the context of
the use of the data. This issue appears to have been captured in the discussion on
attempting to improve representativeness.
Empirical Distribution Functions (EDFs) versus Parametric Distributions (PDFs)
(Issue Paper #2)
1) Selection of the Empirical Distribution Functions (EDF) or Parametric Distribution
Function (PDF):
The focus of the issue paper is the Empirical Distribution Function (EDF), and a number
of assumptions have been made to focus the discussion on EDFs. However, for a
clearer understanding of the issues and to facilitate the appropriate choice of analytical
approaches, a discussion of the PDF, specifically the advantages/ disadvantages and
constraining situations would be beneficial. The rationale for this is that the decision on
whether to apply the EDF or the PDF should not be a question of choice or even mutual
exclusivity, but a sequential process that is flexible enough to evaluate the merits and
demerits of both approaches in the context of the data.
In general, from a site/ project perspective, there may be definite advantages to PDFs
when the data are limited, provided the fit of the theoretical distribution to the data is
good, and there is a theoretical or mechanistic basis supporting the chosen parametric
distribution. The advantages to the PDF approach are more fully discussed in several
references (Law and Kelton 1991). These advantages need to be evaluated in a
project-specific context; they could include the compact representation of observations/
data, and the capacity to extrapolate beyond the range of observed data, as well as the
"smoothing out" of data. (In contrast, the disadvantages imposed by the possible
distortion of information in the fitting process should not be overlooked. Further, the
(traditional use of) EDFs that limit extrapolation beyond the extreme data points,
perhaps underestimating the probability of an extreme event, may need to be
considered. This could be a handicap in certain situations, where the risk
assessment demands an interest in outlier values. In such situations, a fuller
discussion of alternate approaches such as a mixed-distribution (Bratley et al., 1987)
may be warranted.) Finally, the PDFs, given their already established theoretical basis,
may lend themselves to more defensible and credible decision-making, particularly at
contentious sites.
This predisposition to PDFs certainly does not preclude the evaluation of the EDF in the
process. The advantage accruing from having the data "speak" to the risk assessor/
reviewer should not be minimized. Depending on the project/ site involved, the benefits
of the complete representation of data, the direct information provided on the shape of
the underlying distribution, and even on peculiarities such as outlier values should be
discussed, as well as relevant drawbacks (sensitivity to random occurrences, potential
underestimation of the probability of extreme events, perhaps cumbersome nature if the
data points are individually represented). In this context, some of the comments in the
"Issue/ Comments" Table ("issues" presumably derived from D'Agostino and Stephens,
1986) can serve as the basis for additional discussion.
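
The extrapolation point can be illustrated with a short Python sketch. It compares the exceedance probability that a traditional EDF and a fitted lognormal PDF assign to a value above the largest observation. The sample is simulated, and the lognormal family, the sample size of 25, and the threshold are all assumptions chosen only for illustration.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def edf_exceedance(sample, x):
    """Fraction of observations strictly greater than x under the empirical distribution."""
    return sum(1 for value in sample if value > x) / len(sample)


def lognormal_exceedance(sample, x):
    """P(X > x) under a lognormal fitted from the mean and SD of the log-transformed data."""
    logs = [math.log(value) for value in sample]
    fitted = NormalDist(mean(logs), stdev(logs))
    return 1.0 - fitted.cdf(math.log(x))


rng = random.Random(7)
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(25)]  # simulated small data set

threshold = 2.0 * max(sample)  # a value beyond the observed range
p_edf = edf_exceedance(sample, threshold)
p_pdf = lognormal_exceedance(sample, threshold)
print(f"EDF exceedance probability:       {p_edf:.4f}")
print(f"Lognormal exceedance probability: {p_pdf:.4f}")
```

The EDF assigns zero probability to anything above the sample maximum, while the fitted distribution assigns a small positive probability whose credibility depends entirely on how well the parametric form is justified.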
2) Goodness of Fit:
The decision whether the data are adequately represented by a fitted theoretical
distribution is an aggregative process, and goodness-of-fit is part of the sequential
exercise. Preliminary assessments of the general families of distributions that appear
to best match the data (based on prior knowledge and exploratory data analysis) are
often conducted initially; the mechanistic process for choice of a distributional family,
the discrete/continuous and bounded/ unbounded nature of the variable are evaluated.
Summary statistics, including measures of shape are evaluated and the parameters of
the (candidate) family are estimated. The goodness-of-fit statistics should factor into
the whole process, as should graphical comparisons of the fitted and empirical
distributions. Goodness-of-fit tests can be an excellent confirmatory tool for verifying
the chosen distribution, when used in conjunction with statistical measures and
probability plots.
However, caution should be exercised in situations where these tests could conceivably
lead an analyst to support a distribution that a visual inspection of the data does not
support. Also, it should be emphasized that (for example for certain physiological
parameters), even if the distribution fits, maintaining the integrity of the (biological) data
should override goodness-of-fit considerations. Ultimately, the persuasive power of
graphical methods for assessing fit should not be underestimated.
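
A minimal sketch of pairing a goodness-of-fit statistic with the coordinates of a probability plot is given below, using only the Python standard library. The data are simulated, and the two candidate families (lognormal and normal) are assumptions for illustration. Because the parameters are estimated from the same data, standard Kolmogorov-Smirnov critical values do not strictly apply, so the statistic is used here only to compare candidates.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance between the sample EDF and a candidate CDF."""
    ordered = sorted(sample)
    n = len(ordered)
    distance = 0.0
    for i, x in enumerate(ordered):
        f = cdf(x)
        # Compare the candidate CDF with the EDF just after and just before each jump.
        distance = max(distance, (i + 1) / n - f, f - i / n)
    return distance


def probability_plot_points(sample):
    """(standard normal quantile, ordered log value) pairs for a lognormal probability plot."""
    logs = sorted(math.log(value) for value in sample)
    n = len(logs)
    standard = NormalDist()
    return [(standard.inv_cdf((i + 0.5) / n), y) for i, y in enumerate(logs)]


rng = random.Random(11)
data = [rng.lognormvariate(1.0, 1.0) for _ in range(200)]  # simulated, right-skewed data

logs = [math.log(value) for value in data]
log_fit = NormalDist(mean(logs), stdev(logs))
d_lognormal = ks_statistic(data, lambda x: log_fit.cdf(math.log(x)))

raw_fit = NormalDist(mean(data), stdev(data))
d_normal = ks_statistic(data, raw_fit.cdf)

points = probability_plot_points(data)  # plot these; a straight line suggests lognormality
print(f"K-S distance, lognormal fit: {d_lognormal:.3f}")
print(f"K-S distance, normal fit:    {d_normal:.3f}")
```

The statistic summarizes the worst-case discrepancy in a single number, whereas the plotted points show where along the distribution the discrepancy occurs, which is why the two are best read together.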
On the question how the level of significance of the goodness-of-fit statistic should be
chosen, this is often a function of the data quality assessment (DQA) for that particular
site or situation; an idea of the consequences in terms of real-life examples can be
gathered from EPA's Guidance for Data Quality Assessment (1997). On the whole, I
tend to agree with the respondent (#4) who states that the desired level of significance
should be determined prior to analyzing the data. Again, as the respondent states, if
minor differences in the p-value impinge substantially on the analysis, the "conclusions
are probably too evanescent to have much usefulness".
Summary statistics are useful, particularly in the initial characterization of the data (as
previously mentioned). Given the constraints imposed by the project/ site logistics, all
too often these are the only data available, and they have been used as the basis for
analytical distribution fits (Ohio EPA, 1996). Caution should be exercised in implying a
level of accuracy based on limited knowledge. Sensitivity analyses might help clarify
the limitations that need to be placed in such situations particularly when dealing with
an exposure parameter of considerable impact; further, the utility of such an exercise
for a parameter with minor impact (as revealed by the sensitivity analysis) could be
questionable.
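
When only a reported mean and standard deviation are available, one common way to obtain an analytical fit is the method of moments. The sketch below assumes a lognormal family and uses made-up summary statistics; it illustrates the arithmetic only and is not the procedure used in the Ohio EPA document.

```python
import math


def lognormal_from_mean_sd(mean, sd):
    """Method-of-moments lognormal parameters (mu and sigma of ln X) from an arithmetic mean and SD."""
    cv_squared = (sd / mean) ** 2
    sigma = math.sqrt(math.log(1.0 + cv_squared))
    mu = math.log(mean) - 0.5 * sigma ** 2
    return mu, sigma


def lognormal_mean_sd(mu, sigma):
    """Arithmetic mean and SD implied by lognormal parameters (used as a round-trip check)."""
    mean = math.exp(mu + 0.5 * sigma ** 2)
    sd = mean * math.sqrt(math.exp(sigma ** 2) - 1.0)
    return mean, sd


# Hypothetical summary statistics for an exposure factor.
mu, sigma = lognormal_from_mean_sd(1.4, 0.9)
check_mean, check_sd = lognormal_mean_sd(mu, sigma)
print(f"mu = {mu:.4f}, sigma = {sigma:.4f}")
print(f"implied mean = {check_mean:.4f}, implied SD = {check_sd:.4f}")
```

The fit reproduces the two reported moments exactly, but it says nothing about whether the lognormal shape is appropriate, which is the implied accuracy the comment above cautions against.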
On the question of the value of testing the fit of the more generalized distributions
(presumably in lieu of the EDF), this could be a useful exercise, but the project
logistics may factor into this, as also the DQA premises. Project resources available
and the defensibility of the decision-making process need to be factored into the
situation. The issue of fitting an artificial distribution to a data set, and ultimately
arriving at a distribution removed from reality also needs to be evaluated in the project-
specific context.
3) Uncertainty:
The discussion in the "Development of Statistical Distributions for Exposure Factors"
(Research Triangle Institute) paper is interesting in terms of the approaches suggested
for evaluating parameter uncertainty; Hattis and Burmaster's comment cited in the
paper that only a trivial proportion of the overall uncertainty may be revealed is
important. Certain methods (example, bootstrapping) appear to have intriguing
potential for accounting for "hot spots".
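
As a rough illustration of bootstrapping for parameter uncertainty, the sketch below resamples a small simulated data set with replacement and reports a percentile interval for the fitted log-mean. The data, the lognormal assumption, the number of replicates, and the simple percentile rule are all assumptions made for the example. Consistent with the comment cited above, the interval reflects sampling error only and reveals nothing about measurement error or lack of representativeness.

```python
import math
import random
from statistics import mean, stdev


def bootstrap_log_params(sample, n_boot=1000, seed=3):
    """Nonparametric bootstrap of the mean and SD of the log-transformed data."""
    rng = random.Random(seed)
    logs = [math.log(value) for value in sample]
    n = len(logs)
    replicates = []
    for _ in range(n_boot):
        resample = [logs[rng.randrange(n)] for _ in range(n)]  # draw with replacement
        replicates.append((mean(resample), stdev(resample)))
    return replicates


def percentile_interval(values, lower=0.05, upper=0.95):
    """Simple percentile interval taken from the sorted bootstrap replicates."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]


rng = random.Random(5)
observed = [rng.lognormvariate(0.3, 0.7) for _ in range(30)]  # simulated measurements

replicates = bootstrap_log_params(observed)
mu_hat = mean(math.log(value) for value in observed)
mu_low, mu_high = percentile_interval([m for m, _ in replicates])
print(f"log-mean estimate {mu_hat:.3f}, 90% bootstrap interval ({mu_low:.3f}, {mu_high:.3f})")
```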
Finally, the risk assessor/ reviewer needs to be aware that the analysis of variability and
uncertainty is a simulation, based on hypothetical receptors. However, as stated
initially, this sometimes academic exercise can have multi-million dollar implications,
and intimately affect real-life human and ecological receptors; the risk assessor/
reviewer should always be cognizant of this consequence.
References:
Bratley, P., B.L. Fox, L.E. Schrage (1987) "A Guide to Simulation". Springer-Verlag,
New York.
D'Agostino, R.B. and M.A. Stephens (1986) "Goodness of Fit Techniques". Marcel
Dekker.
Law, A.M. and Kelton, W.D. (1991) "Simulation Modeling and Analysis" (Chapters,
325-419). McGraw-Hill, New York.
Ohio EPA (1996) "Support Document for the Development of Generic Numerical
Standards and Risk Assessment Procedures". The Voluntary Action Program, Division
of Emergency and Remedial Response, Ohio EPA.
U.S. EPA (1994) "Guidance for the Data Quality Objectives Process" (EPA/QA/G4).
EPA/600/R-96-055
U.S. EPA (1997) "Guidance for Data Quality Assessments - Practical Methods for Data
Analysis" (EPA QA/G-9, QA-97 Version) EPA/600/R-96/084 (January 1998)
Robert J. Blaisdell, Ph.D.
Comments on Issue Paper on Evaluating Representativeness of Exposure
Factors Data
The Issue Paper on Evaluating Representativeness of Exposure Factors Data is a well
written, clear discussion of the theoretical issues of representativeness. I was
particularly interested in the discussion of time unit differences; the Office of
Environmental Health Hazard Assessment (OEHHA) is grappling with this issue with
several of the distributions which we want to use for determining chronic exposure.
The issue of representativeness of a sample is often complicated by lack of knowledge
about the demographics of the population under consideration. An accurate
determination of the population under consideration may not be part of the risk
assessment requirements of regulatory programs. If the population of concern has not
been characterized, the determination of the representativeness of the data being used
in the assessment is not possible.
The issue of representativeness of the sample to the population is an important
question. For example, populations which are exposed to Superfund toxicants or
airborne pollution from stationary sources may be from lower socioeconomic groups.
Unfortunately, most of the information which is available on mobility is from the general
population. It may be that low income home owners have a much longer residency time
than people of median or higher income. It may also be that low income non-home
owners in certain age groups have a higher mobility than the general population. We
therefore suspected that the available distributions were not representative. In addition,
the U.S. Census data, the basis for the available residency distributions, are not
longitudinal. Another problem with the residency data when evaluating stationary
sources is the issue of where the person moves to. A person moving may not
necessarily move out of the isopleth of the facility; the likelihood of moving out of the
isopleth of a stationary facility also may be related to socioeconomic status.
In order to address this problem, OEHHA proposed not using a distribution for
residence time in our Public Review Draft Exposure Assessment and Stochastic
Analysis Technical Support Document (1996). Instead we proposed doing a separate
stochastic analysis scenario for 9, 30 and 70 years. We did not think that the 9, 30 or
70 years time points evaluated were necessarily representative of actual residence
times, but that these were useful, reasonably spaced intervals for residents to compare
with their own known residency time.
Using three scenarios complicates the analysis, but we felt that the approach had some
advantages over using a distribution. The California "Hot Spots" program is a public
right to know act which assesses risks of airborne pollutants from stationary sources.
Public notification is required above a certain level of risk. An individual resident who
has received notice is aware of the amount of the time that he or she has lived, or in
many cases plans to live, in the vicinity of the facility. Therefore the individual could more
accurately assess his or her individual cancer risk. The relationship between the
residency time assumption and the resulting risk is clear, not buried in the overall
range of the uncertainty or variability of the risk estimate.
This approach might possibly be used in other cases where representative data is not
available or where the representativeness is questionable. For example if the drinking
water pathway is of concern and representative information is not available for the
population of a Mojave Desert town, the range or point estimate of cancer risk from
drinking 1, 2, 4 and 8 liters of contaminated tap water per day could be presented.
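
The scenario presentation described above can be sketched as a small table of point estimates. The example below combines the 1, 2, 4 and 8 liter intake scenarios with the 9, 30 and 70 year durations in a single grid purely for illustration. The linear dose-times-potency risk form, the concentration, the slope factor, the body weight and the 70-year averaging time are hypothetical placeholders, not OEHHA values.

```python
def lifetime_risk(conc_mg_per_L, intake_L_per_day, years_exposed,
                  slope_factor=0.05, body_weight_kg=70.0, lifetime_years=70.0):
    """Point-estimate cancer risk from a generic linear dose-times-potency form.

    All default values are hypothetical placeholders chosen for illustration.
    """
    daily_dose = conc_mg_per_L * intake_L_per_day / body_weight_kg  # mg/kg-day
    lifetime_average_dose = daily_dose * years_exposed / lifetime_years
    return lifetime_average_dose * slope_factor


intakes = (1, 2, 4, 8)    # liters of tap water per day
durations = (9, 30, 70)   # years of exposure

table = {(liters, years): lifetime_risk(0.005, liters, years)
         for liters in intakes for years in durations}

print("L/day " + "".join(f"{years:>12d} yr" for years in durations))
for liters in intakes:
    row = "".join(f"{table[(liters, years)]:>15.2e}" for years in durations)
    print(f"{liters:>5d} {row}")
```

Each cell shows directly how the assumed intake and duration drive the estimate, so a reader can pick the row and column closest to his or her own situation.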
In some cases, each situation that a regulatory risk assessment program will be
evaluating will be almost unique, and therefore anything other than site-specific data will
not be representative. OEHHA characterized a fish consumption distribution for anglers
consuming non-commercial fish using the Santa Monica Bay Seafood Consumption
Study Final Report (6/94) raw data. We compared the Santa Monica Bay distribution to
the fish consumption distribution for the Great Lakes (Murray and Burmaster, 1994).
We found that the differences in the two distributions could be attributed to
methodological differences in the two studies, thus the assumption that a salt water
fish consumption distribution was comparable to a fish consumption distribution for
a large fresh water body was not implausible. However, the data gathered from large
bodies of water are probably not representative of small lakes and ponds with limited
productivity and where other fishing options may exist. For such bodies of water a
site-specific angler survey is probably the only way of obtaining representative data.
For cost reasons, this option is not likely to be pursued except in a risk assessment with
very high financial stakes. We chose to recommend using the Santa Monica Bay fish
consumption. It could be multiplied by a fraction to be determined by expert judgment
to adjust for site-specific conditions such as productivity etc. The Santa Monica Bay
fish distribution may not be representative in other ways in a given situation but may still
be the most practical option. It is clearly not temporally representative for chronic
cancer risk assessment.
Cost is often a factor that limits representativeness.
On page 8, paragraph 5 of the Issues paper there is a discussion of determining the
relationship between two populations and making adjustments in distributions based on
speculative estimates of the differences in means and the coefficients of variation.
Perhaps in many instances, another option would be to state that the information from a
surrogate population is being used and that the actual population is known to be
different, or may be different by an unknown amount. There are many questions in risk
assessment for which expert opinion is no better than uninformed opinion in attempting
to quantify the unknown. An example of this is the shape of the dose-response curve
for cancer for most chemicals at low concentrations. A frank admission of ignorance
may be more credible than an attempted quantification of ignorance in many cases.
Comments on Temporal Issues
The methods discussed for estimating intraindividual variability from data collected over
varying short periods of time relative to the longer time period of interest are interesting
and would appear to be useful for the NFCS data. OEHHA is giving some
consideration to using the techniques described by Nusser et al. 1996 to adjust the
distributions for food consumption that we have developed using
the Continuing Survey of Food Intakes by Individuals 1989-91 raw data. I would be
curious to know if these methods have been validated on any actual longitudinal data.
The assumption of the lognormal model needed by the method of Wallace et al. (1994)
may in some cases be limiting. We have discovered when we evaluated broad
categories of produce consumption using the CSFII 89-91 data that some of the
distributions for certain age groups were closer to a normal model than a lognormal
model.
The Representativeness Issue paper discusses the importance of using current data.
The continued use of the 1977-78 NFCS study is cited as an example. The raw data
from the 1989-91 CSFII has been available for some time as an alternative to the
1977-78 NFCS survey. Raw data from the 1992-93 CSFII survey is now available.
OEHHA has used that data to develop produce, meat and dairy products consumption
distributions for the California population. It is admittedly not a trivial exercise to extract
the relevant data from the huge raw CSFII data sets but this alternative has existed for
several years. The 1989-91 CSFII data is clearly different in some cases from the
1977-78 NFCS. Beef consumption appears to have declined. As a matter of policy,
there should be a stated preference for using the available data over attempting to use
expert judgment to guess at the appropriate means, coefficients of variation and
parametric model. In some of the Monte Carlo risk assessment literature, the
preference appears to be for expert judgment rather than data.
The use of related data may in some cases be useful in giving some insight into the
representativeness of data collected over the short term for chronic scenarios. OEHHA
has used the data on total energy expenditure as measured by the doubly labeled water
method to look at the representativeness of our breathing rate distribution, based in part
on a one day 24 hour activity pattern survey. The information on total energy
expenditure gave an indication that intraindividual variability was a huge fraction of the
total variability (intraindividual plus interindividual variability).
The intraindividual variability for a broad category of produce such as leafy vegetables
may not be very great relative to the interindividual variability. The intraindividual
variability for a single, less frequently consumed item such as strawberries is
probably much greater than for broad categories. Thus, short term survey data which
looks at broader categories of produce are probably more applicable to chronic risk
assessment than single item distributions.
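
The distinction between intraindividual and interindividual variability can be made concrete with a balanced one-way random-effects decomposition applied to repeated short-term measurements. The sketch below is a generic textbook calculation on simulated log-scale data; it is not the method of Nusser et al. (1996) or Wallace et al. (1994), and the number of people, the number of days per person, and the two standard deviations are assumptions.

```python
import random
from statistics import mean


def variance_components(data):
    """Balanced one-way random-effects ANOVA.

    data: list of per-person lists of equal length (for example, log daily intakes).
    Returns (between-person variance, within-person variance).
    """
    k = len(data)
    m = len(data[0])
    person_means = [mean(days) for days in data]
    grand_mean = mean(person_means)
    ms_within = sum((x - pm) ** 2
                    for days, pm in zip(data, person_means)
                    for x in days) / (k * (m - 1))
    ms_between = m * sum((pm - grand_mean) ** 2 for pm in person_means) / (k - 1)
    between = max(0.0, (ms_between - ms_within) / m)
    return between, ms_within


rng = random.Random(2)
true_between_sd, true_within_sd = 0.3, 0.6  # assumed values for the simulation

people = []
for _ in range(500):
    usual = rng.gauss(0.0, true_between_sd)  # person's long-term level
    people.append([usual + rng.gauss(0.0, true_within_sd) for _ in range(3)])  # 3 survey days

between, within = variance_components(people)
single_day_variance = between + within
print(f"between-person variance: {between:.3f}")
print(f"within-person variance:  {within:.3f}")
print(f"single-day variance:     {single_day_variance:.3f}")
```

When the within-person component is large, as in this simulation, a distribution built from single-day observations is much wider than the distribution of long-term averages that matters for chronic exposure.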
Research Needs
The information which is needed to develop more accurate distributions for many if not
most variates needed for chronic stochastic human health risk assessment is simply
not available. In particular there is a lack of longitudinal data for breathing rates, soil
ingestion, water consumption rates, produce ingestion, non-commercial fish
consumption, dairy product consumption and meat ingestion. Some distributions in
common use, such as water consumption, are based on out of date studies. More
research is needed on bioconcentration and biotransfer factors. Longitudinal data on
activity patterns and mobility patterns would also be very useful. There needs to be
much more research on dermal absorption factors and factors which influence dermal
absorption. More research needs to be done on children and the ways that they differ
from adults.
Summary
The overall lack of data, particularly longitudinal data, for risk assessment variates is
probably the most important single factor limiting representativeness. If the purpose of
the risk assessment is to inform the exposed public, it may be possible and even
preferable to use point estimates for multiple scenarios in the absence of some
representative data. The statistical methods for adapting short term data for use in
chronic risk assessment presented in the Issue paper appear to be reasonable
approaches in instances where the required data is available. More longitudinal studies
would be valuable for validation of these methods as well as improving the temporal
representativeness of distributions used in risk assessment. Most of the data used in
stochastic risk assessment will probably be nonrepresentative in one or more of the
ways discussed in the Issues paper for a long time into the future.
References
Murray DM., and Burmaster DE. (1994). Estimated distribution for average daily
consumption of total and self-caught fish for adults in Michigan angler households.
Risk Analysis 14, 513-519.
Nusser, S.M., Carriquiry, A.L., Dodd, D.W., and Fuller, W.A. (1996). A semiparametric
transformation approach to estimating usual daily intake distributions. J. Am. Statistical
Association 91: 1440-1449.
Southern California Coastal Water Research Project and MBC Applied Environmental
Sciences (SCCWRP and MBC). (1994). Santa Monica Bay Seafood Consumption
Study. Final Report. June.
USDA (U.S. Department of Agriculture). 1989-91. Nationwide Food Consumption
Survey. Continuing Survey of Food Intakes by Individuals (Data Tapes). Hyattsville, MD:
Nutrition Monitoring Division, Human Nutrition Information Service.
David Burmaster
13 April 1998
Memorandum
To: Participants, US EPA's Workshop on Selecting Input Distributions
for Probabilistic Analyses
Via: Beth A. O'Connor, ERG
From: David E. Burmaster
Subject: Initial Thoughts and Comments,
and Additional Topics for Discussion
Thank you for inviting me to participate in this Workshop in New York City.
Here are my initial thoughts and comments, along with suggestions for additional topics
for discussion. Since I have just returned from 3 weeks of travel overseas, I will keep
these brief.
1. Models and Data
In 1979, George Box wrote, "All models are wrong, but some are useful."
May I propose a new corollary for discussion? "All data are wrong, but some are
useful."
Alceon ® Corporation • PO Box 382669 • Harvard Square Station • Cambridge, MA 02238-2669 • Tel: 617-864-4300
2. Definitions for Variability and Uncertainty
The Issue Papers lack crisp definitions for variability and uncertainty as well as a
discussion about why variability and uncertainty are important considerations in risk
assessment and risk management. (See, for example, NCRP, 1996.) In particular, I
recommend definitions along these lines for these two key terms:
Variability represents true heterogeneity in the biochemistry or physiology (e.g.,
body weight) or behavior (e.g., time spent showering) in a population which
cannot be reduced through further measurement or study (although such
heterogeneity may be disaggregated into different components associated with
different subgroups in the population). For example, different children in a
population ingest different amounts of tap water each day. Thus variability is a
fundamental property of the exposed population and/or the exposure scenario(s)
in the assessment. Variability in a population is best analyzed and modeled in
terms of a full probability distribution, usually a first-order parametric distribution
with constant parameters.
Uncertainty represents ignorance - or lack of perfect knowledge -- about a
phenomenon for a population as a whole or for an individual in a population
which may sometimes be reduced through further measurement or study. For
example, although we may not know much about the issue now, we may learn
more about certain people's ingestion of whole fish through suitable
measurements or questionnaires. In contrast, through measurements today, we
cannot now eliminate our uncertainty about the number of children who will play
in a new park scheduled for construction in 2001. Thus, uncertainty is a property
of the analyst performing the risk assessment. Uncertainty about the variability in
a population can be well analyzed and modeled in terms of a full probability
distribution, usually a second-order parametric distribution with nonconstant
(distributional) parameters.
Second-order random variables (Burmaster & Wilson, 1996; references therein) provide
a powerful method to quantify and propagate V and U separately.
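The separation of V and U described above can be sketched as a nested (two-dimensional) Monte Carlo loop. This is only an illustration under stated assumptions: the lognormal form, the parameter values, and the function name `two_dimensional_mc` are hypothetical and are not taken from Burmaster & Wilson (1996).

```python
import random
import statistics

def two_dimensional_mc(n_outer=200, n_inner=2000, seed=1):
    """Nested Monte Carlo sketch.

    Outer loop: draw uncertain parameters (uncertainty, U).
    Inner loop: draw individuals given those parameters (variability, V).
    All numeric values are illustrative assumptions, not data.
    """
    rng = random.Random(seed)
    p95_values = []
    for _ in range(n_outer):
        # Uncertainty: the parameters of the variability distribution are
        # themselves random (hypothetical values for ln of an intake rate).
        mu = rng.gauss(0.0, 0.1)
        sigma = max(1e-6, rng.gauss(0.5, 0.05))
        # Variability: heterogeneity among individuals, given (mu, sigma).
        sample = sorted(rng.lognormvariate(mu, sigma) for _ in range(n_inner))
        p95_values.append(sample[int(0.95 * n_inner)])
    p95_values.sort()
    return {
        "median_p95": statistics.median(p95_values),
        "lower_p95": p95_values[int(0.05 * n_outer)],
        "upper_p95": p95_values[int(0.95 * n_outer)],
    }

result = two_dimensional_mc()
```

The output is an uncertainty interval around a variability percentile (here the 95th), rather than a single blended number.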
3. Positive Incentives to Collect New Data and Develop New Methods
I urge the Agency to print this Notice inside the front cover and inside the rear cover of
each Issue Paper / Handbook / Guidance Manual, etc. related to probabilistic analyses,
and on the first Web page housing the electronic version of the Issue Paper /
Handbook / Guidance Manual:
This Issue Paper / Handbook / Guidance Manual contains guidelines and
suggestions for use in probabilistic exposure assessments.
Given the breadth and depth of probabilistic methods and statistics, and given
the rapid development of new probabilistic methods, the Agency cannot list all
the possible techniques that a risk assessor may use for a particular
assessment.
The US EPA emphatically encourages the development and application of new
methods in exposure assessments and the collection of new data for exposure
assessments, and nothing in this Issue Paper / Handbook / Guidance Manual
can or should be construed as limiting the development or application of new
methods and/or the collection of new data whose power and sophistication may
rival, improve, or exceed the guidelines contained in this Issue Paper /
Handbook / Guidance Manual.
Alceon ® Corporation • PO Box 382669 • Harvard Square Station • Cambridge, MA 02238-2669 • Tel: 617-864-4300
David Burmaster
4. Truncating the Tails of LogNormal Distributions
While LogNormal distributions provide excellent fits to the data for many exposure
variables, e.g., body weight, skin area, drinking water ingestion rate (total and tap),
showering time, and others, it is important to truncate the tails of these distributions. For
example, no individual has 1 cm² of skin area, no individual has 10⁵ cm² of skin area,
and no individual can shower 25 hr/d.
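One simple way to truncate a lognormal is rejection sampling: draw from the untruncated distribution and discard values outside plausible bounds. The sketch below assumes hypothetical parameters and bounds for showering time; the helper name `truncated_lognormal` is illustrative only.

```python
import random

def truncated_lognormal(rng, mu, sigma, lower, upper, max_tries=10_000):
    """Draw one lognormal value restricted to [lower, upper] by rejection."""
    for _ in range(max_tries):
        x = rng.lognormvariate(mu, sigma)
        if lower <= x <= upper:
            return x
    raise RuntimeError("truncation bounds reject nearly all draws")

rng = random.Random(42)
# Hypothetical daily showering time in hours (median about 0.2 h), truncated
# so that no simulated person showers less than 1 minute or more than 4 hours.
draws = [
    truncated_lognormal(rng, mu=-1.6, sigma=0.6, lower=1 / 60, upper=4.0)
    for _ in range(5000)
]
```

Rejection implicitly renormalizes the probability remaining between the bounds, so the truncation points themselves should be reported as part of the input distribution.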
5. Mixing Apples and Oranges
It is wholly inconsistent for the Agency to proceed with policies that legitimize the use of
probabilistic techniques for exposure factors while preventing the use of probabilistic
techniques in dose-response assessment. By doing so, the Agency double counts the
effects of variability and uncertainty, all on a log10 scale, i.e., by several orders of
magnitude.
6. Report by RTI
I disagree strongly with many of the approaches and conclusions found in RTI's Final
Report dated 18 March 1998.
References
Box, G.E.P., 1979, Robustness in the Strategy of Scientific Model Building, in
Robustness in Statistics, R.L. Launer and G.N. Wilkinson, eds., Academic Press,
New York, NY
Burmaster & Wilson, 1996
Burmaster, D.E. and A.M. Wilson, 1996, An Introduction to Second-Order
Random Variables in Human Health Risk Assessment, Human and Ecological
Risk Assessment, Volume 2, Number 4, pp 892 - 919
Bruce Hope
REPRESENTATIVENESS (Issue Paper #1)
1) The Issue Paper
We would use probabilistic methods specifically for the purpose of assessing
risks from the uncontrolled release of hazardous substances at a specific location (site).
Our overall goal will be to feel confident that the entire risk assessment (and not just a
few of its components) is representative of site-specific conditions. Our objective is
better risk management decisions. This requires us to keep a few other considerations
in mind.
The issue of representativeness in terms of a fit between available exposure
factors data and resulting distributions is dealt with in the issue paper. However, a risk
assessment cannot be performed with exposure factor distributions alone - some type
of exposure model is required. We should therefore also be concerned with the
representativeness of the exposure model within which the individual exposure factors
are used.
Correlation between exposure factors could significantly affect the
representativeness of the resulting risk assessment. It appears possible to have too
much or too little correlation between factors. In some cases, the correlation is not
necessarily with body weight and/or age but with an underlying activity pattern (human
behavior) that may not be fully known. The nature and extent of correlation should be
a factor in evaluating representativeness.
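The effect of correlation on an upper percentile can be illustrated with two lognormal factors whose logarithms share a correlation rho. This is a sketch under assumed parameters; the factors, their values, and the function name `p95_of_product` are hypothetical.

```python
import math
import random

def p95_of_product(rho, n=20000, seed=7):
    """95th percentile of X*Y for lognormal X and Y whose logs have correlation rho."""
    rng = random.Random(seed)
    products = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Build a second standard normal with correlation rho to the first.
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Hypothetical factors, e.g. an intake rate and an exposure duration.
        x = math.exp(0.0 + 0.5 * z1)
        y = math.exp(1.0 + 0.7 * z2)
        products.append(x * y)
    products.sort()
    return products[int(0.95 * n)]

independent = p95_of_product(rho=0.0)
correlated = p95_of_product(rho=0.8)
```

With these assumed values, positive correlation widens the product distribution and raises its 95th percentile, so ignoring a real correlation would understate the upper tail.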
The issue of data and statistical inferences at the extreme upper bounds (e.g.,
99.9th percentile) of a distribution has been raised in the literature, on the Web, and in
other U.S. EPA forums. As a matter of policy, we regulate at the 90th percentile, feel
that decisions based on extreme upper bound estimates are potentially unreasonable,
and thus have truncated the upper bound (not allowed its extension to +∞) of many of
the exposure factor distributions. How any such truncation of a distribution affects its
representativeness should also be discussed.
The suggestion that probabilistic methods could be used in any form of
"screening-level" risk assessment is of concern. We view screening as a quick but
highly conservative comparison of environmental media concentrations with published
toxicity data that occurs early in a remedial investigation (RI) for the sole purpose of
narrowing the focus of the baseline risk assessment. Under our current guidance, we
are reserving probabilistic methods for use only in a baseline assessment.
2) Sensitivity
When various exposure factors are combined within a given exposure model, it
is typically the case that a few of them have a disproportionate influence on the
outcome. For example, soil ingestion rate, soil adherence factor, and exposure
duration are often primary drivers, as well as major sources of uncertainty. We should
broaden the discussion to consider whether all exposure factors are of equal
importance, in terms of their influence on the outcome of the risk assessment, so as to
better focus our distribution development efforts.
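One common way to rank exposure factors by influence is the rank (Spearman) correlation between each sampled input and the model output. The sketch below assumes a simplified dose model and hypothetical input distributions; it is not the exposure model used by any agency.

```python
import random

def ranks(values):
    """Rank positions (0 = smallest); ties are not expected for continuous draws."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result

def spearman(a, b):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    ra, rb = ranks(a), ranks(b)
    mean = (len(a) - 1) / 2.0
    cov = sum((x - mean) * (y - mean) for x, y in zip(ra, rb))
    var = sum((x - mean) ** 2 for x in ra)  # identical for both rank vectors
    return cov / var

rng = random.Random(3)
n = 5000
# Hypothetical input distributions for a simplified soil-ingestion dose model.
inputs = {
    "soil_ingestion_rate": [rng.lognormvariate(3.9, 1.0) for _ in range(n)],  # mg/d
    "exposure_duration": [rng.lognormvariate(1.5, 0.8) for _ in range(n)],    # yr
    "body_weight": [rng.lognormvariate(4.2, 0.2) for _ in range(n)],          # kg
}
dose = [ir * ed / bw for ir, ed, bw in zip(*inputs.values())]
sensitivity = {name: spearman(values, dose) for name, values in inputs.items()}
```

Inputs with rank correlations near zero are candidates for point estimates, which frees effort for the distributions that drive the result.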
3) Adjustments
Concern has been expressed that any "default" exposure factor distributions
proposed by U.S. EPA will, perhaps unintentionally, evolve into inflexible or
"standard" requirements. To counter this, as well as allow for inclusion of regional and
local influences, U.S. EPA should propose, in addition to any de facto "default"
distributions, an exemplary method(s) for establishing exposure factor distributions.
This exemplary method should be as straightforward, transparent, and explainable
(primarily to risk managers) as possible. It should also describe quality assurance (QA)
and quality control (QC) procedures to allow for the expedient and thorough review of
probabilistic risk assessments submitted to regulatory agencies by outside contractors.
EMPIRICAL DISTRIBUTION FUNCTIONS (Issue Paper #2)
{I did not have time to fully review paper #2, so only have input on this one item at this
time}
Goodness of Fit
We should also ask, if the overall risk assessment is sensitive to both the
exposure model and only a few of many exposure factors, just how "good" does every
other distribution have to be in order to support credible risk management decisions?
For example, if a relatively esoteric and hard to conceptualize distribution best fits
available data, but a much more common and more easily understood distribution fits
almost as well (say within 20%), would there not be some advantage in use of the
latter? In addition, if toxicity data remain as point estimates with uncertainty
approaching an order-of-magnitude, it would appear that there should be some leeway
in how we choose or define certain exposure factors.
William A. Huber
Representativeness (Issue Paper #1)
1) The Issue Paper
1.1 The checklists
Section 3 of the Issue Paper regards the inferential process as consisting of several
stages of inference and measurement: Population of interest -> Population(s) actually
studied -> Set of individuals measured (the "sample") -> The measurements. The three
stages are denoted "external" inference, "internal" inference, and measurement,
respectively.
This appears to be a useful framework. However, the four checklists address the first
two stages only. Checklist I concerns the "internal" inference; Checklists II through IV
concern the "external" inference. No checklist specifically addresses measurement.
This approach is unbalanced. The obvious parallelism among Checklists II through IV
emphasizes the lack of balance. We should consider whether a better organization of
checklists might be achieved. One possible organization could be:
Checklist A: Assessing measurement representativeness
Checklist B: Assessing internal representativeness
Checklist C: Assessing external representativeness
Checklist D: "Reality checks," or overview.
Checklist B and checklist I would nearly coincide. Checklist C would incorporate the
(common) questions of checklists II through IV. Checklists A and D are new. Checklist
A would incorporate certain questions sprinkled throughout Checklists I-IV, such as:
• Does the study appear to have and use a valid measurement protocol?
• To what degree was the study design followed during its implementation?
• What are the precision and accuracy of the measurements used in the study?
• Did the study actually measure what it claimed to?
The questions in Checklist D would focus on the fundamental questions:
• Has the data set captured the variability within the population of interest?
• Is it sufficient in size and quality to support the estimates, decisions, or actions
recommended in this risk assessment?
• Can we quantify potential departures of our estimates from their correct (but
unknown) values? Why and how?
Each of the bulleted items above has some detailed questions associated with it.
1.2 Tiered risk assessments
There is no subset of questions that can be selected since it cannot be foreseen which
question is critical to evaluating a particular study. However, there is a basis for limiting
the effort needed to establish representativeness. First, materially unimportant
variables—as established, for example, by a sensitivity analysis—need not be fully
addressed. Second, many of the checklist questions are relevant when variability and
extreme percentiles must be characterized; they become less consequential when only
a central tendency need be assessed. Finally, for a screening risk assessment, only
qualitative degrees of representativeness are needed. For example, if it is known only
that study results will conservatively overestimate exposures, then that study could be
useful for a screening level risk assessment, but probably not for subsequent tiers.
2) Sensitivity
There are two kinds of sensitivity in a probabilistic calculation. They are related to the
distinction between variability and uncertainty. We may, with some loss of generality,
suppose that the calculation is a determined procedure F that processes a collection S
= {p1, p2,..., pN} of "inputs," each of which is a (possibly degenerate) probability
distribution, and outputs a single probability distribution F(S). If there is a material
change in inferences based on F(S) when one of the input distributions, say pi, is
collapsed to a point, then the calculation is sensitive to the variability in pi. Otherwise,
the distribution pi can, with some safety, be replaced by a single number (a degenerate
distribution).
Uncertainty in the input pi can often be described as a collection of possible
distributions {pi'} that are "close" to pi in some sense. A typical example is when pi is
parametric and {pi'} is described by a set of alternate values of the parameters. There
may even be a probability distribution on {pi'} (a Bayesian "prior"). If, by replacing pi by
an arbitrary element of {pi'}, the inferences based on F(S) change in a material way,
then the calculation is sensitive to the uncertainty in pi.
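The test for sensitivity to variability (collapse one input to a point and see whether inferences from F(S) change) can be sketched as follows. The model F(S) = p1 * p2 / p3, its lognormal parameters, and the function name `p95_risk` are hypothetical choices made only for illustration.

```python
import random
import statistics

def p95_risk(collapse=None, n=20000, seed=11):
    """95th percentile of F(S) = p1 * p2 / p3, optionally collapsing one input.

    `collapse` names an input to replace by its sample median, that is, by a
    degenerate distribution. The model and parameters are hypothetical.
    """
    rng = random.Random(seed)
    params = {"p1": (0.0, 1.0), "p2": (0.0, 0.1), "p3": (4.2, 0.2)}
    draws = {}
    for name, (mu, sigma) in params.items():
        # Always draw, so the random stream is the same in every comparison.
        values = [rng.lognormvariate(mu, sigma) for _ in range(n)]
        if name == collapse:
            values = [statistics.median(values)] * n
        draws[name] = values
    out = sorted(a * b / c for a, b, c in zip(draws["p1"], draws["p2"], draws["p3"]))
    return out[int(0.95 * n)]

full = p95_risk()
without_p1 = p95_risk(collapse="p1")
without_p2 = p95_risk(collapse="p2")
```

Under these assumptions, collapsing p2 barely moves the 95th percentile while collapsing p1 changes it substantially, so only p1 needs a carefully characterized distribution. The same loop could be repeated over alternative members of {pi'} to probe sensitivity to uncertainty.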
The data must be sufficient to establish either that a variable is not a sensitive input or,
if it is, the data must be sufficient to characterize the variability or the uncertainty or
both, depending on which contribute to the sensitivity. This provides one basis for
deciding when data are adequate. However, it could be argued that any data
acceptable for use in a screening risk assessment are necessarily acceptable in
subsequent tiers—at a cost.
To be specific, for data to be acceptable at all they must provide some valid information
about the population of interest and some quantifiable level of uncertainty must be
established (no matter how great that level is). This is true for any risk assessment at
any tier, not just for probabilistic risk assessments. For screening use, inputs would
have to be set at extreme (but realistic) levels consistent with the data and their
uncertainty, in such a way as to ensure a "conservative" estimate of risk—that is, one
biased high. Once this is accomplished, it would seem there is no obstacle to using the
same data in the same way in subsequent tiers, with the price for doing so being
estimates that are still biased high.
3) Adjustments
Geostatistical methods are available for certain adjustments of spatial scales. Good
references are Cressie, N., "Statistics for Spatial Data"; Journel, A., and C. Huijbregts,
"Mining Geostatistics." In particular, methods such as "conservation of lognormality"
have been developed to adjust for differences in spatial measurement scale (this has
been termed the "change of support" problem). This is the spatial analog of the DW
model.
Adjustments should be applied with extreme caution because results can be very
sensitive to them. Similarly, surrogate data should be used very cautiously. A good
point of departure for considering adjustments is the following definition, constructed to
capture the use of "representative" in EPA guidance ("Guiding Principles for Monte
Carlo Analysis," EPA/630/R-97/001):
    Data are "representative" when they admit objective and quantifiable
    statements concerning the accuracy of the relevant inferences made from
    them.
From this point of view, adjustments can be considered (and defended) when made in a
way that allows the potential bias or imprecision thereby introduced to be quantified in
the risk assessment.
EDFs (Issue Paper #2)
1) Selecting an EDF or PDF
The primary consideration is the effect the choice will have on the risk assessment
results. Each choice has relative advantages and disadvantages. They come down to
this: using the EDF honors the data but subjects the calculation to the risk that the EDF
poorly represents population variability and percentiles, a risk that can sometimes be
decreased by using a well-chosen PDF. Using a PDF requires some theory and
professional judgment and subjects the calculation to the risk that either (or both) could
be wrong or inapplicable.
The choice is not inherently one of preference. With small data sets especially, an EDF
is unlikely to represent an upper percentile adequately and so is manifestly a bad
choice. (That's not to say that any particular PDF fit to the data is necessarily better!)
When measurement error is large, the EDF will not appropriately separate variability
and uncertainty. On the other hand, when the data set is large and not fit well by any
theoretical distribution function, using the EDF is an excellent approach.
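The weakness of an EDF for upper percentiles in small data sets can be checked numerically. An EDF cannot return a value above the sample maximum, so the sketch below estimates how often the maximum of a 20-point sample falls below the true 99th percentile. The lognormal population and sample size are assumptions chosen only for illustration.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(5)
mu, sigma, n_data, n_trials = 0.0, 1.0, 20, 2000
true_p99 = math.exp(mu + sigma * NormalDist().inv_cdf(0.99))

# Fraction of small data sets whose largest observation, which is the largest
# value an EDF can ever return, falls below the true 99th percentile.
below = sum(
    max(rng.lognormvariate(mu, sigma) for _ in range(n_data)) < true_p99
    for _ in range(n_trials)
)
fraction_below = below / n_trials
expected = 0.99 ** n_data  # exact probability for any continuous distribution
```

With 20 observations, roughly four out of five data sets contain no value as large as the true 99th percentile, regardless of the underlying continuous distribution.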
So we come back to the basic point: what effect will choice of distribution function(s)
have on the risk assessment results? This is determined in part by sensitivity analysis.
For this, the exponential tail fitting approach is particularly intriguing, because it seems
to provide a robust opportunity to explore how relatively more or less extrapolation
beyond the sample maximum (or minimum) will influence the results.
2) Goodness of Fit
The best basis for concluding that a fitted distribution adequately represents a data set
is when (1) there is a theoretical reason to presuppose the data will be represented by
such a distribution and (2) the fit is consistent with that presupposition. In this situation,
P-values are meaningful and useful provided that one appropriate goodness-of-fit
(GOF) test is chosen before obtaining and testing the data.
Graphical examination of the distribution is crucial. All empirical distributions will depart
from the theoretical fit, so the nature and amount of departure must be assessed. It is
highly unlikely that any standard GOF test will produce P-values that reflect the
sensitivity of the risk assessment results to these departures. In particular, goodness of
fit in the upper (sometimes lower) percentiles is usually far more important than
goodness of fit elsewhere.
In many cases, where many input variables are involved in a risk calculation, using
fitted distributions that reproduce the means and variances of the data is likely to
produce adequate results. So, more than any P-value or selection of GOF test, these
three criteria will be practically useful for risk assessments:
1. Correctly represent the centers (means and medians) of the input distributions.
2. Correctly represent the variances of the input distributions.
3. Fit the important tails of the data as well as possible.
(The "important tails" are the tails most influencing the upper percentile risk estimates.
The definition of the tail, e.g., data beyond what percentile, will depend on which
upper percentiles are being characterized in the risk assessment.) Note that EDFs will
satisfy the third criterion only when data sets are large enough to estimate extreme
percentiles with confidence.
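The three criteria can be checked directly by comparing a fitted distribution with the data on the mean, the variance, and an upper percentile. The sketch below assumes a lognormal fit by log-moments and uses simulated stand-in data; the choice of the 95th percentile as the "important tail" is also an assumption.

```python
import math
import random
import statistics
from statistics import NormalDist

rng = random.Random(9)
# Stand-in data; in practice these would be the measured exposure factor values.
data = sorted(rng.lognormvariate(1.0, 0.5) for _ in range(500))

# Fit a lognormal by matching the mean and standard deviation of ln(data).
logs = [math.log(x) for x in data]
mu_hat, sigma_hat = statistics.fmean(logs), statistics.stdev(logs)

fitted = {
    "mean": math.exp(mu_hat + sigma_hat ** 2 / 2),
    "variance": (math.exp(sigma_hat ** 2) - 1) * math.exp(2 * mu_hat + sigma_hat ** 2),
    "p95": math.exp(mu_hat + sigma_hat * NormalDist().inv_cdf(0.95)),
}
empirical = {
    "mean": statistics.fmean(data),
    "variance": statistics.variance(data),
    "p95": data[int(0.95 * len(data))],
}
# Relative discrepancy between the fit and the data for each criterion.
relative_gap = {k: abs(fitted[k] - empirical[k]) / empirical[k] for k in fitted}
```

A table of such gaps is often easier for a risk manager to interpret than a P-value, because it is expressed in the same terms as the quantities that drive the risk estimate.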
When only summary statistics are available, there is an inherent problem in fitting any
distribution: it is impossible to estimate uncertainty. Using additional information about
possible limits to the data (that is, what the most extreme values could be), one should
over-estimate the amount of uncertainty in the fit and use that in a sensitivity analysis.
Uncertainty in the variance of the data is particularly important for probabilistic risk
assessments.
When the better known distributions do not fit the data, there is exceptionally little
advantage to resorting to someone's system of distributions, such as the generalized F.
First, there is usually no theoretical basis for adopting any of these distributions.
Second, there is little assurance that the best fitting distribution in a family will
adequately represent what is of importance, namely the variance and tails. Third,
reproducing the calculations can be difficult if the family of distributions is not in general
use or is ad-hoc, like the five-parameter generalized F distribution is. Fourth, many of
these families of distributions include obscure members whose estimation theory might
not be well understood or even known. It would be better for the risk assessor to work
with familiar constructs whose properties (especially with regard to influencing the risk
assessment outcome) are well known.
3) Uncertainty
Every standard method of assessing uncertainty has limitations. Maximum likelihood
methods often are based on asymptotic normality, which sometimes is not achieved
even for impractically large data sets. There are applications where the bootstrap does
not work—it is not theoretically justified. Certain methods, such as pretending the
likelihood function is a probability distribution, simply have no justification (based on the
theory of estimation).
In general, uncertainty should be assessed as aggressively as possible. As many
possible contributors to uncertainty should be considered and as many of these as
possible should be incorporated in the risk assessment, because their effects
accumulate.
An excellent method for assessing uncertainty is to randomly divide datasets into parts,
perform calculations (such as fitting distributions, estimating statistics, and computing
risk) based on each part, and evaluate the differences that arise. Certain forms of the
bootstrap and its relatives, such as the jackknife, automate parts of this procedure.
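The split-sample procedure can be sketched in a few lines. The example below assumes the statistic of interest is the lognormal shape parameter (the standard deviation of ln x), uses simulated stand-in data, and the function name `split_sample_spread` is hypothetical.

```python
import math
import random
import statistics

def split_sample_spread(data, n_parts=5, seed=0):
    """Randomly split data into parts and estimate a statistic on each part.

    Returns the per-part estimates of the standard deviation of ln(x);
    their spread gives a rough indication of uncertainty in the estimate.
    """
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    parts = [shuffled[i::n_parts] for i in range(n_parts)]
    return [statistics.stdev([math.log(x) for x in part]) for part in parts]

rng = random.Random(13)
data = [rng.lognormvariate(0.0, 0.6) for _ in range(250)]  # stand-in data
estimates = split_sample_spread(data)
spread = max(estimates) - min(estimates)
```

The same loop could carry each part through the full risk calculation, so that the spread is expressed in terms of the risk estimate rather than a distribution parameter.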
Robert C. Lee, Golder Associates Inc.
Comments Regarding "Issue Paper on Evaluating Representativeness of
Exposure Factors Data"
1. The issue of representativeness relates to how the risk assessor makes
judgments and corrections regarding uncertainty inherent in a nonrepresentative
sample. Discussion of the differences between uncertainty (bias and/or error) and
variability (heterogeneity) would be useful to avoid confusion. For example, Checklist I
misleadingly implies that measurement error can have an effect on variability, which is
an inherent property of a population.
Uncertainty can either be characterized as systematic (bias) or nonsystematic (error).
Uncertainty in exposure assessment may stem from:
Model errors
Errors in the design of the assessment method (i.e. measure of exposure)
Errors in the use of the method
Subject limitations
Analytical errors
One way to represent bias and error is as follows. A measured or observed
value $X_i$ can be represented as a function of the true value $T_i$, bias $b$, and
nonsystematic error $E_i$, as:
\[ X_i = T_i + b + E_i \]
The population distribution of the $T_i$ represents variability. However, perfect
knowledge is rarely available. Therefore, $E$ can be represented, for example, as a
normal distribution with a mean of zero and variance as:
\[ \sigma_E^2 = \sigma_X^2 - \sigma_T^2 \]
where $\sigma_X^2$ is the variance of the uncertain measure $X$, and $\sigma_T^2$ is the true variance
(assuming independence).
Bias (which can be positive or negative) can be represented as a deterministic
shift in the mean of $X$ as compared to the mean of $T$, as:
\[ b = \mu_X - \mu_T \]
Thus, error and bias can have an effect on the estimated population distribution,
but not on the true variability.
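A short simulation can illustrate this decomposition, assuming the additive form X = T + b + E with E independent of T. All numeric values below are hypothetical.

```python
import random
import statistics

rng = random.Random(21)
n = 20000
bias = 2.0     # hypothetical systematic error b
sigma_t = 3.0  # hypothetical true population standard deviation
sigma_e = 1.5  # hypothetical nonsystematic error standard deviation

true_values = [rng.gauss(10.0, sigma_t) for _ in range(n)]
observed = [t + bias + rng.gauss(0.0, sigma_e) for t in true_values]

var_x = statistics.variance(observed)
var_t = statistics.variance(true_values)
var_e_recovered = var_x - var_t  # should be close to sigma_e ** 2
mean_shift = statistics.fmean(observed) - statistics.fmean(true_values)  # close to bias
```

The observed values are more spread out than the true values and shifted by the bias, while the true variability (the spread of `true_values`) is unchanged by either source of uncertainty.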
2. In many cases, an approach that uses "reference individuals" or strata rather
than attempting to evaluate or estimate variability in a broad population may be useful.
For instance, if one is concerned about children's exposure to lead in a Western mining
town, it may be simpler as a first step to hypothesize a few examples of children with
deterministic characteristics with regard to site-specific population variability, and then
evaluate the uncertainty associated with these reference individuals' exposures. This
method can be relatively inexpensive and easy compared to population sampling, and
could be used as a screening step in an iterative decision-making framework.
3. The exact meanings of the terms "probability sample" and "probability sampling"
as used in the Issue Paper are unclear. Presumably these are broad terms covering
schemes such as random, stratified, cluster, composite, etc. sampling. If so, then there
should be clarification and discussion regarding the methodological and inferential
differences between these methods. For example, simple random sampling may not be
appropriate for all environmental exposure variables. If an exposure factor varies
geographically, then it may be more appropriate to spatially stratify the population, and
characterize the factor within each stratum as accurately and precisely as possible.
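The inferential difference between simple random and stratified sampling can be illustrated by comparing the precision of the two estimators of a population mean. The two strata, their population shares, and their parameters below are hypothetical.

```python
import random
import statistics

rng = random.Random(17)
# Hypothetical strata: (population share, mean, standard deviation) of a factor.
strata = {"coastal": (0.3, 40.0, 10.0), "inland": (0.7, 10.0, 3.0)}

def stratified_mean(n_per_stratum=100):
    """Weight each stratum's sample mean by its known population share."""
    total = 0.0
    for share, mean, sd in strata.values():
        sample = [rng.gauss(mean, sd) for _ in range(n_per_stratum)]
        total += share * statistics.fmean(sample)
    return total

def simple_random_mean(n=200):
    """Draw individuals from the whole population without regard to strata."""
    values = []
    for _ in range(n):
        share, mean, sd = strata["coastal"] if rng.random() < 0.3 else strata["inland"]
        values.append(rng.gauss(mean, sd))
    return statistics.fmean(values)

# Repeat each design many times to compare the spread of the estimates.
stratified = [stratified_mean() for _ in range(300)]
simple = [simple_random_mean() for _ in range(300)]
sd_stratified = statistics.stdev(stratified)
sd_simple = statistics.stdev(simple)
```

With the same total number of measurements, the stratified estimate is noticeably more precise here because most of the population variance lies between strata rather than within them.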
4. As stated in the text (page 8, final paragraph), the process of determining the
"importance of discrepancies and making adjustments" may be highly "subjective".
However, the remainder of the discussion focuses heavily on frequentist methods of
accounting for sources of uncertainty, which may not be the most appropriate
approach. There should be discussion regarding both empirical and nonempirical
Bayesian methods of population inference, since these methods are very powerful and
are increasingly used in risk applications. A major advantage of Bayesian methods is
that they allow refinement or "updating" of a priori knowledge with additional data or
information.
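The "updating" idea can be shown with the simplest conjugate case: a normal prior on an unknown mean, combined with measurements of known sampling variance. The function name `update_normal_mean` and all numbers are hypothetical; a real application would need to justify the prior and the variance assumption.

```python
def update_normal_mean(prior_mean, prior_var, data, data_var):
    """Conjugate update for an unknown mean with known sampling variance.

    Returns the posterior mean and posterior variance of the mean.
    """
    n = len(data)
    sample_mean = sum(data) / n
    posterior_var = 1.0 / (1.0 / prior_var + n / data_var)
    posterior_mean = posterior_var * (prior_mean / prior_var + n * sample_mean / data_var)
    return posterior_mean, posterior_var

# Hypothetical example on the log scale: a prior belief about the mean of
# ln(intake), followed by four new site-specific measurements.
post_mean, post_var = update_normal_mean(
    prior_mean=0.0, prior_var=1.0, data=[0.8, 1.1, 0.9, 1.2], data_var=0.25
)
```

The posterior mean is a precision-weighted average of the prior mean and the sample mean, and the posterior variance is smaller than either source alone would give.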
5. More attention is devoted to "temporal" characteristics of a population than
"individual" or "spatial" characteristics in the text. The reason for this is unclear. There
should be discussion of how to determine the relative importance of these
characteristics in risk assessment.
6. Discussion of Bayesian techniques may be useful in Section 5 of the paper,
which covers issues involved with improving representativeness.
7. Discussion of the use of simulations for future scenarios would be useful. For
example, if the characteristics of a population are changing over time, time trends
could be incorporated into a simulation to determine the parameters of a particular
exposure variable in, say, 20 years.
Comments Regarding "Issue Paper on Empirical Distribution Functions and
Nonparametric Simulation"
1. The assumptions listed in the Introduction of the Issue Paper are important and
should be discussed further. The first assumption, "...data are sufficiently
representative of the exposure factor in question", is rarely met. Uncertainty associated
with representativeness is often considerable. The second assumption, "...the
analysis involves an exposure/risk model which includes additional exposure factors",
is often true, although evaluation of the upper tail of a variability distribution is often
difficult because of its uncertainty. If the tail is of interest, it may be preferable to stratify
the analysis so that the mean of a high-exposure stratum can be used in the risk
assessment. The third assumption, "...Monte Carlo methods will be used to
investigate the variation in exposure/risk", may be true in practice, but other simple
analytical and numerical methods exist. Given simple distributional assumptions (e.g.
lognormality), a hand calculator can be used to calculate probabilistic output of many
regulatory risk assessment models.
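The hand calculation works because a product or quotient of independent lognormal factors is itself lognormal: the log-scale means add (with sign) and the log-scale variances add. The sketch below assumes a hypothetical dose model and parameter values; the function name `lognormal_quotient_percentile` is illustrative.

```python
import math
from statistics import NormalDist

def lognormal_quotient_percentile(factors, p):
    """Percentile of a product/quotient of independent lognormal factors.

    `factors` is a list of (mu, sigma, exponent), with exponent +1 for a
    multiplied factor and -1 for a divided one; mu and sigma describe ln(x).
    """
    mu = sum(e * m for m, _, e in factors)
    sigma = math.sqrt(sum((e * s) ** 2 for _, s, e in factors))
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))

# Hypothetical dose = concentration * intake / body weight.
factors = [(0.0, 0.8, +1), (0.7, 0.5, +1), (4.2, 0.2, -1)]
median_dose = lognormal_quotient_percentile(factors, 0.50)
p95_dose = lognormal_quotient_percentile(factors, 0.95)
```

No simulation is needed under these assumptions, which makes the result easy to audit; the closed form no longer applies once factors are added together, correlated, or truncated.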
2. Examples of EDFs that have been used in risk assessments would be useful.
3. The statement implying that it is rare that theoretical probability distribution
functions are "available" for exposure factors deserves discussion. For example, under
the maximum-entropy criterion, theoretical PDFs may be fit in a rigorous manner using
various combinations of limited a priori information. Furthermore, the assumption of
lognormality for many exposure variables and models has a theoretical as well as a
mechanistic basis. It is hard to argue against using lognormal distributions when
non-negative, unimodal, positively skewed data are available.
Regardless, there is a practical continuum between using an EDF and, say, a
maximum-entropy theoretical distribution. The issue of sensitivity is important; i.e., when
does it make a difference in a risk assessment? In general, EDFs may take more time
to develop. Discussions of the utility of particular distributions should be separated
from theoretical arguments. An iterative approach to refinement of environmental
Robert C. Lee, Golder Associates Inc.
exposure distribution functions should be discussed. This could potentially avoid
inefficiency, and could be used to focus research dollars. If conducted within a
Bayesian framework, prior EDFs or PDFs can be refined given additional data.
4. Much discussion in the text centers on the appropriateness of particular
goodness-of-fit methods, visualization, etc. All of these methods are "blunt tools". Most
statisticians simply use a number of different methods simultaneously or iteratively. If
all the methods agree that a particular parametric distribution "fits" the data, then that
distribution is probably appropriate. If they disagree, then the mechanistic and
statistical justification for a particular distribution form and the sensitivity of the model
output to the distribution defined should be examined; an EDF may be more
appropriate. If the model output is insensitive to the particular PDF defined for a
particular variable, then it probably does not matter what shape it takes.
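Comment 1 above notes that, given simple distributional assumptions such as lognormality, probabilistic output can be obtained without Monte Carlo simulation. The following is a minimal sketch of that point, assuming a hypothetical multiplicative model (dose = concentration x intake rate / body weight) with independent lognormal inputs. The model form and every parameter value are illustrative assumptions, not values from the issue papers.

```python
import math
from statistics import NormalDist

# Hypothetical inputs: (median, geometric standard deviation, exponent in the model).
# Assumed model: dose = concentration * intake_rate / body_weight
inputs = {
    "concentration": (2.0, 2.5, +1),   # mg/L, illustrative
    "intake_rate":   (1.4, 1.6, +1),   # L/day, illustrative
    "body_weight":   (70.0, 1.2, -1),  # kg, illustrative
}

def lognormal_product(params):
    """Return (mu, sigma) of ln(output) for a product of independent lognormals."""
    mu = sum(power * math.log(median) for median, gsd, power in params.values())
    var = sum((power * math.log(gsd)) ** 2 for median, gsd, power in params.values())
    return mu, math.sqrt(var)

def percentile(mu, sigma, q):
    """Quantile q (0 < q < 1) of a lognormal with log-scale parameters mu and sigma."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(q))

mu, sigma = lognormal_product(inputs)
median_dose = percentile(mu, sigma, 0.50)
p95_dose = percentile(mu, sigma, 0.95)
print(f"median dose = {median_dose:.4f} mg/kg-day, 95th percentile = {p95_dose:.4f} mg/kg-day")
```

Because the logarithm of a product of independent lognormal variables is a sum of independent normal variables, the output is itself lognormal and any percentile follows from two sums and one table lookup. The shortcut no longer applies once inputs are correlated, added rather than multiplied, or not lognormal.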
Samuel Morris
Comments on Issue Paper on Evaluating Representativeness of Exposure
Factors Data
Inferences from a sample to a population
The population of concern at a Superfund site is generally the population surrounding
the site. This is true if the concern is for exposures during remediation activities. If
there is some residual risk that may last over an extended time, the population of
concern may change. In a brownfields situation, for example, the population of concern
may be people who will work at the site years into the future. These people may be
quite different from the population currently living around the site.
4. COMPONENTS OF REPRESENTATIVENESS
There is no question that one would like a clear definition of the population of concern,
but if a representative sampling of the characteristics of that population has not been
done, that definition doesn't exist. Isn't that why one uses information from a surrogate
population? The question then is, if one cannot characterize the population of
concern, how can one know if the surrogate population is suitable to represent the
population of concern? The answer is a practical one. It depends on the availability of
resources, which in turn one hopes depends on how severe the risk is judged to be.
4.1 Internal components - surrogate data versus the study population
Certainly the representativeness of the surrogate study for its own study population
should be evaluated. This paragraph seems to suggest that every assessor who makes
use of a surrogate study should make this evaluation. Good surrogate studies are
generally used over and over again by many assessors. Such an evaluation should
only need to be made once, with the results made available to all assessors. Along
with this evaluation should be an evaluation of the character of the population for which
Samuel Morris
the particular surrogate study is useful. This could go further to provide some limiting
population characteristics beyond which the surrogate would not be recommended.
4.2 External components - population of concern versus surrogate population
The suggestion of using several national Food Consumption Surveys as a basis to
extrapolate dietary habits into the present or future seems like a rather precarious thing
to do. It also is something that could only be done for an extremely large, important,
and well-funded assessment. It is another study that, if done at all, should only be done
once and results made available widely.
Regarding several assessors independently speculating on the mean and coefficient of
variation of a parameter (expert judgment?), to avoid the phenomenon of anchoring, a
useful protocol is to have the experts begin with the extremes and their probabilities and
work toward the central point, rather than beginning with the mean.
Checklist I.
I don't understand the questions, "For what population or subpopulation size was the
sample size adequate for estimating measures of central tendency .. .and other types
of parameters?" The previous questions ask if the sample size was adequate, etc.
Presumably this means it is adequate for the size of the population that was studied. I
am assuming that this checklist pertains to an internal analysis of the surrogate study
and has nothing at this point to do with a different population that is of concern to the
assessor.
Checklist II.
I suspect that in most situations, the answer to the first question will be that the two
populations are disjoint.
Checklist III.
These questions concern whether the two populations inhabit the same geographic
area. Presumably the interest is in similar climate, activity patterns, etc. Spatial
characteristics convey a broader, in fact a different, meaning to me. It suggests how
the population is distributed in space. Is it a high density area or a low density area?
Are there clusters of housing separated by open space?
Responses to the Questions on Representativeness
Issue Paper on Empirical Distribution Functions and Non-Parametric Simulation
Introduction
Is stochastic variability really the right term here? Just to make sure I am interpreting
this right, I take "variability" to mean that, for example, some people drink more tap
water than others and thus have a greater exposure. The big difference between
variability and scientific uncertainty or random error is that it is presumably possible to
identify which individuals drink 2 liters/day and which drink 0.5 liters/day, or they can
identify themselves. This is important because it provides a tool for intervention. For
example, we can warn pregnant women to reduce their intake of fish rather than setting
a standard requiring everyone to eat fewer fish. "Stochastic variability" seems to imply
variability that is so randomized that neither we nor the individuals involved can determine
who has a high exposure and who has a low exposure. In that sense, it is the same as
a cancer dose-response function.
Why do we write off the use of theoretically based distribution functions? Many
environmental variables do seem to be distributed lognormally. It isn't just coincidence.
I believe that we are often better off fitting our data to a lognormal than trying to develop
an empirical distribution based on what is typically a rather small data set. I once got
some good advice when I was a junior engineer trying to figure out how much water
was flowing in a pipe. My boss told me, "We have a good theory explaining the flow of
water in pipes, but our meters have a 5% error at best. If there is a difference between
the theory and the data, assume the meters are wrong." My only problem with
lognormals is how well they continue to map nature out in the extreme tails. Even
there, however, how much confidence do we have in the 99th percentile of an
empirically based distribution?
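The closing question about the 99th percentile of an empirically based distribution can be made concrete with a short calculation. The sketch below assumes independent draws from a continuous distribution and asks how likely it is that even the largest observation in a sample of size n lies above the true 99th percentile. The sample sizes are illustrative.

```python
def prob_sample_max_exceeds(quantile, n):
    """Probability that the largest of n independent draws exceeds the true quantile.

    Each draw falls at or below the true quantile with probability `quantile`,
    so all n do with probability quantile ** n.
    """
    return 1.0 - quantile ** n

for n in (20, 50, 100, 300):
    p = prob_sample_max_exceeds(0.99, n)
    print(f"n = {n:3d}: P(sample maximum > true 99th percentile) = {p:.3f}")
```

With 50 observations the chance is about 40 percent, so more often than not the data set contains no value at all from the top 1 percent of the distribution, and an empirical 99th percentile rests on interpolation or extrapolation from lower values.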
Part 1. Empirical Distribution Functions
Extended EDF
The EDF is extended by adding plausible lower and upper bounds, but the paper does
not mention how one extends the linearized curve to reach those bounds. Presumably
by using a curve-fitting routine of some kind.
In many cases, there is no clearly obvious point for the upper or lower bound. We know
we do not have any one kg adult males, but how do we decide to stop at 15 kg and not
14? Expert judgment is used. Expert judgment may be all we have, but it is not a great
justification, and it is important that we provide justification. I believe it is worthwhile to
do a sensitivity analysis to find the difference between using quasi-arbitrary bounds and
letting the curve run out to zero or infinity. It might also be worthwhile to check the
difference with stricter, but perhaps more reasonable bounds, say a 40 kg adult male.
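The sensitivity analysis suggested above can be sketched for one summary statistic, the mean. The example assumes, purely for illustration, that adult male body weight is lognormal with a median of 78 kg and a geometric standard deviation of 1.2, and compares the unbounded mean with the mean under two sets of bounds. The distribution, its parameters, and the upper bounds are hypothetical; only the 15 kg and 40 kg lower bounds echo the text.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def phi(z):
    """Standard normal CDF with explicit handling of infinite arguments."""
    if z == -math.inf:
        return 0.0
    if z == math.inf:
        return 1.0
    return STD_NORMAL.cdf(z)

def log_z(x, mu, sigma):
    """Standardized log of x; maps 0 to -inf and +inf to +inf."""
    if x <= 0.0:
        return -math.inf
    if x == math.inf:
        return math.inf
    return (math.log(x) - mu) / sigma

def truncated_lognormal_mean(mu, sigma, lower=0.0, upper=math.inf):
    """Mean of a lognormal(mu, sigma) variable restricted to (lower, upper)."""
    a, b = log_z(lower, mu, sigma), log_z(upper, mu, sigma)
    mass = phi(b) - phi(a)                     # probability kept by the bounds
    partial = phi(b - sigma) - phi(a - sigma)  # share of the first moment kept
    return math.exp(mu + 0.5 * sigma ** 2) * partial / mass

# Hypothetical body-weight distribution (assumed values, not from the issue paper).
mu, sigma = math.log(78.0), math.log(1.2)

for label, lo, hi in [("unbounded", 0.0, math.inf),
                      ("15 kg to 250 kg", 15.0, 250.0),
                      ("40 kg to 200 kg", 40.0, 200.0)]:
    print(f"{label:>16}: mean = {truncated_lognormal_mean(mu, sigma, lo, hi):.3f} kg")
```

The same comparison can be repeated for an upper percentile or for the risk estimate itself, which is where the choice of bounds is more likely to matter than it is for the mean.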
Mixed Empirical-Exponential Distribution
I think that mixing theoretical distributions with empirical distributions in some kind of
composite sounds like a good idea.
Starting Points
The smaller the data set, the greater the rationale for using a standard distribution.
Responding to #5, people feel more comfortable with a theoretical distribution because
It has a theoretical basis that supports interpolation between data points and extensions
beyond the data, although I was always told never to do the latter. When plotting
empirical data without a theory, one never knows if there is some big discontinuity
between two completely innocent-looking data points. The problem is that the theory
behind the distribution is mathematical, not physical. To be comfortable
interpolating or extrapolating in either case, one must have a theory of the physical
process involved.
P. Barry Ryan
Workshop on Selecting Input Distributions for Probabilistic Assessment
In the transmittal letter dated March 27, 1998, Beth O'Connor asked us as reviewers to
provide "... not... comprehensive comments, but rather your initial reaction and
feedback on the issues... ." Further, we have been asked to focus on the so-called
"Representativeness" Issue Paper. My discussion focuses on that manuscript to start.
First Reactions
My first thoughts on this paper center on the need for an "audience" to be selected.
Issue papers such as this one will lead, eventually, to guidance documents similar to
those supplied as background reading. But what is the audience of this document? To
a degree, the audience must be viewed as one and the same. This document will be
referenced in a guidance document. Assuming this, a diligent worker looking for more
information will seek out this manuscript. Hence it should be readable and accessible
to practitioners of risk assessments and exposure assessment science. With this
assumed audience in mind, I continue with my initial reaction to the Issue Paper.
The Introduction commences with a single sentence that concisely describes the
purpose of the document. This is a good start; the reader is entitled to know what is
being discussed. Unfortunately, the next sentence is a parenthetical notation. Is this
statement unimportant, less important, to be ignored, or what? The third sentence has
a relative pronoun as the first word but the antecedent is unclear. To what does "This"
refer? Exposure factors? Representativeness? Whatever it may be, it is both
extremely broad and extremely important, as the rest of the sentence tells us.
Before the above is dismissed as grammatical nitpicking, consider the following. At this
point, we are only three sentences into the document and I, considered to be an expert
reviewer, am uncertain as to what is being discussed. A gentle introduction to a difficult
subject goes a long way toward keeping the reader "on line." A little editing for style up
front will make this document much more useful.
Let us continue. The next paragraph is a roadmap describing the way through the
remainder of the document. These two paragraphs provide the Introduction. More is
needed. Why is this important? When should it be applied? What has been done in
the past? These are all reasonable questions to ask.
The next section begins the meat of the Issue Paper. General Definitions/Notions of
Representativeness is a real mouthful of a title. The term "Notions" has the
connotation of uncertain knowledge. Definitions are quite the opposite. Will we be
treated to contradictory information in this section? Apparently the answer is "Yes"
because, as the Issue Paper goes on to point out, a reference to Kruskal and
Mosteller indicates that the term on which we are seeking guidance has no "...
unambiguous definition..." Why is it necessary so early on in the discussion to confuse
the issue in the mind of the reader by saying that no definition exists? Why would a
reader of this document continue reading rather than throwing his or her hands up in
despair?
The next paragraph (and accompanying table) adds further fuel to the fire. What is the
purpose of this table? How does it contribute to the definitions or notions of
representativeness? There is no discussion of the importance of the terms, how they
might be used in assessing representativeness, nor the purpose of the table.
So, again, we have a section that needs significant editing. It is not clear to me that this
section adds any insight into the notion (or definition) of representativeness. The
elementary concept is not difficult. The attempt to be all-inclusive at the very beginning,
however, is doomed to failure. It is difficult to tell someone what works by telling him or
her all of the problems with the system first. It would be better to adopt a working
definition, show how it can be applied to many situations, then list some problems with
the working definition. This allows the reader to gain some understanding of the
concepts, without having to grasp the entire subject a priori.
I have, until this point, spent a great deal of time discussing a very small part of the
Issue paper. In particular, I may have spent more space on the discussion than the
manuscript length to this point. However, the first page or two of any document sets the
tone for the whole piece. The tone for this manuscript ranges from one of despair to
one of disorganization. There is very little room in that continuum for gaining new
insight. I urge a re-write of these early sections.
The next section, A General Framework for Making Inferences, begins
the "meat" of the manuscript. As a matter of style, I do not care for a series of
parenthetical notations in sentences. I believe that it obscures the meaning of the
prose. Shorter sentences fully describing each of the activities are better. This is a
recurring style point throughout the document. I will not comment on it further.
Figure 1 represents a nice, concise "decision tree" approach to risk assessment data
collection. The discussion is muddied somewhat by the introduction of the (undefined)
concept of surrogate data. Reordering of sentences in the paragraph to bring the
example closer to the first use of the word surrogate would clarify substantially. But we
quickly go far afield from our discussion of representativeness. The manuscript needs
to focus on this concept. Indeed, the entire section on Inferences seems misplaced.
Should it not be at the end of the document? On the other hand, Figure 1 is useful to
the discussion of representativeness. The branches in which one must assess this
factor offer an excellent opportunity to introduce techniques, etc., to assess
representativeness. For example, the figure instructs the reader to follow the
algorithms outlined in Checklists I-IV. Why not discuss them now? It would seem that a
discussion of Figure 1 in light of representativeness would be a more useful first step
than to develop concepts of inference from it. The figure is designed to result in an
inference, granted, but the pedagogical role of the figure here is to help the reader
understand the concept of representativeness.
The next section, Components of Representativeness, begins to dissect the concept
into more manageable pieces. The table, Table 1, and the coupling of the discussion to
the Checklists in the appendix, are perhaps the strongest parts of the Issue Paper.
Table 1 is especially noteworthy. It presents the fundamental questions and parses
them out according to the "population" characteristics under investigation.
These include Individual Characteristics, Spatial (here misspelled as "Spacial")
Characteristics, and Temporal Characteristics. Further, the characteristics are divided
between exogenous and endogenous effects, a very useful division. The focus should
remain on this table. Discussion should be expanded, examples given, and understanding
reached. These are the essential concepts of the Issue Paper.
Unfortunately, the manuscript gets bogged down a bit at this point with the "Case"
scenarios. I kept getting confused between Case 2, Case 2a, etc. Also, the
introduction of the National Food Consumption Survey confused rather than helped. I
found myself wondering if this approach was only applicable to the NFCS or did it have
more general applicability. The topic is very general and the specificity of the example
obscured that. Again, the tabular presentation is much more straightforward and
helpful. Table 2 could be discussed without reference to the NFCS and the different
components of representativeness addressed much more clearly and generally.
With section 5, Attempting to Improve Representativeness, the tenor of the Issue
Paper changes dramatically to become much more statistical in nature. It also
becomes more difficult to follow. At points in this section, the authors go off on
tangents. See for example the discussion on raking techniques on page 12. A better
approach would include more on when such data are likely to be suspect and a better
description of the weighting techniques that have been advocated.
In the sub-section Adjustments to Account for Time-Unit Differences, there is
considerable discussion of the Wallace, et al., approach to inferring temporal effects.
No mention is made, however, of the work of Slob (See Risk Analysis 16, 195-200,
1996) who advocates a different technique and evaluates both. Regardless of this
missing reference, one questions why it is here at all. It is very detailed and, in my
opinion, should be described briefly in terms of its logic, then detailed in an Appendix.
The brief reviews of the Clayton et al. paper, the two Buck et al. papers, and the work by
Carriquiry and co-workers should receive the same treatment.
The section Summary and Conclusions is really only a summary. The first two
paragraphs perhaps should have come earlier in the document rather than at the very
end. They express the philosophy of what needs to be done. This is a good thing; it
sets the stage for the Issue Paper.
Continued Thoughts
After the above impressions while reading the document, I have come away with the
impression of a fairly uneven presentation that may not be especially valuable either to
the risk assessment community or to EPA. The idea of an Issue Paper addressing the
concept of representativeness is a good one. Data are often used in a willy-nilly fashion
with little regard for the way in which they were collected or what the study design
intended to do. Because of this, erroneous conclusions can be drawn, resulting in much
wasted effort and, sometimes, money.
I think the document as now presented does not present the issues well. However, the
Figure, Tables, and Checklists are excellent. They provide a strong foundation for a
document useful for both the neophyte and the expert. As an exposure assessor, I
am always trying to come up with clean definitions of the parameters I am measuring.
Is it exposure? Is it dose? Is it applied dose? The authors of this Issue Paper draft
crafted answers to similar questions associated with the representativeness of
data, surrogates for data, and the pitfalls of ignoring the problem altogether.
Unfortunately, these gems are buried in a veritable rockslide of other information. They
are not given their proper attention in the Issue Paper. The science and EPA would be
well served by asking for a re-write based on the Figure, Tables, and Checklists. Some
introductory prose should be placed up front to set the stage, perhaps the two
paragraphs (or modifications thereof) found at the beginning of the Summary. This
material would be descriptive of the problem at hand, answering questions such as why
is representativeness critical, how is it often lacking, and why attempts to improve the
representativeness of a sample must be done carefully. This would then be followed by
Figure 1 and its description, which leads further on to Table 1. The descriptions of Table
1 and Figure 1 give the essentials of the representativeness argument.
The next section would use Table 2 as its focus. Table 2 expands on the ideas of
Table 1 and thus is an excellent follow on. The "examples" could be relegated to an
appendix with more complete examples chosen and more detailed calculations worked
out.
Finally, the Checklists should be given a more prominent placement, and a more
complete discussion.
Comments on Pre-Workshop Issue Papers:
"Evaluating Representativeness of Exposure Factors Data," and "Empirical
Distribution Functions and Non-parametric Simulation" for US EPA Workshop on
Selecting Input Distributions for Probabilistic Assessment
(New York, NY, April 21-22, 1998)
Issue Paper on Representativeness
I find the discussion of representativeness in this first issue paper to be generally
thoughtful and helpful. The paper does a good job of presenting statistical concepts of
experimental design in a manner that should be understandable to most exposure
modelers. The major issues of target populations versus sampled or surrogate
populations, and differences in available vs. desired spatial and/or temporal coverage and
scale, are addressed in a clear and comprehensive manner.
Tiered Approach and Sensitivity Analysis
The issue of tailoring the framework to a tiered approach to risk assessment is integrally
linked to the importance and need for sensitivity analysis when the tiered approach is used.
When simpler screening-level assessments are pursued, sensitivity analysis is critical to
determine whether a significant problem, worthy of attention or remediation, could occur.
Sensitivity analysis is always most meaningful in a decision analytic framework: can the
decision derived from the risk assessment change as a result of a change in the simplifying
assumption (in this case, the use of data or distributions derived from a sample of
questionable representativeness)? The only way to determine whether this is so is to
repeat the analysis with the underlying data or derived distributions modified in a manner
consistent with known or suspected differences, over the range of plausible adjustments.
Mitchell Small
If a plausible adjustment does lead to a change in the risk management decision, then the
analyst must first consider a more rigorous basis for determining the adjustment. If, with
a better basis for making the adjustment, the range of predicted exposure or risk still
"straddles" multiple decision regimes (i.e., different management decisions are still
possible given the improved adjustment and the overall uncertainty from other
assumptions/parameters in the assessment), then this suggests the need to move to the
next level of sophistication in the tiered approach. This could include the use of a more
detailed and rigorous exposure and risk assessment model, as well as collection of a more
representative sample for the target population.
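The procedure described above, repeating the analysis over the range of plausible adjustments and checking whether the decision changes, can be sketched as follows. The example assumes a hypothetical screening rule (act if the 95th percentile of a lognormal exposure exceeds an action level) and represents the representativeness adjustment as a simple multiplier on the surrogate median. All names and values are illustrative assumptions.

```python
import math
from statistics import NormalDist

def p95_exposure(median, gsd, adjustment):
    """95th percentile of a lognormal exposure whose median is scaled by `adjustment`."""
    z95 = NormalDist().inv_cdf(0.95)
    return adjustment * median * math.exp(z95 * math.log(gsd))

def decisions_over_adjustments(median, gsd, action_level, adjustments):
    """Map each plausible adjustment factor to the decision it would produce."""
    return {
        adj: "act" if p95_exposure(median, gsd, adj) > action_level else "no action"
        for adj in adjustments
    }

# Hypothetical screening-level inputs (all values illustrative).
surrogate_median = 1.0
surrogate_gsd = 2.0
action_level = 4.0
plausible_adjustments = [0.5, 0.75, 1.0, 1.5, 2.0]

results = decisions_over_adjustments(
    surrogate_median, surrogate_gsd, action_level, plausible_adjustments
)
decision_is_robust = len(set(results.values())) == 1

for adj, decision in results.items():
    print(f"adjustment x{adj}: {decision}")
print("decision robust to representativeness adjustment:", decision_is_robust)
```

When the decision flips within the plausible range, as it does for these assumed values, the analysis points to refining the basis for the adjustment or moving to the next tier.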
Adjustment
The discussion of methods for modifying statistical estimates derived from a surrogate
population to obtain results applicable to a different target population is thorough and
informative. I do have a few insights to add on encouraging the use of hierarchical models
with covariates to derive more representative distributions for the target population; on
variance adjustment methods for spatial data; and on the use of Bayesian methods for
combining information from surrogate (e.g., national) and target (e.g., site-specific)
samples.
Adjustments based on covariates: The discussion in Section 5.1 covers the usual methods
for weighting sample observations or sample statistics to adjust for stratification of the
target population in the sampled population (either intended, as is the case in a pre-
planned survey of the target population, or unintended, as is the case addressed in the issue
paper, when the stratification weights are a matter of happenstance). The discussion does
recognize the utility of covariates (either continuous or discrete) for determining sample
weights and mentions the method of "raking" for deriving these.
I think more could be done to encourage the collection and use of covariate data, in
particular, using these data to develop "derived distributions" for the target population.
Derived distributions arise when a relationship between the parameter of interest and the
covariates can be established in a surrogate population. [This relationship could be
modified for the target population based on a small sample and Bayesian methods (see
my discussion below for how this might be done).] The relationship is combined with the
distribution of the covariates in the target population to derive the distribution of the
parameter of interest in the target population. The relationship need not be deterministic;
the method is quite amenable to use with the usual regression relationships (with explicit
distributions of residuals) that are developed in exposure assessment.
Consider the following example with a simple, closed-form solution: For subgroup j (i.e.,
based on gender, ethnicity, urban vs. rural, etc.), the natural logarithm of house-dust lead,
ln(house-dust lead), for person k is related to income, I, with the following relationship:
$$\ln(HDL_{k,j}) = a_j + b_j \ln(I_{k,j}) + e_{k,j}$$
where $a_j$ is the intercept, $b_j$ the slope, and $e_{k,j}$ the residual of the regression relationship, with
$e_{k,j} \sim N(0, \sigma_{e,j}^2)$.
If income I for subgroup j is lognormal:
$$\ln(I_{k,j}) \sim N(\mu_{I,j}, \sigma_{I,j}^2)$$
then HDL for subgroup j is also lognormal with
$$\mu_{HDL,j} = a_j + b_j \mu_{I,j}, \qquad \sigma_{HDL,j} = [b_j^2 \sigma_{I,j}^2 + \sigma_{e,j}^2]^{0.5}$$
The distribution of HDL for the entire target population with subgroup proportions $P_j$ is the
$P_j$-weighted mixture of the lognormal distributions determined for each subgroup.
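A minimal sketch of this closed-form derived distribution follows. It assumes each subgroup has a log-linear regression of house-dust lead (HDL) on income with normally distributed residuals and lognormal income, so that HDL within a subgroup is lognormal and the population distribution is the proportion-weighted mixture. The subgroup proportions, regression coefficients, and income parameters are hypothetical values chosen only to make the example run.

```python
import math
from statistics import NormalDist

def subgroup_hdl_params(a, b, mu_income, sigma_income, sigma_resid):
    """Log-scale mean and standard deviation of HDL for one subgroup.

    ln(HDL) = a + b * ln(I) + e, with ln(I) ~ N(mu_income, sigma_income^2)
    and e ~ N(0, sigma_resid^2), independent of income.
    """
    mu = a + b * mu_income
    sigma = math.sqrt(b ** 2 * sigma_income ** 2 + sigma_resid ** 2)
    return mu, sigma

def mixture_cdf(x, subgroups):
    """P(HDL <= x) for a mixture; `subgroups` holds (proportion, mu, sigma) tuples."""
    return sum(p * NormalDist(mu, sigma).cdf(math.log(x)) for p, mu, sigma in subgroups)

# Hypothetical subgroups: (proportion, a, b, mu_income, sigma_income, sigma_resid).
regressions = [
    (0.6, 9.0, -0.3, 10.5, 0.6, 0.8),
    (0.4, 8.0, -0.2, 10.0, 0.8, 0.9),
]
subgroups = [
    (p, *subgroup_hdl_params(a, b, mu_i, sigma_i, sigma_e))
    for p, a, b, mu_i, sigma_i, sigma_e in regressions
]

for level in (100.0, 500.0, 2000.0):
    print(f"P(HDL <= {level:g}) = {mixture_cdf(level, subgroups):.3f}")
```

Only the covariate distribution and the subgroup proportions need to describe the target population; the regression itself can come from the surrogate population, which is the appeal of the approach.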
For more complicated relationships between the parameter of interest and the covariates,
or a more complicated distribution of covariates in the target population, Monte Carlo
simulation methods may be required to derive the distribution. An example of this (entitled,
"Bayesian Analysis of Variability and Uncertainty of Arsenic Concentrations in U.S. Public
Water Supplies," by Lockwood et al.) is attached. It presents early results of a project for
the EPA Office of Ground Water and Drinking Water (OGWDW) to estimate a national
distribution of arsenic occurrence in source water used by drinking water utilities, based
on a stratified national survey. The application is an example of Case 3 in Table 2, where
the surrogate population is a subset of the population of concern. The most pertinent part
of the attachment is highlighted, noting that the national distribution is synthesized by
sampling the covariates of the target population.
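A minimal Monte Carlo sketch of this idea follows. It is not the method of the attached paper; the covariate model (an urban/rural indicator plus log income), the regression coefficients, and the residual standard deviation are hypothetical values chosen only to show how a target-population distribution can be synthesized by sampling that population's covariates.

```python
import math
import random

def simulate_target_distribution(n_draws, draw_covariates, relationship, resid_sd, seed=0):
    """Draw covariates for the target population, apply the fitted relationship
    on the log scale, add a residual, and return the sorted simulated values."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_draws):
        x = draw_covariates(rng)
        log_y = relationship(x) + rng.gauss(0.0, resid_sd)
        values.append(math.exp(log_y))
    return sorted(values)

def percentile(sorted_values, p):
    """Nearest-rank empirical percentile of an already sorted list."""
    k = math.ceil(p * len(sorted_values)) - 1
    k = max(0, min(len(sorted_values) - 1, k))
    return sorted_values[k]

# Hypothetical covariate distribution in the target population.
def draw_covariates(rng):
    urban = rng.random() < 0.7
    log_income = rng.gauss(10.6 if urban else 10.1, 0.5)
    return {"urban": urban, "log_income": log_income}

# Hypothetical relationship estimated in the surrogate population.
def relationship(x):
    return 8.0 - 0.3 * x["log_income"] + (0.4 if x["urban"] else 0.0)

sample = simulate_target_distribution(20000, draw_covariates, relationship, resid_sd=0.8)
median = percentile(sample, 0.50)
p95 = percentile(sample, 0.95)
```

Simulation is used here because the covariate distribution is a mixture and the relationship includes an indicator term, so no single closed-form lognormal describes the result.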
The use of covariates for deriving distributions of exposure factors in a target population
is a powerful tool that should be encouraged in the issues paper with more examples and
methods. It would also encourage exposure assessors and analysts to be more careful
and thorough in their collection of covariate data as part of their monitoring programs.
Variance adjustment for spatial data: The report does a good job covering the options for
adjusting bias and variance for time-unit differences; similar methods can be utilized for
differing scales of spatial representation. A good reference for this is Random Functions
and Hydrology (Bras, R.L. and I. Rodriguez-Iturbe, Addison Wesley, Reading, MA, 1985),
especially Section 6.8, Sampling of Hydrologic Random Fields. Methods are presented
for accounting for spatial correlation when determining the variance of an area average.
(The other thing we should do is vote on the correct spelling of spatial/spacial.)
Bayesian methods for combining information from surrogate- and target-population samples: I have
learned a lot recently about Bayesian methods for combining expert judgment and
observed data to estimate distributions. Some of these are discussed in the attached
paper by Lockwood et al. The Bayesian method allows a prior judgment for distribution
parameters to be updated based on an observed data set, yielding a posterior distribution
for the distribution parameters. The posterior distribution characterizes the uncertainty in
the resulting estimation, but can also be used for "best-fit" point estimates (e.g., based on
the mean or mode of the posterior distribution). Bayesian estimates converge to those of
classical methods when "vague" or "informationless" priors are used, so that the
information in the sample dominates that of the prior.
Bayesian methods can add a lot to the suite of tools available for using surrogate
population samples when estimating target population statistics. A number of these tools
are described in a paper that Lara Wolfson and I are (hopefully!) about to complete,
"Methods for Characterizing Variability and Uncertainty: Bayesian Approaches and
Insights" (we have been "about to finish this paper" for quite a long time, covering a few
of our recent meetings - hopefully I will bring a copy to the meeting in New York). In
particular, estimates from surrogate population samples can serve as priors for the target
population, allowing information from (presumably small and limited) site-specific studies
to be informed by, and combined with, the previous studies of the surrogate population.
Results from multiple surrogate populations can also be used, each given a weight, along
with the informationless prior, to determine how much the resulting estimate will be based
on each of the surrogate population studies vs. the information in the target population
survey itself.
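One simple way to picture a surrogate-based prior is a conjugate normal update on the log scale, with a weight that discounts the surrogate information. The sketch below is an assumption made for illustration, not the approach of the paper mentioned above: it treats the measurement standard deviation as known, and the weight, function name, and all numbers are hypothetical.

```python
import math

def posterior_mean_sd(prior_mean, prior_sd, weight, data, data_sd):
    """Normal-normal update of a mean with known data_sd.

    weight scales the prior precision: 1.0 uses the surrogate estimate at
    face value, 0.0 corresponds to an informationless prior."""
    n = len(data)
    ybar = sum(data) / n
    prior_prec = weight / prior_sd ** 2
    data_prec = n / data_sd ** 2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * ybar) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)

# Hypothetical surrogate-population estimate (mean 3.0, sd 0.5) and a small
# target-population sample of three log-scale measurements.
target_sample = [4.0, 5.0, 6.0]
informed = posterior_mean_sd(3.0, 0.5, 1.0, target_sample, data_sd=1.0)
vague = posterior_mean_sd(3.0, 0.5, 0.0, target_sample, data_sd=1.0)
```

With the weight set to zero the posterior mean equals the sample mean, which mirrors the statement that Bayesian estimates converge to classical ones under an informationless prior.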
A Specific Comment on the Representativeness Paper:
The discussion on "Summary Statistics Available" in Section 8.1 (page 16) contains what
I believe to be an error, when suggesting that standard deviations be averaged across
subgroups when approximating a population standard deviation: "In the case of population
variance, we recommend calculating the weighted average of the group standard
deviations, rather than their variances, and then squaring the estimated population of
concern standard deviation to get the estimated population of concern variance."
However, neither of these approaches properly accounts for possible differences in the
means across the subgroups, which also contribute to the population variance. The correct
approach is to compute $E[X^2]$ for each subgroup:
$E[X_g^2] = E^2[X_g] + \mathrm{Var}[X_g]$
then $E[X^2]$ for the population:
$E[X_{ATP}^2] = \sum_g P_g E[X_g^2]$
and finally, the variance of X for the population:
$\mathrm{Var}[X_{ATP}] = E[X_{ATP}^2] - E^2[X_{ATP}]$
where $E[X_{ATP}]$ is computed using the middle equation on page 10.
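The three steps can be checked numerically with a short Python sketch. It assumes that the population mean is the proportion-weighted average of the subgroup means (the equation on page 10 is not reproduced here), and the two-subgroup numbers are hypothetical.

```python
def population_mean_variance(proportions, means, variances):
    """Combine subgroup means and variances into a population mean and variance."""
    pop_mean = sum(p * m for p, m in zip(proportions, means))
    # E[X_g^2] = E^2[X_g] + Var[X_g] for each subgroup, weighted by P_g
    pop_second_moment = sum(
        p * (m ** 2 + v) for p, m, v in zip(proportions, means, variances)
    )
    return pop_mean, pop_second_moment - pop_mean ** 2

# Hypothetical subgroups with equal proportions, different means, equal variances.
proportions = [0.5, 0.5]
means = [10.0, 20.0]
variances = [4.0, 4.0]

pop_mean, pop_var = population_mean_variance(proportions, means, variances)

# For comparison: squaring the weighted average of the subgroup standard deviations.
averaged_sd_squared = sum(p * v ** 0.5 for p, v in zip(proportions, variances)) ** 2
```

In this example the averaged-standard-deviation approach gives 4, while the correct population variance is 29, because the difference between the subgroup means contributes 25.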
Issue Paper on Empirical Distribution Functions and Non-parametric Simulation
You appear to have already gathered a lot of thoughtful comments on the two topics
addressed in this issue paper. Will any of these respondents be at our meeting? Will they
be identified? I have given more thought to Part II (Issues related to fitting theoretical
distributions) than I have to Part I (Empirical distribution functions). I identified strongly with
the comments of Respondent #6 in Part II. To add slightly to Respondent 6's comments,
I note that parametric tests of significance for the fit of a TDF almost always reject a
particular parametric form as the sample size gets large - real populations invariably exhibit
some deviation from a theoretical model, which cannot capture all of the population's
behavior and nuances. In these cases, visual comparisons of observed and fitted
distributions are essential for determining whether these deviations are in fact important
to the problem at hand.
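A rough arithmetic illustration of this point, using the Kolmogorov-Smirnov test as one example of a goodness-of-fit test: the asymptotic 5% critical value shrinks as 1.358 divided by the square root of n, so any fixed gap between the population and the fitted model is eventually declared significant. The sketch below is a simplification that treats the observed test statistic as equal to a fixed true gap and ignores the adjustment needed when parameters are fitted from the same data; the gap of 0.02 is a hypothetical value.

```python
import math

KS_COEFF_5PCT = 1.358  # asymptotic coefficient for the two-sided 5% KS critical value

def ks_critical_value(n):
    """Approximate 5% critical value of the one-sample KS statistic for sample size n."""
    return KS_COEFF_5PCT / math.sqrt(n)

def smallest_rejecting_n(max_cdf_gap):
    """Smallest n at which a fixed maximum CDF gap exceeds the 5% critical value."""
    return math.floor((KS_COEFF_5PCT / max_cdf_gap) ** 2) + 1

# Hypothetical: the fitted model differs from the population CDF by at most 0.02.
n_reject = smallest_rejecting_n(0.02)
```

A maximum gap of two percentage points in cumulative probability may be unimportant for the problem at hand, which is why a visual comparison remains necessary at large sample sizes.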
Edward J. Stanek III
Review Comments on "Issue Paper on Evaluating Representativeness of
Exposure Factors Data."
(March 4, 1998 Report)
by Edward J. Stanek
Are questions of differences in populations, questions related to differences in spatial
coverage and scale, and questions related to differences in temporal scale complete?
Should other areas be added?
The document defines a population in terms of a set of units (subjects) at a location and
time, a definition that is a standard starting point for traditional survey sampling. The
definition of the population is important, since the term "representativeness" is being used
to describe the relationship between estimates of exposure, and the true exposure of
subjects in the population (or summary measures of these true exposures). An example
of a typical population is (p3) "the population surrounding a Superfund site".
The population is defined as a "snapshot" of persons in time and space. Although this
definition fits the traditional survey sampling paradigm, this definition may be lacking from
the stand point of defining exposure in the context of the public's health. The photographic
like quality of the definition does not account for the fact that new people may move into
the picture, and others may leave after a short time has passed. Thus, while
"representativeness" may be assessed for the picture, the picture itself may be limited. As
a result, the assessment of representativeness may have limited relevance for exposure
and ultimately the public's health. Of course, when one looks at the "snapshot" close to
the time it was taken, the differences may be slight. After a longer time period, the
differences may be dramatic. This practical concern over defining the "population" is
ignored in the report.
It is important to introduce a longer time frame and possible changes in exposure when
defining exposure in a population. Such definitions are important conceptually,
pragmatically, and politically since they define the target parameters for exposure. Such
definitions are accessible to a broad range of interested parties and not limited to statistical
or technical experts. They set the stage for decisions on additional data collection, and
technical choices for estimation and modeling. The current document limits the scope of
"representativeness" by defining it only in a context that has an established traditional
statistical literature.
In a simple sense, such a definition may be diagramed as in Table 1. The idea is that over
chronologic time, there will be mobility and other physical changes. Thus, exposure for the
first subject (ID=1) may differ between 1998 (E_11) and 1999 (E_12). Similarly, ID=1 may
move in the year 2000, and hence no longer be exposed. Other subjects may move in the
area. Subjects will also age, and their exposure may change with age. Of course, the
exposure values in Table 1, while potentially observable, are not known. Nevertheless, a
consensus on what will constitute such a potentially observable exposure table is the
starting point for discussion of "representativeness". This conceptual framework has a rich
background (Little and Rubin (1987)).
The present document defines the problem in terms of the shaded cells in Table 1. I
suggest that the starting point should more closely correspond to a population as defined
in Table 1. Establishing the goal first will help prioritize issues such as representativeness,
sensitivity, and adjustments. One might dispute this goal by arguing that the problem
definition is difficult, exceedingly complex, and since conceptual, detracts valuable time
and effort from what data is known. I would argue that establishing consensus on this
definition (while not statistical) should be the starting point for "evaluating
representativeness" [...] and
model-based inference (Cassel et al. (1977), Scott and Smith (1969), Meeden and Ghosh
(1997)).
Table 1. Potentially Observable Exposure on Subjects in the Defined Spatial Location
(E_ij = Exposed)
Time                Subject IDs (i)
(Yr)(j)    ID=1    ID=2    ID=3    ...    ID=N    ID=N+1     ID=N+2     ID=N+3     Average
1998       E_11    E_21    E_31    ...    E_N1                                     M_1998
1999       E_12            E_32    ...    E_N2    E_N+1,2                          M_1999
2000                       E_33    ...    E_N3               E_N+2,3    E_N+3,3    M_2000
...
           M_1     M_2     M_3     ...    M_N     M_N+1      M_N+2      M_N+3      M
Are there ways of formulating questions that will allow a tiered approach to risk assessment
(a progression from simpler screening level assessments to more complex assessments)?
A general strategy for tiering estimation approaches is by ordering the assumptions. With
very extensive assumptions, all exposure assessments are easy. For example, assume
that everyone at every time in every location has the exact same exposure, and that this
exposure can be measured without error. Using these assumptions, a single measure on
a single subject will suffice. These assumptions are clearly too strong to be broadly
acceptable. Nevertheless, these assumptions represent an extreme which has as an
opposite extreme the target "potentially observable" population (which is exceedingly
complex). A gradation of assumptions can be formed between the two extremes, with such
a framework leading to a tiered approach.
The framework asks how important are (or sensitive is the analysis to) population, spatial,
and temporal differences between the sample (for which you have data) and the population
of interest. What guidance can be provided to help answer these questions?
The document addresses the way the "surrogate population" represents the population,
how the sample from the surrogate population relates to the surrogate population, and
finally, how the measured value relates to the true value for the measured unit. Assuming
that the population defined is the potentially observable population of interest, this is a
good framework for developing inference. Some guidance can be provided to structurally
evaluate the sensitivity of the exposure estimates to analysis decisions. To do so, we build
estimates from the data to the surrogate population, and finally to the population.
Table 2 represents a framework for successive development of estimates to the
population. Probability sampling will connect the surrogate data to the surrogate
population, and may serve as the basis for inference to the lower shaded portion of the
Surrogate Population. Specifically, the inference consists of estimates of population
parameters, and the accuracy (mean squared error) of those estimates. Non-response,
limited coverage, etc. may require additional assumptions before inference can be
extended to the entire Surrogate Population.
Improvements in the accuracy of estimates for the surrogate population may be possible
via modeling and/or post-stratification. The models developed on surrogate data may
provide support and serve as a structure for assumptions needed to predict exposure in
the surrogate population not stemming from the probability sample. For example, models
based on surrogate data may develop a strong dependency of exposure on age and
gender, but a weak to null relationship with urban/rural geographic location in one state.
Assumptions to estimate [...]
(requiring assumptions) may be supported by evidence from the surrogate data, although
[...] by the probability sampling inferential framework. The range of
sensitivity analysis (for example, varying the urban/rural [...]) [...]
established making use of model based estimates when extending inference to the
non-sampled surrogate population.
Models and assumptions most likely will be the primary source to generate estimates from
the surrogate population to the population of interest. As the distance increases from the
actual data, the role of the models and assumptions will increase. This increased role will
result in the estimates being more sensitive to the assumptions. Much progress is
currently being made in studying issues of sensitivity similar to these issues in
epidemiology, where a similar situation occurs in observational epidemiologic studies (see
recent presentations by Wasserman, Rotnitzky et al. (1998)). Three-dimensional sensitivity
plots, such as those developed by Rotnitzky et al., provide a way of visually communicating
and identifying the relative importance of assumptions.
Table 2. Conceptual Steps in Developing Inference from Data to the Surrogate Population
to the Population of Interest.
Data From Surrogate
Population
Surrogate population
(Assumptions Required)
Surrogate Population
Population of Interest
(Assumptions Required)
Population of Interest
Adjustments
The description of adjustments focuses on adjustments due to time unit differences. There
are empirical ways of dampening short time variation when estimating longer time interval
distributions that do not require parametric assumptions (such as the log-normal
assumptions illustrated by Wallace et al. (1994)). Such methods (such as empirical Bayes
methods) require some assumptions, but the assumptions may be minimal and subject to
verification. More research is clearly needed in these areas. This is, however, an active
research area that is close to providing answers to practical concerns.
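As one hedged illustration of dampening short-term variation without assuming a distributional form, the sketch below shrinks each person's short-term mean toward the group mean using moment estimates of within-person and between-person variance. This is a simple variance-components version chosen for illustration, not a method taken from the issue paper; it assumes the same number of days per person, and the data are hypothetical.

```python
from statistics import mean, variance

def shrink_person_means(daily_by_person):
    """Shrink each person's short-term mean toward the grand mean.

    daily_by_person: list of equal-length lists of daily measurements."""
    k = len(daily_by_person[0])  # days observed per person (assumed equal)
    person_means = [mean(days) for days in daily_by_person]
    grand_mean = mean(person_means)
    # Average day-to-day variance within a person
    within_var = mean([variance(days) for days in daily_by_person])
    # Variance of person means, less the part explained by day-to-day noise
    between_var = max(0.0, variance(person_means) - within_var / k)
    total = between_var + within_var / k
    shrink = between_var / total if total > 0 else 0.0
    shrunk = [grand_mean + shrink * (m - grand_mean) for m in person_means]
    return shrunk, shrink

# Hypothetical 3-day records for three subjects.
records = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
long_term_estimates, shrink_factor = shrink_person_means(records)
```

The shrunken means are less spread out than the raw short-term means, which is the intended dampening; the equal-days assumption and the moment estimates are the points that would need verification in practice.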
References
Cassel, C-M., Sarndal, C-E., and Wretman, J.H. (1977). Foundations of inference in survey
sampling. John Wiley and Sons, New York.
Little, R.J.A., and Rubin, D.B. (1987). Statistical analysis with missing data. John Wiley and
Sons, New York.
Meeden, G., and Ghosh, M. (1997). Bayesian methods for finite population sampling.
Monographs on Statistics and Applied Probability, Chapman and Hall, New York.
Scott, A.J., and Smith, T.M.F. (1969). "Estimation in multi-stage surveys," Journal of the
American Statistical Association, 64:830-840.
Wasserman, L. (1998). A tutorial on G-estimation. Eastern Regional North American
Annual Meeting of the Biometrics Society, Pittsburgh, PA, April 1, 1998.
Rotnitzky, A., Robins, J., and Scharfstein, D. (1998). A G-estimation approach for
conducting sensitivity analysis to informative drop out and non-ignorable non-compliance
in a randomized follow-up study. Eastern Regional North American Annual Meeting of the
Biometrics Society, Pittsburgh, PA, April 1, 1998.
Alan Stern
Response to Questions on Issues Paper #1 (Representativeness)
Alan Stern, Dr.P.H., DABT
Div. of Science and Research
New Jersey Dept. of Environmental Protection
I believe that the "checklists" are a conceptually sound and thorough guide to
approaching the issues of representativeness. The major problem with the issue of
representativeness is not what criteria should be evaluated, but what remedies are
available. In my experience, the majority of cases where probabilistic analysis is
considered in environmental regulation/standard setting involve choosing a generic
distribution to represent an essentially unknown population. That is, default distribution
assumptions which can be employed in much the same way that standard point
estimates are currently employed in (e.g.) the Superfund Program. Efforts such as the
NHANES III project and other data gathering efforts on national and regional scales
often provide data of excellent quality for large scale populations. Notwithstanding that
such data are often structured in a way which can permit information on specific
subpopulations to be extracted in a representative fashion, we are rarely in a position to
know who those subgroups should be in any given instance. While probabilistic
analysis holds out the potential for realistic descriptions of the characteristics of real
populations and their exposures, it has been, and, I believe, will continue to be rare for
specific populations exposed at a given location to be characterized (other than
possibly by their geographic location) in a way which will allow appropriate
subpopulation data to be extracted from national/regional databases. If such
populations were characterized and/or population-specific exposure data were collected
in a focused study, then the issue of representativeness would become a more practical
consideration. On the other hand, if such focused studies are not done, then there is
little or no quantitative basis for considering whether national/regional population data
are specific to the given population. Thus, in most cases the external data are likely to
be "disjoint" with respect to the population of concern. In the absence of population-
specific characterization (either with respect to demographics, or, preferably, with
respect to specific exposure), there does not appear to be an objective way of even
identifying how the national/regional surrogate data may be biased with respect to the
population of concern.
Having acknowledged this practical problem with deriving representative data exposure
distributions, I am not sure that, from the standpoint of public health and risk-based
regulation, it is necessarily wise that the population of concern be precisely
characterized. The reason for this is that precise characterizations of populations are
(as recognized in the checklists) precise with respect to individuals, and their location in
space and time. Such information is only precise for a specific moment of time.
Demographic and land use patterns change over time, and distributional data which are
representative for a given population at a point in time may not be representative for the
population at the same location several years or decades later. Risk-based regulatory
decisions, on the other hand, are intended to be protective of the exposed population
into the indefinite future. Too specific a description of a population of concern may,
therefore, make a risk-based regulatory decision unprotective of future populations at
the given location. Such considerations seem to argue for more generic tailoring of
input exposure distributions to include an intentional component of true uncertainty to
address the possible, but unknown values which might apply to future populations and
land uses. It is not entirely clear how this should be addressed in quantitative terms,
but as a starting point, it seems necessary for such generic descriptions to include the
range of values which could reasonably be anticipated to apply to a generic population
at a site. To the extent that such descriptions are biased with respect to the current
population and/or land uses, that bias should (as appropriate) be toward including more
of the high risk population than is already present at the site. For example, if the
demographic make up of the potentially exposed population at a given site were such
that there were few young children, the generic input distributions should assume that
at some future time, the population could have a larger proportion of young children. It
may not be necessary to assume that the national or regional demographics shift in a
radical fashion (although over time such shifts do, indeed, occur), but rather to assume
that local demographic idiosyncrasies are short-lived. Thus, if a specific locality or
neighborhood is demographically skewed toward families with older children, or without
children, it should be assumed that in the future, the demographics may shift such that
the proportion of young children at the local level of a site reflects the overall state, or
county proportion. Such assumptions should be based preferentially on analysis of
regional population data, and, if such data are not available, on analysis of national
data. One obvious problem with such an approach is that adjusting current local
demographics to current regional demographics to account for future local demographic
shifts assumes that regional demographic patterns are more stable than local patterns.
This may be true in general, but will not necessarily be true in any given instance.
Tiered Approach
The usual rationale for a tiered approach is that it saves the time and effort which would
be needed to conduct population and/or site-specific analyses. Computational time per
se, however, is not usually a limiting factor in such analyses. Site-specific data
collection, on the other hand, is a major undertaking and is generally a limiting factor.
Thus, if population-specific data are available and (as above) it is appropriate to base a
risk-based regulatory decision on such data, there is no reason to employ a tiered
approach to site-specific distributional descriptions. If, as above, regional-specific
distributions are more appropriate for risk-based determinations, and such data are
available, then, likewise, a tiered approach is not necessary. If, as is usually the case,
population-, site-, or region-specific data are not available, and national population-based
data are available, such data may be appropriate as the basis for a screening
approach. In considering the use of such data in a non-population-specific context,
however, it must be asked to what extent the specific characteristics of the national
data might be misleading for screening purposes. Specifically, are the details of the
national distribution in the extreme tails appropriate, even for screening purposes, for a
given subpopulation? Given the screening nature of such an assessment, it may be
more appropriate to generate and employ generic screening distributions which use
quantitative approximations specifically intended for screening such as triangular,
[...] and generalized distributions. Such distributions can also be applied when
[...] population distributions are not available. These [...]
describe, for example, relative minimum values, estimated 10% values, most
likely values, estimated 90% values and relative maximum values. It is not necessarily
clear that such generic distributions would not be more appropriate for screening
purposes than national population-based data. Using such generic default screening
distributions would have the additional advantage of establishing specific, and easily
identified rebuttable presumptions which would form the starting point for site-specific
modifications. Thus, starting from a default screening distribution, it might not be
necessary to generate a complete site-specific distribution in order to move toward
site/regional specificity. Rather, consideration of the default distribution may help focus
the need for more specific information, and it might be realized that the most significant
difference between the default assumption and the actual site/regional-specific
distribution lies (e.g.) in the upper tail of the default distribution. Thus, it might be
necessary only to collect data appropriate to modifying the 95% value in the default
distribution.
APPENDIX G
POSTMEETING COMMENTS
David E. Burmaster
27 April 1998
Memorandum
To: Moderator, Participants, and Attendees —
Workshop on Selecting Input Distributions for Probabilistic Analyses
Via: Kate Schalk, ERG
From: David E. Burmaster
Subject: Thoughts and Comments After the Workshop in NYC
After much more reading and thinking, I remain staunchly opposed to letting the US EPA and its
attorneys set a minimum value for any or all goodness-of-fit (GoF) tests such that an analyst
may not use a fitted parametric distribution unless it achieves some minimum value for the GoF
test.
In honesty, I must agree that GoF tests are useful in some circumstances, but they are not
panaceas, they do have perverse properties, and they will slow or stop continued innovation in
probabilistic risk assessment. The US EPA must NOT issue guidance, even though it is
supposedly not binding, that sets a minimum value for a GoF statistic below which an analyst
may not use a fitted parametric distribution in a simulation.
Here are my thoughts:
1. Re Data
For physiological data, many of the key data sets (e.g., height and weight) usually come from
NHANES or related studies in which trained professionals use calibrated instruments to
measure key variables (i.e., height and weight) in a clinic or a laboratory under standard
conditions for a carefully chosen sample (i.e., adjusted for no shows) from a large population.
These studies yield "blue-chip" data at a single point in time. These data, I believe, contain
small but known measurement errors across the entire range of variability. At the extreme tails
of the distributions for variability, the data do contain relatively large amounts of sampling error.
Even with a sample of n = 1,000 people, any value above, say, the 95th percentile contains
large amounts of sampling uncertainty. In general, the greater the percentile for variability and
the smaller the sample size, the greater the (sampling) uncertainty in the extreme percentiles.
For behavioral and/or dietary data, many key data sets (e.g., drinking water ingestion, diet,
and/or activity patterns) often come from 3-day studies in which the human subject recalls
events during the previous days without the benefit of using calibrated instruments in a clinic or
laboratory and not under standard conditions. Even though the researchers may have carefully
selected a statistical sample from a large population, no one can know the accuracy or
precision of the "measurements" reported by the subjects. These studies yield data of much
less than "blue-chip" quality for a 3-day interval. These data, I believe, contain large and
unknown measurement errors across the entire range of variability. At the extreme tails of the
distributions for variability, the data also contain large amounts of sampling error. For a sample
with n = 1,000, any value above, say, the 95th percentile contains large amounts of sampling
uncertainty above and beyond the large amounts of measurement uncertainty. Again, the
greater the percentile for variability and the smaller the sample size, the greater the (sampling)
uncertainty in the extreme percentiles.
My conclusion from this? With all sample sizes, certainly with n < 1,000, I think the data are
highly uncertain at high percentiles. I think it is inappropriate to eliminate a parametric model
that captures the broad central range of the data (say, the central 90 percentiles of the data)
just because a GoF test has a low result due to sampling error in the tails of the data. (This
observation supports the idea that fitted parametric distributions may outperform EDFs at the
tails of the data.) As Dale Hattis has written, use the process to inform the choice of parametric
models -- not a mindless GoF test.
2. Re Fitted Parametric Distributions
As is well known:
a 6-parameter model will always fit data better than a 5-parameter model,
a 5-parameter model will always fit data better than a 4-parameter model,
a 4-parameter model will always fit data better than a 3-parameter model, and
a 3-parameter model will always fit data better than a 2-parameter model.
Thus, GoF tests always select models with more parameters over models with fewer
parameters.
This perverse behavior contradicts Occam's Razor, a bedrock of quantitative science since the
13th century.
The venerable Method of Maximum Likelihood Estimation (MLE) offers an approach -- not the
only approach -- to this problem. First, the analyst posits a set of nested models in which, for
example, an n-parameter model is a special case of an (n+1)-parameter model -- and the (n+1)-
parameter model is a special case of an (n+2)-parameter model. Using standard MLE
techniques involving ratios of the likelihood functions for the nested models, the analyst can
quantify whether the extra parameter(s) provide a sufficiently better fit to the data than does
the simpler model to justify the computational complexity of the extra parameter(s).
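A likelihood-ratio comparison of nested models can be sketched as follows. This is an illustrative example only, assuming a 1-parameter exponential model nested inside a 2-parameter Weibull model (shape fixed at 1); the choice of these two families, the simulated data, and the function names are assumptions made for the sketch and do not come from this memo. The p-value uses the usual large-sample chi-square approximation with one degree of freedom.

```python
import math
import random


def exponential_loglik(data):
    """Maximized log-likelihood of the 1-parameter exponential model."""
    n = len(data)
    mean = sum(data) / n  # MLE of the exponential mean
    return -n * math.log(mean) - n


def weibull_loglik(data):
    """Maximized log-likelihood of the 2-parameter Weibull model.
    The shape k is found by bisection on the profile-likelihood equation."""
    n = len(data)
    logs = [math.log(x) for x in data]
    mean_log = sum(logs) / n

    def score(k):
        powered = [x ** k for x in data]
        weighted = sum(p * l for p, l in zip(powered, logs)) / sum(powered)
        return weighted - 1.0 / k - mean_log  # increasing in k, zero at the MLE

    low, high = 0.05, 50.0  # assumed search bracket for the shape
    for _ in range(100):
        mid = 0.5 * (low + high)
        if score(mid) > 0:
            high = mid
        else:
            low = mid
    k = 0.5 * (low + high)
    scale_to_k = sum(x ** k for x in data) / n  # MLE of scale**k given k
    return n * math.log(k) - n * math.log(scale_to_k) + (k - 1) * sum(logs) - n


def likelihood_ratio_test(data):
    """Return (statistic, approximate p-value) for exponential vs. Weibull."""
    stat = max(0.0, 2.0 * (weibull_loglik(data) - exponential_loglik(data)))
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square tail, 1 df
    return stat, p_value


# Hypothetical data: 200 draws from a Weibull with shape 3 (far from exponential).
rng = random.Random(1)
data = [rng.weibullvariate(1.0, 3.0) for _ in range(200)]
stat, p_value = likelihood_ratio_test(data)
print(f"LR statistic = {stat:.1f}, approximate p-value = {p_value:.3g}")
```

A small p-value suggests the extra shape parameter earns its keep for these data; a large one suggests the simpler exponential model is adequate.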
3. Re Continued Innovation and Positive Incentives
to Collect New Data and Develop New Methods
Over the last 15 years, the US EPA has issued innumerable "guidance" manuals that have had
the perverse effect of stopping research and blocking innovation -- all in the name of
[illegible].
In my opinion, our profession of risk assessment stands at a cross-road. The US EPA could
specify, for example, all sorts of numeric criteria for GoF tests -- but the casualties would be (i)
continued development of new ideas and methods, especially the theory and practice of
"second-order" parametric distributions and the theory and practice of "two-dimensional"
simulations, and (ii) the use of expert elicitation and expert judgment.
I again urge the Agency to print this Notice inside the front cover and inside the rear cover of each
Issue Paper / Handbook / Guidance Manual, etc. related to probabilistic analyses — and on the
first Web page housing the electronic version of the Issue Paper / Handbook / Guidance
Manual:
This Issue Paper / Handbook / Guidance Manual contains guidelines and
suggestions for use in probabilistic exposure assessments.
Given the breadth and depth of probabilistic methods and statistics, and given the
rapid development of new probabilistic methods, the Agency cannot list all the
possible techniques that a risk assessor may use for a particular assessment.
The US EPA emphatically encourages the development and application of new
methods in exposure assessments and the collection of new data for exposure
assessments, and nothing in this Issue Paper / Handbook / Guidance Manual can
or should be construed as limiting the development or application of new methods
and/or the collection of new data whose power and sophistication may rival,
improve, or exceed the guidelines contained in this Issue Paper / Handbook /
Guidance Manual.
References
Burmaster & Wilson, 1996
Burmaster, D.E. and A.M. Wilson, 1996, An Introduction to Second-Order Random
Variables in Human Health Risk Assessment, Human and Ecological Risk Assessment,
Volume 2, Number 4, pp 892 - 919
Burmaster & Thompson, 1997
Burmaster, D.E. and K.M. Thompson, 1997, Fitting Second-Order Parametric Distributions
to Data Using Maximum Likelihood Estimation, Human and Ecological Risk Assessment,
in press
P. Barry Ryan

Colleagues-
I read with interest the comments forwarded by Dr. David Burmaster regarding the conference
from last week.
I would like to add a few similar words regarding the codification of any specific values for any
specific goodness-of-fit (GOF) tests.
GOF tests, by their nature, are very restrictive in affording acceptance of a distribution. For
example, the Kolmogorov-Smirnov test chooses the largest difference between the observed
data and the theoretical ranking and tests using that. Unusual occurrences in data, minor
contamination by other distributions, etc., can cause rejection of distributions that otherwise
pass the "duck test" (if it walks like a duck...) even if one point looks a little more like a pigeon.
The GOF test will end up rejecting pretty much everything, leaving one with no choice but to use
EDFs.
Unfortunately, EDFs are not readily amenable to analyses that lend a lot of insight (cf., Wallace,
Duan, and Ziegenfus, 1994). If EPA codifies a fixed value, even in the guise of "guidance,"
pretty soon no pdf will be safe from legal wrangling.
We spent a long time at the workshop fussing over definitions of representativeness, sensitivity,
etc., with little focus on the utility of the techniques. EPA may well be in the difficult position of
having to defend everything from a legal perspective. However, the preoccupation with numbers often
comes at the expense of insight. The role of probabilistic assessments is the latter. Our goal is
to understand exposure and its influence on health, not to focus on a specific value of a GOF
test statistic.
... be used by professionals familiar with the nuances of the problem at hand and the techniques used,
their limitations, and strengths. I object to the cookbook approach to this type of assessments.
I will now step down off my soapbox.
P. Barry Ryan
...nt and Environmental Chemistry
Emory University
Atlanta, ...
(404) 727-5528 (Voice)
(404) 757-8744 (Fax-Work)
bryan@sph.emory.edu
APPENDIX H
PRESENTATION MATERIALS
[Slide H-37, partially legible] ... values of simulated EDF percentiles are equal to the sample ...; ...ate the true mean and variance.
[Slide H-38, partially legible] Linearized EDF (... between observations); Extended EDF (... based on expert judgment ... of the exposure variable); ... exponential ... based on extreme ... behavior of many continuous, unbounded distributions.
Figure 4. Comparison of Basic EDF and Linearly Interpolated EDF (horizontal axis: Random Variate)
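The difference between a basic (step-function) EDF and a linearly interpolated EDF can be sketched in a few lines. This is a minimal illustration under assumptions: the convention used for the interpolated version (straight lines joining the points (x_(i), i/n)) is one common choice and may not be the exact definition used in the presentation, and the five sample values are hypothetical.

```python
from bisect import bisect_right


def basic_edf(sorted_sample, x):
    """Step-function EDF: fraction of observations less than or equal to x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def linear_edf(sorted_sample, x):
    """EDF with straight-line interpolation between adjacent observations.
    Assumed convention: the curve passes through (x_(i), i/n)."""
    n = len(sorted_sample)
    if x < sorted_sample[0]:
        return 0.0
    if x >= sorted_sample[-1]:
        return 1.0
    i = bisect_right(sorted_sample, x)  # number of observations <= x
    x_low, x_high = sorted_sample[i - 1], sorted_sample[i]
    return (i + (x - x_low) / (x_high - x_low)) / n


# Hypothetical sorted sample of a random variate.
sample = [100.0, 150.0, 200.0, 250.0, 300.0]
for x in (125.0, 175.0, 250.0):
    print(x, basic_edf(sample, x), linear_edf(sample, x))
```

The two functions agree at each observed value; between observations the basic EDF stays flat while the interpolated version rises steadily.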
BAYESIAN ANALYSIS OF VARIABILITY AND UNCERTAINTY
OF ARSENIC CONCENTRATIONS IN
U.S. PUBLIC WATER SUPPLIES
John R. Lockwood and Mark J. Schervish
Department of Statistics
Patrick L. Gurian
Department of Engineering & Public Policy
Mitchell J. Small
Departments of Engineering & Public Policy and
Civil & Environmental Eng.
Carnegie Mellon University
presented at
7th Annual Meeting of ISEA
Session 1: Approaches to Uncertainty Analysis
Research Triangle Park, NC
November 3, 1997
Sponsor:
U.S. EPA OGWDW
(Usual disclaimer)
OBJECTIVES
Illustrate use of Bayesian statistical
methods
- variability in an exposure factor
(arsenic concentration) is represented
by a probability distribution model
- uncertainty is characterized by the
probability distribution function of the
model parameters
Illustrate use of probability distribution
model with covariates
(explanatory variables)
- allowing extrapolation to different
target populations
METHODOLOGY
Probability model with covariates
- lognormal distribution with mean of the log-concentration depending on
• region,
• source type (sw vs. gw), and
• size of utility (population served)
- constant variance of the log-concentration
Bayesian methodology
- prior distributions for model parameters
- posterior distribution computed using
Markov Chain Monte Carlo
necessitated by model complexity
and BDL data
Arsenic occurrence databases (numeric columns are not legible in the scanned source):
National Arsenic Occurrence Survey (NAOS): surface and groundwater
National Inorganics and Radionuclides Survey (NIRS): groundwater
...ion of California ...gencies: surface and groundwater
(fourth database, name not legible): surface and groundwater
MODEL SPECIFICATION
$$Y_{ij} = \mu_i + \beta x_{ij} + \gamma \delta_{ij} + \epsilon_{ij}$$
$Y_{ij}$ is the natural logarithm of arsenic
concentration in µg/L at the jth source in the ith region
$\mu_i$ is a constant for the ith region, where i ranges over
the seven geographical regions specified in NAOS
$x_{ij}$ is the natural logarithm of the population
served by the jth source in the ith region (an indicator of
the size and flow rate of the utility source)
$\delta_{ij}$ is 0 if the jth source in the ith region is a surface water
source and 1 if it is a ground water source
$\epsilon_{ij}$ represents those sources of random variation
present at the jth source in the ith region but not
captured by the covariates in the model.
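As a worked illustration of how the model specification is used, the sketch below evaluates the mean log-concentration for one source and the implied probability of exceeding a threshold. It is a simplified sketch under stated assumptions: it plugs in the posterior means reported in this presentation's table of posterior estimates (region 4 constant, the population coefficient, the ground water coefficient, and the variance), which ignores parameter uncertainty; the utility size of 10,000 people, the 5 µg/L threshold, and the function names are hypothetical.

```python
import math

# Posterior means as reported in the presentation's posterior estimates table;
# using them as fixed values is a simplification (parameter uncertainty is ignored).
MU_REGION_4 = -1.76   # regional constant for region 4
BETA = 0.21           # coefficient on ln(population served)
GAMMA = 0.14          # ground water indicator coefficient
SIGMA = math.sqrt(2.23)  # standard deviation of the log-concentration


def mean_log_concentration(mu_i, beta, gamma, population_served, is_groundwater):
    """Mean of ln(arsenic concentration) for one source under the model."""
    delta = 1 if is_groundwater else 0
    return mu_i + beta * math.log(population_served) + gamma * delta


def prob_exceeds(threshold_ug_per_l, mean_log, sigma):
    """P(concentration > threshold) for a lognormal with the given log-scale parameters."""
    z = (math.log(threshold_ug_per_l) - mean_log) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical utility: region 4, serving 10,000 people.
mean_gw = mean_log_concentration(MU_REGION_4, BETA, GAMMA, 10_000, True)
mean_sw = mean_log_concentration(MU_REGION_4, BETA, GAMMA, 10_000, False)
p_gw = prob_exceeds(5.0, mean_gw, SIGMA)

print(f"median concentration (ground water): {math.exp(mean_gw):.2f} ug/L")
print(f"median concentration (surface water): {math.exp(mean_sw):.2f} ug/L")
print(f"P(concentration > 5 ug/L | ground water): {p_gw:.2f}")
```

Because the error term is normal on the log scale, the exponential of the mean log-concentration is the median, not the mean, of the concentration itself.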
US geographic regions based on arsenic NOFs
Source: Frey and Edwards, 1997
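DISTRIBUTIONAL ASSUMPTIONS
In the model
$$Y_{ij} = \mu_i + \beta x_{ij} + \gamma \delta_{ij} + \epsilon_{ij}$$
it is assumed that
$$\mu_i \sim N(\psi, \tau^2), \qquad \epsilon_{ij} \sim N(0, \sigma^2).$$
That is, $\mu_i$ are sampled from a parent normal
distribution (hierarchical model).
The normality assumption of $\epsilon_{ij}$ implies that
conditional on all parameters,
$$Y_{ij} \sim N(\mu_i + \beta x_{ij} + \gamma \delta_{ij}, \sigma^2).$$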
BAYESIAN METHODOLOGY
Probability model: $f(x \mid \theta)$,
where $\theta \in \Theta$ are parameters
Begin with prior distribution $f(\theta)$
Observe sample $X = x$
Compute posterior distribution $f(\theta \mid x) \propto f(x \mid \theta) f(\theta)$
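PRIOR DISTRIBUTIONS
Without substantive prior knowledge about parameters
of hierarchical model, our priors were diffuse.
Legible parameter labels: $\gamma$, $\log(\tau^2)$.
Legible prior distributions: $N(0, 3^2)$, $N(0, 10^2)$, $N(0, 10^2)$, $N(0, 10^2)$, $N(0, 10^2)$.
(The pairing of parameters with distributions is not recoverable from the scanned slide.)
These parameters are assumed independent a priori,
but are dependent in the posterior.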
POSTERIOR ESTIMATES
Parameter     P.M.     P.S.D.
$\mu_1$       -3.13    0.65
$\mu_2$       -3.50    0.61
$\mu_3$       -3.62    0.61
$\mu_4$       -1.76    0.57
$\mu_5$       -1.84    0.59
$\mu_6$       -1.04    0.66
$\mu_7$       -1.41    0.62
$\sigma^2$     2.23    0.21
$\psi$        -2.27    0.74
$\tau^2$       1.76    1.76
$\beta$        0.21    0.05
$\gamma$       0.14    0.19
Figure 2: Scatterplot of $\gamma$ versus $\beta$ from a sample of size 30000 from the joint posterior
distribution.
NATIONAL DISTRIBUTION & UNCERTAINTY
The national distribution of arsenic concentration
measurements is the mixture of all the distributions
from the individual sites:
$$F_{\text{National}} = \frac{1}{N} \sum_{\text{all sites } i} F_i,$$
where N is the total number of sites in the nation.
Similarly for our estimates:
$$\hat{F}_{\text{National}} = \sum_{\text{all sampled sites } i} w_i \hat{F}_i,$$
where $w_i$ is a weight indicating how much of the nation
is represented by site i.
However, $\hat{F}_i$ is uncertain due to uncertainty in model
parameters. The posterior uncertainty in $\hat{F}_i$ is
characterized by the many (equally likely) $\hat{F}_{i,j}$ obtained
by evaluating $\hat{F}_i$ with the parameters in MCMC sample
$j$.
We can then compute the mean, cdf, median, 5th
percentile, 95th percentile, etc. of the distribution of
$\hat{F}_{\text{National}}$.
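The mixture calculation and its uncertainty summary can be sketched as follows. This is an illustration under loudly stated assumptions: the four sites, the equal weights, and especially the parameter draws are hypothetical stand-ins (independent normal draws centered near reported posterior means), not actual MCMC output, and they ignore the posterior correlation between parameters that a real analysis would retain.

```python
import math
import random


def lognormal_cdf(concentration, mean_log, sigma):
    """P(C <= concentration) when ln(C) is normal(mean_log, sigma**2)."""
    z = (math.log(concentration) - mean_log) / sigma
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def national_cdf(concentration, sites, weights, params):
    """Weighted mixture of site-level lognormal cdfs for ONE parameter draw."""
    total = 0.0
    for (region, log_population, groundwater), weight in zip(sites, weights):
        mean_log = (params["mu"][region]
                    + params["beta"] * log_population
                    + params["gamma"] * groundwater)
        total += weight * lognormal_cdf(concentration, mean_log, params["sigma"])
    return total


# Hypothetical sampled sites: (region index, ln(population served), ground water flag).
sites = [
    (0, math.log(5_000), 1),
    (0, math.log(200_000), 0),
    (1, math.log(20_000), 1),
    (1, math.log(1_000), 1),
]
weights = [1.0 / len(sites)] * len(sites)  # assumed equal representation

# Stand-in for MCMC output: independent draws, for illustration only.
rng = random.Random(7)
draws = [
    {"mu": [rng.gauss(-3.1, 0.6), rng.gauss(-1.8, 0.6)],
     "beta": rng.gauss(0.21, 0.05),
     "gamma": rng.gauss(0.14, 0.19),
     "sigma": math.sqrt(2.23)}
    for _ in range(2000)
]

# One national cdf value at 5 ug/L per draw, then summarize across draws.
cdf_at_5 = sorted(national_cdf(5.0, sites, weights, d) for d in draws)
lower, median, upper = (
    cdf_at_5[int(q * (len(cdf_at_5) - 1))] for q in (0.05, 0.50, 0.95)
)
print(f"P(concentration <= 5 ug/L): median {median:.3f}, "
      f"5th-95th percentile {lower:.3f} to {upper:.3f}")
```

Repeating the last step over a grid of concentrations would trace out a central cdf curve with a band around it.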
Figure 3: Posterior cumulative distribution function of national arsenic occurrence in source
water with 90% credible bounds and uncensored NAOS data overlaid. (Horizontal axis: [As], micrograms per liter.)
Figure 4: Posterior cumulative distribution function of the proportion of national arsenic
occurrence less than 5 µg/L. (Horizontal axis: Proportion of Samples Less Than 5 Micrograms per Liter.)
POSTERIOR ESTIMATES: Alternative Model
Supposing that $\gamma$ should be positive, we kept all priors
the same as before except took the prior for $\log(\gamma)$ to
be $N(0, 10^2)$.
Parameter     P.M.     P.S.D.
$\mu_1$       -2.78    0.55
$\mu_2$       -3.17    0.51
$\mu_3$       -3.27    0.49
$\mu_4$       -1.42    0.44
$\mu_5$       -1.50    0.48
$\mu_6$       -0.71    0.54
$\mu_7$       -1.04    0.48
$\sigma^2$     2.22    0.20
$\psi$        -1.94    0.65
$\tau^2$       1.74    1.75
$\beta$        0.18    0.04
$\gamma$       0.03    0.07
Figure 7: Scatterplot of $\gamma$ versus $\beta$ from a sample of size 30000 from the joint posterior
distribution when $\gamma$ is forced to be positive.
Figure 8: Posterior cumulative distribution function of national arsenic occurrence in source
water with 90% credible bounds and uncensored NAOS data overlaid. Plot based on
posterior when $\gamma$ is forced to be positive.
SUMMARY
Bayesian methodology provides a
powerful method for characterizing
variability and uncertainty in
exposure factors
— effect of alternative priors can be
investigated in a diagnostic manner
- though don't try this at home alone
(without a competent statistician)
Probability distribution model with
covariates provides insights, and a
basis for extrapolation to other
targeted populations or
subpopulations.
Bayesian Analysis of Variability and Uncertainty of
Arsenic Concentrations in U.S. Public Water Supplies
John R. Lockwood
Mark J. Schervish
Department of Statistics
Patrick L. Gurian
Department of Engineering & Public Policy
Mitchell J. Small
Departments of Engineering & Public Policy
and Civil & Environmental Engineering
Carnegie Mellon University
presented at
Seventh Annual Meeting of the International Society of Exposure Analysis
Research Triangle Park, NC
November 3, 1997
(Session 1. Approaches to Uncertainty Analysis)
Bayesian Analysis of Variability and Uncertainty of
Arsenic Concentrations in U.S. Public Water Supplies
John R. Lockwood1,
Mark J. Schervish1,
Patrick L. Gurian2
and Mitchell J. Small3
The risk of skin and other possible cancers associated with arsenic in drinking water has
made this problem a top priority for research and regulation for the U.S. EPA, as part of
implementation of the Safe Drinking Water Act amendments of 1986 and 1996. To assess
the costs, benefits and residual risks of alternative maximum contaminant levels (MCL's) for
arsenic, it is important to characterize the current national distribution of arsenic concentra-
tions in the U.S. water supply. This paper describes a Bayesian methodology for estimating
this distribution and its dependence on covariates, including the source region, type (surface
vs. ground water) and size of the source. The uncertainty of the fitted distribution is also
described, thereby depicting the uncertainty in the proportion of utilities with concentrations
above a given MCL. This paper describes the first stage of this assessment, based on a sample
of concentrations from source water drawn by utilities. Subsequent analyses will incorporate
the distribution and effectiveness of current treatment practices for reducing arsenic, and
include available data sets of finished water quality to estimate the arsenic concentration
distribution in water supplied to consumers.
Using arsenic concentration data for source (raw) water reported by 441 utilities from the
National Arsenic Occurrence Survey (NAOS) (Frey and Edwards, 1997), we fit a Bayesian
model to describe arsenic concentrations based on source characteristics. The model allows
for both the formation of a national estimate of arsenic occurrence and the quantification of
the uncertainty associated with this estimate. The specification of the model is
$$Y_{ij} = \mu_i + \beta x_{ij} + \gamma \delta_{ij} + \epsilon_{ij}$$
where
$Y_{ij}$ is the natural logarithm of arsenic concentration in µg/L at the jth source in the ith region
$\mu_i$ is a constant for the ith region, where i ranges over the seven geographical regions
specified in NAOS
$x_{ij}$ is the natural logarithm of the population served by the jth source in the ith region (an
indicator of the size and flow rate of the utility source)
$\delta_{ij}$ is 0 if the jth source in the ith region is a surface water source and 1 if it is a ground water
source
1 Department of Statistics, Carnegie Mellon University.
2 Department of Engineering and Public Policy, Carnegie Mellon University.
3 Departments of Engineering and Public Policy and Civil and Environmental Engineering, Carnegie
Mellon University.
$\epsilon_{ij}$ represents those sources of random variation present at the jth source in the ith region
but not captured by the covariates in the model.
Furthermore, we model the values $\mu_i$ as independent normal random variables with mean
$\psi$ and variance $\tau^2$. The national distribution of arsenic in source water is thus modeled as a
mixture of lognormals with the mean of the log-concentration equal to $\mu_i + \beta x_{ij} + \gamma \delta_{ij}$ and the
standard deviation of the log-concentration equal to $\sigma$. The resulting distribution depends
upon the number of utilities in each of the seven regions ($i$), their service populations $x_{ij}$ and
the respective numbers drawing water from surface ($\delta_{ij} = 0$) vs. ground ($\delta_{ij} = 1$) water
(for now, the sample is assumed to be representative of the national distribution, though
the predicted distribution can be readily modified to reflect a different distribution of the
covariates in the target population).
To characterize the uncertainty of the fitted national distribution, we use vague prior
distributions for the parameters $\psi$, $\tau$, $\beta$, $\gamma$, $\sigma$ and employ the Markov Chain Monte Carlo
methodology (Gilks et al., 1996) to compute and simulate realizations from the posterior
distribution of the parameters. Posterior uncertainty distributions of all quantities of interest
can be calculated from these realizations.
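The text does not say which MCMC sampler was used. As a rough illustration of how posterior realizations can be generated, the sketch below applies a random-walk Metropolis sampler to a deliberately simplified version of the model: a single region, no ground-water term, a known residual standard deviation, and flat priors. The data are synthetic, the covariate is centered to improve mixing, and all names and tuning values are assumptions made for this example.

```python
import math
import random

def log_posterior(mu, beta, xs_centered, ys, sigma):
    """Log posterior (up to a constant) under flat priors on mu and beta."""
    sse = sum((y - mu - beta * x) ** 2 for x, y in zip(xs_centered, ys))
    return -sse / (2.0 * sigma ** 2)

def metropolis(xs_centered, ys, sigma, n_iter, step_mu, step_beta, rng):
    """Random-walk Metropolis sampler for (mu, beta)."""
    mu, beta = 0.0, 0.0
    current = log_posterior(mu, beta, xs_centered, ys, sigma)
    draws = []
    for _ in range(n_iter):
        mu_new = mu + rng.gauss(0.0, step_mu)
        beta_new = beta + rng.gauss(0.0, step_beta)
        proposed = log_posterior(mu_new, beta_new, xs_centered, ys, sigma)
        # Accept with probability min(1, exp(proposed - current)).
        if math.log(1.0 - rng.random()) < proposed - current:
            mu, beta, current = mu_new, beta_new, proposed
        draws.append((mu, beta))
    return draws

rng = random.Random(1)

# Synthetic data from hypothetical "true" values (not NAOS data).
sigma = 1.5
xs = [rng.uniform(5.0, 12.0) for _ in range(200)]       # ln(population served)
ys = [-3.0 + 0.2 * x + rng.gauss(0.0, sigma) for x in xs]
x_bar = sum(xs) / len(xs)
xs_centered = [x - x_bar for x in xs]

draws = metropolis(xs_centered, ys, sigma, n_iter=6000,
                   step_mu=0.1, step_beta=0.05, rng=rng)
kept = draws[1000:]                                      # discard burn-in
mu_mean = sum(d[0] for d in kept) / len(kept)
beta_mean = sum(d[1] for d in kept) / len(kept)
print("posterior mean of centered intercept:", mu_mean)
print("posterior mean of beta:", beta_mean)
```

Any quantity of interest can then be evaluated at each retained draw, which gives its posterior uncertainty distribution. The full model adds the regional constants, the ground-water coefficient, the variance parameters and the treatment of measurements below the detection limit, none of which are handled here.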
Table 1 lists the posterior means and posterior standard deviations for the fitted model
parameters. The mean values indicate that
• arsenic concentrations are generally higher in the west than in the east (the posterior
means of $\mu_4$, $\mu_5$, $\mu_6$ and $\mu_7$ are greater than the posterior means of $\mu_1$, $\mu_2$ and $\mu_3$)
• arsenic concentrations tend to be higher in source waters of larger utilities (the posterior
mean of $\beta$ is positive)
• arsenic concentrations are higher in ground water than in surface water (the posterior
mean of $\gamma$ is positive, though there is significant uncertainty in this result since the
posterior standard deviation of $\gamma$ is greater than the posterior mean)
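Because the model is linear in the log-concentration, each coefficient acts multiplicatively on the median concentration. The short sketch below converts the posterior means in Table 1 into approximate multiplicative factors. Plugging in posterior means is a simplification: it ignores posterior uncertainty and the covariance among parameters, so the results are indicative only.

```python
import math

# Posterior means and standard deviation as listed in Table 1.
beta_mean = 0.21
gamma_mean = 0.14
gamma_sd = 0.19
mu_east = [-3.18, -3.51, -3.66]          # regions 1-3
mu_west = [-1.78, -1.89, -1.10, -1.47]   # regions 4-7

# x is the natural log of population, so a tenfold larger population
# multiplies the median concentration by 10 ** beta.
tenfold_population_factor = 10.0 ** beta_mean

# delta switches from 0 (surface) to 1 (ground), multiplying the median by exp(gamma).
ground_vs_surface_factor = math.exp(gamma_mean)

# Ratio of posterior mean to posterior standard deviation for gamma;
# a value below 1 reflects the large uncertainty noted in the text.
gamma_mean_over_sd = gamma_mean / gamma_sd

# Smallest west-to-east ratio of median concentrations, other covariates equal.
smallest_west_east_ratio = math.exp(min(mu_west) - max(mu_east))

print(tenfold_population_factor, ground_vs_surface_factor,
      gamma_mean_over_sd, smallest_west_east_ratio)
```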
The uncertainty in the fitted national distribution is characterized by the standard
deviations of the parameters shown in Table 1 and by the covariance of the parameters in the
posterior joint distribution. Figures 1 and 2 illustrate this covariance for two of the
parameter pairs: ($\beta$, $\psi$) and ($\beta$, $\gamma$), respectively. These covariances are of the type that commonly
arises in parameter estimation; for example, the positive association between higher $\beta$ (which
results in higher predicted arsenic concentrations) and lower $\psi$ (which corresponds to lower
values of the $\mu_i$ and lower predicted arsenic concentrations) is necessary to maintain the
match to the observed sample values.
The national distribution is synthesized by sampling the joint parameter space (i.e., the
points in Figures 1 and 2 and the associated points for the other model parameters) to
generate many possible distributions. For each, the cumulative distribution function (cdf)
at a particular value of the arsenic concentration (exp(Y)) is computed as the average of
the predicted cdf's for each measurement in the original sample of 441, based on its model
covariates (or, the covariates for each utility in the target population, if these differ from the
sample). The multiple cdf's generated from the parameter space describe the uncertainty
of the national variability distribution. The median of the uncertainty distribution is one
Table 1: Posterior means and standard deviations of parameters. The regions (subscripts)
are 1=New England, 2=Mid-Atlantic, 3=Southeast, 4=Midwest Central, 5=South Central,
6=North Central, 7=West.

Parameter     Posterior Mean    Posterior Standard Deviation
$\mu_1$       -3.18             0.67
$\mu_2$       -3.51             0.62
$\mu_3$       -3.66             0.63
$\mu_4$       -1.78             0.59
$\mu_5$       -1.89             0.62
$\mu_6$       -1.10             0.67
$\mu_7$       -1.47             0.64
$\sigma^2$     2.17             0.20
$\psi$        -2.30             0.76
$\tau^2$       1.74             1.77
$\beta$        0.21             0.05
$\gamma$       0.14             0.19

choice for a single estimate of the national distribution. This median distribution is shown
in Figure 3, along with corresponding 5th and 95th percentiles and the observed distribution
of the original data set. The fitted distribution closely matches the observed distribution,
including the result that 37% of the sample is at or below the arsenic detection limit of
0.5 μg/L. The full uncertainty distribution for the proportion of the national population
below one particular value of the arsenic concentration (5 μg/L) is shown in Figure 4, where
this proportion is indicated to range from about 0.79 to 0.87, with a median of 0.83. This
characterizes the uncertainty in the proportion of utilities requiring treatment of their source
water to meet an MCL of 5 μg/L.
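The synthesis step described above can be sketched as follows: for each parameter draw, the national cdf at a chosen concentration is the average of the per-source normal cdf's of the log-concentration, and the spread of that value across draws gives the uncertainty distribution. This is a minimal illustration under stated assumptions, not the authors' code. The sources are hypothetical, and the stand-in draws are sampled independently from normals using the Table 1 means and standard deviations, which ignores the posterior covariance that real MCMC output would carry. The printed numbers are therefore not expected to reproduce Figure 4.

```python
import math
import random
import statistics

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def national_cdf(conc, draw, sources):
    """Average over sources of P(concentration <= conc) under one parameter draw."""
    log_c = math.log(conc)
    total = 0.0
    for region, x_ij, delta_ij in sources:
        mean_log = draw["mu"][region] + draw["beta"] * x_ij + draw["gamma"] * delta_ij
        total += normal_cdf((log_c - mean_log) / draw["sigma"])
    return total / len(sources)

rng = random.Random(42)

# Hypothetical target population: (region index 0-6, ln population served, ground-water flag).
sources = [(rng.randrange(7), math.log(rng.uniform(500, 500000)), rng.randrange(2))
           for _ in range(200)]

# Stand-in posterior draws (independent normals; real draws would come from MCMC).
mu_mean = [-3.18, -3.51, -3.66, -1.78, -1.89, -1.10, -1.47]
mu_sd = [0.67, 0.62, 0.63, 0.59, 0.62, 0.67, 0.64]
draws = []
for _ in range(500):
    sigma2 = max(rng.gauss(2.17, 0.20), 1e-6)   # guard against a non-positive variance
    draws.append({
        "mu": [rng.gauss(m, s) for m, s in zip(mu_mean, mu_sd)],
        "beta": rng.gauss(0.21, 0.05),
        "gamma": rng.gauss(0.14, 0.19),
        "sigma": math.sqrt(sigma2),
    })

# Uncertainty distribution of the proportion of sources at or below 5 micrograms per liter.
threshold = 5.0
proportions = [national_cdf(threshold, d, sources) for d in draws]
cuts = statistics.quantiles(proportions, n=20)   # 19 cut points: 5%, 10%, ..., 95%
lower, median, upper = cuts[0], statistics.median(proportions), cuts[-1]
print("5th percentile, median, 95th percentile:", lower, median, upper)
```

Repeating the calculation over a grid of concentrations, rather than the single threshold used here, would trace out the median cdf and its 5th and 95th percentile bounds.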
Acknowledgment: This work is sponsored by the U.S. EPA Office of Ground Water
and Drinking Water, Standards and Risk Management Division. The paper has not been
subject to EPA peer review, and the views expressed are solely those of the authors.
References
Frey, M. M. and M. A. Edwards (1997). Surveying arsenic occurrence. Jour. AWWA, 89(3),
105-117.
Gilks, W. R., S. Richardson and D. J. Spiegelhalter, eds (1996). Markov Chain Monte
Carlo in Practice. Chapman and Hall, London.
Figure 1: Scatterplot of $\psi$ versus $\beta$ from a sample of size 5000 from the joint posterior
distribution (horizontal axis: Beta)
Figure 2: Scatterplot of $\gamma$ versus $\beta$ from a sample of size 5000 from the joint posterior
distribution (horizontal axis: Beta)
Figure 3: Posterior cumulative distribution function of national arsenic occurrence in source
water with 90% credible bounds and uncensored NAOS data overlaid (horizontal axis: [As],
micrograms per liter)
Figure 4: Posterior cumulative distribution function of the proportion of national arsenic
occurrence less than 5 μg/L (horizontal axis: Proportion of Samples Less Than 5 Micrograms
per Liter)
U.S. GOVERNMENT PRINTING OFFICE: 1999 - 750-101/00039