

REGULATORY PROGRAM OF THE UNITED STATES GOVERNMENT

distinguish serious risks from trivial hazards. Since quantitative risk assessment plays such a significant role in risk management, there is a continuing need to comply with the NAS recommendations, both to assure an effective ordering of regulatory priorities and to maintain public confidence in the risk management process.

In this section we briefly describe several important areas that continue to pose significant problems in preparing quantitative risk assessments. These problems arise both in measuring hazard and in estimating the extent of human exposure. We emphasize potentially carcinogenic chemicals because the prevention and cure of cancer play a major role in health policy issues.

ALTERNATIVE RISK ASSESSMENT METHODOLOGIES

Risk assessments of chemical substances in general (and of possible carcinogens in particular) consist of a mixture of facts, models, and assumptions. Facts are beyond dispute, of course, but there is considerable debate concerning the scientific merits of the models and assumptions commonly used in risk assessments. In some cases, a scientific consensus has developed to support a particular model or assumption. In other instances, however, certain models and assumptions are relied upon because they reflect past practices rather than the leading edge of science. Furthermore, a scientific basis for several of the most critical models and assumptions simply does not exist.

Most scientists agree that these models and assumptions impart a conservative bias; that is, they lead to risk estimates that are much higher than the most likely level of risk. In this section we catalogue some of the more significant places at which conservative biases emerge in quantitative risk assessment.

Contemporary risk assessment relies heavily upon animal bioassay and epidemiology. Each approach has theoretical advantages and disadvantages. In practice, both can be abused to achieve preestablished conclusions.

OSTP Guidelines, Guideline 8, p. 10376.

Animal Bioassay

Animal testing enables scientists to estimate risks ex ante, before human health effects materialize, whereas epidemiological studies can detect such effects only ex post. In addition, animal tests can be conducted under tightly controlled laboratory conditions. This yields more reliable estimates of exposure and avoids many of the confounding factors that often plague epidemiological investigations. The relatively short lifetimes of experimental mammals such as rats and mice allow scientists to ascertain the possible effects of long-term exposure in just a few years.

Animal testing suffers from serious limitations, however, arising from certain critical assumptions in the methodology. The most important of these is that results can be meaningfully extrapolated from test animals to humans. Despite its routine application, there is no accepted scientific basis for this assumption. Some scientists believe that it should not be used in assessing human health risks.

Another critical limitation is the reliance on very high doses to generate adverse effects in test animals. A mathematical model must be used to bridge the gap between these high-dose exposures and the low-dose exposures more typically faced by people. Many different mathematical models can be constructed to fit the data at high doses. These models often vary enormously, however, in their predictions of risk at low doses.
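To illustrate how two models can agree at the tested dose yet diverge at low doses, the minimal Python sketch below calibrates a one-hit model (linear at low dose) and a quadratic model to the same hypothetical observation: a 50 percent tumor response at a dose of 100, in arbitrary units. Both models and the data point are illustrative assumptions, not the models or data used in any particular assessment.

```python
import math

# Hypothetical bioassay result (illustrative only): 50% tumor response at dose 100.
HIGH_DOSE = 100.0
HIGH_RESPONSE = 0.5


def one_hit(dose: float, q: float) -> float:
    """One-hit model, P(d) = 1 - exp(-q*d); approximately linear at low dose."""
    return -math.expm1(-q * dose)


def quadratic(dose: float, b: float) -> float:
    """Quadratic model, P(d) = 1 - exp(-b*d^2); falls off with the square of dose."""
    return -math.expm1(-b * dose ** 2)


# Calibrate each model so that it reproduces the single high-dose observation exactly.
q = -math.log1p(-HIGH_RESPONSE) / HIGH_DOSE
b = -math.log1p(-HIGH_RESPONSE) / HIGH_DOSE ** 2

for dose in (100.0, 10.0, 1.0, 0.1, 0.01):
    r1, r2 = one_hit(dose, q), quadratic(dose, b)
    print(f"dose {dose:>6}: one-hit {r1:.2e}, quadratic {r2:.2e}, ratio {r1 / r2:,.0f}")
```

The one-hit model falls off in proportion to dose, while the quadratic model falls off with the square of dose. Below the tested level, their predictions therefore differ by roughly a factor of ten for every tenfold reduction in dose, even though both fit the high-dose observation perfectly.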


Beyond these unavoidable methodological constraints, the results of animal bioassays may be subject to conflicting scientific interpretation or strongly influenced by the choice of research method. Tissue preparation and histology present obvious opportunities for error, as experts may disagree about how slides should be interpreted. This problem generally is not significant at high doses, where malignancies are often obvious. At low doses, however, pathologists often differ in how they distinguish tumors from hyperplasia. Subjectivity cannot be avoided where such interpretations of the data must be made.

See, e.g., Bruce Ames, Renae Magaw, and Lois Swirsky Gold, "Ranking Possible Carcinogenic Hazards," Science, Vol. 236, April 17, 1987; Gio Batta Gori, "The Regulation of Carcinogenic Hazards," Science, Vol. 208, April 18, 1980.

OSTP Guidelines, Guideline 11, p. 10377.

In the original analysis of the rat bioassay used to derive the dose-response function for dioxin, 9 of 85 controls were said to have developed liver tumors. An independent review of these data resulted in 16 of the 86 controls being classified as having such tumors. See U.S. Environmental Protection Agency, A Cancer Risk-Specific Dose Estimate for 2,3,7,8-TCDD, Appendix A, EPA/600/6-88/007Ab, June 1988 (hereinafter, Dioxin Risk Assessment Appendix A), pp. 2-3.

Calin N. Park and Ronald D. Bree, "Quantitative Risk Assessment: State-of-the-Art for Carcinogenesis," Chapter 4 in Risk Management

Epidemiology

OVERVIEW

Epidemiology is attractive because it largely avoids these two problems. It focuses on observable human health effects instead of on hypothesized outcomes based on animal experimentation, and it relies upon real-world exposures to generate empirical data. Many of the serious problems associated with animal studies can be avoided, allowing researchers to develop risk estimates that are directly related to human health.

Unfortunately, epidemiological research suffers from its own set of limitations. For example, retrospective studies often have difficulty correlating morbidity and mortality with exposure to specific substances. Exposure data are commonly lacking, incomplete, imprecise, or affected by systematic recall or selection biases. Furthermore, the risks these studies seek to detect are often very small relative to background, making statistically significant effects difficult to observe. When health effects are latent, correlating exposure with illness is even harder.

Besides these methodological limitations, epidemiological studies occasionally suffer from outright bias. Many studies employ scientifically questionable procedures, often aimed at demonstrating positive relationships between specific substances and human illness. Some researchers use inappropriate statistical procedures to "mine" existing databases in search of associations. This increases the likelihood that observed statistical relationships are merely spurious artifacts. One result of these phenomena is that epidemiological studies often display contradictory results.
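Elementary probability shows why database "mining" inflates spurious findings. The sketch below assumes that every association examined is truly null, that the tests are statistically independent, and that a conventional 5 percent significance level is used. Real databases rarely satisfy the independence assumption, so the figures are illustrative only, and the function name is hypothetical.

```python
def prob_any_spurious(n_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n independent tests of true-null
    associations comes out 'statistically significant' at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


for n in (1, 5, 20, 100):
    print(f"{n:>3} comparisons: P(at least one spurious association) = {prob_any_spurious(n):.2f}")
```

Under these assumptions, 20 comparisons already give about a 64 percent chance of at least one false "finding," and 100 comparisons make one nearly certain.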

Despite these constraints, properly conducted animal bioassays and epidemiological studies both have useful roles to play in quantitative risk assessment. Indeed, they are complementary. The usual weaknesses of epidemiological investigations (unreliable exposure data, confounding effects) are readily avoided in laboratory experiments on animals. The weaknesses of animal bioassays (high-to-low-dose extrapolation, animal-to-man conversion) do not arise in epidemiological studies. Careful risk assessment incorporates both types of analysis to ensure that the emerging picture of human health risk is as complete as possible, and that inferences derived from this picture are themselves internally consistent.

ISSUES IN RISK ASSESSMENTS DERIVED LARGELY FROM ANIMAL BIOASSAYS

Animal bioassays tend to dominate current risk assessments. An important reason for this is that the derivation of dose-response relationships is a critical regulatory motive for performing quantitative risk assessment. Animal studies are ideally suited to serve this purpose by virtue of the controlled conditions under which dose and response can be calibrated. Epidemiological studies often are relegated to providing merely a "reality check" to ensure that the implications of animal bioassays are plausibly consistent with real-world experience. Because of this heavy emphasis on animal testing, we focus on several major problems that arise with respect to risk assessments primarily based on the results of animal bioassays.

The Use of Sensitive Test Animals

To enhance the power of animal tests, scientists typically rely on genetically sensitive test animals. It is unclear whether these species accurately mimic biological responses in humans.

Some test species are extremely sensitive. For example, approximately one-third of all male B6C3F1 mice, a common test species, spontaneously develop liver tumors. The same phenomenon occurred in an important bioassay concerning dioxin using female Sprague-Dawley (Spartan) rats. Tumors observed in dosed animals were predominantly located in the liver. However, approximately one-fifth of the animals in the control group also developed liver tumors. The relevance of elevated liver tumors in hypersensitive species has been questioned by scientists and is

Alvan R. Feinstein, "Scientific Standards in Epidemiological Studies of the Menace of Daily Life," Science, Vol. 242, December 2, 1988, pp. 1257-1263.

Linda C. Mayes, Ralph I. Horwitz, and Alvan R. Feinstein, "A Collection of 56 Topics with Contradictory Results in Case-Control Research," International Journal of Epidemiology, Vol. 17, No. 3 (1988), pp. 680-685.

Ames et al., op. cit., p. 276.


The reliance on sensitive test animals also biases risk assessments in a more subtle way. It establishes powerful incentives to search for and develop increasingly sensitive test species. As test animals become more sensitive, repeated testing using identical protocols will tend to result in higher and higher estimates of risk even if all other factors are held constant.

Selective Use of Alternative Studies

In their respective risk assessment guidelines, both OSTP and EPA recommend that relevant animal studies be considered irrespective of whether they indicate a positive relationship. In practice, however, studies that demonstrate a statistically significant positive relationship routinely receive more weight than studies that indicate no relationship at all. For example, the pesticide daminozide (Alar) and its metabolite unsymmetrical 1,1-dimethylhydrazine (UDMH) recently received B2 classifications ("probable human carcinogen"). Each of these classifications was based on a single positive animal bioassay. Overcoming such a classification requires, at a minimum, two essentially identical studies showing no such relationship. In the case of Alar and UDMH, however, a more stringent test apparently applied: three high-quality negative studies showed no significant effects, yet these studies appear to have received little or no weight in the classification decision.

Selective Interpretation of Results

Risk assessment guidelines generally give the greatest weight to the most sensitive test animals. Thus, if a substance has been found to cause cancer in one species or gender but shown to exhibit no effects elsewhere, the results pertaining to the sensitive species or gender typically will be used to develop estimates of risks to human health. For example, if male mice develop cancer from a substance but female mice and rats of both genders do not, then the results from the male mouse often will be used to derive estimates of cancer risks to humans.

Once a positive result has been obtained in an animal bioassay, a substance often will be provisionally classified as a probable human carcinogen. The statistical burden of proof then shifts to the no-effect hypothesis. Because it is logically impossible to prove a negative, however, this practice establishes a virtually irrebuttable presumption in favor of the carcinogenesis hypothesis.

See Ames et al., op. cit., p. 276 (arguing that such data are irrelevant); OSTP Guidelines, Guideline 8, p. 10377 (concluding that the data "must be approached carefully"); and EPA Carcinogen Risk Assessment Guidelines, p. 33995 (making the policy judgment that such data are sufficient evidence of carcinogenesis).

See OSTP Guidelines, Guideline 25, p. 10378; EPA Carcinogen Risk Assessment Guidelines, p. 33995.

See EPA Carcinogen Risk Assessment Guidelines, pp. 33999-34000. A single animal test that shows a positive result "to an unusual degree" (p. 33999) is sufficient to warrant at least a B2 classification ("probable human carcinogen"), even if this result occurs in a species known to have a high rate of spontaneous tumors. A strong animal bioassay or epidemiological study showing no evidence of carcinogenic effect cannot overcome this presumption (p. 34000).

See Second Peer Review of Daminozide (Alar) and UDMH (Unsymmetrical 1,1-dimethylhydrazine), Memorandum from John A. Quest to Mark Boodes, U.S. Environmental Protection Agency, OPTS, May 15, 1989 (hereinafter, Alar/UDMH Internal Peer Review No. 2). This internal OPTS panel reviewed several recent studies on Alar and UDMH.

One study of Alar yielded a statistically significant increase in common lung tumors in mice, but only for one of three dosage levels. Results were not statistically significant at one higher and two lower dosages, and controls also displayed unusually high tumor incidence. 90% of the lung tumors in dosed mice were benign, versus 89% in the controls.

One study of UDMH yielded statistically significant increases in common lung and uncommon liver tumors in mice, but only for the higher of two dosages. 97% of the lung tumors in dosed mice were benign, versus 100% in the controls. 29% of the liver tumors in dosed mice were benign; no tumors were observed in the controls.

Prior studies that purported to show a carcinogenic response had been judged inadequate by EPA's Scientific Advisory Panel, an external peer review group. The Office of Pesticides and Toxic Substances (OPTS) panel noted that a different internal EPA risk assessment panel (the Carcinogen Assessment Group) considered these studies sufficient to justify B2 classifications when it evaluated them for EPA's Office of Solid Waste and Emergency Response. Despite the scientific controversy, the OPTS panel interpreted these prior studies as "supportive evidence" under EPA's risk assessment guidelines.

See EPA Carcinogen Risk Assessment Guidelines, p. 33995 (establishing the need for replicate identical studies showing no effect), and p. 33999 (establishing the minimum requirement of two well-designed studies showing no increased tumor incidence to warrant a "no evidence" determination).

Alar/UDMH Internal Peer Review No. 2, pp. 6, 8, 9. EPA's scheme for carcinogen classification is itself an issue among scientists. See, e.g., U.S. Environmental Protection Agency, Risk Assessment Forum, Workshop Report on EPA Guidelines for Carcinogen Risk Assessment, EPA/625/3-89/016, Washington, DC: Author, March 1989, pp. 21-26.

See EPA Carcinogen Risk Assessment Guidelines, p. 33997 (data from long-term animal studies showing the greatest sensitivity

Severe Testing Conditions

Current risk assessment protocols require the use of very high doses. Unfortunately, high doses are often toxic for reasons unrelated to their capacity to cause cancer. A common procedure is to use what is called the maximum tolerated dose (MTD), which is the most that can be administered to a test animal without causing acute toxicity. At such exposure levels, substances often cause severe inflammation and chronic cell killing. For example, formaldehyde causes nasal tumors in rats when administered in high doses. However, MTD administration severely inflames nasal passage tissues. It is therefore unclear whether the cancers induced are caused by formaldehyde per se or by the toxic effects of high doses.

Results such as these have caused some scientists to question the validity of rodent tests performed at the MTD for estimating human health risks that arise from exposure at low doses. By combining very high doses with highly sensitive test subjects, some animal bioassays are predisposed to discover apparent carcinogenic effects.

Relevance of Animal Bioassay Results

An important reason why animals vary in their sensitivity is that they have different physiologies, metabolic processes, reproductive cycles, and a host of other species-specific characteristics that largely result from unique evolutionary paths. Each of these factors needs to be carefully considered in evaluating the significance of animal data with respect to human health. This is recognized in both the OSTP and EPA guidelines, but it is often neglected when the guidelines are applied to specific substances.

See, e.g., Ames et al., op. cit., pp. 276-277.


The most important assumption in this regard is that animal test results can be meaningfully extrapolated to humans. A recent study of chemicals tested under the auspices of the U.S. National Toxicology Program shows that this assumption can lead to the erroneous classification of many chemicals as probable human carcinogens. Positive associations have been obtained in either rats or mice for half of the 214 chemicals tested. However, results were consistent across these two genetically similar species only 70 percent of the time. If it is assumed that rodent bioassays have the same sensitivity and selectivity with respect to human carcinogens as they do between rodent species, and it is further assumed that 10 percent of all chemicals are in fact human carcinogens, then 27 of every 100 randomly selected chemicals would be misclassified as probable human carcinogens. Only three would be misclassified as noncarcinogens. Thus, "false positives" would be 9 times more common than "false negatives."

Of course, this ratio of false positives to false negatives reflects highly conservative "upper-bound" assumptions concerning sensitivity and selectivity. Given the high degree of similarity between rats and mice and the limited resemblance between rodents and humans, the sensitivity of rodent bioassays is probably much lower than 70 percent. Furthermore, other research indicates that selectivity may be as low as five percent. Adjusting only for this lower selectivity suggests that false positives are almost 30 times more common than false negatives. This raises serious questions concerning the practical utility of animal bioassays for the purpose of quantitative risk assessment.
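The misclassification figures above follow from simple expected-value arithmetic, sketched below. As in the text, the sketch assumes that 10 percent of chemicals are true human carcinogens. "Sensitivity" is the share of true carcinogens a test flags, and "selectivity" (what is now usually called specificity) is the share of non-carcinogens it correctly clears. The function and scenario labels are illustrative, not taken from Lave et al.

```python
def misclassification_counts(prevalence, sensitivity, selectivity, n=100):
    """Expected (false positives, false negatives) among n randomly chosen chemicals."""
    carcinogens = n * prevalence
    non_carcinogens = n - carcinogens
    false_negatives = carcinogens * (1 - sensitivity)    # carcinogens the test misses
    false_positives = non_carcinogens * (1 - selectivity)  # non-carcinogens wrongly flagged
    return false_positives, false_negatives


scenarios = {
    "sensitivity 70%, selectivity 70%": (0.70, 0.70),
    "sensitivity 70%, selectivity 5%": (0.70, 0.05),
    "sensitivity 10%, selectivity 5%": (0.10, 0.05),
}
for label, (sens, sel) in scenarios.items():
    fp, fn = misclassification_counts(0.10, sens, sel)
    print(f"{label}: {fp:.1f} false positives, {fn:.1f} false negatives, ratio {fp / fn:.1f} to 1")
```

The first scenario reproduces the 27 false positives, 3 false negatives, and 9 to 1 ratio. Lowering selectivity to 5 percent gives 85.5 versus 3 (about 28.5 to 1). Lowering sensitivity to 10 percent as well gives 85.5 versus 9, the 9.5 to 1 ratio cited from Lave et al.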

Other factors should also be considered when relying upon animal bioassay results as the primary basis for quantitative risk assessments. For example, certain substances are toxic or even carcinogenic by one pathway but not by others. Nevertheless, animal bioassay protocols often emphasize the most sensitive pathway. As long as human exposure is likely to arise the same way, this choice may be reasonable. However, the pathway to which the test species is sensitive sometimes reflects an exposure route that is implausible or irrelevant for humans. For example, formaldehyde causes nasal tumors in rats at 12 times the rate observed in the next most sensitive animal species. This extreme sensitivity may be related to the fact that rats breathe only through the nose.

OSTP Guidelines, Guideline 25, p. 10378; EPA Carcinogen Risk Assessment Guidelines, p. 34003 (responding to comments on the draft guidelines and affirming agreement with OSTP Guideline 25).

Lester B. Lave, Fanny K. Ennever, Herbert S. Rosenkranz, and Gilbert S. Omenn, "Information Value of the Rodent Bioassay," Nature, Vol. 336 (December 15, 1988), pp. 631-633.

44. False negatives occur when a test fails to detect effects that are in fact present. Sensitivity refers to the capacity of a test to minimize false negatives. False positives occur when a test appears to detect effects that are in fact absent. Selectivity refers to a test's ability to minimize false positives. The 9 to 1 ratio of false positives to false negatives calculated by Lave et al. assumes that both selectivity and sensitivity equal about 70%.

45. Lave et al., op. cit., p. 631. Adjusting also for lower sensitivity reduces the ratio of false positives to false negatives. For example, if sensitivity is only 10 percent and all other parameters remain unchanged, this ratio declines to 9.5 to 1. However, this implies that both types of statistical errors are rampant, which raises questions concerning the overall utility of animal bioassays. This is, in fact, precisely the concern raised by Lave et al., op. cit., who conclude that such tests are cost-effective investments in information only under

There may be important differences between animals and humans that make specific tumors irrelevant. For example, some chemicals cause cancer in the Zymbal gland of the rat; because humans lack such a gland, it is unclear whether these results matter in estimating human health risk. Other substances induce cancer through biochemical mechanisms not found in humans.

A greater controversy surrounds the question of whether the same weight should be given to benign and malignant tumors. The scientific consensus is that benign and malignant tumors should be aggregated only when it is scientifically defensible to do so. In practice, however, benign and malignant tumors are routinely aggregated unless a strong case can be made against the practice. The difference between these default assumptions is significant: one approach counts only carcinomas that are present, whereas the other also counts tumors that might become carcinomas. In an extreme case, a substance that unambiguously promotes benign tumors but never causes cancer could be classified as a probable human carcinogen simply because of this assumption.

In addition, tumor incidence is commonly pooled across sites to obtain a total estimate of carcinogenic effects. This implicitly assumes that cancer induction is independent across sites and is not the result of either metastasis or a common biological mechanism. Given the extreme sensitivity of test species and the regular use of MTD administration, other explanations for tumors occurring at multiple sites appear just as plausible.
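The following sketch shows how much the independence assumption can matter, using hypothetical site-specific rates. The two functions bracket the possibilities. If tumors at each site arise independently, the chance of a tumor somewhere is 1 minus the product of the site-specific tumor-free probabilities. If a single mechanism drives every site, the combined rate can be as low as the largest single-site rate. The rates and function names are illustrative assumptions, not data from any bioassay.

```python
import math


def pooled_if_independent(site_rates):
    """P(tumor at one or more sites) if induction is independent across sites."""
    return 1.0 - math.prod(1.0 - p for p in site_rates)


def pooled_if_common_cause(site_rates):
    """P(tumor at one or more sites) if one mechanism drives every site, so that
    animals affected at any site are a subset of those affected at the
    most-affected site (the smallest union consistent with the site rates)."""
    return max(site_rates)


# Hypothetical tumor rates at three sites (illustrative only).
rates = [0.10, 0.08, 0.05]
print(f"independent sites: {pooled_if_independent(rates):.4f}")
print(f"common mechanism:  {pooled_if_common_cause(rates):.4f}")
```

With these made-up rates, treating the sites as independent roughly doubles the combined estimate (about 0.21 versus 0.10).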

The Choice of Dose-Response Model

No single mathematical model is accepted as generally superior for extrapolating from high to low doses. Consequently, Federal agencies often use a variety of different models. Rather than being a scientific footnote to the risk assessment process, however, the choice of model is actually an important policy issue. The multistage model appears to be the most commonly used method for estimating low-dose risks from chemicals, and there are two major sources of bias embedded in this choice: its inherent conservatism at low doses, and the routine use of the linearized form, in which the 95 percent upper bound is used instead of the unbiased estimate.

OSTP Guidelines, p. 10376.

EPA Carcinogen Risk Assessment Guidelines, p. 33997.

Ibid.

The multistage model essentially involves fitting a polynomial to a data set, with the number of "stages" identified by the number of terms in the polynomial. Since animal bioassays rarely have more than three dose levels, it is rare to see applications of the multistage model with more than two stages. Although the multistage model enjoys some scientific support because it is compatible with multistage theories of carcinogenesis, in practice the model fails to include enough stages, due to the absence of sufficient alternative exposure cohorts.
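For reference, the multistage model is conventionally written with the fitted polynomial inside an exponential, so that the probability of a tumor at dose d stays between 0 and 1. The form below is the standard statement of the model, added here as a sketch rather than taken from this report. The coefficients q_0, ..., q_k are nonnegative and fitted to the bioassay data, and k is the number of stages. The second line gives the extra risk over background.

```latex
P(d) = 1 - \exp\!\left[-\left(q_0 + q_1 d + q_2 d^2 + \cdots + q_k d^k\right)\right], \qquad q_i \ge 0

A(d) = \frac{P(d) - P(0)}{1 - P(0)}
     = 1 - \exp\!\left[-\left(q_1 d + q_2 d^2 + \cdots + q_k d^k\right)\right]
     \approx q_1 d \quad \text{for small } d
```

As d shrinks, the higher-order terms vanish faster than the linear one, so low-dose risk is governed almost entirely by q_1. The linearized form replaces q_1 with its 95 percent upper confidence limit (often written q_1^*). This is why it yields an upper-bound estimate rather than a best estimate.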

Typically, the multistage model yields risk estimates that are higher than those of most other models. For example, when five different dose-response models were analyzed in a recent risk assessment of cadmium, estimates of cancer risks at moderate doses varied by a factor of 100. This difference among estimates widened as doses declined toward the very low levels within the range of regulatory concern. At very low doses, two of the five models predicted excess lifetime cancer risks greater than one in one thousand (10⁻³), a risk often regarded by policymakers as unacceptable. However, two other equally plausible models predicted essentially no excess cancer risk at all. Since none of the five models offers a scientifically superior basis for deriving low-dose risks, the choice of model is a pivotal policy decision. The accepted practice under these circumstances is to develop a subjectively derived "best" estimate while fully informing decisionmakers of the extent of uncertainty surrounding it. In the cadmium case, as in most others, this practice was not followed: estimates of the number of statistical cancers that would be prevented by regulation were presented based only on the multistage model.

The linearized multistage model (LMS) is a special version of the multistage model in which the 95 percent upper confidence limit of the linear term is used instead of the unbiased estimate. That is, the

OSTP Guidelines, Guideline 26, p. 10378; Ames et al., op. cit., p. 276.

See, e.g., OSTP Guidelines, Guidelines 27, 29, and 31, p. 10378; EPA Carcinogen Risk Assessment Guidelines, pp. 33999, 34003.
