Procedural Guidelines: Score Interpretation

1. Effective test use and meaningful score interpretation should be supported by:
   a. the development of appropriate test norms based on administering tests to …
   b. a rationally developed system of interpretation shared with score recipients …

2. Tests offered for sale and described by ETS as standardized tests (as distinguished from tests offered in testing programs) should have adequate norms or other information for use in interpreting test results.

3. When test norms are developed by administering tests to samples from a defined population, the resulting norms should be representative of any relevant subgroup, including those defined by sex or ethnicity, in proportion to their frequency in the defined population. Such subgroups may be deliberately over-sampled for more precise estimation of the statistical characteristics of the population, provided that the norming procedures take the over-sampling into account. Data on the proportions in the sample and in the population, when available, should be reported in an appropriate technical publication.

4. The report of a special norms study should provide information on:
   a. the sampling design;
   b. the participation rate of institutions or individual respondents in the sample;
   c. …
   d. weighting systems used in preparing norms; and
   e. estimates of sampling variability, along with an acknowledgment, when …

5. When descriptive statistics based on program testing (as distinguished from …
   a. both table titles and descriptive material should make it clear that the statistics …
   b. the descriptive material should define the nature of the group by identifying …
   c. when possible, reports should be prepared to show comparisons of data based on program examinees or institutional characteristics with relevant data on variables from other sources; and
   d. when information about interpretive data is prepared for different user groups, the presentation, whenever practicable, should be adapted to the needs and background of each group.

6.
When norms are developed from program testing, the age, sex, and ethnic composition of the program norms group should be described whenever such information about subgroup membership is available.

7. In testing programs, descriptive statistics should be compiled periodically from a sample or the entire population in order to monitor the participation and performance of males and females drawn from diverse backgrounds, interests, and experience (e.g., major ethnic group, handicapped status, and other relevant subgroups of the population of interest).

8. If norms intended for use in the interpretation of individual scores are presented separately for males and females or for members of specific ethnic groups, the rationale should be carefully described. Separate norms may be justified for scores used primarily for guidance when access to the experiences needed to earn a high score is clearly related to subgroup membership and a more direct index of access is not available. The existence of score differences between subgroups does not in itself justify presentation of separate norms.

9. Descriptive statistics prepared separately for subgroups of the relevant test-taking population, but not intended for use in interpreting individual scores, should not be presented in a way that encourages their use for such a purpose.

10. Institutional or agency users and examinees should be informed of the standard error of measurement of a score, and test interpretation materials should point out the limitations of test scores and encourage score users to take into account the possible scores a test taker might achieve on retesting.

11. Statistical data used in score interpretation should be revised annually, except when less frequent revision is judged to be appropriate — for example, when norms are based on special studies. A statement of the period in which the data were collected should be included in any publication that presents the data.

12.
Institutional or agency score recipients should be provided with interpretive materials designed to be helpful for using scores in conjunction with other information, setting cutting scores where appropriate, interpreting the scores for special subgroups (e.g., ethnic minorities, males, females, and handicapped students), conducting local normative studies, and developing local interpretive materials.

Procedural Guidelines: Test Validity

1. ETS should provide evidence of the validity of its tests in relation to the principal …
   a. When test scores are to be interpreted in terms of degree of mastery of the …
   b. When test scores are to be interpreted in terms of the prediction of future …
   c. When test scores are to be interpreted as a measure of a theoretical construct …

2. Evidence of content validity should be based (a) on a careful determination and analysis of the domain(s) of interest and of the relative importance of topics within the domain, and (b) on a demonstration that the test is an appropriate sample of the knowledge or behavior in the domain(s). A report on evidence of content validity should present descriptions of the procedures employed in the study, including the number and qualifications of the experts involved in the analysis of the domain or in the evaluation of the relevance and appropriateness of the test.

3. Construct validation should be based on rational and empirical analyses of the processes underlying performance on the test in question, including, where appropriate, noncognitive as well as cognitive functions. Empirical evidence relevant to the analyses should include the results of investigations of the degree to which test scores are related or unrelated to other variables in ways implied by the intended interpretations.

4. Criterion-related validation should be used only when technically sound and …
   a. Criterion-related validation should involve as many performance variables …
   b. Criterion-related validation should not combine variables to form a single …
   c.
Criterion data should be collected in a way that permits an assessment of …

5. Interpretations of correlations between test scores and criterion variables should take into account such factors as sample size, criterion reliability, possible restriction in the range of scores obtained in the validity study sample, and other contextual factors.

6. The method(s) by which any validation is accomplished should be fully documented; such documentation should include appropriate details such as the nature and reliability of the criteria, a description of the subjects used, the materials surveyed, and the qualifications of the experts who made judgments regarding the appropriateness and importance of test content.

7. Where adequate methods are employed to ensure the equivalence of scores on alternate forms, it is not necessary that each new form be validated. New validation studies should be made if revised tests contain substantial changes, such as different item types, or if they sample a revised performance domain.

8. When appropriate and feasible, the validity of a test should be investigated separately for subsamples of the test-taking population.

9. When the name of a test is established, it should not imply more than is justified by evidence of validity.

10. Information should be made available to institutional and agency users that would assist them in planning and conducting local validity studies.

TEST USE

Principle

Proper and fair use of ETS tests is essential to the social utility and professional …

Policies

A. ETS will set forth clearly to sponsors, institutional or agency users, and examinees the principles of proper use of tests and interpretation of test results.

B. ETS will establish procedures by which fair and appropriate test use can be promoted and misuse can be discouraged or eliminated.

Procedural Guidelines

1. Program publications should:
   a. describe appropriate uses of, and caution against potential misuses of, program …
   b.
explain clearly that test scores reflect past opportunity to learn and dis-…
   c. emphasize that an individual's test score should be interpreted in the context …
   d. provide appropriate information about test content, difficulty, and purpose …
   e. invite institutional or agency users to consult with the program sponsor …
   f. summarize results of research relevant to the use of the test or cite references …
   g. describe adequately and clearly the scale properties that affect score interpretation …
   h. advise institutional or agency users that decisions about the application of …
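Guideline 10 under Score Interpretation requires that users be told the standard error of measurement (SEM) and be encouraged to consider the range of scores a test taker might obtain on retesting. As a minimal illustrative sketch — not part of the ETS guidelines themselves — the classical formula SEM = SD × sqrt(1 − reliability) can be used to report an observed score as a band rather than a single point. The scale values below (SD of 100, reliability of 0.91, observed score 550) are hypothetical.

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical SEM: the typical spread of observed scores around a
    test taker's true score, SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def score_band(observed: float, sd: float, reliability: float, z: float = 1.0):
    """An interval of +/- z SEMs around an observed score -- one way to
    convey 'the possible scores a test taker might achieve on retesting'."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem

# Hypothetical program: scaled scores with SD 100 and reliability 0.91.
sem = standard_error_of_measurement(100.0, 0.91)
low, high = score_band(550.0, 100.0, 0.91)
print(round(sem, 1))             # 30.0
print(round(low), round(high))   # 520 580
```

Reporting the band (here, roughly 520 to 580) rather than the point score 550 is one concrete way interpretive materials can "point out the limitations of test scores" as the guideline asks.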