Procedural Guidelines: Score Interpretation

1. Effective test use and meaningful score interpretation should be supported by:
   a. the development of appropriate test norms based on administering tests to …
   b. a rationally developed system of interpretation shared with score recipients …

2. Tests offered for sale and described by ETS as standardized tests (as distinguished from tests offered in testing programs) should have adequate norms or other information for use in interpreting test results.

3. When test norms are developed by administering tests to samples from a defined population, the resulting norms should be representative of any relevant subgroup, including those defined by sex or ethnicity, in proportion to their frequency in the defined population. Such subgroups may be deliberately over-sampled for more precise estimation of the statistical characteristics of the population, provided that the norming procedures take the over-sampling into account. Data on the proportions in the sample and in the population, when available, should be reported in an appropriate technical publication.

4. The report of a special norms study should provide information on:
   a. the sampling design;
   b. the participation rate of institutions or individual respondents in the sample;
   c. …
   d. weighting systems used in preparing norms; and
   e. estimates of sampling variability, along with an acknowledgment, when …

5. When descriptive statistics based on program testing (as distinguished from …
   a. both table titles and descriptive material should make it clear that the statistics …
   b. the descriptive material should define the nature of the group by identifying …
   c. when possible, reports should be prepared to show comparisons of data based on program examinees or institutional characteristics with relevant data on variables from other sources; and
   d. when information about interpretive data is prepared for different user groups, the presentation, whenever practicable, should be adapted to the needs and background of each group.

6.
When norms are developed from program testing, the age, sex, and ethnic composition of the program norms group should be described whenever such information about subgroup membership is available.

7. In testing programs, descriptive statistics should be compiled periodically from a sample or the entire population in order to monitor the participation and performance of males and females drawn from diverse backgrounds, interests, and experience (e.g., major ethnic group, handicapped status, and other relevant subgroups of the population of interest).

8. If norms intended for use in the interpretation of individual scores are presented separately for males and females or for members of specific ethnic groups, the rationale should be carefully described. Separate norms may be justified for scores used primarily for guidance when access to the experiences needed to earn a high score is clearly related to subgroup membership and a more direct index of access is not available. The existence of score differences between subgroups does not in itself justify presentation of separate norms.

9. Descriptive statistics prepared separately for subgroups of the relevant test-taking population, but not intended for use in interpreting individual scores, should not be presented in a way that encourages their use for such a purpose.

10. Institutional or agency users and examinees should be informed of the standard error of measurement of a score, and test interpretation materials should point out the limitations of test scores and encourage score users to take into account the possible scores a test taker might achieve on retesting.

11. Statistical data used in score interpretation should be revised annually, except when less frequent revision is judged to be appropriate — for example, when norms are based on special studies. A statement of the period in which the data were collected should be included in any publication that presents the data.

12.
Institutional or agency score recipients should be provided with interpretive materials designed to be helpful for using scores in conjunction with other information, setting cutting scores where appropriate, interpreting the scores for special subgroups (e.g., ethnic minorities, males, females, and handicapped students), conducting local normative studies, and developing local interpretive materials.

Procedural Guidelines: Test Validity

1. ETS should provide evidence of the validity of its tests in relation to the principal …
   a. When test scores are to be interpreted in terms of degree of mastery of the …
   b. When test scores are to be interpreted in terms of the prediction of future …
   c. When test scores are to be interpreted as a measure of a theoretical construct …

2. Evidence of content validity should be based (a) on a careful determination and analysis of the domain(s) of interest and of the relative importance of topics within the domain, and (b) on a demonstration that the test is an appropriate sample of the knowledge or behavior in the domain(s). A report on evidence of content validity should present descriptions of the procedures employed in the study, including the number and qualifications of the experts involved in the analysis of the domain or in the evaluation of the relevance and appropriateness of the test.

3. Construct validation should be based on rational and empirical analyses of the processes underlying performance on the test in question, including, where appropriate, noncognitive as well as cognitive functions. Empirical evidence relevant to the analyses should include the results of investigations of the degree to which test scores are related or unrelated to other variables in ways implied by the intended interpretations.

4. Criterion-related validation should be used only when technically sound and …
   a. Criterion-related validation should involve as many performance variables …
   b. Criterion-related validation should not combine variables to form a single …
   c.
Criterion data should be collected in a way that permits an assessment of …

5. Interpretations of correlations between test scores and criterion variables should take into account such factors as sample size, criterion reliability, possible restriction in the range of scores obtained in the validity study sample, and other contextual factors.

6. The method(s) by which any validation is accomplished should be fully documented; such documentation should include appropriate details such as the nature and reliability of the criteria, a description of the subjects used, the materials surveyed, and the qualifications of the experts who made judgments regarding the appropriateness and importance of test content.

7. Where adequate methods are employed to ensure the equivalence of scores on alternate forms, it is not necessary that each new form be validated. New validation studies should be made if revised tests contain substantial changes, such as different item types, or if they sample a revised performance domain.

8. When appropriate and feasible, the validity of a test should be investigated separately for subsamples of the test-taking population.

9. When the name of a test is established, it should not imply more than is justified by evidence of validity.

10. Information should be made available to institutional and agency users that would assist them in planning and conducting local validity studies.

TEST USE

Principle

Proper and fair use of ETS tests is essential to the social utility and professional …

Policies

A. ETS will set forth clearly to sponsors, institutional or agency users, and examinees the principles of proper use of tests and interpretation of test results.

B. ETS will establish procedures by which fair and appropriate test use can be promoted and misuse can be discouraged or eliminated.

Procedural Guidelines

1. Program publications should:
   a. describe appropriate uses of, and caution against potential misuses of, program …
   b.
explain clearly that test scores reflect past opportunity to learn and dis-…
   c. emphasize that an individual's test score should be interpreted in the context …
   d. provide appropriate information about test content, difficulty, and purpose …
   e. invite institutional or agency users to consult with the program sponsor …
   f. summarize results of research relevant to the use of the test or cite references …
   g. describe adequately and clearly the scale properties that affect score interpretation …
   h. advise institutional or agency users that decisions about the application of …
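Guideline 10 under Score Interpretation requires that users be told the standard error of measurement (SEM) and be encouraged to consider the range of scores a test taker might obtain on retesting. As a minimal illustrative sketch — not part of the ETS guidelines themselves — the classical formula SEM = SD × sqrt(1 − reliability) can be used to report an observed score as a band rather than a single point. The scale values below (SD of 100, reliability of 0.91, observed score 550) are hypothetical.

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical SEM: the typical spread of observed scores around a
    test taker's true score, SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def score_band(observed: float, sd: float, reliability: float, z: float = 1.0):
    """An interval of +/- z SEMs around an observed score -- one way to
    convey 'the possible scores a test taker might achieve on retesting'."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem

# Hypothetical program: scaled scores with SD 100 and reliability 0.91.
sem = standard_error_of_measurement(100.0, 0.91)
low, high = score_band(550.0, 100.0, 0.91)
print(round(sem, 1))             # 30.0
print(round(low), round(high))   # 520 580
```

Reporting the band (here, roughly 520 to 580) rather than the point score 550 is one concrete way interpretive materials can "point out the limitations of test scores" as the guideline asks.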