The Educational Testing Act of 1981

Contents

II. Before the FTC Study: The Context of Prior Findings on Coaching for the SAT

III. Beyond the FTC Study: Critique of Assumptions and Inferences

IV. Estimation of Combined Coaching/Self-Selection Effects in the FTC Study: Detailed Summary and
Preface

This report examines evidence and arguments about the effectiveness of coaching for the SAT. It views the issue as being much more complicated than the simplistic question of whether coaching works or not. Coaching in and of itself is not automatically to be either rejected or encouraged; it has to be analyzed and evaluated. It matters what materials and practices are involved, at what cost in student time and resources, and with what effect on student skills and attitudes as well as test scores.

The SAT measures developed abilities of verbal and mathematical reasoning and comprehension that are acquired gradually over many years of experience and use in both school and nonschool settings. By virtue of this gradual development, these intellective skills are relatively difficult to improve markedly through brief courses of intervention in the final year or two of high school when the SAT is typically taken. Since these abilities are learned in manifold ways through both instruction and experience, one would expect high quality instruction over extended periods of time to improve them and hence to increase SAT scores. Indeed, score gain across the high school years is the typical pattern exhibited by students taking the SAT. Since coaching at its best is a form of teaching, the key questions are whether the coaching experience is of sufficient quality and sufficient duration to yield significant skill improvement as well as score improvement over and above the experiential growth that would have occurred regardless of the coaching program. If significant improvement requires relatively large amounts of student time devoted to coaching, then the problem becomes expanded to include questions of the difference between coaching and instruction and of the instrumental role of comprehension and reasoning skills in school learning as well as their status as explicit objectives of school learning.

Thus the issue is not just whether coaching works or not, but how much student time devoted to what kinds of coaching experiences yield what level of score improvements in comparison with the level of experiential growth occurring without those coaching experiences. Moreover, since students with different personal and background characteristics often exhibit different performance characteristics and probably even learn in different ways, we should be alert to the possibility that coaching programs, like other forms of teaching, may have differential effects for different kinds of students.

The second printing of this report incorporates the changes previously noted on an errata sheet prepared after the first printing. Many individuals have contributed to this effort. We would like to thank Thomas Donlon, Garlie Forehand, and Winton Manning for their careful review of the manuscript; Nathaniel Hartshorne for his editorial comments; Rex Jackson and Stephen Ivens for their helpful suggestions on the presentation of arguments and of summary data; Lloyd Bond, Robert Glaser, and Robert Linn for their advice on research and policy implications; and Anthony Bryk for his comments on one of the applications of the growth model included in the first printing of this report. Special thanks go to John Tukey for his general review of the analysis and its ramifications and, in particular, for his gentle insistence that student contact time in some way or other holds the key to understanding coaching effects.

Samuel Messick
Princeton, New Jersey

Overview

This report presents a critique and reanalysis of the Federal Trade Commission's (FTC) study of commercial coaching for the Scholastic Aptitude Test (SAT). The FTC Study is one of the largest studies of coaching ever done and one of the few studies of commercial coaching extant, so it merits careful examination. But it is not the only coaching study ever done nor is it free from problems of design, so it should be examined in the context of prior findings.

The first part of the report summarizes the major results of earlier studies in a way that draws special attention to the strengths and limitations of the various study designs. One of the most important of these design features is random assignment of examinees to coaching treatment groups and noncoaching control groups, for only with random assignment can we consider treatment effects to be independent of prior status on any of a host of personal or background characteristics. With random assignment, there are no systematic differences between the experimental and control groups initially and, if effective control conditions are maintained, the only systematic difference that will eventuate is that one group will have received coaching and the other will not. In the absence of randomization, as is the case in the FTC study, there is an inevitable equivocality in the interpretation of the results because some unmeasured personal characteristics might have influenced both the student's decision to participate in a coaching program and that program's apparent effectiveness. That is, the same personal factors that led a student to attend coaching school may be responsible, at least in part, for subsequent SAT performance that appears to be the result of the coaching program. A number of factors that might lead a person to seek coaching may also by themselves explain why such a person would subsequently perform better than expected on the outcome measure or posttest; for example, a student might not have scored as well on the PSAT or SAT as he or his parents expected in light of high school grades or, in contrast, he might be highly motivated to earn a high score to compensate for a prosaic high school performance. Thus, the effects of self-selection are confounded with effects of the coaching treatment in nonrandomized studies and, consequently, self-selection factors afford plausible rival explanations for the results, or for part of the results, that might otherwise be identified as coaching effects. In such nonrandomized designs, researchers usually attempt to control statistically for those potential self-selection factors that have been measured and to analyze the data in alternate ways to assess the sensitivity or robustness of the findings under various plausible assumptions, but there is no way to adjust statistically for self-selection factors that have not been assessed.
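The confounding argument above can be illustrated with a small simulation. The sketch below is not drawn from the FTC study or from any of the earlier studies; every number in it (a true coaching effect of 10 points, a baseline of 500, the strength of the hypothetical "motivation" trait, the noise levels) is an assumption chosen only for illustration. It compares the naive coached-minus-uncoached difference in mean scores under random assignment with the same difference when an unmeasured trait influences both the decision to seek coaching and the score itself.

```python
import random
import statistics


def simulate(n=20000, true_effect=10.0, seed=0):
    """Return (randomized_estimate, self_selected_estimate).

    All parameters are hypothetical and purely illustrative.
    """
    rng = random.Random(seed)

    def naive_estimate(self_selected):
        coached, control = [], []
        for _ in range(n):
            # Unmeasured personal characteristic (e.g. motivation).
            motivation = rng.gauss(0.0, 1.0)
            if self_selected:
                # More motivated students are more likely to seek coaching.
                takes_coaching = motivation + rng.gauss(0.0, 1.0) > 0.0
            else:
                # Random assignment: a coin flip unrelated to motivation.
                takes_coaching = rng.random() < 0.5
            # Score depends on motivation whether or not the student is coached.
            score = 500.0 + 20.0 * motivation + rng.gauss(0.0, 30.0)
            if takes_coaching:
                score += true_effect
            (coached if takes_coaching else control).append(score)
        # Naive estimate: simple difference in group means.
        return statistics.mean(coached) - statistics.mean(control)

    return naive_estimate(False), naive_estimate(True)


if __name__ == "__main__":
    randomized, self_selected = simulate()
    print(f"randomized design:    {randomized:6.1f} points")
    print(f"self-selected design: {self_selected:6.1f} points")
```

Under these assumptions the randomized comparison should recover a value close to the assumed 10-point effect, while the self-selected comparison should be considerably larger, because it attributes to coaching the advantage that motivated students would have shown anyway. Since the trait is unmeasured in this sketch, no statistical adjustment using the recorded variables could remove that excess.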

An historical appraisal of the effects of coaching on the SAT is complicated because most of the investigations prior to the FTC study were concerned with diverse special preparation programs typically offered by secondary schools. Furthermore, some of these studies used nonrandomized designs and therefore are hampered by the same problems of interpretation that affect the FTC study; some were also poorly controlled and involved small samples. On balance, the average effect associated with participation in a coaching or special preparation program according to those earlier studies that included some type of control group was less than 10 points for the SAT-Verbal score (on a scale running from 200 to 800 points) and less than 15 points or so for the Math score. For example, two studies were conducted using random assignment of students to coaching and control conditions, one dealing with SAT-Math and the other with SAT-Verbal: In a study by Evans and Pike (1973) an average effect of slightly over 16 points for SAT-M was obtained for special preparation involving seven 3-hour sessions with 21 additional hours of homework; Alderman and Powers (1979) estimated an overall special preparation effect across eight secondary schools of about 8 points for SAT-V. However, for particular groups of students and for particular coaching treatments, estimated effects of over 20 points were also reported in various studies. Although substantive content of the coaching programs was not systematically evaluated as part of these studies, the smaller effects appear to be associated with short-term cramming or drill-and-practice and the larger effects, found more often for Math than Verbal, with longer-term intensive programs involving skill development. In addition, results in at least two earlier studies suggested possible interactions for SAT-M as a function of years of math taken and sex. 
Relatively consistent with this historical context, the FTC study found negligible effects for students attending one commercial coaching school and average score increases of about 20 to 30 points for both SAT-V and -M for students attending another school (where
