
be particularly out of line in this logarithmic formulation. Considerably more research is needed, of course, with special preparation programs entailing more than 40 hours of student contact time to fill in the gaps on the time line. But, if this suggested logarithmic relationship has substance, then each additional increase in SAT scores may require geometrically increasing amounts of student contact time and of all the curricular effort that contact time may be proxy for.
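The implication of a logarithmic relationship can be illustrated with a short numerical sketch. The model form (gain = intercept + slope × ln(hours)) and both coefficient values below are assumptions chosen purely for illustration; they are not estimates from the studies reviewed here.

```python
import math

# Hypothetical coefficients for illustration only; NOT estimates from any study.
INTERCEPT = 0.0   # assumed score gain at 1 hour of student contact
SLOPE = 10.0      # assumed points gained per unit increase in ln(hours)

def expected_gain(hours):
    """Score gain under the assumed model: gain = INTERCEPT + SLOPE * ln(hours)."""
    return INTERCEPT + SLOPE * math.log(hours)

def hours_needed(gain):
    """Invert the assumed model: contact hours required to reach a given gain."""
    return math.exp((gain - INTERCEPT) / SLOPE)

# Each equal step in score gain multiplies the required hours by a constant
# factor, exp(step / SLOPE), which is what "geometrically increasing" means.
for gain in (10, 20, 30, 40):
    print(f"{gain:>2} points -> {hours_needed(gain):7.1f} hours")
```

Under any model of this form, the ratio of hours needed for successive equal score increments is constant, so the absolute amount of additional contact time grows with every increment.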

In summary, the average coaching effect across studies having some type of control group was less than 10 points for SAT-Verbal and less than 15 points for SAT-Math. The special preparation programs in these control-group studies tended to be relatively short-term and relatively nonintensive in terms of student contact time, three of the longest and most intensive being 30 hours of student contact over ten sessions, 24 hours of student contact over 12 weeks, and 21 hours over seven weeks. For particular groups of students and particular coaching programs, the score increases ranged as high as about 18 points for Verbal and about 24 points for Math (or even somewhat higher if interactions between years of math taken and sex are considered). The larger score increases tended to be associated with the longer and more intensive of these relatively short-term programs, especially in Math. The average program effect across studies having no control groups could only be tentatively estimated, because there is no good way of taking into account the experiential growth of self-selected students in the absence of comparable control groups. The amount of their experiential growth and skill development could be considerable in this case, because all of the programs in the studies lacking control groups happened also to be relatively long-term. In addition, they were relatively intensive in terms of student contact time. The briefest of these programs was 45 hours of student contact over six weeks, and the longest was virtually full-time over six months. The provisional estimate of average program effects for these noncontrolled studies was 38 points for Verbal and 54 points for Math.
Although the substantive content of the coaching programs was not systematically evaluated in any of these studies, overall the smaller coaching effects appear to be associated with short-term, relatively nonintensive practice and review, and the larger effects (which occur more for Math than for Verbal) appear to be associated with longer-term, high student-contact programs focussing on skill development.


Beyond the FTC Study: Critique of Assumptions and Inferences

This section draws together the major points made in several reviews undertaken at Educational Testing Service (ETS) of the Federal Trade Commission's (FTC) study of the effectiveness of commercial coaching schools in raising scores on the College Board Scholastic Aptitude Test (SAT). The separate reviews overlap somewhat in their criticisms, but they are reproduced in full in the Appendix since some additional points not covered in this summary are raised. The comments that follow are limited to the revised statistical analyses issued by the FTC Bureau of Consumer Protection (BCP) in March of 1979, undertaken in response to the several major data-analysis flaws observed in the memorandum issued previously by the FTC Boston Regional Office (1978).

Before proceeding to develop the critical arguments, we will first briefly describe the object of this critique, namely the FTC study of the effects of commercial coaching on the SAT. Students enrolled in two New York City area commercial coaching schools during the testing years 1974-75, 1975-76, and 1976-77 served as the experimental or treatment group, and a random sample of uncoached persons who took the SAT during the same three-year period in the same greater New York metropolitan area served as a control group. Data from a third coaching school were not analyzed because of its small sample size. Six subgroups were analyzed: (1) high school juniors taking the SAT for the first time in April 1975 (76 coached and 607 uncoached students); (2) juniors taking the SAT for the first time in April 1976 (247 coached, 617 uncoached); (3) seniors taking the SAT for the second time in November 1975 (98 coached, 396 uncoached); (4) seniors taking the SAT for the second time in November 1976 (177 coached, 387 uncoached); (5) all high school students taking the SAT for the first time on any test date during the three-year period (417 coached, 1763 uncoached); and, (6) all high school students taking the SAT for the second time during this period (316 coached, 1267 uncoached). Statistical analyses were actually based on smaller samples than these largely because of missing student descriptive data.

Since the coached and uncoached groups might differ from each other in a number of ways possibly relevant to the treatment, the demographic and personal characteristics of the two groups were contrasted. It was found that the coached group was significantly higher than the uncoached group in high school class rank, parental income, most recent English grades, most recent math grades, and number of years of math taken; in addition, the coached group included significantly more nonpublic school students and fewer public school students than the uncoached group. Before multiple regression analyses controlling for these and other background variables were conducted, the possibility of differential coaching impact on good and poor students was first discounted by noting a lack of interaction between PSAT scores and coaching treatment.

The multiple regression analyses, which controlled for PSAT (or first SAT) as well as for the several relevant background variables, yielded negligible effects for students at one school and statistically significant effects for students at the other, where the impact for SAT-Verbal was found to be 30 and 27 points, respectively, for first- and second-time SAT takers over the pooled time periods and 19 and 28 points, respectively, for SAT-Math over the same periods. Since these values represent combined coaching and self-selection effects by virtue of the confounding between pre-existing group differences and the coaching treatment, the FTC report then presented an analysis of potential self-selection bias. In a regression analysis of the pooled treatment and control groups, coached students were found to achieve lower PSAT (or first SAT) scores than were predicted from their background characteristics, whereas uncoached students scored slightly higher than expected given their personal and demographic background. This type of self-selection was characteristic of students attending the apparently effective coaching school but was not generally found at the ineffective school. In an effort to eliminate the self-selection effect of this underachievement on the PSAT (or first SAT), the regression analyses were repeated dropping the PSAT (or first SAT) from the set of covariates. As a consequence, the estimated effects were greatly reduced. For example, the effects for SAT-V and -M for first-time SAT takers over the pooled time periods in the previously effective school were 11.5 and .55, respectively, which are no longer statistically significant; the effects for second-time SAT takers over the same period were 16.2 and 16.6, respectively, for SAT-V and -M, which remain statistically significant. It was then argued that this reanalysis would be appropriate only if the underachievement on the PSAT for coached students were due to chance. Since an analysis of PSAT scores for students coached between their first and second SAT exams revealed a similar pattern of underachievement on both the PSAT and the first SAT, it was concluded that the phenomenon was not random but was characteristic of students self-selecting coaching, that is, they were underachievers on standardized tests and would likely continue to be test underachievers in the absence of coaching. Therefore, it was argued, the prior results showing coaching effects in the 20- to 30-point range for both SAT-V and -M at one coaching school were the most defensible findings. Those findings still represent estimates of combined coaching and self-selection effects, however, since this attempt to analyze self-selection as underachievement on standardized tests did not alter the confounding of coaching with pre-existing group differences or otherwise eliminate the effects of unmeasured self-selection factors.

The most fundamental issue concerning the FTC coaching study is that it undertakes an interpretation of available data as a substitute for collecting experimental data. The fact that the data are taken from records and files necessarily puts the study out of reach of the kinds of experimental controls that would permit clear, unambiguous interpretation of findings; that is, it is a quasi-experiment rather than a true randomized experiment. Such a study does not involve random assignment of students to coaching and noncoaching conditions, and in the absence of randomization some interpretive equivocality is inevitable. We hope to reduce, but cannot eliminate, this equivocality by conducting multiple alternative statistical analyses. Summaries of two such reanalyses are included in Sections IV and V of this report and the full texts are reproduced in the Appendix.

The Power of Randomized Experiments

The value of a randomized experiment warrants discussion here. To conduct a true experimental study of coaching, one assembles a large group of students representative of the kinds of individuals about whom inferences and generalizations are to be drawn. These students are assigned at random to either a treatment (or experimental) subgroup to receive coaching or to a control subgroup for whom the coaching experience is to be delayed. To increase precision, before the treatment subgroup takes the coaching course, a form of the SAT could be administered to both subgroups as a pretest. Or, to avoid possible pretest-treatment interactions, a different instrument might be used as a pretest or some other proxy measure of ability or achievement used as a covariate. Effective control conditions should be established to maintain motivation, avoid attrition, and otherwise assure that the two groups remain comparable except that one receives coaching and the other does not. At the end of the coaching period, the SAT is administered as a posttest to both treatment and control subgroups. The experimental data may be analyzed in any one or a combination of several ways. For example, the analysis-of-covariance model uses pretest and other pretreatment variables to adjust for any differences that might exist between the two randomly assigned subgroups on those variables. Since by randomized design and the maintenance of effective controls the only systematic difference between the two subgroups is that one received coaching and the other did not, differences observed on the outcome or posttest measure can be confidently attributed to the coaching experience within some range of standard error.
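The design just described can be sketched as a small simulation. Everything numerical below is an assumption made for illustration (the score distributions, the 20-point true effect, the sample size), and the estimator shown is the textbook single-covariate analysis of covariance with a pooled within-group slope, not the specific analysis used in any study discussed in this report.

```python
import random
from statistics import mean

def ancova_effect(pre, post, treated):
    """Covariate-adjusted treatment effect using a pooled within-group slope."""
    groups = {True: [], False: []}
    for x, y, t in zip(pre, post, treated):
        groups[t].append((x, y))
    sxy = sxx = 0.0
    means = {}
    for t, rows in groups.items():
        mx = mean(x for x, _ in rows)
        my = mean(y for _, y in rows)
        means[t] = (mx, my)
        # Accumulate within-group cross-products and squares for the common slope.
        sxy += sum((x - mx) * (y - my) for x, y in rows)
        sxx += sum((x - mx) ** 2 for x, _ in rows)
    slope = sxy / sxx
    (mx_t, my_t), (mx_c, my_c) = means[True], means[False]
    # Posttest difference, corrected for any chance difference on the pretest.
    return (my_t - my_c) - slope * (mx_t - mx_c)

def simulate_randomized(n=4000, true_effect=20.0, seed=0):
    """Simulate pretest, random assignment, and posttest (all values hypothetical)."""
    rng = random.Random(seed)
    pre, post, treated = [], [], []
    for _ in range(n):
        ability = rng.gauss(450, 80)
        x = ability + rng.gauss(0, 30)            # pretest score
        t = rng.random() < 0.5                    # random assignment to coaching
        y = ability + 15 + (true_effect if t else 0.0) + rng.gauss(0, 30)
        pre.append(x)
        post.append(y)
        treated.append(t)
    return pre, post, treated

pre, post, treated = simulate_randomized()
estimate = ancova_effect(pre, post, treated)
print(f"adjusted coaching effect: {estimate:.1f} points")
```

Because assignment is random in this sketch, the adjusted difference should fall near the assumed true effect, differing from it only by sampling error.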

In a nonrandomized design or quasi-experiment such as the FTC study, in which the coached students were those who had enrolled in coaching schools and the "control" students were drawn from another source, there is no way of discounting alternative reasons for the difference observed on the outcome or posttest measure. The difference might result from the coaching experience or, on the other hand, might simply reflect differences in the characteristics of the two groups existing prior to the coaching experience. One powerful feature of the randomized experiment in this regard is that we can attach probabilities to the likelihood of these alternative events. Thus, although we have other reservations regarding the analysis and interpretation of the FTC study, our chief reservations are those that relate in one way or another to the nonrandomized nature of the study.
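The contrast between the two designs can be made concrete with a minimal simulation. It assumes, hypothetically, an unmeasured trait (labeled "motivation" here) that raises both the chance of enrolling in coaching and the amount of score growth, and it sets the true coaching effect to zero. All parameter values are invented for illustration and are not drawn from the FTC data.

```python
import random
from statistics import mean

def gain_difference(seed, self_selected, n=4000, true_effect=0.0):
    """Mean gain (post - pre) of coached minus uncoached students."""
    rng = random.Random(seed)
    coached, uncoached = [], []
    for _ in range(n):
        motivation = rng.gauss(0, 1)        # unmeasured trait (assumed)
        pre = rng.gauss(450, 80)            # pretest score
        if self_selected:
            # Enrollment depends on the unmeasured trait plus noise.
            enrolls = motivation + rng.gauss(0, 1) > 1.0
        else:
            # Enrollment decided by chance alone, as in a randomized experiment.
            enrolls = rng.random() < 0.2
        # Growth depends on motivation whether or not the student is coached.
        growth = 15 + 10 * motivation + rng.gauss(0, 20)
        post = pre + growth + (true_effect if enrolls else 0.0)
        (coached if enrolls else uncoached).append(post - pre)
    return mean(coached) - mean(uncoached)

biased = gain_difference(seed=0, self_selected=True)
unbiased = gain_difference(seed=0, self_selected=False)
print(f"self-selected groups: {biased:.1f} points")
print(f"randomized groups:    {unbiased:.1f} points")
```

Although coaching has no effect at all in this sketch, the self-selected comparison shows an apparent gain, because the coached group differs from the uncoached group on the unmeasured trait. The randomized comparison stays near zero.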

Confronting Treatment and Control Group Differences

As a consequence of the nonrandom assignment of subjects in the FTC study (indeed, the highly self-selected nature of the treatment group), the data offer numerous opportunities for alternative interpretations. The FTC-BCP (1979) report recognizes the limitations of quasi-experimental data and takes care to avoid the type of simplistic analyses undertaken by the Boston Regional
