
inappropriate for the verbal data because of self-selection effects which were at least partially captured in differential group growth rates. More appropriate growth-related adjustment models yielded verbal coaching effects about one-half the size reported by the FTC report.

The mathematics data were found to be more consistent with the standard ANCOVA model and thus more likely to yield reasonable estimates of coaching effects given the available control variables. This is not to say that the FTC and the ETS reanalysis estimates of the mathematics coaching effect are not overestimates, since the only self-selection causes that have been controlled for were those reflected in differential growth rates and/or available demographics. One missing piece of information is the reliability of the pretest scores. When individuals self-select to treatments, commonly held psychometric wisdom suggests that the standard ANCOVA model can be expected to underadjust to the extent that the pretests are less than perfectly reliable, thereby yielding an overestimate of the coaching effect.
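The under-adjustment argument can be illustrated with a small simulation. The sketch below is an illustration only and is not part of the FTC or ETS analyses: the function name `ancova_effect`, the sample size, the 0.5 standard deviation self-selection gap, and the reliabilities are all assumed values. The true coaching effect is set to zero, so any nonzero coefficient on the treatment indicator is an artifact of the adjustment.

```python
import numpy as np

def ancova_effect(reliability, n=20000, true_effect=0.0, seed=0):
    """Coefficient on the treatment indicator in posttest ~ pretest + treated."""
    rng = np.random.default_rng(seed)
    # Self-selection (assumed): about 20% are coached, and the coached
    # group has a higher mean true ability.
    treated = rng.random(n) < 0.2
    ability = rng.normal(0.0, 1.0, n) + 0.5 * treated
    # Measurement error sized so that the within-group pretest
    # reliability, var(true) / var(observed), equals `reliability`.
    err_sd = np.sqrt((1.0 - reliability) / reliability)
    pretest = ability + rng.normal(0.0, err_sd, n)
    posttest = ability + true_effect * treated + rng.normal(0.0, 0.3, n)
    # Standard ANCOVA as a regression with intercept, pretest, and indicator.
    X = np.column_stack([np.ones(n), pretest, treated.astype(float)])
    coef, *_ = np.linalg.lstsq(X, posttest, rcond=None)
    return coef[2]

effect_perfect = ancova_effect(1.0)  # perfectly reliable pretest
effect_noisy = ancova_effect(0.8)    # pretest reliability of 0.8
```

Under these assumptions the estimate is close to zero when the pretest is perfectly reliable, and in expectation about 0.5 × (1 − 0.8) = 0.1 when the reliability is 0.8, even though no coaching effect was simulated.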

(3) The FTC concludes that the students...

who attended the most "effective" school tended to be underachievers on standardized exams, i.e., they scored lower on standardized exams than would have been predicted given their personal and demographic characteristics (including such factors as grades in school and class rank). If this underachieving was random rather than systematic, the results showing the benefits of the coaching received at School A might have been overstated. Analysis was conducted, however, showing that the underachievement by the students was not due to chance, and probably would have continued in the absence of coaching.

(Executive Summary)

The above conclusions were based on examining the size and direction of the deviations from the weighted pooled regression line. The slope and intercept of this pooled regression line were determined primarily by the noncoached population, since the ratio of noncoached to coached individuals is approximately 4:1 in this sample. The FTC report of larger mean deviations for the coached group is therefore not unexpected. The direction, or sign, of the deviations would depend on the way in which the regression hyperplanes differ. The FTC analysts do not appear to have tested the critical assumption of homogeneity of the within-population regressions before they carried out and interpreted the difference in mean residuals. It would appear that any conclusion that coaching works best for "underachievers" is not warranted on the basis of this type of analysis.
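The point about mean deviations can be checked numerically. The sketch below uses simulated data, not the FTC data, and an unweighted least-squares fit with an intercept; the group sizes, coefficients, and variable names are assumptions chosen only to reproduce a 4:1 ratio. Because the residuals of such a fit sum to zero over the pooled sample, the mean residual of the smaller group must be larger in absolute value than that of the larger group, and opposite in sign, whatever the substantive explanation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_unc, n_coa = 2000, 500  # assumed sizes giving a 4:1 ratio
n = n_unc + n_coa
coached = np.arange(n) >= n_unc

# Hypothetical scores: the coached group sits 3 points above the common line.
pretest = rng.normal(50.0, 10.0, n)
sat = 10.0 + 0.9 * pretest + 3.0 * coached + rng.normal(0.0, 5.0, n)

# Pooled regression that ignores group membership.
X = np.column_stack([np.ones(n), pretest])
beta, *_ = np.linalg.lstsq(X, sat, rcond=None)
resid = sat - X @ beta

mean_coa = resid[coached].mean()
mean_unc = resid[~coached].mean()
# Residuals sum to zero, so n_coa * mean_coa = -n_unc * mean_unc.
ratio = abs(mean_coa / mean_unc)
```

Here `ratio` equals n_unc / n_coa = 4 up to rounding error, which is why the size of the coached group's mean deviation carries little information by itself.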

A more satisfactory, yet still inconclusive, analysis would be centered on the coached population only, and would include the regression of the first SAT on the PSAT score and significant demographics as control variables. This analysis would be repeated for the second SAT. That is, the second SAT would be regressed on the first SAT and control demographics. The comparison of the relative sizes of the regression weights associated with the PSAT and the first SAT would indicate whether a particular subgroup of the coached population benefits more from coaching.* The problem here is that this comparison would only be valid if one first shows that the growth rates of subsets of the coached population are the same in the absence of intervention.

*For example, if the raw score regression weight is greater than 1.0 for the PSAT but less than 1.0 for the first SAT, then one could infer that when background variables are controlled, low scoring individuals gain more from coaching. Since we don't have reliability estimates for the PSAT and SAT for this population, such conclusions would have to be tentative.

Appendix 2 Stroud Report

Reanalysis of the Federal Trade Commission Study of
Commercial Coaching for the SAT

T.W.F. STROUD

Abstract

SAT scores of students attending three commercial coaching schools are compared with those of uncoached students by means of multiple regression techniques. The techniques differ from those used in the Federal Trade Commission (FTC) reanalysis in a number of ways which are described in this report. One essential difference is that the SAT scores were predicted using a multiple regression equation based on uncoached students rather than pooled across coached and uncoached students. Residuals from predicted values based on this equation were then computed for coached students; these residuals were averaged within peak test administration months within coaching schools to produce coaching/self-selection effects. Bayesian methods, utilizing the concept of "borrowing strength" or "smoothing" to obtain estimates for smaller schools, were employed to estimate the overall effect at each of the three schools, averaged out over years.
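The residual approach described above can be sketched as follows. This is a minimal illustration on simulated data, not the report's actual computation: the function name `coaching_residual_means`, the single pretest predictor, the school labels, and the simulated effects of 25 and 5 points are all assumptions, and the averaging here is by school only rather than by test administration month within school.

```python
import numpy as np

def coaching_residual_means(predictors, sat, school):
    """Fit on uncoached students (school == 0), then average the
    residuals of coached students within each coaching school."""
    X = np.column_stack([np.ones(len(sat)), predictors])
    uncoached = school == 0
    # Prediction equation estimated from uncoached students only.
    beta, *_ = np.linalg.lstsq(X[uncoached], sat[uncoached], rcond=None)
    resid = sat - X @ beta
    labels = np.unique(school[~uncoached])
    return {int(s): resid[school == s].mean() for s in labels}

# Hypothetical data: 2400 uncoached, 400 at school 1, 200 at school 2.
rng = np.random.default_rng(2)
school = np.repeat([0, 1, 2], [2400, 400, 200])
pretest = rng.normal(45.0, 10.0, school.size)
shift = np.array([0.0, 25.0, 5.0])[school]  # assumed school effects
sat = 100.0 + 0.9 * pretest + shift + rng.normal(0.0, 10.0, school.size)

means = coaching_residual_means(pretest, sat, school)
```

The returned means mix any true coaching effect with self-selection not captured by the predictors, which is why the abstract speaks of coaching/self-selection effects.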

The general impression given by this analysis is similar in spirit to that given by the FTC reanalysis: for the school with the strongest coaching/self-selection effects, smoothed estimates of the effects averaged over years range from 16.9 points on SAT-Math for juniors to 28.5 points on SAT-Verbal for juniors, with standard errors in the range 9.2 to 18.6; for other schools the smoothed estimates of the effects tend to be smaller.
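The idea of "borrowing strength" can be illustrated with a generic normal-normal shrinkage estimator. This is a textbook sketch and is not claimed to be the Bayesian model used in this report: the function name `shrink`, the between-group variance `prior_var` (treated as known here, although in practice it would be estimated), and the input numbers are all hypothetical.

```python
import numpy as np

def shrink(estimates, std_errors, prior_var):
    """Pull each raw estimate toward a precision-weighted grand mean."""
    est = np.asarray(estimates, dtype=float)
    se2 = np.asarray(std_errors, dtype=float) ** 2
    # Grand mean weighted by total precision (sampling plus between-group).
    weights = 1.0 / (se2 + prior_var)
    grand = np.sum(weights * est) / np.sum(weights)
    # Fraction of the distance to the grand mean that each estimate moves;
    # noisier estimates (larger se2) are shrunk more.
    b = se2 / (se2 + prior_var)
    return grand + (1.0 - b) * (est - grand)

raw = [30.0, 10.0, 0.0]  # hypothetical raw effects, in points
se = [5.0, 15.0, 20.0]   # hypothetical standard errors
smoothed = shrink(raw, se, prior_var=100.0)
```

With these inputs the least precise estimate moves furthest from its raw value, which is the sense in which small groups borrow strength from larger ones.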

In exploring interactions between the coaching/self-selection effect and various background variables, it was discovered that at one school showing overall negligible effects black students exhibited a much higher average coaching/self-selection effect (46 points) than whites. This interaction appeared to be separate from an interaction with parental income, which was also found to be statistically significant.

Summary of Background

This report is a reanalysis of data which were first analyzed by the Boston Regional Office of the Federal Trade Commission (FTC) (1978) and then reanalyzed by the Bureau of Consumer Protection (BCP) (1979) in the FTC central office in Washington, D.C. The purpose of the original study was to determine how much effect preparation under the guidance of commercial coaching schools had on the Scholastic Aptitude Test (SAT) scores and the Law School Admissions Test (LSAT) scores of the clients of those schools. The reanalysis of the SAT data was undertaken by the BCP because serious defects had been discovered in the original Boston office analysis; specifically, observational studies had been made on two groups, coached and uncoached students, and regression lines of SAT score on Preliminary Scholastic Aptitude Test (PSAT) score had been compared with no attempt either to present standard errors or to correct for values of background variables. There are, in fact, systematic differences between coached and uncoached students with respect to such background characteristics as high school achievement, race, and self-reported parental income. There are also systematic differences in the distribution of these variables for students attending the two schools included in the FTC (1979) reanalysis.

The purpose of the present study is an independent investigation of the SAT data analyzed and reanalyzed by the FTC. Extensive use is made here of background variables, as was done in the FTC reanalysis. However, this reanalysis differs from the FTC reanalysis in a number of ways. These ways will be described, following a summary of the FTC reanalysis.

The FTC data set began with about 2000 enrollees at two coaching schools in the metropolitan New York area during the period 1974-77 for whom SAT scores and background data were available. There were 600 enrollees whose names were not found in the ETS file of test-takers in certain zip code areas of Connecticut, New Jersey, and New York which were considered to be primary market areas for the participating coaching schools. Most of these, presumably, had not previously and did not subsequently take the SAT or did not take it in the metropolitan New York area. There was also a third coaching school whose enrollees' data were not analyzed by the FTC, because the number of individuals involved was considered too small. A control group of about 2500 uncoached students was chosen from among persons taking the SAT during the same three-year period in the same geographical area, using a systematic sample of every 150th student in the file provided by Educational Testing Service (ETS).
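Systematic sampling of every 150th record can be sketched in a few lines. The function name `systematic_sample`, the starting index, and the file size are assumptions for illustration; the file size of 375,000 is simply what about 2500 sampled students at a step of 150 would roughly imply, and is not a figure given in the report.

```python
def systematic_sample(records, step=150, start=0):
    """Return every `step`-th record, beginning at index `start`."""
    return records[start::step]

# Hypothetical stand-in for the ETS file: one integer ID per test-taker.
file_ids = list(range(375_000))
control = systematic_sample(file_ids)
```

A systematic sample behaves like a simple random sample only if the file order is unrelated to the variables of interest, which would need to be checked against how the file was sorted.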

In the FTC reanalysis, multiple regression analyses were performed on six subsamples:

1. High school juniors taking the SAT for the first time in April 1975. (76 coached and 607 uncoached students)

2. High school juniors taking the SAT for the first time in April 1976. (247 coached and 617 uncoached students)

3. High school seniors taking the SAT for the second time in November 1975. (98 coached and 396 uncoached students)

4. High school seniors taking the SAT for the second time in November 1976. (177 coached and 387 uncoached students)

5. All high school students taking the SAT for the first time on any of the test dates over the 3-year period. (417 coached and 1763 uncoached students)

6. All high school students taking the SAT for the second time on any of the test dates over the 3-year period. (316 coached and 1267 uncoached students)

Background variables included in all regression analyses, in roughly decreasing order of importance, were: pretest score (PSAT-Verbal or -Math when SAT1, the first SAT taken, was being predicted; SAT1-Verbal or -Math when SAT2, the second SAT taken, was being predicted; in each case the Verbal pretest was used to predict the Verbal score and the Math pretest to predict the Math score), self-reported grade in English or math, self-reported rank-in-class, self-reported years of high school instruction in English or math, self-reported parental income, sex, race, high school type, and number of PSATs taken (for predicting SAT1). Coaching school (School A or B) was entered as a dummy variable. Time between pretest and test was also entered, but it did not increase prediction significantly. Only those students with complete data were entered in the regression analyses.
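A regression of this general form, with complete-case filtering and school indicators, can be sketched as below. This is an illustration on simulated data and not the FTC's specification: the function name `adjusted_school_effects`, the single background covariate, the use of one indicator per school, and all simulated coefficients are assumptions.

```python
import numpy as np

def adjusted_school_effects(pretest, covariates, school, sat):
    """Return adjusted mean differences for Schools A and B relative to
    uncoached students, using students with complete data only."""
    data = np.column_stack([pretest, covariates, sat])
    complete = ~np.isnan(data).any(axis=1)  # drop rows with any missing value
    dummies = np.column_stack([school == "A", school == "B"]).astype(float)
    X = np.column_stack([np.ones(len(sat)), pretest, covariates, dummies])
    beta, *_ = np.linalg.lstsq(X[complete], sat[complete], rcond=None)
    return beta[-2], beta[-1]

# Hypothetical data with assumed effects of 30 (A) and 5 (B) points.
rng = np.random.default_rng(3)
n = 4000
school = rng.choice(np.array(["none", "A", "B"]), size=n, p=[0.8, 0.12, 0.08])
pretest = rng.normal(45.0, 10.0, n)
grade = rng.normal(3.0, 0.5, n)
sat = (50.0 + 9.0 * pretest + 10.0 * grade
       + 30.0 * (school == "A") + 5.0 * (school == "B")
       + rng.normal(0.0, 40.0, n))
grade[::25] = np.nan  # simulate missing self-reports

effect_a, effect_b = adjusted_school_effects(pretest, grade, school, sat)
```

Restricting the fit to complete cases is simple, but it can bias the estimates if students with missing self-reports differ systematically from the rest.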

The analyses showed that students from School A scored significantly higher, on the average, than uncoached students. Ninety-five percent confidence limits for the differences in adjusted means, based on the median lower confidence limit and median upper confidence limit over the 12 analyses (6 subsamples, Verbal
