sample tested in April of the junior year (t1) and retested in November (t2) of the senior year are also plotted in Figures V-1 and V-2. These plots represent the 1975 and 1976 populations. Inspection of the verbal plots (Figure V-1) suggests that gains of 12 to 17 points from April to November are commonplace for junior-to-senior retesters. The fact that the control sample shows a gain of 11 points over the same interval from t1 to t2 suggests that it is reasonably representative of junior-to-senior retesters. Similarly, in Figure V-2, the April-to-November gain in SAT-Math for the control group is consistent with the 17-point gain for the 1975 and 1976 junior-to-senior retesters in the national sample. The control group's rate of gain is consistent and linear over both plotted time periods, t0 to t1 and t1 to t2.

An adjustment index may be derived from these data which takes into consideration differential group growth rates in the absence of formal intervention. If the coached group grows at a faster rate than the control group prior to intervention, as might be expected by virtue of its higher initial mean score, the adjustment index (b*) in the growth model is greater than unity, in contrast to the adjustment index (b) in the traditional ANCOVA model for test-retest data which is typically somewhat less than unity. Using adjustment indices derived from the FTC data, treatment effects were estimated for the growth model and contrasted with the effects obtained from the standard ANCOVA model.
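The role of the adjustment index can be written out explicitly. The following is a sketch under stated assumptions, not the report's own formulas: $\bar{X}$ and $\bar{Y}$ denote group means at t1 and t2, subscripts $C$ and $N$ denote the coached and control groups, $g_C$ and $g_N$ are pre-intervention monthly growth rates, and $\Delta$ is the number of months from t1 to t2. The symbols $a^*$, $g$, and $\Delta$, and the particular form given for $b^*$ (a linear projection of the pre-intervention group difference), are illustrative assumptions.

```latex
% Standard ANCOVA-type adjustment: posttest difference minus b times pretest difference
a = (\bar{Y}_C - \bar{Y}_N) - b\,(\bar{X}_C - \bar{X}_N)

% Growth-model adjustment (assumed form): replace b by an index b^* that
% projects the pretest difference forward at the groups' own growth rates
a^* = (\bar{Y}_C - \bar{Y}_N) - b^*\,(\bar{X}_C - \bar{X}_N),
\qquad
b^* = 1 + \frac{(g_C - g_N)\,\Delta}{\bar{X}_C - \bar{X}_N}
```

Under this form, equal growth rates give $b^* = 1$, while a coached group that starts higher and grows faster gives $b^* > 1$, so a larger share of the posttest difference is attributed to pre-existing growth rather than to coaching.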

Table V-1 presents ANCOVA and growth-model adjustment indices and coaching effects, as well as the mean growth rates for the coached and control students for Verbal and Math. These estimates of growth are based on the time period between the first and second testing (t0 to t1) and thus are not confounded with the coaching intervention. The estimates of growth rates assume linearity of growth, which would seem to be a reasonable assumption given the restricted time period and, even more importantly, given that the observed data for the control group on both Verbal and Math conform quite closely to a linear growth model. If one were to expect a deviation from linearity, it would more likely be in the direction of a steeper slope (in the absence of intervention) between the second and third testing, since the motivational level would be at least as great during this time period as in the previous time period between taking the PSAT and the junior-year SAT.
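The growth-rate logic described above can be illustrated with a small numerical sketch. Everything in it is hypothetical: the group means, the intervals in months, the assumption that first-testing scores are already on the SAT scale, and the function names are invented for illustration and are not taken from Table V-1. The sketch estimates each group's monthly gain from t0 to t1, projects that gain linearly to t2, and uses the ratio of the projected to the observed group difference as a growth-model adjustment index.

```python
from dataclasses import dataclass


@dataclass
class GroupMeans:
    """Mean scores for one group at the three testings (hypothetical values)."""
    t0: float  # first testing (PSAT, assumed already on the SAT scale)
    t1: float  # second testing (junior-year SAT), before any coaching
    t2: float  # third testing (senior-year SAT)


def monthly_growth(g: GroupMeans, months_t0_t1: float) -> float:
    """Average gain in points per month from t0 to t1, assuming linear growth."""
    return (g.t1 - g.t0) / months_t0_t1


def projected_t2(g: GroupMeans, months_t0_t1: float, months_t1_t2: float) -> float:
    """Mean expected at t2 if the pre-intervention growth rate simply continued."""
    return g.t1 + monthly_growth(g, months_t0_t1) * months_t1_t2


def growth_adjustment_index(coached: GroupMeans, control: GroupMeans,
                            months_t0_t1: float, months_t1_t2: float) -> float:
    """Projected group difference at t2 divided by the observed difference at t1."""
    projected_gap = (projected_t2(coached, months_t0_t1, months_t1_t2)
                     - projected_t2(control, months_t0_t1, months_t1_t2))
    return projected_gap / (coached.t1 - control.t1)


def adjusted_effect(coached: GroupMeans, control: GroupMeans, index: float) -> float:
    """Posttest difference minus the index times the pretest difference."""
    return (coached.t2 - control.t2) - index * (coached.t1 - control.t1)


# Hypothetical means and intervals, chosen only to keep the arithmetic simple.
control = GroupMeans(t0=400.0, t1=412.0, t2=426.0)
coached = GroupMeans(t0=430.0, t1=448.0, t2=489.0)
M01, M12 = 6.0, 7.0  # assumed months between testings

b_star = growth_adjustment_index(coached, control, M01, M12)
print(f"control growth: {monthly_growth(control, M01):.2f} points/month")
print(f"coached growth: {monthly_growth(coached, M01):.2f} points/month")
print(f"growth-model index: {b_star:.3f}")
print(f"effect with index 1.0: {adjusted_effect(coached, control, 1.0):.1f}")
print(f"effect with growth-model index: {adjusted_effect(coached, control, b_star):.1f}")
```

With these made-up numbers the coached group gains faster before any intervention, so the index exceeds unity and the adjusted effect is smaller than the one obtained with an index of 1.0. That is the direction of the Verbal result discussed in the text, though the magnitudes here carry no empirical meaning.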

The estimates of the coaching effect under the growth model are about 17 points on Verbal and 31 points on Math. This finding that the Verbal effect is substantially less than the Math effect is in marked contrast to the ANCOVA estimates of quite comparable V and M effects. The growth model appears to yield an estimate of the Verbal coaching effect that is more consistent with earlier studies and with expectations that Math, being generally more curriculum related than Verbal, might be more responsive to coaching or special preparation.

The results for SAT-Verbal in Table V-1 call into question the adequacy of the standard ANCOVA to correct for selection effects when they are present in the form of differential group growth rates not predicted from available covariates. The ANCOVA adjustment index (b) is approximately 1.0 for the Verbal data and, somewhat paradoxically, 1.25 for the Math data, which is the one situation displaying little or no evidence of differential group growth. However, since the growth rates appear to be static for the Math data, the ANCOVA model is probably the more defensible approach for estimating the coaching effect for Math because ANCOVA controls for all available measures of pre-existing difference, not just those related to differential growth.

In summary, then, examination of testings at three points in time suggests that: (1) the traditional ANCOVA approach used by the FTC is inadequate for the Verbal data because of self-selection effects which are at least partially captured in differential group growth rates; (2) a more appropriate growth-related adjustment model yields Verbal coaching effects about one-half the size reported by the FTC; and (3) the Math data are more consistent with the ANCOVA model. Thus, the resulting FTC estimate of the coaching effect in Math is more likely to be reasonable given the available control variables than is the growth model estimate, since the latter does not adjust for group differences in background variables unrelated to growth. This is not to say that the FTC estimates of the Math coaching effect, as well as those of the present analyses, are not overestimates (or underestimates), since the only self-selection causes that have been adjusted for were those reflected in differential growth rates and/or available demographics.

Table V-1

Mean Growth Rates, Adjustment Indices, and Estimated Coaching Effects Under Different Model Assumptions

[Table body not recoverable from the scanned page.]

The average gain in points per month is estimated in the absence of intervention.
The growth-model estimates b and a are the adjustment index and the estimated effect
based on the group growth rate estimates in the first column. The ANCOVA estimates
are the standard estimates arrived at using the list of control variables presented in
Table 2, Appendix 3. The ANCOVA b is the net adjustment index; when used in the
usual ANCOVA equation it yields the effect estimate a.

VI

Implications for Educational and Testing Policy and Practice

In summary, the FTC study of commercial coaching found negligible effects for students attending one coaching school and combined coaching and self-selection effects of about 20 to 30 points on both SAT-V and -M for students at another school. The reanalysis by T.W.F. Stroud, using a more sophisticated analysis-of-covariance design and all three coaching schools in the FTC data set, yielded similar overall results: combined coaching/self-selection effects in the neighborhood of 20 to 35 points for both SAT-V and -M at one school and inconsistent and negligible effects at the other two schools. The effects due to coaching per se at the one apparently effective school are probably lower than this, however, because of the confounding with self-selection factors that influence both attendance at that coaching school and performance on the posttest SAT. As we have seen, for example, when factors related to differential group growth rates in the treatment and control groups were taken into account in the reanalysis conducted by D.A. Rock, the combined coaching/self-selection effect for SAT-V dropped to about 17 points while that for SAT-M, which did not exhibit differential group growth rates in these data, remained at about 30 points. But it is impossible to determine with any confidence whether the effects obtained in the FTC study may be attributable in whole or part to uncontrolled self-selection factors rather than to any impact of the coaching program as such.

Thus, overall, the FTC study appears to reveal considerable variability in the coaching/self-selection effects associated with coaching-school attendance, with an estimated combined effect for students at one school of about 20 to 30 points on SAT-Math and very likely about half to two-thirds that (as reflected in corrections for differential group growth rates) on SAT-Verbal. In addition, the sporadic emergence of significant interactions indicates that particular types of students, such as those highly motivated to achieve, or students with particular cultural backgrounds, such as blacks or Asian Americans, might sometimes exhibit larger score increases in some commercial coaching programs. Further research is needed on this problem since the samples of blacks and other minorities in the FTC study were exceedingly small and somewhat atypical. On balance, however, the overall findings for commercial coaching appear generally consistent with the results of prior studies on the effectiveness of special preparation programs offered by high schools, especially the longer and more intensive ones.

To pursue this latter conclusion in more detail, let us inquire how the results of the FTC reanalyses jibe with the rankings of prior coaching studies in regard to student contact time and magnitude of score effects, as summarized in Table II-3. The coaching program at School A in the FTC study entailed 40 hours of student contact time while that at School B involved 24 hours. If it is assumed that roughly half that time in each case was devoted to Verbal coaching and half to Math coaching, then the Verbal student contact time of School A would receive a rank of 8 when added to the rank-order correlations of Table II-3, while the associated Verbal combined coaching/self-selection effect (using the weighted average of estimates from Table IV-1) would receive a rank of 7. The rank for the Math student contact time at School A would be 5, while that for the associated Math combined coaching/self-selection effect would be 4. The corresponding values for School B would be rank 10 for Verbal student contact time and rank 13 for combined SAT-V effect, and rank 7.5 for Math student contact time and rank 11 for combined SAT-M effect. The consistency in these associated ranks between student contact time and magnitude of combined coaching/self-selection effects, especially for School A where the associated ranks differed by only one ordinal position, suggests that the FTC results are of a comparable order with the results of prior studies. With the values for School A included, the new correlations between rank-order of student contact time and rank-order of score effect are .77 for SAT-V across 18 studies and .78 for SAT-M across 11 studies. Further, if the statistically unreliable values of School B are added, the new correlations become .77 for SAT-V across 19 studies and .74 for SAT-M across 12 studies.
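For readers who want to reproduce this kind of rank-order comparison, the following is a minimal sketch. It assumes the rank-order correlation is Spearman's coefficient, computed as the Pearson correlation of ranks, with tied values receiving the average of the ranks they span (which is how a rank such as 7.5 arises). The contact hours and score effects in the example are hypothetical and are not the entries of Table II-3.

```python
import math


def average_ranks(values):
    """Rank values from largest (rank 1) to smallest; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_order_correlation(x, y):
    """Spearman's rank-order correlation, with average ranks for ties."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical studies: student contact hours and estimated score effects.
contact_hours = [40, 24, 24, 10, 6]
score_effects = [25, 8, 12, 5, 3]

print(average_ranks(contact_hours))
print(round(rank_order_correlation(contact_hours, score_effects), 3))
```

Adding a new study, as the text does for Schools A and B, means appending its contact hours and score effect to the two lists and recomputing, since every study's rank can shift when a new value is inserted.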

If it were to be shown that relatively intensive coaching could substantially improve scores for some students on the SAT, this would have important implications for both educational and testing practice. In considering what these policy implications might be, it would be important to know whether any increased test scores attributable to coaching represent stable improvements in the verbal and mathematical reasoning abilities measured by the SAT or whether they reflect improved facility in overcoming inadvertent sources of test difficulty unrelated to these reasoning abil
