(2) The purpose of testing: to improve children's education

For years, experts have told us that there are flaws in the present norm-referenced standardized testing process and that we must use it because "it is all we have." As parents, we are no longer convinced by that argument, and insist that standardized tests be used to complement the educational process, not dictate a child's educational future. Testing must be responsive to the major goal of a public school system: the improvement of the education of children. We are happy to note that many school systems are moving toward testing systems more in keeping with local goals. Children are being tested in every classroom not so much to compare them with other children in distant school systems, but to determine whether they are learning in accord with the objectives of each local school system, and to determine the effectiveness of the system itself in implementing these goals.

(3) The alternative of national sampling

Sampling on a national basis may be all that is necessary to provide meaningful national comparisons. But nationwide tests are not adequate substitutes for careful local assessments, since the approximately 16,000 local school districts have a variety of local educational objectives and different timetables for when specific skills are emphasized. Unless we are planning to move toward a national curriculum, these differences of timing and emphasis will continue to exist. A child should not be penalized because a local school system has chosen a different sequence than the test maker's. Nor should these important determinations pass from the hands of local school boards, who are accountable to their communities, to the hands of test producers, who are not accountable at all.

(4) A balance between teaching and testing

Is more time being used in the classroom for "testing" than for "teaching"? As public school students return to classes, they can expect to take more tests than ever before, which leads to questions such as: (a) Is this an effective use of the professional educator's time? (b) Is this the most productive use of a child's limited classroom time? and (c) Are there ways we could get the answers we need without further disrupting the already fragmented school day?

(5) Testing on what has been taught

The PTA advocates that children be tested on what they have been taught. We need the results of such tests to know whether the system or the child is responsible when learning does not proceed at an expected rate or in a certain pattern. This is essential if there is to be any accountability of local schools and if testing is to be used as a developmental tool so that a teacher can specifically pinpoint the next appropriate steps for each youngster.

(6) Test development

(a) Parents and teachers are concerned about who participates in test construction and the basis on which these people are chosen. This information would be useful in understanding the way in which tests are constructed.

(b) It is critical to know what tests purport to measure so that judgments are confined only to what is measured. Testing companies report that they are blameless for the misuses of test results, yet there are often conflicting points of view among school systems about what test results indicate and what information can be gleaned from the test scores. If it is mandatory for test manufacturers to define the purpose of the tests, parents and teachers will be better able to restrict the use of such data to the purposes for which they were intended.

(c) There is an information void on how standardized tests address the unique testing needs of handicapped and disadvantaged youth.

(7) Test usage

(a) We desperately need information on how tests are used, when, and by whom. Discussion is necessary to determine how selective we have been in accepting the whole range of testing to which we subject children.

(b) Multiple-choice or short-answer questions limit our ability to measure a full range of skills. For example, the ability to write well, to organize and transfer thoughts to paper, requires testing of a written sample, yet for many years we have tested writing skills with multiple-choice questions. Why? Because they are easier to score. We reduce writing to the rote principles of grammar, spelling, and punctuation, and then wonder why children's writing skills have deteriorated. We must recognize that methods of testing must rely on criteria other than "what the computer can efficiently score."

(8) The responsibility associated with testing

There is no clear indication of who is responsible when national tests backfire or are misused. At present, some problems are caught by random spot checks; others are suspected and only verified when a test taker has been able to review the disclosed test questions. Is the test maker responsible? Is the school responsible? Or is it only the student?

Certainly, tests are only one measure of a person's ability, but because they are so broadly used, the significant questions about their effect and reliability must be thoroughly aired and other potential measurement techniques carefully considered. As parents and educators, we support the principles of H.R. 1662 as a means for requiring the disclosure of information which will address the eight concerns just raised and we feel that the following are hallmarks of the legislation:

(a) The bill provides for the standard error of measurement to be shared not only with teachers, but with test takers and their parents. Many parents will, for the first time, be aware of the broad margin of error in these scores, even though life and career decisions are being made on the basis of a few points.

(b) The legislation includes protections for the privacy rights of families and test takers by ensuring that research will be limited to data that are not personally identifiable, unless approved by the test taker.

(c) The bill mandates the release to test takers of their actual test questions and answers within a reasonable period following the tests. Also important is a procedure for review of a challenged test score within a reasonable time frame. We are hopeful that the section which will require a description of special services to accommodate handicapped students will allow schools to deal more effectively with the needs of such students for differentiated testing. We are finding, for example, that many students with writing difficulties can be tested appropriately by oral tests. To try to force written tests on such youngsters is cruel and unproductive.

In closing, we would like to emphasize several points:

(1) Charges have been made that these proposed changes will dramatically increase the cost of testing and damage test makers' ability to compile quality test questions. This would be a poor excuse for maintaining a system of testing which is coming under serious challenge. Schools and families now spend over a quarter of a billion dollars a year on testing, and the appropriate question might well be: "Are we buying instruments that tell us what we need to know?" We cannot evaluate the changes that may be necessary unless we can review what the true costs of the present system cover.

(2) Testing is a large private for-profit industry which has a tremendous impact on a public phenomenon: education. The public has a right to scrutinize this industry and to hold it accountable.

(3) Parents, teachers, and students should be able to review tests and testing materials to ensure that they are understandable. This is particularly important for tests like the SAT and the accompanying disclosure form. There is evidence to indicate that the present method for requesting a copy of the SAT and scored answer sheet is confusing students and is causing a low request rate for this information.

(4) There should be further study as to the effects of coaching on test scores. If coaching has a significant effect, this should be acknowledged, and those who use test scores as an important indicator of ability should be informed.

(5) The National PTA does not seek an end to testing. But we are seriously concerned that the present state of the art of assessment is far less precise than we have been led to believe. We feel the questions about our present testing methods will not be put to rest until parents and teachers are able to make decisions based on much more information than is now available to us. That is why this legislation is so important. We seek the information that will give all of us dependable data to help determine the course of our public schools and the appropriate direction for our children. The National PTA stands ready to participate in any reasonable effort to reach that goal.

Since you are a recognized expert in the area of "coaching" for standardized tests, I have several questions which I would like you to address for the hearing record on the Educational Testing Act of 1981. I would appreciate your prompt reply.

In connection with the hearings on the Educational Testing Act of 1981, I appreciate the opportunity to answer for the record the four questions addressed to me in your letter of November 16:

(1) Is the FTC coaching study seriously flawed and, if so, how? Is it possible to use the FTC findings in a technically sound way to estimate the benefits of coaching and, if not, why not?

The FTC coaching study is seriously flawed by virtue of employing a research design that cannot control for selection bias or self-selection bias. Selection bias refers to any systematic differences between the coached and uncoached students selected for study that may affect SAT performance in addition to the fact that one group was coached and the other was not. If these systematic group differences result from student choice of the coaching program rather than from experimenter choice of the student, they are called self-selection bias. Thus, in the absence of appropriate controls, certain personal factors characteristic of students attending a particular coaching program, such as their motivation or career aspirations, may be responsible, at least in part, for subsequent SAT performance that appears to be the result of the coaching experience.

The standard prescription for avoiding selection bias is an experimental design entailing random assignment of students to coaching treatment groups and noncoaching control groups, for only with random assignment can coaching effects be presumed to be independent of prior status on any of a host of personal or background characteristics. The FTC coaching study did not employ random assignment but, rather, compared students attending two selected coaching schools with uncoached students (selected from College Board test files) who took the SAT during the same time period in the same geographical area. In the absence of randomization, there is an inevitable equivocality in the interpretation of these results because some unmeasured personal characteristics might have influenced both the students' decisions to participate in the coaching program and their subsequent SAT performance, which would tend to inflate the coaching program's apparent effectiveness.

When demographic and personal characteristics of the coached and uncoached groups in the FTC study were contrasted, it was found that the coached group was significantly higher than the uncoached group in high school class rank, parental income, most recent English grades, most recent math grades, and number of years of math taken. In addition, the coached group included significantly more nonpublic school students than the uncoached group. Although appropriate statistical adjustments were made for these and other preexisting group differences for which measures were available, there is no way to take into account unmeasured factors also likely to differentiate the groups, such as student motivation or level of parental education. In the absence of random assignment of students to coached and uncoached groups and especially in view of the large and extensive differences confounded in the nonrandomized groups ultimately analyzed in the FTC study, any score effects derived from these data must be interpreted as combined coaching/self-selection effects. In the absence of random assignment, there is no technically sound way to disentangle the effects of coaching from the effects of self-selection in the FTC study, but some rough estimates of the likely proportion of each might be obtained by contrasting the FTC results with the findings of all other SAT coaching studies, some of which employed random assignment. This approach is discussed in the response to the next question.
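
The following minimal simulation is a hypothetical sketch, not drawn from the FTC data or any real study: every figure in it (a 15-point true coaching benefit, a 40-point effect of unmeasured motivation, the enrollment rates, the noise level) is an assumption chosen only to make the mechanism visible. It illustrates why, without random assignment, an observed score difference combines the coaching effect with the self-selection effect.

```python
# Hypothetical illustration of self-selection bias in a coaching comparison.
# All parameter values below are assumptions for demonstration purposes only.
import random
import statistics

random.seed(1)

TRUE_COACHING_EFFECT = 15   # assumed genuine benefit of coaching, in score points
MOTIVATION_EFFECT = 40      # assumed benefit of high (unmeasured) motivation
N = 10_000                  # simulated students per group

def sat_score(coached: bool, motivated: bool) -> float:
    """Simulated score: baseline + unmeasured motivation + coaching + noise."""
    score = 450.0
    score += MOTIVATION_EFFECT if motivated else 0.0
    score += TRUE_COACHING_EFFECT if coached else 0.0
    score += random.gauss(0, 80)   # test unreliability and other noise
    return score

# 1) Randomized experiment: coaching assigned by coin flip, so motivation is
#    equally common (about 50%) in both groups.
randomized_coached = [sat_score(True,  random.random() < 0.5) for _ in range(N)]
randomized_control = [sat_score(False, random.random() < 0.5) for _ in range(N)]

# 2) Observational comparison: motivated students are far more likely to enroll
#    in coaching, so the groups differ before any coaching takes place.
self_selected_coached = [sat_score(True,  random.random() < 0.8) for _ in range(N)]
self_selected_control = [sat_score(False, random.random() < 0.3) for _ in range(N)]

print("Randomized estimate of coaching effect:",
      round(statistics.mean(randomized_coached) - statistics.mean(randomized_control), 1))
print("Self-selected estimate of coaching effect:",
      round(statistics.mean(self_selected_coached) - statistics.mean(self_selected_control), 1))
```

Under these assumptions, the randomized contrast recovers roughly the 15-point effect built into the simulation, while the self-selected contrast adds roughly 20 further points of pure selection bias. With real data and no randomization, the two components cannot be separated, which is the central difficulty with the FTC design.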

(2) Is it possible to estimate the benefits of coaching for the SAT from other coaching studies and, if so, what are the likely score improvements attributable to coaching?

The FTC coaching study is not unique in being seriously flawed: all of the available studies of coaching for the SAT are flawed in one way or another. Most of the studies were subject to the influence of selection bias discussed in response to question (1), which severely compromises interpretations as to the source or determinants of score effects, in particular whether they may be unequivocally attributed to coaching experiences as opposed to personal or background characteristics of the (self-)selected students. In this regard, some of the studies involved control groups of uncoached students attending different schools from those of the coached students, or else drawn from other extrinsic sources such as test-score files, thereby confounding coaching effects with differential school effects and numerous self-selection factors. In some other studies, control groups of uncoached students were specially constituted to match available samples of commercially coached students on a number of variables, but this still allows systematic differences between the groups on unmatched variables. Another defect common to several studies is an unfortunate reliance on small samples of coached students, which results in imprecise estimates of score effects and a reduced likelihood that
