In the same study, Ferguson and Ladd find that the difference in
math score gains between the top and bottom quartile of districts is
2.77 standard deviations. Thus, a gain of 0.026 standard deviations
is a trivial effect.
In sum, the Ferguson and Ladd study provides relatively weak
support for a relationship between graduate training in education
and student achievement. They find much stronger support for a link
between teachers' general academic achievement (as measured by the
ACT) and student test scores.
c. One of the studies on which the
commission relies is a meta-analysis conducted by Greenwald, Hedges,
and Laine. The NCTAF describes this as a review of "sixty production
function studies," and summarizes the GHL findings in a chart that
purports to show the gain in student test scores per $500 invested
in various educational reforms: reducing class size, raising teacher
pay, increasing teacher experience, and investing in teacher
education. The largest "effect" is the impact of spending on teacher
education: according to the commission, for every $500 invested in
teacher education, student test scores will rise by more than
two-tenths of a standard deviation. As the standard deviation of
student achievement is large, this is a very impressive rate of
return. But there is much less here than meets the eye.
First, the commission's description of the GHL study is
inaccurate. The relevant table in GHL (Table 6, p. 378) shows that
the findings on teacher education are based not on 60 studies, but
eight. Moreover, "teacher education" as used by GHL refers merely to
whether a teacher has earned an advanced degree. The "investment" in
teacher education described by the commission is simply the extra
salary teachers typically receive for holding a master's degree. The
legend accompanying the chart in Doing What Matters Most is
therefore misleading. As a reading of the GHL study shows, it is not
a one-time investment of $500 in teacher training that is at issue
here, but rather an ongoing expenditure of $500 per student per
year, GHL's estimate of the salary cost associated with an
increase of five standard deviations in the proportion of teachers
holding an MA. 8
Thus, the claim that we can significantly raise student
achievement by spending $500 on teacher education turns out to be
merely the familiar argument that teachers with master's degrees are
better. We have already discussed the reasons one should be
skeptical about research supporting this claim. This skepticism is
not allayed by a close reading of the GHL study itself.
The same issue of the journal that
published the GHL paper contained a sharply critical review of the
study by economist Eric Hanushek which concluded that "... [GHL's]
manipulations and interpretations systematically distort the
conclusions that should be drawn from the evidence." (Hanushek,
1996, p. 397). Not a criticism of meta-analysis per se, Hanushek's
complaint was based on the selective use of evidence by the
investigators. In the full set of 46 estimates with which the
investigators began, seven found a significant positive effect of
teacher graduate education on student performance, while six showed
a significant negative effect. The remaining 33 estimates
found no statistically significant effect, with approximately
equally many positive and negative point estimates.
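The tally Hanushek describes can be restated as shares of the full set of estimates. The short sketch below uses only the counts given in the text; the variable names are illustrative.

```python
# Counts reported in the text for the full set of estimates of the effect
# of teacher graduate education on student performance.
counts = {
    "significant positive": 7,
    "significant negative": 6,
    "not significant": 33,
}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

for label, n in counts.items():
    print(f"{label}: {n} of {total} ({shares[label]:.1%})")
```

On these counts, significant positive and significant negative findings are nearly balanced, and roughly seven in ten estimates are statistically insignificant.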
To arrive at an estimate that showed a large positive effect for
teacher education, GHL excluded most of these studies from their
meta-analysis and relied on a subset of eight studies which measured
teacher education as a simple dichotomous variable (1 for a master's
degree, 0 otherwise). According to the investigators, they regarded
the dichotomous measure as more reliable than years of
post-secondary education or number of credits (used in the remaining
studies), although they provided no evidence for this belief. Their
results, however, are highly sensitive to this choice. Had they used
all twelve of the available studies to compute the impact of a $500
"investment" in teacher education, they would have found an increase
of just .0015 standard deviations -- basically, a zero effect. 9
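The arithmetic behind the .0015 figure, as described in footnotes 8 and 9, can be retraced in a few lines. This is only a sketch of the calculation as we understand it from those notes; the function and variable names are illustrative and do not come from GHL.

```python
# Figures taken from the text and from footnotes 8 and 9.
share_with_ma = 0.50         # about 50% of teachers hold a master's degree
sd_share_with_ma = 0.10      # standard deviation of that share in the GHL data
coef_full_analysis = 0.0003  # "full analysis" coefficient per standard deviation


def sd_units_to_full_coverage(current_share, sd):
    """Standard deviations needed to move from the current share to 100%."""
    return (1.0 - current_share) / sd


def implied_gain(coef_per_sd, sd_units):
    """Linear extrapolation: coefficient per SD times the number of SDs."""
    return coef_per_sd * sd_units


sd_units = sd_units_to_full_coverage(share_with_ma, sd_share_with_ma)
gain = implied_gain(coef_full_analysis, sd_units)
print(f"{sd_units:.1f} SD increase -> {gain:.4f} SD gain in test scores")
```

The same multiplication by five is what makes the extrapolation fragile: it assumes the estimated linear relationship holds far outside the range of the observed data.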
Finally, the research considered by GHL comprised a mix of
student and district-level studies. Nearly all relied on
cross-section data. However, most researchers consider studies based
on longitudinal data, which provide both a pretest and follow-up
achievement score for each student, to furnish more reliable
information on the relationship of schooling inputs to student
outcomes. Only two of the teacher education studies considered by
GHL used longitudinal data. If their $500 calculation had been based
on these studies alone, they would have found a small
negative effect of a teacher MA on student achievement (-.05
standard deviations).
Teacher Qualifications and NAEP Scores
The evidence reviewed in Doing What Matters Most is not
limited to the four studies we have considered in detail. Some of
the additional discussion is perfunctory, consisting of by-now
familiar assertions that "hundreds of studies" support the
commission's claims, with citations to the education literature. We
will not consider such remarks here, having examined them in an earlier
paper (Ballou & Podgursky, 1997). However, Doing What Matters
Most also contains discussion of statistical evidence assembled
by the commission, notably a claim that the commission's reform
proposals receive support from results of the National Assessment of
Educational Progress (NAEP), the periodic examination of American
students' learning in various core subjects conducted by the U.S.
Department of Education.
The commission writes:
The National Assessment of Educational Progress has
documented that the qualifications and training of students'
teachers are also among the correlates of reading achievement.
Students of teachers who are fully certified, who have master's
degrees, and who have had professional coursework in literature-
based instruction, whole language approaches, study strategies,
and motivational strategies do better on reading assessments (see
Table 1). (DWMM, pp. 11-12)
We reproduce a portion of Table 1 from the commission's report
below. 10
The entries in the table are the mean scores on the fourth grade
reading test, contrasting students whose teachers have low or
substandard qualifications (by the commission's reckoning) with
those whose instructors have had superior training.
Table 1
CORRELATES OF READING ACHIEVEMENT
(Average Student Proficiency Scores, National Assessment of Education Progress, 1992)

| Correlates of Reading Achievement | Lower Scores | Higher Scores |
|---|---|---|
| Teacher Qualifications | | |
| Level of Certification | Substandard or none: 214 | Highest level: 219 |
| Levels of Education | Bachelor's Degree: 215 | Master's Degree: 220 |
| Coursework in literature-based instruction | No coursework: 214* (216) | Yes coursework: 218 |
| Coursework in whole language instruction | No coursework: 214* (216) | Yes coursework: 218 |

Reproduced from Doing What Matters Most, p. 12. Entries
marked with an asterisk are misreported; correct numbers appear in
parentheses.
In several respects, the commission's claims do not stand up to
scrutiny. First, the NAEP results in the commission's table differ
from those reported by the U.S. Department of Education. The average
1992 score for students whose teachers had no coursework in literature-based
instruction was not 214, as the commission reports, but 216.4. (All
of our statistics come from the National Center for Education
Statistics NAEP website: http://nces.ed.gov/nationsreportcard/y25alm/almanac.shtml)
For teachers with no coursework in whole language instruction, the
mean student score was 215.8, not 214.
Second, the commission reports results for the 1992 NAEP,
overlooking data from the more recent 1994 assessment. It turns out
the 1994 results are considerably less favorable to the commission.
We present the 1994 results in Table 2 below. Observe that in the
1994 results there is no difference between teachers with bachelor's
degrees and those with master's degrees. Moreover, the result for
whole language instruction has been reversed -- teachers with
coursework in whole language approaches have lower student
test scores than do those who have had no such training.
Table 2
CORRELATES OF FOURTH GRADE READING ACHIEVEMENT
(Average Student Proficiency Scores, National Assessment of Education Progress, 1994; standard errors in parentheses)

| Correlates of Reading Achievement | Lower Scores | Higher Scores |
|---|---|---|
| Teacher Qualifications | | |
| Level of Certification | Substandard or none: 212 (3.2) | Highest level: 216 (1.4) |
| Levels of Education | Bachelor's Degree: 215 (1.2) | Master's Degree: 215 (2.0) |
| Coursework in literature-based instruction | No coursework: 214 (2.4) | Yes coursework: 216 (1.1) |
| Coursework in whole language instruction | No coursework: 218 (2.9) | Yes coursework: 215 (1.1) |

Results are still less favorable to the NCTAF in the 1994
assessment of eighth-grade reading achievement. Eighth grade
teachers with a bachelor's degree have higher-achieving students
than teachers who hold a master's degree (240.4 to 236.3). Students of
teachers with substandard or no certificates outscore those of teachers
with the highest level of certificate (263.2 to 261.2). Students whose
teachers lack whole language training do better than those whose
teachers have had it (262.2 to 259.8).
Finally, we have included in Table 2 the standard errors for NAEP
proficiency scores as computed by the National Center for Education
Statistics -- information that the NCTAF did not provide in its
report. These standard errors are so large that it would be unwise
to make much of the contrasts in Tables 1 or 2, since differences of
two or three points are generally not statistically
significant. Had the NCTAF reported standard errors in the first
place (along with correct proficiency scores), informed readers
would have seen at once that there was no meaningful difference
between teachers the commission deems less qualified and those it
regards as more highly qualified. The fragility of the comparisons
in Table 1 is all the more evident given (as we have seen) that
these differences do not hold up in the 1994 NAEP or in the
eighth-grade assessment. 11
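The point about standard errors can be checked roughly from the entries in Table 2. The sketch below treats the two groups in each row as independent samples and uses a normal approximation with a two-sided 5% critical value of 1.96; both are simplifying assumptions of ours, not a reproduction of NCES significance-testing procedures, and the names in the code are illustrative.

```python
import math

# (mean, standard error) pairs from Table 2 (1994 NAEP, fourth grade reading):
# the "lower scores" group first, then the "higher scores" group.
TABLE_2 = {
    "Level of certification": ((212, 3.2), (216, 1.4)),
    "Level of education": ((215, 1.2), (215, 2.0)),
    "Literature-based coursework": ((214, 2.4), (216, 1.1)),
    "Whole language coursework": ((218, 2.9), (215, 1.1)),
}

CRITICAL_Z = 1.96  # two-sided 5% level under a normal approximation


def z_for_difference(group_a, group_b):
    """z statistic for mean_b - mean_a, assuming independent groups."""
    (mean_a, se_a), (mean_b, se_b) = group_a, group_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return (mean_b - mean_a) / se_diff


results = {name: z_for_difference(a, b) for name, (a, b) in TABLE_2.items()}

for name, z in results.items():
    verdict = "significant" if abs(z) > CRITICAL_Z else "not significant"
    print(f"{name}: z = {z:+.2f} ({verdict})")
```

Under these assumptions none of the four contrasts comes close to the conventional threshold; the largest, for level of certification, has a z statistic of about 1.1.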
To summarize, the NCTAF is selective in its use of NAEP evidence,
choosing the year and grade-level that appear to support its thesis
and ignoring the year and grade level that do not. Some of the
numbers the commission presents are simply wrong. Standard errors
are omitted that, had they been reported, would have shown how
little basis there is for the commission's conclusions.
NAEP Scores and State-Level Reforms
Doing What Matters Most devotes
several pages of discussion and three charts to an analysis of NAEP
math and reading test score changes. Rather than plotting all of the
available data, however, the commission plots outcomes for a subset
of states only. For example, for fourth grade math only 11 of 36
potential cases are displayed. (There are only 36 potential cases
because state-level scores are not reported for all 50 states on
each NAEP administration.) For reading, only 9 of 37 cases are
shown.
The discussion that accompanies these charts begins with states
whose efforts are deemed praiseworthy.
The critical importance of investments in teaching is
demonstrated by states' experiences over the past ten
years...Notable among them for the size and scope of investments
were North Carolina and Connecticut. Both of these states coupled
major statewide increases in teaching salaries within intensive
efforts and initiatives to improve preservice teacher training,
beginning teacher mentoring, and ongoing professional development.
Since then North Carolina has posted among the largest student
achievement gains in mathematics and reading of any state in the
nation, now scoring well above the national average in 4th grade
reading and mathematics, although it entered the 1990's near the
bottom of state rankings... Connecticut has also posted
significant gains, becoming one of the top scoring states in the
nation in mathematics and reading... (DWMM, p. 11)
For some praiseworthy states with flat or below-average test
score gains, the commission switches to a discussion of levels
rather than changes.
Meanwhile, there are a number of states that
repeatedly lead the nation in achievement, each of which has made
longstanding investments in the quality of teaching. The three
long-time leaders -- Minnesota, North Dakota, and Iowa -- have all
had a long history of professional teacher policy and are among
the 12 states that have state professional standards boards ...
(DWMM, p. 13)
Other states are chastised.
By contrast state reform strategies during the 1980's
that did not include substantial efforts to improve teaching have
been much less successful. For example, the first two states to
organize their reforms around a student testing strategy were
Georgia, with its Quality Basic Education Act of 1985, and South
Carolina, with its Education Improvement Act of 1984... As figures
7-9 show, student achievement in mathematics has been flat in
these states while achievement in reading declined since 1990.
(DWMM, p. 14)
Unfortunately, this type of ad hoc exercise lacks validity.
Researchers must consider all the data, not just those observations
which fit their theories. As noted, the commission makes much of
recent trends in North Carolina. Governor (and NCTAF chairman) James
Hunt has made his state a showcase for the commission proposals.
However, had the commission plotted data for all states, the reader
would have seen that there were math gains almost as great in Texas.
The 1992-94 fourth grade math score gains of North Carolina and
Texas were identical and first in the nation (+11). The 1990-96
grade 8 test score gains were +17 in North Carolina (rank 1) and +12
in Texas (rank 2). (The changes in 4th grade reading scores were
statistically insignificant for both states.) The Texas case is
noteworthy because in 1996 the commission rated Texas dead last in
efforts to assure a high quality, professional teaching work force.
(On a ten point scale, Texas received a score of zero, tying it with
two other states for the lowest score in the nation.) How is it that
a state that in the commission's judgment devotes so little
attention to teacher quality has posted gains that essentially match
those of the showcase state and that exceed 47 other states? If the
North Carolina experience sheds light on the hypothesis, so does
that of Texas.
Moreover, even for the states which are plotted, the commission
is very selective about what part of a state's experience counts and
what part does not. Consider Connecticut, the other showcase state.
In 1996 the commission gave it a rating of 3 (on a ten-point scale)
for its efforts to assure a high-quality, professional work force.
This below-average rating reflected the state's poor performance
with respect to several of the commission's criteria. The proportion
of teacher education programs accredited by NCATE was just 13%, one
of the lowest in the nation. As of 1996 Connecticut provided no
incentives for teachers to obtain National Board certification and
had not established an independent professional board. In its 1997
report, however, the commission has ignored all of this, lauding the
state for enacting very large increases in teacher pay combined with
"performance-based" teacher exams. Thus, the Connecticut experience
is taken to support the NCTAF's agenda. One wonders: if Connecticut
test scores had fallen, would they have been used as evidence in
favor of NCATE accreditation, professional boards, and National
Board incentives?
On the other hand, the commission faults Georgia, where test
score changes were below average. Yet Georgia's 1996
professionalization score was above Connecticut's (4 versus 3). The
state's proportion of NCATE-accredited programs is 53%. Georgia has
created an independent professional board and provides incentives
for National Board certification. Yet we are told that Georgia did
not make the right investments in teacher professionalization.
We could give many other examples of selective use of the
evidence and ex post rationalization of the data. However, a more
scientific approach is to examine all of the state-level NAEP data
to determine whether there is any relationship between the
commission's proposals and student achievement.
As noted, in its 1996 report the NCTAF
rated each of the 50 states on its efforts to assure a high quality
workforce. The resulting state report card, entitled "Indicators of
Attention to Teacher Quality," purports to measure how much each
state has done to implement reforms the NCTAF considers important
for professional quality. Like other features of the commission's
reports, these indicators have been taken up by the media. For
example, many of these "professionalization" indicators are now used
by Education Week in its widely-publicized Quality
Counts annual survey. Given that the NCTAF has rated all states'
efforts, we examine whether there is a relationship between these
ratings and student achievement when all states are included in the
analysis.
First, however, a caveat. Many of the state-level policies on
which these ratings are based have only recently been implemented
and will take some time to affect aggregate test scores. We do not
believe that comparing state-level NAEP scores to NCTAF ratings of
state reform efforts constitutes a powerful test of the hypothesis
that these reforms are valuable. We undertake the analysis solely
because the commission has claimed that evidence from NAEP supports
its recommendations. Our goal is to determine whether this
conclusion holds when data from all states are considered rather
than from a select group of states.
In Table 3 we present results from a regression of state average
NAEP scores on the ratings in the NCTAF state report card. The first
column reports simple regression coefficients and the second column
reports the estimated regression coefficients in a model which controls
for the student poverty rate in the state. The dependent variables in
the models are reading and math test score levels (rows 1-3) and
changes in scores between earlier and later administrations of these
tests (rows 4-6). In none of the 12 sets of estimates do the NCTAF's
ratings of states have a statistically significant relationship with
NAEP outcomes.
Table 3
Regression Coefficients: Student Test Scores on NCTAF State Scores (p-values in parentheses)

| Dependent Variable | 1996 State Score | 1996 State Score* |
|---|---|---|
| NAEP 1994 Reading Grade 4 (n=39) | 1.053 (.20) | .329 (.62) |
| NAEP 1996 Math Grade 4 (n=44) | 1.190 (.16) | .362 (.60) |
| NAEP 1996 Math Grade 8 (n=41) | 1.725 (.11) | .508 (.51) |
| NAEP Reading Grade 4, Change 94-92 (n=37) | .373 (.20) | .397 (.19) |
| NAEP Reading Grade 8, Change 96-92 (n=36) | .101 (.68) | .049 (.85) |
| NAEP Math Grade 8, Change 96-90 (n=31) | .348 (.32) | .323 (.39) |

*These regressions control for student poverty rate in the state.
Sources: NAEP test data from various volumes of National Report
Card, published by the National Center for Education Statistics
(U.S. Department of Education); NCTAF teacher quality state score
from Doing What Matters Most, appendix A.
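The statement that none of the 12 estimates is statistically significant can be read directly off the p-values in Table 3. The following sketch simply scans those reported p-values against conventional thresholds; the 5% and 10% levels are our choice of benchmark, and the names are illustrative.

```python
# p-values as reported in Table 3: (simple regression, with poverty control).
P_VALUES = {
    "NAEP 1994 Reading Grade 4": (0.20, 0.62),
    "NAEP 1996 Math Grade 4": (0.16, 0.60),
    "NAEP 1996 Math Grade 8": (0.11, 0.51),
    "NAEP Reading Grade 4, Change 94-92": (0.20, 0.19),
    "NAEP Reading Grade 8, Change 96-92": (0.68, 0.85),
    "NAEP Math Grade 8, Change 96-90": (0.32, 0.39),
}

all_p = [p for pair in P_VALUES.values() for p in pair]
smallest = min(all_p)

for alpha in (0.05, 0.10):
    n_significant = sum(p < alpha for p in all_p)
    print(f"{n_significant} of {len(all_p)} estimates significant at {alpha:.2f}")

print(f"smallest p-value: {smallest:.2f}")
```

Even at the more permissive 10% level, no estimate in the table would be judged significant.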
Conclusion
The NCTAF proposes an agenda which would shift considerable
regulatory power out of the public domain and into the hands of
private professional education organizations. Unlike markets for
medical or other professional services, most education consumers
(parents and children) have little choice as to their teachers or
schools. Given such a captive market, the potential for harm from
such a policy cannot be ignored. The burden of proof is on the
commission to make a convincing case that such a change will improve
the performance of schools, rather than simply promoting the
interests of education producers.
This burden has not been met. The commission's latest report,
Doing What Matters Most, reviews several studies from the
education production function literature to make the case that
teacher expertise matters. The commission also turns to the NAEP for
findings that contrast teachers who are "highly qualified" with
those whose qualifications are "substandard." Yet on close
inspection, it turns out that the evidence on expertise and
qualifications offers little support for the commission's specific
proposals. In particular, the evidence that teachers' effectiveness
is enhanced by advanced degrees earned in schools of education is
very weak. By contrast, the data establish more clearly that it is
important to recruit teachers of above average general intelligence
and academic ability. This pattern is at odds with the NCTAF's
reform agenda, which would lengthen teachers' pre-service training
but offers little to attract more capable people into the
profession.
Notes
1. This paper, published in the Government Union
Review, is available on-line at www.psrf.org/doc/v174_art.html.
2. A similar situation arises in the pharmaceutical
industry, for example. The Food and Drug Administration routinely
relies on firms that develop new drugs to test the efficacy and
safety of those drugs. Any sign that the industry deviates from
recognized scientific procedure (e.g., carefully controlled clinical
trials) in this process immediately calls the wisdom of this policy
into question and leads to proposals that the FDA itself assume
testing as part of its regulatory functions. A still closer analogy
is the self-regulation of the medical profession. If the AMA and the
medical specialty boards were not careful and even-handed in their
treatment of data that underlie medical protocols, it is doubtful
that elected officials would (or should) continue to entrust these
organizations with the accreditation of medical schools and the
development and administration of licensure and certification tests
for physicians.
3. An examination of ACT and NTE scores by
institution in Missouri, NTE scores in Pennsylvania and Virginia, and
state-mandated teacher exams of basic verbal and communications
skills in Massachusetts finds that many institutions in the lower
quartiles are NCATE-accredited. See Ballou & Podgursky
(1999).
4. To be fair to the commission, some of the
studies it describes have concerned teaching methods. For example,
reviewing Cohen and Hill's (1998) report on California's new
mathematics assessment, the commission writes: "[T]eachers who
participated in sustained professional development based on the
curriculum they were learning to teach were much more likely than
those who engaged in other kinds of professional development to
report reform-oriented teaching practices. These practices and this
professional development participation were, in turn, associated
with higher mathematics achievement for students on the state
assessment..." But even here, the policy implications are narrow.
The new assessment differed substantially from the old. Teachers who
had more training to prepare their students for it altered their
teaching methods and saw their students do better. Students who were
not so prepared were more likely to be bewildered by the new exam.
Thus the research shows that professional development can be
efficacious. This is not a trivial finding, and it supports the
commission's contention that professional development can be better
designed than at present. But it does not establish that the
"reform-oriented teaching practices" would have been superior had
there been no change in the exam or that the new assessment was a
better measure of student accomplishment than the old. Indeed,
California has since abandoned both the new test and the mathematics
curriculum on which it was based.
5. The claim that teacher attributes explain more
than 90% of the variance in student test scores should not be
accepted at face value, however. The investigators did not have
individual student test scores for their study, and relied instead
on average scores in each of the schools. Since there were only
eight schools in the study, the aggregation of scores to the school
level removed most of the initial variation in the data. Moreover,
by matching each teacher with the average test score for the school,
the investigators produced a textbook case of an over-fitted model:
eight distinct test scores, regressed on the characteristics of 186
different teachers. This explains why the explanatory variables
account for such an extraordinary proportion of the variation in
test scores.
6. As noted above, this 40% figure has been widely
cited. For example, it appears in an article on NCTAF research in
the Educational Researcher (Darling-Hammond, 1998). The
School Board News reports: "[Doing What Matters
Most] quotes a Texas study that found that teachers'
expertise accounted for 40 percent of the difference in mathematics
and reading achievement - more than any other factor." (School
Board News, 11/25/97, p.2)
7. We are indebted to an anonymous referee for this
point.
8. About 50% of all teachers now hold master's
degrees. The standard deviation of the proportion with an MA in the
GHL data was 10%. The authors have apparently calculated that by
spending an additional $500 per pupil per year, it will be possible
to pay for an MA for every teacher. This represents an increase of
five standard deviations in the independent variable. Hence, they
simply multiply the coefficient on the standardized regressor by 5.
This is a dangerous procedure. They are extrapolating far outside
the observed data and assuming that the estimated linear
relationship still holds. In fact, the data tell us very little
about the impact on student test scores that would result from such
an extreme increase in one independent variable.
9. This equals the "full analysis" coefficient from
GHL (Table 6, p. 378) times an increase of five standard deviations
in the percentage of teachers with master's degrees (.0003 x 5). The
cost of this increase in teacher education, on GHL's reckoning, is
an additional $500 per student in teacher salaries. See fn. 8
above.
10. We have not reproduced the part of the table
that compares NAEP scores on the basis of teaching practices, since
practices used in the classroom are influenced by the ability level
of the students. A simple comparison of means without controls for
student ability therefore tells us little about the efficacy of
these teaching methods.
11. Even if the NAEP results had been as favorable
to the commission as the NCTAF's report claims, it is far from clear
what a comparison of means would actually establish. Tables 1 and 2
include no controls for student background or ability. If it is the
poorest schools that have to hire teachers with substandard
credentials, it is wrong to attribute the whole difference in scores
to the credential per se. In addition, the NAEP is a measure of
cumulative student learning through the fourth and eighth
grades, whereas the information collected on teachers pertains to
the instructor in the year in which the assessment was
administered. It is therefore far from clear how much weight should
be placed on the answers these teachers give.
12. We do not regress the NCTAF teacher quality
measures at the bottom of the table on the state scores since the
teacher quality measures are used in the computation of the state
scores.
Appendix
Decomposing Explained Variation in a Multiple Regression Model
In his study of Texas school districts, Ferguson estimates the
following regression model (we suppress the subscripts for each
district and express all variables in deviations from their
means):
$$\mathrm{T5} = b_1\,\mathrm{T3} + b_2\,\mathrm{HFF} + b_3\,\mathrm{CS} + b_4\,\mathrm{TQ} + e \qquad (1)$$
where T5 is the fifth grade math test score for the school
district, T3 is the third grade math score, HFF is a vector of home
and family factors (including community characteristics), CS is
average class size in the district, TQ is teacher quality (average
literacy test score for teachers and average teacher experience and
education in the district), and e is the residual.
Total variation is decomposed as follows:
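$$
\begin{aligned}
V(\mathrm{T5}) ={} & b_1^2\,V(\mathrm{T3}) + b_2^2\,V(\mathrm{HFF}) + b_3^2\,V(\mathrm{CS}) + b_4^2\,V(\mathrm{TQ}) \\
& + 2b_1b_2\,C(\mathrm{T3},\mathrm{HFF}) + 2b_1b_3\,C(\mathrm{T3},\mathrm{CS}) + 2b_1b_4\,C(\mathrm{T3},\mathrm{TQ}) \\
& + 2b_2b_3\,C(\mathrm{HFF},\mathrm{CS}) + 2b_2b_4\,C(\mathrm{HFF},\mathrm{TQ}) \\
& + 2b_3b_4\,C(\mathrm{CS},\mathrm{TQ}) + V(e) \qquad (2)
\end{aligned}
$$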
where V(.) denotes variance and C(.) covariance. The sum of the
right hand terms except V(e) equals the explained variation. The
presence of the covariance terms accounts for the familiar problem,
mentioned in the text, that it is not possible to uniquely decompose
explained variation into a part due to HFF and a part due to CS,
etc. Only in the special (and empirically irrelevant) case where
all of the sample covariances are zero will this be possible.
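A small numerical illustration may help show why the covariance terms block a unique decomposition. The data and coefficients below are made up purely for illustration (they are not Ferguson's), and only two regressors are used to keep the arithmetic visible.

```python
def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def var(xs):
    return cov(xs, xs)


# Hypothetical, positively correlated regressors and hypothetical coefficients.
hff = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
tq = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
b_hff, b_tq = 0.8, 0.5

fitted = [b_hff * h + b_tq * t for h, t in zip(hff, tq)]
explained = var(fitted)

own_hff = b_hff ** 2 * var(hff)          # term attributable to HFF alone
own_tq = b_tq ** 2 * var(tq)             # term attributable to TQ alone
cross = 2 * b_hff * b_tq * cov(hff, tq)  # shared term, belongs to neither

print(f"explained variation: {explained:.4f}")
print(f"own terms: HFF {own_hff:.4f}, TQ {own_tq:.4f}")
print(f"covariance term: {cross:.4f} ({cross / explained:.1%} of explained)")
```

In this made-up case a large part of the explained variation sits in the covariance term, and any rule for splitting it between the two regressors would be arbitrary.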
What, then, does the commission report? Professor Ferguson has
provided a copy of the printout which the commission used in its
calculations. In that printout, he calculated the following terms:
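$b_1\,\mathrm{SD}(\mathrm{T3})$, $b_2\,\mathrm{SD}(\mathrm{HFF})$, $b_3\,\mathrm{SD}(\mathrm{CS})$,
$b_4\,\mathrm{SD}(\mathrm{TQ})$, where SD(.) refers to the standard deviation
of the regressor (hence, $\mathrm{SD}(\cdot) = V(\cdot)^{0.5}$). Ferguson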
correctly labels these products as the effect of a one standard
deviation change in the regressor on fifth grade test scores. The
commission took these estimates, ignored the T3 value, summed the
values for HFF, CS, TQ (apart from an error, noted below) and
computed the ratio of each term to the totals. This is what they
report as "Proportion of Explained Variance in Math Test Score Gains
(from Grades 3 to 5)." For example, they compute the contribution of
TQ as follows:
$$\text{Contribution of TQ to total Explained Variation} = \frac{b_4\,\mathrm{SD}(\mathrm{TQ})}{b_2\,\mathrm{SD}(\mathrm{HFF}) + b_3\,\mathrm{SD}(\mathrm{CS}) + b_4\,\mathrm{SD}(\mathrm{TQ})}$$
There are numerous errors in this procedure. First of all, the
denominator of the preceding expression omits the most important
explanatory variable of all -- students' scores on tests
administered the preceding year. Because scores are highly
correlated over time, by omitting this variable from the exercise
the commission dramatically increases the percentage of the "total"
due to teacher expertise. Moreover, by failing to make clear that a
pre-test score was included among the explanatory variables, the
commission obscures a principal reason that students' family
backgrounds are not more important in the analysis. Since family
background is largely stable from one year to the next, its impact
on student achievement is captured already in the pre-test score.
Had the original regression equation omitted the pre-test score, the
measured effect of family background would have been substantially
greater.
Possibly, however, the commission assumed that the regression in
equation (1) can be reinterpreted as a gain score regression (i.e.,
with T5 - T3 as the dependent variable) simply by dropping terms
involving T3 from the right side of the equation. Yet this does not
save the commission from error. First, respecifying the dependent
variable as T5 - T3 would change the coefficient estimates
($b_1$, $b_2$, etc.). Converting to gain scores is
not done as simply as the commission seems to believe.
In addition, the variance of T5-T3 is not simply V(T5)-V(T3) but
involves C(T5,T3), which has not been accounted for. Thus, even if
the regression model had been run with T5-T3 as the dependent
variable (a gain score), the above calculation would be incorrect as
it ignores the covariance terms in equation (2). Moreover, even if
the covariance terms were all zero, the ratio should be based on
variances and not standard deviations.
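To see how much these choices can matter, consider a sketch with hypothetical one-standard-deviation effects. The numbers below are invented for illustration and are not Ferguson's estimates; the variance-based share additionally assumes, counterfactually, that all covariances are zero.

```python
# Hypothetical values of b_k * SD(x_k) for each regressor; illustration only.
effects = {"T3": 0.60, "HFF": 0.15, "CS": 0.05, "TQ": 0.10}


def share(target, terms, power=1):
    """Share of `target` in the sum of |term| ** power over all terms."""
    total = sum(abs(v) ** power for v in terms.values())
    return abs(terms[target]) ** power / total


without_pretest = {k: v for k, v in effects.items() if k != "T3"}

# The commission's procedure as described above: ratio of b*SD, T3 dropped.
commission_style = share("TQ", without_pretest)
# Same ratio with the pre-test score restored to the denominator.
with_pretest = share("TQ", effects)
# Ratio of b^2 * V terms, T3 included, valid only if covariances were zero.
variance_based = share("TQ", effects, power=2)

print(f"b*SD ratio, pre-test omitted:  {commission_style:.1%}")
print(f"b*SD ratio, pre-test included: {with_pretest:.1%}")
print(f"variance ratio, zero covariances assumed: {variance_based:.1%}")
```

With these invented inputs the apparent share of teacher quality falls from about a third to about a ninth once the pre-test score is restored, and to under 3 percent when the ratio is based on variances.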
Finally, apparently due to an oversight, in performing these
calculations the proportion of teachers with an advanced degree was
omitted from the set of teacher quality variables (TQ) and included
instead with the home and family factors (HFF).
References
Armour-Thomas, E., et al. (1990). An Outlier Study of Elementary
& Middle Schools in New York City. Unpublished
manuscript.
Ballou, D. & Podgursky, M. (1997). Reforming teacher training
and recruitment. Government Union Review, 14 (4):
1-53.
Ballou, D. & Podgursky, M. (1999). "Are NCATE Teachers Better?" Unpublished
manuscript. University of Missouri.
Bishop, J. (1994). "Signaling, Incentives, and School
Organization in France, the Netherlands, Britain and the United
States: Lessons for Education Economics." (Working Paper
#94-25). Ithaca NY: Center for Advanced Human Resource Studies, New
York State School of Industrial and Labor Relations, Cornell
University.
Cohen, D. & Hill, H. (1998). Instructional Policy and
Classroom Performance: The Mathematics Reform in California.
Philadelphia: Consortium for Policy Research in Education, CPRE
Research Report Series RR-39.
Darling-Hammond, L. & Berry, B. (1998). "Reforming teaching:
Another view of why and how." Opportunity, forthcoming.
Ferguson, R. (1991). Paying for public education: New evidence on
how and why money matters. Harvard Journal on Legislation 28:
465-498.
Ferguson, R. & Ladd, H. (1996). How and why money matters: An
analysis of Alabama schools. In H. Ladd, ed., Holding Schools
Accountable. Washington, DC: Brookings.
Greenwald, R., Hedges, L. & Laine, R. (1996). The effect of
school resources on student achievement. Review of Educational
Research 66(3): 361-396.
Hanushek, E. (1996). A more complete picture of school resource
policies. Review of Educational Research 66(3): 397-409.
Hirsch, E.D. (1996). The Schools We Need and Why We Don't Have
Them. New York: Doubleday.
National Commission on Teaching and America's Future (NCTAF).
(1996). What Matters Most: Teaching for America's Future. New
York: Author.
National Commission on Teaching and America's Future (NCTAF).
(1997). Doing What Matters Most: Investing in Quality
Teaching. New York: Author.
Richardson, J. (1994). Two foundations create national panel on
teaching. Education Week (November
23).