Urban Schools
One of the things I like most about Matt Yglesias’s blog is that he likes to use data. He has an interesting post up in which he makes the point that New York, Boston and DC schools all show much lower performance on standardized tests than the national average, but that if you only consider students eligible for subsidized lunches, then New York and Boston score almost the same as the national average for such students, while DC does much worse. He concludes that “once we control for demographics”, New York and Boston have school systems that are “doing fine”, while DC’s is failing.
Of course, an alternative conclusion is that if one simple adjustment for differing populations seems to lead to such a massive re-ordering of estimated school system performance, maybe there are other population differences that are also important in assessing performance. Maybe, in other words, the low-income populations of Boston, New York and DC vary from one another, and these differing test scores are not caused by differences in school practices, but instead by these population differences. We might have to start adding other “controls” into the analysis. This road eventually leads to the kind of sophisticated hierarchical modeling which probably reached its apotheosis almost 20 years ago with Chubb and Moe’s seminal analysis of school performance. This is, to put it mildly, quite a bit more involved than a single-factor adjustment using three data points. Even this analysis has been subjected to sustained methodological criticism. The phenomenon is too complex for us to determine causality with such techniques. This is why education research increasingly uses controlled experiments, analogous to clinical drug trials, to determine effectiveness of educational methods.
This post seems to me to be on solid ground when cautioning against drawing facile conclusions (that usually confirm pre-existing beliefs) from data, but is less convincing in establishing the relative performance of New York, Boston and DC public schools.
The blogger Audacious Epigone did a clever study to try to figure out which states’ public schools are adding the most value, given the capabilities of the children they have to work with. His goal was to look at the relative job the schools were doing instead of the usual look at the absolute job. So, he looked at federal NAEP scores for 8th graders in 2007 by state vs. NAEP scores by state for 4th graders in 2003.
This makes for something of an apples to apples comparison, although migration in and out of states and just plain sampling error will affect the results.
(You could also look at 12th-grade scores, but states with, all else being equal, higher high-school dropout rates would wind up looking better, so he skipped that.)
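The cohort comparison described above can be sketched in a few lines. The scores below are invented for illustration only; they are not actual NAEP results, which would come from the NCES data tools.

```python
# Sketch of the cohort "relative improvement" comparison: for each state,
# subtract its 2003 4th-grade NAEP score from its 2007 8th-grade score
# (the same cohort four years later), then rank states by that gain.
# All numbers here are hypothetical placeholders, not real NAEP data.
naep_4th_2003 = {"MA": 242, "DC": 205, "WV": 231}  # hypothetical scores
naep_8th_2007 = {"MA": 298, "DC": 248, "WV": 270}  # hypothetical scores

gains = {state: naep_8th_2007[state] - naep_4th_2003[state]
         for state in naep_4th_2003}
ranking = sorted(gains, key=gains.get, reverse=True)
print(ranking)  # states ordered by cohort gain, largest first
```

Note that this measures the change in a cohort, not the schools alone: migration in and out of a state between the two test dates, and sampling error in each administration, both feed into the gain.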
The results were interesting, but no clear pattern emerged. Many of the states near the top of the relative list of self-improvement, such as Massachusetts, Montana, and North Dakota, score well on the absolute list as well. At the bottom of the relative list was West Virginia, which has the lowest-achieving whites in America by far. But North Carolina and Connecticut were down there too.
But #1 in terms of self-improvement from 2003-2007 was a surprise — the uniformly maligned District of Columbia! I have no idea how to explain this. Perhaps DC has terrible K-4 schools and better 5-8 schools? Perhaps the home environment in DC is so bad that the longer they are in school, the better they get? Or maybe DC schools really aren’t as bad as they are made out to be? Keep in mind that the white people of D.C. are far better educated than the whites of any state, so they may have unrealistically high standards.
Anyway, enough speculation. If you are interested, you should check out Audacious Epigone’s work:
http://anepigone.blogspot.com/2007/11/state-rankings-by-naep-improvement-from.html
— Steve Sailer · Jun 25, 06:59 AM · #
A phenomenon that is little understood outside of actual school systems but well understood within them is that, if you go simply by standardized-test metrics and grades, there aren’t any consistently “good” teachers out there: testing data for individual teachers varies wildly from year to year. A teacher whose class received the best test scores and notched the highest improvement one year will routinely have the class with the worst scores the following year. It happens all the time.
Of course, this makes perfect sense once one realizes that measured educational output is vastly more indicative of student characteristics than of teacher performance. A teacher’s class will vary a great deal from year to year in aptitude, and test scores will naturally vary accordingly. And teachers whom their peers and administrators consider the best educators can easily have the worst test scores, and the worst changes in test scores.
This is the 800-pound gorilla in discussions of merit pay and other incentives designed to increase test scores: they are ultimately a capricious way to sort job performance, because the selection error is so great. People tend to say “let’s reward the best teachers”; but when the data swings so wildly from year to year, from class to class, it becomes close to impossible to select “top performers”.
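The argument above, that class-to-class noise swamps any stable teacher effect, can be illustrated with a quick simulation. All of the parameters here are assumptions chosen for illustration, not estimates from real data: each teacher gets a small persistent “true” effect, each year’s measured class average adds much larger noise, and we check how often the same teachers land in the top quintile two years running.

```python
import random

random.seed(0)

# Toy model: small persistent teacher effect, large yearly class noise.
# The standard deviations are assumptions for illustration only.
N_TEACHERS = 100
TEACHER_SD = 1.0   # spread of true teacher effects
CLASS_SD = 3.0     # year-to-year noise from class composition

true_effect = [random.gauss(0, TEACHER_SD) for _ in range(N_TEACHERS)]

def observed_scores():
    """One year of measured class-average gains, one value per teacher."""
    return [t + random.gauss(0, CLASS_SD) for t in true_effect]

def top_quintile(scores):
    """Indices of the 20 teachers with the highest measured scores."""
    cutoff = sorted(scores, reverse=True)[N_TEACHERS // 5 - 1]
    return {i for i, s in enumerate(scores) if s >= cutoff}

year1 = top_quintile(observed_scores())
year2 = top_quintile(observed_scores())
overlap = len(year1 & year2) / len(year1)
print(f"Top-quintile overlap across two years: {overlap:.0%}")
```

With noise three times the size of the true effect, the overlap comes out only modestly above the 20% you would expect from pure chance, which is the pattern the year-to-year swings described above would produce.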
— Freddie · Jun 25, 11:21 AM · #
Freddie:
That may be true for the big system-wide standardized tests, but you can measure progress much more closely with more frequent testing designed as a diagnostic tool for pedagogy, to let you know what the class, and each individual student, has mastered and what it hasn’t. The big, system-wide standardized tests are used primarily for sorting, a legitimate purpose but a very different one.
I’m the chair of the board of a charter school in Harlem. We give our teachers bonuses, and those bonuses are based on performance, measured according to a large number of metrics, some absolute and some value-added. We put a great deal of energy into designing tests that will measure what we are trying to measure, and use them actively all through the year to improve instruction. I feel pretty good about our ability to measure what it is our teachers and students are achieving.
By contrast, NYC measures schools by gross measures of performance that contain a lot of noise. Demographic change in a school can wildly change results; as well, there’s a system-wide problem of turning instruction into test-prep, which can yield short-term gains but long-term problems.
This deserves a longer discussion, but the real indictment is not of the idea of measuring teacher performance – do you really believe that what a teacher does, unlike a stonemason or a bond trader or a physician, cannot be measured as to quality? – but the ability of a large bureaucracy subject to political pressure to do the job of measuring both objectively and accurately.
— Noah Millman · Jun 25, 12:06 PM · #
Steve:
Thanks for the pointer. I remember reading that post. As you know, this is “value-added” in edu-jargon. In the ongoing reinvention of statistical concepts in different contexts, it is almost always superior to use such longitudinal data rather than cross-sectional data. You highlight one of the many reasons why even longitudinal data is insufficient to establish causality in this case.
Freddie:
Interesting point. The software company that I started focused on exactly this problem, isolating causal signal in high-noise environments, so I know how difficult measurement can be. Even worse, it’s very easy to get false “measurements” that look convincing. We’re all to some extent prisoners of our experiences, and I suspect this is why I’m so skeptical of claims to find evidence of causality in data.
Noah:
I agree that, in the end, we are responsible for judging performance, and that in the right context this must be achievable.
— Jim Manzi · Jun 25, 09:42 PM · #
What America needs are independent K-12 testing organizations, similar to what ETS and ACT do for college admissions, to get around conflict of interest. The NCLB told every state to make up their own test, administer it themselves, grade it themselves, and then report back to the feds on how they’re doing. The result: Mississippi has the highest scoring students in America! (Well, at least according to Mississippi’s test, which is what the NCLB looks at).
— Steve Sailer · Jun 26, 06:03 AM · #
Steve:
What’s the incentive for elementary, junior or senior high schools to use such an independent test? That’s the question that needs to be answered before you can posit the existence of such institutions.
Also: just because an institution is independent doesn’t mean it can’t have pervasive conflicts of interest, with catastrophic consequences. Take a look at the ratings agencies’ role in the current credit crisis.
— Noah Millman · Jun 26, 01:19 PM · #
Steve, Noah:
I think the need for info and the need for incentives to use the info are a classic chicken-and-egg problem, and we need both. Here’s how I put it in an article a few months ago in NR:
— Jim Manzi · Jun 26, 09:04 PM · #