The American Scene

An ongoing review of politics and culture


Articles filed under Economics


Are Wisconsin Public Sector Workers Underpaid?, ctd.

Ezra Klein has responded to my post in which I argued that the EPI study that claimed to show that Wisconsin public sector workers were underpaid is unpersuasive. His response begins with this:

Jim Manzi has posted a critique of the Economic Policy Institute’s study (PDF) suggesting that Wisconsin’s public-sector workers are underpaid relative to their private-sector counterparts. It basically boils down to the argument that this sort of thing is hard to measure. The study controls for most every observable worker characteristic that we can imagine controlling for.

But my basic criticism was that it fails to control for lots of plausible, common-sense differences. That is, the study doesn’t control for all the characteristics we can imagine, but only for some of those for which we happen to have data.

Klein is correct to say that my post “basically boils down to the argument that this sort of thing is hard to measure.” But he then argues that the purpose of the original study was not to demonstrate that public sector workers are underpaid, but rather to rebut the claim that they are overpaid:

[T]he EPI study is aimed at a very specific and very influential claim: that Wisconsin’s state and local employees are clearly overpaid. It blows that claim up.

That may have been the author’s motivation, but here is the final conclusion of the executive summary of the report:

[P]ublic sector workers in Wisconsin earn less in annual or hourly compensation than they would earn in the private sector.

The report makes a positive claim that it has determined a compensation “penalty” for working in the public sector, and repeats it many times. My argument was that this report does not establish whether or not this claim is true.

By the same logic, it also fails to “blow up” the claim that Wisconsin’s public workers are overpaid. The methodology is inadequate to the task of establishing whether these workers are overpaid, underpaid, or paid perfectly. As the last paragraph of my post put it:

I don’t know if Wisconsin’s public employees are underpaid, overpaid, or paid just right. But this study sure doesn’t answer the question.

Statistician and political scientist Andrew Gelman has a very interesting response to my post, in which he agrees that this conclusion “sounds about right,” but cautions that the study is not “completely useless either” because this kind of adjusted comparison is better than simply comparing raw averages between public and private sector workers. I agree with that entirely. But that is, of course, a very different thing than saying that these adjustments create sufficient precision to support the bald statement, made in the report, that the author has analytically established that there is a “penalty” for working in the public sector.

(Cross-posted at The Corner)

Are Wisconsin Public Employees Underpaid?

Ezra Klein and a variety of other thoughtful liberal bloggers have been pointing to an Economic Policy Institute analysis that they claim demonstrates that Wisconsin’s public employees, even after adjusting for benefits and hours worked, face a “compensation penalty of 5% for choosing to work in the public sector.” Unfortunately, when you get under the hood, the study shows no such thing.

Klein links to an executive summary to support his claim, but reading the actual paper by Jeffrey H. Keefe is instructive. Keefe took a representative sample of Wisconsin workers, and built a regression model that relates “fundamental personal characteristics and labor market skills” to compensation, and then compared public to private sector employees, after “controlling” for these factors. As far as I can see, the factors adjusted for were: years of education; years of experience; gender; race; ethnicity; disability; size of organization where the employee works; and, hours worked per year. Stripped of jargon, what Keefe asserts is that, on average, any two individuals with identical scores on each of these listed characteristics “should” be paid the same amount.
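
To make the mechanics concrete, here is a minimal sketch, in Python, of the kind of wage regression the paper describes. The data, variable names, and coefficients below are entirely invented for illustration; this is not Keefe’s actual specification or code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated worker-level data using roughly the controls listed above.
rng = np.random.default_rng(0)
n = 5_000
df = pd.DataFrame({
    "educ_years": rng.integers(10, 21, n),
    "exper_years": rng.integers(0, 40, n),
    "female": rng.integers(0, 2, n),
    "org_size": rng.choice([50, 500, 5000], n),
    "hours_per_year": rng.normal(2000, 200, n),
    "public": rng.integers(0, 2, n),  # 1 = public sector employee
})
# Bake an arbitrary 5% sector "penalty" into simulated log compensation,
# purely so the example runs end to end.
df["log_comp"] = (
    9.5
    + 0.08 * df["educ_years"]
    + 0.02 * df["exper_years"]
    - 0.05 * df["public"]
    + rng.normal(0, 0.3, n)
)
model = smf.ols(
    "log_comp ~ educ_years + exper_years + female"
    " + np.log(org_size) + hours_per_year + public",
    data=df,
).fit()
# The coefficient on `public` is the estimated sector "penalty" -- but it
# only measures a penalty relative to the controls actually in the model.
# Any omitted factor (occupation, job stress, layoff risk) biases it.
print(model.params["public"])
```

The `public` coefficient answers the question only if the control list is complete, and that completeness is exactly what is in dispute.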

But consider Bob and Joe, two hypothetical non-disabled white males, each of whom went to work at Kohl’s Wisconsin headquarters in the summer of 2000, immediately after graduating from the University of Wisconsin. They have both remained there ever since, and each works about 50 hours per week. Bob makes $65,000 per year, and Joe makes $62,000 per year. Could you conclude that Joe is undercompensated versus Bob? Do you have enough information to know the “fundamental personal characteristics and labor market skills” of each to that degree of precision? Suppose I told you that Bob is an accountant, and Joe is a merchandise buyer.

Even if Bob and Joe are illustrative stand-ins for large groups of employees for whom idiosyncratic differences should average out, if there are systematic differences in the market realities of the skills, talents, work orientation and the like demanded by accountants as compared to buyers, then I can’t assert that either group is underpaid or overpaid because the average salary is 5% different between these two groups.

And this hypothetical example considers people with a degree from the same school working in the same industry at the same company in the same town, just in different job classifications. Keefe is considering almost any full-time employee in Wisconsin with the identical years of education, race, gender, etc. as providing labor of equivalent market value, whether they are theoretical physicists, police officers, retail store managers, accountants, salespeople, or anything else. Whether they work in Milwaukee, Madison, or a small town with a much lower cost of living. Whether their job is high-stress or low-stress. Whether they face a constant, realistic risk of being laid off any given year, or close to lifetime employment. Whether their years of education for the job are in molecular biology, or the sociology of dance. Whether they do unpredictable shift work in a factory, or 9 – 5 desk work in an office with the option to telecommute one day per week.

Keefe claims – without adjusting for an all-but-infinite number of such relevant potential differences between the weighted-average public sector worker and the weighted-average private sector worker – that his analysis is precise enough to ascribe a 5% difference in compensation to a public sector compensation “penalty.”

And the statistical tests that he claims show that the total public-private compensation gap is “statistically significant” are worse than useless; they are misleading. The whole question – as is obvious even to untrained observers – is whether or not there are material systematic differences between public and private employees that are not captured by the list of coefficients in his regression model. His statistical tests simply assume that there are not.

I don’t know if Wisconsin’s public employees are underpaid, overpaid, or paid just right. But this study sure doesn’t answer the question.

(Cross-posted at The Corner)

Against the Negative Income Tax

The Manhattan Institute’s City Journal is a reliably excellent magazine, and the current issue is no exception. I found myself disagreeing, however, with Guy Sorman’s article arguing for a Negative Income Tax (NIT).

The NIT, in plain English, is a government-guaranteed minimum level of income. Sorman makes the classic argument that the biggest advantage of the NIT is that it would eliminate almost the entire welfare bureaucracy, and the tangle of often counter-productive programs that go with it. He dismisses the obvious common-sense objection – that the prospect of a lifetime income for doing nothing might discourage people from slapping the top of the alarm clock every weekday morning at 6AM and going to work – with the argument that the NIT could be made progressive in a way that would theoretically encourage work as compared to the current welfare system:

Say the government drew the income line at $10,000 for a family of four and the NIT was 50 percent, as most economists recommend. If the family had no income at all, it would receive $5,000—that is, 50 percent of the amount by which its income fell short of $10,000. If the family earned $2,000, it would get $4,000 from the government—again, 50 percent of its income shortfall—for a total post-tax income of $6,000. Bring in $4,000, and it would receive $3,000, for a total of $7,000. So as the family’s earnings rise, its post-tax income rises, too, preserving the work incentive. This is very different from many social welfare programs, in which a household either receives all of a benefit or, if it ceases to qualify, nothing at all. The all-or-nothing model encourages what social scientists call “poverty traps,” tempting the poor not to improve their situations.
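
The arithmetic in the quoted passage is simple enough to state directly. Here is a minimal sketch; the function and parameter names are mine, not Sorman’s:

```python
def nit_benefit(earned: float, floor: float = 10_000, rate: float = 0.5) -> float:
    """NIT payment: `rate` times the shortfall of earnings below `floor`."""
    return max(0.0, rate * (floor - earned))

for earned in (0, 2_000, 4_000):
    benefit = nit_benefit(earned)
    print(f"earned {earned:>5}, benefit {benefit:>5.0f}, total {earned + benefit:>5.0f}")
# earned     0, benefit  5000, total  5000
# earned  2000, benefit  4000, total  6000
# earned  4000, benefit  3000, total  7000
```

Total income rises with earnings, so the work incentive is preserved on paper; the question is what happens in practice.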

Sorman goes on to reference the famous (among nerds) series of randomized experiments in the 1960s and 70s that tested the effects of the NIT on labor force participation. He says that the results of these experiments were “fuzzy,” but that the first of the experiments showed that the unemployed people who received the NIT grant were “likelier to try to get back to work.”

Sorman is wise to look to these tests of the theory. The power of randomized experiments is that they provide the scientific gold standard method for cutting through the tangle of correlations inherent to these kinds of social questions to create a definitive answer to the question of the causal effects of a tested program on a measured outcome. But I believe his interpretation of the NIT experiments is incomplete, in ways that are fundamental to his argument.

There were four large NIT experiments conducted in the U.S. between 1968 and 1980. These tested a wide variety of program variants among the urban and rural poor, in better and worse macroeconomic periods, and in geographies from New Jersey to Seattle. There was, and is, a lot of scholarly debate about many of the experimental results, and their potential application to policy-making. But there are two consistent findings across this body of experiments.

First, the tested NIT programs reduced the number of hours worked versus the existing welfare system. Directly contrary to how I understand Sorman’s claim that the New Jersey – Pennsylvania experiment showed that the NIT made recipients “likelier to try to get back to work,” this reduction in labor hours was primarily caused by workers remaining non-employed longer if and when they became non-employed.

Second, the tested levels of progressivity of implicit tax rates did not get around this problem by encouraging work, as Sorman’s theoretical argument asserts they should. See, for example, Robert A. Moffitt (the subject matter expert cited by Sorman), who has written extensively about why the NIT experiments showed that “lowered tax rates had essentially no effect on labor supply.” Moffitt has, in fact, written a series of papers trying to show why even the theoretical argument that implicit tax progressivity would encourage work was incorrect.

While generalizing from experiments to broad policy is always complicated in social science, true randomized experiments confirm common sense: when it has been put to the test, the NIT reduced work effort.

There was a further series of about thirty welfare experiments done around the time of the welfare debates of the early 1990s. These tested many ideas for improving welfare. What emerged from them was a clear picture: work requirements, and only work requirements, could be shown experimentally to get people off welfare and into jobs in a humane fashion. (I’ll have a lot more about this in the upcoming book.) These experiments were an important input into the central tenet of welfare reform – work requirements – which was one of the greatest conservative domestic policy successes of the past twenty years. It moved us in exactly the opposite direction of the permanent, lifetime dole of the NIT.

But what about the argument that there is an important benefit – namely, the elimination of the welfare bureaucracy and the dog’s breakfast of “food stamps, public housing, Medicaid, cash welfare, and a myriad of community development programs” that according to Sorman’s article accounts for $522 billion of annual federal spending? This is extremely unlikely.

By analogy, the debate in global warming circles sometimes pits a “simple, efficient carbon tax” against the politicized mess of the cap-and-trade bill that emerged from Congress. But in that case, the real difference is not that between a theoretical carbon tax and a theoretical cap-and-trade system; it is the difference between an academic idea that has not yet been subjected to lobbying and legislation, on one hand, and real laws that are the product of a democratic process, on the other. In the same way, there is nothing inherent about an NIT that will prevent Congress from creating thousands of pages of special rules, exemptions, tax expenditures and so on, that are collectively just as convoluted as the current welfare system. After all, “tax each person a given fraction of income” is a pretty simple idea too, but look at the 2011 federal income tax code.

And the likely maintenance or reemergence of the functional equivalents of many of these programs isn’t just the result of cynicism, but of healthy intuitions of natural justice that are essential to maintaining a well-functioning political order. As one example, do we really think it’s a good idea to further disconnect work and income? As another, if part of the motivation for giving adults income is that they spend it supporting their children, would we really allow parents receiving taxpayer money to spend it any way they want with no requirements for child welfare beyond child abuse laws? And as another, a huge portion of the costs of the list of programs Sorman provides is healthcare. Suppose we gave every adult in America an annual grant of $10,000, and some person who did not buy health insurance with it got sick with an acute, easily-treatable condition. Would we really bar them from any urgent medical care and just say “tough luck, but it’s time to die”? Even if you think this would be a desirable public policy, it’s not plausible that the existence of an NIT would somehow change the political calculus enough to make it substantially more of a reality than it is today.

Inevitably, and justifiably, the taxpayers who will have to work more overtime, take shorter vacations and eat out less in order to come up with the taxes to pay for an NIT or any other welfare system, will demand some degree of accountability from the recipients; which will, in turn, require monitoring, enforcement, adjudication, and the other manifestations of a welfare bureaucracy. As I put this in a recent article in National Review:

This basic idea arose again and again during the 1960s and 70s. Examples of such proposals advanced by the political right and left included Milton Friedman’s negative income tax (1962), Robert Theobald’s guaranteed income (1965), James Tobin’s guaranteed income plan (1965), R.J. Lampman’s subsidy plan (1967), Edward Schwartz’s guaranteed income, the negative income tax plan of President Johnson’s Income Maintenance Commission (1969), President Nixon’s Family Assistance Plan (1969), George McGovern’s $1,000-a-year plan (1972), and HEW’s Income Supplementation Plan (1974). This idea constantly arises even today in academic discussions, and is being pursued seriously by the current coalition government in the U.K. The key reason we haven’t implemented such a scheme in the U.S. is that we are afraid that many of the recipients will not work, and then blow the money on Cheetos, beer and big-screen TVs. In more academic language, the moral legitimization of the welfare system requires that the recipients earn and then use this money in support of a way of living that comports in some rough sense with the idea of the good life held by taxpayers who provide the funds.

The NIT is a fascinating and useful thought experiment, but it’s not a practical public policy.

(Cross-posted at The Corner)

Jim’s On A Roll

And all I can do is roll with him.

I’m not especially well-versed in either orthodox or unorthodox economists, but I agree completely on the essential role of the entrepreneur in wealth-generation. Jim takes this point in the direction of reinforcing skepticism about the dominant policymaking paradigm at the moment, focused on things like aggregate demand rather than the congeniality or uncongeniality of the environment for entrepreneurship.

Myself, I’m inclined to believe that if you have a fiat currency, which should mean you can mitigate the impact of commodity-based external shocks on the supply of credit (which is the problem you run into with a commodity-based currency), you really do have to do your best not to create “artificial” shocks by keeping money “too” tight or “too” loose. That, in turn, is a problem “orthodox” economics is pretty much organized around trying to solve.

I want to take his point in a different direction. He and I may disagree about central bankers. But what about banker bankers?

If entrepreneurs are the motor that drives economic progress, capital is the fuel. And banks are (still) the central institution in the economy responsible for supplying that fuel. Their job is (still) to take savings and turn it into capital – and then allocate that capital.

But how do they allocate that capital?

Well, the dominant investing paradigm falls prey to exactly the mistake that Jim highlights in his post: the confusion of risk and uncertainty. Indeed, pretty much all of modern finance theory and practice is devoted to the project of measuring and managing risk. This professional predisposition to fear uncertainty and focus on risk is reinforced and magnified by our regulatory paradigm.

What this has meant in practice is that banks (and other institutional investors) all use the same tools to construct the same kinds of “optimal” portfolios so as to minimize risk. Of course, all this does is squeeze all the profit potential out of these kinds of portfolios. Competition, then, drives these same investors to use additional leverage and more sophisticated “risk-management” techniques to squeeze more profits out of this same compressed investment space. And so the bubbles grow until they burst.

Fifty years ago, banking was a sleepy profession, with easy hours, comfortable but not extraordinary salaries, and a professional culture that was anything but entrepreneurial. Now, it’s an extremely aggressive culture, but all of that aggressive, competitive energy goes into figuring out better ways to rig the same game. A lot of people think the solution is to go back to what banking was fifty years ago. I’m not sure that’s possible, and I’m pretty sure that, if entrepreneurship is as important to the economy as Jim thinks it is, it’s not optimal. Rather, we need bankers with more of an entrepreneur’s appreciation of uncertainty.

How to get there from here, though, is something about which I’m afraid I’m uncertain.

The Eternal Sunshine of the Entrepreneurial Mind

Tyler Cowen points to an interesting article on how entrepreneurs think, and quotes the following passage:

That is not to say entrepreneurs don’t have goals, only that those goals are broad and—like luggage—may shift during flight. Rather than meticulously segment customers according to potential return, they itch to get to market as quickly and cheaply as possible, a principle Sarasvathy calls affordable loss. Repeatedly, the entrepreneurs in her study expressed impatience with anything that smacked of extensive planning, particularly traditional market research. (Inc.’s own research backs this up. One survey of Inc. 500 CEOs found that 60 percent had not written business plans before launching their companies. Just 12 percent had done market research.)

…Sarasvathy explains that entrepreneurs’ aversion to market research is symptomatic of a larger lesson they have learned: They do not believe in prediction of any kind. “If you give them data that has to do with the future, they just dismiss it,” she says. “They don’t believe the future is predictable…or they don’t want to be in a space that is very predictable.”

While I have some quibbles – for example, many failed entrepreneurs share this frame of mind, and there are other characteristics joined with this one that tend to characterize successful entrepreneurs – the gestalt of this is, in my experience as a technology entrepreneur, exactly right.

At root, what’s so fascinating to me here is the distinction between risk and uncertainty. By “uncertainty,” I mean non-quantifiable lack of predictability. For example, we could roughly say that there is a 50% risk of getting heads if I flip this quarter, but that the probability that Egypt will experience a military coup this month is uncertain – somebody might venture to place odds on it, but it is not reliably quantifiable in the same sense.

I think that this distinction points to a fundamental cleavage in worldviews in economics that turns on the role of the entrepreneur. This meaning of the term uncertainty is, in fact, often referred to as “Knightian” uncertainty, after the great early twentieth century American economist Frank Knight, who used it to try to explain the role of the entrepreneur.

Entrepreneurs choose to operate in sectors in which uncertainty dominates. This is inherent to what entrepreneurship is. The kind of predictive tools that work well for the U.S. aluminum market don’t work very well when you’re inventing the Software-as-a-Service business model. What works better is trial and error learning, or more formally, experimentation. As an entrepreneur, you throw yourself into an evolutionary competition, and use whatever resources you have to succeed. You don’t believe that you (or anybody else) can predict the multi-step game in advance.

There is a heterodox tradition of economists who focus on the centrality of these issues for the long-run growth of the economy. Frank Knight, Joseph Schumpeter, F.A. Hayek, Vernon Smith and Douglass North are obvious examples. This focus leads to an emphasis on uncertainty, experimentation and evolution, and stands in contrast to the currently-dominant paradigm within university economics departments of risk, quantification and equilibrium.

I believe that entrepreneurship, broadly defined, is central to economic growth, and that determining public policy using economic models that inherently under-emphasize this is a very bad idea. Professional economists, in my view, have a class interest in obscuring this – one that is as powerful as the class interest of entrepreneurs in conflating luck and skill.

(Cross-posted at The Corner)

P.S. As commenter Jeff Singer points out, blogger / economist / entrepreneur Arnold Kling writes about this kind of thing all the time, and with great insight. If this topic interests you, you should be reading his blog.

Re: Evaluating Teachers

Jon Chait has reacted to my post on teacher evaluations, but I don’t think that he was responding to my actual claims. He seems to think that I was arguing that quantitative measurements of teacher performance (“value-added” metrics) are not achievable, and that therefore we should not use them. My actual argument was that we need such systems, but that we should be realistic about what is required to make them work.

Chait quotes a long illustrative dialogue that I used to show some of the problems that often arise from trying to use a complex regression model to measure employee performance in a corporate setting. But the sentences in my post that immediately follow the quoted dialogue are:

Not all attempts to incorporate rigorous measures of value-added fail. Let me make some observations about when and how workable systems that do this tend to be designed and implemented.

And later in the post, I also say that:

More serious measurement of teacher performance, very likely including relative improvement on standardized tests, will almost certainly be part of what an improved school system would look like.

My post wasn’t about whether we should use quantitative measures of improvement in their students’ standardized test scores as an element of how we evaluate, compensate, manage and retain teachers, but rather about how to do this.

Two of the key points that I tried to make are that the metrics themselves should likely be much simpler than those currently developed by economics PhDs, and that such an evaluation system is only likely to work if embedded within a program of management reform for schools and school systems. The bulk of the post was trying to explain why I believe these assertions to be true.

An additional point that I mentioned in passing is my skepticism that such management reform will really happen in the absence of market pressures on schools. Continuous management reform, sustained over decades, that gets organizations to take difficult and unpleasant actions with employees is very hard to achieve without them. There’s nothing magic about teachers or schools. The same problems with evaluation and other management issues that plague them arise in big companies all the time. It’s only the ugly reality of market discipline that keeps them in check.

(Cross-posted at The Corner)

Some Unsettling Observations about Teacher Evaluations

Recently, Megan McArdle and Dana Goldstein had a very interesting Bloggingheads discussion that was mostly about teacher evaluations. They referenced some widely-discussed attempts to evaluate teacher performance using what is called “value-added.” This is a very hot topic in education right now. Roughly speaking, it refers to evaluating teacher performance by measuring the average change in standardized test scores for the students in a given teacher’s class from the beginning of the year to the end of the year, rather than simply measuring their scores. The rationale is that this is an effective way to adjust for different teachers being confronted with students of differing abilities and environments.

This seems like a broadly sensible idea as far as it goes, but consider that the real formula for calculating such a score in a typical teacher value-added evaluation system is not “average math + reading score at end of year – average math + reading score at beginning of year,” but rather a very involved regression equation. What this reflects is real complexity, which has a number of sources. First, at the most basic level, teaching is an inherently complex activity. Second, differences between students are not unvarying across time and subject matter. How do we know that Johnny, who was 20 percent better at learning math than Betty in third grade, is not relatively more or less advantaged in learning reading in fourth grade? Third, an individual person-year of classroom education is executed as part of a collective enterprise with shared contributions. Teacher X had special needs assistant 1 work with her class, and teacher Y had special needs assistant 2 working with his class – how do we disentangle the effects of the teacher versus the special ed assistant? Fourth, teaching has effects that continue beyond that school year. For example, how do we know if teacher X got a great gain in scores for students in third grade by using techniques that made them less prepared for fourth grade, or vice versa for teacher Y? The argument behind complicated evaluation scoring systems is that they untangle this complexity sufficiently to measure teacher performance with imperfect but tolerable accuracy.
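
For contrast, here is what the naive gain-score version of value-added looks like, using invented data. Real systems replace this one-liner with regressions controlling for prior scores, demographics, peers, and school effects, each control being a contestable modeling choice:

```python
import pandas as pd

# Hypothetical student-level test scores, tagged by teacher.
scores = pd.DataFrame({
    "teacher": ["X", "X", "X", "Y", "Y", "Y"],
    "pre":  [40, 55, 62, 60, 70, 48],
    "post": [50, 62, 70, 64, 78, 55],
})
# Naive value-added: average score gain per teacher, no adjustments.
naive_va = (scores["post"] - scores["pre"]).groupby(scores["teacher"]).mean()
print(naive_va)
```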

Any successful company that I have ever seen employs some kind of a serious system for evaluating and rewarding / punishing employee performance. But if we think of teaching in these terms – as a job like many others, rather than some sui generis activity – then I think that the hopes put forward for such a system by its advocates are somewhat overblown.

There are some job categories that have a set of characteristics that lend themselves to these kinds of quantitative “value added” evaluations. Typically, they have hundreds or thousands of employees in a common job classification operating in separated local environments without moment-to-moment supervision; the differences in these environments make simple output comparisons unfair; the job is reasonably complex; and, often the performance of any one person will have some indirect, but material, influence on the performance of others over time. Think of trying to manage an industrial sales force of 2,000 salespeople, or the store managers for a chain of 1,000 retail outlets. There is a natural tendency in such situations for analytical headquarters types to say “Look, we need some way to measure performance in each store / territory / office, so let’s build a model that adjusts for inherent differences, and then do evaluations on these adjusted scores.”

I’ve seen a number of such analytically-driven evaluation efforts up close. They usually fail. By far the most common result that I have seen is that operational managers muscle through use of this tool in the first year of evaluations, and then give up on it by year 2 in the face of open revolt by the evaluated employees. This revolt is based partially on veiled self-interest (no matter what they say in response to surveys, most people resist being held objectively accountable for results), but is also partially based on the inability of the system designers to meet the legitimate challenges raised by the employees.

Here is a typical (illustrative) conversation between a district manager delivering an annual review based on such an analytical tool, and the retail store manager receiving it:

District Manager: Your 2007 performance ranking is Level 3, which represents 85% payout of annual bonus opportunity.

Store Manager: But I was Level 2 (with 90% bonus payout) last year, and my sales are up more than the chain-wide average this year.

DM: [Reading from a laptop screen] We now establish bonus eligibility based on your sales gain versus the change in the potential of your store’s trade area over the same time period. This is intended to fairly reflect the actual value-added of your performance. We average this over the past three years. Your sales were up 5% this year, but Measured Potential for your store’s area was 10% higher this year, so your actual value-added averaged over 2005 – 2007 declined versus 2004 – 2006.

SM: My “area potential” increased 10%? – that’s news to me. Based on what?

DM: The new SOAP (Store Operating Area Potential) Model.

SM: What?

DM: [Reading from a laptop screen] “SOAP is based on a neural network model that has been carefully statistically validated.” Whatever that means.

[Continues reading] “It considers such factors as trade area demographic changes, competitor store openings, closures and remodels, changes in traffic patterns, changes in co-tenancy, and a variety of other important factors.”

SM: What factors are up that much in my area?

DM: [Skipping to the workbook page for this specific store, and reading from it] A combination of factors, including competitor openings and the training investment made in your store.

SM: But Joe Phillips had the same training program in his store, and he had no new competitor openings – and he told me that he got Level 2 this year, even though his sales were flat with last year. How can that be?

DM: Look, the geniuses at HQ say this thing is right. Let me check with them.

[2 weeks later, via cell phone]

DM: Well, I checked with the Finance, Planning & Analysis Group in Dallas, and they said that “the model is statistically valid at the 95% significance level” (whatever that means), “but any one data point cannot be validated.”

[10 second pause]

Let me try to take this up the chain to VP Ops, and see what we can do, OK?

SM: Whatever. I’ve got customers at the register to deal with. [Hangs up]

Not all attempts to incorporate rigorous measures of value-added fail. Let me make some observations about when and how workable systems that do this tend to be designed and implemented. I doubt these will please either side in the debate.

1. Remember that the real goal of an evaluation system is not evaluation

The goal of an employee evaluation system is to help the organization achieve an outcome. For purposes of discussion, let’s assume the goal of a particular school to be “produce well-educated, well-adjusted graduates.” The question to be asked about this school’s evaluation system is not “Is it fair to the teachers?” It is not even “Does it measure real educational advancement?” Ultimately, all we should care about is whether or not the school produces more well-educated, well-adjusted graduates with this evaluation system than if it used the next-best alternative. In this way, it is like a new training program, investment in better physical facilities, or anything else that might consume money or time.

The fairness or accuracy of the measurement versus some abstract standard is the means; changing human behavior in a way that increases overall organizational performance is the end. To put a fine point on it, if a teacher evaluation based on a formula that considers only blood type, whether it is raining on the day of the evaluation, and the last digit of the teacher’s phone number is the one that does the best job of producing better-educated, better-adjusted graduates, then that’s the best evaluation system.

In practice, of course, an effective evaluation system normally has to have some reasonably clear linkage to what we think of intuitively as performance, but clarity about means versus ends helps keep the organization focused. On one hand, it prevents the perfect from being the enemy of the good – all we need to show is that this program is better than its next best competitor for resources to accept that it should be implemented. And on the other hand, it prevents the endless search for theoretical perfection, by constantly forcing this specific cost / benefit test on proposed “enhancements” to any evaluation system. Because there is enormous practical value to employees understanding and accepting the metrics used to evaluate them, this tends to produce evaluations using simpler metrics, even if they are theoretically less comprehensive.

2. You need a scorecard, not a score

There is almost never one number that can adequately summarize the performance of complex tasks like teaching that are executed as part of a collective enterprise. Outputs that can be measured with good precision and assigned to a specific employee, even when using very sophisticated statistical techniques, tend to be localized by time and organizational unit; therefore, evaluation systems that rely exclusively on such measures tend to reward short-term and selfish behavior to an irrational degree. In a business, this usually means that if we rely, for example, only on this year’s financial metrics to reward a salesperson, we will incent him to undermine the company’s brand, give away margin potential, and not work well with other salespeople on big sales projects that are shared and may take years to come to fruition. In some sales forces, this is no big deal, and we can just pay straight commission as a percent of sales, and get on with life. But for, say, most retail chains, it would be a long-term disaster to pay store managers only based on that year’s store profits – you’d be likely to end up with a bunch of stores that were poorly maintained, had untrained staff, and ran constant promotional sales targeted specifically to customers who shopped at nearby branches of the same chain (hold the jokes about retailer X that you don’t like). For this reason, most organizations create a so-called Balanced Scorecard for each such employee that combines several financial and several non-financial performance metrics, some of which almost always involve some degree of management judgment.
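
As a toy illustration of the concept (the metrics and weights below are invented for this sketch, not drawn from any real retailer), a balanced scorecard is just a short weighted list of heterogeneous metrics, at least one of which is an explicit judgment call:

```python
# (weight, normalized 0-1 score) for each metric on the scorecard.
scorecard = {
    "annual_sales_vs_plan": (0.40, 0.85),    # financial
    "customer_satisfaction": (0.25, 0.70),   # non-financial
    "staff_training_pct": (0.20, 0.90),      # non-financial
    "manager_judgment": (0.15, 0.75),        # explicitly subjective
}
overall = sum(weight * score for weight, score in scorecard.values())
print(f"overall rating: {overall:.2f}")  # feeds bonus / promotion decisions
```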

It’s not like this concept is alien to all schools. In fact, to most experienced practitioners in just about any relevant field, this is common sense. But note that the attempt to bundle all of this into a number called “value added” directly contradicts this understanding. It is very unlikely to work.

3. All scorecards are temporary expedients

Beyond this, usually no list of metrics can adequately summarize performance either. In absolute theory, what we would want to know in a business would be the impact of a given employee’s behavior on company stock price. But we can never really measure that. Instead, we have a bunch of proxies that we believe collectively approximate this. But the attempt to build up such a perspective as a pure data-analytic exercise always ends up creating some kind of Rube Goldberg system. We have maybe a few tens of thousands of relevant employee data points, and the complexity of a phenomenon that we only understand very partially overwhelms this amount of data.

Normally, an effective balanced scorecard for the kinds of positions I have been discussing is not constructed through such a process. Instead, its design starts with the view that the practical purpose of the evaluation system is to get the employees focused on a combination of basic priorities, plus a few more targeted issues that are the object of current management attention. In this way, the scorecard partially depends on the current strategy of the organization. For example, for a store manager, annual sales would almost certainly be on any scorecard, but warranty penetration (the percentage of sales in which the store also cross-sells the consumer a warranty) and percentage of store employees participating in sales effectiveness training might only be on the store manager’s scorecard for one or two specific years for a given retailer, and not at all for another competitive retailer with a different strategy. Beyond this, when their own comp is at stake, any group of thousands of people will always figure out how to outsmart any team of analysts who design the scorecard. That is, they will always figure out how to game the metrics, and get the comp in ways that violate the (often implicit) assumptions that were used to link these metrics to performance improvement. Therefore, it’s very helpful to present a moving target by changing some of the metrics each year. Finally, effective scorecards also tend to have a short list of metrics, since otherwise you have the “anybody with many priorities really has no priorities” problem.

Taken together, these realities – linkage to strategy, avoiding gaming, and the need to have a short list of metrics to capture a very complicated phenomenon – mean that effective scorecards change a lot over time. Once again, they are correctly thought of as a management tool to improve performance, not as some Platonic measure of effectiveness.

4. Effective employee evaluation is not fully separable from effective management

One conclusion of this is that effective teacher evaluation is not fully separable from effective management of those teachers. This statement can be read in both directions, and therefore cuts both ways in this debate. The model of “measure and publish a metric for individual teacher value-added, and use a combination of shame, money and external pressure to convert this to improved schools” is not consistent with anything that I’ve ever seen work in comparable situations. On the other hand, neither is the argument one often (though not as often as in the past) hears that somehow “teaching is special,” in that reasonable attempts to objectively evaluate teachers – and link these evaluations to material changes in comp, promotions and retention – should not be expected to help the organization improve performance.

So where does this leave us? Without silver bullets.

Organizational reform is usually difficult because there is no one, simple root cause, other than at the level of gauzy abstraction. We are faced with a bowl of spaghetti of seemingly inextricably interlinked problems. Improving schools is difficult, long-term scut work. Market pressures are, in my view, essential. But, as I’ve tried to argue elsewhere at length, I doubt that simply “voucherizing” schools is a realistic strategy.

More serious measurement of teacher performance, very likely including relative improvement on standardized tests, will almost certainly be part of what an improved school system would look like. But any employees, teachers included, will face imperfect evaluation systems, and will have to have some measure of trust in this system and its application. The evaluation system will have some direct linkage to the strategy of the school, and this will have to be at least a decent strategy that has a real shot at improving learning. The evaluation system will have to have teeth, and this means realistic processes that link comp (and probably more important, promotions and outplacement) to performance.

In other words, better measurements of teacher value-added are useful on the margin, but teacher evaluation as a program to improve school performance will likely only work in the context of much better school organization and management.

Still Too Shy And Retiring

This is cute, but what, ultimately, is Yglesias’ point?

Matt is poking fun at a beltway-favorite solution – raise the retirement age – to a real problem. And he’s right: that favored “solution” puts the burden of “solving” the entitlements crisis overwhelmingly on those in need: the poor elderly, who have a shorter-than-average life expectancy. Hence his “modest proposal” alternative, which allocates the burden in an obviously unfair and ethically horrible way but can be similarly dressed up to look progressive.

But what’s his solution?

Based on his larger body of work (his crusade for unlicensed barbers, praise for the relatively unregulated labor markets of the Nordic countries, etc.) I would think Matt’s preferred solution in an If-I-Ran-The-Zoo mode of argument would be to abolish the concept of retirement altogether.

I mean, think about it. Everybody lives off the productive activity of the people (and machines) that do the work. If the percentage of people engaged in that productive activity goes down, either productivity has to go up to compensate or living standards will drop. It doesn’t matter whether those people drop out of the labor force because they are living on Social Security or because they are living off their private savings; it doesn’t matter whether they are spending more years in school or whether they are begging on the streets; it doesn’t matter whether they are children who are not yet able to work or elderly and disabled who are no longer able to work. Obviously, it matters down the road – more education might well increase one’s productivity in the future (though it might just be a waste of time) but begging almost certainly will not; children will (hopefully) grow up to become productive citizens, while the disabled and the elderly will not; etc. But in the moment, it doesn’t matter: those who are not contributing to the production of wealth by work are living off those who are so contributing.
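
The underlying identity is just arithmetic. A toy sketch, with invented numbers:

```python
# Per-capita output = output per worker * share of the population working.
def per_capita_output(productivity: float, workforce_share: float) -> float:
    return productivity * workforce_share

print(per_capita_output(100_000, 0.50))  # baseline: 50,000 per person
print(per_capita_output(100_000, 0.45))  # fewer workers: living standards fall
print(per_capita_output(111_111, 0.45))  # ~50,000: productivity must rise to compensate
```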

The existence of a “retirement age” provides an incentive for people to plan on dropping out of the labor force – or, more correctly, specific jobs, because many retirees do at least some work in retirement – at a particular point in their life. That may drive a material misallocation of labor resources, as people either stay in jobs too long or drop out too early. How material, I have no idea – I’d be interested to know whether there’s any good research on that point. But the number is bigger than zero, and as the population ages the size of any such effect must be growing.

I’m totally on-board with a social responsibility for those unable to provide for themselves. I’m totally on-board, in fact, with socializing a pretty wide variety of risks – certainly I have no knee-jerk objection to such socialization. But we don’t want to create additional unnecessary incentives to be unproductive, and that applies to all age categories, youth, elderly and the “prime working age” folks in between.

A key component of the real rationale for a “retirement age” is paternalistic: we can’t trust that people will plan properly, so we need to socialize some economic risks, and we can’t afford to provide “social security” – i.e., lifestyle insurance – from cradle to grave. So we compromise: we provide that insurance to the elderly, who are more vulnerable in aggregate and have less time to “make up” for past mistakes in planning, and withhold it from everybody else. And the result, undoubtedly, is some misallocation of labor resources. But the aging of the population makes that misallocation a bigger and bigger problem, which will require us to revisit that compromise in some fashion.

And there are lots of ways to revisit it. The “raise the retirement age” solution is a relatively regressive one, as Matt is fond of pointing out. But we could revisit it by being less-paternalistic and more progressive. For example, we could enact something like a Guaranteed Minimum Income and abolish Social Security entirely, saying, in effect, we’d rather not provide lifestyle insurance at all (if you want a comfortable retirement, then you’d better plan for it) but we don’t want anybody to live in conditions of true poverty, elderly or not. The strongest arguments against such a scheme are, again, paternalistic – that providing a no-strings-attached income to working-age people creates a very bad incentive structure for individuals who are poor planners, a set that overlaps substantially with the set of people living in poverty.

In any event, the point is that the problem is a real one. There are only three general categories of solutions to the problem of too many non-workers living off too few workers: (1) allow living standards to fall (nobody likes this, but it will happen inevitably if nothing else is done); (2) increase the productivity of the existing workforce (easier said than done); and (3) increase the number of workers. This third category again can be broken into three: (3a) import workers (but imported workers generally come with dependents, and the workers need to be at least as productive as the existing workforce or you have a productivity problem that will prove a big drag long-term); (3b) increase the birth rate (easier said than done; it operates only with a substantial time lag – initially you get more dependents, not fewer – and Matt should favor a slowly declining world population for environmental reasons); or (3c) increase workforce participation rates of the existing population. Raising the retirement age is a regressive way of achieving (3c) (the poorest elderly will face the most pressure to remain in the workforce longer) and is perceived as a somewhat progressive way to achieve (1) (the elderly in general are wealthier than average, and raising the retirement age reduces the share of national consumption allocated to the elderly).

If Matt has serious, immodest but more progressive proposals to achieve (3c), I think they’d be worth hearing.

A Corporate Version of the "Resource Curse"?

Matt Yglesias points to an interesting post by Ron Burk about cash cow disease, an affliction of corporations with extraordinarily large revenue streams from existing businesses. These corporations frequently wind up “wasting” billions of dollars on new ventures that don’t pan out and that likely would not have been financed if the money had to be raised on the open market. They are able to waste this money because the core business is so profitable that shareholders basically don’t notice – and therefore do not apply the necessary discipline.

It’s a provocative argument. A few thoughts:

First of all, this sounds substantially like a principal-agent problem rather than a problem caused by the mere existence of cash cows. There are lots of private businesses out there that are cash cows – car dealerships, for example. Do owners of private cash cows go in for this sort of behavior? I wonder. It occurs to me that a company that has a true cash cow should actually be very cheap to run. If shareholders knew that, their reaction might not be good for management’s own bottom line. (“Why do we need such an expensive CEO? This company runs itself! It’s a cash cow!”)

Relatedly, the “conservative” (from a corporate governance theory perspective) answer to this kind of problem is to say that, in principle, companies shouldn’t hold material amounts of cash – ever. They should return virtually everything they earn, net of costs, to investors, who, in turn, should expect highly variable but sometimes very large dividends. Then, if the company wants to make a strategic investment (either buying a company or developing a new product), they would either need to raise funds on the market (debt or equity) or explain to shareholders why they weren’t getting their full dividend. We’re obviously very far from such a dividend culture, for a variety of reasons (taxes, principal-agent conflict, the longstanding expectation on the part of investors that dividends will be “regular,” etc). But to an extent, one can understand the problem as described to be a problem not of “cash cows” but more broadly of “cash” – there is a real case for the proposition that public companies should be pretty much “fully invested” at all times, returning any cash not needed to run and grow the business to shareholders.

But I can think of an argument for the kind of behavior the author of the piece observes that might better rationally explain this behavior than as simply an instance of “waste” or as an instance of management acting in its own rather than shareholders’ interests. One justification given for investing in ventures that are expensive and unlikely to bear fruit is that said ventures contain embedded “real business options.” What’s a “real business option”? Well, an option is the right to buy or sell something at a specified price on a specified date. It’s a right, not an obligation. So a call option on a stock is the right to buy a certain number of shares of that stock at a certain price per share (the “strike price”) on a certain date (the “expiration date”). A real business option is some business decision that gives one the option to enter into a future business. Thus, for example, suppose that company A sees a bright future for product Q, something that doesn’t yet exist but which is expected to be possible in the future. But when Q comes into being, it’ll probably develop out of P, which also doesn’t exist yet, but which will probably develop out of M, N or O. So this becomes an argument for getting into the M, N or O business – even at an expected loss. Because the value of the M, N or O business isn’t in that business and its projected revenue stream – it’s in the option to then develop P, which would make it possible to develop Q.

I think most people would agree that such an option is valuable. What’s really impossible to know is how valuable – that depends on what Q turns out to be worth and how valid this path-dependency theory turns out to be, neither of which can be known in advance. But one of the funny things about options is that, so long as the potential upside payoff is unbounded, they become more valuable the more uncertain the outcome is. Why? Because an option has a limited up-front cost and a potentially unlimited upside. A stock today is worth $10. Tomorrow it’ll be worth either $9 or $11. An option to buy the stock for $10, expiring tomorrow, will either not pay off or will pay off $1. Assuming even odds (the correct risk-neutral assumption if the current price is $10 and those are the only possible terminal prices), the option is worth $0.50. Now assume that tomorrow the stock will be worth either $5 or $15. Tomorrow, that same option will pay out either zero or $5. So that option is worth $2.50 – even though the “expected” value of the stock tomorrow is the same in each case, namely: the current price of the stock, $10. Same thing with real business options. The more uncertain the outcome of the speculative future business – the more plausible the scenario by which it becomes a massive cash cow of its own, even if that also means the odds are higher that it’ll never amount to anything and become a huge cash sink – the more the real business option to enter that business is worth, and therefore the more you should be willing to spend to own it.
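
The one-step arithmetic above can be written out directly. A sketch that, like the example in the text, ignores discounting:

```python
def call_value_one_step(up: float, down: float, strike: float, p_up: float = 0.5) -> float:
    """Expected payoff of a call option in a one-step binomial model."""
    payoff = lambda s: max(s - strike, 0.0)
    return p_up * payoff(up) + (1 - p_up) * payoff(down)

print(call_value_one_step(11, 9, 10))   # 0.50: low-volatility stock
print(call_value_one_step(15, 5, 10))   # 2.50: same expected price, higher volatility
```

Holding the expected stock price fixed at $10, widening the spread of outcomes quintuples the option’s value, which is the sense in which uncertainty itself is what the option buyer is paying for.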

So apparently wasting money on ventures that are unlikely to yield adequate returns may be the result of an entirely rational calculation of the value of real business options, particularly in a field like technology where there is plenty of historical evidence of some businesses turning into these game-changing gold mines (at least for a while). But here’s the thing: options are negative cash-flow investments. So the only entities who can afford to invest in them are entities with high positive cash-flow from other sources. In other words: cash cows. Entities with more precarious cash-flow streams, or without positive cash-flow streams at all, can’t afford to engage in options-acquisition as a strategy. They need to make money – now. And they can’t afford to put themselves in a position where they are insufficiently liquid to maintain their option portfolio – because then the investment is truly wasted.

But there’s another reason cash-cow companies might behave this way: to protect the cow. Why did Microsoft invest in Xbox – one of Burk’s main examples – after all? Arguably because they thought they’d make a lot of money on Xbox. A better theory is that they thought being in the game console “space” would give them options on future businesses that could be enormously lucrative. But most likely as a defensive move. If game consoles evolved into the new PCs, and those game consoles didn’t run an OS created by Microsoft, then Microsoft’s cash cow would run dry. The option they were buying wasn’t a call on a new cash cow so much as a put on the one they already had – a way of protecting it from a world in which game consoles replaced PCs, and therefore what mattered was who made the OS for the game console, not who made the OS for the PC.

That drive – to protect the cash cow – is what, in my opinion, frequently makes large companies seem to be lacking in innovation. It’s just rationally playing the odds. The odds of hitting a new cash cow are low. The potential threats to the existing cash cow are many. It makes more sense to spread money around in various ways that do something to protect that incumbent position – say, shoving Windows into as many devices and layers of the global computing infrastructure as possible, even if the ventures don’t make much sense in their own terms, in the hopes that, if a particular device or layer turns out to be the “next” whatever, Windows will still be a player – than to do something completely new (to say nothing of doing anything that actually threatens the cash cow).

But that same kind of motivation can drive innovation. Google needs to maintain its share of minds and eyeballs to maintain its profits. So it is constantly developing new products. Even if the new applications it develops bring in no incremental revenue, if they “shore up” that mindshare position, they may well be worth the investment – particularly if Google’s profits depend on having an overwhelmingly dominant position in their core market. Would these applications have been developed anyway – and better – elsewhere? If they aren’t profitable, it seems unlikely that they would have. Does that mean there’s been a net gain to innovation from Google’s cash cow? Well, what else would these developers have been doing if they weren’t working for Google? How can one even assess that counterfactual?

Here’s one way to think about the question. A cash cow is something that makes outsized profits. Anything that makes outsized profits over an extended period of time is an instance of some kind of failure of the market. It might not be a market failure – it might be a consequence of a regulatory intervention (car dealerships? liquor distributors? taxi medallion companies?). Or it might be a market failure (a “natural” monopoly, a “network effect” monopoly, whatever). But the mere existence of large, excess returns over a long period from the same activity – that’s what a cash cow is – is a sign that something weird is going on. Those returns should attract huge amounts of capital to that business to fund competitors who, by their actions, will bring down rates of return to more normal levels. If that doesn’t happen, you have to ask why. The answer to that why might tell you something about whether there might be something problematic about efforts to maintain and extend that cash cow.

But I’d really like to know what Jim Manzi thinks.

Some Measures Of Inequality Are More Equal Than Others

Tyler Cowen has written a really excellent piece in the American Interest about inequality, how it matters, and how it doesn’t. I strongly recommend reading the whole thing, but if I can summarize his two key points:

- On the one hand, it’s not clear that gross inequality across the social spectrum is increasing, and if it is, it isn’t clear that it’s a problem. It may not be increasing because prices have been declining for inexpensive goods even as they have been increasing for luxury goods, so apparently stagnating wages in the bottom four quintiles may actually represent increases in purchasing power, while rising wages in the top quintile may be substantially eaten up by inflation in that quintile’s typical basket of goods. It’s not clear it matters because the overall level of wealth has gone up so much that even relatively poor people in modern America are relatively comfortable on an absolute scale. Moreover, people don’t really worry about the “deserved” wealth earned by Bill Gates or Tiger Woods or J. K. Rowling; they only get annoyed at “undeserved” wealth.

- On the other hand, it is clear that inequality is increasing massively at the very top – between the top 1% of earners and the rest of society, and within the top 1% of earners. Some of that increase in inequality is due to the increased market power of outstanding individuals in a winner-take-all market driven, in turn, by the greater reach of modern communications. This explains Tiger Woods, J. K. Rowling, and, arguably, Bill Gates. We don’t have to worry about that. Most of the increase in inequality at the top, however, is driven by changes in the nature of modern finance, and this we do need to worry about, not so much because of the effect (rising inequality) as the cause (rent-extraction and socialization of risk rather than productivity-increasing innovation). But it’s not clear that we know how to solve this problem.

I have three major comments:

1. My narrowest point, about the “deserving rich.” Cowen makes the point that most people appear to admire rather than despise extremely wealthy individuals like Bill Gates, Tiger Woods and J. K. Rowling who have “earned” their great wealth through great achievement. The far greater rewards that accrue to such extraordinary individuals today, as compared with thirty or sixty years ago, are, he argues, not a matter of general social concern, being the result of changes in technology and impersonal market forces.

I would point out, however, that the state of the law does have some bearing on this question, specifically the state of patent and copyright law. The deserving “winners” that he cites – and, indeed, the ones we usually cite – are “winners” because they own intellectual property that has proven to be enormously valuable. But intellectual property, much more so than other forms, is a creation of the law. It is a form of legal monopoly, granted to innovators. J. K. Rowling, for example, would be far less wealthy if she did not own a robust passel of rights to the characters and stories she created, rights that, while nominally of limited lifespan, in practice appear to be trending toward permanence.

One can argue that such an arrangement is a good one – but it’s not simply the result of “market forces.” The law has changed on these matters since Dickens’ day, and those changes are not irrelevant to the existence of massive intellectual property fortunes. And we should not lose sight of the fact that the purpose of intellectual property law is not to reward innovators but to encourage innovation. And it is not at all clear that winner-take-all rewards are the optimal strategy for encouraging innovation among most players.

2. My most technical point, about finance. Cowen correctly points out that much of modern finance involves bets against extreme outcomes. During normal times, these bets pay off, and everyone looks very bright. But when the extreme outcomes happen, the entire system is shocked, and so the risk is socialized. The banks (or, rather, their creditors and in many cases shareholders) get bailed out, and the losses are borne by the taxpayer. Cowen does a very good job of detailing how this happens. (Loose money advocates, pay particular attention to the section on the stealth bailout of banks via a steeply upward-sloping yield curve, paying interest on reserves, and other ways in which money is kept relatively tight.)

But the problem of asymmetric risk is more general than he says. Any public corporation has an agent/principal problem with respect to executive managers versus shareholders, and there’s always a structural asymmetry between shareholders and creditors, to say nothing of stakeholders in the larger society. All of this can drive behavior that socializes risk and privatizes reward. (Remember, GM got bailed out, too.)

What’s unique about finance, I would argue, is a crucial kind of information asymmetry – not merely that outside observers cannot know as much as insiders do (this is also generally true, not merely true of finance), but that in finance the asymmetry pertains to precisely the information needed to game the system. Put simply, financial executives will always know more about how to game the financial system than anybody else, because manipulating the financial system is their job.

Cowen points this out as well, and says this is one of the main reasons that solving the problem is so difficult. But solving it may appear difficult in part because we parameterize the problem as “how can we keep up with the smart finance types?” – and that may be the wrong way to think about it. As I’ve argued before, the existing regulatory architecture is designed to intelligently measure risk and then intelligently charge for it. This creates a perverse incentive to hide risk where it cannot be measured, because then a financial institution can leverage returns enormously. This was precisely what happened with all that triple-A-rated mortgage debt. Triple-A investments are (supposedly) virtually riskless and so require virtually no capital. (Oops.)

We need to learn how to fight intelligence with stupidity, a paradigm change that emphasizes the impossibility of perfectly measuring risk, and therefore overcharges in ham-handed ways for apparently riskless positions. This would create an incentive to simplify balance sheets and to take risks that can be measured, and therefore charged for intelligently.
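To make that contrast concrete, here is a minimal sketch, with entirely hypothetical balance-sheet numbers and risk weights (none of them drawn from any actual regulation), of how a ham-handed leverage floor “overcharges” an apparently riskless position that a model-driven, risk-weighted charge treats as nearly free:

```python
# A minimal sketch with hypothetical numbers: a model-driven, risk-weighted
# capital charge versus a ham-handed leverage floor on total assets.

def risk_weighted_capital(assets):
    """Capital required when charges track modeled risk weights."""
    return sum(value * weight for value, weight in assets)

def leverage_floor_capital(assets, floor=0.05):
    """Capital required under a blunt minimum ratio applied to all assets."""
    return floor * sum(value for value, _ in assets)

# A balance sheet loaded with supposedly riskless triple-A paper plus a
# little ordinary lending (all values and weights are made up).
balance_sheet = [
    (900.0, 0.004),  # AAA-rated securities: the model says nearly riskless
    (100.0, 0.080),  # ordinary loans
]

rw = risk_weighted_capital(balance_sheet)   # 3.6 + 8.0 = 11.6
lf = leverage_floor_capital(balance_sheet)  # 0.05 * 1000 = 50.0

# The binding requirement is whichever charge is larger.
print(f"risk-weighted: {rw:.1f}, leverage floor: {lf:.1f}, binding: {max(rw, lf):.1f}")
```

Under the risk-weighted rule, piling up “riskless” paper costs almost nothing in capital – which is exactly the incentive to hide risk where the model can’t see it. The stupid floor charges for it anyway.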

I’ve harped on this a bunch in the past, particularly with respect to Finreg, and the maximum leverage ratio provision is one limited step in the direction I’m talking about. But I think it’s relevant not just to the stability of the financial system but to this point about inequality. I think the case that the financial sector is delivering itself outsized rewards relative to its contribution to economic growth is very powerful. But focusing on trying to reduce those rewards may be grasping the wrong end of the stick. It may make more sense to focus on those activities that generate large rewards but don’t contribute to the productivity of the economy (or even detract from it). My sense is that those activities are concentrated in areas that don’t use a lot of capital.

In the financial sector, we actually want to punish institutions for finding ways to make money apparently for free, because we have the conviction that it is impossible to do so at any kind of scale – if it looks like you’re making money for free, you’re really taking risk that isn’t being captured by the existing models. Tackling the activities that are most directly contributing to the instability of the financial sector will have the happy side effect of either reducing the earnings of financial executives or forcing them to earn their keep by facilitating the allocation of capital to genuinely productive activities.

3. My most philosophical argument. Cowen’s point about inequality possibly not increasing among the lowest four quintiles because of changes in the prices of goods is a valid and good one, as is his point about absolute improvement in material conditions. But the prices of some goods have not gone down – some of the most essential goods have, in some cases, gotten quite expensive indeed.

It is possible today to grow up in an American home with a 40-inch flat-screen television and a daily caloric intake so high that it actually becomes detrimental to health, but to lack access to basic medical and dental care, to run a material daily risk of rape or other profound physical violence, and to leave school functionally illiterate. Poverty today means something very different than it did in Dickens’ day, but it has not been abolished. And it seems to me that asking about the welfare of the lowest quintile is a much better way of approaching the problem of inequality in our society than looking at Gini coefficients.

On the other hand, I think his point about “threshold earning” is also very important, and raises a philosophical question about the meaning of wealth. I would argue that the right measure of wealth is not where one stands on a relative scale to other people but, simply, whether you can live off your assets. At whatever your chosen lifestyle level in your subculture is, do you need to work for a living, or can you live off the work of others (which is what living off your assets actually means)? If you can, you’re rich, even if you continue to work, because your work is a choice, not a necessity. If you can’t, you aren’t. Perhaps you could be if you simply adjusted your perspective on what counts as an acceptable lifestyle – but we all know people who are “house poor,” and the phrase has real meaning. (By which I do not intend to suggest it is truly analogous to actual poverty.)

We are living in a very interesting period in history, where the boundaries between work and play, and between productive and nonproductive activity, are becoming more and more fluid. Having the freedom to cross and recross that boundary is enormously valuable to an individual, and a society. But that freedom is not remotely evenly distributed. I hate to sound like some kind of socialist, but when we think about inequality, in my view we should be thinking primarily about two factors: first, whether absolute deprivation of essential goods is a real problem in our society (I would argue that it is, at least for some essential goods); second, whether some measure of wealth in the sense I am using it – as freedom, rather than as relative positioning – is reasonably broadly distributed across society, and is a plausible aspiration for most people at some point in their lives.

Unbundle the Welfare State

I have a long article on the cover of the current National Review that they have been kind enough to make available online.

In it, I make several arguments:

1. There is an inherent tension between certain aspects of human nature and capitalism.

2. The welfare system exists, in part, to help manage this tension.

3. All major entitlement programs, when you look under the skin, have a common structure.

4. This structure generally bundles together features designed to achieve several separable goals.

5. Recognizing the facts of contemporary political economy, we should unbundle and modernize these programs.

A Reply to Noah Millman

Noah Millman makes a lot of complicated and interesting criticisms about an ongoing theme in a lot of my posts. There is a whole lot in his post, and what I take to be several independent arguments, so I’ll just try to address what I see as some of the most important topics.

Noah (if I may) ends by saying that he “has no truck with radical skepticism.” He begins with an anecdote about going to see his doctor with a pain, and the doctor going through a diagnostic process leading to the prescription of cortisone.

Noah says this:

Jim Manzi is fond of making unfavorable comparisons between economics and physics, and those sorts of comparisons are pretty good for making economics look bad. But I think a much better comparison is of economics to medicine. How does economics stack up in that comparison?

Well, is medicine a science? Doctors certainly have a lot of scientific knowledge. But there’s also a great deal they don’t know. And much of their actual practice involves operating in the area where knowledge is limited.

It turns out that I did a post right here at TAS on this topic in October. I was very critical of an article in The Atlantic that had expressed radical skepticism about the findings of medical research. The title of my post was “Has Medical Science Discovered Anything Useful?” I began with a summary answer in the form of a one-word paragraph: “Yes.” I argued that we need to be more fine-grained about claims of medical knowledge than the Atlantic author had been. I noted that even using the data presented in the article, one could see that two kinds of medical research findings appeared to hold up very well: (1) the results of well-designed randomized trials, and (2) findings concerning “traditional” medical procedures, rather than long-term, behaviorally-oriented interventions.

As I pointed out at the time, it is striking that the opposite of these characteristics – validation through non-experimental methods of data analysis rather than through controlled experiments, and interventions that are attempts to change human behavior over extended periods – precisely describes the parts of social science that Noah notes I have criticized.

Noah goes on to extend his story to say that the cortisone shot did not work to relieve the pain, and asks how we should use this fact to make further decisions, given that we don’t have a comprehensive understanding of the causal maze of the human body. This part of his story gives away the game.

How do we know that the cortisone didn’t relieve the pain? Because he (presumably) had a relatively consistent level of pain that he could reliably forecast would have continued absent some intervention. That is, the counterfactual of “what would have happened absent treatment?” is straightforward to answer in practical terms.

This is exactly why, as Noah notes, conscious trial-and-error progress has been possible in surgery for so long. There is, for example, an Egyptian papyrus dated to about 1500 BC that documents a surgical procedure approximating modern jaw surgery. The effects of many successful surgical procedures are so immediate and dramatic that abstract debates about causality are not necessary.

Progress for therapeutics was more problematic, however, because the change in outcomes was usually not so immediate and dramatic, and was often manifest as a reduction in the probability of a disease or an increase in the probability of recovery. In information-processing slang, we would say that the “signal-to-noise” ratio is usually much lower for therapeutics than for surgery. This is not always true, of course: Pasteur’s anthrax vaccine, for example, was exactly 100% effective for test animals for a disease that was exactly 100% deadly within a short period, and not every surgical procedure produces immediate, dramatic change.

So to return to Noah’s example, he has selected an analogy which assumes away those characteristics that make analysis of stimulus so hard: We can measure the counterfactual in the cortisone example, so we can learn after the injection whether or not it worked. As I’ve argued many times, if we could reliably measure the impact of stimulus spending, we would have much greater capacity to make intellectual progress, by beginning a process of trying alternatives, seeing what works, building up a range of case histories, and generally, developing a true expertise around when and how stimulus works.

To make Noah’s cortisone analogy more apt, we would have to imagine that he had some fleeting pain that came and went unpredictably, with widely varying magnitude. Historically, it is statistically correlated with lots of other external changes. It seems to be worse, on average, on Wednesdays and Sundays (though these are also days when he tends to run, and/or have visits from his mother-in-law); when it rains, but not when it snows; between 7 and 10 days after TV shows in Bangladesh on the British royal family; and so on. There are many thousands of such correlations. It also has some complicated statistical relationships with chemical properties of Noah’s body, as well as measurements of his mental state. There is extensive debate about whether each of these is a causal link, and if so, about the structure of the causal relationships.

Noah gets a cortisone shot on Monday. The next day, the pain is slightly greater. One group of researchers builds a set of regression models showing that but for the cortisone injection, Noah’s pain would have been 10% worse. An academic discipline builds up around them at many leading universities. They are called the “Cortisonians”. An alternative group of scholars, dubbed the “anti-Cortisonians,” builds up an alternative set of regression models showing that the cortisone had no effect.

What would I do? I’d run a randomized trial with 1,500 patients in the test group and 1,500 in the control group, and I’d believe that answer. That’s what modern medicine does in this situation. (Of course, in reality, it usually does so before all this modeling takes place.)
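For what it’s worth, here is a minimal sketch, with made-up effect sizes and noise levels, of why a trial of that size settles the question that the dueling regressions cannot: randomization pushes all the Wednesdays, rainstorms and mothers-in-law into the noise, and a simple difference in means recovers the true effect.

```python
# A toy simulation of the 1,500-per-arm trial described above. All numbers
# (true effect, noise levels) are assumed purely for illustration.
import random
import statistics

random.seed(0)
N = 1500
TRUE_EFFECT = -0.5  # hypothetical: cortisone reduces pain by half a point

def observed_pain(treated):
    baseline = random.gauss(5.0, 2.0)     # noisy day-to-day pain level
    confounders = random.gauss(0.0, 2.0)  # weather, in-laws, Bangladeshi TV...
    return baseline + confounders + (TRUE_EFFECT if treated else 0.0)

treatment = [observed_pain(True) for _ in range(N)]
control = [observed_pain(False) for _ in range(N)]

# Difference in means and its standard error; with this much data, the
# estimated effect lands close to the truth despite all the confounders.
diff = statistics.mean(treatment) - statistics.mean(control)
se = (statistics.variance(treatment) / N + statistics.variance(control) / N) ** 0.5
print(f"estimated effect: {diff:+.2f} (true: {TRUE_EFFECT:+.2f}), z = {diff / se:.1f}")
```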

But doctors (just like engineers) are not making rote application of a set of known research-based treatments in a set of known situations, even if they have access to the results of a well-designed experiment. There is, as Noah notes, a valid role for expertise. I go into exactly this situation in some detail in the upcoming book (and go through the history of the related battles for control of decision-making). It is a complicated topic to describe fully. I’ll just try to focus in this post on a few items that I think are most relevant to Noah’s questions.

Suppose you are the doctor making a decision about whether or not to give Noah the cortisone injection, and there has been no research of any kind. I think you, rationally and morally, should have wide scope to try it, and also should be highly cautious about what you do in non-extreme situations because your ignorance level is so high. If you then are given access to the “dueling regressions” analysis, I think you should read the appropriate literature, likely filtered through intermediary interpreters. I further think that the formal results of the regression models should have limited impact on your decision-making, and if anything, the detailed case histories of individual patients are likely to be more helpful as input to your decision process.

Now suppose you are given access to the clinical trial results. This should carry great weight. But it’s still not as simple as saying: IF trial is successful, THEN always prescribe cortisone, and IF trial is failure, THEN never prescribe cortisone.

Even if one were to accept that, all else equal, treatment X is better than treatment Y, the problem is that all else is never equal – patients have varying co-morbidity, are at different stages of life, have different lifestyles, needs and home situations, and so on ad infinitum. In practice, clinical judgment is required to determine the best course of action for a specific patient. If this were not the case, then simple observation would usually suffice to determine efficacy, as it has for thousands of years of trial-and-error learning about some kinds of surgery, or as it did for Pasteur in testing his anthrax vaccine. For example, this concern would be highly relevant for a treatment that demonstrated better outcome results than the best available alternative treatment for 52% of sufferers from a complex, chronic lifestyle-related disease with extensive and varying co-morbidity, but a worse outcome than the alternative treatment in 48% of cases. The wide variation of treatment effectiveness versus the best alternative is an indicator of significant hidden conditionals, and there are numerous realistic treatment alternatives. This objection can be applied to quite serious medical conditions as long as the believed effects of the complexities created by contextual issues are of comparable magnitude to the improvement created by the tested treatment, so that while the “best” treatment performs best on average, there is a large proportion of instances within the test and control population for which alternative treatments appear to do as well or better than the test treatment.
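A toy simulation, with assumed numbers, makes the 52/48 situation concrete: a real average advantage can coexist with patient-level variation large enough that the “better” treatment loses nearly half the time, which is exactly where clinical judgment about hidden conditionals re-enters.

```python
# A toy sketch of the 52/48 case described above. The average gain and the
# patient-to-patient spread are assumed numbers chosen for illustration.
import random

random.seed(1)
AVG_GAIN = 0.1    # hypothetical average advantage of treatment X over Y
PATIENT_SD = 2.0  # spread from co-morbidities, lifestyle, home situation...

# Per-patient difference in outcome (X minus Y); positive means X did better.
gains = [random.gauss(AVG_GAIN, PATIENT_SD) for _ in range(100_000)]

share_x_better = sum(g > 0 for g in gains) / len(gains)
print(f"mean gain from X: {sum(gains) / len(gains):+.2f}")
print(f"patients for whom X beats Y: {share_x_better:.0%}")  # roughly 52%
```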

A single RCT as a test of some proposed therapeutic, then, is most appropriate for treatments that are in an intermediate zone of signal-to-noise: on one hand, treatments that are not effective in more or less every case, or else the conclusion would be obvious without the need for sophisticated controls; but on the other hand, treatments whose improvement does not appear in so slim a majority of cases that, even if a trial shows both statistical and practical significance, it cannot provide a practical guide to action because too many other factors would have to be considered to make a rational decision. This is a special case of the problem of generalization from a known finding, which I have addressed in other articles, and will in the book; in this post I’ll just note that this is really where the scope for expertise in the face of scientific findings applies most centrally, and that the way to replace some of this scope with further scientific findings is through a series of RCTs to test impacts under an ever-wider variety of conditions.

As a practical matter, however, a doctor who simply disregards a well-structured RCT showing some treatment is either beneficial or harmful, and especially a body of such RCTs that have shown this over and over again under a wide variety of conditions, is not doing his job very well. He might decide that considerations other than the reduction of pain are more important; he might, very rarely, make the decision that some case is so exceptional that these trials don’t apply; and so on. But we would rightly castigate him as ignorant or ill-intentioned if he instead cited his intuition, or his reasoning from the first principles of biology, as making such RCTs irrelevant.

Now, try to apply (my extension of) Noah’s medical analogy to stimulus. The doctor is an elected official, and the researchers are credentialed macroeconomists. What kicked all this off was my contention that macroeconomists vastly overstate the reliability of their knowledge when they claim that elected officials are ignorant or ill-intentioned in ignoring validated findings about the impact of stimulus when making decisions about taxes, spending and policies for management of central banks. I claim that these economists are armed with the dueling “Cortisonian” and “anti-Cortisonian” regressions. I claim their knowledge is useful, but is not a reliable predictive tool that should trump political judgments even on specific, narrow decisions about stimulus in the way that the results from a series of well-structured RCTs should for a physician.

Jim Manzi and a Conversation With My Doctor

“Thanks for seeing me, doctor. I’ve been having this persistent pain in my left hip, sometimes extending down the leg to the ankle. Could you help me with that?”

“How long have you had the problem?”

“It’s been several months, and it’s been getting worse over time.”

“Did you have any trauma to your leg that you recall?”

“No. I had knee surgery a couple of years ago, but that was the other leg.”

“Has there been any change to your regular activities that you think could have affected your left leg?”

“Not that I can think of.”

“Does any activity you engage in exacerbate or alleviate the pain?”

“Driving makes it worse, definitely, and sitting often does in general. Walking or flexing my gluteal muscles sometimes helps, but usually only temporarily.”

“How are you treating the pain when it occurs? Does the treatment help?”

“Usually with ibuprofen, and yeah, it often helps, but not always, and the pain keeps recurring.”

“Well, let’s take a look.”

[Various manipulations of leg, knee, hip, interspersed with “does this hurt” and “how about now.”]

“My best guess is it’s an inflammation of the bursa. I’m going to prescribe a cortisone injection. It’ll probably take a few days to know if it’s been effective, so why don’t you call the office a week from today and let me know if the pain has recurred. It would be particularly helpful for you to drive the day before you call to see if the pain recurs under conditions that have tended to exacerbate it in the past.”

“Thanks, doc.”

Jim Manzi is fond of making unfavorable comparisons between economics and physics, and those sorts of comparisons are pretty good for making economics look bad. But I think a much better comparison is of economics to medicine. How does economics stack up in that comparison?


Science, History and Economics

Imagine that the president is considering his options vis-à-vis the Iranian nuclear program. First, a science advisor comes into the room and predicts that if the Iranians take the following quantity of fissile material and compress it into a sphere of the following size under the following conditions, then it will cause an explosion large enough to destroy a major city. Next, an historian comes into the room, and predicts that if external attempts are made to thwart Iranian nuclear ambitions, then a popular uprising will sooner or later ensue in Iran that will change governments until Iran has achieved nuclear capability.

The president would be incredibly irresponsible to begin debating nuclear physics with his science advisor. Conversely, the president would be incredibly irresponsible not to begin a debate with the historian. This would likely include having several historians present different perspectives, querying them on their logic and evidence, combining this with introspection about human motivations, considering prior life experience, consulting with non-historians who might have useful perspectives on this, and so on.

Next, an economist walks into the room. She predicts that if the CIA were to successfully execute a proposed Iranian currency counterfeiting scheme designed to create an additional ten points of inflation in Iran for the next five years, then the change in Iranian employment over the next decade would be X. Is this more like the historian’s prediction or the physicist’s prediction?

Superficially, she might sound a lot more like the physicist. If pressed for an explanation of how she reached this conclusion, she would use lots of empirical data, equations and technical language. The problem is that the abstraction from reality implied by the data and equations is vastly more severe for the economist than for the physicist. Some parts of the prediction would have some firm foundation, e.g., a build-up of alternative production capacity at all known manufacturing plants based on measurement of physical capacity. But lots of things would arguably remain outside the grasp of formal models. How would consumer psychology in Iran respond to this change, and how would this then translate to overall demand changes? How would the economy respond to this problem over time by shifting resources to new sectors, and what innovations would this create? How would political reactions by other countries lead to war and other decisions, which would, in turn, feedback to economic changes? And so on, ad infinitum.

Any sensible economist would, of course, put all kinds of qualifications around her ten-year employment prediction to reflect such issues. Often (and this kind of language goes back all the way at least to J.S. Mill) such complexities will be described as something like “disturbances” around the “basic thrust” or “central trend” or whatever. But once these qualifications are accepted as material, how do we evaluate the reliability of the prediction? That is, how do we know that the “disturbances” aren’t, in fact, more fundamental than the “basic thrust” of the economic theory?

The physicist’s answer to challenges to the reliability of his prediction is simple: Please view the following film taken from a long series of huge explosions that result when independent evaluators combine the materials I described in the manner I described. Note that this prediction is not absolutely certain. It is possible, as per Hume, that the laws of physics will change one second from now, or that there is some unique, undiscovered physical anomaly in Iran such that these physical laws do not apply there. But for all practical purposes, the president can take this predictive rule as a known fact.

How would the economist respond if challenged with respect to the reliability of her prediction? As far as I can see, she can respond with recourse to three lines of evidence: (i) a priori beliefs about human nature, and conclusions that are believed to be logically derivable from them, (ii) analysis of historical data, which is to say, data-driven theory-building, and (iii) a review of the track record of prior predictions made using the predictive rule in question. The analogous lines of evidence that the physicist could have used would be (i) common sense observations of the physical world, and conclusions that are believed to be logically derivable from them, (ii) analysis of observational data, historical experiments and the logic of the physical theories that were developed from these sources, and used to create the predictive rule in question, and (iii) the results of controlled experiments that tested the predictive rule in question. The reason the physicist need only concentrate on (iii) is that controlled experiments are accepted as the so-called “scientific gold standard” method for testing theories. Distrust of untested theories, no matter how persuasive they sound, has been central to the scientific method at least since the time of Francis Bacon. Note that the first president faced with this kind of briefing actually had an enormously expensive experiment conducted to test the theory at the Trinity site in New Mexico before using nuclear weapons.

The problem with the economist’s reference to her version of (iii) is that, in practice, so many things change in a macroeconomic event that it is not realistic to isolate the causal impact of any one factor. To call some of these macro events “natural experiments” is almost always to dress up rhetoric in analytical language. In analyses of true macro events as natural experiments, you will almost inevitably find either unsupported assumptions or (in the sophisticated cases) econometric modeling embedded within the analysis of the “experiment” because of non-random assignment of units of analysis to alternative treatments and other issues. It is really more observational data. Further, even the definition of the “event” within the continuous flow of history embeds all kinds of assumptions.

This brings us back to where we started. How does the economist know that her predictions, which sound like the physicist’s predictions, are reliable in a way that the historian’s are not? She doesn’t. Therefore the president would be wise to treat the economist’s prediction like the historian’s prediction, in that it should be subjected to useful cross-examination by laymen, weighing of technical and non-technical opinions, introspection concerning human motivation, and all the rest. Beyond this, he should always keep in mind the unreliability of such predictions, and treat the fog of uncertainty about the potential effects of our actions as fundamental when considering what to do. I’m not arguing that the economist’s output is valueless – I would no more advise a president to make a major economic decision without professional economic advice than I would advise him to make a decision about war and peace without consulting relevant historians – but I am arguing that we should be extremely humble about our ability to make reliable, useful and non-obvious predictions about the results of our economic interventions.

I think that this story gets to the essence of an exchange that I have been having with economist Karl Smith. In his most recent post in this series, responding to my challenge to him – “You say that you have the ability to predict the effect of stimulus. Prove it.” – Smith says this:

I don’t think I am saying this. At least, not how I think Jim means it. I am saying I have reason to believe that the effects of stimulus will be X and I can make an argument for it.

I accept that Smith has (non-trivial) reasons that support his beliefs about what will happen in response to stimulus, and that he can make an informed argument for them. More than this, I agree that his theory is at least plausible. My question continues to be the same: Where is the proof that his plausible theory is correct?

Smith goes on to argue that no predictive rule even in physical science is ever proven in the absolute philosophical sense.

I sometimes tell my students that scientists don’t prove, mathematicians and philosophers prove. Scientists accumulate evidence that seems to suggest.

This I think is true in all fields of science and is doubly true when that science is applied to actually engineering results in the real world. Not only have well-relied-upon theories in physics been upended upon careful examination, but there is no one I know of who can design an airplane using a physics textbook. Nor would many people trust an airplane to fly without testing it first.

And, despite all of the testing that is done, airplanes can and do malfunction and crash. There simply isn’t a “proving it” when it comes to making predictions about the real world. What we hope to do is give an answer that’s better than random and better than folk wisdom. [Bold added]

Smith won’t get a lot of debate on this from me. As he indicates, no matter how great the engineers sound when describing the plans for a spiffy new engineering feature on a plane, we still want test flights. And no number of tests can ever prove in a philosophical sense that this predictive rule will continue to operate in future contexts. And further, the standard for accepting this proposed feature is normally not “scores perfectly on every test every time,” but is instead more like “is superior to the existing alternatives.”

So, Smith here seems to me to be accepting the principle that the standard of evidence by which we should judge a predictive rule is by how it stands up to rigorous, real-world tests. It is the straightforward application of this principle that leads me to ask for the tests that show some proposed predictive rule (“the effects of stimulus will be X”) is, in fact, “better than random and better than folk wisdom.”

Smith’s response is that:

Now perhaps Jim is not confident that we can achieve our goal of beating randomness and folk wisdom. There are two basic lines of reasoning I can offer.

One is evidence and logic.

He goes on to argue that, in effect, even without formal testing of the theory, external to the theory-building process, we should take the arguments internal to the theory seriously, as they are built on a lot more than “hey, sounds good to me.”

To use Smith’s analogy, this is like saying that these are very smart aeronautical engineers who have applied well-accepted engineering principles to create this new feature. The thing is, as he indicates, we still would like to see actual test flights for a sufficiently important change. The whole point of our exchange is that economic theories don’t get a free pass from falsification tests. Quite the opposite, in fact: The astounding complexity of the subject matter under consideration should lead us to be even more skeptical of counter-intuitive claims made in social science than of those made in physical science.

He goes on to describe a second argument, which is closer to what I have meant by falsification testing. He begins with this:

The second line I offer is that of experience. That when economists had the helm we really were able to produce results. In the 1980s Central Banks were largely turned over to their economists who produced low inflation and low unemployment by manipulating the overnight lending rate.

This is the crux of his reply, and it strikes me as not very compelling. First, it seems to beg the question. Lots and lots of important things happened on planet Earth in the 1980s. How do we know that it was the central banks who “produced” low inflation and low unemployment? Where is Smith’s evidence that he has isolated causality in this way? It is a textbook example of a highly confounded problem. Second, even if we were to give central banks complete credit for the economy of 1980 – 2008, Smith would then further have to show (i) that it was the application of some specific predictive rule for the effect of stimulus that accounted for these results starting in the 1980s, and (ii) that we can reliably generalize this hypothetical rule to the situation for 2008 – 2010, before this would count as empirical verification of some rule to be used to predict the effects of the stimulus program under consideration.

Smith extends this example, and concludes with this:

If you are arguing that I don’t know for sure that these tools will work then you are right. I don’t know. What I am suggesting is that the same logic and evidence that worked for controlling the overnight rate is telling me certain things now.

I don’t ask that you simply trust this. We can go through the models. We can go through the logic. We can look at all the evidence. However, at the end of the day we have to make a choice. Even the choice to do nothing is a choice, with consequences for which we will be responsible.

I think making our choice based on logic, evidence and the experience of the Great Moderation is the way to go.

Of course we have to make choices all the time, and we can’t opt out of the game; but the issue under consideration is the reliability of the predictions of macroeconomic theories for the future effects of various alternative potential decisions. As I’ve said before, Smith sounds like a smart and practical guy, but does this sound like the economics profession has created a good answer to my request for proof? Does this sound like something that should lead a rational observer to reject other lines of non-technical reasoning as irrelevant in the way we would in the case of actual scientific knowledge?

Economics and Abstraction, ctd.

I wrote a post arguing that much of the economics profession’s asserted confidence about its ability to forecast the effect of certain kinds of policy interventions is unwarranted, and that this problem is especially acute for predicting long-term effects. Karl Smith, an economics professor who blogs at Modeled Behavior, responded somewhat critically. (Reihan Salam weighed in as well.) I then replied to Smith.

Smith has now replied to me again. If you are at all interested in the question of the reliability of the knowledge produced by economics, I think it is worth reading his latest post in full. I compliment him for being willing to engage on this at length, and in a spirit of open-minded discussion.

Smith’s latest post begins with this:

Jim Manzi responds to my post. It seems I came off a bit harsher than I intended. Other posts lead me to think that some believe I’m rejecting Manzi’s argument against overconfidence in models. Quite the contrary I am suggesting that academics don’t actually have the level of confidence Manzi and Brooks ascribe.

Fair enough. But if economists don’t really have sufficient confidence in their models to rely upon them, then how do they predict the effects of policies that they propose? Smith goes on to describe this with clarity and candor:

Manzi and Brooks seem to believe that policy advice comes plugging and chugging on the big models but instead it comes from an intuition honed by working with both the simple theoretic models and war gamming the big models. [Bold added]

Smith sounds like a very smart and sensible guy. But if the predictive method is in fact the economist’s intuition rather than some predictive algorithm that can be validated empirically, then how do we know the prediction is reliable? I guess it would help a lot if a given economist had a long track record of making accurate, value-added predictions about the impact of similar interventions – but as I’ve argued at length, it is difficult even to measure the effect of many macroeconomic policy interventions after the fact.

This is crucial. Unless Smith can demonstrate that his intuition can actually predict the effect of such interventions more reliably than some alternative method, then all we have is his opinion that macro model X is useful, or that war-gaming exercise Y is relevant, or that X and Y should be combined using method Z (which can’t even be defined explicitly if it’s intuition).

What Smith is describing here is intelligent and data-driven theory-building. What’s missing is the part where the theory is tested, and proven to be reliable.

In other words: You say that you have the ability to predict the effect of stimulus. Prove it.

At the end of his post, Smith goes into the specific example issue that I used in my original post to try to illustrate some of the enormous difficulties that formal economic forecasting of stimulus must confront. My example was basically that if we execute a policy that creates greater economic activity today, this will lead to a set of investments that otherwise would not have happened, and this will in turn change the alternatives available to us in future periods. The term for this general idea is path dependence. I argued that path dependence might turn out to have effects on total long run economic output that are significantly positive, significantly negative, or not material one way or the other.

Smith, in his original reply, said that things like this would make no difference:

For example, if you asked what effect would properly done fiscal stimulus today have on the economy 20 years from now, the fairly easy and straight forward answer is, none.

Stimulus is not central planning or industrial policy. It should have no lasting effects. If it does, then you did it wrong.

In his latest reply, he says this:

Thus monetary policy should have no large predictable effects on the economy. There are always butterfly effect type stories we could tell – a crucial company getting funding at just the right time, etc – but from our perspective this is white noise. We could just as easily crush the butterfly as set him free.

If Smith means the first sentence literally as written – that there should be no large predictable effects on the economy – then I obviously agree, as this is a simple restatement of what I have said. I’ll assume, subject to correction, that what Smith means by this paragraph is something like the following: The net path dependent effect of a stimulus action on total economic output over a period of decades is drawn randomly from a conceptual distribution that (i) is centered around zero, and (ii) will not produce impacts that are of material size.
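Note that (i) and (ii) are separate claims, and a toy sketch with assumed numbers shows why: a distribution of long-run path-dependent effects can be perfectly centered on zero, satisfying (i), while still routinely producing impacts of material size, violating (ii).

```python
# A toy sketch separating claims (i) and (ii) above. The spread of long-run
# path-dependent effects is an assumed number, purely for illustration.
import random

random.seed(2)
# Hypothetical long-run effects of a stimulus action, in percent of GDP,
# drawn from a zero-centered distribution with a wide spread.
draws = [random.gauss(0.0, 3.0) for _ in range(100_000)]

mean_effect = sum(draws) / len(draws)
share_material = sum(abs(d) > 1.0 for d in draws) / len(draws)  # > 1% of GDP
print(f"mean effect: {mean_effect:+.2f}% of GDP (centered on zero)")
print(f"share of draws material in size: {share_material:.0%}")  # roughly 74%
```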

If that’s a roughly correct interpretation, then it’s not at all obvious to me that it’s true. My request to Smith is, once again, simple: Prove it.

Economics and Abstraction, ctd.

Karl Smith at the Modeled Behavior blog has responded to my recent post on the problems in using some kinds of economic theory to guide policy interventions. His reaction is critical, to put it mildly. It also seems to me to be an almost perfect illustration of the attitude that I was trying to describe.

Smith begins by saying that I have some “wrong ideas” about economics, and then quotes this portion of my post:

In practice, the problem of excessive abstraction by economic theory that Brooks identifies becomes increasingly severe as we try to evaluate the effects of proposed interventions and programs over years and decades, rather than months and quarters.

He then immediately says this about it:

First, this is backwards. With cyclical policy it’s generally speaking easier to assess the effect of interventions over longer and longer horizons because the economy increasingly resembles a frictionless market as you extend time into the future.

For example, if you asked what effect would properly done fiscal stimulus today have on the economy 20 years from now, the fairly easy and straight forward answer is, none.

Stimulus is not central planning or industrial policy. It should have no lasting effects. If it does, then you did it wrong. What’s more difficult is the short run.

Unless he is using “properly done” as tautologically implying his conclusion, I’m not so sure he’s right about this.

A stimulus action is designed to change behavior in the real world – some investment decisions must change. Imagine, as an illustrative example, that a stimulus program in country X executed in 1820 resulted in the digging of a large number of canals that otherwise would not have been constructed. This makes incremental investment in shipping capacity and development of improved shipping technology more attractive over the next several decades than it would otherwise have been, as compared to the development of land-based transport systems. X then develops a greater reliance on ship than rail transport. This leads to some gains beyond what was anticipated in 1820. Over the succeeding decades, X becomes a world leader in the shipping industry; by 1850, total output in country X is higher than it would have been had no canals been dug.

Countries that had not built early canal networks invest relatively more in building out rail networks, just as X would have done had it not built all those canals in 1820. When, hypothetically, a set of technical innovations occurs in the second half of the 19th century, it suddenly becomes apparent that rail is now a superior form of transport. Country X finds itself behind, and by 1880, total economic output is significantly below where it otherwise would have been. Further, increased competition with a neighboring country over shipping lanes leads to escalating tensions, finally resulting in a full-scale war in 1890 that would not have occurred had X’s shipping industry been smaller, which further reduces total output.

Of course, the exact opposite situation could have obtained, in which the decisions changed by the stimulus action turned out to position the society unexpectedly well for future developments through the end of the 19th century. It’s also possible, as per Smith’s assertion, that building all those canals would have turned out to have had no material effect after decades.

But how could any of this be predictable in 1820 when making the stimulus decision? For that matter, how could we even know after the fact in 1890 what the effect of this decision was on total output, since this would require that we know what the counterfactual path of development would have been in the absence of the stimulus?

Changing, for example, the rate of economic activity today will lead to changes in actual investment decisions taken today in the context of the opportunities that are perceived to be available today. This will to some extent, and potentially to a material extent, change the array of choices available to us in the future. Professor Smith’s confident assertions notwithstanding, this change in the future option set may or may not change total future output materially. I believe the fancy name for all of this is path dependence.

Economics and Abstraction

David Brooks has a great column in the New York Times arguing that technocratic management of the economy leaves something to be desired. He particularly focuses on the growing disillusion with attempts to manage our response to the economic crisis. My favorite part is this:

The liberal technicians have an impressive certainty about them. They have amputated those things that can’t be contained in models, like emotional contagions, cultural particularities and webs of relationships. As a result, everything is explainable and predictable. They can stand on the platform of science and dismiss the poor souls down below.

Yet over the past 21 months, it has been harder to groove to their certainty.

In practice, the problem of excessive abstraction by economic theory that Brooks identifies becomes increasingly severe as we try to evaluate the effects of proposed interventions and programs over years and decades, rather than months and quarters.

Consider the role of very low interest rates in stimulating economic growth in the software industry where I work. Easy monetary policy, along with various other forms of stimulus, has, at least in part, likely worked as advertised; it has likely stimulated some extremely-difficult-to-quantify general economic growth, which has in turn created demand for enterprise software, among many other things. And low interest rates probably have resulted in certain additional development projects within large companies being greenlighted because they face a lower discount rate. In fact, many traditional large enterprise software companies have built large cash hoards. But they are mostly using them to finance acquisitions, not to expand capacity and increase aggregate output. Why this is so turns out to be important for understanding the potential effects of this policy on the industry.

The biggest reason is that a series of disruptive technical / business model innovations – most prominently, Software-as-a-Service (SaaS) and open-source – is transforming the industry. The rational incentive of the incumbent managers is to suppress the innovations or, at best, to slow-walk them and channel them in directions consistent with their current business models. What’s happening right now is the jockeying between entrepreneurs, incumbent company management teams and the capital markets to seize the potential value that these innovations are unleashing.

Large company growth will disproportionately come not from just adding more of the current “capacity” (mostly people) in response to stimulus, but different kinds of capacity that are required for these new business models. For example, more software engineers trained in traditional languages and accustomed to working on large, structured projects are less useful for growth than engineers with experience in web-focused technologies used to working in a so-called agile development environment. And it’s not as simple as incumbent companies simply changing their hiring specs, because it is very difficult to transform settled company expertise, systems, compensation plans, culture and so forth to operate in this new environment.

Large software companies do not have plans on the drawing boards for the moral equivalent of a new ball-bearing factory if only demand were higher – their primary strategic problem in this regard is that they don’t know how to build the new capacity. But the existence of the competitive threat forces their hand, and they buy the new kind of capacity in the form of corporate acquisitions.

One major effect of a Fed policy of easy money, then, is that large software companies can borrow lots of money cheaply, and then use it to acquire entrepreneurial companies that would usually require equity rather than debt financing. This does not add capacity to the world, but simply transfers management control over some very important assets from entrepreneurs to incumbents.

Will this lead to higher or lower economic output in 2015, 2020 and 2030? I don’t know. But then again, neither does anybody else.

The example I have chosen to highlight focuses on the complications in trying to forecast the impact of lower interest rates in the software industry created by the emergence of new technologies and business models. But of course, there are many other complicated effects of very low interest rates on the software industry beyond simply pumping up aggregate demand – impacting everything from the feasibility of leveraged buy-outs to the re-opening of the IPO window. Each will advantage or disadvantage some parts of the industry at the expense of others. And stimulus can be anything from low interest rates to running deficits to quantitative easing. And the software industry is one small part of the overall economy. So this is an example of one complication for one type of stimulus in one industry.

Where is any of this complexity captured in econometric models that purport to explain how fiscal deficits, interest rates and quantitative easing are driving everything from car dealerships to television broadcasters to consumers of dog food, all of whom face their own unique dynamics? But without it, I doubt the ability of any model to forecast the long-run impacts of a multi-trillion dollar program to intervene in the economy in the name of creating self-sustaining growth in the long term. All I can say with confidence is that if you believe as I do that a good rough rule-of-thumb is that “over any sustained period markets supported by an appropriate culture will do a better job than politicians in allocating resources to generate high economic growth,” then at some point, the distortions created by such a policy would likely outweigh any benefits it can create.

In an emergency, the idea of stimulus is not an inherently bad one; in fact, I have advocated it in certain circumstances. But it is inherently dangerous. Its effects are at best only extremely loosely predictable in the short-run; it is addictive; and it is likely pernicious if sustained.

From at least the time of J.S. Mill, the fundamental methodology of economics has been to use introspection to develop propositions about human behavior, systematize them into theories, and then try to compare the predictions of these theories to the real world. For reasons I have gone into at boring length, it is very difficult to conduct such tests of useful, non-obvious rules that reliably predict the effects of our proposed interventions in economics and other social sciences. The big problem with most economic theories that claim to be able to guide our interventions with confidence is not usually that the causal pathway that they propose is incorrect, but that it’s radically incomplete. It is typically one of an all-but-innumerable array of causes that are interconnected in a maze of causation that produces highly unpredictable outcomes as a result of any intervention. Despite confident assertions by academicians, the Law of Unintended Consequences remains in force.

What We Talk About When We Talk About Productivity

I talk about how “we need higher productivity” a lot, so I should probably clarify what I mean by that.

Productivity is output divided by input. If you get more economic output with the same number of input labor hours, productivity has grown. Similarly, if you get the same amount of economic output from fewer worked hours, productivity has grown. And if you get the same amount of economic output from more worked hours, productivity has shrunk.

But here’s the thing. Worked hours is an aggregate figure. Individuals who work zero hours are not contributing any labor inputs, but they are still “on the books” as people who consume a portion of the national income. Rising unemployment and underemployment can appear to be increasing national productivity, but whether that’s truly the case depends on where these workers end up once the economic climate improves.

Compare two situations of countries entering recessions. As the first country enters recession, businesses begin to lay people off. Partly this is due to a need to reduce output in the face of lower demand, but partly it is due to pressure to maintain profits when sales are stagnant or falling; the only way to do that is to “do more with less” – i.e., increase productivity. So the business will lay off marginally productive employees and rationalize a variety of processes and wind up with a more productive workforce. Multiply this across the economy, and you wind up with a higher unemployment rate, and negative economic growth, but a sharp rise in labor productivity. This is what usually happens: when unemployment rises, productivity does as well.

Now let’s look at the second country. As it enters recession, it faces the same pressures. But it has strong laws and/or norms against layoffs. So businesses retain more workers than they really need. But sales and profits are still falling. So the businesses in this country cut costs by reducing the hours and/or wages of their employees rather than laying off marginal workers. In aggregate, because less-productive workers are retained, the businesses in this country will produce less output per labor input. Multiply this across the economy, and this second country will have a lower unemployment rate and lower labor productivity.

But is that “really” the case? Is the second country’s economy “really” less productive? A proper measure of productivity would account not only for the drag of retaining less-productive workers, but also for the drag on the economy of having to support people who aren’t working at all. The productivity measure that makes perfect sense for an individual firm makes very little sense for the economy as a whole, because you can’t lay people off from an entire economy. People who are completely unproductive – because they don’t generate any output at all – aren’t a drag on any individual business, but they are a drag on the economy as a whole.
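
A toy calculation, with invented numbers, makes the gap between the two measures visible: the layoff country wins on output per hour worked while losing on output per working-age person, the measure that counts everyone, employed or not.

    # Two countries, same working-age population, entering a recession.
    # All figures are invented for illustration.
    population = 100

    # Country 1 lays off its 20 least productive workers.
    out1, hours1 = 900.0, 80 * 2000    # 80 workers at 2,000 hours each
    # Country 2 keeps all 100 workers but cuts everyone's hours.
    out2, hours2 = 950.0, 100 * 1750   # 100 workers at 1,750 hours each

    print(out1 / hours1, out2 / hours2)          # 0.0056 vs 0.0054: country 1
                                                 # "wins" on output per hour...
    print(out1 / population, out2 / population)  # 9.0 vs 9.5: ...but loses on
                                                 # output per working-age person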

The case for not following the second country’s path isn’t that in the depths of the recession their economy is less productive. In fact, the second country is probably doing better in the middle of the recession even though their productivity statistics look worse. The case is that once the economy recovers, the first country will retain its productivity gains, and the unemployed workers will be reabsorbed into the workforce in more productive roles than they would have had in the second country’s situation, and that this effect will overwhelm the drag associated with temporary unemployment. But that’s an empirical question that will be tested once the economy recovers. It’s not something that can simply be assumed.

Let’s take another illustrative example. Assume for the sake of argument that, on average, people over age 62 are less productive than people under age 62. Now suppose one country has a retirement age of 62 and another has a retirement age of 65. The first country should show higher labor productivity than the second, because it has created incentives for people to retire from the labor force earlier. But its economy as a whole should be less productive, because citizens in the first country who are aged 63 are producing nothing, while in the second country citizens aged 63 are contributing to the economy. In this case, there’s nothing temporary about the choice. We’re not talking about temporary unemployment and whether it makes the economy more or less productive over the longer term. We’re talking about permanently reducing the size of the labor force. That will make measured labor productivity look higher, but it actually reduces the productivity of the economy as a whole.
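
The same toy arithmetic applies here. Assuming, per the hypothetical, that over-62s produce 8 units per hour against 10 for everyone else:

    # Two countries of 100 citizens: 90 under 62, 10 aged 62-65.
    # Country A retires workers at 62; country B at 65. Figures invented.
    young, old, hours = 90, 10, 2000

    out_a = young * hours * 10                    # over-62s produce nothing
    out_b = young * hours * 10 + old * hours * 8  # over-62s keep working

    print(out_a / (young * hours))           # 10.0/hour: A looks more productive
    print(out_b / ((young + old) * hours))   # 9.8/hour:  B looks less productive
    print(out_a / 100, out_b / 100)          # 18,000 vs 19,600 per citizen:
                                             # B's economy produces more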

Of course, there are problems with any statement that so-and-so isn’t “contributing to the economy.” There’s a lot of (arguably) valuable work that is undertaken on a volunteer basis: raising children, organizing a softball league, (blogging). In the classic example, if two women each raise their own children, they are not contributing to GDP, but if they pay each other $15/hour to provide each other with childcare services, suddenly they are contributing to GDP, even though no additional value has been created.

Notwithstanding this objection, I think my point is still valid. Saying that productivity goes up in recessions is only “really” true if the gains are sustained through the period when the economy gets back to a comparable level of labor force participation. If the size of the potential workforce (able-bodied individuals of working age) is stable between two points in time, and total hours worked across the labor force is the same at each of those points, but output is higher at the second point than at the first, then we can say that labor productivity has increased. If, on the other hand, lots of people dropped out of the labor force between the two points – retired early, were sent to prison, stayed in school an extra year without acquiring useful skills, became discouraged workers – then the apparently higher productivity at the second point is at least partly an illusion. The economy isn’t actually more productive; we’ve just taken the least-productive people and made them completely unproductive.

So when I talk about “increasing productivity” I mean productivity properly measured: from peak to peak in terms of labor force participation. Right now, we show pretty decent productivity numbers, but I pay essentially no attention to them, because they appear to be achieved by laying off marginal workers and not much else. If we don’t lay the groundwork for re-employing those workers more productively than they were employed before – whether because their skills atrophy and they drop out of the workforce permanently, or because, however unlikely, we decide to employ them in make-work jobs that are even less productive than what they used to do – then the apparently decent productivity numbers will turn out to be entirely illusory.

Four Questions For Matt Yglesias About QE2

Apropos of his latest and plenty of pieces before that.

People often talk about interest rates as if they can be divided into two parts: the “real” rate and the component that reflects inflation expectations. The real rate should be reflective of expectations for true growth in the economy; the difference between the real rate and the nominal rate reflects expectations about the change in the value of money. So, if inflation was running at 2% and was expected to continue to do so indefinitely, and the 10-year bond yield was 4%, you’d say that “real” interest rates were 2%. Similarly, if the 10-year bond yielded 2% but we were experiencing deflation of 1% per year, and that was expected to continue, you’d say that “real” interest rates were 3%.
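
Both examples are just the approximate Fisher relation – the real rate is the nominal rate minus expected inflation – which a few lines can verify:

    def real_rate(nominal, expected_inflation):
        # Approximate Fisher relation; close enough at low rates.
        return nominal - expected_inflation

    print(real_rate(0.04, 0.02))    # 4% yield, 2% inflation -> 2% real
    print(real_rate(0.02, -0.01))   # 2% yield, 1% deflation -> 3% real
    print(real_rate(0.02, 0.02))    # the QE2 scenario below: 0% real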

The idea behind QE2 is for the Fed to force up inflation expectations while keeping down nominal yields. If inflation expectations go from 1% to 2%, and the 10-year Treasury yield is also 2%, then real rates go to zero. That’s very stimulative. Obviously, business and individual borrowing costs aren’t the same as the Treasury’s cost, but they are pretty closely linked, and if real rates were zero the incentive to borrow money and invest it in, well, just about anything would be pretty high.

Another way to think about it is this: right now, expectations for real growth are pretty anemic. Arguably, though, market rates reflect higher real interest rates than would be justified by these anemic growth expectations, simply because of the zero bound problem (nominal rates will always remain positive even if inflation is near or below zero). So if the Fed can engineer higher inflation expectations while keeping rates down, real rates would be more reflective of the real current expectations for growth – namely, that there won’t be much. Which, in turn, would be good, because it would remove one barrier to the resumption of growth, namely: relatively high real interest rates.

That’s a big “if,” though. Hence my questions:

1. If the Chinese intend to let their currency float, and if the general assumption is correct that a free-floating Yuan would appreciate significantly against the dollar, then they should probably unload their Treasury holdings first, to avoid taking a big loss. Certainly, they should stop buying more of them. But America is producing debt at a prodigious rate. If what QE2 accomplishes is mostly to convince the Chinese to stop subsidizing low interest rates in America, leaving the Fed to basically pick up the slack, how is that going to improve the American economy? Wouldn’t we just wind up with higher inflation and higher nominal interest rates – i.e., stagflation?

2. The reserve army of the unemployed in China numbers in the hundreds of millions. Millions of new workers migrate annually from rural areas to China’s cities, whether the economy is growing at 10% per year (producing enough jobs to absorb the newcomers) or 6% per year (producing not nearly enough jobs to absorb the newcomers). That problem is really the only thing China’s economic managers think about. Trade may not be terribly important to America’s economy all things considered, but it is enormously important to China’s economy, and specifically enormously important to providing for rapid employment growth. The Chinese have been doing a lot of infrastructure investing to increase domestic demand, but that shift is not going to happen overnight. So, assuming the last thing the Chinese are going to do is accept a significant dislocation in their export industries and allow employment growth to slow, what do you think China will do to respond if America makes it clear we’re acting without regard to their interests? Doesn’t it make sense that their first response would be not to show up for the next Treasury auction? And wouldn’t that be a problem?

3. Higher inflation expectations should reduce demand for money and increase demand both for depreciating goods (consumer goods) and for goods that retain value in an inflationary environment (commodities). But while trade is a relatively small proportion of America’s economy, so is QE2: the volume of our international trade is large enough to absorb a good chunk of the money created by the Fed, and if much of that money “leaks out” it could well have negative feedback on the real economy. If, for example, higher inflation expectations mean higher oil and copper prices, that would act against any stimulative effect of a weaker dollar. If easier money means the marginal investment dollar goes to a commodity-based economy like Chile’s, that creates real problems for Chile without particularly helping stimulate demand in the United States. The question, of course, is the magnitude of this leakage. I don’t know what that magnitude will be any more than Yglesias does, but the people responsible for managing the economies of a variety of emerging markets seem to think it could be big.

4. The “big bang” danger lurking behind any proposal for a seriously expansionary monetary policy is the risk of a dollar crisis. I continue not to expect one, but I am cognizant of the fact that the United States benefits greatly (in the form of lower nominal interest rates than we deserve) from our status as the world’s reserve currency, and that inevitably we will lose this status one of these days. We’re in the process of losing it right now, in fact – but we’re losing it slowly. I can’t see how it could be good to lose it precipitously. But many of the more radical proposals for monetary stimulus have a flavor of “we need to convince people that we’re being irresponsible – that we actually want to trash the currency – so that they don’t just wait for us to withdraw the liquidity we’ve created but actually go out and buy stuff.” And it seems to me that anything that actually convinced the market that the Fed was going to be irresponsible in money creation would surely also convince the market that the dollar should no longer be the preferred reserve currency. Right? So, again, you aim for lower real rates and wind up with higher – much higher – nominal rates.

I keep harping on this point, so I’m going to harp on it again. The United States cannot fund its own debts. I don’t mean the United States government – I mean the United States. Japan, for example, has an absurdly high public debt, but historically – though this is in the process of changing – it had so much private savings that it could more than cover the cost of public debt domestically and still export capital abroad. The high public debt was really an accounting problem, something that affected the distribution of wealth within the society but not the solvency of Japan as such.

But in America, private as well as public debt exploded in the last decade. If your economy has a private savings rate high enough to finance large public deficits, then you can run an expansionary fiscal policy as a way of basically getting around the big rise in the risk premium that accompanies recessions. You can also run an expansionary monetary policy as a way of getting private capital moving again without recycling it through the government. You can devalue your currency to improve your trade position and not worry that this will make it prohibitively expensive to borrow abroad because you don’t need to borrow abroad.

Our private citizens are busy rebuilding their balance sheets. Personal savings rates have finally gone up, to around 5%, which is low by longer-term historic standards but high relative to the last 20 years. That is absolutely right and necessary, even though it contributes to the recession in the short term (people are spending less, hence businesses expect people to spend less and don’t invest in new factories and bigger sales forces, and so forth). But net of individual, corporate and government accounts, nationally we are still borrowing from abroad – borrowing more, in fact, than we ever have before. Any expansionary policy, fiscal or monetary, depends on the goodwill of those who are actually financing that policy. So commentary on what our monetary policy should be that ignores or dismisses the international dimension strikes me as downright peculiar, unless it’s inviting us to play a kind of game of chicken, in which it makes game-theoretic sense to tell your opponent that you are unaware of or unconcerned by facts that are kind of important to your well-being.

Right now, real growth expectations are very low, and arguably real interest rates are too high relative to those expectations. If we get monetary policy just right, perhaps we’ll be able to bring down real interest rates, keep nominal rates low, and engineer a more robust recovery. But if we get it wrong, we’ll get higher inflation expectations without a recovery, and we’ll be substantially more constrained in terms of what kind of economic policy we can pursue going forward because of our continued dependence on foreign capital and the jaundiced eye with which foreign investors will view America after such a failure. So I don’t want to bet America’s economic future on getting monetary policy “just right.”

The other response to real interest rates that are “too high” relative to growth expectations is to figure out how to improve the prospects for real growth – which means, primarily, productivity growth. The traditional Republican mantra on this subject – cut taxes and all will be well – doesn’t even deserve the dignity of a response, which is why I spend essentially no time responding to it. But the question which that mantra is attempting to answer is the right question, the most important question for us to be asking.

The Difficulties of Stimulus in a Debtor Nation

As of FY2010, in round numbers the Federal budget looks like this:

Estimated receipts: $2.38 trillion
Mandatory expenditures: $2.18 trillion
Discretionary expenditures (including DoD): $1.37 trillion

One way to look at this is: at the current level of deficits, virtually the entire Federal government is debt financed. All the money we take in through taxes is sent out as checks to somebody or other: retirees mostly, but also the unemployed, the poor, the disabled, and holders of U.S. government debt. We are not buying anything with our taxes: we are just redistributing the national income.
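
The arithmetic, in round numbers:

    receipts      = 2.38   # trillions, FY2010 estimate
    mandatory     = 2.18   # checks: entitlements, transfers, debt service
    discretionary = 1.37   # everything the government actually does, incl. DoD

    print(receipts - mandatory)                  # ~$0.20T left after the checks
    print(mandatory + discretionary - receipts)  # ~$1.17T deficit
    # Receipts barely cover the checks going out;
    # everything else runs on borrowed money.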

Every actual government activity – from the military to the FBI to the post office to whatever the Department of Commerce does – is being financed by debt.

Right now, interest rates are very low. This makes it relatively easy for the government to continue on this path, piling up debt against the day that a stronger economic recovery both increases tax receipts and automatically reduces mandatory expenditures. The long-term deficit, driven primarily by health-care costs, is another matter, but the solution for the massive cyclical deficit is economic recovery: that’s it.

Hypothetically, there are two ways the government could boost the economy. The government could engage in fiscal stimulus, spending money to directly (if the money is spent by the government on goods and services) or indirectly (if the money is rebated to taxpayers to spend) increase aggregate demand. Or the government could raise inflation expectations, which will make consumption look more attractive relative to saving (and hence increase aggregate demand) and will make investing in risky assets look more attractive relative to safe assets (since safe assets aren’t so safe if unpredictably high inflation is expected to erode their value).

I say hypothetically, because the United States is not only running a massive public deficit, but is coming off a decade of national dissaving. Savings rates have been negative since the days of the internet bubble, and only turned positive with the onset of the financial crisis. A public spending program could only be financed domestically if the national savings rate increased to offset the increase in the government deficit, which would be counter-productive from a stimulus perspective. Any fiscal stimulus, to be effective, would need to be financed abroad.
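
The constraint here is the basic national-accounts identity: domestic investment plus the government deficit must be financed out of private saving or by borrowing from abroad. A sketch with invented figures:

    # foreign borrowing = investment + government deficit - private saving
    def foreign_borrowing(private_saving, investment, gov_deficit):
        return investment + gov_deficit - private_saving

    # Add $1T of deficit-financed stimulus with private saving unchanged:
    print(foreign_borrowing(0.6, 1.8, 1.2))   # 2.4 (trillions, invented)
    print(foreign_borrowing(0.6, 1.8, 2.2))   # 3.4: the extra $1T comes
                                              # entirely from abroad
    # Financing it domestically would require private saving to rise by the
    # same $1T -- which is exactly what stimulus is trying to prevent.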

Similarly, an increase in inflation expectations would indeed increase the preference for consumption over savings, and for risky investing over risk-free investing, among dollar investors. But this would only have a net effect of stimulating the economy if the rise in inflation expectations doesn’t result in a shift in investor preferences against the dollar and in favor of other currencies.

Progressive critics of the Obama/Bernanke administration’s reluctance to pursue stimulus without reservation tend to debate whether they are doing the most that could be done given the domestic political environment or whether President Obama could have done more to shape that political environment in a more helpful direction. But it has always seemed to me that the objective constraint on their behavior is America’s status as a debtor nation.

If the Fed set out to convince the markets that it was actively trying to produce at least 3% inflation, and would not act to reduce inflation until domestic economic conditions were significantly improved, why wouldn’t China shift its portfolio out of Treasuries and into Euros? Why wouldn’t investors, domestic and foreign, shift investment out of the U.S. and into jurisdictions that are either producing higher returns without accelerating inflation (such as some emerging markets) or that are still committed to aggressive disinflation (such as the Euro zone)? Quite apart from the direct negative effect this capital flight would have on the American economy, the indirect effect would be to make it much more difficult to finance our high fiscal deficits. Monetary stimulus would then lead to fiscal contraction.

On the other hand, if the Treasury set out to massively increase fiscal stimulus – running a $2 trillion deficit, say – it could only achieve the massive increase in foreign financing necessary to raise the additional $1 trillion per year by convincing foreign investors that their investment was safe. With nominal interest rates low and the risk of default essentially zero, “safe” means “safe from inflation.” A precondition for substantial fiscal stimulus, then, is a conservative, disinflationary monetary policy.

The most recent round of inflation-protected Treasury debt was issued at a negative yield. Negative interest rates on TIPS mean that the market expects nominal returns below the expected inflation rate. To me, that’s not a very attractive market to invest in. One possibility is that investors look at the negative yields on TIPS and say, “gosh, we’d better find some real-economy American investments to make so we get some return.” But the other possibility is that they’ll just look elsewhere, and American monetary stimulus will fuel investment in risky ventures in Brazil or China or India, or encourage investors looking for “safe” investments to look to the Euro zone. In that case, jawboning inflation upward would result in higher long-term rates but not increased economic activity.
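
A sketch of the relationship, with hypothetical yields: the TIPS yield is a real yield, so the spread between a nominal Treasury and a TIPS issue of the same maturity is the market’s implied (“breakeven”) inflation expectation.

    def breakeven_inflation(nominal_yield, tips_yield):
        # Implied expected inflation: nominal yield minus real (TIPS) yield.
        return nominal_yield - tips_yield

    nominal_10y = 0.025    # hypothetical 10-year Treasury yield
    tips_10y    = -0.005   # a negative TIPS yield, as in the recent auction

    print(breakeven_inflation(nominal_10y, tips_10y))   # 0.03: 3% expected
    # With the real yield negative, the nominal yield sits below expected
    # inflation: buyers accept a guaranteed loss of purchasing power.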

I don’t mean to paint things as entirely black and white. The United States is not a developing country overwhelmingly dependent on foreign investment. In the wake of the Greek crisis in particular, there are good reasons why investors might see risk in the Euro zone. Emerging markets have run up very sharply and many are already taking actions to restrict foreign investment precisely because they fear a bubble; China has long had a variety of capital controls in place. Such controls should limit the degree to which “hot money” runs to these markets in pursuit of higher returns. There are a variety of ways that the United States can “sell” a stimulus program, monetary or fiscal, to foreign investors. My point is just that these are the people we need to sell on the concept, and everyone in government is aware of it. The United States is very big, but it is not autonomous.
