## How economists get tripped up by statistics

Look at this scatter chart. There will be a quiz. Another dot is going to be added to this chart, in line with the distribution you see here. You get to choose what the X value of the dot is — and your aim is to get a Y value of greater than zero. So here’s the question: at what value of X are you going to have a 95% chance of getting a dot above the axis, in positive territory on the Y axis?

Emre Soyer and Robin Hogarth of the Universitat Pompeu Fabra, in Barcelona, recently asked a group of economists that question — all of them were faculty members in economics departments at leading universities worldwide. There’s a right answer: it’s 47. And there’s a spectacularly wrong answer: anything less than 10. The economists being asked the question are smart, highly-educated people who are intimately familiar with regression analyses. And it turns out that as a group, they did very well: just 3% got the question very wrong.

But the economists also like being precise. They don’t like eyeballing answers: they like to be certain. And so, Soyer and Hogarth write:

Most of the participants, including some who made the most accurate predictions, protested in their comments about the insufficiency of information provided for the task. They claimed that, without the coefficient estimates, it was impossible to determine the answers and that all they did was to “guess” the outcomes approximately.

This is fair enough. So Soyer and Hogarth found some other economists — chosen randomly in exactly the same way. And they presented those economists with all the coefficients and data they could want, just as it would be presented in an academic paper. It looked like this:

Everything’s there — the formula for the random perturbation, the means and standard deviations for both variables, the OLS fit, the lot. With all this information to hand, economists can be much more accurate when being asked to do something like work out a value for X such that there’s a 95% chance that Y will be greater than zero.

But here’s the thing: when the economists were shown *both* the graph *and* the detailed numbers, the number of economists getting the answer spectacularly wrong — the number giving an answer of less than 10 — soared. Just working with their eyeballs, 3% of economists got it wrong. Working with the numbers as well, that proportion rose to 61%! And when a third group was given the numbers and no chart at all, fully 72% of them — professional economists all — got the answer badly wrong.

What the authors conclude is that economists tend to overstretch when they read academic papers — they think that papers show much more than in fact they do. And the more academic papers that economists read, the more misguided they’ll become:

By reading journals in economics they will necessarily acquire a false impression of what knowledge gained from economic research allows one to say. In short, they will believe that economic outputs are far more predictable than is in fact the case.

We make all of the above statements assuming that econometric models describe empirical phenomena appropriately. In reality, such models might suffer from a variety of problems associated with the omission of key variables, measurement error, multicollinearity, or estimating future values of predictors. It can only be shown that model assumptions are at best approximately satisfied (they are not “rejected” by the data)… There is also evidence that statistical significance is often wrongly associated with replicability.

I’m certainly guilty of this kind of thing: I see a paper demonstrating a statistically significant correlation between one variable and another, and I generally assume that if the experiment were repeated, we’d see the same thing again. But that’s not actually true.
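A rough way to see why, sketched in Python. The setup is an assumption made for illustration, not something taken from the paper: suppose the true effect is exactly the size the first study estimated, and the repeat study has the same standard error. The function name `replication_probability` is hypothetical.

```python
from statistics import NormalDist


def replication_probability(z_observed: float, alpha: float = 0.05) -> float:
    """Chance that an identical repeat study is significant in the same direction.

    ASSUMPTION (illustrative only): the true effect equals the estimate from
    the first study, so the repeat's z-statistic is Normal(z_observed, 1).
    """
    # Two-sided critical value, about 1.96 for alpha = 0.05.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Probability that the repeat's z-statistic exceeds the critical value.
    return 1.0 - NormalDist(mu=z_observed, sigma=1.0).cdf(z_crit)


# z = 1.96 is a result that only just reaches p = 0.05.
for z in (1.96, 2.58, 3.29):
    print(f"first-study z = {z:.2f}: P(repeat is significant) = "
          f"{replication_probability(z):.2f}")
```

Under that assumption, a result that only just clears the 5% bar has roughly even odds of clearing it again on a repeat.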

And so it’s easy to see, I think, how economists become convinced of things that the rest of us aren’t sure of at all — and how the economists often end up being wrong, while the rest of us were right to be dubious.

What’s more, if economists are bad at this kind of thing, just imagine what other social scientists are like, or even doctors. Next time you see a piece of pop-science talking about interesting findings from some paper or other, bear this in mind. A lot of papers are written; a few of them have interesting findings. Those are the papers which tend to get publicity. But there’s also a very good chance that they don’t actually show what the headlines say that they show.

(Via Dave Levine. And please, don’t get me started on all the meta-implications of this post; suffice to say I’m fully aware of them.)

This article is depressing, but not entirely surprising. From my limited teaching experience during my PhD, a lot of economists avoid learning about econometrics like the plague during their degrees on account of the math content.

The great ER Tufte starts his classic book ‘The Visual Display of Quantitative Information’ by showing that graphics can be more revealing than conventional statistical computations. This seems to bear him out.

@cwhope

Hahahah. Numbers are arbitrary. The interpretation is key.

I wouldn’t read too much into this study. I do agree that we should be skeptical about how strongly the results we see in journals follow from the data, but this is a different issue.

When I see these kinds of studies, I always ask myself how the effort necessary to carry out the computation compares with the incentive to produce the correct result. The effort in the first case is pretty small. I read the question, thought it was a cute problem, looked at the graph for about 20 seconds, and guessed X=55. Not perfect, but close enough for rock n’roll. When I read the second question, I just thought, “This is actual work. I am not going to expend the effort necessary to do this. I have better things to do with my time.” The third one requires combining the efforts of the other two, which is even more work.

If we tied the salary or grant funding of the economists to their response on this question, I bet you would see many more correct answers.

Dilbert: Presentation

I didn’t have any accurate numbers so I just made up this one. Studies have shown that accurate numbers aren’t any more useful than the ones you make up.

Audience member: How many studies show that?

Dilbert: 87

Interesting, but not at all surprising to someone who studied Biology at University. Biologists don’t need to predict where in a field a specific species of plant will be found, they just need to know what the overall field will look like. Mathematical models are always inaccurate; observation on the other hand, represents reality.

It’s like the difference between analog and digital audio. The digital version actually cuts out slices of what is real, and then pastes them into a stream that is a series of steps, where the accuracy is determined by the sampling rate (how many slices per second) and the bit rate (how much information is saved in each slice). The analog audio captures and plays back everything, and a good ear can tell the difference.

This study shows that if you rely too much on maths, you are going to fall over. That’s what LTCM and quite a few others have done in the past, and will in the future too. But get a Biologist in and you’ll find someone more used to measuring reality than an ivory tower assumption of what reality is.

I don’t understand your takeaway here Felix. I’m with DoctorChris and CRamakrishnan: this result shows how useful graphs are. That graph, with the standard error written below it, is perfectly clear – at X = 10, something like half the Y values are below the axis, and you just can’t give that as the answer unless you’ve been drinking heavily, as it seems 3% of economists were.

I’m surprised that 72% of trained economists got it wrong with just the numbers, but as an untrained non-economist I’m sure I would have been among them. The lesson is not about repeatability or overconfidence, it’s about data presentation. If you want to convey your findings in an intelligible way, pages of regression statistics at the back of the paper are worse than useless. Intelligently chosen graphs work to a two-sigma certainty.
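The point about X = 10 can be checked with a short Python sketch. It assumes the fitted line and residual spread implied by the Excel formula quoted later in the thread (intercept 0.32, slope 1.001, residual standard deviation 29), treats the noise as normal, and ignores uncertainty in the coefficients. The function name `prob_positive` is made up for illustration.

```python
from statistics import NormalDist

# ASSUMED model, read off the Excel formula quoted later in the comments:
# Y = 0.32 + 1.001 * X + noise, with noise standard deviation 29.
INTERCEPT = 0.32
SLOPE = 1.001
RESIDUAL_SD = 29.0


def prob_positive(x: float) -> float:
    """Probability that a new dot placed at x lands above the axis (Y > 0)."""
    mean_y = INTERCEPT + SLOPE * x
    # Y ~ Normal(mean_y, RESIDUAL_SD), so P(Y > 0) = 1 - CDF(0).
    return 1.0 - NormalDist(mu=mean_y, sigma=RESIDUAL_SD).cdf(0.0)


for x in (10, 47, 100):
    print(f"X = {x:>3}: P(Y > 0) = {prob_positive(x):.3f}")
```

Under these assumptions, roughly a third of the dots at X = 10 land below the axis. That is fewer than half, but still far too many for a 95% guarantee.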

@MattL: perhaps you didn’t read closely enough, but the article actually says that of economists given *both* the graph and the equations, 61% got it wrong! It wasn’t simply about data presentation; they had a strict superset of information, but the extra information led them astray.

I’ll go against the grain and say that this story isn’t a big deal, largely because I think it misrepresents what economists are typically doing. That is, the authors are testing how well economists can measure *something that economists never care about.*

When economists analyze data with the goal of identifying relationships between X and Y, they’re typically interested in getting the best estimate of that particular effect, while controlling for all of the factors that may introduce noise into that relationship. That’s not what the authors of this study are looking at, though.

Consider the following question that is likely to interest both economists and policymakers:

What is the effect of education on net worth? (you could also think of this as the effect of education on income, but I chose net worth because, like the data in the graph, it is often negative).

So in other words, you’ve got Net worth = Y, Education (say, measured in years of schooling) = X and you’re estimating the equation over a sample of individuals:

Y = C + X

You can think of this as saying that everyone has some baseline level of net worth (estimated as C), which may vary based on the years of education (X). When you perform this regression, you’re estimating both C (the baseline level of net worth) and X (the effect of an additional year of education on net worth).

What economists typically care about is getting a precise estimate of X – the effect of education. If they can reliably say that X is positive, it supports the idea that additional schooling has value. Depending on the magnitude of that effect vs. the cost of an additional year of education, you can further advocate for more or less schooling.

What economists wouldn’t care about in this setting is the estimate of C – the baseline level of net worth. That’s because C is fixed – it’s the condition that people are born into.

The “problem” that Felix is talking about stems from the fact that the authors are testing economists based on both X (which economists care about) and C (which economists don’t care about). It’s not surprising that economists are thrown off – the effect here is caused by the variation in C.

Here’s what the authors are doing, translated to the education analogy:

They’re asking, “At how many years of education will a randomly-chosen individual have a 95% chance of having a positive net worth?”

But that’s never what economists are asking. Why? Because that value depends both upon the distribution of C (the baseline level) and the precision/distribution of X (the relationship between education and net worth).

What economists are *really* asking is, “What is the range of education effects that you can say captures the ‘true’ effect with 95% certainty”. The reason they ask that is because they want to know:

- What is the estimated effect of X on Y (i.e., the effect of education on net worth)?

- What is the possible magnitude of that effect?

So a typical economist would report a result like this:

We estimate that an additional year of schooling increases net worth by approximately $100,100 (the value from the example Felix gave).

Moreover, based on the standard error, we estimate a 95% confidence interval of between $93,632 and $106,568 (estimated effect +/- 1.96* standard error).

which is exactly the kind of thing they’d infer from the data presented.

And the typical economist wouldn’t care at all about the question, “At what level of schooling is a random individual from the general population 95% likely to have a positive net worth?”. That question:

- is clouded by the variation in baseline net worth (C)

- doesn’t inform the policy debate about whether schooling adds value
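The interval quoted in the comment above can be reproduced in a few lines of Python. The standard error of 3,300 is not stated anywhere in the thread; it is backed out from the quoted bounds, so treat it as an assumption. `confidence_interval` is a hypothetical helper name.

```python
from statistics import NormalDist


def confidence_interval(estimate: float, std_error: float, level: float = 0.95):
    """Two-sided normal confidence interval: estimate +/- z * standard error."""
    # For level = 0.95 this is the familiar 1.96.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


# Estimate from the comment; the standard error is an ASSUMPTION inferred
# from the quoted bounds of $93,632 and $106,568.
low, high = confidence_interval(estimate=100_100, std_error=3_300)
print(f"95% CI for the effect: ${low:,.0f} to ${high:,.0f}")
```

Note that this interval describes uncertainty about the slope, not about where any individual dot will land.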

I could not understand the post because I could not understand the problem, which was trivial. But when I looked at the paper, I realized it is because you mis-stated it. You wrote:

So here’s the question: at what value of X are you going to have a 95% chance of getting a dot above the axis, in positive territory on the Y axis?

There are many values of X where you have a 95% chance of getting a positive value of Y. Nobody would err, if they picked 100, for example.

But what the authors asked was pick the minimum value of X that yields a 95% chance for a positive value of Y.

If you are going to start ragging on economists at least get the problem stated correctly. What does that say about journalists? Cannot even read papers.

Fresnodan (05:17am) beat me to the Dilbert quote, everything you need to know right there. (btw, good post salmon)

For a moment I panicked, I really thought I was stupid for picking, say, X=100. But then reading the paper I realized Felix’ post has a mistake. It should not read:

“So here’s the question: at what value of X are you going to have a 95% chance of getting a dot above the axis, in positive territory on the Y axis?”

but instead

“So here’s the question: what is the MINIMUM value of X for which you are going to have a 95% chance of getting a dot above the axis, in positive territory on the Y axis?”

Phew.

Some of the comments here are an excellent illustration of the paper’s claims.

Not meaning to pick on these ones in particular, but those wanting the question to have been the “minimum value of X” instead of the “value of X”: if you pick X=100, there is a much higher than 95% chance that Y will be above the axis. The question is clearly implying that you should aim for exactly a 95% chance – the rationale for this is explained very well in the econometric version of the problem, but it shouldn’t have to be stated.

Cf. Tversky and Kahneman, “Belief in the Law of Small Numbers”, also Thinking, Fast and Slow generally.

In Excel:

=(TINV(0.1,999)*29-0.32)/1.001

47.38

What do I win?
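The Excel formula above translates to Python roughly as follows. This sketch swaps the t quantile for a normal one (the standard library has no t distribution), so it gives a slightly smaller number than Excel does. The model values are read off the formula itself, and `min_x_for_probability` is a hypothetical name.

```python
from statistics import NormalDist

# Values read off the Excel formula: intercept 0.32, slope 1.001, and a
# residual standard deviation of 29. Coefficient uncertainty is ignored.
INTERCEPT = 0.32
SLOPE = 1.001
RESIDUAL_SD = 29.0


def min_x_for_probability(p: float) -> float:
    """Smallest x at which a new dot has probability p of landing above zero."""
    # One-sided critical value; Excel's TINV(0.1, 999) is the two-tailed
    # inverse at 10%, which is the t-distribution counterpart of this number.
    z = NormalDist().inv_cdf(p)
    # Solve INTERCEPT + SLOPE * x - z * RESIDUAL_SD = 0 for x.
    return (z * RESIDUAL_SD - INTERCEPT) / SLOPE


print(f"Minimum X for a 95% chance of Y > 0: {min_x_for_probability(0.95):.2f}")
```

With the normal approximation the result comes out a little above 47.3, against Excel's 47.38 with 999 degrees of freedom.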

Bwickes and Nic22: the words “a 95% chance” do not mean “a 95% or greater chance.” You are inferring “or greater” because typically a researcher is looking for statistical significance, defined at 95% or greater. But in this case, we are trying to find the point at which 95% of future values will be above zero, and the written description advises that the desired answer is exactly the 95% point because “increasing X is costly.”

Yes, intelligibility could have been improved by inserting the word “precisely.” But your reading is not the correct reading; it’s just an understandable mistake.

Blox, as soon as one mentions that increasing X is costly, then you are totally right. However I was stuck on Felix’ first two paragraphs, the ones I am quoting, where there is no mention of a cost of increasing X. That’s why I freaked out when I read in the second paragraph that the correct answer was 47. My ego took a big hit… turns out I should have gone straight to the statement from the regression analysis which was much clearer!

@Nic22 I had no problems understanding it was a minimum, the wording of the question was clear enough. If you needed more detail, then it may be that you’d be in the group that preferred the detail to an overview. I always go for the overview first; details are for later.

“Next time you see a piece of pop-science talking about interesting findings from some paper or other, bear this in mind”

Quite ironic in that this statement seems to be applicable to the very article in question.

Is there a link to the original paper? I want to see the exact wording of the question. I am willing to bet many of the “mistakes” came from poor wording (e.g. the economist who said “10” might have actually misinterpreted the question as “what value of X will ensure that the probability of the MEAN will be above 0”).

@RuetheDay

not quite: t distribution should have 998 degrees of freedom, and you need to include the uncertainty associated with predicting the mean of the distribution (which would be standard error times square root of 1/n + (x*-mean)^2/variance of x * something… can’t remember). shouldn’t make much of a difference.
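For reference, the half-remembered correction is the standard textbook prediction interval for simple linear regression. The symbols here are introduced for this note and are not taken from the paper: $s$ is the residual standard error, $n$ the sample size, $\bar{x}$ the sample mean of X, $x^*$ the chosen value, and $\hat{C}$, $\hat{\beta}$ the fitted intercept and slope.

```latex
\hat{y}(x^*) = \hat{C} + \hat{\beta}\,x^*,
\qquad
\operatorname{se}_{\text{pred}}(x^*)
  = s\,\sqrt{\,1 + \frac{1}{n}
      + \frac{(x^* - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,}

% The minimum x^* with a 95% chance of Y > 0 solves
\hat{y}(x^*) - t_{0.95,\;n-2}\,\operatorname{se}_{\text{pred}}(x^*) = 0
```

With 998 degrees of freedom (so $n = 1000$), the last two terms under the root are tiny next to the 1 as long as $x^*$ is not far outside the observed range, which is why the correction barely moves the answer.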

@sisyphus, here’s the actual question:

What would be the minimum value of X that an individual would need to make sure that s/he obtains a positive outcome (Y > 0) with 95% probability?

thanks Felix – is there a link to the paper?

it would be very interesting to see the distribution of answers for each question (e.g. if the distribution of answers is clearly bimodal, then it might indicate a misinterpretation of the question)

To me, this underscores the importance of presenting results (to the public and to oneself) in a manner that reflects the underlying assumptions of the model and the underlying point of the model’s results. It takes time, and cognitive resources, to go from reading someone’s results to understanding patterns in someone’s data. The process can be made more efficient by using graphs or even having a mental checklist – did I think about what this effect size means? Did I think about variance in the model? The quantified error? In the end, though, the process simply shouldn’t be rushed, and if you do rush, you should have a certain skepticism about the quality of your interpretations.

This is similar to what Steven Levitt talked about in his book “Freakonomics”, where criminologists thought that better policing and an increased number of officers were the main contributors to the fall in the crime rate, but they didn’t look deeper and figure out that it was because of the legalization of abortion.

These results do not surprise me when you see a lot of the posts professional economists make online. They are as a whole bright, but not exactly world beaters.

Now you know why better engineers look down on liberal arts types. We want facts, not guesses. Statistics is the core of economic data for decisions, but they forgot it.

I worked with engineers who got rid of their text books long ago and wanted to get by with a smile. Lots of them said “You cannot do that” (calculate the results).

Of course, when money is involved statistics can be rigged.

Hello,

We are Emre and Robin, the authors of the study featured in this post.

We would like to thank Mr. Salmon for his insightful discussion of the study and all the commentators for their remarks.

This study is now a discussion paper in the International Journal of Forecasting. Hence, we would like to contribute to the ongoing discussion by posting links to the comments made on the study and our reply to those comments.

Due to copyright issues, we cannot share freely the journal versions, but can put links to the last working papers.

Here is the study:

http://emresoyer.com/Publications_files/Soyer%20%26%20Hogarth_2012.pdf

Here is the comment paper by Scott Armstrong:

https://marketing.wharton.upenn.edu/files/?whdmsaction=public:main.file&fileID=1929

Here is the comment paper by Stephen Ziliak:

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2104279

Here is the comment paper by Nassim Taleb and Daniel Goldstein:

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1941792

Here is the comment paper by Keith Ord:

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2016195

Here is our reply to the comments:

http://emresoyer.com/Publications_files/Response_by_Soyer_Hogarth_2012.pdf

Thank you once more for the engaging discussion.

Best wishes,

Emre and Robin

Generalizing this study to doctors and/or other social scientists is bad social science. In my opinion, the issue isn’t just the misuse of statistics but the misuse of the scientific method.

I’ve just had a look at the paper, and it does not seem to me that 47 is the correct answer, based on the statistics presented in the paper. The reason the authors obtain 47 is that they use 1.645 as the z value for the 95% confidence interval. However, the correct value is 1.96 (cf. http://en.wikipedia.org/wiki/Standard_deviation#Rules_for_normally_distributed_data ). This means that the correct value for X would be 56.

Unless there’s something I’ve missed?

@Scartaris: Since the goal is to get the dot above the x-axis, they are using a one-sided interval. The 95th percentile is at 1.645, so that’s the critical value in this case. (For a two-sided interval 1.645 corresponds to a 90% confidence interval. That makes sense: 90% removes 5% from each tail, and so the upper bound is at the 95th percentile.)

Ah, I get it now, thanks!
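The one-sided versus two-sided exchange above can be checked numerically. This is a small Python sketch using only the standard library. The helper names are hypothetical, and the model values in `threshold_x` are the ones assumed from the Excel formula earlier in the thread.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# One-sided: 95% of the distribution lies below this point.
one_sided_95 = std_normal.inv_cdf(0.95)
# Two-sided: 95% lies between -z and +z, leaving 2.5% in each tail.
two_sided_95 = std_normal.inv_cdf(0.975)


def central_coverage(z: float) -> float:
    """Share of a standard normal lying between -z and +z."""
    return std_normal.cdf(z) - std_normal.cdf(-z)


def threshold_x(z: float, intercept: float = 0.32, slope: float = 1.001,
                residual_sd: float = 29.0) -> float:
    """X at which the fitted line sits z residual SDs above zero (ASSUMED model)."""
    return (z * residual_sd - intercept) / slope


print(f"one-sided 95% critical value: {one_sided_95:.3f}")
print(f"two-sided 95% critical value: {two_sided_95:.3f}")
print(f"central coverage at the one-sided value: {central_coverage(one_sided_95):.2f}")
print(f"threshold X, one-sided: {threshold_x(one_sided_95):.1f}")
print(f"threshold X, two-sided: {threshold_x(two_sided_95):.1f}")
```

The two thresholds land close to the 47 and 56 discussed above. Since only dots below the axis count as failures, the one-sided value is the one that matches the question.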