Wednesday, March 7, 2012

About the cult of statistical significance

A large part of economic research is devoted to empirical studies, and the name of the game there is statistical significance. Once an interesting effect turns out to be significant, it becomes a study worth writing up, and possibly publishing. If you cannot find an effect, you try another specification until one is statistically significant. Nobody will know how many you tried; the one that is significant is all that counts. And if you cannot find a statistically significant result, that is, if you have a null result, publishing it becomes really difficult. This game of finding statistical significance is unfortunately misleading, as the hunt dominates theory or even common sense when choosing specifications, and often completely neglects economic significance. What if a statistically significant result is tiny even though it is precisely estimated? And what about a large effect that is statistically weak?
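To see how much the specification hunt distorts things, here is a minimal simulation sketch. It rests on assumptions of my own, not on any particular study: the outcome is pure noise, each "specification" is a simple regression on a fresh, unrelated regressor, and the 5% critical value uses the normal approximation of 1.96. All function names are hypothetical.

```python
import math
import random


def familywise_rate(alpha, k):
    """Chance that at least one of k independent tests rejects a true null."""
    return 1.0 - (1.0 - alpha) ** k


def slope_t_stat(x, y):
    """t-statistic of the slope in a simple regression of y on x.

    Uses the identity t = r * sqrt((n - 2) / (1 - r^2)), where r is the
    sample correlation between x and y.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def search_finds_effect(rng, n_obs, n_specs, critical=1.96):
    """Try n_specs unrelated regressors; stop at the first 'significant' one."""
    y = [rng.gauss(0, 1) for _ in range(n_obs)]  # outcome is pure noise
    for _ in range(n_specs):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]  # regressor unrelated to y
        if abs(slope_t_stat(x, y)) > critical:
            return True
    return False


def simulated_rate(n_sims=1000, n_obs=50, n_specs=20, seed=0):
    """Share of simulated 'studies' that end up reporting a significant effect."""
    rng = random.Random(seed)
    hits = sum(search_finds_effect(rng, n_obs, n_specs) for _ in range(n_sims))
    return hits / n_sims


if __name__ == "__main__":
    print("analytic, 20 tries:", round(familywise_rate(0.05, 20), 3))  # about 0.64
    print("simulated, 20 tries:", simulated_rate())
```

With twenty tries and no true effect anywhere, roughly two searches out of three still come back with something "significant" at the 5% level. The simulated share runs somewhat above the analytic one because 1.96 is slightly too lenient with only 50 observations.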

I am certainly guilty as well of confusing statistical and economic significance, including on this blog. Indeed, it is often difficult even to understand what the size of the effect is, because the specification does not allow one to relate the effect to something tangible.
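The two kinds of significance are easy to tell apart once the estimate is reported with a confidence interval in tangible units. The sketch below uses made-up numbers (both estimates and standard errors are purely hypothetical) and a normal approximation; think of the coefficients as changes in the outcome measured in percentage points.

```python
from statistics import NormalDist


def summarize(estimate, std_error, level=0.95):
    """Return the t-ratio, two-sided p-value and confidence interval."""
    nd = NormalDist()
    z = nd.inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    t = estimate / std_error
    p = 2 * (1 - nd.cdf(abs(t)))
    return {"t": t, "p": p, "ci": (estimate - z * std_error, estimate + z * std_error)}


# Hypothetical case 1: precisely estimated but economically negligible.
precise_but_tiny = summarize(estimate=0.002, std_error=0.0004)

# Hypothetical case 2: potentially large but imprecisely estimated.
large_but_noisy = summarize(estimate=0.30, std_error=0.20)

if __name__ == "__main__":
    for name, res in [("tiny", precise_but_tiny), ("noisy", large_but_noisy)]:
        lo, hi = res["ci"]
        print(f"{name}: t={res['t']:.2f}, p={res['p']:.4f}, CI=({lo:.4f}, {hi:.4f})")
```

The first estimate passes any significance test, yet its whole interval sits near zero. The second fails the test, yet its interval includes effects that would matter a great deal. A table of stars hides both facts.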

The reason I am mentioning all this is that I came across a recent paper by Walter Krämer, who tries to revise the argument made by Stephen Ziliak and Deirdre McCloskey that any statistical significance is useless. While he seems to concur on the general abuse of statistical significance, he claims it can still be useful under some circumstances, namely when you do exploratory testing, as it allows one to discard unviable hypotheses or specifications. But one has to remember something that is elementary statistics yet so often ignored: one can only reject or fail to reject a hypothesis, never accept it. So even exploratory testing has its limitations.
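Why failing to reject is not the same as accepting becomes clear once you look at the power of the test. Here is a small sketch, assuming a two-sided z-test on a single coefficient with a known standard error; the function name and the numbers are hypothetical.

```python
from statistics import NormalDist


def power_two_sided_z(true_effect, std_error, alpha=0.05):
    """Probability of rejecting H0: effect = 0 when the effect is true_effect."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = true_effect / std_error  # how far the z-statistic is centered from 0
    # Reject when the z-statistic lands in either tail.
    return (1 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)


if __name__ == "__main__":
    # A true effect equal to one standard error is detected only rarely.
    print(round(power_two_sided_z(1.0, 1.0), 2))  # about 0.17
    # It takes an effect of about 2.8 standard errors to reach 80% power.
    print(round(power_two_sided_z(2.8, 1.0), 2))  # about 0.80
```

If the effect you care about is only one standard error away from zero, the test misses it more than four times out of five, so a non-rejection tells you very little about whether the effect is there.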

1 comment:

Carlos Cinelli said...

To be fair, you actually could say that you accept a hypothesis if you had put your problem in a real decision theory framework (which Neyman-Pearson theory is, with a "0-1" loss function).

So, if you had studied your problem, thinking about the power of the test and about the seriousness of both kinds of errors, you could say: "hey, with the evidence we have, considering the power and loss function, it's ok to accept that this coefficient is zero".

But I'm still looking for an applied econometric paper that has even sketched something like this.