Fake data have been in the news lately.
First there was Michael LaCour, the grad student who claimed to have done a study last year on political persuasion and attitudes toward gay rights. No voters were harmed in this nonexistent survey, but LaCour was successful in persuading leading researchers in political science and a top scientific journal that he’d made a great discovery. Only months later was the fabrication discovered (see the report at Retraction Watch), leading to a major scandal in political science.
Then came John Bohannon, a scientist/journalist who did a real study (a diet-and-health experiment in which he randomly assigned participants to diets, feeding some of them chocolate) but followed it up with an intentionally bad analysis of the sort parodied by Randall Munroe, trying all sorts of manipulations on his data until he succeeded in finding “statistical significance.”
Bohannon’s goal was to demonstrate the credulity of news outlets by performing this ridiculous study, getting it published somewhere (not a problem given the huge number of scientific journals out there), and then seeing if it got media attention. Which it did.
Some people pointed out that Bohannon’s story had a bit of selection bias: the journalists who fell for his hoax wrote about it uncritically, while those with more skepticism did not bother to cover it. So the appearance of his article in some news outlets does not mean that all, or most, or even many science reporters are fools.
True enough, but this is one of the problems of statistics reporting: There’s a bias toward uncritical, gee-whiz, press-release-replicating stories because the skeptics don’t want to waste their time on the bad stuff.
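The trick Bohannon exploited is easy to see in a short simulation (this is an illustration of the general multiple-comparisons problem, not his actual data or analysis). Suppose a study measures 20 outcomes, every one of which is pure noise, and reports whichever comparison comes out “significant.” The numbers below (20 outcomes, 50 people per group, 1000 simulated studies) are arbitrary choices for the sketch:

```python
import math
import random

random.seed(42)

def p_value_two_sample(x, y):
    """Two-sided p-value for a difference in means, using a normal
    approximation to the t statistic (fine at these sample sizes)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    z = (mx - my) / math.sqrt(vx / nx + vy / ny)
    # Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

n_studies = 1000   # simulated "studies"
n_outcomes = 20    # outcomes measured per study, all pure noise
n_per_group = 50

hits = 0
for _ in range(n_studies):
    # Treatment and control differ in nothing; every outcome is noise.
    p_values = []
    for _ in range(n_outcomes):
        treatment = [random.gauss(0, 1) for _ in range(n_per_group)]
        control = [random.gauss(0, 1) for _ in range(n_per_group)]
        p_values.append(p_value_two_sample(treatment, control))
    # A motivated analyst reports the best-looking comparison.
    if min(p_values) < 0.05:
        hits += 1

print(f"Studies with at least one 'significant' result: {hits / n_studies:.0%}")
```

With 20 independent null comparisons, the chance of at least one p < .05 is about 1 − 0.95²⁰ ≈ 64%, so roughly two-thirds of these noise-only studies come back with a publishable-looking finding.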
Finally, New York Times columnist David Brooks, who loves statistics so much he doesn’t bother checking their accuracy, was again challenged, this time by David Zweig in Salon magazine. Brooks is somewhat unusual as a journalist in that sometimes he seems to go to the trouble of distorting numbers himself rather than merely reporting dodgy press releases, but I see his attitude toward facts as part of the same big picture: authoritative-seeming but false claims, coming from major newspapers, scientific journals, and other sources that we would like to trust.
Some would say that all the above stories show that the system functions well. The exposure of Bohannon’s hoax received much more news coverage than his original joke study about chocolate and weight loss, and nobody takes David Brooks seriously anyway.
Here’s what my colleague Gur Huberman wrote regarding the LaCour story, the one that faked its science about changing attitudes toward gay rights:
The system actually worked. First, there’s a peer-reviewed report in Science. Then other people deem the results (or, rather, the methodology) sufficiently powerful that they imitate it to answer a related question of their own. These other people’s apparently same method fails to produce a similar response rate. These other people inform the senior author of the original study that their response rate was far lower than the one he had reported in Science. The senior author requests an explanation from his partner who actually was in touch with the data collecting firm and was in possession of the raw data. The senior author fails to receive an adequate and timely explanation from his partner. The senior author requests that Science retract the article. Only a few months elapsed between publication and retraction.
My first reaction was: Hey, to say “the system worked” here is like saying that if someone robs a bank, then gets caught six months later, then the bank security system worked. No, it didn’t!
But then I thought more, and it’s not so clear. I don’t think “the system worked.” But the story is a bit more complicated.
It goes like this:
The first point is the rarity of the offense. It’s been nearly 20 years since the last time there was a high-profile report of a social science survey that turned out to be undocumented. I’m referring to the case of John Lott, who said he did a survey on gun use in 1997, but, in the words of Wikipedia, “was unable to produce the data, or any records showing that the survey had been undertaken.” Lott, like LaCour nearly two decades later, mounted an aggressive, if not particularly convincing, defense. (Lott disputes the Wikipedia characterization of his survey.)
We don’t often hear about outright fraud in the social sciences. And so, if the only concern were faked data, I’d agree with Huberman that the system is working: only two high-profile cases that we know about in 20 years, and both fakers were caught. Sure, there’s selection here: there must be other fraudulent surveys that have never been detected. But it’s hard for me to imagine there are a lot of these out there. So, fine, all’s OK.
But faked data and other forms of outright fraud are not the only, or even the most important, concern here. The real problem is all the shoddy studies that nobody would ever think to retract, because the researchers aren’t violating any rules, they’re just doing useless work. I’m thinking of studies of ESP, and beauty-and-sex ratios, and ovulation-and-voting, and himmicanes and hurricanes, and all sorts of other studies that received widespread and often uncritical press coverage.
This is what happens in all these cases:
1. An exciting, counterintuitive claim is published in a good journal, sometimes a top journal, supported by what appears to be strong statistical evidence (in statistics jargon, one or more “p < .05” comparisons).
2. The finding is publicized.
3. Skeptics note problems with the study.
4. The authors dig in and refuse to admit anything was wrong.
The result is a mess: there’s not much reason to trust the social science research that appears in top scientific journals or that gets featured by The New York Times or NPR or the British Psychological Society or other generally respected outlets. And that’s a problem.
I think one reason for all the attention received by LaCour’s study (or, I should say, non-study) was that it’s the most extreme case of a general problem of claims being published without real evidence. Those ovulation-and-clothing researchers and the fat-arms-and-voting researchers didn’t make up their data—but they were making strong claims without good evidence, even while thinking they had good evidence. That’s the way of a lot of today’s published science, and publicized science. And, to the extent that policymakers use science to make political judgments and allocate resources, this is a problem.
Andrew Gelman and Kaiser Fung are statisticians who deal with uncertainty every working day. In Statbusters they critically evaluate data-based claims in the news, and they usually find that the real story is more interesting than the hype.