John Ioannidis wrote an article in Chance magazine a couple of years ago with the provocative title "Why Most Published Research Findings Are False." Are published results really that bad? If so, what's going wrong?
Whether most published results are false depends on context, but a large percentage of published results are indeed false. Ioannidis published a report in JAMA examining some of the most highly cited studies from the most prestigious journals. Of the studies he considered, 32% were found to have either incorrect or exaggerated results. Of those studies with a p-value near 0.05, 74% were incorrect.
The underlying causes of the high false-positive rate are subtle, but one problem is the pervasive use of p-values as measures of evidence.
Folklore has it that a p-value is the probability that a study's conclusion is wrong, so a p-value of 0.05 would mean the researcher should be 95% sure the results are correct. In this case, folklore is absolutely wrong. And yet most journals accept a p-value of 0.05 or smaller as sufficient evidence.
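In symbols (a textbook formulation, not anything specific to the papers above), a p-value is

$$ p = P(\text{data at least this extreme} \mid \text{null hypothesis is true}), $$

while the folklore interpretation reads it as

$$ P(\text{null hypothesis is true} \mid \text{data}). $$

These are different conditional probabilities, and Bayes' theorem connects them only through the prior plausibility of the hypothesis, which the p-value says nothing about.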
Here's an example that shows how p-values can be misleading. Suppose you have 1,000 totally ineffective drugs to test. By chance, about 1 trial out of every 20 will produce a p-value of 0.05 or smaller, so roughly 50 of the 1,000 trials will show a significant result, and only those studies will publish their results. Since none of the drugs actually works, every one of those published findings is a false positive: the error rate in the lab was indeed 5%, but the error rate in the literature coming out of the lab is 100%!
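A quick simulation makes the arithmetic concrete. The sketch below is purely illustrative (the trial sizes and the choice of a two-sample t-test are my assumptions, not anything from the JAMA study): it tests 1,000 drugs that have no effect at all and counts how many clear the 0.05 bar anyway.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n_drugs = 1_000     # all completely ineffective
n_patients = 50     # per arm; an arbitrary choice for illustration
alpha = 0.05

significant = 0
for _ in range(n_drugs):
    # Treatment and control come from the same distribution:
    # the drug truly does nothing.
    treatment = rng.normal(loc=0.0, scale=1.0, size=n_patients)
    control = rng.normal(loc=0.0, scale=1.0, size=n_patients)
    _, p_value = stats.ttest_ind(treatment, control)
    if p_value <= alpha:
        significant += 1

print(f"{significant} of {n_drugs} ineffective drugs reached p <= {alpha}")
print("Every one of those 'findings' is a false positive.")
```

Run it a few times and the count hovers around 50, exactly the 5% false-positive rate the threshold promises. The trouble is that those 50 are the only results anyone sees.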
The example above is exaggerated, but look at the JAMA study results again. In a sample of real medical experiments, 32% of those with significant results were wrong. And among those that just barely showed significance, 74% were wrong.
See Jim Berger’s criticisms of p-values for more technical depth.