Friday 26 July 2013

Why we need pre-registration


There has been a chorus of disapproval this week at the suggestion that researchers should 'pre-register' their studies with journals and spell out in advance the methods and analyses that they plan to conduct. Those who wish to follow the debate should look at this critique by Sophie Scott, with associated comments, and the responses to it collated here by Pete Etchells. They should also read the explanation of the pre-registration proposals and FAQ by Chris Chambers - something that many participants in the debate appear not to have done.

Quite simply, pre-registration is designed to tackle two problems in scientific publishing:
  • Bias against publication of null results
  • A failure to distinguish hypothesis-generating (exploratory) from hypothesis-testing analyses
Either of these alone is bad for science: the combined effect of both is catastrophic, and has led to a situation where research is failing to do its job of providing credible answers to scientific questions.

Null results

Let's start with the bias against null results. Much has been written about this, including by me. But the heavy guns in the argument have been wielded by Ben Goldacre, who has pointed out that, in the clinical trials field, if we only see the positive findings, then we get a completely distorted view of what works, and as a result, people may die. In my field of psychology, the stakes are not normally as high, but the fact remains that there can be massive distortion in our perception of evidence.

Pre-registration would fix this by guaranteeing publication of a paper regardless of how the results turn out. In fact, the null result problem could be fixed in another, less bureaucratic way: by having reviewers decide on a paper's publishability solely on the basis of the introduction and methods. But that would not fix the second problem.

Blurring the boundaries between exploratory and hypothesis-testing analyses

A big problem is that nearly all data analysis is presented as if it is hypothesis-testing when in fact much of it is exploratory.

In an exploratory analysis, you take a dataset and look at it flexibly to see what's there. Like many scientists, I love exploratory analyses, because you don't know what you will find, and it can be important and exciting. I suspect it is also something that you get better at as you get more experienced, and more able to see the possibilities in the numbers. But my love of exploratory analyses is coupled with a nervousness. With an exploratory analysis, whatever you find, you can never be sure it wasn't just a chance result. Perhaps I was lucky in having this brought home to me early in my career, when I had an alphabetically ordered list of stroke patients I was planning to study, and I happened to notice that those with names in the first half of the alphabet had left hemisphere lesions and those with names in the second half had right hemisphere lesions. I even did a chi-square test and found it was highly significant. Clearly this was nonsense, just one of those spurious things that can turn up by chance.

These days it is easy to see how often meaningless 'significant' results occur by running analyses on simulated data - see this blogpost for instance. In my view, all statistics classes should include such exercises.
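For instance, here is a minimal sketch of such an exercise (my own illustration in Python, not the script behind the blogpost linked above): it compares two groups of pure random numbers a thousand times and counts how often an ordinary t-test comes out 'significant' at p < .05.

```python
# A minimal sketch: run 1000 'experiments' comparing two groups of pure
# random numbers and count how often a t-test reaches p < .05, even though
# there is no real effect anywhere.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_experiments, n_per_group = 1000, 20
false_positives = 0

for _ in range(n_experiments):
    group_a = rng.standard_normal(n_per_group)  # pure noise
    group_b = rng.standard_normal(n_per_group)  # pure noise
    t, p = stats.ttest_ind(group_a, group_b)
    if p < .05:
        false_positives += 1

print(f"{false_positives} of {n_experiments} null experiments gave p < .05")
```

With the conventional .05 threshold, around 50 of the 1000 null 'experiments' will look significant - exactly the rate you would expect by chance.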

So you've done your exploratory analysis and got an exciting finding, but you are nervous as to whether it is real. What do you do? The answer is you need a confirmatory study. In the field of genetics, failure to realise this led to several years of stasis, cogently described by Flint et al (2010). Genetics really highlights the problem, because of the huge number of possible analyses that can be conducted. What was quickly learned was that most exciting effects don't replicate. The bar has accordingly been set much higher, and most genetics journals won't consider publishing a genetic association unless replication has been demonstrated (Munafo & Flint, 2011). This is tough, but it has meant that we can now place confidence in genetics results. (It has also had the positive side-effect of encouraging more collaboration between research groups.) Unfortunately, many of those outside the field of genetics are unaware of these developments, and we are seeing increasing numbers of genetic association studies being published in the neuroscience literature with tiny samples and no replication.

The important point to grasp is that the meaning of a p-value is completely different if it emerges when testing an a priori prediction, compared with when it is found in the course of conducting numerous analyses of a dataset. Here, for instance, are outputs from 15 runs of a 4-way ANOVA on random data, as described here:
[Figure: each row shows the p-values (main effects, then interactions) from one run of a 4-way ANOVA on a new set of random data; a slightly more legible version is linked here.]

If I approached a dataset specifically testing the hypothesis that there would be an interaction between group and task, then the chance of a p-value of .05 or less would be 1 in 20 (as can be confirmed by repeating the simulation thousands of times; with only a small number of runs it's less easy to see). But if I just look for significant findings, it's not hard to find something in most of these runs. An exploratory analysis is not without value, but its value lies in generating hypotheses that can then be tested in an a priori design.
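To make the contrast concrete, here is a rough sketch of that kind of simulation in Python (my own illustration, not the original script; the factor names, cell sizes and use of statsmodels are all my assumptions). Each run fits a 2x2x2x2 between-subjects ANOVA to pure noise, records the p-value for the one pre-specified test (the group x task interaction), and counts how many of the 15 effects in the table come out 'significant'.

```python
# A rough sketch: 15 runs of a 4-way ANOVA on random data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(1)
n_runs, n_per_cell = 15, 5

# Fully crossed 2x2x2x2 between-subjects design; factor names are invented.
cells = pd.DataFrame(
    [(a, b, g, t)
     for a in ("a1", "a2") for b in ("b1", "b2")
     for g in ("g1", "g2") for t in ("t1", "t2")],
    columns=["factorA", "factorB", "group", "task"])

runs_with_any_hit = 0
for run in range(n_runs):
    data = cells.loc[cells.index.repeat(n_per_cell)].reset_index(drop=True)
    data["y"] = rng.standard_normal(len(data))   # pure noise, no real effects
    model = smf.ols("y ~ factorA * factorB * group * task", data=data).fit()
    pvals = anova_lm(model, typ=2)["PR(>F)"].dropna()   # 15 effects per run
    n_sig = (pvals < .05).sum()                  # 'significant' effects anywhere
    gxt = pvals["group:task"]                    # the one pre-specified test
    runs_with_any_hit += n_sig > 0
    print(f"run {run + 1:2d}: {n_sig} 'significant' effects; "
          f"group x task p = {gxt:.3f}")

print(f"{runs_with_any_hit}/{n_runs} runs had at least one p < .05 in the table")
```

Over thousands of such runs, the pre-specified group x task interaction is 'significant' on about 5% of them, whereas roughly half the runs have at least one p < .05 somewhere in the table, even though nothing real is going on.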

So replication is needed to deal with the uncertainties around exploratory analysis. How does pre-registration fit into the picture? Quite simply, it makes explicit the distinction between hypothesis-generating (exploratory) and hypothesis-testing research, which is currently completely blurred. As in the example above, if you tell me in advance what hypothesis you are testing, then I can place confidence in the uncorrected statistical probabilities associated with the predicted effects. If you haven't predicted anything in advance, then I can't.

This doesn't mean that the results from exploratory analyses are necessarily uninteresting, untrue, or unpublishable, but it does mean we should interpret them as what they are: hypothesis-generating rather than hypothesis-testing.

I'm not surprised at the outcry against pre-registration. This is mega. It would require most of us to change our behaviour radically. It would turn on its head the criteria used to evaluate findings: well-conducted replication studies, currently often unpublishable, would be seen as important, regardless of their results. On the other hand, it would no longer be possible to report exploratory analyses as if they were hypothesis-testing. In my view, unless we do this we will continue to waste time and precious research funding chasing illusory truths.

References

Flint, J., Greenspan, R. J., & Kendler, K. S. (2010). How Genes Influence Behavior. Oxford University Press.

Munafo, M., & Flint, J. (2011). Dissecting the genetic architecture of human personality. Trends in Cognitive Sciences, 15(9), 395-400. doi:10.1016/j.tics.2011.07.007