Who of us is not familiar with the following situation? A student complains: “I have been working so hard to collect all this data, and still I am not getting any results” [meaning all hypothesis tests turn out “n.s.”], and more senior scientists then try to help with advice on alternative ways of analysis that could get the student out of this misery.
Sure, I can see the good intent on both sides, but in fact such an approach may render all the data uninterpretable. Apart from (rare) discoveries, most biological research is an attempt at quantification (to what extent is y affected by x?). Yet, to find out the magnitude of such effects, we need to ensure that our estimate of the effect size has not been biased in a systematic way. Problematically, with limited sample sizes only large effects will reach statistical significance, so any desire to reach significance (because positive results may be more interesting and easier to publish) threatens our objectivity and the utility of our results. If we face an arbitrary decision during data analysis (e.g. whether or not to control for a covariate), and one variant leads to p=0.10 for the main hypothesis we wanted to test while the other yields p=0.04, then any preference for presenting the latter inflates the effect size estimate and makes the effort of quantification completely pointless.
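This inflation is easy to demonstrate with a toy simulation (my own illustration, not part of the original study; the "arbitrary decision" here is whether to stop at the planned n = 20 or to add ten more observations, reporting whichever analysis gives the better test statistic):

```python
import math
import random
import statistics

random.seed(42)

def estimate(sample):
    """Mean effect and its t statistic for a simple one-sample design."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m, m / se

TRUE_EFFECT = 0.2   # a modest real effect, in standard-deviation units
N_SIMS = 10_000

fixed, cherry_picked = [], []
for _ in range(N_SIMS):
    y = [random.gauss(TRUE_EFFECT, 1.0) for _ in range(30)]
    est20, t20 = estimate(y[:20])   # the pre-registered analysis: stop at n = 20
    est30, t30 = estimate(y)        # the "let's collect a bit more" variant: n = 30
    fixed.append(est20)
    # Report whichever variant looks more significant:
    cherry_picked.append(est20 if t20 >= t30 else est30)

print(f"true effect:           {TRUE_EFFECT:.3f}")
print(f"sticking to the plan:  {statistics.mean(fixed):.3f}")   # ~0.20, unbiased
print(f"picking the better p:  {statistics.mean(cherry_picked):.3f}")  # biased upward
```

Either stopping rule on its own gives an unbiased estimate; it is the freedom to pick, after the fact, whichever one "worked" that pushes the average reported effect above its true value.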
The consequences of potential inflation of effect sizes may be quite dramatic: in some areas of research, despite decades of research effort, I still cannot make up my mind whether a certain effect is real or whether all positive reports resulted merely from chance combined with non-blind decision making by the researchers. Hence, what we need is a much clearer separation (in our minds and our publications) between findings that arise from data exploration and the results of rigorous confirmatory tests of a priori hypotheses. Only the latter tests yield effect size estimates that are not biased away from zero. Efforts of quantification are wasted as soon as we allow any post-hoc decision making that is based on the significance of the effect of interest.
The solution: pre-registration
So here is what I did together with two students of mine in order to eliminate subjectivity as much as possible from our research project. Before starting data collection, we specified in a written document the main hypothesis that we were going to test, all variables that we were going to measure, the intended sample sizes and specific rules for terminating data collection, the intended analyses with specified fixed and random effects, as well as rules for transforming variables and for adding or removing covariates. Effectively, we made all the decisions blind to how they would later affect the p-value for the main hypothesis test. This process, involving discussions and writing the document, took us 2–3 days. We then uploaded the document to the Preregistration site (https://osf.io/prereg/) at the Open Science Framework (OSF; https://osf.io/), where it will remain as a permanent record of the aims and intended methods of our study. We initially kept it private (visible only to us), which in hindsight was unnecessary, because fears of getting scooped are rarely justified in our field of research.
The benefits of pre-registration
Having to write such a plan was extremely useful for both me and my students. It forced us to plan really carefully and to think through how the collected data should actually be analyzed. Also, formulating our hypotheses in writing made us realize to what extent our experiment would be able to distinguish between alternative explanations. This exercise is not only educational for the student; it also forces the supervisor to make sure that the project is both feasible and worth the effort, and that the methods of data collection will lead to data tables that can be analyzed sensibly.
Jumping these hurdles before investing heavily in data collection makes a lot of sense. It also helps with mastering an often difficult process: progressing from data analysis to writing the manuscript. For the Methods section, we could simply copy large parts from the pre-registration text, and the Results section can be incredibly short, because there is just this one major test being carried out. The analysis is obvious and straightforward because it was all designed in advance. This meant that half of the manuscript was written in almost no time. And while following a strict plan makes data analysis really fast, pre-registration does not tie your hands: you can still explore your dataset in whichever direction the data may lead you. What it really does is clarify for the readership which part of the Results section refers to rigorous a priori hypothesis testing and which part is data exploration combined with post-hoc sense-making. These latter explorations belong in the Discussion section of a paper, not in the Introduction (“we hypothesized that we would find what we found”).
Pre-registration not only makes papers easier to write, it should also make them easier to get published. Without the freedom of ‘post-hoc hypothesizing’ and ‘squeezing out significance’, most pre-registered studies may simply have to report a non-significant effect, but at least the outcome is maximally objective. One also feels an obligation to make the result available to the scientific community (not being guilty of contributing to the ‘file-drawer problem’). With the argument of maximal objectivity in hand, I think editors will have a hard time rejecting such studies, as long as you are not aiming too high. Anyway, the impact factor of the journal you are trying to publish in might be less important than you think. What if at some point funding agencies realize how they have been wasting money on research that looks successful but is not objective, and begin to ask you for evidence of the objectivity of your research? How many of your publications report null findings, and how many are pre-registered studies? At least I hope that this will eventually begin to count.
The costs of pre-registration
So where are the drawbacks of pre-registration, apart from the time investment at an early stage? Well, for sure, it ties your hands in terms of p-hacking: a ‘failed’ experiment remains a ‘failed’ experiment, so you have less freedom to sell results as a ‘success story’. In the long run, p-hacking will backfire anyway, so this lost freedom isn’t really a drawback. Yet I can imagine situations where a pre-registration might make it harder to sell an unforeseen outcome in a sexy way. Selling such an outcome is of course easier when you allow for hindsight bias, which makes everything look so obvious and clear.
The future of pre-registration
So should we only do pre-registered studies from now on? No, I really do not think so. Especially when someone is not yet very familiar with their study system, exploratory research and flexible data analysis make a lot of sense. As long as we remain aware that many (maybe most) findings based on post-hoc data exploration will be false-positive results, I don’t see a problem with this approach. It is, after all, part of the process of scientific exploration. However, once there is a clear and strong finding, there comes a time for a proper verification experiment. For that to be convincing to others, I think it should be pre-registered. You can also pre-register a non-experimental study, as long as there is a clear hypothesis, and it is even possible for studies where the necessary data already exist, as long as you can plausibly argue that you have not yet inspected any part of the data with the question of interest in mind.
Pre-registration is also an educational experience in how easy p-hacking is. It makes you realize how often we use the same data twice: once to derive a hypothesis from it (the hypothesis that happens to reach p<0.05), and a second time to verify exactly that hypothesis with exactly that data. It’s such a cheap trick, and we have been using it for decades! Further, pre-registration helps us realize that when arbitrary decisions are to be made, objectivity flies right out of the window as soon as we know how the decision affects the key p-value. This insight can also be helpful during data exploration: keep yourself blinded, or blind someone else to the outcome and let them make the arbitrary decisions for you. Maybe one day in the future we’ll see thesis requirements include the need for at least one pre-registered study. But for now, I recommend that all scientists try it, at least once; you might be surprised by how informative it is—and by how well you can predict the outcome of your hypothesis test.
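How cheap the same-data-twice trick is can be seen in another toy simulation (again my own sketch, not from the studies discussed here): explore ten outcome variables that in truth show no effect at all, pick the most striking one as “the” hypothesis, and then “verify” it either on the very same data or on fresh, pre-registered data:

```python
import math
import random
import statistics

random.seed(7)

N, K, SIMS = 50, 10, 4000
CRIT = 1.96          # two-sided 5% cut-off for a z test (sd known to be 1)

def z_score(sample):
    # For N(0, 1) data, the sample mean has standard error 1/sqrt(n).
    return statistics.mean(sample) * math.sqrt(len(sample))

same_data_hits = fresh_data_hits = 0
for _ in range(SIMS):
    # Explore: measure K outcome variables, none of which has a real effect,
    # and turn the most striking pattern into "the" hypothesis.
    data = [[random.gauss(0, 1) for _ in range(N)] for _ in range(K)]
    best = max(range(K), key=lambda k: abs(z_score(data[k])))

    # "Verify" the hypothesis on the very same data:
    if abs(z_score(data[best])) > CRIT:
        same_data_hits += 1
    # Verify it on fresh, pre-registered data:
    fresh = [random.gauss(0, 1) for _ in range(N)]
    if abs(z_score(fresh)) > CRIT:
        fresh_data_hits += 1

print(f"false-positive rate, same data:  {same_data_hits / SIMS:.2f}")   # ~0.40
print(f"false-positive rate, fresh data: {fresh_data_hits / SIMS:.2f}")  # ~0.05
```

Reusing the exploration data turns the nominal 5% error rate into roughly 1 − 0.95¹⁰ ≈ 40%; an independent, pre-registered verification restores it to the advertised 5%.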
Forstmeier W, Wagenmakers E-J, Parker TH (2017) Detecting and avoiding likely false-positive findings – A practical guide. Biological Reviews 92, 1941-1968.
Wolfgang Forstmeier is a Principal Investigator in the Department of Behavioural Ecology and Evolutionary Genetics at the Max Planck Institute for Ornithology