Once upon a time, I thought reproducible research meant that if someone else showed X in a paper, I should be able to get X in my own experiment. However, that actually describes replication, an important but separate concept.
Reproducible research is data analysis that, starting from the same raw data, arrives at the same answers. Do you remember grade-school math class, when some students got the wrong answer because their calculators were set to radians instead of degrees? They had a reproducibility problem. If you tally up all the settings and functions that go into the data you work with, I'd wager it would be a long list.
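To make the calculator example concrete, here is a minimal Python sketch. Python is used purely for illustration here and is not the tool this article recommends. The same input, 30, goes through the sine function under a "degree" setting and a "radian" setting. Nothing about the raw number changes, yet the answers disagree because of one hidden setting.

```python
import math

# The same keystrokes, "sin 30", under two different calculator settings.
angle = 30

# Degree mode: the calculator converts 30 degrees to radians before taking the sine.
sin_degree_mode = math.sin(math.radians(angle))

# Radian mode: the calculator treats 30 as 30 radians.
sin_radian_mode = math.sin(angle)

print(f"degree mode: {sin_degree_mode:.4f}")  # prints "degree mode: 0.5000"
print(f"radian mode: {sin_radian_mode:.4f}")  # prints "radian mode: -0.9880"
```

Because the conversion is written into the code (`math.radians`), anyone rerunning it gets the same answer and can see exactly which setting was used. That is the point of recording every step of an analysis.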
Why You Should Strive for Reproducible Research
If you are like me, most of your experience with data centers on spreadsheets. Perhaps you have become a master at editing the raw output from your favorite plate reader and plotting the graph. The problem is that every mouse click, formula, and choice is poorly documented in the spreadsheet, if it is documented at all.
Not only that, but at the end you are likely to copy and paste the cleaned table into another program to run your statistics. If you work with large datasets (pick your favorite 'omics), you already know that spreadsheets are a poor way to handle this challenge.
How can you, your PI, and your colleagues repeat your analysis a year later? Where did you write down why you chose that statistical test, and with which parameters? This is the heart of reproducible research: documenting what you did to your data and why, from the raw information to the final product.
Still not convinced? What is in it for you?
Top Reasons for Making Your Research Reproducible
- Ethics – we have a responsibility to show our work to move science forward
- Funding requirements – many funding agencies require the storage of data and the steps performed in the analysis as part of the effort to increase rigor and reproducibility
- Catch your mistakes – when you generate a reproducible data document, you are more likely to catch mistakes. Also, your documented steps allow you to trace back to where you went wrong.
- Others can catch mistakes – it is far better for others, such as your manuscript reviewers or your mentors, to find the flaws in your analysis than for those flaws to stay buried while countless other students try to repeat your work.
- Others can learn how to perform the analysis – a reproducible research document can be a powerful teaching tool for future lab members or others around the globe.
- Better study design – writing down why you performed a test forces you to articulate the rationale, because 'that's how we always do this' doesn't quite cut it.
Where to Begin?
First, look for reproducible research advocates at your institution and in your field. For example, I am strongly influenced by Susan Holmes and colleagues in microbiome research. Next, start learning to interact with your data through code rather than mouse clicks. My suggestion, shared by others on Bitesize Bio, is to use R.
In my next article, I will discuss how to use R and the R Markdown package to make a reproducible record of your analysis.

Image credit: Marc Smith