Pseudoreplication: Don’t Fall For This Simple Statistical Mistake

Now we come to the third part of our trifecta. In the last two posts I went over p-values and how they determine significance in null hypothesis testing, and then degrees of freedom and their effect on the p-value. Finally, we come to pseudoreplication: where it can all go terribly wrong.

Replication is crucial in science. If you only do something once, it’s nearly impossible to convince anyone that your result is not just a lucky strike. But as we have already established, all else being equal, more degrees of freedom make it easier to reach a significant result.

Solution! Raise your degrees of freedom and your results will be significant. (We’ve all thought it at some point; you are not alone.) But statisticians, buzz-kills that they are, have a word for artificially inflating degrees of freedom: pseudoreplication.

As always, an example (easy one first):

A doctor is measuring cholesterol levels in the blood of his male patients. Twenty men are each given two blood tests.

Pop Quiz: What are the degrees of freedom?

Despite there being 40 blood tests, the sample size is 20 (the number of men tested), so df = 19. Two blood tests taken from the same man cannot be independent of each other, and independence is a necessity for true replication. The repeated tests are a useful check on internal consistency, but they cannot be used as replicates.
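
To make the arithmetic concrete, here is a minimal Python sketch of the two ways of counting. The variable names are my own, and it assumes the simple df = n - 1 rule that gives the answer of 19 above.

```python
# Layout taken from the example: 20 men, 2 blood tests each.
n_patients = 20
tests_per_patient = 2

n_measurements = n_patients * tests_per_patient  # 40 blood tests in total

# Counting every blood test as an independent replicate (pseudoreplication).
df_pseudoreplicated = n_measurements - 1

# Counting each man as the independent replicate (the correct level).
df_correct = n_patients - 1

print(f"blood tests taken: {n_measurements}")
print(f"df if every test is counted: {df_pseudoreplicated}")
print(f"df at the level of the patient: {df_correct}")
```

The second test on each man adds a measurement but not a replicate, so the df stays tied to the number of men.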

Likewise, an ornithologist is measuring average egg size in a bird species. She measures every egg found in 20 nests for a total of 50 eggs.

Pop Quiz: How many degrees of freedom are there?

This one might be a little less obvious, but we can only count replication at the level of the nest. Remember the condition of independence: two eggs from the same nest are closely linked (they have the same parents, the same environmental conditions, etc.), so we cannot assume that they are independent. Once again we have df = 19.

Pseudoreplication refers to analysing data at the wrong level of replication. In the above examples, it would mean using the total number of blood tests, or the total number of eggs measured, as the sample size. Artificially inflating degrees of freedom in this way can lead to spurious results; as we’ve already discussed, higher degrees of freedom lower the hurdle to achieve significance, possibly leading to a test that comes out significant when it should not.
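
A rough way to see the inflation is to simulate it. The sketch below is only an illustration under stated assumptions: the egg lengths are simulated, not real data, and the clutch sizes (10 nests of 2 eggs and 10 nests of 3) and the between-nest and within-nest spreads are invented so that the totals match the 20 nests and 50 eggs in the example. It compares the standard error of the mean when eggs are counted as replicates with the standard error when nests are.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Assumed layout: 20 nests holding 50 eggs in total (10 nests of 2, 10 nests of 3).
clutch_sizes = [2] * 10 + [3] * 10

# Simulated (not real) egg lengths. Each nest gets its own baseline, so eggs
# that share a nest resemble each other more than eggs from different nests.
nests = []
for size in clutch_sizes:
    nest_baseline = random.gauss(30.0, 3.0)  # assumed between-nest spread
    eggs_in_nest = [random.gauss(nest_baseline, 1.0) for _ in range(size)]  # assumed within-nest spread
    nests.append(eggs_in_nest)


def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(sample size)."""
    return statistics.stdev(values) / math.sqrt(len(values))


all_eggs = [egg for nest in nests for egg in nest]
nest_means = [statistics.mean(nest) for nest in nests]

se_pseudo = standard_error(all_eggs)     # treats the 50 eggs as independent
se_correct = standard_error(nest_means)  # treats the 20 nests as independent

print(f"eggs as replicates:  n = {len(all_eggs)}, df = {len(all_eggs) - 1}, SE = {se_pseudo:.2f}")
print(f"nests as replicates: n = {len(nest_means)}, df = {len(nest_means) - 1}, SE = {se_correct:.2f}")
```

Counting eggs gives both more degrees of freedom and a smaller standard error, so any test built on those numbers looks more convincing than the 20 independent nests can justify.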

In the wise words of an academic in my department, at some level everything is pseudoreplicated. After all, we only have one Earth. So don’t drive yourself crazy over it, but do be aware of the effect it can have on your experiments and analyses. Always ask yourself, “Are my replicates independent?” and be prepared to defend yourself to your examiners and/or peer reviewers.

So what do you do if you’ve already pseudoreplicated? First off, I feel for you. So many of us barge headfirst into experiments and don’t realise our replication is not as high as we thought it was. First step: don’t panic. Next, come to terms with the fact that your degrees of freedom won’t be as high as you thought (and my heart does break for you). There are a few post-hoc solutions. First and most simply, you can average back up to the level of the independent replicate. For instance, in the human blood experiment you could take an average per individual. This brings your sample size back to the number of individuals, but the repeated measurements still give you some measure of consistency within each individual.
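
As a minimal sketch of the averaging approach, the snippet below collapses two tests per patient into one value each. The patient labels and readings are made up for illustration (five patients rather than twenty, arbitrary units), so treat it as a pattern rather than an analysis.

```python
import statistics

# Hypothetical cholesterol readings: two tests per patient, arbitrary units.
# Only five made-up patients are shown to keep the sketch short.
readings = {
    "patient_01": [182, 186],
    "patient_02": [210, 204],
    "patient_03": [165, 171],
    "patient_04": [198, 198],
    "patient_05": [225, 219],
}

# One value per patient: this is the independent replicate.
patient_means = {pid: statistics.mean(tests) for pid, tests in readings.items()}

# Within-patient spread, kept as a consistency check rather than as extra replicates.
patient_ranges = {pid: max(tests) - min(tests) for pid, tests in readings.items()}

n = len(patient_means)  # sample size is the number of patients
df = n - 1

for pid in readings:
    print(f"{pid}: mean = {patient_means[pid]}, range between tests = {patient_ranges[pid]}")
print(f"n = {n}, df = {df}")
```

Any test you then run uses `patient_means` as its data, while `patient_ranges` can be reported separately to show how repeatable the measurements were.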

The next alternative is to run a nested analysis. This means you use all of your data, but eggs would be “nested” within each nest; so only each individual nest would count towards degrees of freedom. Typically this would be seen in an ANOVA. In the interests of honesty I will point out that after five years I still struggle to run, let alone understand, a nested ANOVA. Between now and the time we get to ANOVAs, I’ll try to work it out. At least there are no pretences that this is easy!
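
For what it is worth, the bookkeeping of a nested design can be sketched by hand. Everything in the example below is an assumption made for illustration: the two habitats, the balanced layout (three nests per habitat, two eggs per nest) and the egg lengths are all invented, and a real, unbalanced data set like the 20 nests above would normally go through a statistics package instead. The point is only to show which mean square the habitat effect is tested against.

```python
# Hypothetical egg lengths: habitat -> list of nests -> eggs in that nest.
# The habitat factor and every number here are invented for illustration.
eggs = {
    "forest": [[30.1, 30.5], [32.0, 31.6], [29.4, 29.8]],
    "coast": [[33.2, 33.0], [31.1, 31.5], [34.0, 33.6]],
}


def mean(values):
    return sum(values) / len(values)


a = len(eggs)               # number of habitats
b = len(eggs["forest"])     # nests per habitat (balanced design assumed)
n = len(eggs["forest"][0])  # eggs per nest (balanced design assumed)

all_values = [y for nests in eggs.values() for nest in nests for y in nest]
grand_mean = mean(all_values)

ss_habitat = ss_nest = ss_egg = 0.0
for nests in eggs.values():
    habitat_mean = mean([y for nest in nests for y in nest])
    ss_habitat += b * n * (habitat_mean - grand_mean) ** 2
    for nest in nests:
        nest_mean = mean(nest)
        ss_nest += n * (nest_mean - habitat_mean) ** 2     # nests within habitat
        ss_egg += sum((y - nest_mean) ** 2 for y in nest)  # eggs within nest

df_habitat = a - 1
df_nest = a * (b - 1)
df_egg = a * b * (n - 1)

ms_habitat = ss_habitat / df_habitat
ms_nest = ss_nest / df_nest

# Nested test: habitat is judged against variation among nests.
f_nested = ms_habitat / ms_nest

# Pseudoreplicated test: nests are ignored and every egg counts as a replicate.
ms_pooled = (ss_nest + ss_egg) / (df_nest + df_egg)
f_naive = ms_habitat / ms_pooled

print(f"nested: F = {f_nested:.2f} on ({df_habitat}, {df_nest}) df")
print(f"naive:  F = {f_naive:.2f} on ({df_habitat}, {df_nest + df_egg}) df")
```

In the nested version only the nests contribute to the denominator degrees of freedom, even though every egg is still used to estimate the nest means.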

Most of the time you’ll pseudoreplicate by mistake, but do be aware that doing so can call into question the validity of your experiment and analyses. Be aware of this in others’ research too, and be sure to read articles with a critical eye. Have they done what they said they have? Have they over-inflated degrees of freedom, and in doing so have they overstated their results? But always remember, we only have one Earth.

Let us know if you have any comments or questions!

Sarah-Jane O’Connor

Sarah-Jane is an ecologist who sometimes masquerades as a geneticist. Her statistical knowledge is embarrassing in some social circles, but revered in others. Which probably just makes it neutral.
She has a PhD in ecology from the University of Canterbury, NZ.