Data Spread and How to Measure It: the Coefficient of Variation (CV)


Derek gained his Bachelor’s Degree in Animal Physiology and Nutrition from the University of Leeds.

No matter how we make measurements, there will be variation (a spread of data). Take 100 people and ask them to guess your age and you will get a range of results: some will be too low (excellent!), some too high (not so good!).
It is the same with any of our laboratory experiments – if we pipette liquid 100 times we won’t dispense exactly the same amount of fluid each time. And, in flow cytometry, if we measure the same cell 100 times we don’t get exactly the same result each time: there will be some data spread. Some of this is due to the sample itself, some due to the fluorochromes we use and some due to the cytometers and their capture and quantitation of the signal.
We know that variability exists in our flow data and that we have to deal with it, so we always want to quantitate the amount of spread. This matters because the greater the spread, the harder it is to show that two distributions are significantly different.
In our flow cytometry measurements we want to keep variability to a minimum. We can do this through careful experimental planning, thoughtful choice of reagents and titration of antibodies. We can also make sure we measure only what we want by excluding dead cells and clumps.
But we are always dealing with nature. Even if we are only looking at CD4-positive cells in peripheral blood, not every cell will express exactly the same level of CD4. There will therefore always be spread, both within an individual sample and between patients. It follows that we can look at data spread within individual samples or across a series; here we will look at assessing spread in individual samples.
Luckily we can do this in statistical terms with several different metrics.

The range

The range is the difference between the highest and lowest values. Sometimes we look not at the whole range but at a subset: the interquartile range (IQR), which is the span containing the middle 50% of the data. These metrics can be useful in some situations (for example, with a non-normal distribution), but they do not help us much when comparing distributions.
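Here is a minimal sketch of both metrics using Python's standard library. The intensity values are made up for illustration. Note that software packages use different quartile conventions; this example assumes the "inclusive" method of `statistics.quantiles`.

```python
import statistics

def data_range(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)

def interquartile_range(values):
    """IQR: spread of the middle 50% of the data (Q3 - Q1).

    Uses the 'inclusive' quartile method; other software may use a
    slightly different convention and give slightly different numbers.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Hypothetical fluorescence intensities (arbitrary units) with one bright outlier
intensities = [92, 95, 98, 100, 101, 103, 105, 108, 250]

print(data_range(intensities))           # 158
print(interquartile_range(intensities))  # 7.0
```

The single outlier stretches the range to 158 but has little effect on the IQR. This is one reason the IQR is useful for skewed or outlier-prone data.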

The spread of data

It is better to look at the spread, or variation, around the mean of a data set, and we do this with two measurements:

1. Standard Deviation

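The standard deviation (SD) describes how far, on average, the points lie from the mean. Strictly, it is the square root of the average squared distance from the mean:

$$\mathrm{SD} = \sqrt{\frac{\sum (x - m)^2}{n}}$$

Where Σ = the sum of, x = each value in the population, m = the mean of the values, n = the sample size. When the SD of a larger population is estimated from a sample, software commonly divides by n - 1 instead of n.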
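The following minimal sketch implements the population form, SD = √(Σ(x − m)²/n), and checks it against Python's `statistics` module. The values are illustrative rather than real cytometry data.

```python
import math
import statistics

def standard_deviation(values):
    """Population SD: square root of the mean squared distance from the mean (divides by n)."""
    n = len(values)
    m = sum(values) / n
    squared_distances = [(x - m) ** 2 for x in values]
    return math.sqrt(sum(squared_distances) / n)

# Illustrative values (not real cytometry data)
values = [2, 4, 4, 4, 5, 5, 7, 9]

print(standard_deviation(values))  # 2.0
# The standard library's population SD agrees...
print(statistics.pstdev(values))   # 2.0
# ...while statistics.stdev divides by n - 1 (sample SD), so it comes out larger
print(statistics.stdev(values))
```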

Although the SD is useful, it makes comparisons harder between experiments or, in flow terms, between samples run on different machines that may have different levels of data resolution. This is because even when two distributions have the same relative spread, the absolute value of the SD changes depending on where the mean lies: the brighter the population, the larger its SD.
To get round this we can use another metric:

2. Coefficient of Variation

The coefficient of variation (CV) or coefficient of variance is defined as:

$$\mathrm{CV} = \frac{\mathrm{SD}}{m} \times 100\%$$

Because the SD and the mean are in the same units, the units cancel, so the CV is unitless (dimensionless). Multiplying by 100 simply expresses it as a percentage. This is why the CV is what we generally use to compare results over time, between machines or between sites.
In practical terms, the lower the CV, the less variation there is. To see why CV is generally a better measure for comparing across platforms, consider two distributions that have the same relative spread of about 10%. If the mean of one is 100, it will spread roughly between 90 and 110, but if the mean is 1000, it will spread roughly between 900 and 1100. The SDs differ tenfold, but the CV is the same.
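The worked example above can be checked numerically. This sketch computes CV as (SD / m) × 100 using the population SD. It then scales the same hypothetical peak tenfold. The peak values are chosen only so that the arithmetic is exact. The SD scales with the brightness; the CV does not.

```python
import statistics

def coefficient_of_variation(values):
    """CV (%) = SD / mean * 100, using the population SD."""
    return statistics.pstdev(values) * 100 / statistics.fmean(values)

# Hypothetical dim peak: mean 100, SD 10
dim_peak = [90, 90, 110, 110]
# Same relative spread, ten times brighter: mean 1000, SD 100
bright_peak = [x * 10 for x in dim_peak]

for name, peak in (("dim", dim_peak), ("bright", bright_peak)):
    sd = statistics.pstdev(peak)
    cv = coefficient_of_variation(peak)
    print(f"{name}: mean={statistics.fmean(peak)}, SD={sd}, CV={cv}%")
```

Both peaks report a CV of 10%, even though their SDs are 10 and 100.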
So where do we use these metrics in our cytometry experiments?

Quality Control

We look at CV when we are running quality control (QC) tests. We check the alignment of our cytometers using single-peak beads, which are very uniform and have low intrinsic variation, and therefore a low CV. Monitoring this CV over time tells us whether our cytometer is performing optimally.
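As a rough illustration of CV-based monitoring, the sketch below flags a QC run whose bead CV exceeds a limit. Both the 3% limit and the bead readings are hypothetical. In practice the limit would come from the specifications for your beads and instrument, or from your lab's own QC baseline.

```python
import statistics

# HYPOTHETICAL acceptance limit, for illustration only; use the limit
# specified for your own beads/instrument or your lab's QC baseline.
MAX_BEAD_CV_PERCENT = 3.0

def bead_cv(intensities):
    """CV (%) of a single-peak bead population (population SD / mean * 100)."""
    return statistics.pstdev(intensities) * 100 / statistics.fmean(intensities)

def passes_qc(intensities, max_cv=MAX_BEAD_CV_PERCENT):
    """True if the bead peak is tight enough to meet the acceptance limit."""
    return bead_cv(intensities) <= max_cv

# Hypothetical bead readings from two daily QC runs
daily_runs = {
    "Monday": [1000, 1010, 990, 1005, 995],   # tight peak
    "Tuesday": [1000, 1100, 900, 1060, 940],  # broadened peak, e.g. after misalignment
}

for day, readings in daily_runs.items():
    status = "PASS" if passes_qc(readings) else "FAIL"
    print(f"{day}: CV = {bead_cv(readings):.2f}% -> {status}")
```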

DNA analysis

To best distinguish cells in G1 from those in S phase, we need the CV of the G1 peak to be as low as possible. We have seen how this can be done in a previous Bitesize Bio article, and Figure 1 shows the difference between a sample with low CV (left) and high CV (right).
Figure 1. Differences in CVs between samples. a) Sample with low CV; and b) sample with high CV.
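To put a number on why a broad G1 peak hurts, this sketch assumes the G1 peak is normally distributed on a linear DNA axis. It estimates the fraction of G1 cells that would fall into an S-phase region. The G1 position (channel 200), the gate start (10% above the G1 mean) and the two CVs are all hypothetical values chosen for illustration.

```python
from statistics import NormalDist

def g1_spillover(g1_mean, cv_percent, s_gate_start):
    """Fraction of G1 events expected at or above s_gate_start, assuming the G1 peak
    is normally distributed with SD = g1_mean * CV / 100."""
    sd = g1_mean * cv_percent / 100
    return 1 - NormalDist(mu=g1_mean, sigma=sd).cdf(s_gate_start)

# Hypothetical values: G1 peak at channel 200 (so G2/M near 400) and an
# S-phase region starting 10% above the G1 mean.
G1_MEAN = 200
S_GATE_START = 220

for cv in (3, 8):
    frac = g1_spillover(G1_MEAN, cv, S_GATE_START)
    print(f"G1 CV {cv}%: {frac:.2%} of G1 cells fall in the S-phase region")
```

Under these assumptions, a 3% CV puts well under 0.1% of G1 cells into the S-phase region, while an 8% CV puts roughly a tenth of them there.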

Cell Proliferation

When using the dye dilution technique, we need a low CV to tell successive generations apart, and its value will give us confidence (or not!) in the results.
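This sketch illustrates the effect of CV on generation peaks. It assumes each division halves the dye intensity, and that adjacent generations are normal peaks on a linear scale with the same CV. It then measures how much the two peaks overlap using `NormalDist.overlap` (the overlapping coefficient: 0 means fully separate, 1 means identical). The parent intensity of 10,000 and the two CVs are hypothetical.

```python
from statistics import NormalDist

def adjacent_generation_overlap(parent_mean, cv_percent):
    """Overlapping coefficient of two adjacent generations, assuming each
    division halves the dye intensity and both peaks are normal with the same CV."""
    parent = NormalDist(parent_mean, parent_mean * cv_percent / 100)
    daughter_mean = parent_mean / 2
    daughter = NormalDist(daughter_mean, daughter_mean * cv_percent / 100)
    return parent.overlap(daughter)

for cv in (10, 30):
    overlap = adjacent_generation_overlap(10_000, cv)
    print(f"CV {cv}%: adjacent generations overlap by {overlap:.1%}")
```

The higher the CV, the more neighbouring generations overlap, and the less confidently individual generations can be gated.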
Knowledge of these relatively simple statistical metrics will help you assess your experiments and forms the basis of further tests that you can apply to assess the significance of your results.
