No matter how we make measurements, there will be variation (a spread of data). Take 100 people and ask them to guess your age and you will get a range of results: some will be too low (excellent!), some too high (not so good!).
It is the same with any of our laboratory experiments – if we pipette liquid 100 times we won’t dispense exactly the same amount of fluid each time. And, in flow cytometry, if we measure the same cell 100 times we don’t get exactly the same result each time: there will be some data spread. Some of this is due to the sample itself, some due to the fluorochromes we use and some due to the cytometers and their capture and quantitation of the signal.
But we know that variability exists in our flow data and we know we have to deal with it, so we always want to try to quantitate the amount of spread. This is important because the greater the spread, the harder it will be to say that distributions are significantly different.
In our flow cytometry measurements we want to keep variability to a minimum, which we can do by careful experimental planning: thinking about reagents, titrating antibodies and making sure we only measure what we want by excluding dead cells and clumps.
But we are always dealing with nature – even if we are just looking at CD4 positive cells in peripheral blood, not all cells will have exactly the same level – meaning within an individual sample and between multiple patients there will always be a spread. It follows therefore that we can look at data spread within our individual samples or within a series – here we will look at assessing spread in individual samples.
Luckily we can do this in statistical terms with several different metrics.
The range
The range is the difference between the highest and lowest values. Sometimes we won’t look at the whole range but at the interquartile range: the span between the 25th and 75th percentiles, where the middle 50% of the data lies. These metrics can be useful in some situations (where there is a non-normal distribution for example) but they do not help us much when comparing between distributions.
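As a rough illustration of these two metrics, here is a minimal Python sketch using only the standard library. The intensity values are made up for the example, and the function names are hypothetical. Note that different software packages use slightly different conventions for calculating quartiles, so the interquartile range may not match exactly between programs.

```python
import statistics


def value_range(values):
    """Difference between the highest and lowest values."""
    return max(values) - min(values)


def interquartile_range(values):
    """Width of the interval holding the middle 50% of the data."""
    # quantiles(n=4) returns the three cut points Q1, Q2 (median) and Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Made-up fluorescence intensities with one bright outlier.
intensities = [90, 95, 100, 105, 110, 400]

print("Range:", value_range(intensities))
print("Interquartile range:", interquartile_range(intensities))
```

In this made-up data set a single bright event stretches the range considerably, while the interquartile range stays small because it ignores the extremes.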
The spread of data
It is better to look at the spread or variation around the mean of a data set and we do this by two measurements:
1. Standard Deviation
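The standard deviation (SD) describes the typical distance of each point from the mean. Strictly, it is the square root of the average squared distance from the mean, or mathematically:
SD = √( Σ(x − m)² / n )
Where Σ = the sum of, x = each value in the population, m = the mean of the values, n = the sample size. When the SD is estimated from a sample rather than a whole population, n − 1 is commonly used in place of n.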
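To make the calculation concrete, here is a minimal Python sketch that takes the square root of the mean squared deviation from the mean, dividing by n. The function name and the values are hypothetical and chosen so the arithmetic is easy to follow. Python's built-in `statistics.pstdev` does the same calculation, while `statistics.stdev` divides by n − 1 instead.

```python
import math
import statistics


def standard_deviation(values):
    """Square root of the mean squared deviation from the mean (divides by n)."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((x - m) ** 2 for x in values) / n)


# Made-up values with a mean of 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

print("SD (divide by n):", standard_deviation(values))
print("statistics.pstdev:", statistics.pstdev(values))
print("statistics.stdev (divide by n - 1):", statistics.stdev(values))
```

The n − 1 version is always slightly larger, although the difference becomes negligible for the thousands of events typically collected in a flow cytometry sample.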
Although this is useful, the SD makes it harder to compare results between experiments or, in flow terms, between samples that have been run on different machines, which may have different levels of data resolution. This is because even if distributions have the same relative spread, the absolute value of the SD will change depending on where the mean is.
To get round this we can use another metric:
2. Coefficient of Variation
The coefficient of variation (CV) or coefficient of variance is defined as:
(SD/m) × 100
As CV is the ratio of the SD to the mean, expressed as a percentage, it is unitless. So this is what we generally use when we want to compare results over time, between machines or between sites.
In practical terms, the lower the number, the less variation there is. To see why CV is generally a better measure for comparing across platforms, consider two distributions that have the same relative spread of about 10%. If the mean of one is 100, it will spread between 90 and 110, but if the mean is 1000, it will spread between 900 and 1100. Different SDs but same CV.
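The same comparison can be sketched in a few lines of Python. The two data sets below are made up: one is centred on 100 and the other on 1000, with the same relative spread. The function name is hypothetical, and the SD here divides by n.

```python
import statistics


def coefficient_of_variation(values):
    """CV as a percentage: (SD / mean) x 100."""
    return statistics.pstdev(values) / statistics.fmean(values) * 100


# Made-up distributions with the same relative spread around different means.
low = [90, 100, 110]
high = [900, 1000, 1100]

for name, data in (("low", low), ("high", high)):
    sd = statistics.pstdev(data)
    cv = coefficient_of_variation(data)
    print(f"{name}: SD = {sd:.2f}, CV = {cv:.2f}%")
```

The SD of the second data set is ten times larger than the first, but the CV is identical, which is why CV travels better between instruments and scales.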
So where do we use these metrics in our cytometry experiments?
Quality Control
We look at CV when we are running Quality Control tests. We check alignment of our cytometers using single peak beads, which are very uniform and will have low intrinsic variation and therefore a low CV. We can monitor this over time, and a consistently low CV tells us that our cytometer is performing optimally.
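One way to picture this kind of monitoring is the minimal Python sketch below. Everything in it is an assumption for illustration: the bead intensities are made up, the function names are hypothetical, and the 3% cut-off is an arbitrary placeholder rather than a recommended value. Your own acceptance limit should come from your bead manufacturer or your facility's QC records.

```python
import statistics


def percent_cv(values):
    """CV as a percentage: (SD / mean) x 100."""
    return statistics.pstdev(values) / statistics.fmean(values) * 100


def flag_runs(runs, max_cv):
    """Return the names of runs whose bead-peak CV exceeds max_cv (percent)."""
    return [name for name, values in runs.items() if percent_cv(values) > max_cv]


# Made-up bead intensities from two QC runs (real runs have thousands of events).
bead_runs = {
    "monday": [990, 1000, 1010],
    "tuesday": [900, 1000, 1100],
}

# Arbitrary placeholder threshold, not a recommended value.
HYPOTHETICAL_MAX_CV = 3.0

flagged = flag_runs(bead_runs, HYPOTHETICAL_MAX_CV)
print("Runs needing attention:", flagged)
```

A run that is flagged in this way would be a prompt to check alignment and fluidics before acquiring real samples.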
DNA analysis
To be able to best tell cells in G1 from those in S we need the CV of the G1 peak to be as low as possible. We have seen how this can be done in a previous Bitesize Bio article and Figure 1 shows the difference between a sample with low CV (left) and high CV (right).
Figure 1. Differences in CVs between samples. a) Sample with low CV; and b) sample with high CV.
Cell Proliferation
When using the dye dilution technique, we need a low CV to be able to tell generations apart, and the value of the CV will give us confidence (or not!) in the results.
Knowledge of these relatively simple statistical metrics will help you assess your experiments and will form the basis of further tests that you can apply to assess the significance of your results.