
Let’s Talk About Stats: Comparing Two Sets of Data

There are so many statistical tests out there that it can be difficult to determine which is the right one to use. Below is a simple diagram to help you quickly determine which test is right for you. Although this is by no means a comprehensive guide, it includes some of the most common tests and situations you will encounter.

Today I will focus on the left side of the diagram and talk about statistical tests for comparing two sets of data.

[Figure: diagram for choosing a statistical test]

Student's t-test

The Student's t-test (or t-test for short) is the most commonly used test to determine if two sets of data are significantly different from each other.

A wonderful fact about the Student's t-test is the derivation of its name. Interestingly, it was not named because it's a test used by students (which was my belief for far too many years). In fact, the Student's t-test was created by a chemist, William Sealy Gosset, who worked for Guinness (yes, the beer company). Gosset used the pen name "Student" to prevent other breweries from discovering Guinness's use of statistics for brewing beer. Who would have thought that statistics and alcohol go so well together?

To perform a t-test, your data need to be continuous and normally distributed (or nearly so), and the variances of the two sets of data need to be the same (check out last week’s post to understand these terms better).

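The t-test comes in both paired and unpaired varieties. Data are paired when each measurement in one set is matched to a specific measurement in the other (for example, the same subject measured before and after a treatment). In general, most data in biology tends to be unpaired. If you’re not 100% sure whether your data is paired or not, err on the side of caution and assume it isn’t. Using an unpaired t-test on paired data mainly costs you statistical power, so you may miss a real difference, but it is unlikely to produce a spurious one. However, if you use a paired t-test on unpaired data, the pairing is arbitrary and the result depends on it, so you can get a significant result when there is actually no real difference: a Type I error (false positive).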
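To make the difference between the two varieties concrete, here is a minimal sketch that computes both t statistics by hand using only Python's standard library. The function names and the sample numbers are made up for illustration, and the sketch stops at the t statistic and degrees of freedom; turning those into a p-value needs the t distribution (for example from a statistics package or a printed table).

```python
import math
import statistics


def unpaired_t(a, b):
    """Student's t statistic for two independent samples (equal variances assumed)."""
    na, nb = len(a), len(b)
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2  # (t statistic, degrees of freedom)


def paired_t(a, b):
    """Paired t statistic: a one-sample t-test on the per-pair differences."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical measurements on the same five subjects, before and after a treatment.
before = [10, 12, 14, 16, 18]
after = [11, 14, 15, 18, 20]

t_unpaired, df_unpaired = unpaired_t(before, after)
t_paired, df_paired = paired_t(before, after)
print(f"unpaired: t = {t_unpaired:.3f}, df = {df_unpaired}")
print(f"paired:   t = {t_paired:.3f}, df = {df_paired}")
```

With these made-up numbers the paired statistic is much larger in magnitude than the unpaired one, because pairing removes the subject-to-subject variation. That only helps when the pairing is real.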

Mann-Whitney U test

The Mann-Whitney U test, also called the Mann–Whitney–Wilcoxon (MWW), Wilcoxon rank-sum, or Wilcoxon–Mann–Whitney test, is used for unpaired samples and is a non-parametric test (it works on ranks and does not require the data to be normally distributed). It is therefore generally less powerful than the unpaired t-test when the t-test’s assumptions are met, but you can rely more on the fact that any significance you find is real.

The Mann-Whitney U test is performed by converting your data into ranks and analyzing the difference between the rank totals of the two groups, providing a statistic, U (conventionally the smaller of the two values computed for the two groups). The smaller the U, the less likely it is that the differences have occurred by chance. Determining whether something is significant with the Mann-Whitney U test involves the use of tables that provide a critical value of U for a particular significance level: the result is significant if your U is at or below the critical value. The critical value varies depending on the significance level chosen as well as the number of observations in each group (which is not required to be equal for this test).
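The ranking procedure described above can be sketched in a few lines of standard-library Python. This is an illustrative implementation under stated assumptions, not a replacement for a statistics package: the function name is made up, tied values are given the average of the ranks they span, and the function returns the smaller of the two U values, which is the one compared against a critical-value table.

```python
def mann_whitney_u(a, b):
    """Return the Mann-Whitney U statistic (the smaller of the two group U values)."""
    # Pool both groups, remembering which group each value came from.
    combined = sorted([(v, 0) for v in a] + [(v, 1) for v in b])

    # Assign 1-based ranks; tied values share the average of their ranks.
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    # Rank total for group a, then the U value for each group.
    rank_total_a = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    na, nb = len(a), len(b)
    u_a = rank_total_a - na * (na + 1) / 2
    u_b = na * nb - u_a
    return min(u_a, u_b)


# Completely separated groups give the smallest possible U.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # 0.0
# Interleaved groups give a larger U.
print(mann_whitney_u([1, 4, 5], [2, 3, 6]))  # 4.0
```

Note that the group sizes do not have to match; `na` and `nb` are handled separately throughout.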

These are the two most common tests for analyzing two sets of data. Stay tuned for next week, when we add in more datasets and talk about the right-hand side of the diagram.

1 Comment

  1. Deep on March 3, 2017 at 5:56 pm

    Hi Laura,

    Thanks for putting all the details in a layman-friendly explanation. I do not know if you still maintain the comment threads, but do you know of any way that I can formally differentiate two sets of data, with a numerical “score” that quantifies the amount of difference? I have read a few articles, and seems like the best bet is KL divergence. What do you think?
