Time for T: How to Use the Student’s T-test

To pull together our discussions so far on hypothesis testing and p-values, we will use the t distribution as an example to see how it all works. The t distribution (you may have heard it called Student’s t) is a probability distribution with a bell-shaped curve, much like the normal distribution but with slightly heavier tails, especially when the sample is small.

If we sample repeatedly from a population in which the null hypothesis is true, the t distribution shows the long-run probabilities of various t values occurring.

But what is a t value?

We calculate a t statistic from our data set. Here’s the calculation for testing a sample mean:

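t = (sample mean − hypothesised mean) / (standard error of the mean)

where the standard error of the mean is the sample standard deviation divided by the square root of the sample size (s / √n).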

If the null hypothesis is true, the sample mean would likely be close to the hypothesised value (e.g. the sample mean could equal 52, close to the hypothesised mean of 50). This would leave us with a numerator (above the line) close to zero, which in turn gives a t statistic close to zero.

[Figure: worked t calculation for a sample mean of 52 against a hypothesised mean of 50]

If, on the other hand, our sample mean is further away from the hypothesised mean (e.g. 63), the resulting t statistic is larger.

[Figure: worked t calculation for a sample mean of 63, giving t = 2.60]
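To make the two cases concrete, here is a minimal Python sketch of the calculation. The original worked figures are not reproduced here, so the standard deviation (12.25) and sample size (6) below are assumptions, chosen only so that a sample mean of 63 gives the t statistic of 2.60 quoted later in this article; the function name is also just illustrative.

```python
import math

def t_statistic(sample_mean, hypothesised_mean, sample_sd, n):
    """One-sample t: distance from the hypothesised mean, in standard errors."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - hypothesised_mean) / standard_error

# ASSUMED values (not from the article): picked so that a mean of 63
# reproduces the t of 2.60 with 5 degrees of freedom (n - 1 = 5).
ASSUMED_SD = 12.25
ASSUMED_N = 6

t_close = t_statistic(52, 50, ASSUMED_SD, ASSUMED_N)  # sample mean near 50
t_far = t_statistic(63, 50, ASSUMED_SD, ASSUMED_N)    # sample mean far from 50

print(f"mean 52: t = {t_close:.2f}")
print(f"mean 63: t = {t_far:.2f}")
```

With these assumed inputs the first t statistic comes out near 0.4 and the second near 2.6: the same standard error, but a much bigger numerator.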

We then look to see where our calculated t statistic lies on the t distribution. Because the curve is bell-shaped, t values cluster around the centre of the distribution, which is zero. Values further away from zero (i.e. toward the tails of the distribution) are not impossible if the null hypothesis is true, but they are unlikely.
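One way to see this clustering is to simulate it. The sketch below repeatedly draws small samples from a population in which the null hypothesis is true, calculates a t statistic for each, and reports how often the result lands far from zero. The population (normal, mean 50, standard deviation 10), the sample size of 6, and the function names are all assumptions made for illustration.

```python
import math
import random
import statistics

def one_sample_t(sample, hypothesised_mean):
    """t = (sample mean - hypothesised mean) / (s / sqrt(n))."""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - hypothesised_mean) / standard_error

def simulate_t_values(n_repeats=20000, n=6, true_mean=50.0, true_sd=10.0, seed=1):
    """Sample repeatedly from a population where the null hypothesis is true."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        one_sample_t([rng.gauss(true_mean, true_sd) for _ in range(n)], true_mean)
        for _ in range(n_repeats)
    ]

def fraction_beyond(t_values, threshold):
    """Share of simulated t values further than `threshold` from zero."""
    return sum(abs(t) > threshold for t in t_values) / len(t_values)

t_values = simulate_t_values()
for threshold in (1, 2, 3):
    print(f"|t| > {threshold}: {fraction_beyond(t_values, threshold):.3f}")
```

The fractions shrink quickly as the threshold grows: large t values do turn up when the null hypothesis is true, just not very often.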

Statistical tables for the t distribution are readily available online and in textbooks. They give us critical values for the t distribution at various levels of significance. For instance, here is the alpha = 0.05 table we looked at when discussing degrees of freedom.

[Table: critical values of the t distribution at alpha = 0.05, by degrees of freedom]

The underlying distribution is identical in the various tables; they vary only in what percentage of the distribution is being shown. Our table tells us, for a given number of degrees of freedom, the value beyond which 5% of the distribution lies, counting both tails together. For example, when df = 5, the critical value is 2.57. That means 5% of the distribution lies beyond ±2.57 (2.5% in each tail), so if our calculated t statistic is 2.57 or more away from zero in either direction, we can reject our null hypothesis at the 5% level.
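Looking a value up in the table and comparing it with the calculated t statistic can be sketched in a few lines of Python. The dictionary holds the standard two-tailed critical values at alpha = 0.05 for 1 to 10 degrees of freedom, rounded to three decimals; the names `CRITICAL_T_05` and `reject_null` are made up for this example.

```python
# Two-tailed critical values of t at alpha = 0.05, keyed by degrees of freedom.
CRITICAL_T_05 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
}

def reject_null(t_stat, df, table=CRITICAL_T_05):
    """Reject when t is at least as far from zero as the critical value."""
    return abs(t_stat) >= table[df]

print(reject_null(2.60, 5))  # beyond 2.571, so True
print(reject_null(0.40, 5))  # well inside 2.571, so False
```

Notice how the critical value falls as the degrees of freedom increase: with more data, a smaller t statistic is enough to reject the null hypothesis.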

At this stage we tie back into p-values. Often p-values are presented as some magical number, with researchers perhaps unsure where they actually came from. Here’s the secret: there’s nothing magic about them. As we’ve already discussed, p-values tell us the probability of obtaining our t statistic, or one more extreme, given the null hypothesis is true. That is, what area of the t distribution lies beyond our calculated t statistic?

[Figure: the t distribution for 5 degrees of freedom, with lines marking the critical value (2.57) and the calculated t statistic (2.60)]

We’ve already worked out that for 5 degrees of freedom, the critical t value is 2.57; 2.5% of the distribution lies to the right of the line marking 2.57, and another 2.5% lies to the left of −2.57. As shown above, if our sample mean was 63 we get a calculated t statistic of 2.60. The area to the right of this line is the probability of getting a t value this large or larger. In this case, the answer is about 2.4% of the distribution, which is a one-tailed p-value of roughly 0.02. For the two-tailed test that the 2.57 critical value belongs to, we also count the matching area to the left of −2.60, giving a p-value of about 0.048, just under 0.05. This is that magic number that your statistics software spits out; hopefully it doesn’t seem so magical now.
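If you would like to check the area yourself rather than trust the software, the sketch below adds up the area under the t distribution curve numerically (using Simpson's rule) with nothing but the Python standard library. It is an illustration of where the number comes from, not how any particular statistics package does it, and the function names are invented for this example.

```python
import math

def t_pdf(x, df):
    """Height of the t distribution curve at x for `df` degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_area(t_stat, df, steps=20000):
    """Area to the right of a positive t_stat.

    Half the curve lies right of zero, so subtract the area between
    0 and t_stat (Simpson's rule; `steps` must be even) from 0.5.
    """
    h = t_stat / steps
    total = t_pdf(0, df) + t_pdf(t_stat, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

one_tailed = upper_tail_area(2.60, 5)  # area to the right of 2.60
two_tailed = 2 * one_tailed            # add the mirror-image left tail

print(f"one-tailed p = {one_tailed:.3f}")
print(f"two-tailed p = {two_tailed:.3f}")
```

Doubling the one-tailed area works because the t distribution is symmetric about zero.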

Next up: using all this information to start working through some statistical examples! In the meantime, post your comments below on how you’re finding the stats series so far and what you’d like me to write about next.

Image Credit: Daeveb

4 Comments

  1. alromana on May 10, 2012 at 12:41 pm

    Thank you for your excellent articles on statistical issues, which can be easily digested by biologists, even without sufficient knowledge of mathematics. I wonder why you have stopped the series?! Hopefully, it is just a temporary break.

    I would also like to know the very basic stuff in descriptive statistics, i.e. in which cases must we use the mean with standard error, and when is it preferable to use the median and confidence interval to present the data?

    • Sarah-Jane O'Connor on October 7, 2012 at 6:30 am

      Hi,
      My deepest apologies for not replying – I only just saw your comment right now. I’ve recently submitted my PhD, which is why the series took such an abrupt break.
      BUT, I’m back, and I’ve noted your comments down – I’m ready to address them!

  2. patientgrl on January 25, 2012 at 4:04 pm

    Ooooo, ooooo! Can you explain when to use separate variance t-tests as opposed to pooled variance t-tests? What about one-tailed vs. two-tailed? Thanks.

    • Sarah-Jane O'Connor on October 7, 2012 at 6:30 am

      On the cards! Thanks for reading and being so interested.
