- make up some data by pulling random numbers from a normal distribution (with rnorm) or a uniform distribution (with runif).
- plot the data. I tried to use a variety of methods, trying to choose what I would start with when examining this type of data,
- build the model and save it as a variable (skipped on the simpler tests), then
- get useful information out of the model. Remember, if you ever want more information on a function, just run ?function_name!

## Unpaired t-test

Used to compare the means of groups of data where each datapoint is independent, when the data from each group are similarly spread out around the group mean (i.e. have similar variance). 1.make up data**group1_data = rnorm(10,1,2)**

**group2_data = rnorm(10,6,2)**2. plot data

**boxplot(group1_data,group2_data)**3. do test

**t.test(group1_data,group2_data)**

## Paired t-test

A paired t-test is used when each observation in one group has a matching observation in the other group. For example, you measured fluorescence in 50 cells, then dosed them with some drug then measured fluorescence again. For this to work right, the individuals (here, cells) need to be in the same place in the data in the before and after group. Like the unpaired t-test, you need to be able to assume that the observations are roughly normal (look like a bell curve when you plot them as a histogram). 1. make up data**before_data = seq(from=2,to=20,length.out=10)+rnorm(10,0,2)**

**after_data = before_data-rnorm(10,5,2)**2. plot data

**matplot(c(1,10),rbind(before_data,after_data),type=’l’,axes=F,ylab=’measure’)**

**axis(2)**

**axis(1,c(1,10),label=c(‘before’,’after’))**3. do test

**t.test(before_data-after_data)**

**t.test(before_data,after_data,paired=TRUE)**# same thing

## Mann-Whitney

This is what you use when you want to do an unpaired t-test, but you don’t think your data are from normal distributions with similar variance. It’s kind-of more foolproof, but it is also less likely to find real differences when they are actually there. You can see in the example below that the example data is pulled from uniform distributions, and that the data don’t look like bell curves in the histograms. 1. make up data**group1_data = runif(20,min=3,max=10)**

**group2_data = runif(20,min=6,max=15)**

**all_data = c(group1_data,group2_data)**

**group_labels = c(rep(‘group1’,20),rep(‘group2’,20))**2. plot data

**par(mfrow=c(1,2))**(this cool command lets you plot multiple plots in one figure)

**hist(group1_data,breaks=5,main=’histogram of group 1′, xlim=c(0,20))**

**hist(group2_data,breaks=5,main=’histogram of group 2′, xlim=c(0,20))**3. do test

**wilcox.test(all_data ~ group_labels)**

## One-way ANOVA

Used to see if at least one group out of three or more is different from the rest of the groups. Here, we need to be able to assume that the *residuals *of the model are normally distributed, if your sample size is low. If it is pretty high (>50 per group), it’s probably okay even if the data aren’t actually normal. I think it is important to point out here that an ANOVA never tells you which group(s) are different. Instead, if you get a significant answer it only tells you that at least one group is different. From there, you can just take a look at the data to figure out why, or do a multiple comparisons, one of which I’ve tossed in here (a Tukey’s Honestly Significant Difference test). 1. make up data**group1_data = rnorm(10,1,2)**

**group2_data = rnorm(10,6,2)**

**group3_data = rnorm(10,6,2)**

**all_data = c(group1_data,group2_data,group3_data)**

**group_labels=c(rep(‘group1’,10),rep(‘group2’,10),rep(‘group3’,10))**2. plot data

**plot(factor(group_labels),all_data)**3. do test

**my_anova_model = aov(all_data ~ factor(group_labels))**

**summary(my_anova_model)**

**plot(my_anova_model)***# (to do a visual inspection of the ANOVA assumptions)

**TukeyHSD(my_anova_model)**

## Kruskal-Wallace

Just like the t-test needed a partner that could compare two groups when you weren’t sure if they were from normally-distributed populations, the one-way ANOVA also needed a non-normal partner. So the Kruskal-Wallace test looks to see if at least one group out of three or more is different from the others, and does not assume normally-distributed residuals. Similar to the Mann-Whitney, it is also less powerful but more fool-proof. 1.make up data**group1_data = runif(15,min=1,max=2)**

**group2_data = runif(15,min=1,max=5)**

**group3_data = runif(15,min=1.4,max=3)**

**all_data = c(group1_data,group2_data,group3_data)**

**group_labels=c(rep(‘group1’,15),rep(‘group2’,15),rep(‘group3’,15))**2.plot data

**par(mfrow=c(3,1))**

**hist(group1_data,xlim=c(0,6))**

**hist(group2_data,xlim=c(0,6))**

**hist(group3_data,xlim=c(0,6))**3.do test

**kruskal.test(all_data ~ factor(group_labels))**

## Two-way ANOVA

Here is where the tests start getting pretty complicated, and the last one I’ll introduce is the two-way ANOVA. A two-way ANOVA is for when you’ve kind-of done two experiments simultaneously, ending up with (at least) four groups of data. For example, you are measuring fluorescence in cells. In half the cells, you do a knock-down of some gene. Separately, you treat half the cells with a drug. This ends up with four different groups: (1) knock-down, no drug, (2) knock-down, with drug, (3) no knock-down, no drug, (4) no knock-down, with drug. We say the treatments are in different ‘factors’: drug treatment and knock-down treatment. The cool thing about a two-way ANOVA, and the reason we analyze this kind of data differently from a one-way ANOVA, is that the two way can tell you if the drug had any general effect, if the knock-down had any general effect, and if there was interaction between the two (e.g. the knock-down only had an effect when the drug was present). 1.make up data**group1_data = rnorm(10,1,2) #no drug, knockdown**

**group2_data = rnorm(10,3,2) #no drug, no knockdown**

**group3_data = rnorm(10,6,2) #drug, knockdown**

**group4_data = rnorm(10,3,2) #drug, no knockdown**

**factor1 = c(rep(‘no drug’,20),rep(‘drug’,20))**

**factor2 = c(rep(‘knock down’,10),rep(‘no knock down’,10),rep(‘knock down’,10),rep(‘no knock down’,10))**

**all_data = c(group1_data,group2_data,group3_data,group4_data)**2.plot data

**interaction.plot(factor1,factor2,all_data)**3.do test

**my_2way_anova_model = aov(all_data ~ factor(factor1)*factor(factor2))**

**my_2way_anova_model**

**summary(my_2way_anova_model)**

**plot(my_2way_anova_model)**Now that you have the code, you are ready to roll! Stay tuned for more!