Statistics is one of the most hated subjects by biologists around the globe. In spite of its daily dose of abuse, knowledge of statistics can be a life-saver.

The chi square distribution is one of the most important and widely used probability distributions in inferential statistics for biology and life science students, and the chi square test is built on it. The test is generally based on the proportions of variables present in the experimental condition.

The Chi square test is popular because of its many strengths, including:

• Easy to compute
• Can also be used for data collected on the nominal (categorical) scale
• Can be used to study the difference among the various variables in consideration
• Does not make assumptions about the distribution of data (e.g. normality)
• Can be used for large data sets

Because of its popularity, I thought I’d review it for you here.

## How is the chi square test used?

The chi square test can be used in two ways:

### 1.  Goodness of Fit Test

The goodness of fit test is used when you have a widely accepted theory and want to check whether your observed values agree with that theory.

• The goodness of fit test is normally used in genetics where the genotypic and phenotypic ratios have already been established for a given test and population.
• You can also use this test whenever the expected outcome has already been established. For example, suppose you want to interpret the outcome of a field experiment based on the test cross given by Mendel, and the results do not seem to match the accepted theory. In this case you can check the p value of the chi square goodness of fit test to determine whether the observed values agree with the theory [a similar example is explained later]. If the p value is <0.05, your observed values differ significantly from the theory. If it is not, your results are consistent with the theory.

### 2.  Test for Independence of Attributes

The test for independence of attributes, or χ² test of association of attributes, is used to find out whether two attributes are related to one another. It tests whether the proportions of one variable differ across the categories of the other variable. Typical uses include:

• Comparison of parameters/attributes among control and test populations
• Evaluation of correlation of disease symptoms with the disease in case of clinical data

## Steps for the proper calculation and understanding of the chi square test

The following steps are generalized steps that could be used for both Goodness of Fit Test as well as for Test of Independence of Attributes.

There are 3 simple steps for every chi square test:

### 1.  Develop your hypotheses

It may seem obvious, but to perform any statistical test, you must first define what you are testing. This is based on the hypothesis of your research. For the chi square test, you need a null hypothesis and an alternative hypothesis.

#### Formulate the null hypothesis

The null hypothesis (H0), also known as the hypothesis of NO difference, states that there is no difference in the results before and after the test is performed. For example, let’s say you want to understand the effect of sunlight on plant growth. In this case, your null hypothesis will state that sunlight has no effect on plant growth.

#### Formulate the alternative hypothesis

The alternative hypothesis (H1) is always the opposite of the null hypothesis. In our plant study, the alternative hypothesis would be that sunlight affects plant growth.

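### 2.  Do your calculations

Once you know what you are testing, you can apply the calculations. There are many different software packages that you can use, but the formula for testing is:

χ² = Σ [(O − E)² / E]

where O is the observed value and E is the expected value for each category, and the sum is taken over all categories.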
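As a minimal sketch, the standard chi square statistic, the sum of (observed − expected)² / expected over all categories, could be computed in Python as follows. The function name `chi_square_statistic` and the counts are hypothetical and only illustrate the arithmetic.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over paired observed/expected counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: four categories, each expected 25 times out of 100.
statistic = chi_square_statistic([30, 20, 28, 22], [25, 25, 25, 25])
print(round(statistic, 2))
```

The larger the statistic, the further the observed counts are from the expected ones.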

#### Determine the degrees of freedom (df)

df is the number of parameters of the system that may vary independently without violating any constraint imposed on it. The degrees of freedom can be easily determined using a matrix system. If you are working with a matrix of 2 (rows) x 3 (columns), then the degrees of freedom are:

df = (2-1) x (3-1) = 2

Note: Every chi square calculation can be represented as a matrix, as shown in the examples later in this post.
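The df rule above is simple enough to write as two small helpers. This is only a sketch: the function names are hypothetical, and the goodness of fit version assumes the usual convention of categories minus one, with no parameters estimated from the data.

```python
def df_independence(n_rows, n_cols):
    """Degrees of freedom for an r x c matrix: (r - 1) x (c - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def df_goodness_of_fit(n_categories):
    """Degrees of freedom for a goodness of fit test: categories - 1."""
    return n_categories - 1


print(df_independence(2, 3))   # the 2 x 3 matrix from the text -> 2
print(df_goodness_of_fit(4))   # four categories -> 3
```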

### 3.  Find p

Use the chi square tables (an example is attached to this post) to determine the p value. You should only check the p value corresponding to the degrees of freedom calculated in step 2.
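If you do not have a table to hand, the p value can also be computed directly. The sketch below uses only Python's standard `math` module and the closed-form expressions for the upper tail of the chi square distribution with a whole-number df. The function name `chi_square_p_value` is hypothetical, and a statistics package would normally do this for you.

```python
import math


def chi_square_p_value(statistic, df):
    """Upper-tail probability P(X >= statistic) for integer df >= 1."""
    if df < 1 or statistic < 0:
        raise ValueError("df must be >= 1 and statistic must be >= 0")
    half = statistic / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = 1.0
        total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k-1/2) / Gamma(k+1/2)
    total = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += half ** (k - 0.5) / math.gamma(k + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


# The table values 3.841 (df=1) and 7.815 (df=3) should both give p close to 0.05.
print(round(chi_square_p_value(3.841, 1), 3))
print(round(chi_square_p_value(7.815, 3), 3))
```

A p value above your chosen α means the statistic is below the table's critical value for that df.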

## Now, let’s have a look at performing these tests with a few examples

### Example of Goodness of Fit Test

Assume that you have crossed pure breeding plants of genotype A/A, B/B and a/a, b/b and obtained the di-hybrid A/a, B/b. You then test crossed it to a/a, b/b. The resulting F1 generation matrix of the offspring was:

#### Step 1: Develop your hypotheses

H0 = the resulting F1 generation is in accordance with the established theory (1:1:1:1).

H1 = the resulting F1 Generation is not in accordance with the established theory.

#### Step 2: Do your calculations

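df = 3. For a goodness of fit test, df is the number of categories minus one; here there are four genotype classes (A/B, a/b, A/b and a/B), so df = 4 - 1 = 3. The matrix formula gives the same answer: df = (r-1) x (c-1), where r = 4 is the number of genotype classes being studied and c = 2 is the number of conditions in which they are compared (observed values and expected values), so df = (4-1) x (2-1) = 3.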

#### Step 3: Find p

Find the critical χ² value from the chi square table at df=3

Critical χ² = 7.81 at α=0.05

Result: You cannot reject H0, because the results are in accordance with the established theory: the calculated χ² = 5.2 is smaller than the critical value of 7.81 at α=0.05. α is called the significance level (α=0.05 corresponds to a 95% confidence level). An α of 0.05 is acceptable when the sample size is >30. However, if the sample size is <30, the stricter α=0.01 is used, so that 99% of the curve is accepted.
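The decision in this example can be sketched in Python. The observed counts below are hypothetical: they are not the data from this post, and were picked only so that the statistic comes out at the 5.2 quoted above for a 1:1:1:1 ratio. The function name `goodness_of_fit` is also just an illustration.

```python
def goodness_of_fit(observed, ratio):
    """Return (expected counts, chi square statistic) for a theoretical ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    expected = [total * r / ratio_sum for r in ratio]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, statistic


# Hypothetical offspring counts for the four genotype classes (NOT the post's data).
observed = [33, 17, 26, 24]
expected, statistic = goodness_of_fit(observed, [1, 1, 1, 1])

critical_value = 7.81  # table value quoted in the text for df = 3, alpha = 0.05
print(expected, round(statistic, 2))
print("reject H0" if statistic > critical_value else "fail to reject H0")
```

Because the statistic is below the critical value, H0 (the 1:1:1:1 ratio) is not rejected.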

### Example for Independence of Attributes

Assume that in a scenario you have two groups of patients: one diseased and the other non-diseased. 37/54 lucky (or rather unlucky!!) diseased and 13/66 non-diseased individuals were chosen for the administration of drug 1.

#### Step 1: Develop your hypotheses

H0 = Drug 1 does not improve the disease condition

H1 = Drug 1 improves the disease condition

#### Step 2: Do your calculations

• Observed matrix table

• Expected matrix table

df=(r-1) x (c-1), in this case r = 2 (number of conditions under observation, i.e. diseased and non-diseased) and c = 2 (for treated and untreated groups), so df = (2-1) x (2-1) = 1
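For a test of independence, each expected value is usually calculated as (row total x column total) / grand total. A minimal sketch of that calculation follows. The 2 x 2 counts are hypothetical and are not the patient data from this example, the function names are illustrative, and no continuity correction is applied.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def independence_statistic(table):
    """Return (chi square statistic, df) for an observed r x c table."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical observed matrix: rows = diseased / non-diseased,
# columns = treated / untreated.
observed = [[20, 30], [30, 20]]
print(expected_counts(observed))
print(independence_statistic(observed))
```

The statistic is then compared with the table value for the returned df, exactly as in step 3.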

#### Step 3: Find p

Find the critical χ² value from the chi square table at df=1

Critical χ² = 3.841 at α=0.05

Result: You cannot reject H0, because the calculated χ² = 2.89 is smaller than the critical value of 3.841 at α=0.05. In other words, the data do not show that drug 1 improves the disease condition.

## When can you not use chi square test?

Although chi square is a powerful statistical test, it cannot be used in all situations. In particular, it is not valid:

• When the sampling is biased. For example, if you deliberately choose the larval stages of insects for study over the pupal and adult stages even though they are present at the collection site, the sampling is considered biased and accurate entomological deductions cannot be made using chi square statistics.
• When the sample size is very small. An expected count below 5 in any cell is usually considered too small, and as a general guide the test should not be used when the total sample size is less than 50. For smaller sample sizes, Fisher’s exact test may be used instead.
• In the case of dependent variables, where the presence/absence of variable B always depends on the presence/absence of variable A.
• When the data are anything other than frequency data. For example, if you are counting how many patients show resistance to a particular drug versus how many show susceptibility, then a chi square test is appropriate. If the data are in any other format, the chi square test cannot be used.
• When the strength of a relationship is required. The chi square test only tells you whether two variables are independent; it cannot be used to determine how strongly they are associated.
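The small-sample caveat in the list above is often checked on the expected counts before running the test. The sketch below assumes the common rule of thumb that every expected count should be at least 5. The function name, the default threshold and the numbers are illustrative assumptions, not a fixed standard.

```python
def cells_below_threshold(expected, threshold=5):
    """List (row, column, value) for every expected count below the threshold."""
    return [
        (i, j, value)
        for i, row in enumerate(expected)
        for j, value in enumerate(row)
        if value < threshold
    ]


# Hypothetical expected matrix with a sparse second row.
expected = [[22.5, 31.5], [2.5, 3.5]]
problem_cells = cells_below_threshold(expected)
if problem_cells:
    print("Chi square may be unreliable here; small expected counts:", problem_cells)
```

An empty list means no expected count falls below the threshold.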

Hopefully this has helped you understand what a Chi square test is and when and how to use it.