Statistical Hypothesis Testing: Steps, Errors, Interpretation

What is hypothesis testing?

Hypothesis testing is used to determine whether a premise about a statistical parameter is supported by data. The goal is to make decisions about a population based on sample data drawn from that population.

How is hypothesis testing done?

A hypothesis is tested by forming a null hypothesis and an alternate hypothesis. The null hypothesis states that the prevailing belief or premise about a statistical parameter is true. The alternate hypothesis states that the prevailing belief or premise is not true.

The null and alternate hypotheses are mutually exclusive: if one is true, the other must be false. Both are therefore proposed together for the same statistical parameter.

Let us go through the steps of conducting hypothesis testing:

  1. Propose a null hypothesis (H0) and an alternate hypothesis (Ha) for the statistical parameter you want to draw conclusions about in the population.
  2. Specify the significance level (α) for rejecting the null hypothesis. The significance level is the probability of rejecting the null hypothesis when it is actually true. The researcher decides on the significance level based on the research problem. Generally, as a rule of thumb, an alpha level of 0.05 (5%) is used.
  3. Conduct the experiments, and collect data to run statistical tests.
  4. Select an appropriate statistical test and calculate the test statistic and p-value. (The p-value is the probability, assuming the null hypothesis is true, of obtaining results at least as extreme as those observed.)
  5. Analyze the output and form conclusions.
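
The five steps can be sketched in code. The example below is a minimal illustration, not a prescribed procedure: it assumes a two-sided one-sample z-test with a known population standard deviation, and the packet-weight numbers (`mu0`, `sample_mean`, `sigma`, `n`) and the helper name `one_sample_z_test` are hypothetical, chosen only for the example.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Return (z, two-sided p-value) for H0: population mean == mu0."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Probability, under H0, of a statistic at least as extreme as |z|.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Step 1: H0: mean fill weight is 500 g; Ha: mean fill weight is not 500 g.
mu0 = 500.0
# Step 2: choose the significance level.
alpha = 0.05
# Step 3: collected data, summarised (hypothetical values).
n, sample_mean, sigma = 36, 498.2, 4.0
# Step 4: compute the test statistic and p-value.
z, p_value = one_sample_z_test(sample_mean, mu0, sigma, n)
# Step 5: form a conclusion.
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.2f}, p = {p_value:.4f}: {decision}")
```

With these made-up numbers the statistic is z = -2.7 and the p-value falls below 0.05, so the null hypothesis is rejected.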

Interpretation of the test results

Based on the output of the statistical test, the p-value is compared with the significance level (α). Usually, the significance level for a study is set at 0.05 (5%). If the p-value is below the significance level, the null hypothesis is rejected: the results are statistically significant and support the alternative hypothesis. If the p-value is greater than or equal to the significance level, the null hypothesis is not rejected and the results are not statistically significant.

Interpreting a hypothesis test thus helps in making decisions about the validity of a hypothesis on the population, based on statistical tests run on sample data drawn from that population.

Types of Error in Hypothesis Testing

A few types of errors can occur in hypothesis testing when the conclusion of the statistical test differs from the actual state of the population. These types of errors are:

  1. Type I error – A Type I error is a false positive: the null hypothesis is rejected even though it is true, because results that appear statistically significant actually arose purely by chance or from unrelated factors. The probability of this error equals the significance level (α), so it can be reduced by choosing a lower alpha value.
  2. Type II error – A Type II error means failing to reject the null hypothesis when it is actually false. This is not the same as accepting the null hypothesis, as a test can only conclude whether or not to reject the null hypothesis. A Type II error occurs when the study fails to detect the effect of a stimulus on a statistical parameter when there actually is one. This type of error can be made less likely by increasing the statistical power of the study. The Type II error rate is also known as beta (β).
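
Both error rates can be estimated by simulation. The sketch below is only illustrative: it assumes a two-sided one-sample z-test on normally distributed data with known standard deviation, and the function names, sample size, effect size and trial count are all hypothetical choices.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def rejects_h0(sample, mu0, sigma, alpha):
    """Two-sided one-sample z-test; True when H0: mean == mu0 is rejected."""
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha


def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=25, alpha=0.05,
                   trials=4000, seed=0):
    """Fraction of simulated samples in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = sum(
        rejects_h0([rng.gauss(true_mean, sigma) for _ in range(n)],
                   mu0, sigma, alpha)
        for _ in range(trials)
    )
    return rejections / trials


# H0 is true here, so every rejection is a Type I error.
type_1_rate = rejection_rate(true_mean=0.0)
# H0 is false here, so rejections are correct and the rate is the power.
power = rejection_rate(true_mean=0.5)
# Failing to reject a false H0 is a Type II error (beta = 1 - power).
type_2_rate = 1 - power
# A lower alpha reduces the Type I error rate on the same simulated samples.
type_1_rate_strict = rejection_rate(true_mean=0.0, alpha=0.01)

print(type_1_rate, type_1_rate_strict, power, type_2_rate)
```

The estimated Type I error rate should land close to the chosen alpha, and lowering alpha lowers it further. Increasing `n` in the second call raises the power and so shrinks the Type II error rate.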

Hypothesis Testing

Hypothesis testing is the process of testing the validity of a hypothesis or supposition about a statistical parameter. Analysts use it to determine whether or not a hypothesis is reasonable. For example, hypothesis testing could be used to find out whether a certain drug is effective in treating headaches. It uses data from a sample to draw conclusions about a statistical parameter. Hypothesis testing is an important step because it validates claims about a statistical parameter before they are used to make inferences about a population or a large data set.

Types of Hypothesis

In data sampling, different types of hypotheses are used to state what is being tested against the sample.

  1. Alternative Hypothesis (H1) – This hypothesis states that there is a relationship between two variables (where one variable affects the value of other variable). The relationship that exists between the variables is not due to chance or coincidence.
  2. Null Hypothesis (H0) – This hypothesis states that there is no relationship between two variables. It states that any apparent effect of one variable on another is entirely due to chance.
  3. Non-Directional Hypothesis – It states that there is a relationship between two variables, but that the direction of influence is unknown.
  4. Directional Hypothesis – It states the direction of effect of the relationship between two variables.

The alternative and null hypotheses are used to study data samples for a possible pattern and to form a statistical hypothesis that can be validated through hypothesis testing. The alternative hypothesis and the null hypothesis cannot be true at the same time, as they are mutually exclusive. Similarly, an alternative hypothesis is stated as either non-directional or directional, not both.
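
The choice between a directional and a non-directional hypothesis changes how a p-value is computed from the same test statistic. The sketch below assumes a z statistic has already been calculated and that it follows a standard normal distribution under the null hypothesis. The function name `p_values` and the value z = 1.8 are hypothetical.

```python
from statistics import NormalDist


def p_values(z):
    """p-values for one z statistic under three forms of the alternative."""
    std_normal = NormalDist()
    return {
        # Non-directional: the parameter differs from the null value.
        "two_sided": 2 * (1 - std_normal.cdf(abs(z))),
        # Directional: the parameter is greater than the null value.
        "greater": 1 - std_normal.cdf(z),
        # Directional: the parameter is less than the null value.
        "less": std_normal.cdf(z),
    }


result = p_values(1.8)
for alternative, p in result.items():
    print(f"{alternative}: p = {p:.4f}")
```

With z = 1.8, the directional "greater" p-value is about 0.036 while the non-directional p-value is about 0.072, so at α = 0.05 only the directional test rejects the null hypothesis. This is why the direction should be fixed before looking at the data.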

Methods of Hypothesis Testing

  1. Frequentist Hypothesis Testing – This is the traditional approach to hypothesis testing. It draws conclusions from the current data alone, by asking how likely the observed data would be if the hypothesis were true, without assigning a prior probability to the hypothesis. One of the subtypes of this approach is Null Hypothesis Significance Testing.
  2. Bayesian Hypothesis Testing – This is one of the modern methods of hypothesis testing. In this method, the prior probability of the hypothesis, based on past data, is combined with the current data to find the posterior probability of the hypothesis.

The Bayes factor, which is a key component of the Bayesian approach, is the ratio of the likelihood of the observed data under one hypothesis to its likelihood under the other, for example null versus alternative. This factor indicates how strongly the data favor one of the two hypotheses over the other.
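
As a small worked example of a Bayes factor, consider testing whether a coin is fair. The sketch below makes assumptions that are not part of the text above: H0 fixes the probability of heads at 0.5, H1 places a uniform prior on that probability, and the function names are hypothetical. Under the uniform prior, the binomial likelihood integrates to 1 / (tosses + 1).

```python
from math import comb


def bayes_factor_01(heads, tosses):
    """Bayes factor for H0: p = 0.5 against H1: p ~ Uniform(0, 1)."""
    # Likelihood of the data when the coin is fair.
    likelihood_h0 = comb(tosses, heads) * 0.5 ** tosses
    # Binomial likelihood averaged over the uniform prior on p.
    marginal_h1 = 1 / (tosses + 1)
    return likelihood_h0 / marginal_h1


def posterior_prob_h0(bf01, prior_h0=0.5):
    """Turn a prior probability of H0 into a posterior using the Bayes factor."""
    prior_odds = prior_h0 / (1 - prior_h0)
    posterior_odds = bf01 * prior_odds
    return posterior_odds / (1 + posterior_odds)


for heads in (50, 70):
    bf = bayes_factor_01(heads, 100)
    print(f"{heads} heads in 100 tosses: BF01 = {bf:.4f}, "
          f"P(H0 | data) = {posterior_prob_h0(bf):.4f}")
```

A Bayes factor above 1 favors the null hypothesis and a value below 1 favors the alternative. Here 50 heads in 100 tosses supports the fair coin, while 70 heads counts strongly against it.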


Techniques of Hypothesis Testing

A few commonly used tests are the Z-test, t-test, chi-squared test and F-test.

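  1. Z Test – A z-test is typically performed on independent data points from a population that follows a normal distribution, with a sample size larger than or equal to 30. When the population variance is known, it is used to determine whether a sample mean differs from a hypothesized population mean, or whether the means of two populations are equal. The z test statistic is compared to the critical value, and the null hypothesis is rejected if the z test statistic is statistically significant. For a single sample compared against a hypothesized mean:

$$Z = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$

Z = z-test statistic

X̄ = sample average

μ = hypothesized population mean

s = standard deviation

n = number of observations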

  2. T Test – A t-test is an inferential statistic that is used to see if there is a significant difference in the means of two groups, or between a sample mean and an assumed mean. This test is also called Student's t-test. It is used when the variables are continuous, the sample size is less than 30, and the population standard deviation is not known. The t statistic is used to decide whether or not to reject the null hypothesis.

$$t = \frac{m - \mu}{s / \sqrt{n}}$$

t = Student's t-test statistic

m = mean of sample

µ = assumed mean

s = standard deviation

n = number of observations
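
The one-sample t statistic can be computed with the Python standard library alone. The sketch below is illustrative: the sample values and the assumed mean are invented, the function name `one_sample_t` is hypothetical, and the critical value 2.262 (two-sided, α = 0.05, 9 degrees of freedom) is taken from a standard t table rather than computed.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu):
    """Return (t, degrees of freedom) for H0: population mean == mu."""
    n = len(sample)
    m = mean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    return (m - mu) / (s / sqrt(n)), n - 1


# Hypothetical measurements and assumed mean.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5, 5.7, 5.2, 5.9]
t, df = one_sample_t(sample, mu=5.0)

T_CRITICAL = 2.262  # two-sided, alpha = 0.05, df = 9 (from a t table)
decision = "reject H0" if abs(t) > T_CRITICAL else "fail to reject H0"
print(f"t = {t:.3f}, df = {df}: {decision}")
```

Here the sample mean is 5.5 and the statistic is well above the critical value, so the null hypothesis that the mean equals 5.0 is rejected at the 5% level.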

  3. Chi squared Test – A chi-squared test evaluates how well a model matches actual data. To use a chi-squared test, the data must be randomly sampled, fall into mutually exclusive categories, consist of independent observations, and come from a large sample.

$$\chi^2_c = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

c = degrees of freedom

O = observed value(s)

E = expected value(s)

There are two types of χ² test: the test of independence and the goodness-of-fit test. A χ² test for independence can show us how likely it is that random chance can explain any observed difference between the actual frequencies in the data and the frequencies expected if the variables were independent.
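
A χ² test of independence on a contingency table can be sketched as follows. This is a minimal illustration under stated assumptions: the 2×2 table of counts is invented, the function name `chi_square_independence` is hypothetical, and the critical value 3.841 (α = 0.05, 1 degree of freedom) is taken from a standard χ² table rather than computed.

```python
def chi_square_independence(table):
    """Return (chi-squared statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical counts: rows are two groups, columns are two outcomes.
observed = [[30, 20], [20, 30]]
chi2, df = chi_square_independence(observed)

CHI2_CRITICAL = 3.841  # alpha = 0.05, df = 1 (from a chi-squared table)
decision = "reject H0" if chi2 > CHI2_CRITICAL else "fail to reject H0"
print(f"chi2 = {chi2:.3f}, df = {df}: {decision}")
```

Every expected count in this table is 25, so the statistic is 4.0. That exceeds the critical value, and the hypothesis of independence is rejected at the 5% level.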

  4. F Test – Any statistical test whose test statistic has an F-distribution under the null hypothesis is known as an F-test. It is generally used to compare statistical models that have been fitted to a data set, to find which model best fits the population from which the data were sampled. To perform an F-test, the populations are assumed to be normally distributed and the samples must be random and independent. If the F-test findings are statistically significant, the null hypothesis is rejected; otherwise, it is not. F statistic for large samples:

$$F = \frac{\sigma_1^2}{\sigma_2^2}$$

σ1² = variance of the first population

σ2² = variance of the second population
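
In practice the population variances are estimated from samples. The sketch below assumes two independent samples and uses the ratio of their sample variances as the F statistic. The data are invented, the function name `f_statistic` is hypothetical, and the critical value 6.39 (upper tail, α = 0.05, 4 and 4 degrees of freedom) is taken from a standard F table rather than computed.

```python
from statistics import variance


def f_statistic(sample_1, sample_2):
    """Return (F, df1, df2), where F is the ratio of the sample variances."""
    f = variance(sample_1) / variance(sample_2)
    return f, len(sample_1) - 1, len(sample_2) - 1


# Hypothetical samples from two populations.
sample_1 = [2.0, 4.0, 6.0, 8.0, 10.0]
sample_2 = [4.0, 5.0, 6.0, 7.0, 8.0]
f, df1, df2 = f_statistic(sample_1, sample_2)

F_CRITICAL = 6.39  # upper tail, alpha = 0.05, df = (4, 4) (from an F table)
decision = "reject H0" if f > F_CRITICAL else "fail to reject H0"
print(f"F = {f:.2f}, df = ({df1}, {df2}): {decision}")
```

The sample variances are 10 and 2.5, giving F = 4.0. This is below the critical value, so with samples this small the hypothesis of equal variances is not rejected.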