Analytical Study Design in Medical Research: Measures of risk and disease association

A researcher designing any analytical study in medical research should be familiar with a few basic epidemiological terms used to measure disease risk and association. This blog article defines those terms. They fall into two types of measurement: measures of risk and measures of association.

Measures of Risk

Risk is defined as the probability of an individual developing a condition or disease over a period of time.

Risk = Chance of the event happening / Chance of all possible outcomes (event happening or not happening)

Odds = Chance of the event happening / Chance of the event not happening

Therefore, “Risk” is a proportion, while “Odds” is a ratio.
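The difference between the two definitions is easiest to see with numbers. The following is a minimal Python sketch; the function names and the counts (20 events among 100 people) are illustrative assumptions, not part of any library or study.

```python
def risk(events, total):
    """Risk: events divided by everyone at risk (a proportion between 0 and 1)."""
    return events / total


def odds(events, total):
    """Odds: events divided by non-events (a ratio that can exceed 1)."""
    return events / (total - events)


# Hypothetical counts: 20 of 100 people develop the condition.
print(risk(20, 100))  # 0.2
print(odds(20, 100))  # 0.25
```

The denominator is the only difference: risk divides by everyone, odds divide by those who did not have the event.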

Incidence: Incidence is a measure of risk that describes the number of new cases of a condition that develop during a specified period of time. A related and important term is “incidence proportion”. It is defined as the number of new cases divided by the total population initially at risk (those who go on to develop the condition plus those who do not) over the specified period of time. The “incidence rate” divides the number of new cases by the total person-time at risk.

For example, among 100 non-diseased persons initially at risk, 20 develop a disease/condition over a period of five years.

Incidence = 20 cases

Incidence proportion = 20 cases per 100 persons i.e., 20%

Incidence rate = 20 cases in 100 persons over 5 years, i.e., 20 cases per 500 person-years, which equals 4 per 100 person-years (assuming, for simplicity, that all 100 persons are followed for the full five years)
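The arithmetic of this example can be checked with a short Python sketch. The variable names are my own, and the person-time calculation assumes every person contributes the full five years of follow-up, which is a simplification.

```python
new_cases = 20
population_at_risk = 100
follow_up_years = 5

# Incidence proportion: new cases divided by the population initially at risk.
incidence_proportion = new_cases / population_at_risk

# Person-time at risk, assuming everyone is followed for the full 5 years.
person_years = population_at_risk * follow_up_years

# Incidence rate: new cases divided by person-time at risk.
incidence_rate = new_cases / person_years

print(f"Incidence proportion: {incidence_proportion:.0%}")
print(f"Incidence rate: {incidence_rate * 100:.0f} per 100 person-years")
```

In a real study, people who develop the disease or drop out stop contributing person-time, so the actual denominator would usually be somewhat smaller than 500 person-years.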

Prevalence: Prevalence is the number of people who have a condition at a specific point in time divided by the total population studied. This is specifically called point prevalence. For example, if five persons among 100 people studied are found to have a condition on a certain date, the point prevalence is 5 per 100, i.e., 5%. Two more terms need to be defined in this regard: period prevalence and lifetime prevalence (LTP). The former is the number of people who have the disease at any time during a given period, say a month or a year, divided by the total population studied during that period. LTP, on the other hand, is the number of people who have had the disease at some point in their life divided by the total population studied.

There is a very subtle difference between incidence and prevalence. Incidence is the frequency of a new event, while prevalence is the frequency of an existing event.

Cumulative Risk: Cumulative risk is defined as the probability of developing a condition over a specified period of time.

Measures of Association

Association is defined as a statistical relationship between two or more variables.

The following measures are important for quantifying the strength of association when investigating etiology and testing hypotheses. The terms defined below measure the association between exposure and disease.

Relative risk (RR): The relative risk is measured as a ratio of two risks.

For example, consider 100 people, 50 male and 50 female, of whom 20 men and 10 women develop Tuberculosis.

Risk in men: 20/50

Risk in women: 10/50

Therefore, relative risk (RR) of developing Tuberculosis in men compared to women is

RR = (20/50) / (10/50) = 2.0

i.e., men have twice the risk of developing Tuberculosis compared to women.
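The same calculation can be written as a small Python function. This is only a sketch; the function name `relative_risk` and its argument names are assumptions chosen for illustration.

```python
def relative_risk(cases_a, total_a, cases_b, total_b):
    """Ratio of the risk in group A to the risk in group B."""
    risk_a = cases_a / total_a
    risk_b = cases_b / total_b
    return risk_a / risk_b


# Tuberculosis example: 20 of 50 men and 10 of 50 women develop the disease.
rr = relative_risk(20, 50, 10, 50)
print(rr)  # 2.0
```

An RR of 1 means the risk is the same in both groups, a value above 1 means higher risk in group A, and a value below 1 means lower risk in group A.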

Odds ratio (OR): The odds ratio is the ratio of two odds (odds are defined above).

Continuing the previous example of Tuberculosis in men and women in a total population of 100:

Odds in men: 20/30

Odds in women: 10/40

Odds ratio (OR) = (20/30) / (10/40) = 2.67

Therefore, the odds of men developing Tuberculosis are 2.67 times as high as the odds of women developing Tuberculosis.
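The ratio of two odds can equivalently be computed as a cross-product of the four cells of a 2x2 table. The Python sketch below shows both forms; the function names and cell labels (a, b, c, d) are assumptions made for illustration.

```python
def odds_ratio(cases_a, noncases_a, cases_b, noncases_b):
    """Ratio of the odds in group A to the odds in group B."""
    return (cases_a / noncases_a) / (cases_b / noncases_b)


def odds_ratio_cross_product(a, b, c, d):
    """Same quantity as a cross-product: (a * d) / (b * c)."""
    return (a * d) / (b * c)


# Tuberculosis example: men 20 diseased / 30 not, women 10 diseased / 40 not.
or_ratio = odds_ratio(20, 30, 10, 40)
or_cross = odds_ratio_cross_product(20, 30, 10, 40)

print(round(or_ratio, 2))  # 2.67
print(round(or_cross, 2))  # 2.67
```

Note that the odds ratio (2.67) is larger than the relative risk (2.0) for the same data, because the disease is common in this example.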

To measure the impact of the disease association on public health, the following measurements are important. All of them assume that the association between exposure and disease is causal.

Attributable risk (AR): Amount of disease attributed to the exposure i.e., the difference between the incidence of disease in the exposed group (Ie) and the incidence of disease in the unexposed group (Iue).

AR = Ie – Iue

Attributable (risk) fraction (ARF): ARF is the proportion of disease in the exposed population whose disease can be attributed to the exposure.

ARF = (Ie – Iue) / Ie

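Population attributable risk (PAR): The amount of disease in the total population that can be attributed to the exposure, i.e., the difference between the incidence of disease in the total population (Ip) and the incidence of disease in the unexposed group (Iue).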

PAR = Ip – Iue

Population attributable (risk) fraction (PARF): PARF is the proportion of the disease in the total population whose disease can be attributed to the exposure.

PARF = (Ip – Iue) / Ip
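The four impact measures can be computed together from the same counts. The Python sketch below reuses the Tuberculosis numbers and, purely for illustration, treats being male as the "exposure"; that choice, the function name, and the dictionary keys are all assumptions.

```python
def attributable_measures(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Return AR, ARF, PAR and PARF from the counts of a two-group study."""
    ie = cases_exposed / n_exposed        # incidence in the exposed (Ie)
    iue = cases_unexposed / n_unexposed   # incidence in the unexposed (Iue)
    # Incidence in the total population (Ip)
    ip = (cases_exposed + cases_unexposed) / (n_exposed + n_unexposed)
    return {
        "AR": ie - iue,
        "ARF": (ie - iue) / ie,
        "PAR": ip - iue,
        "PARF": (ip - iue) / ip,
    }


# Illustrative counts: 20 of 50 exposed and 10 of 50 unexposed develop disease.
measures = attributable_measures(20, 50, 10, 50)
for name, value in measures.items():
    print(name, round(value, 3))
```

With these counts, Ie = 0.4, Iue = 0.2 and Ip = 0.3, so AR is 0.2, ARF is 0.5, PAR is 0.1 and PARF is about 0.333.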


Bias and Confounding Factors

In an epidemiological study, when an association is found between exposure and disease, it is very important to first check whether the association is real. One needs to consider whether the association arose by chance because of an inadequate sample size, or because of some kind of bias in the design or measurement.

Bias is a systematic error in design, conduct, or analysis that results in a spurious association of exposure with disease. There are three possible types of bias: (i) selection bias, (ii) information bias, and (iii) confounding.

Selection bias occurs when participants are selected into the study groups in a way that makes the groups differ systematically, so that the observed outcome depends on how the groups were chosen. Information bias occurs when information is collected differently from the groups being compared.

Confounding occurs when the observed association between exposure and disease differs from the truth because of the influence of a third variable that has not been considered in the analysis. For example, a person suffers from headaches when under stress; however, the same person also eats a lot of junk food, especially when under stress. It is therefore hard to tell what actually causes the headache: the stress itself, lack of sleep, anxiety, or gas formation due to indigestion. All these variables should be adjusted for before associating mental stress with headache.
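One common way to look for confounding is to compare the crude (overall) association with the association within each level, or stratum, of the suspected third variable. The Python sketch below does this for stress and headache, stratified by sleep quality. All counts are invented for illustration, and the names `strata`, `risk_ratio`, and `pooled` are hypothetical.

```python
# Hypothetical (cases, total) counts of headache, split by sleep quality.
strata = {
    "poor sleep": {"stressed": (40, 80), "not_stressed": (10, 20)},
    "good sleep": {"stressed": (2, 20), "not_stressed": (8, 80)},
}


def risk_ratio(exposed, unexposed):
    """Relative risk from two (cases, total) pairs."""
    (cases_e, total_e), (cases_u, total_u) = exposed, unexposed
    return (cases_e / total_e) / (cases_u / total_u)


def pooled(group):
    """Add up cases and totals for one group across all strata."""
    cases = sum(s[group][0] for s in strata.values())
    total = sum(s[group][1] for s in strata.values())
    return cases, total


# Crude association: ignores sleep quality altogether.
crude_rr = risk_ratio(pooled("stressed"), pooled("not_stressed"))

# Stratum-specific association: computed separately within each sleep group.
stratum_rr = {
    name: risk_ratio(s["stressed"], s["not_stressed"])
    for name, s in strata.items()
}

print(round(crude_rr, 2))  # 2.33
print(stratum_rr)          # {'poor sleep': 1.0, 'good sleep': 1.0}
```

In these invented data, stress appears to more than double the risk of headache overall, yet within each sleep group the risk is identical for stressed and non-stressed people. The crude association is produced entirely by poor sleep being more common among the stressed.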




Analytical Study Designs in Medical Research

In medical research, it is important for a researcher to know the different types of analytical studies. Each type has different objectives and aims to determine different aspects of a disease, such as prevalence, incidence, cause, prognosis, or effect of treatment. It is therefore essential to identify the analytical study appropriate to a given objective. Analytical studies are classified as experimental or observational. In an experimental study, the investigator examines the effect of the presence or absence of certain intervention(s); in an observational study, the investigator does not intervene but observes and assesses the relation between the exposure and the disease variable. Interventional studies, or clinical trials, fall under the category of experimental studies, in which the investigator assigns the exposure status. Observational studies are of four types: cohort studies, case-control studies, cross-sectional studies, and longitudinal studies.

Classification of Analytical studies

Experimental studies are sometimes not indicated, unethical to conduct, or very expensive; in such cases, observational studies are probably the next best approach to answering certain investigative questions. Well-designed observational studies may also produce results similar to those of controlled trials, so they need not always be regarded as a second-best option. To design an appropriate observational study, one should be able to distinguish between the four types of observational study and know which is appropriate for a given investigative question. The following is a brief discussion of the four types (each will be discussed individually in detail in my upcoming blogs):


Observational Analytical Study Designs

Cohort studies

Cohort methodology is one of the main tools of analytical epidemiological research. The word “cohort” is derived from the Latin word “cohors”, meaning unit. The word was adopted in epidemiology to refer to a set of people monitored for a period of time. In modern epidemiology, it is defined as a “group of people with defined characteristics who are followed up to determine incidence of, or mortality from, some specific disease, all causes of death, or some other outcome” (Morabia, 2004).

In cohort studies, individuals who initially do not have the outcome of interest are identified and followed for a period of time. The group can be classified into subsets on the basis of exposure. For example, a group of people consisting of both smokers and non-smokers can be identified and followed for the incidence of lung cancer. At the beginning of the study, none of the individuals has lung cancer. The individuals are grouped into two subsets, smokers and non-smokers, and then followed for a period of time, during which exposure characteristics such as smoking, BMI, eating habits, exercise habits, and family history of lung cancer or cardiovascular disease are recorded. Over time, some individuals develop the outcome of interest. The data collected can then be used to evaluate the hypothesis that smoking is related to the incidence of lung cancer.

There are two types of cohort studies: prospective and retrospective. A prospective study starts in the present and follows the subjects into the future, i.e., waiting for the disease to develop. A retrospective study, on the other hand, is carried out in the present on data collected in the past; it is also called a historic cohort study. In the next blog, I will discuss these in detail. The following schematic shows the basic design of a cohort study.

Design of a Cohort study

Case-control studies

In terms of objective, case-control studies and cohort studies are the same. Both are observational analytical studies that aim to investigate the association between exposure and outcome. The difference lies in the sampling strategy. While cohort studies identify subjects based on exposure status, case-control studies identify subjects based on outcome status. Once the outcome status is identified, the subjects are divided into two sets: cases (who have developed the outcome) and controls (who have not). For example, consider a study designed to determine the relation between endometrial cancer and the use of conjugated estrogen. Subjects are chosen based on the outcome status (endometrial cancer), i.e., with the disease present (cases) or absent (controls), and the two subsets are then compared with respect to the exposure (use of conjugated estrogen). A case-control study is therefore retrospective in nature and cannot be used to calculate relative risk directly, because the proportion of cases among the subjects is fixed by the investigator. However, the odds ratio can be measured, and it approximates the relative risk when the outcome is rare. For rare outcomes, a case-control study is probably the only feasible analytical study approach.
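In a case-control study, the odds ratio is computed from the odds of exposure among cases and among controls. The Python sketch below illustrates this with invented counts; they are not taken from any real study of endometrial cancer and estrogen, and the variable names are assumptions.

```python
# Hypothetical case-control counts (invented for illustration only).
cases_exposed, cases_unexposed = 60, 40        # 100 cases
controls_exposed, controls_unexposed = 30, 70  # 100 controls

# Odds of exposure among cases and among controls.
exposure_odds_cases = cases_exposed / cases_unexposed
exposure_odds_controls = controls_exposed / controls_unexposed

# Odds ratio written as a cross-product of the 2x2 table, (a * d) / (b * c),
# which is algebraically the same as dividing the two exposure odds.
odds_ratio = (cases_exposed * controls_unexposed) / (
    cases_unexposed * controls_exposed
)

print(exposure_odds_cases)  # 1.5
print(odds_ratio)           # 3.5
```

Because the investigator decides how many cases and controls to recruit, the fraction of diseased people among the exposed (60 of 90 here) says nothing about the true risk, which is why only the odds ratio is reported.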

Design of a Case-Control Study

Cross-sectional studies

A cross-sectional study is a type of observational analytical study used primarily to determine prevalence without manipulating the study environment. For example, a study can be designed to determine the cholesterol level in walkers and non-walkers without imposing any exercise regime or activity on the non-walkers or modifying the activity of the walkers. Apart from cholesterol, other characteristics of interest, such as age, gender, food habits, educational level, occupation, and income, can also be measured. The data are collected at a single point in time, with no further follow-up. In a cross-sectional design, one can study a single population (only walkers) or more than one population (both walkers and non-walkers) at one point in time to see the association between cholesterol level and walking. However, this design does not allow the cause of a condition to be examined, since the subjects are not followed over time.

Design of a Cross-Sectional Study

Longitudinal studies

Longitudinal studies, like cross-sectional studies, are a type of observational analytical study. The difference from the cross-sectional design is that the subjects are followed over a longer time; hence, a longitudinal study can contribute more to establishing the cause of a condition. For example, a study may aim to determine the cholesterol level of a single population, say walkers, over a period of time, along with other characteristics of interest such as age, gender, food habits, educational level, occupation, and income. One may choose to examine the pattern of cholesterol level in men aged 35 years who walk daily for 10 years. The cholesterol level is measured at the onset of the activity (here, walking) and followed up throughout the defined time period, which makes it possible to detect any change or development in the characteristics of the population.

The following two tables summarize the different observational analytical studies with regard to their objectives and time frames.


I will define several terms, such as risk factor, odds ratio, probability, and confounding factors, related to study designs, along with a detailed discussion of the individual analytical study designs and tips for choosing the correct design for a given research question, in my upcoming blogs. Visit the blog section of the website for more such informative and educative topics.


[1] Morabia, A (2004). A History of Epidemiologic Methods and Concepts. Birkhaeuser Verlag; Basel: p. 1-405.

[2] Hulley, S.B., Cummings, S.R., Browner, W.S., et al. (2001). Designing Clinical Research: An Epidemiologic Approach. 2nd Ed. Lippincott Williams & Wilkins; Philadelphia: p. 1-336.

[3] Merrill, R.M., Timmreck, T.C. (2006). Introduction to Epidemiology. 4th Ed. Jones and Bartlett Publishers; Mississauga, Ontario: p. 1-342.

[4] Lilienfeld, A.M., Lilienfeld, D.E. (1980). Foundations of Epidemiology. Oxford University Press; London.