The Odds Ratio: why is it the only option in case-control studies? | RodrigoNahum

Before I proceed, let me explain some notation. $P(X)$ is the probability of X happening. $P(X|Y)$ is the probability of X happening, given that Y has already happened. This is a conditional probability, and Bayes' theorem together with a few related identities lets you rewrite it to deal with different situations, for example:

$P(X) = 1 - P(\neg{X})$

$P(X) = P(X|Y)P(Y) + P(X|\neg{Y})P(\neg{Y})$

$P(X|Y) = \frac{P(X)P(Y|X)}{P(Y)}$

$P(X|Y) = \frac{P(X|\neg{}Y)P(\neg{}Y)P(Y|X)}{P(\neg{}Y|X)P(Y)}$

$P(X|Y) = 1 - P(\neg{X}|Y)$

These are just a few examples of the identities that hold when the events are binary.
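To make these identities less abstract, here is a small sketch that checks each of them numerically. The joint distribution is invented purely for illustration, and the helper names (`p`, `p_x_given_y`, `p_y_given_x`) are my own. Exact fractions are used so the equalities are not blurred by floating-point rounding.

```python
from fractions import Fraction as F

# Hypothetical joint distribution P(X=x, Y=y); the numbers are invented.
joint = {
    (True, True): F(60, 1000),     # X and Y
    (False, True): F(240, 1000),   # not X and Y
    (True, False): F(35, 1000),    # X and not Y
    (False, False): F(665, 1000),  # not X and not Y
}


def p(x=None, y=None):
    """Joint or marginal probability; None means 'sum over that variable'."""
    return sum(
        prob
        for (xi, yi), prob in joint.items()
        if (x is None or xi == x) and (y is None or yi == y)
    )


def p_x_given_y(x, y):
    """P(X=x | Y=y)."""
    return p(x=x, y=y) / p(y=y)


def p_y_given_x(y, x):
    """P(Y=y | X=x)."""
    return p(x=x, y=y) / p(x=x)


# One entry per identity listed above, in the same order.
checks = {
    "complement": p(x=True) == 1 - p(x=False),
    "total probability": p(x=True)
    == p_x_given_y(True, True) * p(y=True)
    + p_x_given_y(True, False) * p(y=False),
    "Bayes": p_x_given_y(True, True)
    == p(x=True) * p_y_given_x(True, True) / p(y=True),
    "Bayes via not-Y": p_x_given_y(True, True)
    == p_x_given_y(True, False) * p(y=False) * p_y_given_x(True, True)
    / (p_y_given_x(False, True) * p(y=True)),
    "conditional complement": p_x_given_y(True, True)
    == 1 - p_x_given_y(False, True),
}

for name, holds in checks.items():
    print(f"{name}: {holds}")
```

Any other valid joint distribution (four non-negative numbers that sum to 1, with no zero marginals) should pass the same checks.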

Also, to avoid gigantic formulas ahead, let's denote the binary outcome and the binary exposure by X and Y, respectively.
$X$ = outcome
$Y$ = exposure
$\neg{X}$ = no outcome
$\neg{Y}$ = no exposure

Population risk

If you had the complete data for the whole population, it would be easy to calculate the risks and the odds. Taking smoking as the exposure and cancer as the outcome, the absolute risk of having cancer if you smoked would be: $$Risk_{pop} = P(X|Y)$$

And the risk ratio between having cancer if you smoked, and having cancer if you didn't smoke, would be: $$RR_{pop} = \frac{P(X|Y)}{P(X|\neg{Y})}$$
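As a quick numeric illustration, here is how the population risk and risk ratio would be computed if the whole population could be counted. The counts are hypothetical, chosen only to give round numbers.

```python
from fractions import Fraction as F

# Hypothetical whole-population counts (invented for illustration).
exposed_cases, exposed_noncases = 6_000, 24_000      # smokers
unexposed_cases, unexposed_noncases = 3_500, 66_500  # non-smokers

# P(X|Y): risk of the outcome among the exposed.
risk_exposed = F(exposed_cases, exposed_cases + exposed_noncases)
# P(X|not Y): risk of the outcome among the unexposed.
risk_unexposed = F(unexposed_cases, unexposed_cases + unexposed_noncases)

rr_pop = risk_exposed / risk_unexposed

print(risk_exposed, risk_unexposed, rr_pop)  # 1/5 1/20 4
```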

Sample risk

Now, if you are sampling from a population, things get a little different, depending on the sampling design. That's because the design decides which variable you condition on when you draw people from the population. If you sample people based on their exposure status (cohort design), and then wait until you see the outcome, you would have $$Risk_{cohort}=\frac{P(X|Y)}{P(X|Y)+P(\neg{X}|Y)}=\frac{P(X|Y)}{1}=Risk_{pop}$$

This is precisely the same as calculating the population risk. If you then calculate the risk ratio: $$RR_{cohort} = \frac{\frac{P(X|Y)}{P(X|Y)+P(\neg{X}|Y)}}{\frac{P(X|\neg{Y})}{P(X|\neg{Y})+P(\neg{X}|\neg{Y})}}=\frac{\frac{P(X|Y)}{1}}{\frac{P(X|\neg{Y})}{1}}=RR_{pop}$$

You would also find that it is mathematically the same as the population RR. So a cohort study has the right design for estimating the population risk, provided that the sampling was random.

However, if you sampled people based on their outcome status (case-control design), and then checked whether they were exposed or not, you would get a very different quantity, because what you observe is the probability of the exposure given the outcome, not the probability of the outcome given the exposure: $$Risk_{case-control}=\frac{P(Y|X)}{P(Y|X)+P(Y|\neg{X})}\ne{}P(X|Y),\quad Risk_{case-control}\ne{}Risk_{pop}$$ $$RR_{case-control} = \frac{\frac{P(Y|X)}{P(Y|X)+P(Y|\neg{X})}}{\frac{P(\neg{Y}|X)}{P(\neg{Y}|X)+P(\neg{Y}|\neg{X})}}\ne{}\frac{P(X|Y)}{P(X|\neg{Y})},\quad RR_{case-control}\ne{}RR_{pop}$$

If you try to solve it, you will find that $RR_{case-control}\ne{}RR_{pop}$.
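The contrast between the two designs can be checked with numbers. The sketch below plugs a hypothetical population (the three probabilities are invented) into the cohort and case-control expressions above. The case-control expressions are the ones you would get, as I read them, with equal numbers of cases and controls.

```python
from fractions import Fraction as F

# Hypothetical population probabilities (invented for illustration).
p_y = F(3, 10)              # P(Y): proportion exposed
p_x_given_y = F(1, 5)       # P(X|Y)
p_x_given_not_y = F(1, 20)  # P(X|not Y)

rr_pop = p_x_given_y / p_x_given_not_y

# Cohort design: sample on exposure, observe the outcome.
risk_cohort_exposed = p_x_given_y / (p_x_given_y + (1 - p_x_given_y))
risk_cohort_unexposed = p_x_given_not_y / (
    p_x_given_not_y + (1 - p_x_given_not_y)
)
rr_cohort = risk_cohort_exposed / risk_cohort_unexposed

# Case-control design: sample on outcome, observe the exposure.
# P(Y|X) and P(Y|not X) follow from Bayes' theorem.
p_x = p_x_given_y * p_y + p_x_given_not_y * (1 - p_y)
p_y_given_x = p_x_given_y * p_y / p_x
p_y_given_not_x = (1 - p_x_given_y) * p_y / (1 - p_x)

risk_cc_exposed = p_y_given_x / (p_y_given_x + p_y_given_not_x)
risk_cc_unexposed = (1 - p_y_given_x) / (
    (1 - p_y_given_x) + (1 - p_y_given_not_x)
)
rr_cc = risk_cc_exposed / risk_cc_unexposed

print("RR population  :", float(rr_pop))
print("RR cohort      :", float(rr_cohort))
print("RR case-control:", float(rr_cc))
```

With these numbers the cohort value matches the population RR exactly, while the case-control value does not.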

Population odds

The odds of something happening is the probability of it happening divided by the probability of it not happening. For example, if the probability of winning was 80%, the odds of winning would be 4, meaning you would be 4 times more likely to win than to lose, because you would divide 80% by 20%. So the odds of cancer, if you smoked, would be: $$Odds_{pop} = \frac{P(X|Y)}{P(\neg{X}|Y)}$$

And the Odds Ratio would be the ratio between the odds of cancer if you smoked, and the odds of cancer if you didn't smoke: $$OR_{pop} = \frac{\frac{P(X|Y)}{P(\neg{X}|Y)}}{\frac{P(X|\neg{Y})}{P(\neg{X}|\neg{Y})}}$$
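In code, the two definitions are one-liners. This is a minimal sketch: the function names `odds` and `odds_ratio` are my own, and the risks passed to `odds_ratio` are hypothetical.

```python
from fractions import Fraction as F


def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1 - p)


def odds_ratio(p_exposed, p_unexposed):
    """Ratio of the odds among the exposed to the odds among the unexposed."""
    return odds(p_exposed) / odds(p_unexposed)


# The winning example: 80% against 20%.
print(odds(F(80, 100)))  # 4

# Hypothetical risks: P(X|Y) = 1/5 and P(X|not Y) = 1/20.
print(odds_ratio(F(1, 5), F(1, 20)))  # 19/4
```

Note that with these hypothetical risks the risk ratio would be 4 while the odds ratio is 4.75, so the two measures are not interchangeable in general.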

Sample odds

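If you were doing a case-control study, in which the Odds Ratio would be the choice for measuring the effect size, you would observe the exposure given the outcome. Expanding each term with the identity $P(Y|X)=\frac{P(Y)P(X|Y)}{P(X)}$, you would be calculating this: $$OR_{case-control} = \frac{\frac{P(Y|X)}{P(Y|\neg{}X)}}{\frac{P(\neg{}Y|X)}{P(\neg{}Y|\neg{}X)}} = \frac{\frac{\frac{P(Y)P(X|Y)}{P(X)}}{\frac{P(Y)P(\neg{}X|Y)}{P(\neg{}X)}}}{\frac{\frac{P(\neg{}Y)P(X|\neg{}Y)}{P(X)}}{\frac{P(\neg{}Y)P(\neg{}X|\neg{}Y)}{P(\neg{}X)}}} = \frac{\frac{P(X|Y)}{P(\neg{}X|Y)}\cdot\frac{P(\neg{}X)}{P(X)}}{\frac{P(X|\neg{}Y)}{P(\neg{}X|\neg{}Y)}\cdot\frac{P(\neg{}X)}{P(X)}} = \frac{\frac{P(X|Y)}{P(\neg{}X|Y)}}{\frac{P(X|\neg{}Y)}{P(\neg{}X|\neg{}Y)}} = OR_{pop}$$ Here $P(Y)$ cancels inside the numerator and $P(\neg{}Y)$ inside the denominator, and the remaining factor $\frac{P(\neg{}X)}{P(X)}$ appears in both, so it cancels as well.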

I won't write out the equation for the Odds Ratio in a cohort study, because a cohort study observes $P(X|Y)$ directly, so it would be exactly the same as the population odds ratio.
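The same result can be seen with counts. In the sketch below, a case-control sample is modelled as keeping some fraction of the cases and a different fraction of the non-cases, using expected counts rather than random draws. The population counts, the sampling fractions and the function names are all hypothetical. The odds ratio is computed as the usual cross-product of the 2x2 table.

```python
from fractions import Fraction as F

# Hypothetical whole-population counts (invented for illustration).
pop = {
    "exposed_cases": 6_000,
    "unexposed_cases": 3_500,
    "exposed_controls": 24_000,
    "unexposed_controls": 66_500,
}


def cross_product_or(exposed_cases, unexposed_cases,
                     exposed_controls, unexposed_controls):
    """Odds ratio of a 2x2 table as the cross-product ratio."""
    return (F(exposed_cases) * unexposed_controls) / (
        F(unexposed_cases) * exposed_controls
    )


def case_control_sample(case_fraction, control_fraction):
    """Expected counts when keeping a fraction of cases and of non-cases."""
    return {
        "exposed_cases": pop["exposed_cases"] * case_fraction,
        "unexposed_cases": pop["unexposed_cases"] * case_fraction,
        "exposed_controls": pop["exposed_controls"] * control_fraction,
        "unexposed_controls": pop["unexposed_controls"] * control_fraction,
    }


def naive_risk_exposed(table):
    """'Risk' among the exposed computed directly from the sampled table."""
    return table["exposed_cases"] / (
        table["exposed_cases"] + table["exposed_controls"]
    )


or_pop = cross_product_or(**pop)
sample_a = case_control_sample(F(1, 10), F(1, 100))
sample_b = case_control_sample(F(1, 2), F(1, 1000))

print("OR population:", or_pop)
print("OR sample A  :", cross_product_or(**sample_a))
print("OR sample B  :", cross_product_or(**sample_b))
print("naive risk A :", float(naive_risk_exposed(sample_a)))
print("naive risk B :", float(naive_risk_exposed(sample_b)))
```

The sampling fractions multiply whole rows of the table, so they cancel in the cross-product but not in the naive risk, which changes with every choice of fractions.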

When OR ~ RR

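Whenever the population risk of a certain disease is close to zero (e.g. one in a million), the number of people with the outcome is tiny compared with the number without it. Writing the 2x2 table as $a$ = exposed with the outcome, $b$ = exposed without the outcome, $c$ = unexposed with the outcome and $d$ = unexposed without the outcome, we have $a \ll b$ and $c \ll d$, so $a+b\approx b$ and $c+d\approx d$, and we can assume that: $$RR=\frac{\frac{a}{a+b}}{\frac{c}{c+d}}\approx\frac{a/b}{c/d}= OR$$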

Therefore, under the rare disease assumption, OR is a good and simple approximation of RR.
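To get a feel for how rare "rare" has to be, the sketch below holds a hypothetical risk ratio fixed at 4 and shrinks the baseline risk in the unexposed group. The chosen risks and variable names are only for illustration.

```python
from fractions import Fraction as F


def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1 - p)


rr = 4  # hypothetical risk ratio, held fixed
# Hypothetical baseline risks P(X|not Y), from common to very rare.
baseline_risks = [F(1, 20), F(1, 100), F(1, 1000), F(1, 1_000_000)]

rows = []
for p0 in baseline_risks:
    p1 = rr * p0                    # P(X|Y) implied by the fixed RR
    or_value = odds(p1) / odds(p0)  # odds ratio for the same two risks
    rows.append((p0, or_value))
    print(
        f"baseline risk {float(p0):.6f}: "
        f"RR = {rr}, OR = {float(or_value):.6f}"
    )
```

The odds ratio always overstates this risk ratio, but the gap shrinks steadily as the outcome gets rarer.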

Conclusion

Here I tried to explain, in a visual and a theoretical way, why the Odds Ratio is an effect size measurement that can be calculated in either a cohort or a case-control study: in both designs it is mathematically the same as the population odds ratio. The risk ratio, however, can only be calculated with a cohort study design, while a case-control study can only offer an odds ratio.

Also, I explained why an odds ratio is a good approximation of the risk ratio in populations where the outcome is very rare: as the probability of the outcome approaches zero, the odds ratio approaches the risk ratio.