Who is Simpson and what’s his paradox?

To make sense of the world we need statistics; to make sense of statistics we need statistical literacy.

With the increased speed and accessibility of global communications, the everyday person is now expected to be aware of and care about the many crises happening across the globe. Yet human minds aren’t well-suited to large-scale thinking. Fortunately for us, statistics are a powerful way of putting information about our world into context, packaging knowledge into intelligible bites.

Take the recent representations of billionaires’ wealth using proportionate piles of rice, popular in early 2020 on platforms such as TikTok. The shock generated by these videos demonstrated how difficult large numbers are to comprehend. The staggering scale of modern life, whether in the study of demographics, economics or science, demands a degree of statistical literacy, and that demand will only grow as we contemplate and navigate more complex problems about our world.

So, what does statistical literacy look like?

First, it’s critical to understand that data are always interpreted, leaving space for misrepresentation and misunderstanding. Researchers apply all sorts of transformations to raw data to draw meaningful conclusions, and the same techniques can be used, intentionally or not, to draw erroneous ones. In an academic context, this should be caught during peer review, but as an average person reading a newspaper, you must bring your own critical eye.

For example, casual readers should pay attention to the time periods and metrics the media select when presenting data. Consider how global carbon emissions decreased during the pandemic. If we isolate 2020-2022, things look great, but much less so when we look at the trend from 1880 onwards. The data aren’t wrong; the framing is what misleads.

We should also be wary of statistical paradoxes. Let’s look at an example: Simpson’s paradox.

Whenever a statistical study is carried out, it is necessary to control for variables that differ between the groups being compared, so that the comparison is a fair one. If factors such as age, sex or health are not accounted for, the association between a variable and an outcome can be inverted.

Take the following example. You and your friend decide to study together for an upcoming quiz by doing 20 practice questions each. On day one, you blast through 16 and get 14 correct, a success rate of 87.5%. Meanwhile, your friend only does 4 but gets them all correct for 100%. On day two, you complete your final 4 and get 2 correct, or 50%. Your friend (who should really share their notes) scores 10/16 on their remaining questions, or 62.5%, and has the better daily score again!

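Our intuition might suggest that if someone does better on both days they will probably do better overall. However, this is not the case! You scored 16 out of 20 while your friend only got 14. This is a textbook example of Simpson’s paradox: the day-by-day comparison points in the opposite direction to the overall result. The flip happens because the two of you attempted different numbers of questions each day. Your strong day covered 16 questions, while your friend’s perfect day covered only 4.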
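If you would like to check the arithmetic yourself, here is a minimal Python sketch. The numbers come straight from the quiz example; the names scores, daily_rates and overall_rate are made up purely for illustration.

```python
from fractions import Fraction

# (correct, attempted) for day one and day two, taken from the quiz example.
scores = {
    "you": [(14, 16), (2, 4)],
    "friend": [(4, 4), (10, 16)],
}


def daily_rates(days):
    """Success rate for each day, kept as exact fractions."""
    return [Fraction(correct, attempted) for correct, attempted in days]


def overall_rate(days):
    """Pool every question across days, then take one success rate."""
    total_correct = sum(correct for correct, _ in days)
    total_attempted = sum(attempted for _, attempted in days)
    return Fraction(total_correct, total_attempted)


for name, days in scores.items():
    daily = ", ".join(f"{float(rate):.1%}" for rate in daily_rates(days))
    overall = float(overall_rate(days))
    print(f"{name}: daily {daily}; overall {overall:.1%}")
```

Running it shows your friend ahead on each day (100.0% and 62.5% against your 87.5% and 50.0%) but behind overall (70.0% against your 80.0%). Pooling the questions weights each day by how many questions were attempted, which a simple day-by-day comparison ignores.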

Simpson’s paradox has confounded researchers in the past. A 1996 study on smoking and long-term survival in women appeared to show that women who smoked were more likely to survive than those who didn’t. Given the extensive research showing the harms of smoking, this result was perplexing.

However, once the data were controlled for age, it was revealed that in each age group, women who smoked were less likely to survive than women who did not smoke. The researchers failed to consider the many women who would have been in the older age groups but were not included in the study because they had already passed away. In other words, controlling for age flipped the conclusion!

You are now armed with a keener eye for looking past misleading statistics. Statistics are essential to modern life, but without the literacy required to wield them they lose their usefulness while keeping their power to persuade. While this article is far from exhaustive, I hope an awareness of statistical paradoxes and data manipulation methods will encourage you to read more critically the next time you scroll past an Instagram infographic.