Data Literacy - Correlation v causation

4Ehd...3ukk
27 May 2022

How we analyse data impacts our ability to make decisions, and understanding correlation and causation is one of the many ways to improve data literacy.

What do we mean by correlation and causation, and why does confusing them get so many people into trouble?


First, let's start with a few definitions:


Correlation - a statistical indicator of the relationship between variables: how closely they move together.

Causation - the two variables are correlated with each other and there is also a causal link between them: a change in one brings about a change in the other.
https://www.scribbr.com/methodology/correlation-vs-causation/
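
To make the definition of correlation concrete, here is a minimal sketch of the most common measure, the Pearson correlation coefficient, which ranges from -1 to 1. The function name `pearson_r` and the sample numbers are made up for illustration; they are not taken from the Scribbr article.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How much the two variables vary together
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return co_variation / sqrt(spread_x * spread_y)

# Illustrative, made-up data
x = [1, 2, 3, 4, 5]
rises_with_x = [2, 4, 6, 8, 10]   # moves perfectly in step with x
falls_with_x = [10, 8, 6, 4, 2]   # moves perfectly against x

r_up = pearson_r(x, rises_with_x)
r_down = pearson_r(x, falls_with_x)
print(r_up, r_down)
```

A value near 1 means the variables rise together, a value near -1 means one rises as the other falls, and a value near 0 means there is little linear relationship. Nothing in the calculation says anything about why the variables move together.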

So why is this important?

Correlation does not equal causation, and the direction of a causal relationship is also an important factor. This is not a lesson on statistics, so the point is best made with a couple of examples, one of them rather peculiar.

Directionality of Causation

There is a strong positive correlation between crop yields and the number of hours of sunshine, and whilst many other factors affect crop yields, this makes enormous sense.
However, you cannot conclude that as crop yields increase, the number of sunshine hours increases too. It is the sunshine that drives your outcome variable, "crop yields", not the other way around.
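
A short sketch of why the correlation itself cannot tell you the direction: the coefficient is symmetric, so swapping the two variables gives exactly the same number. The sunshine and yield figures below are invented for illustration, and `pearson_r` is a hypothetical helper, not a library function.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / sqrt(spread_x * spread_y)

# Invented figures: daily sunshine hours and crop yield for five seasons
sunshine_hours = [4, 5, 6, 7, 8]
crop_yield = [2.1, 2.9, 3.2, 4.0, 4.4]

r_sun_then_yield = pearson_r(sunshine_hours, crop_yield)
r_yield_then_sun = pearson_r(crop_yield, sunshine_hours)

# Both orderings give the same strong positive value
print(r_sun_then_yield, r_yield_then_sun)
```

Knowing that sunshine drives yield, and not the reverse, comes from understanding how crops grow. The number alone is the same either way.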

Correlation is not Causation

Per capita cheese consumption in the US correlates with the number of people who died by becoming tangled in their bedsheets. It would be rather absurd to think that as the consumption of cheese increases, so do deaths by tangled bedsheets. It's a positive correlation (0.94), no doubt, but not a causal relationship.

When the data sets are more relatable, a false causal link is harder to spot, so you should take even more care.
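
One common reason unrelated series correlate is that both simply trend over time. The sketch below uses two invented yearly series (not the real cheese or bedsheet figures) that both happen to rise. Their levels correlate strongly, but their year-to-year changes do not move together. The helper names are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / sqrt(spread_x * spread_y)

def yearly_changes(series):
    """Difference between each value and the one before it."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

# Invented data: ten years of two unrelated quantities that both drift upward
series_a = [10, 12, 11, 14, 15, 17, 16, 19, 21, 22]
series_b = [100, 130, 150, 140, 180, 200, 230, 220, 260, 270]

r_levels = pearson_r(series_a, series_b)
r_changes = pearson_r(yearly_changes(series_a), yearly_changes(series_b))

print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

Comparing changes rather than levels is only one rough sanity check, and passing it would still not establish causation.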

Final thoughts

These two examples show how people get into trouble when they infer causation from a correlation. Care needs to be taken when establishing causal relationships.

Please share the most bizarre correlations you have seen, and feel free to get some inspiration from Tyler Vigen's examples of 30,000 crazy correlations:
https://www.tylervigen.com/spurious-correlations



Enjoy this blog? Subscribe to Napes

6 Comments

E2123
I couldn't stop myself from cheering when I read this article. The distinction between correlation and causation is one of the most misunderstood parts of data literacy and seems to pop up everywhere (from public policy debates to business marketing campaigns). I honestly think the phrase "Correlation does not equal causation" should feature in every classroom, meeting room and board room.