Correlation and Causality

Stop overpopulation,
shoot a stork.

One of the most basic but most difficult things in science is the difference between correlation and causality. Just because two things occur together — or rather: one after the other — does not mean that one thing causes the other. It’s the old post hoc ergo propter hoc fallacy.

On the surface, it’s easy to understand. Just think about the standard example: the correlation between the number of storks and the birth rate in a region. There actually is a correlation between these two numbers. The correlation coefficient is .62, which is statistically significant (p = .008). Yes, there actually is a study about it, albeit with stork data based on personal communication. But hey, science is great this way. Of course, storks do not bring babies. There’s an even stronger correlation between land area and birth rate (r = .923), which suggests that larger regions simply have more of both. And if you control for land area, the correlation between the number of storks and the birth rate is no longer significant, whereas the correlation between land area and birth rate remains significant when you control for the number of storks.
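To make "controlling for land area" a bit more concrete, here is a minimal sketch of a first-order partial correlation in Python. The data are synthetic and purely illustrative (they are not the figures from the stork study), and the names `pearson`, `partial_corr`, `area`, `storks`, and `births` are made up for this example. The setup assumes that land area drives both the stork count and the birth count, while storks have no effect on births at all.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear influence of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic, illustrative data: land area influences both other variables,
# storks and births are otherwise unrelated (an assumption of this sketch).
random.seed(0)
area = [random.uniform(1, 100) for _ in range(200)]
storks = [2 * a + random.gauss(0, 20) for a in area]
births = [5 * a + random.gauss(0, 50) for a in area]

r_storks_births = pearson(storks, births)
r_storks_area = pearson(storks, area)
r_births_area = pearson(births, area)

# Storks vs. births, with land area held constant.
r_partial = partial_corr(r_storks_births, r_storks_area, r_births_area)

print(f"storks vs. births:               r = {r_storks_births:.2f}")
print(f"storks vs. births, area removed: r = {r_partial:.2f}")
```

In this toy setup the raw correlation between storks and births comes out strong, while the partial correlation ends up close to zero, which is the same pattern the paragraph above describes. A real analysis would also need a significance test for the partial correlation, which this sketch leaves out.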

So much for the simple example that is easy to see through (other great examples are on this spurious correlations website, e.g., this correlation between US spending on science, space, and technology and suicides by hanging, strangulation and suffocation). However, things become more difficult when the connection between two variables seems obvious, when we already believe that a connection between them exists. Then we have to detach ourselves from what we think we know and look at whether this correlation really tells us anything. And that’s extremely difficult, given that we have to work against our senses, especially our very well developed sense for detecting co-occurrences, and against our prejudices: the very things that give us a measure of predictability in a complex and overwhelming world.

But mistrusting correlations, especially when they ‘feel’ plausible, and instead looking for actual causal relationships is the only way we can actually make sense of our environment and go beyond the simple associations that even our primal ancestors were capable of. The results should support the conclusions we draw; the (preconceived) conclusion should not be needed to “lend support” to the interpretation of the results.

In this sense, be skeptical of correlations, no matter how significant or plausible. On their own, they tell you next to nothing about causes, and they can do a lot of damage through the false sense of understanding they provide.