Hanno Böck writes:
I recently saw a graphic from here posted several times on social media that I found quite misleading in its data representation.
There exist some variations of it, but they all share the same problem.
The most notable issue is that the graphic uses logarithmic scales on both axes. This has the effect of compressing everything together at the upper right end and visually creates a much stronger correlation than there actually is.
Another thing to note, and this is where I’d be curious what you think about it, is that it gives an R^2 value of 0.8 at the bottom. First of all, R^2 is, as far as I can tell, not something that can be easily and intuitively understood (it seems a simple r coefficient would be more appropriate). But that’s not the main problem. The value is, as far as I can tell, simply wrong.
When I try to calculate R^2 for that data, I get 0.43. It appears that what was done here was to calculate the R^2 value over the log values of the input data. (If I do that, I get 0.81.)
In case you want to play with the data, here’s some quick Python I wrote to create similar graphs with a non-log scale, and the relevant data sources from the World Bank and EIA.
My reply:
I don’t think the logarithmic scale is a problem, and it’s fine to compute the R-squared of log-scaled data. In any case, the scatterplot tells the story; I don’t think R-squared adds anything here.
I clicked through to the source, and the real problem seems to be their title, “How does energy impact economic growth.” The data they show are cross-sectional with no such causal implication.
Böck responded:
I’m surprised that you don’t see a problem in the log scale. I believe this is the main issue with this graph. (As a rule of thumb, I’d say log scales should rarely be used in public communication at all, as they are not easy to understand intuitively. If they are used, there needs to be an explanation, which I don’t see here.)
To maybe illustrate this more clearly, I’ve attached linear and log-scaled versions of the data. To me, they tell a different story. The log version implies that there is a general, strong correlation between electricity consumption and per capita GDP. But the actual data tells me that the correlation is only present below a certain threshold, and above that, we have high variations of energy use in countries with very similar GDP levels. (E.g. quite rich countries like Denmark/Switzerland with a very low electricity use.)
Regarding your point about causal inference, that’s probably a valid point as well, but not really what I’m trying to get at here. The reason is that I don’t think that blog post got a lot of attention, but the graphic is shared very widely.
Böck posted a longer discussion here. Setting aside the above-discussed issues with the log scale and R-squared, the rest of his post has interesting economics content.
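For readers who want to see how R-squared can differ so much between the raw and the log scale, here is a minimal sketch. It is not Böck’s script and it does not use the World Bank or EIA data: the data are synthetic, and the variable names (`gdp`, `electricity`), the power-law form, and the noise level are all assumptions chosen only for illustration. It computes R-squared for a simple linear fit twice, once on the raw values and once on their base-10 logs.

```python
import numpy as np


def r_squared(x, y):
    """R^2 of a simple least-squares line of y on x.

    For a one-predictor linear fit this equals the squared Pearson correlation.
    """
    r = np.corrcoef(x, y)[0, 1]
    return r ** 2


# Synthetic, hypothetical data (NOT the World Bank / EIA figures).
rng = np.random.default_rng(0)
n = 1000  # number of synthetic points, not a count of real countries

# GDP per capita spread evenly on the log scale between 500 and 80,000.
gdp = np.exp(rng.uniform(np.log(500), np.log(80_000), size=n))

# Electricity use follows an assumed power law with multiplicative noise,
# so the relationship is linear on the log-log scale.
electricity = 0.05 * gdp ** 1.1 * np.exp(rng.normal(0.0, 0.8, size=n))

r2_raw = r_squared(gdp, electricity)
r2_log = r_squared(np.log10(gdp), np.log10(electricity))

print(f"R^2 on raw values: {r2_raw:.2f}")
print(f"R^2 on log values: {r2_log:.2f}")
```

Under these assumptions the log-scale R-squared comes out higher than the raw-scale one, because a few large values dominate the variance on the raw scale. Neither number is the wrong one. They summarize different fits, which is why a reported R-squared should say which scale it was computed on.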