Benford’s Law is fun; it is also the source of endless confusion. Math is hard!
This came up recently when Kevin Lewis pointed to this paper, which states:
We use Benford’s law to examine the non-random elements of health care costs. We find that as health care expenditures increase, the conformity to the expected distribution of naturally occurring numbers worsens, indicating a tendency towards inefficient treatment. Government insurers follow Benford’s law better than private insurers indicating more efficient treatment. . . .
This sounded interesting, but I’m pretty sure they’re doing it wrong, because they try to evaluate the fit to Benford’s law within each “value bucket” ($100-999, $1000-9999, $10,000-99,999, and $100,000-999,999). Based on my understanding of the processes underlying Benford-like behavior, you wouldn’t necessarily expect the pattern to occur within each bin in that way.
Here’s an example of how things go wrong. The authors write:
We also follow Drake and Nigrini (2000) by calculating the mean of absolute deviations (MAD) to use as a way to assess conformity to the expected distribution. . . . We find that on the MAD for the first bucket of charges (0.010) shows a marginally acceptable conformity to Benford’s law. However, for the second (0.023), third (0.049), and fourth (0.092) buckets the MAD is greater than 0.012 indicating nonconformity. As expected, the MAD increases with the level of total-charges. . . . An additional possible explanation for this finding is hospital pricing strategies . . .
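For readers who haven’t seen the MAD statistic, here is a minimal Python sketch of how I read it: the average, over the nine possible leading digits, of the absolute gap between the observed share and Benford’s expected share log10(1 + 1/d). The function names are my own, and this is my reading of the quoted passage, not the authors’ code.

```python
import math

# Benford's expected share of leading digit d is log10(1 + 1/d), for d = 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(x):
    """Return the leading nonzero digit of a positive number."""
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)


def benford_mad(values):
    """Mean absolute deviation of observed first-digit shares from Benford's."""
    counts = [0] * 9
    for v in values:
        if v > 0:  # leading digits are only defined here for positive values
            counts[first_digit(v) - 1] += 1
    n = sum(counts)
    if n == 0:
        raise ValueError("need at least one positive value")
    return sum(abs(c / n - p) for c, p in zip(counts, BENFORD)) / 9


# Powers of 2 span many orders of magnitude, so their MAD should be small.
powers_of_two = [2 ** k for k in range(1, 501)]
print(benford_mad(powers_of_two))
```

The 0.012 cutoff in the quoted passage is a threshold applied to this number.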
OK, here’s the problem. Here are the data:
This should really be a graph, but let’s not worry about that right here.
Let’s focus on the fourth bucket, because that’s where the discrepancy is highest. You see what’s happening, right? In that fourth bucket, we’re up there in the tail of the distribution, the tail is dropping fast, so, yeah, there are very few charges over $200,000. That doesn’t mean anyone’s cheating in their billing; it’s just what you’d expect to see in the tail of the distribution. Benford’s law applies when the underlying numbers come from a distribution with a wide dynamic range, and by binning in this way you’re destroying that.
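You can check the tail argument with a quick simulation. The sketch below draws fake charges from a lognormal distribution and computes the MAD for all charges pooled and then within each bucket. The lognormal shape, the $3,000 median, and the spread are assumptions I made up for illustration; they are not the paper’s data, and nobody in this fake world is gaming their billing.

```python
import math
import random

# Benford's expected share of leading digit d is log10(1 + 1/d), for d = 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(x):
    """Return the leading nonzero digit of a positive number."""
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)


def mad(values):
    """Mean absolute deviation of first-digit shares from Benford's shares."""
    counts = [0] * 9
    for v in values:
        counts[first_digit(v) - 1] += 1
    n = len(values)
    return sum(abs(c / n - p) for c, p in zip(counts, BENFORD)) / 9


def simulate(n=200_000, median=3000.0, sigma=1.5, seed=1):
    """MAD for pooled fake charges and for each value bucket.

    All parameters are hypothetical: charges are lognormal with the given
    median, and sigma is the standard deviation on the natural-log scale.
    """
    rng = random.Random(seed)
    charges = [rng.lognormvariate(math.log(median), sigma) for _ in range(n)]
    buckets = {
        "$100-999": (1e2, 1e3),
        "$1,000-9,999": (1e3, 1e4),
        "$10,000-99,999": (1e4, 1e5),
        "$100,000-999,999": (1e5, 1e6),
    }
    result = {"all charges": mad(charges)}
    for label, (lo, hi) in buckets.items():
        result[label] = mad([c for c in charges if lo <= c < hi])
    return result


for label, value in simulate().items():
    print(f"{label:>18}: MAD = {value:.3f}")
```

Under these assumptions the pooled charges should land close to Benford, while the top bucket should fail badly, simply because the density is falling through that range and leading 1s crowd out everything else. Same data-generating process, no inefficiency anywhere; the nonconformity comes from the binning.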
I’d say I’m surprised this got published in a legitimate journal, but, you know, the problem with peer review is the peers. Everyone’s doing the best they can, Benford’s law is a bright shiny object, it gets misused just like linear regression gets misused, just like logistic regression gets misused, just like hypothesis testing gets misused and misused and misused. The Benford example is just a little bit more interesting because the math confusion is something a bit less familiar than the usual statistical errors we see every day. Hence why I bothered with this post.