The above graph shows the world record times in the mile run from 1913 through 1999, along with a fitted regression line (in blue), and 10 draws from the posterior distribution of the line (in purple).
Here's the fitted model:
            Median   MAD_SD
(Intercept) 1006.24  22.78
year          -0.39   0.01

Auxiliary parameter(s):
      Median MAD_SD
sigma 1.42   0.19
The estimated slope is -0.39: during the twentieth century, the record time dropped by about 0.4 seconds per year. Pretty cool!
But what about the intercept? About 1006 seconds is the predicted record time . . . in the year that Jesus was born.
Here's the graph on the scale including x=0:
In this case we'd do better by centering and rescaling the predictor, for example centering the year at 1950 and dividing by 10 so that it can be interpreted as "decades relative to 1950." Here's the result:
                      Median MAD_SD
(Intercept)           240.43   0.28
I((year - 1950)/10)    -3.93   0.11

Auxiliary parameter(s):
      Median MAD_SD
sigma 1.42   0.18
The linear model predicts a record time in 1950 of 4 minutes and 0.43 seconds, with a decline of about 3.9 seconds per decade during the period of the data.
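To see that centering only relabels the same fitted line, here's a minimal sketch that uses ordinary least squares (`lm`) as a stand-in for `stan_glm`, so the point estimates may differ slightly from the posterior medians shown above. The data are the records listed in the P.P.S.; the names `fit_raw` and `fit_centered` are made up for this illustration.

```r
# Mile world records: same values as the data listing (yr, month, min, sec).
mile <- read.table(header = TRUE, text = "
yr month min sec
1913 5 4 14.4
1915 7 4 12.6
1923 8 4 10.4
1931 10 4 09.2
1933 7 4 07.6
1934 6 4 06.8
1937 8 4 06.4
1942 7 4 06.2
1942 7 4 06.2
1942 9 4 04.6
1943 7 4 02.6
1944 7 4 01.6
1945 7 4 01.4
1954 5 3 59.4
1954 6 3 58.0
1957 7 3 57.2
1958 8 3 54.5
1962 1 3 54.4
1964 11 3 54.1
1965 6 3 53.6
1966 7 3 51.3
1967 6 3 51.1
1975 5 3 51.0
1975 8 3 49.4
1979 7 3 49.0
1980 7 3 48.8
1981 8 3 48.53
1981 8.2 3 48.40
1981 8.3 3 47.33
1985 7 3 46.32
1993 9 3 44.39
1999 7 3 43.13
")
mile$year <- mile$yr + mile$month / 12
mile$time_in_seconds <- mile$min * 60 + mile$sec

# Least-squares fits (an assumption: lm instead of stan_glm), using the
# raw year and then decades relative to 1950 as the predictor.
fit_raw <- lm(time_in_seconds ~ year, data = mile)
fit_centered <- lm(time_in_seconds ~ I((year - 1950) / 10), data = mile)

a <- unname(coef(fit_raw)[1])  # predicted time at year 0
b <- unname(coef(fit_raw)[2])  # change in seconds per year

# The centered intercept is the raw line evaluated at 1950,
# and the per-decade slope is 10 times the per-year slope.
print(c(raw_line_at_1950 = a + b * 1950,
        centered_intercept = unname(coef(fit_centered)[1])))
print(c(ten_times_raw_slope = 10 * b,
        centered_slope = unname(coef(fit_centered)[2])))
```

Both parameterizations give the same predictions for every year; the only thing that changes is which numbers get printed as coefficients, and the centered ones are the ones you can read directly.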
This is all mathematically trivial, but all the time I see people struggling to interpret regressions with predictors that are far from zero. This is particularly a problem when the model has interactions, in which case each main effect is to be interpreted as the slope when the other predictors equal zero.
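The interaction case can be sketched with simulated data. Everything here is hypothetical: the binary predictor `treat`, the coefficients used to generate `y`, and the use of `lm` rather than `stan_glm`. The point is only to show how the main effect of `treat` changes meaning depending on where zero sits on the year scale.

```r
set.seed(1)
n <- 200
year <- runif(n, 1900, 2000)   # predictor far from zero
treat <- rbinom(n, 1, 0.5)     # hypothetical binary predictor

# Simulated outcome: the effect of treat is 5 in 1950 and grows by 0.1 per year.
y <- 250 - 0.4 * (year - 1950) + treat * (5 + 0.1 * (year - 1950)) + rnorm(n)

# Interaction model with the raw year: the "treat" coefficient is the
# treatment effect extrapolated all the way back to year 0.
fit_raw <- lm(y ~ treat * year)

# Same model with year centered at 1950: the "treat" coefficient is
# now the treatment effect in 1950, inside the range of the data.
year_c <- year - 1950
fit_centered <- lm(y ~ treat * year_c)

print(coef(fit_raw)["treat"])
print(coef(fit_centered)["treat"])

# Both fits describe the same surface: the centered main effect equals
# the raw main effect plus 1950 times the interaction coefficient.
print(coef(fit_raw)["treat"] + 1950 * coef(fit_raw)["treat:year"])
```

The raw main effect is not wrong, it just answers a question nobody is asking (the treatment effect in year 0), and it depends heavily on the estimated interaction because of the long extrapolation.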
The easy way to remember this: "The intercept is the predicted value in the year Jesus was born." You want to avoid that (unless you're analyzing data from the early Roman empire).
P.S. There's some discussion in the comments about the linear trend, which no longer holds. (The current world records for 1500 meters and the mile were set in 1998 and 1999, respectively.)
Speaking just as a casual observer, not as an expert in athletics, I have the impression that the linear improvement throughout the century is a product of several changes happening at different times, including better training, more international competition, changing views about what was possible, better running techniques, and tapping a larger population of potential runners. Maybe better shoes too. Since 2000 there have been no new records, which suggests that the last major improvement has already happened, although I suppose that technology could yield some improvements, through doping, body modification, or improved training and technique.
I take the linear-over-a-century thing to be more of a statistical artifact than anything else. Still, it's amusing, which is why it's long been one of my favorite statistics examples. Indeed, I first started to use this as a teaching example in the 1980s, at which time we were still in the range of approximate linear improvement. I had no idea that the trend would come to a screeching halt before the turn of the century.
P.P.S. Here are the data (which I typed in years ago, probably from wikipedia):
yr month min sec
1913 5 4 14.4
1915 7 4 12.6
1923 8 4 10.4
1931 10 4 09.2
1933 7 4 07.6
1934 6 4 06.8
1937 8 4 06.4
1942 7 4 06.2
1942 7 4 06.2
1942 9 4 04.6
1943 7 4 02.6
1944 7 4 01.6
1945 7 4 01.4
1954 5 3 59.4
1954 6 3 58.0
1957 7 3 57.2
1958 8 3 54.5
1962 1 3 54.4
1964 11 3 54.1
1965 6 3 53.6
1966 7 3 51.3
1967 6 3 51.1
1975 5 3 51.0
1975 8 3 49.4
1979 7 3 49.0
1980 7 3 48.8
1981 8 3 48.53
1981 8.2 3 48.40
1981 8.3 3 47.33
1985 7 3 46.32
1993 9 3 44.39
1999 7 3 43.13
And my R code:
library("rstanarm")

# Read the data and build a fractional year and the record time in seconds
mile <- read.table("mile2.txt", header=TRUE)
mile$year <- mile$yr + mile$month/12
mile$time_in_seconds <- mile$min*60 + mile$sec

# Fit the regression on the raw year
fit <- stan_glm(time_in_seconds ~ year, data=mile)
print(fit, digits=2)

# Plot 1: data, 10 posterior draws of the line, and the posterior-median line
png("jesus_1.png", height=350, width=500)
par(mar=c(3,3,1,1), mgp=c(1.8,.5,0), tck=-.01)
plot(mile$year, mile$time_in_seconds, xlab="year", ylab="record time (seconds)", pch=20, bty="l")
sims <- as.matrix(fit)
n_sims <- nrow(sims)
for (s in sample(n_sims, 10)) {
  curve(sims[s,1] + sims[s,2]*x, lwd=.5, col="purple", add=TRUE)
}
curve(median(sims[,1]) + median(sims[,2])*x, col="blue", add=TRUE)
dev.off()

# Plot 2: the same graph with the x-axis extended back to year 0
png("jesus_2.png", height=350, width=500)
par(mar=c(3,3,1,1), mgp=c(1.8,.5,0), tck=-.01)
plot(mile$year, mile$time_in_seconds, xlim=c(0,2000), ylim=median(sims[,1]) + median(sims[,2])*c(2000,0), xlab="year", ylab="record time (seconds)", pch=20, bty="l")
sims <- as.matrix(fit)
n_sims <- nrow(sims)
for (s in sample(n_sims, 10)) {
  curve(sims[s,1] + sims[s,2]*x, lwd=.5, col="purple", add=TRUE)
}
curve(median(sims[,1]) + median(sims[,2])*x, col="blue", add=TRUE)
dev.off()

# Refit with the predictor expressed as decades relative to 1950
print(stan_glm(time_in_seconds ~ I((year-1950)/10), data=mile), digits=2)
Yeah, the code is ugly. Sue me.