In “Does Velocity Matter?” I diagnosed the factors that account for defensive success or failure, as measured by runs allowed per nine innings of play. There’s a long list of significant variables: hits, home runs, walks, errors, wild pitches, hit batsmen, and pitchers’ ages. (Follow the link for the whole story.)
What about offensive success or failure? It turns out that it depends on fewer key variables, though there is a distinct difference between the "dead ball" era of 1901-1919 and the subsequent years of 1920-2015. Drawing on statistics available at Baseball-Reference.com, I developed several regression equations and found three of particular interest:
- Equation 1 covers the entire span from 1901 through 2015. It’s fairly good for 1920-2015, but poor for 1901-1919.
- Equation 2 covers 1920-2015, and is better than Equation 1 for those years. I also used it to backcast scoring in 1901-1919, and for those years it's worse than Equation 1.
- Equation 5 gives the best results for 1901-1919. I also used it to forecast scoring in 1920-2015, and it’s terrible for those years.
This graph shows the accuracy of each equation:
Unsurprising conclusion: Offense was a much different thing in 1901-1919 than in subsequent years. And it was a simpler thing. Here’s Equation 5, for 1901-1919:
RS9 = -5.94 + BA(29.39) + E9(0.96) + BB9(0.27)
Where 9 stands for “per 9 innings” and
RS = runs scored
BA = batting average
E = errors committed
BB = walks
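To make the arithmetic concrete, here is Equation 5 written as a small Python function. It is a sketch, not the code I used for the regression. It assumes BA is entered as a decimal fraction (e.g., 0.250), which is what the size of the coefficient implies, and the inputs in the usage line are made-up round numbers, not actual season statistics.

```python
def rs9_equation5(ba: float, e9: float, bb9: float) -> float:
    """Predicted runs scored per 9 innings, 1901-1919 (Equation 5).

    ba  -- batting average as a decimal fraction (assumed, e.g. 0.250)
    e9  -- errors committed per 9 innings
    bb9 -- walks per 9 innings
    """
    return -5.94 + 29.39 * ba + 0.96 * e9 + 0.27 * bb9


# Hypothetical inputs, chosen only to illustrate the calculation.
print(round(rs9_equation5(ba=0.250, e9=2.0, bb9=3.0), 2))  # prints 4.14
```

Each coefficient is the change in predicted RS9 per unit change in that variable, holding the others fixed. For example, a 10-point rise in batting average (0.010) adds about 0.29 runs per 9 innings.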
The adjusted r-squared of the equation is 0.971; the significance of F is 2.19E-12 (a very small probability that the equation arises from chance). The p-values of the constant and the first two explanatory variables are well below 0.001; the p-value of the third explanatory variable is 0.01.
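Adjusted r-squared is ordinary R² corrected for the number of explanatory variables, so that adding a weak predictor can lower it rather than inflate it. The standard formula is sketched below. Since the unadjusted R² and the number of observations aren't reported here, the values in the usage line are hypothetical.

```python
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k explanatory variables."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than explanatory variables + 1")
    # Scale the unexplained share of variance, (1 - r2), by (n - 1) / (n - k - 1);
    # the factor grows with k, so each extra variable must earn its place.
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical values: R² of 0.5 from 11 observations and 1 explanatory variable.
print(adjusted_r_squared(0.5, n=11, k=1))
```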
In short, the name of the offensive game in 1901-1919 was getting on base. Not so the game in subsequent years. Here’s Equation 2, for 1920-2015:
RS9 = -4.47 + BA(25.81) + XBH(0.82) + BB9(0.30) + SB9(-0.21) + SH9(-0.13)
Where 9, RS, BA, and BB are defined as above and
XBH = extra-base hits
SB = stolen bases
SH = sacrifice hits (i.e., sacrifice bunts)
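Equation 2 can be sketched the same way. As before, this is an illustration rather than the regression code itself: BA is assumed to be a decimal fraction, XBH is passed in whatever units the regression used (the definition above doesn't say whether it is per 9 innings), and the baseline inputs are hypothetical.

```python
def rs9_equation2(ba: float, xbh: float, bb9: float, sb9: float, sh9: float) -> float:
    """Predicted runs scored per 9 innings, 1920-2015 (Equation 2)."""
    return (-4.47
            + 25.81 * ba   # batting average (decimal fraction, assumed)
            + 0.82 * xbh   # extra-base hits (units as in the underlying data)
            + 0.30 * bb9   # walks per 9 innings
            - 0.21 * sb9   # stolen bases per 9 innings (negative coefficient)
            - 0.13 * sh9)  # sacrifice hits per 9 innings (negative coefficient)


# Hypothetical baseline, then the same inputs with one more stolen base per 9 innings.
base = dict(ba=0.260, xbh=2.8, bb9=3.2, sb9=0.6, sh9=0.3)
more_steals = dict(base, sb9=1.6)
print(round(rs9_equation2(**more_steals) - rs9_equation2(**base), 2))  # prints -0.21
```

Holding everything else fixed, each additional stolen base per 9 innings lowers predicted scoring by 0.21 runs, and each additional sacrifice bunt by 0.13 runs.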
The adjusted r-squared of the equation is 0.974; the significance of F is 4.73E-71 (an exceedingly small probability that the equation arises from chance). The p-values of the constant and the first four explanatory variables are well below 0.001; the p-value of the fifth explanatory variable is 0.03.
In other words, get on base, wait for the long ball, and don't make outs by trying to steal or bunt the runner(s) along.