As with runs scored, I assembled a data set of 10 years of team pitching statistics (1996-2005) to ascertain which variables are most important in
predicting how many runs a team will allow. Again I used the linear modeling function in my statistical software (SPSS) to run the "Stepwise" method of
modeling, which picks the best combination from an offered group of variables.
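For readers without SPSS, the idea behind stepwise selection can be sketched in a few lines of Python. This is a simplified, forward-only variant that adds whichever variable raises r-squared the most and stops when the gain falls below a threshold. It is not SPSS's exact procedure, which uses significance tests to enter and remove variables. The function names, the `min_gain` threshold, and the data are all assumptions made for illustration; the numbers are synthetic, not real team statistics.

```python
import numpy as np

def r_squared(X, y):
    """R-squared of an ordinary least squares fit of y on X (intercept added)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def forward_select(data, y, min_gain=0.005):
    """Greedy forward selection on r-squared.

    data maps a variable name to a 1-D array. At each step the variable that
    raises r-squared the most is added; selection stops when the best
    remaining variable improves r-squared by less than min_gain.
    """
    chosen, best = [], 0.0
    remaining = list(data)
    while remaining:
        scores = {
            name: r_squared(np.column_stack([data[v] for v in chosen + [name]]), y)
            for name in remaining
        }
        name = max(scores, key=scores.get)
        if scores[name] - best < min_gain:
            break
        chosen.append(name)
        remaining.remove(name)
        best = scores[name]
    return chosen, best

# Synthetic stand-in data (NOT real team statistics): ~300 team-seasons.
rng = np.random.default_rng(0)
n = 300
data = {
    "WHIP": rng.normal(1.40, 0.10, n),
    "SLG": rng.normal(0.420, 0.025, n),
    "noise": rng.normal(0.0, 1.0, n),  # an irrelevant variable
}
runs_allowed = 600 * data["WHIP"] + 1500 * data["SLG"] - 700 + rng.normal(0, 15, n)

chosen, best = forward_select(data, runs_allowed)
print(chosen, round(best, 3))
```

On this made-up data the procedure keeps the two variables that actually drive runs allowed and leaves the irrelevant one out, which is the behavior one hopes for from a stepwise method.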
I did not include ERA. I preferred a metric that measures individual performance as closely as possible. My perception is that ERA, at least
among relief pitchers, often depends on other pitchers (e.g. the following pitcher allows a triple, clearing the bases of your runners, vs. the pitcher
coming in and striking the batter out). My feeling is that ERA is a good measure for starting pitching but less so for relief pitching. I wanted a metric that
was the same across all pitchers, so I avoided ERA. Also, this eliminates the difference between an earned run and an unearned run, which I find
very arbitrary.
The best model was a combination of WHIP, SLG and GB/FB, with an r-squared of .919. As with runs scored, I eliminated the GB/FB ratio for
simplicity, slightly lowering the r-squared (to .918). Logically, WHIP and SLG together are very similar to OPS (a measure of how often hitters reach base, in this
case WHIP, plus a measure of total bases, SLG), but this combination did have a slightly better r-squared than OPS or any combination of
SLG and OBP weighted in different amounts (e.g. OBP weighted 70%, SLG 30%).
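For reference, the metrics in this comparison can be computed as in the sketch below. WHIP is walks plus hits per inning pitched and SLG is total bases per at-bat; the weighted blend mirrors the "OBP 70%, SLG 30%" example. The function names and the season totals are hypothetical and chosen only for illustration, and innings pitched is assumed to be a plain decimal (a box-score value such as 1440.1, meaning 1440 and one third, would need converting first).

```python
def whip(walks, hits, innings_pitched):
    """Walks plus hits per inning pitched."""
    return (walks + hits) / innings_pitched

def slg(singles, doubles, triples, home_runs, at_bats):
    """Slugging percentage allowed: total bases per at-bat."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats

def weighted_obp_slg(obp, slg_value, obp_weight=0.7):
    """Blend of OBP and SLG; obp_weight=0.5 gives OPS / 2."""
    return obp_weight * obp + (1.0 - obp_weight) * slg_value

# Hypothetical season totals for one pitching staff (illustrative only).
team_whip = whip(walks=500, hits=1450, innings_pitched=1440)
team_slg = slg(singles=950, doubles=290, triples=30, home_runs=180, at_bats=5500)
print(round(team_whip, 3), round(team_slg, 3))  # 1.354 0.425
```

Feeding WHIP and SLG to the regression as two separate variables lets the model choose its own weights, which may be why that pairing can edge out any fixed OBP/SLG blend.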
As an additional step, I did the same exercise calculating runs per IP, which slightly improved the r-squared (to .920).
Total Runs Allowed
This chart shows the results of my predicted team runs allowed vs. actual. The exercise was very similar to the one for runs scored. I took an exhaustive
variable list assembled from team data over the last 10 years, and used the "Stepwise" method to build a linear regression model to find the
combination of variables that best predicts actual runs allowed. The best fit I could find used SLG, WHIP and GB/FB (r-squared = .919). As with runs
scored, I excluded the GB/FB for simplicity, lowering the r-squared to .918. I have included the total list of variables.
Chart Detail
The vertical (Y) axis shows the actual runs allowed by teams (10 years, ~30 teams/year, ~300 data points) plotted against the modeled or predicted runs allowed on the horizontal (X) axis.
As shown with the formula, the r-squared, a measure of the explanatory value of the model (0 = no explanatory value, 1 = perfect), is .918 (very strong). The equation shown is the one
calculated by Excel and does not represent the equation produced by the statistical software.
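One possible reason the two equations differ is that a chart trendline is fit to actual vs. predicted runs, not to the underlying variables. The sketch below illustrates this under that assumption, using synthetic data (not real team statistics) and a made-up WHIP/SLG model. When the predictions come from a least squares fit with an intercept, the trendline of actual on predicted has a slope near 1 and an intercept near 0, and its r-squared matches the model's.

```python
import numpy as np

# Synthetic stand-in data (NOT real team statistics).
rng = np.random.default_rng(1)
n = 300
whip = rng.normal(1.40, 0.10, n)
slg = rng.normal(0.420, 0.025, n)
actual = 600 * whip + 1500 * slg - 700 + rng.normal(0, 20, n)

# Step 1: the regression model, fitting actual runs on WHIP and SLG.
A = np.column_stack([np.ones(n), whip, slg])
coef, *_ = np.linalg.lstsq(A, actual, rcond=None)
predicted = A @ coef
ss_res = ((actual - predicted) ** 2).sum()
ss_tot = ((actual - actual.mean()) ** 2).sum()
model_r2 = 1 - ss_res / ss_tot

# Step 2: the chart trendline, fitting actual (Y) on predicted (X).
slope, intercept = np.polyfit(predicted, actual, 1)
chart_r2 = np.corrcoef(predicted, actual)[0, 1] ** 2

print(coef)              # model equation: intercept, WHIP and SLG coefficients
print(slope, intercept)  # trendline equation: close to 1 and 0
print(model_r2, chart_r2)
```

So the equation printed on such a chart describes how well predictions line up with reality, while the model's own coefficients live in the regression output.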
As with the runs scored, I also ran the model on a per-inning-pitched basis and got basically the same result.