where jay's nerdiness & the world meet  
  

 
 
 

 
 
Drivers of Runs Scored

I started with a simple three-year data set from espn.go.com and later assembled a ten-year data set from Baseball Prospectus.

My goal was to see which statistics best explain team run scoring, and so to better understand which statistics are the best judge of a
player's offensive contributions.

The results are probably not very surprising to anyone with a basic interest in the numbers: OPS is basically the key driver. There is a slight bias
towards OBP vs. SLG. My best performing model (using r-squared as my criterion) weights OBP at 70% and SLG at 30%, rather than the 50/50
implied in OPS. The r-squared was very good: .915. This says to me that this metric explains just about all you can expect to explain in team offense;
the rest is luck. There may be more, but it is marginal. This also says to me that you want your team to have as much of this weighted OPS (OPS70, I call it) as
you can get. The more plate appearances in a season you can devote to players who have more of it, the more runs you are going to score.
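
To make the 70/30 weighting concrete, here is a minimal Python sketch. The function names are my own, and it assumes OPS70 is a plain weighted average of OBP and SLG; any rescaling would not change the r-squared of a linear fit. The two hitters are hypothetical, chosen so that both have the same regular OPS.

```python
def ops(obp: float, slg: float) -> float:
    """Regular OPS: a straight sum, so OBP and SLG count equally."""
    return obp + slg


def ops70(obp: float, slg: float, obp_weight: float = 0.70) -> float:
    """Weighted blend: 70% OBP and 30% SLG by default.

    Assumption: a plain weighted average with no rescaling afterwards.
    """
    return obp_weight * obp + (1.0 - obp_weight) * slg


# Two hypothetical hitters with the same regular OPS of .800.
on_base_type = {"obp": 0.400, "slg": 0.400}
slugger_type = {"obp": 0.320, "slg": 0.480}

for name, p in (("on-base type", on_base_type), ("slugger type", slugger_type)):
    print(f"{name}: OPS={ops(**p):.3f}  OPS70={ops70(**p):.3f}")
```

Regular OPS cannot tell these two apart, while the 70/30 blend rates the on-base hitter higher (.400 vs. .368).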

What I find amazing is that anyone who has dug into the matter at all can see that OPS is the overwhelming factor. Yet it still
remains a rarity for an announcer or a screen graphic to mention OPS. My assessment, rough as it is, is that baseball is about halfway from old
school to Moneyball in the front offices, perhaps further along. But broadcasting is still only about a twentieth of the way, at best.

Total Runs Scored
Shown below is a chart plotting all teams for the past 10 years, showing modeled (predicted) runs vs. actual runs. The model used OPS70, which is OPS
factored to weight OBP 70% and slugging 30%. I ran this off a large list of variables using the "step method" of linear regression (letting the
program find the best variables), and OBP and SLG, weighted 70/30, produced the highest r-squared (.914).

You get a slightly higher r-squared, up to .915, if you include ground ball to fly ball ratio (the more ground balls you hit, the fewer runs you score),
but I excluded it for simplicity because one of my uses of the formula was to calculate runs created by player, and that ratio is often difficult and
cumbersome to calculate by player.
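
For readers who want to see what the fit and the r-squared amount to, here is a rough Python sketch of a one-variable least-squares fit of runs on OPS70. It is not the stepwise procedure itself, only the final fit-and-score step, and the team numbers are made up for illustration; they are not the data behind the chart.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y):
    """Share of the variance in y explained by the fitted line (0 to 1)."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Made-up team seasons, for illustration only (not the post's data).
team_ops70 = [0.340, 0.352, 0.361, 0.370, 0.383]
team_runs = [640, 700, 735, 790, 850]

slope, intercept = fit_line(team_ops70, team_runs)
print(f"runs ~ {intercept:.1f} + {slope:.1f} * OPS70")
print(f"r-squared = {r_squared(team_ops70, team_runs):.3f}")
```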


Chart Detail
The vertical (Y) axis shows the actual runs scored by teams (10 years, ~30 teams/year, ~300 data points), plotted against the modeled or predicted runs scored. As shown with the formula, the
r-squared, a measure of the explanatory value of the model (0 = no explanatory value, 1 = perfect), is .913, which in my view is quite high. The equation shown is the one that Excel came up
with when I added a trend line to the results of my data. I am not sure why, but the r-squared that my statistics software showed was a bit higher (.914). Both are good.

As mentioned above, the "X" in the model is a modified OPS variable that weights OBP at 70% and SLG at 30%. Regular OPS adds them together, implying a 50/50 allocation. The
actual best weighting I get is 69.5% for OBP, but I rounded it to 70%, lowering my r-squared from .915 to .914.
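
One simple way to look for a best weighting like this is to try every OBP weight on a grid and keep the one with the highest r-squared. The Python sketch below does that; it is an assumption about how such a search could be run, not a record of how the 69.5% figure was found. The data are synthetic and built so that 70/30 is the true answer, which means the scan recovering 0.70 only shows that the method works, not anything about real teams.

```python
def r_squared(x, y):
    """Squared Pearson correlation; equals r-squared for a one-variable fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def blend(obp, slg, w):
    """Weight OBP by w and SLG by (1 - w), team by team."""
    return [w * o + (1.0 - w) * s for o, s in zip(obp, slg)]


def best_obp_weight(obp, slg, runs, steps=200):
    """Try w = 0, 1/steps, ..., 1 and return (best w, its r-squared)."""
    candidates = [i / steps for i in range(steps + 1)]
    scored = [(r_squared(blend(obp, slg, w), runs), w) for w in candidates]
    best_r2, best_w = max(scored)
    return best_w, best_r2


# Synthetic data, NOT real team seasons: runs are generated from a 70/30
# blend on purpose, so the scan has a known right answer to recover.
obp = [0.310, 0.325, 0.340, 0.330, 0.315, 0.350]
slg = [0.420, 0.390, 0.410, 0.450, 0.380, 0.430]
runs = [5000.0 * x - 1100.0 for x in blend(obp, slg, 0.70)]

w, r2 = best_obp_weight(obp, slg, runs)
print(f"best OBP weight = {w:.3f}, r-squared = {r2:.4f}")
```

With steps=200 the grid moves in half-percent increments, fine enough to tell 69.5% from 70%.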

I ran the data for runs scored per plate appearance, and the results were basically the same. Below is a list of the variables I input and ran against runs scored. The quasi-OPS variables were the strongest.