Baseball Prospectus recently introduced three new pitching metrics – Power Score, Command Score, and Stamina Score. I highly recommend reading the article for more information about the calculations behind each one, but collectively they aim to “describe how a pitcher gets to their results.” They are reported on a simple 0-100 scale, with 100 being the best all-time score.
I wanted to know more about them, so I downloaded data from Baseball Prospectus and made some plots. The new scores are available for all seasons from 2008 through 2017, so everything that follows is based on that period. The first thing I discovered is that none of them displays a strong relationship with FIP, DRA, or ERA. To me, this indicates that these new scores are a meaningful, non-redundant addition to the pitching stat line.
Power Score goes beyond pure velocity, measuring the extent to which pitchers rely on it in terms of fastball usage and velocity of offspeed pitches. Power pitchers are generally thought of as strikeout artists, who may sacrifice some control as a tradeoff for increased speed. To see what Power Score says about that theory, I plotted it against strikeouts and walks plus HBP per plate appearance.
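For anyone who wants to reproduce the two rate stats used on the axes, here is a minimal sketch of the per-plate-appearance arithmetic. The `PitcherSeason` class and its field names are hypothetical, not the column names in the Baseball Prospectus export, and the sample numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class PitcherSeason:
    # Hypothetical field names; the real data export may label these differently.
    name: str
    power_score: float
    plate_appearances: int
    strikeouts: int
    walks: int
    hit_by_pitch: int


def strikeout_rate(season: PitcherSeason) -> float:
    """Strikeouts per plate appearance."""
    return season.strikeouts / season.plate_appearances


def free_pass_rate(season: PitcherSeason) -> float:
    """Walks plus hit batters per plate appearance."""
    return (season.walks + season.hit_by_pitch) / season.plate_appearances


# Made-up season line, for illustration only.
season = PitcherSeason("Example Pitcher", 72.0, 800, 200, 50, 6)

# Each pitcher season becomes one point per plot: (Power Score, rate).
k_point = (season.power_score, strikeout_rate(season))
bb_hbp_point = (season.power_score, free_pass_rate(season))
```

Dividing by plate appearances rather than innings keeps the two rates comparable between starters and relievers, who face very different numbers of batters.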
Neither of the above plots shows any discernible correlation. Power Score is meant to be a reflection of a pitcher’s process, not his results, and the plots support that claim. Now let’s see the other side of the supposed tradeoff.
Nothing there either. I wasn’t necessarily expecting to see anything. Rather, this confirms that Baseball Prospectus is distinguishing command from control (i.e., not walking or hitting batters), which is important. Again, the focus is on process, not outcomes.
Stamina Score is a measure of workload based on pitches thrown, batters faced, and days of rest. The plots below show how it relates to traditional workload stats used to assess starting pitchers.
Unlike the others, Stamina Score is grounded in results more than process. There is clearly a relationship: the more innings pitched, the higher the score. However, there is enough variation in the bottom half of the chart to show that it’s not all about innings or games pitched. Given the rise of “bullpenning” in today’s game, Stamina Score might provide a more meaningful measure of durability and reliability.
Next, I looked at the year-to-year correlations for each of these scores. A strong correlation from year one to year two suggests that the stat measures a repeatable skill among players, while a weak correlation implies year-to-year performance that is more influenced by random variation than talent.
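The year-to-year comparison described above can be sketched roughly as follows: pair each pitcher's score in one season with his score in the next, then compute a Pearson correlation over those pairs. This is an illustration under assumptions, not the exact procedure behind the plots. The `(pitcher_id, season)` keys, the helper names, and the sample scores are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def year_to_year_pairs(scores):
    """Pair each season's score with the same pitcher's next-season score.

    `scores` maps (pitcher_id, season) -> score. Pitchers without a
    consecutive following season contribute no pair.
    """
    year_one, year_two = [], []
    for (pitcher_id, season), value in scores.items():
        following = scores.get((pitcher_id, season + 1))
        if following is not None:
            year_one.append(value)
            year_two.append(following)
    return year_one, year_two


# Made-up scores, for illustration only.
scores = {
    ("a", 2015): 60.0, ("a", 2016): 62.0, ("a", 2017): 61.0,
    ("b", 2015): 40.0, ("b", 2016): 43.0,
    ("c", 2016): 75.0, ("c", 2017): 70.0,
    ("d", 2017): 50.0,  # no following season, so no pair
}

year_one, year_two = year_to_year_pairs(scores)
r = pearson(year_one, year_two)
```

Note that a pitcher with three consecutive seasons contributes two pairs, so long careers carry more weight in the result than short ones.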
The plots show strong correlations for Power Score and Command Score, and a moderate correlation for Stamina Score. This makes sense. Pitchers who throw hard continue to throw hard. Likewise, pitchers who throw strikes continue to throw strikes, though with less certainty. Stamina shows even less certainty from year to year, as injuries and ineffectiveness take their toll.
The colors distinguish between starters and relievers (determined by whether they threw more innings as a starter or in relief), and we can also see differences there. In the Power plot, you can see more red (relievers) at the high end and more blue (starters) at the low end. The opposite is true in the Command plot if you look closely enough, and there is a clear distinction in the Stamina plot. All of this aligns with what we see watching the game. I dove deeper into the difference between starters and relievers, but the above plots pretty much sum it all up.
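The starter/reliever split is simple enough to write down. Here is a minimal sketch of that rule, with two assumptions of my own: innings are passed in as plain numbers, and a pitcher with an exact tie is treated as a reliever, since the rule as stated does not cover that case. The function and dictionary names are hypothetical.

```python
def classify_role(innings_as_starter: float, innings_in_relief: float) -> str:
    """Label a pitcher season by where most of the innings came from.

    Assumption: an exact tie falls to "reliever".
    """
    if innings_as_starter > innings_in_relief:
        return "starter"
    return "reliever"


# Colors as described for the plots: red for relievers, blue for starters.
ROLE_COLORS = {"reliever": "red", "starter": "blue"}

# Made-up innings totals, for illustration only.
swingman_role = classify_role(innings_as_starter=80.0, innings_in_relief=25.0)
closer_role = classify_role(innings_as_starter=0.0, innings_in_relief=65.0)
```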
One more plot, showing the average scores for each age (minimum 10 pitcher seasons per age).
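A minimal sketch of that aggregation: group pitcher seasons by age, drop any age with fewer than 10 seasons, and average what remains. The `(age, score)` input shape and the function name are assumptions for illustration, and the sample data is made up.

```python
from collections import defaultdict


def average_score_by_age(seasons, min_seasons=10):
    """Average score per age, keeping only ages with enough pitcher seasons.

    `seasons` is an iterable of (age, score) tuples, one per pitcher season.
    """
    scores_by_age = defaultdict(list)
    for age, score in seasons:
        scores_by_age[age].append(score)
    return {
        age: sum(values) / len(values)
        for age, values in sorted(scores_by_age.items())
        if len(values) >= min_seasons
    }


# Made-up data: ten seasons at age 25, only three at age 40.
seasons = [(25, 50.0 + i) for i in range(10)] + [(40, 30.0)] * 3

averages = average_score_by_age(seasons)
```

Here age 40 is dropped because it has only three seasons, which keeps a handful of outliers at the extremes from distorting the curve. To split the result by role, the same function could be run separately on starter and reliever seasons.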
As we’ve seen throughout this exercise, power runs counter to command. Starters and relievers lose power equally as they age, while Command Score generally improves with age. Stamina scores for starters remain fairly constant across all ages, while they decrease over time for relievers. I suspect this is related to young pitchers serving in long relief roles initially, then progressing towards regular middle or late inning work.
That’s it for now. None of this is particularly surprising, but hopefully it sets a baseline and builds some comfort with these stats going forward. I am particularly interested in Command Score, having attempted to quantify command myself in a somewhat similar fashion. I'm in the process of updating my work on that topic and will have a post soon comparing it to BP’s Command Score.