I've read a lot about pitch tunnels, sequencing, and pitcher similarity this year. As a former catcher, I find these topics especially meaningful. My favorite part of the game was calling pitches: weighing the strengths and weaknesses of both the pitcher and the hitter, the umpire's strike zone, the game situation, and many other variables to decide what pitch to throw next.
Pitch tunnels, in particular, strike a chord with me. While I know this approach may not work for everyone, I have seen firsthand how it can lead to success. One of the best pitchers I caught in college relied almost exclusively on a fastball/straight changeup combo. Batters knew they could forget about any sort of breaking ball, but the combination of precise location and velocity difference was devastating nonetheless.
With that in mind, expect to see more from me on this topic in the future. For now, I want to share a piece I completed last year for grad school in which I compared pitch types and assessed the performance of each. While it's not nearly as sophisticated as the articles I linked to earlier in this post, it's a start. You can view the code here.
(Warning: as with other academic pieces I've posted, the writing is a bit denser.)
The most elemental part of baseball at any level is the battle between pitcher and batter. This interaction sets in motion everything else that occurs in the game. Pitchers employ various pitch types as they attempt to induce swinging strikes and weakly hit ground balls or pop-ups from batters. Common pitch types include the fastball, curveball, and changeup. In the past, pitches like the spitball and screwball rose to prominence in Major League Baseball and subsequently faded. More recently, the slider and cut fastball, or cutter, have gained popularity. A glossary defining each pitch type is included below.
Since the pitcher’s goal is always to get the batter out, it can be assumed that MLB pitchers aim to throw the pitch types that are most effective at accomplishing that feat. The effectiveness of different pitch types has been studied before, but the focus has primarily been on specific pitches thrown by individual pitchers (e.g., Pitcher A’s fastball or Pitcher B’s curveball). Prominent baseball analyst and data journalist Eno Sarris studied the best pitches in baseball during the 2015 season and developed a scoring system based on swinging strike rate and ground ball rate, arguing that these are two of the most important outcomes a pitch can have (Sarris, 2016). He used it to rank each pitcher’s individual pitch types, but noted that many other factors make a particular pitch effective. By his method, the best pitch of the 2015 season was a particular relief pitcher’s sinker, closely followed by other pitch types including curveballs, sliders, and fastballs. Separately, a report published by the sports data and technology company STATS used batting average against to rank the best individual pitches in the game during the 2015 season (STATS, 2015). Its top ten consisted mostly of off-speed pitches such as sliders, curveballs, and changeups. Jeff Sullivan, a prominent writer at the baseball statistics and analysis website FanGraphs.com, used the lowest contact rate to identify the most “unhittable” pitches in the league in 2013 (Sullivan, 2013). As in the STATS report, the best pitches by this approach were sliders, curveballs, and changeups.
Despite all of this research, major league players and managers still assert that a well-placed fastball is the most difficult pitch to hit (Berry, 2013). Statistical expert Mike Fast, currently employed as an analyst by the Houston Astros, investigated the relationship between fastball velocity and performance and found that starting pitchers improve by one run allowed per nine innings for every 4 mph increase in fastball velocity (Fast, 2010). Relief pitchers achieve the same gain for every 2.5 mph increase. In short, almost any pitch type can be named the best, depending on the metrics used in the assessment.
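If Fast's figures are read as a simple linear rule of thumb, they convert directly into expected run savings. The sketch below is a minimal illustration under one stated assumption: the relationship is treated as linear over small velocity changes, which simplifies the original analysis. The function and dictionary names are illustrative.

```python
# Linear reading of the rule of thumb reported by Fast (2010):
# starters gain about 1 run allowed per nine innings (RA9) per 4 mph of
# fastball velocity, relievers about 1 run per 2.5 mph.
# Assumption: the relationship is treated as linear for illustration.

MPH_PER_RUN = {"starter": 4.0, "reliever": 2.5}


def ra9_change(velocity_gain_mph: float, role: str) -> float:
    """Estimated change in RA9 for a change in fastball velocity.

    A velocity gain returns a negative number (fewer runs allowed).
    """
    if role not in MPH_PER_RUN:
        raise ValueError(f"role must be one of {sorted(MPH_PER_RUN)}")
    return -velocity_gain_mph / MPH_PER_RUN[role]


if __name__ == "__main__":
    for role in ("starter", "reliever"):
        print(role, ra9_change(2.0, role))
```

Under this reading, a 2 mph gain is worth about half a run per nine innings for a starter and about 0.8 runs for a reliever.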
The purpose of this study is to examine the differences between pitch types in Major League Baseball and identify which pitch types performed better than others during the 2016 season, as measured by batted ball hit speed and opponent batting average on balls in play (BABIP). Its implications could affect areas such as scouting, roster construction, and player development. The data used in this report were generated by PITCHf/x, an advanced camera-based technology and software system that tracks the trajectory of every pitch thrown during a game. Created in the early 2000s, it has been installed in every MLB stadium since the beginning of the 2008 season, tracking data such as the velocity and acceleration of the ball in each game (Marchi & Albert, 2013). Based on the evidence presented above, the hypotheses for this study are: 1) slower pitch types such as the curveball or changeup will perform better on batted ball hit speed; and 2) the downward-breaking sinker will perform better on BABIP.
PITCHf/x data for every MLB game of the 2016 season through August 20 were extracted from the Statcast Search page on MLB’s website. This produced a dataset of 541,581 observations across 60 variables. A subset was taken to eliminate irrelevant variables, and 13,095 incomplete records were removed, leaving 528,486 observations. The download gave no explanation for the missing data, but the most likely cause is that the PITCHf/x system failed to capture some aspect of the trajectory for those pitches, which accounted for only 2.4% of the dataset. The 12 variables used in the study are listed in Table 1 below, along with a description and example of each.
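The cleaning step described above comes down to selecting columns and dropping incomplete rows. This sketch works on plain dictionaries and relies on a few assumptions. The column names are hypothetical stand-ins for some of the study's variables, not the actual Statcast export headers. It also assumes that an empty string counts as missing, which depends on how the CSV was read.

```python
from typing import Any, Dict, List

# Hypothetical column names for a few of the study's variables; the actual
# Statcast export may name these fields differently.
KEEP_COLUMNS = ["pitch_type", "start_speed", "effective_speed",
                "spin_rate", "break_angle", "break_length", "hit_speed"]


def clean(rows: List[Dict[str, Any]], keep: List[str]) -> List[Dict[str, Any]]:
    """Keep only the chosen columns, then drop rows with any missing value."""
    subset = [{col: row.get(col) for col in keep} for row in rows]
    return [r for r in subset
            if all(v is not None and v != "" for v in r.values())]


def percent_removed(total: int, kept: int) -> float:
    """Share of the original records that were dropped, as a percentage."""
    return 100.0 * (total - kept) / total


if __name__ == "__main__":
    # Counts reported in the study: 541,581 raw pitches, 528,486 kept.
    print(round(percent_removed(541_581, 528_486), 1))
```

Applied to the study's reported counts, `percent_removed` reproduces the 2.4% figure.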
Of the 12 variables, pitch type is of particular importance: it indicates the type of pitch thrown and serves as the independent variable throughout this study. Start speed is the traditional measure of velocity used throughout baseball history. Effective speed represents how fast the pitch appears to the batter and is calculated from the pitcher’s release extension, or how far toward home plate the pitcher releases the ball. Spin rate is an additional field that represents the rate of spin on the ball, in revolutions per minute, after it is released by the pitcher (Glossary, 2016). Break angle and break length are measurements of the ball’s trajectory that the PITCHf/x system uses to determine the pitch type (Fast, 2007). Hit speed measures the speed of the ball after it is struck by the bat (Glossary, 2016); pitchers want this value to be low. The mean values of each of these variables, per pitch type, are shown below in Table 2.
Some pitch types are simple and thrown by nearly every pitcher, while others are difficult to master and rarely used. The four-seam fastball (FF) was by far the most commonly thrown pitch during the period, and other variations of the fastball were also among the most heavily used. At the other extreme, the screwball (SC) was the least used pitch type, with only 25 occurrences in 2016. The large variation in speed, spin rate, break angle, and break length also shows that each pitch type has distinct characteristics.
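The frequencies and per-type means in Table 2 come from a group-by: count the pitches of each type and average a field within each group. The minimal sketch below assumes each pitch is a dictionary with a `pitch_type` key, which is a hypothetical layout. The toy rows are invented for illustration and are not study data.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, List, Tuple


def summarize_by_pitch_type(
    pitches: List[Dict[str, Any]], field: str
) -> Dict[str, Tuple[int, float]]:
    """Return {pitch_type: (number of pitches, mean of `field`)}."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for pitch in pitches:
        groups[pitch["pitch_type"]].append(pitch[field])
    return {ptype: (len(vals), mean(vals)) for ptype, vals in groups.items()}


if __name__ == "__main__":
    # Toy rows for illustration only; these are not study data.
    toy = [
        {"pitch_type": "FF", "start_speed": 95.0},
        {"pitch_type": "FF", "start_speed": 93.0},
        {"pitch_type": "CH", "start_speed": 85.0},
    ]
    print(summarize_by_pitch_type(toy, "start_speed"))
```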
Figure 1 presents boxplots of each variable by pitch type. These plots illustrate how drastically the trajectories vary across pitch types, and the disparities suggest that the differences between pitch types may be significant. By contrast, the boxplot for hit speed appears fairly uniform across all pitch types. However, slight variations in hit speed can make a substantial difference in baseball, so a statistical test was used to determine whether there were, in fact, differences in hit speed between groups.
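Each box in Figure 1 summarizes one group by its quartiles. Whiskers are commonly drawn to the most extreme points within 1.5 times the interquartile range (IQR). This sketch computes those numbers for a single group. Plotting tools differ slightly in how they compute quartiles; R's `boxplot`, for instance, uses Tukey's hinges, so the values here may not match a given plot exactly.

```python
from statistics import quantiles
from typing import Dict, Sequence


def box_summary(values: Sequence[float]) -> Dict[str, float]:
    """Quartiles, extremes, and 1.5 * IQR whisker fences behind a boxplot."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    q1, median, q3 = quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "lower_fence": q1 - 1.5 * iqr,  # points below are drawn as outliers
        "upper_fence": q3 + 1.5 * iqr,  # points above are drawn as outliers
    }
```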
An analysis of variance (ANOVA) is commonly used when there are one or more categorical independent variables and one continuous dependent variable. A key assumption of ANOVA is that the dependent variable is normally distributed within each group being compared. The histograms in Figure 2 below show that start speed, effective speed, spin rate, break angle, break length, and hit speed are decidedly not normally distributed: there are signs of both positive and negative skewness, as well as bimodal distributions.
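The skewness visible in the histograms can also be quantified. Below is a minimal sketch of the moment coefficient of skewness, which is near zero for a symmetric distribution. It is an illustrative check, not the procedure used in the study. Skewness cannot detect bimodality, since a symmetric two-humped distribution can have zero skewness, so the histograms remain necessary.

```python
from typing import Sequence


def skewness(values: Sequence[float]) -> float:
    """Moment coefficient of skewness g1 = m3 / m2 ** 1.5.

    Positive values indicate a long right tail, negative a long left tail.
    """
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n  # second central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    m3 = sum((x - mu) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5
```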
The distributions deviate from normality even further when plotted by pitch type. These plots are displayed in Figure 3 below. The histograms for hit speed, located in the bottom right corner of Figure 3, vary considerably between pitch types. With hit speed being the dependent variable of interest, the normality assumption was violated and the ANOVA test was ruled out as a means to investigate the differences between groups. The Kruskal-Wallis Test, a rank-based nonparametric test, was utilized instead.
Unlike ANOVA, the Kruskal-Wallis test is not based on any assumption about the shape of the distribution, so the non-normality displayed in the histograms is acceptable. One assumption that must be met is that the dependent variable be measured at the ordinal or continuous level; here it was measured at the continuous (i.e., ratio) level. In addition, the test is commonly used when there are three or more groups to compare, and there were 14 pitch types in the dataset. Finally, observations within and between groups must be independent. Since each observation in the dataset represented a unique pitch from the 2016 MLB season, this assumption was also met.
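Below is a minimal from-scratch sketch of what the Kruskal-Wallis test computes. It pools all observations and ranks them, averaging ranks across ties. It then compares each group's rank sum with what identical distributions would predict and applies the standard tie correction. The resulting H is compared against a chi-square distribution with k - 1 degrees of freedom, where k is the number of groups. This is an illustration, not the code used in the study; in practice a library routine such as R's `kruskal.test` or SciPy's `scipy.stats.kruskal` would be used.

```python
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> Tuple[List[float], List[int]]:
    """1-based ranks, with tied values sharing their average rank.

    Also returns the size of each group of tied values (for the tie correction).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        tie_sizes.append(end - start + 1)
        start = end + 1
    return ranks, tie_sizes


def kruskal_wallis(groups: Dict[str, Sequence[float]]) -> Tuple[float, int]:
    """Tie-corrected Kruskal-Wallis H and its degrees of freedom (k - 1)."""
    labels: List[str] = []
    pooled: List[float] = []
    for label, vals in groups.items():
        labels.extend([label] * len(vals))
        pooled.extend(vals)
    n = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)

    rank_sums: Dict[str, float] = {label: 0.0 for label in groups}
    for label, rank in zip(labels, ranks):
        rank_sums[label] += rank

    h = 12.0 / (n * (n + 1)) * sum(
        rank_sums[g] ** 2 / len(groups[g]) for g in groups
    ) - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are tied")
    return h / correction, len(groups) - 1
```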
The Kruskal-Wallis test for hit speed by pitch type produced a chi-squared statistic of 1118.9 and a p-value below 2.2e-16. This far exceeds the critical chi-square value of 22.36 at the 0.05 significance level with 13 degrees of freedom (14 pitch types minus one), providing sufficient evidence to reject the null hypothesis that all groups are identical. Therefore, it was concluded that hit speed for at least one group differs from the others. The mean hit speeds for each pitch type are shown below in Table 3.
However, this test does not indicate which groups differ. To determine which specific pitch types showed a statistically significant difference in hit speed, a post-hoc pairwise comparison test was performed on the data. Selected results are presented below in Table 4, with the full results here.
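The source does not say which post-hoc test was used. One common choice after a Kruskal-Wallis test is Dunn's test, which compares the mean ranks of each pair of groups within the pooled ranking. The sketch below pairs it with a Bonferroni adjustment for the number of comparisons. This is a swapped-in technique assumed for illustration, not taken from the study. With 14 pitch types it produces the 14 × 13 / 2 = 91 comparisons mentioned in the text.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def _average_ranks(values: Sequence[float]) -> Tuple[List[float], List[int]]:
    """1-based ranks with ties averaged, plus the size of each tie group."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        tie_sizes.append(end - start + 1)
        start = end + 1
    return ranks, tie_sizes


def dunn_bonferroni(
    groups: Dict[str, Sequence[float]]
) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Dunn's z and Bonferroni-adjusted two-sided p for every pair of groups."""
    labels: List[str] = []
    pooled: List[float] = []
    for label, vals in groups.items():
        labels.extend([label] * len(vals))
        pooled.extend(vals)
    n = len(pooled)
    ranks, tie_sizes = _average_ranks(pooled)

    rank_sums = {label: 0.0 for label in groups}
    for label, rank in zip(labels, ranks):
        rank_sums[label] += rank
    mean_rank = {g: rank_sums[g] / len(groups[g]) for g in groups}

    # Variance of a rank under the null hypothesis, reduced for ties.
    variance = n * (n + 1) / 12 - sum(t ** 3 - t for t in tie_sizes) / (12 * (n - 1))

    pairs = list(combinations(groups, 2))  # k * (k - 1) / 2 comparisons
    results = {}
    for a, b in pairs:
        se = math.sqrt(variance * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        p_two_sided = math.erfc(abs(z) / math.sqrt(2))
        results[(a, b)] = (z, min(1.0, p_two_sided * len(pairs)))
    return results
```

Bonferroni is the simplest and most conservative adjustment. Less conservative alternatives such as Holm's method would reject at least as many pairs.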
Table 4 lists the 23 of 91 pairwise comparisons that produced a p-value low enough to reject the null hypothesis and conclude that there was a statistically significant difference in hit speed between the two pitch types. The direction of each difference was determined by comparing the mean hit speeds in Table 3 for the two pitch types in question. For example, the mean hit speeds for the changeup (CH) and four-seam fastball (FF) were 86.06 and 89.21 mph, respectively, so it was concluded that the changeup generated a lower hit speed than the four-seam fastball. Most of the pairs in Table 4 compare a slower off-speed pitch, such as the changeup (CH), knuckleball (KN), or slider (SL), with a faster pitch, such as the four-seam fastball (FF), sinker (SI), or two-seam fastball (FT).
Hit speed indicates only how hard the ball was hit, with no indication of the result of the play. To assess performance in terms of outcomes, batting average on balls in play (BABIP) was calculated for each pitch type. Using the “event” variable, which records the result of each at bat, a subset was taken that included only observations with a hit speed greater than zero and only one observation per at bat (i.e., one record for every at bat that ended with a ball put in play). The aggregate BABIP for each pitch type was calculated as the total number of hits divided by the total number of these at bats. The results are displayed below in Table 5.
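The BABIP calculation counts hits and balls in play per pitch type. This sketch makes two assumptions about the data layout: one (pitch type, event) pair per at bat, and Statcast-style event labels such as `single` and `home_run`. The conventional BABIP formula, (H - HR) / (AB - K - HR + SF), excludes home runs, whereas the calculation described above counts them. The optional flag shows that difference on the home-run side only; sacrifice flies are not modeled.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Assumed Statcast-style event labels for base hits.
HIT_EVENTS = {"single", "double", "triple", "home_run"}


def babip_by_pitch_type(
    batted_balls: Iterable[Tuple[str, str]], exclude_home_runs: bool = False
) -> Dict[str, float]:
    """Hits divided by balls in play, per pitch type.

    `batted_balls` holds one (pitch_type, event) pair per at bat that ended
    with a ball in play. With exclude_home_runs=True, home runs are dropped
    from both numerator and denominator, as in the conventional formula.
    """
    hits: Counter = Counter()
    in_play: Counter = Counter()
    for pitch_type, event in batted_balls:
        if exclude_home_runs and event == "home_run":
            continue
        in_play[pitch_type] += 1
        if event in HIT_EVENTS:
            hits[pitch_type] += 1
    return {pt: hits[pt] / in_play[pt] for pt in in_play}
```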
The BABIP for each pitch type was then attached to the batted ball dataset, so that each observation carried the BABIP of its pitch type, and a Kruskal-Wallis test was performed. The test produced a chi-squared statistic of 84620 and a p-value below 2.2e-16, so it was concluded that the BABIP for at least one pitch type differs from the others. The same post-hoc pairwise comparison was completed, and selected results are shown below in Table 6, with the full results here.
In this case, 73 of the 91 possible pairs produced a p-value low enough to reject the null hypothesis and conclude that there was a statistically significant difference in BABIP between the two pitch types. Again, the direction was determined by comparing the BABIPs in Table 5 for the two pitch types in question. One noteworthy comparison was between the changeup (CH) and the slider (SL): their BABIPs were found to differ significantly despite a gap of only 0.002. Likewise, the sinker (SI) differed significantly from the four-seam fastball (FF). Overall, slower off-speed pitches like the changeup (CH) or slider (SL) produced lower BABIPs than faster pitches like the sinker (SI) or four-seam fastball (FF). Nearly all of the 18 comparisons that failed to reject the null hypothesis involved the eephus (EP), the screwball (SC), or unknown (UN) pitch types. These are either rarely used “trick” pitches or pitches the PITCHf/x system failed to classify. Therefore, the practical interpretation of this test is that while the differences in BABIP between commonly used pitch types may be small, they are likely to be statistically significant.
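One property of the BABIP test is worth making explicit. Every batted ball was assigned its pitch type's BABIP, so all observations within a group share a single value. If the pitch types' BABIPs are all distinct, the tie-corrected Kruskal-Wallis statistic then equals N - 1 (the number of batted balls minus one), however small the gaps are. The reported 84620 therefore reflects sample size rather than effect size. A complementary check works from each pitch type's hit and ball-in-play counts directly. The sketch below uses a standard pooled two-proportion z-test, a swapped-in technique rather than the study's method, and the counts in the example are hypothetical.

```python
import math
from typing import Tuple


def two_proportion_z(hits_a: int, n_a: int, hits_b: int, n_b: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)  # common rate under the null
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero (all hits or no hits)")
    z = (p_a - p_b) / se
    return z, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical counts: BABIPs of .300 and .302 on 1,000 balls in play each.
    print(two_proportion_z(300, 1000, 302, 1000))
```

With these hypothetical counts, a 0.002 gap is nowhere near significant. Whether a gap of a given size reaches significance depends on how many balls in play each pitch type actually had. To compare all pitch types at once, the analogous approach is a chi-squared test of independence on the table of hits and outs by pitch type.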
The hypothesis that slower pitch types such as the curveball or changeup would perform better on batted ball hit speed was supported. The curveball (CU), changeup (CH), and slider (SL) each showed a significantly lower hit speed than the four-seam fastball (FF), two-seam fastball (FT), and sinker (SI). In other words, pitches thrown at slower speeds were struck by batters at slower speeds. The hypothesis that the sinker would perform better on BABIP was not supported: it produced a higher BABIP than most pitches, including all off-speed pitch types, and pairwise comparisons found these differences to be statistically significant. Based on the results of this study, slower pitch types allowed fewer hits per ball in play than faster ones.
These results could affect areas such as scouting, roster construction, and player development. Teams may seek out pitchers who specialize in a curveball, changeup, or slider rather than those who throw a high-velocity fastball. They may also teach these pitches to their minor league prospects to develop more effective pitchers for the major league team. In-game strategy may also be affected: pitchers may choose a curveball or changeup over a fastball, knowing that these pitch types induce weaker contact and a lower batting average on balls in play.
One limitation of this study is that it did not include historical data. The PITCHf/x database contains a massive amount of data, and analyzing all of it required more computing resources than were available. It would be interesting to investigate how the performance of different pitch types has changed over time. The recent preference across MLB has been for pitchers who throw harder, so one possible explanation for these results is that batters have adjusted to faster pitches and are less prepared to hit slower ones. The study also did not distinguish between left-handed and right-handed pitchers. Pitch types behave differently (e.g., break in different directions) depending on the pitcher's throwing hand, so accounting for handedness might have changed the results.
Baseball Prospectus | Glossary. (2016). Retrieved July 10, 2016, from http://www.baseballprospectus.com/glossary/
Berry, A. (2013, April 26). Hitters and pitchers agree: There is no pitch tougher to hit in baseball than a well placed fastball. Retrieved August 26, 2016, from http://m.mlb.com/news/article/45834916/hitters-and-pitchers-agree-there-is-no-pitch-tougher-to-hit-in-baseball-than-a-well-placed-fastball/
STATS. (2015). Digging into the data behind baseball's 10 toughest pitches. Retrieved August 26, 2016, from http://www.stats.com/insights/mlb/digging-into-the-data-behind-baseballs-10-toughest-pitches/
Fast, M. (2007, August 02). Glossary of the Gameday pitch fields. Retrieved July 10, 2016, from https://fastballs.wordpress.com/2007/08/02/glossary-of-the-gameday-pitch-fields/
Fast, M. (2010, April 5). Lose a tick, gain a tick. Retrieved August 26, 2016, from http://www.hardballtimes.com/lose-a-tick-gain-a-tick/
Glossary. (2016). Retrieved July 10, 2016, from http://m.mlb.com/glossary/statcast/
Marchi, M., & Albert, J. (2013). Analyzing baseball data with R. Boca Raton, FL: CRC Press.
Pitches - BR Bullpen. (2016, May 5). Retrieved August 26, 2016, from http://www.baseball-reference.com/bullpen/pitches
Sarris, E. (2016, January 25). Last year’s best pitch by the numbers. Retrieved August 26, 2016, from http://www.fangraphs.com/plus/last-years-best-pitch-by-the-numbers/
Sullivan, J. (2013, November 11). Identifying 2013’s most unhittable pitches. Retrieved August 26, 2016, from http://www.fangraphs.com/blogs/identifying-2013s-most-unhittable-pitches/