Analysis of Pitcher Height using PITCHf/x Data

The post that follows is an academic exercise that I completed for grad school in 2016. I also did a follow-up study on the same topic, which will be shared in another post. It's a bit denser than my usual writing, so here are the highlights:

  • Weak correlation between height and perceived velocity, which suggests that taller pitchers do not gain as much in perceived velocity as is commonly believed.
  • Positive correlation between height and downward plane, which suggests that taller pitchers can produce a more difficult angle for the hitter.
  • No indication that a greater downward plane led to a lower exit velocity, but there was a negative correlation between downward plane and launch angle. In other words, pitchers are more likely to produce a lower launch angle as downward plane (and therefore height) increases.

You can view the code here.

Introduction

With an emphasis throughout professional sports on “bigger, stronger, and faster” performers, today’s professional athletes are physically larger than ever. This is evident in the NFL, where high-speed collisions become increasingly violent each year, and in the NBA, where the game moves at a faster pace than ever before. In Major League Baseball (MLB), while the paradigm has shifted away from the muscle-bound players of the Steroid Era, teams search for pitchers who are tall, strong, and throw extremely hard. The average height of an American male is approximately 5-foot-10, yet as of April 2015, 14 MLB teams did not have a single pitcher under 6 feet tall on their roster (Bryant). In the 2016 first-year player draft, the Chicago Cubs used five of their first eight picks on pitchers taller than 6’4” (Chicago Cubs 2016 Draft Results).

Logically, it makes sense that taller pitchers might throw with greater velocity because they are typically bigger and stronger. Baseball writer Zachary D. Rymer examined this in 2013, with mixed results that did not show with certainty that taller pitchers throw harder. The preference for height also makes sense physiologically. Taller pitchers are able to extend farther towards home plate before releasing the pitch, so the ball travels a shorter distance and the batter has less time to react. In other words, taller pitchers may appear to throw the ball harder than they actually do, creating a “perceived velocity” greater than the actual velocity. Furthermore, taller pitchers can release the ball from a higher point than others, creating a more dramatic downward plane. Generally, pitches become more difficult to hit with greater vertical or horizontal movement. This trend towards taller pitchers was studied in detail in 2010 by Glenn Greenberg, and the results showed no significant difference between taller and shorter pitchers in effectiveness or durability (Greenberg).

Greenberg’s study used traditional pitching statistics (e.g. Innings Pitched, Earned Run Average, etc.), and there are more advanced metrics available today that may shed more light on the topic. The purpose of this report is to assess the validity of the theory that there is a positive correlation between height and pitching effectiveness in the current era, using data from the PITCHf/x motion-tracking system. Created in the early 2000s and now used in every MLB stadium, PITCHf/x captures data on the release point, speed, and trajectory of every pitch that occurs in every game throughout the league (Marchi, 21). It also captures data on the speed and trajectory of the ball after it is put in play by the batter, which provides a new way to evaluate pitching and hitting effectiveness in the form of “exit velocity” and “launch angle” (Glossary). The hypotheses are: 1) there will be a positive correlation between height and perceived velocity; 2) there will be a positive correlation between height and the vertical movement, or downward plane, of the pitch; 3) there will be negative correlations between perceived velocity and both exit velocity and launch angle; and 4) there will be negative correlations between downward plane and both exit velocity and launch angle. Together, confirmation of these hypotheses would support the anecdotal evidence behind MLB’s preference for taller pitchers.

Methods

PITCHf/x data for every MLB game during the 2016 season through the end of June was extracted from the Statcast Search page on MLB’s website. This resulted in a dataset of 348,728 observations across 60 variables, spanning April, May, and June 2016. This investigation relates to the current era of Major League Baseball, so historical data was ignored. Including historical data may also have made the dataset unmanageable given the time and system limitations of this project. The PITCHf/x data does not include height, so additional data was downloaded from Baseball Prospectus to supplement each record with the pitcher’s height. A subset of the data was then taken to eliminate irrelevant variables, retaining only the 11 variables shown in Table 1 below. Next, 13,095 incomplete records were removed, leaving 335,633 records for total pitches thrown. The incomplete records are likely due to occasional malfunctions of the PITCHf/x technology. They represented only a small portion of the dataset (about 3.8%), and their removal was not expected to have a significant effect on the results. The “total pitches” dataset was used to examine perceived velocity and downward plane. A second dataset was created from the same observations, but including only batted balls. The “batted balls” dataset contained 60,515 observations and was used to analyze exit velocity and launch angle. An exploratory data analysis and a correlational analysis were performed on the data.
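As a rough illustration of the filtering steps described above, the following Python sketch drops incomplete rows and then keeps only batted balls. It is not the code used for the study: the rows are made up, and every field name except start_speed is a hypothetical stand-in for the real column names.

```python
# Hypothetical field names (only start_speed appears in the report itself).
REQUIRED = ("height", "release_height", "start_speed")
BATTED = ("exit_velocity", "launch_angle")


def is_present(value):
    """Treat None and empty strings as missing."""
    return value not in (None, "")


def drop_incomplete(rows, required=REQUIRED):
    """Keep only rows where every required field is present."""
    return [r for r in rows if all(is_present(r.get(k)) for k in required)]


def batted_balls(rows):
    """Keep only rows that carry both batted-ball measurements."""
    return [r for r in rows if all(is_present(r.get(k)) for k in BATTED)]


# Made-up rows for illustration only.
pitches = [
    {"height": 75, "release_height": 6.1, "start_speed": 93.2,
     "exit_velocity": 88.0, "launch_angle": 12.0},
    {"height": 72, "release_height": None, "start_speed": 90.1,
     "exit_velocity": None, "launch_angle": None},
    {"height": 78, "release_height": 6.4, "start_speed": 95.0,
     "exit_velocity": None, "launch_angle": None},
]

total_pitches = drop_incomplete(pitches)
in_play = batted_balls(total_pitches)
print(len(pitches) - len(total_pitches), "incomplete rows removed")
print(len(in_play), "batted balls")
```

The same bookkeeping reproduces the counts reported above: 348,728 - 13,095 = 335,633.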

Table 1: Variable Definitions

Descriptive statistics were calculated as the initial step in the exploratory analysis. The mean height in the total pitches dataset was 74.68 inches, with a minimum of 66 inches and a maximum of 82 inches. The mean is used later in the analysis to classify pitchers as tall or short. The minimum and maximum values do not initially appear extreme enough to be considered outliers. The remaining values in Table 2 illustrate the average measurements for release height, height at home plate, spin-induced vertical movement, extension, velocity, and perceived velocity during the 2016 season. The minimum start_speed of 51.46 MPH is notable but certainly possible, and it did not appear to have a drastic effect on the mean, so no further action was taken.

Table 2: Descriptive Statistics on all pitches thrown Apr-Jun 2016

The summary statistics for batted balls are shown below in Table 3. The average batted ball in the 2016 sample traveled at 87.70 MPH at an angle of 12.679 degrees. According to the Statcast Glossary, that is a line drive (Glossary): 10 degrees or less equates to a ground ball, between 10 and 25 degrees is a line drive, between 25 and 50 degrees is a fly ball, and above 50 degrees is a pop-up. These thresholds were used later in the analysis to classify batted balls by result.
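The launch angle thresholds above translate directly into a small classifier. The sketch below is illustrative; the text does not say which class owns the exact boundaries at 25 and 50 degrees, so treating each upper bound as inclusive is an assumption.

```python
def classify_batted_ball(launch_angle_deg):
    """Label a batted ball from its launch angle in degrees.

    Assumption: upper bounds are inclusive (25 counts as a line drive,
    50 counts as a fly ball).
    """
    if launch_angle_deg <= 10:
        return "ground ball"
    if launch_angle_deg <= 25:
        return "line drive"
    if launch_angle_deg <= 50:
        return "fly ball"
    return "pop-up"


# The mean launch angle reported above falls in the line drive band.
print(classify_batted_ball(12.679))
```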

Table 3: Descriptive Statistics on batted balls Apr-Jun 2016

Histograms were produced to assess normality. Figure 1 below presents the histogram for height, featuring a reasonably normal distribution.

Figure 1: Height, all pitches thrown

Figure 2: Histograms, all pitches thrown

Figure 3: Histograms, batted balls only

Figures 2 and 3 above display histograms for the remaining variables in the datasets. For the most part, they appear to be normally distributed. There is some negative skewness, particularly in the velocity-related fields, which is not surprising because most pitchers today throw harder than 90 MPH while slower pitches form a long left tail. In addition, skewness and kurtosis values were calculated and are included below in Table 4. The values are reasonably close to the 0 for skewness and 3 for kurtosis that represent a normal distribution, with the exception of the high kurtosis for release height. This indicates that the distribution of release height has heavy tails and that outliers may be present.
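For reference, skewness and kurtosis on the scale used here (0 and 3 for a normal distribution) can be computed from central moments. The sketch below uses population moments, which is an assumption; the software used for Table 4 may apply a sample-size correction, and the speeds are made up.

```python
def central_moment(xs, k):
    """k-th central moment using the population (1/n) convention."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def skewness(xs):
    """Third standardized moment; 0 for a symmetric distribution."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def kurtosis(xs):
    """Fourth standardized moment (not excess); 3 for a normal distribution."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2


# Made-up pitch speeds with one very slow pitch, giving a long left tail.
speeds = [51, 88, 90, 92, 93, 94, 95]
print("skewness:", skewness(speeds))
print("kurtosis:", kurtosis(speeds))
```

A single slow pitch among otherwise similar speeds is enough to push the skewness negative, which mirrors the pattern described for the velocity fields.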

Table 4: Normality assessment

Boxplots were produced to further assess the possibility of outliers in the data. Figure 4 below indicates 4 potential outliers based on height.

Figure 4: Height, all pitches thrown

As suspected earlier, the boxplots show extreme outliers in release height. A few records appear to have values near or below zero. These may be technical anomalies, but they may also be valid observations from a “submarine” style pitcher who releases the ball near the ground. Overall, the dataset is so large that the small number of potential outliers does not have a significant impact.
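Boxplots conventionally flag points that lie more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule to made-up release heights; the quartile method (the default of Python's statistics.quantiles) is an assumption and may differ slightly from the one behind Figures 4 through 6.

```python
from statistics import quantiles


def iqr_outliers(xs, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(xs, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in xs if x < low or x > high]


# Made-up release heights in feet; 0.4 mimics a near-ground reading.
release_heights = [5.5, 5.7, 5.8, 5.9, 6.0, 6.1, 6.3, 0.4]
print(iqr_outliers(release_heights))
```

A flagged value is only a candidate: as noted above, a near-zero release height could be a sensor error or a legitimate submarine delivery, so the rule alone cannot decide whether to drop it.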

Figure 5: Boxplots, all pitches thrown

Figure 6: Boxplots, batted balls only

After assessing normality and outliers, a correlational analysis was performed on the data using both Pearson’s product-moment correlation and Spearman’s rho, along with scatterplots for visualization. Pearson correlation is commonly used to quantify linear relationships, while Spearman’s rho depicts monotonic relationships. The data did not present clean linear relationships, so using both methods provides a more thorough analysis.
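To make the distinction between the two coefficients concrete, here is a minimal standard-library sketch (not the code used for the study). Spearman's rho is computed as Pearson's r on the ranks of the data, with tied values sharing their average rank.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))


# A monotonic but non-linear relationship (y = x cubed).
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print("Pearson r:", pearson(x, y))
print("Spearman rho:", spearman(x, y))
```

In this toy example rho is essentially 1 because y always increases with x, while r falls below 1 because the increase is not linear. That gap is why reporting both coefficients is informative when scatterplots do not show clean linear trends.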

Results

Figure 7 illustrates the relationships between height and release height, height at home plate, spin-induced vertical movement, release extension, velocity, and perceived velocity. It is difficult to see any correlation among these variables. There is a barely distinguishable positive trend in Pitcher Height vs Release Height, supporting the unsurprising inference that taller pitchers release the ball from a higher point. The remaining plots each present an almost random distribution with no discernible pattern.

Figure 7: Scatterplots, all pitches thrown

Each of the above relationships was submitted through a correlational analysis, producing the values presented below in Table 5.

Table 5: Correlation with Height, all pitches thrown

As expected, release height exhibited a relatively strong correlation with height compared with the other variables (r = 0.2963, rho = 0.3183). Release extension also produced a comparatively strong correlation with height (r = 0.2581, rho = 0.2676). In both cases, Spearman’s rho is greater than Pearson’s r, indicating that the relationship is more monotonic than linear. Neither correlation is strong, but height is more positively correlated with perceived velocity (r = 0.0287, rho = 0.0699) than with velocity (r = -0.0035, rho = 0.0345).

Height at home plate and vertical movement did not correlate strongly with height. Height at home plate is more dependent on the pitcher’s control and pitching style than height, while vertical movement from PITCHf/x is more dependent on the type of pitch thrown and shows a negative correlation to height. The attribute is defined as the vertical movement between release point and home plate, “as compared to a theoretical pitch thrown at the same speed with no spin-induced movement” (Fast). Essentially, it is the vertical movement driven by the pitch type, excluding the effects of gravity. This does not represent the “downward plane” discussed in the introduction. To create a simple approximation of the downward plane, a new variable was created as the difference between release height and height at home plate. The new variable, downward plane, yielded a much stronger correlation to height (r = 0.1165, rho = 0.1100).
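The derived variable is a simple difference, sketched below with made-up values and hypothetical field names. Note that it measures vertical drop in feet rather than an angle, since the horizontal distance travelled is not taken into account.

```python
def downward_plane(release_height_ft, plate_height_ft):
    """Vertical drop from release to home plate, in feet (larger = steeper)."""
    return release_height_ft - plate_height_ft


# Made-up pitches; the field names are hypothetical.
pitches = [
    {"release_height": 6.5, "plate_height": 2.5},
    {"release_height": 5.75, "plate_height": 2.25},
]
for p in pitches:
    p["downward_plane"] = downward_plane(p["release_height"], p["plate_height"])

print([p["downward_plane"] for p in pitches])  # [4.0, 3.5]
```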

Next, the relationship between height and pitching effectiveness, in terms of exit velocity and launch angle, was examined. Correlations between exit velocity and height, release height, downward plane, release extension, and perceived velocity are presented in Table 6.

Table 6: Correlation with Exit Velocity, batted balls only

None of the variables display a strong correlation to exit velocity. There is no significant evidence of a negative correlation between downward plane and exit velocity (r = -0.0022, rho = 0.0013). There is actually a positive correlation between perceived velocity and exit velocity (r = 0.0838, rho = 0.0791).

The same variables were compared to launch angle, and the correlations are presented below in Table 7. In this case, downward plane is negatively correlated with launch angle (r = -0.1483, rho = -0.1574). On the other hand, perceived velocity produced a fairly weak positive correlation to launch angle (r = 0.0367, rho = 0.0386).

Table 7: Correlation with Launch Angle, batted balls only

To reveal additional insight, observations were classified as tall or short based on whether the pitcher’s height was above or below the mean height of 74.68. They were also classified by their result based on the launch angle specifications mentioned in the Methods section. The scatterplots in Figure 8 use these classifications to display relationships between exit velocities and launch angle.
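A minimal sketch of the tall/short split, cross-tabulated against batted-ball result, might look like the following. The rows are made up and the field names are hypothetical; only the 74.68-inch cutoff comes from the analysis.

```python
from collections import Counter

MEAN_HEIGHT_IN = 74.68  # mean pitcher height from the total pitches dataset


def height_class(height_in):
    """Label a pitcher as tall or short relative to the mean height."""
    return "tall" if height_in > MEAN_HEIGHT_IN else "short"


# Made-up batted balls with a precomputed result label.
batted = [
    {"height": 77, "result": "ground ball"},
    {"height": 72, "result": "fly ball"},
    {"height": 76, "result": "ground ball"},
    {"height": 74, "result": "line drive"},
]

counts = Counter((height_class(r["height"]), r["result"]) for r in batted)
for (group, result), n in sorted(counts.items()):
    print(group, result, n)
```

Because the cutoff is not a whole number of inches, no listed height can fall exactly on the boundary, so every observation gets an unambiguous label (assuming heights are recorded in whole inches).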

Figure 8: Exit Velocity Scatterplots, batted balls only

The hypothesis that taller pitchers induce lower exit velocities and launch angles via a sharper downward plane implies that a weakly hit ground ball is the desired result. In the plot of release height vs exit velocity, this would manifest itself in a higher number of ground balls in the lower right corner of the plot. However, the desired pattern is not present in Figure 8. There does appear to be a larger number of ground balls near a release height of 6 feet, but that value is hardly above the mean release height of 5.887 feet. Likewise, the plot of release extension vs exit velocity offers no evidence that taller pitchers who achieve greater extension see any benefit in effectiveness. The plot of downward plane vs exit velocity says little about exit velocity but shows that a greater downward plane is associated with more ground balls than fly balls. The final plot displays the same data classified by tall/short rather than result and reveals that taller pitchers dominate the right half of the plot.

Figure 9: Launch Angle Scatterplots, batted balls only

Figure 9 displays the relationship between downward plane and launch angle, classified by both result and tall/short. Once more, it is clear that taller pitchers are likely to produce a greater downward plane. It also appears that batted balls are more likely to be ground balls as downward plane increases.

Implications

The hypothesis that there would be a positive correlation between height and perceived velocity was true, though not to the degree that was expected. The correlation was rather weak, indicating that taller pitchers do not gain as much as common belief suggests in perceived velocity. The expectation that there would be a positive correlation between height and downward plane also proved to be true. Taller throwers can leverage their length to generate a more difficult angle for the hitter. In addition, the notion that there would be negative correlations between perceived velocity and both exit velocity and launch angle was demonstrably false. Both correlations were positive, offering no support of the theory that tall pitchers are able to induce weaker and less harmful contact by the batter. Finally, the hypothesis that there would be negative correlations between downward plane and both exit velocity and launch angle was partly true. While there was no support showing that a greater downward plane led to a lower exit velocity, there was a negative correlation between downward plane and launch angle. In other words, pitchers are more likely to produce a lower launch angle as downward plane increases.

The findings presented in this report can help reshape how Major League Baseball teams scout and search for pitchers. The analysis showed little correlation between height and perceived velocity, so teams seeking hard-throwing pitchers should not search on the basis of height. Taller pitchers may not be more likely to throw the ball any harder than shorter pitchers, but their ability to create a greater downward plane from the release point to home plate may allow them to generate ground balls more frequently. Consequently, teams looking for an effective ground-ball-inducing pitcher may benefit from investigating taller pitchers.

This study did not distinguish between starting pitchers and relief pitchers. Often, these two types of pitchers have very different styles and goals and it may be wise to examine these relationships in that context. Furthermore, the data did not include any past seasons. Expanding the dataset would allow a more accurate examination of trends over time. The PITCHf/x data used in the study is more concentrated on the trajectory and spin of the ball than simple measures like height. Therefore, there may be confounding factors with some of the variables used in the analysis. There are also other fields that may prove enlightening on this topic, such as the swing and miss rate.

The correlational analysis was limited by its simplicity. A more comprehensive review or removal of outliers may have improved the analysis, though that is unlikely to change the results with a dataset this large. A more complex analysis might build a predictive model for exit velocity and launch angle based on height and other related attributes, or compare tall and short pitchers to determine whether there is a significant difference in certain areas.

References

Arthur, R. (2016, April 13). The New Science Of Hitting. Retrieved July 10, 2016, from http://fivethirtyeight.com/features/the-new-science-of-hitting/

Baseball Prospectus | Active Players by Year. (2016). Retrieved July 10, 2016, from http://www.baseballprospectus.com/sortable/extras/active_players.php?this_year=2016

Baseball Prospectus | Glossary. (2016). Retrieved July 10, 2016, from http://www.baseballprospectus.com/glossary/

Bryant, H. (2015, April 27). As athletes get bigger, they look less and less like us. Retrieved July 10, 2016, from http://espn.go.com/mlb/story/_/id/12751620/for-major-league-baseball-pitchers-bigger-better

Chicago Cubs 2016 Draft Results. (2016). Retrieved July 10, 2016, from http://chicago.cubs.mlb.com/team/draft.jsp?c_id=chc

Fast, M. (2007, August 02). Glossary of the Gameday pitch fields. Retrieved July 10, 2016, from https://fastballs.wordpress.com/2007/08/02/glossary-of-the-gameday-pitch-fields/

Glossary. (2016). Retrieved July 10, 2016, from http://m.mlb.com/glossary/statcast/

Greenberg, G. P. (2010). Does a Pitcher's Height Matter? Retrieved July 10, 2016, from http://sabr.org/research/does-pitcher-s-height-matter

Marchi, M., & Albert, J. (2013). Analyzing baseball data with R. Boca Raton, FL: CRC Press.

Rymer, Z. D. (2013, May 13). Do Taller Pitchers Throw Harder Than Average? Retrieved July 10, 2016, from http://bleacherreport.com/articles/1645950-do-taller-pitchers-throw-harder-than-average

Statcast Search. (2016). Retrieved July 10, 2016, from https://baseballsavant.mlb.com/statcast_search
