Predicting Positional Flexibility in the NBA

This post is an academic exercise that I completed for grad school in 2016, in which I built statistical models to predict the positional flexibility of NBA prospects using only data from the NBA Draft Combine. The data may be a bit outdated at this point, so this post is more about sharing my method than anything else. I may revisit it in the future with a more current batch of prospects. You can view the code here.

Introduction

While there has been an emphasis throughout professional sports on “bigger, stronger, and faster” performers in recent years, the focus in the NBA has been on players with flexibility. Not flexibility of the body, characterized by a wide range of motion, but rather the ability to play more than one position on the court. Based on definitions provided on the NBA’s website, the point guard (PG) is traditionally the smallest and quickest player in the lineup and acts as the “director” of the offense. The shooting guard (SG) is taller than the PG and is responsible for outside shooting and scoring. The small forward (SF) is generally larger than the SG in both height and weight, as he must be able to play inside near the basket, but quick enough to play defense on the perimeter. The power forward (PF) is generally one of the biggest and strongest players on the team. Finally, the center (C) is the tallest player in the lineup. However, in recent years, terms such as “positional flexibility”, “position-less basketball”, and “small ball” have become prominent throughout the sport.

Much has been written about position-less basketball and the evolution of the ideal NBA player. Longtime sportswriter Jeff Zillgitt defines position-less basketball as “having players who can play multiple positions on the court at the same time.” Teams with several interchangeable players are more difficult to defend and can also defend their opponents more aggressively (Zillgitt, 2012). Adi Joseph, NBA editor with the Sporting News, sought insight from several current players and coaches and concluded that teams have been shifting towards putting the five most talented players on the court, regardless of position, since the late 1990s and early 2000s (Joseph, 2016). Chris Herring, a sports journalist for The Wall Street Journal, reported that strictly defined positions have lost relevance in the NBA because young players are learning skills that were traditionally reserved for guards (Herring, 2012). Drew Cannon, a statistician with the NBA’s Boston Celtics, studied conventional position definitions and found a potential market inefficiency that teams could exploit by seeking out specific skills rather than specific positions (Cannon, 2010).

Height is no longer the most important factor in measuring an NBA prospect. Investigative reporter David Epstein examined the value of wingspan and discovered that, while most people have a wingspan approximately equal to their height, the average ratio between the two measures in the NBA is 1.06 (Epstein, 2012). He found wingspan to be an important predictor of both blocked shots and rebounds. NBA writer Jonathon Tjarks asserts that the ultimate prototype for the “small ball” type of player is Draymond Green of the Golden State Warriors, who is listed at 6 feet 6 inches tall with a wingspan of 7 feet 2 inches. He dribbles, passes, and shoots like a guard, rebounds like a forward or center, and can defend any position on the opposing team (Tjarks, 2015). This evolution is not lost on today’s NBA prospects, who aim to avoid being labelled by one particular position, frequently list versatility as one of their strengths, and seek to emulate Green’s success (Jones, 2016).

While the research in the prior paragraph makes it clear that NBA prospects are no longer evaluated based on the rigid positional definitions of the past, it offers little quantifiable evidence that can be used to identify these position-less players. The purpose of this exercise is to examine data from the official NBA Draft Combine and create a model to quantify positional flexibility and identify prospects whose physical measurements indicate they may have the ability to play multiple positions.

Methods

Player measurement data from the official NBA Draft Combine were downloaded from the NBA scouting, statistics and analytics service DraftExpress. All publicly available records were included, covering the NBA Draft Combines from 2009 through 2016 and producing 458 observations on 19 variables. Physical measurements such as height, weight, and wingspan were included in the data, as well as the results of various physical tests like the bench press or agility drill. In addition, player statistics data were extracted from Basketball-Reference using the website’s Play Index search tool. The search was limited to regular season games from the 2009-10 NBA season through the 2015-16 NBA season, including only the first three seasons within that period for players who debuted in the league between the 2009-10 season and the 2013-14 season. The result was a dataset with 361 observations on 34 variables, one record for each player containing the per-36-minute averages for statistics such as rebounds, assists, and points over their first three NBA seasons combined. The reason for this level of specificity is that NBA teams are looking for a return on an investment made through the NBA draft during the first three seasons of that player’s career. Furthermore, as each player’s body changes over time, the measurements taken during the NBA Draft Combine become obsolete. The Combine results are likely to be most representative of a player’s true athleticism during his first three seasons.
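
The per-36-minute averages described above can be sketched as follows. This is a minimal illustration, not the code used in the original study: the function name `per_36` and the sample totals are hypothetical, and Basketball-Reference reports these rates directly.

```python
def per_36(stat_total: float, minutes_played: float) -> float:
    """Scale a counting stat to a per-36-minute rate."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return stat_total / minutes_played * 36


# Hypothetical player (not real data): 410 rebounds in 2,460 minutes
# accumulated over his first three seasons combined.
rebounds_per_36 = per_36(410, 2460)
print(round(rebounds_per_36, 2))
```

Pooling the totals across all three seasons before dividing, as above, weights each season by minutes played rather than treating the seasons equally.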

The data from these two sources were merged, resulting in a dataset of 215 observations on 16 variables. This decrease in observations was caused by the exclusion of players in the Combine dataset who never played in the NBA, as well as players in the NBA Stats dataset who did not attend the official Draft Combine. Many variables were also filtered out in order to focus on key measurements and stats. There were two nominal variables, Name and Position. The values for Position fell into one of five categories (PG, SG, SF, PF, and C) based on the established basketball position model. The remaining fourteen variables contained ratio-level data in the form of physical measurements from the Combine and per-36-minute statistics from each player’s first three NBA seasons. Height, Weight, Body_Fat, Wingspan, Vertical, Bench, Agility, and Sprint represent measurements taken at the Combine. TRB, AST, STL, BLK, TOV, and PTS represent each player’s total rebounds, assists, steals, blocks, turnovers, and points, respectively, per 36 minutes of playing time during his first three seasons. Each variable is defined below in Table 1.
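
The merge behaves like an inner join: only players present in both sources survive. The sketch below assumes the join key is the player's name, which the text does not state explicitly, and uses placeholder records rather than real data.

```python
# Hypothetical records keyed by player name; values are placeholders.
combine = {
    "Player A": {"Wingspan": 82.0, "Vertical": 34.5},
    "Player B": {"Wingspan": 79.5, "Vertical": 38.0},  # never played in the NBA
}
nba_stats = {
    "Player A": {"Position": "SF", "TRB": 6.1, "AST": 2.4},
    "Player C": {"Position": "PG", "TRB": 3.2, "AST": 7.0},  # skipped the Combine
}


def inner_join(left: dict, right: dict) -> dict:
    """Keep only keys present in both tables and combine their fields."""
    return {name: {**left[name], **right[name]} for name in left if name in right}


merged = inner_join(combine, nba_stats)
print(merged)
```

Joining on names alone is fragile when two players share a name or when spellings differ between sources, so a real pipeline would likely need a cleaning step first.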

Table 1: Variable Definitions

As the initial step in the exploratory analysis, the descriptive statistics presented in Table 2 were calculated. The mean value of Wingspan was 82.44 inches and the mean value of Height was 77.72 inches, producing a ratio between the two of 1.06. This figure is in line with the ratio reported by Epstein in 2012. The dataset contained missing values, primarily in Vertical, Bench, Agility, and Sprint, for players who did not participate in those particular drills at the Combine. These missing values were excluded from any subsequent statistical analyses. There were no obvious outliers based on the maximum and minimum values. Out of the 215 observations, there were 43 PGs, 55 SGs, 39 SFs, 41 PFs, and 37 Cs.
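
As a quick check, the wingspan-to-height ratio can be reproduced from the two reported means. The variable names here are illustrative only.

```python
mean_wingspan = 82.44  # inches, reported mean of Wingspan
mean_height = 77.72    # inches, reported mean of Height

ratio = mean_wingspan / mean_height
print(f"{ratio:.2f}")  # 1.06
```

Note that this is a ratio of means. Averaging each player's individual wingspan-to-height ratio could give a slightly different number.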

Table 2: Descriptive Statistics

Figure 1 below presents a scatterplot matrix of the continuous variables included in the dataset, along with a histogram for each variable and the correlation coefficients for each pair. In general, the histograms for all of the NBA Combine measurements depict a reasonable approximation of the normal distribution.

Figure 1: Scatterplot Matrix

The upper right corner of the graphic, bordered by a dashed line, designates the area where the NBA Combine measurements intersect with the player statistics. In agreement with studies mentioned in the previous section, Wingspan was found to be more strongly correlated than Height with both rebounds (TRB) and blocks (BLK). Interestingly, Vertical was the only Combine measurement that was positively correlated with assists (AST). In addition, none of the physical measurements exhibited a strong correlation with steals (STL), turnovers (TOV), or points (PTS). This suggests that there are factors besides physical attributes, such as timing or shooting ability, involved in producing these statistics.
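
The correlation coefficients in the matrix are assumed here to be Pearson correlations computed on pairwise-complete observations, which is consistent with missing values being excluded but is not confirmed by the text. A minimal sketch with hypothetical values:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation over pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


# Hypothetical values, not real data. None marks a missing measurement.
wingspan = [78.0, 80.5, 83.0, 85.5, 88.0]
blocks = [0.2, 0.5, None, 1.1, 1.6]
r = pearson_r(wingspan, blocks)
print(round(r, 3))
```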

To further explore the data, a boxplot for each NBA Combine measurement was produced, differentiated by Position. These are displayed below in Figure 2.

Figure 2: Boxplots of NBA Combine measurements by Position

In the boxplots, overlap between the categories suggests that players at multiple positions exhibit similar measurements for the variable in question. There was very little, if any, overlap in the Height plot. This explains why height has traditionally been used to clearly delineate player positions. There was more overlap present in the Wingspan plot, suggesting, as noted in the previous section, that it may be an indicator of positional flexibility. There was substantial overlap in the plots for Body_Fat, Vertical, Bench, Agility, and Sprint, with the largest differences existing between the PG and C categories on opposite ends of the position spectrum. Therefore, the boxplots indicated that the NBA Combine measurements can be used to identify players with the ability to play multiple positions, rather than classify players into specific positions. Lastly, there were very few outliers present in any of the plots, as defined by the standard boxplot rule, which designates any data point lying more than 1.5 times the interquartile range beyond the first or third quartile as an outlier. It was determined that none of the data points required further evaluation for removal from the study.
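
The boxplot outlier rule can be sketched as below, assuming the common convention of flagging points more than 1.5 times the interquartile range beyond the first or third quartile. The sample sprint times are hypothetical, and quartile definitions vary slightly between software packages, so borderline points may be classified differently elsewhere.

```python
from statistics import quantiles


def boxplot_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sprint times in seconds, not real data.
sprint = [3.15, 3.20, 3.22, 3.25, 3.28, 3.30, 3.33, 3.90]
print(boxplot_outliers(sprint))
```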

Next, a boxplot for each player statistic was produced, differentiated by Position. These are displayed in Figure 3 below. These plots were not particularly enlightening, confirming that point guards (PG) and centers (C) have traditionally been relied upon for assists (AST) and blocks (BLK), respectively. However, the overlap in the plots for steals (STL) and points (PTS) shows that these important statistics can be accumulated by players at any position and do not require much specialization.

Figure 3: Boxplots of player statistics by Position

Logistic regression, which estimates the probability that an observation falls into one of two categories, was used to create a model for identifying prospects with the ability to play multiple positions. The first assumption of the binomial logistic regression model is that the dependent variable is measured on a dichotomous scale. To satisfy it, a new binary variable was created for each category in the Position field. The variable names were PG, SG, SF, PF, and C. For example, if the Position value for a particular observation was “PG”, then the value of the PG field would be 1. If the Position value was anything other than “PG”, it would be 0. A separate logistic regression was run for each binary variable, satisfying the first assumption. The second assumption is that there are one or more independent variables, which can be either continuous or categorical. Every independent variable used in the model contained continuous data. The third assumption is that each observation is independent. In this study, each observation represented a unique NBA player, so the third assumption holds. The fourth assumption is that a linear relationship exists between each continuous independent variable and the logit transformation of the dependent variable. In this case, there is no indication that such a relationship does not exist.
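
The binary position variables can be built with a simple one-vs-rest encoding. The sketch below uses a hypothetical record and a hypothetical helper name, `add_position_indicators`.

```python
POSITIONS = ["PG", "SG", "SF", "PF", "C"]


def add_position_indicators(player: dict) -> dict:
    """Return a copy of the record with one 0/1 field per position."""
    indicators = {pos: int(player["Position"] == pos) for pos in POSITIONS}
    return {**player, **indicators}


# Hypothetical record, not real data.
row = add_position_indicators({"Name": "Player A", "Position": "PG"})
print(row)
```

Each record ends up with exactly one indicator set to 1, and each indicator column becomes the dependent variable of its own logistic regression.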

To create the model, a logistic regression was performed on each position-indicating binary variable (PG, SG, SF, PF, and C) with Wingspan, Body_Fat, Vertical, Bench, Agility, and Sprint as the independent variables. Height and Weight were excluded in order to focus on measures that are not traditionally used to define positions. In other words, five separate models were created, one for each position category. Ultimately, the probability of the player being labelled as a PG, SG, SF, PF, or C was estimated for each observation. These five probabilities were summed to produce a “Flex Score”, with larger values indicating a player who is more likely to possess positional flexibility.
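
The scoring step can be sketched as follows. The log-odds values are hypothetical stand-ins for what each fitted model would produce for one player (an intercept plus the weighted sum of his six Combine measurements); no real coefficients are used.

```python
from math import exp


def logistic(z: float) -> float:
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + exp(-z))


# Hypothetical log-odds for one player from five separately fitted models.
log_odds = {"PG": -3.0, "SG": -0.4, "SF": 0.1, "PF": -0.8, "C": -4.0}

probabilities = {pos: logistic(z) for pos, z in log_odds.items()}
flex_score = sum(probabilities.values())
print(round(flex_score, 2))
```

Because the five models are fit independently, the probabilities are not constrained to sum to 1. That is what lets the total vary from player to player and serve as a score.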

Results

Predictive accuracy and predictive value are typically used to assess and interpret a logistic regression model. However, in this study the focus was not prediction or true classification. Rather, the model sought to use the probability from the logistic regression as part of a score used to assess a player’s positional flexibility. The top 15 “Flex Scores” are presented below in Table 3.

Table 3: Top 15 Flex Scores

The table shows several players who may be able to play multiple positions, indicated by the red and orange highlights. Surprisingly, the model indicated that one player, Jeremy Tyler, was nearly an equal fit at both SF and C. However, it was immediately apparent that this list was dominated by point guards and centers, the two positions that reside on opposite ends of the spectrum of NBA player size. Most of the Flex Scores listed above are inflated by a large probability value nearly equal to one in either PG or C. This observation led to an amendment to the model. As described in the Introduction section, position-less basketball is characterized by a lineup filled with players of similar size and athleticism and is regularly referred to as “small ball”. Therefore, players in the first and last quartiles for Height, below 73.5 inches and above 84.4 inches, were removed from the dataset. This left 108 observations that fell into a range of height that is optimal for this type of basketball. The Flex Score model was refit to the reduced dataset. Table 4 displays a comparison of the two sets of models.
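
The height filter keeps the middle half of the distribution. A minimal sketch, using hypothetical heights and a hypothetical helper name, `middle_half`:

```python
from statistics import quantiles


def middle_half(players, key="Height"):
    """Drop records below the first quartile or above the third quartile."""
    values = [p[key] for p in players]
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return [p for p in players if q1 <= p[key] <= q3]


# Hypothetical heights in inches, not real data.
players = [{"Name": f"Player {i}", "Height": h}
           for i, h in enumerate([72, 74, 76, 78, 80, 82, 84, 86])]
kept = middle_half(players)
print([p["Height"] for p in kept])
```

Records exactly at a quartile boundary are kept here. With ties at the cut points, the retained group can be slightly larger than half of the data.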

Table 4: Logistic regression model comparison

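The “small ball” models have lower AIC values, which points towards a better fit. Because AIC depends on the number of observations, values from models fit to different datasets are not directly comparable, so this comparison is suggestive rather than conclusive. The top 15 Flex Scores for the “small ball” model are shown below in Table 5.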

Table 5: Top 15 Flex Scores for “small ball” players

This table shows several players who may be able to play multiple positions, and also fit into the position-less basketball scheme that has become popular throughout the NBA. There is more flexibility present in the above table, as well, with fewer scores boosted by large values on either end of the position range. Many of these players may not fit the prototypical model for a particular position, but possess physical measurements that substantiate their versatility. The model had the most trouble quantifying the flexibility of small forwards (SF), who are typically the most adaptable players in the lineup already and may not obviously fall into any of the position categories.

Implications

The logistic regression models produced a score for each player’s positional flexibility, or a “Flex Score”. This score can be used by NBA teams not as a replacement for scouting, but as a supplement to their existing evaluation process. The model can be tweaked in a variety of ways depending on the type of player a team is seeking. It can focus on smaller players with the ability to play both guard positions (PG and SG) or on taller players with the ability to play both “big” positions (PF and C). Not every player must have the ability to play any position in the lineup, like Draymond Green. The flexibility offered by a prospect who can play even two positions successfully is valuable. The model can also focus on players in the “small ball” mold and assess their ability to play a wider range of positions.

Predictive accuracy is not essential to the Flex Score model, as it was used merely as a means of identifying versatile prospects and quantifying their positional flexibility. However, the model only assesses players based on physical measurements and athletic performances at the NBA Draft Combine, and could be improved with the inclusion of more variables from other sources of scouting. One obvious basketball skill that is lacking in the model is shooting ability. Shooting is a major factor in the position-less basketball model and, thus, could be even more indicative of positional flexibility than the physical measurements used in this study. Shooting statistics from college or leagues overseas could be included for each prospect.

The model also excludes prospects who did not attend the official NBA Draft Combine. There are many other pre-draft camps and events at which a player can showcase his skills and athleticism. NBA scouting is a comprehensive exercise that must encompass a wide range of factors and should not be limited to data from one event.

Including height and weight in the model may have given it more predictive accuracy, but that would have undermined the primary purpose of the exercise, which was to build a model that quantifies positional flexibility based on physical measurements besides the traditional height and weight. Adding other variables like shooting could have resulted in a better model for all players rather than focusing on “small ball” players. Despite these limitations, the model deployed in this report provides a framework for quantifying positional flexibility in NBA prospects that could be very useful in pre-draft evaluation.

References

Basketball-Reference NBA & ABA Player Directory. (2000-2016). Retrieved August 14, 2016, from http://www.basketball-reference.com/players/

Cannon, D. (2010, August 2). Five Players. Retrieved August 14, 2016, from http://www.basketballprospectus.com/article.php?articleid=1190

Epstein, D. (2012, November 05). The Case for ... Wingspan. Retrieved August 14, 2016, from http://www.si.com/vault/2012/11/05/106252287/the-case-for--wingspan

Glossary. (2016). Retrieved August 14, 2016, from http://stats.nba.com/help/glossary/

Herring, C. (2012, June 06). The Rise of the Position-Less Player. Retrieved August 14, 2016, from http://www.wsj.com/articles/SB10001424052702303753904577450500740492554

Jones, J. (2016, May 16). NBA prospects don't want to be confined to one position. Retrieved August 14, 2016, from http://www.sacbee.com/sports/nba/sacramento-kings/kings-blog/article77991572.html

Joseph, A. (2016, April 07). LeBron James and Paul George can only avoid NBA's revolution for so long. Retrieved August 14, 2016, from http://www.sportingnews.com/nba/news/lebron-james-paul-george-small-ball-revolution-nba-playoffs-kevin-durant/teauavmfaa4w1cs69qx6n1z8u

NBA Pre-Draft Measurements. (2016). Retrieved August 14, 2016, from http://www.draftexpress.com/nba-pre-draft-measurements/

Player Season Finder. (2000-2016). Retrieved August 14, 2016, from http://www.basketball-reference.com/play-index/psl_finder.cgi

Players and Positions. (2016). Retrieved August 15, 2016, from http://www.nba.com/canada/Basketball_U_Players_and_Posi-Canada_Generic_Article-18037.html

Standing Reach. (2016). Retrieved August 14, 2016, from http://basketball.about.com/od/collegebasketballglossary/g/standingreach.htm

Tjarks, J. (2015, November 20). The Epitome Of Positionless Basketball - RealGM Articles. Retrieved August 14, 2016, from http://basketball.realgm.com/article/239916/The-Epitome-Of-Positionless-Basketball

Zillgitt, J. (2012, October 18). LeBron James, Miami Heat find no position like no positions. Retrieved August 14, 2016, from http://www.usatoday.com/story/sports/nba/heat/2012/10/18/miami-lebron-james-position-versatility-flexibility/1642623/
