Paola Zuccolotto and others, Alley-Oop! Basketball Analytics in R, Significance, Volume 18, Issue 2, April 2021, Pages 26–31, https://doi.org/10.1111/1740-9713.01507
Abstract
Paola Zuccolotto, Marica Manisera and Marco Sandri give a play-by-play of basketball data analyses to assist students, coaches, technical experts and fans in scrutinising the sport, its teams and players
Interest in sport statistics has grown rapidly in recent years. In the scientific literature, increasing numbers of papers are published dealing with statistical methods and analyses applied in a wide range of sports, including American football, soccer, basketball, volleyball, baseball and ice hockey. Teams and individual athletes also increasingly base their decisions about game strategies or training on the analysis of performance data, while magazines and newspapers make use of statistics and data visualisations in response to their readership's growing fascination with the topic.
In response to this, the University of Brescia, Italy, created an international project, Big Data Analytics in Sports (BDsports; bdsports.unibs.it), with the aim of establishing links between scholars and professionals who have a shared interest in sport statistics.
BDsports has given rise to a number of activities and applications, including a new tool for basketball analytics: an R package called BasketballAnalyzeR. It contains a number of functions aimed at developing both basic and complex statistical analyses of basketball data, and in this article we show some analyses that can be carried out, focusing – by way of illustration – on a single player: Luka Dončić, whose performance in the 2018/2019 National Basketball Association (NBA) regular season earned him the NBA Rookie of the Year Award.
We use two data sets: players’ “box scores” and the “play-by-play” data for the 82 games played by Dončić's team, the Dallas Mavericks, during the regular season (October 2018 to April 2019). In basketball, a box score is a summary of the results from one or more games, structured according to a set of game variables (summarised in Table 1). Box score data are usually derived from a statistics sheet recorded manually or with the help of dedicated equipment, and are then summarised into frequency tables or averages describing the achievements of a team or of individual players. A play-by-play, meanwhile, is a data set that recounts each play of a game as it occurred, along with relevant information about the recorded event – for example, time, player(s) involved, score, or location on the court (Figure 1).
Key features of a basketball court. Photo by Oleksii S. on Unsplash.
Overview of box score game variables (which may apply either to a team or an individual player).
| Game variable | Description | Abbreviation |
|---|---|---|
| Minutes played | Total minutes spent in game. | MIN |
| Points scored | Total points scored. | PTS |
| Two-point shots | Number of shots taken from within the three-point line (or arc), either attempted (A) or made (M). | P2A or P2M |
| Three-point shots | Number of shots taken from outside the three-point line (or arc), either attempted (A) or made (M). | P3A or P3M |
| Free throws | Number of free throws (unopposed attempts to score awarded after a foul on the shooter by the opponent), either attempted (A) or made (M). | FTA or FTM |
| Rebounds | Number of total (offensive and defensive) rebounds (a rebound occurs when the ball is retrieved after a missed field goal or free throw). | REB |
| Offensive rebounds | Number of offensive rebounds (when the missed field goal or free throw has been attempted by the team). | OREB |
| Defensive rebounds | Number of defensive rebounds (when the missed field goal or free throw has been attempted by the opponent). | DREB |
| Assists | Number of ball passes made by a player to a teammate in a way that leads to a score by field goal. | AST |
| Turnovers | Number of lost possessions of the ball. | TOV |
| Steals | Number of turnovers legally caused to the opponent. | STL |
| Blocks | Number of legal deflections of a field goal attempt by the opponent in order to prevent a score. | BLK |
| Personal fouls | Number of breaches of the rules that concern illegal personal contact with an opponent. | PF |
Basic analyses using box scores
To start, we carry out an exploratory analysis of the achievements of the Dallas Mavericks players with respect to some of the game variables contained in the box scores. Specifically, we select the players who have played at least 700 minutes and generate radial plots (Figure 2) of nine variables calculated per minute played – for example, “two-point shots made” (P2M) divided by “minutes played” (MIN). The variables are standardised, so the dashed blue line in each plot represents the average of each variable over the 12 players shown, and we can easily see whether a player's achievements are above or below this average.
Radial plots of Dallas Mavericks players who have played at least 700 minutes (standardised variables: PTS, P3M, P2M, FTM, REB, AST, TOV, STL, BLK per minute played; see Table 1 for descriptions of each variable used).
The solid red lines delineate the players’ profiles and allow us to draw conclusions about their way of playing with respect to the selected variables. Usually, the profile of a standard player exhibits some variables above the average and some below, and this is typically related to the position played. For example, DeAndre Jordan has the typical profile of a good centre, with high values for blocks (BLK), rebounds (REB) and P2M, and low values for assists (AST) and three-point shots made (P3M). In this respect, we immediately notice the outstanding profile of Dončić, whose achievements per minute played are above the average for all the variables considered, with the exception of BLK. This depicts him as a special player indeed.
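The per-minute standardisation behind the radial plots can be sketched in a few lines. The following is a minimal illustration in plain Python rather than the BasketballAnalyzeR implementation: the player names and totals are hypothetical, only three of the nine variables are used, and the choice of the population standard deviation is an assumption (the package may scale differently).

```python
from statistics import mean, pstdev

# Hypothetical season totals (illustrative numbers, not real data).
box = {
    "Player A": {"MIN": 2000, "PTS": 1500, "AST": 400, "BLK": 20},
    "Player B": {"MIN": 1500, "PTS": 600, "AST": 90, "BLK": 120},
    "Player C": {"MIN": 800, "PTS": 320, "AST": 150, "BLK": 8},
    "Player D": {"MIN": 300, "PTS": 90, "AST": 20, "BLK": 5},
}
MIN_MINUTES = 700                   # threshold quoted in the article
VARIABLES = ["PTS", "AST", "BLK"]   # subset chosen for brevity


def standardised_per_minute(box, variables, min_minutes):
    """Return {player: {variable: z-score of the per-minute value}}."""
    # Keep only players above the minutes threshold.
    kept = {p: s for p, s in box.items() if s["MIN"] >= min_minutes}
    # Divide each total by minutes played.
    per_min = {p: {v: s[v] / s["MIN"] for v in variables} for p, s in kept.items()}
    z = {p: {} for p in kept}
    for v in variables:
        values = [per_min[p][v] for p in kept]
        mu, sd = mean(values), pstdev(values)  # population SD: an assumption
        for p in kept:
            z[p][v] = (per_min[p][v] - mu) / sd if sd > 0 else 0.0
    return z


profiles = standardised_per_minute(box, VARIABLES, MIN_MINUTES)
for player, profile in profiles.items():
    print(player, {v: round(x, 2) for v, x in profile.items()})
```

After standardisation every variable has mean zero across the selected players, so a positive value places a player outside the dashed average line of the radial plot and a negative value inside it.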
Next, we analyse shooting performance. The radial plots inform us that Dončić performs better than average with respect to the number of shots made per minute, for all kinds of shot (two-point, three-point, free throws). But this tells us nothing about the shooting percentages, that is, the ratio between shots made and attempted. To investigate this, we display a bubble plot where the players who have attempted at least 100 shots are represented as bubbles on an (x, y) grid, with the x-axis relating to the percentage of two-point shots made over those attempted, and the y-axis relating to the same percentage for three-point shots. The bubble size is proportional to the number of shots attempted and the colour denotes the percentage of free throws made over those attempted, according to the blue–white–red scale reported on the right (Figure 3). The two black lines (horizontal and vertical) and the white colour on the blue–white–red scale correspond to the team averages for the corresponding variables.
Bubble plot of Dallas Mavericks players who have attempted at least 100 shots: x-axis, P2M/P2A × 100; y-axis, P3M/P3A × 100; bubble size, P2A+P3A+FTA; bubble colour, FTM/FTA × 100.
From the point of view of shooting percentages, Dončić lies slightly below the team average for all three types of shot. However, this may be explained by the high number of shots he attempts: players with above-average shooting percentages tend to attempt fewer shots (e.g., Maxi Kleber, Trey Burke and Ryan Broekhoff).
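The four quantities encoded in the bubble plot follow directly from the box score totals. The sketch below uses hypothetical totals and assumes that the 100-shot threshold applies to all attempts combined (P2A + P3A + FTA), which the article does not state explicitly.

```python
# Hypothetical shooting totals (illustrative numbers, not real data).
shots = {
    "Player A": {"P2M": 300, "P2A": 600, "P3M": 150, "P3A": 450, "FTM": 200, "FTA": 280},
    "Player B": {"P2M": 120, "P2A": 200, "P3M": 40, "P3A": 100, "FTM": 45, "FTA": 50},
    "Player C": {"P2M": 10, "P2A": 30, "P3M": 5, "P3A": 20, "FTM": 8, "FTA": 10},
}
MIN_SHOTS = 100  # threshold quoted in the article


def pct(made, attempted):
    """Shooting percentage; NaN when no shots were attempted."""
    return 100 * made / attempted if attempted else float("nan")


def bubble_data(shots, min_shots):
    """Return the x, y, size and colour values for each qualifying player."""
    rows = {}
    for player, s in shots.items():
        attempts = s["P2A"] + s["P3A"] + s["FTA"]  # bubble size
        if attempts < min_shots:  # assumed to apply to total attempts
            continue
        rows[player] = {
            "x_P2p": pct(s["P2M"], s["P2A"]),       # x-axis
            "y_P3p": pct(s["P3M"], s["P3A"]),       # y-axis
            "colour_FTp": pct(s["FTM"], s["FTA"]),  # bubble colour
            "size": attempts,
        }
    return rows


rows = bubble_data(shots, MIN_SHOTS)
for player, row in rows.items():
    print(player, {k: round(v, 1) for k, v in row.items()})
```

Keeping volume (size) separate from accuracy (x, y, colour) is what allows the plot to show a high-volume shooter with slightly below-average percentages alongside low-volume shooters with high percentages.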
The network assist shot
We now move on to some more complex analyses that can be carried out using play-by-play data. First, we represent the network assist shot of the Dallas Mavericks team (Figure 4). The graph shows that Dončić plays a fundamental role with regard to assists made, his most assisted players being Jordan, Barnes, Powell, Brunson, Kleber and Matthews. Two other notable players with respect to assists made are the point guards Brunson and Barea, but neither comes close to Dončić. Figure 5 shows a bar-line plot, with bars denoting the number of assists made (AST) and the line denoting the number of assists received (ASTR). It shows that Dončić makes by far the highest number of assists, but the same cannot be said of assists received.
Bar-line plot of assists made (bars, AST) and assists received (line, ASTR) for Dallas Mavericks players. Bars ordered by AST.
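Both series in the bar-line plot are simple tallies over the assisted baskets recorded in the play-by-play data. As a rough sketch, assuming each assisted basket has already been reduced to a hypothetical (assisting player, scoring player) pair:

```python
from collections import Counter

# Hypothetical assisted baskets as (assisting player, scoring player) pairs.
assisted_baskets = [
    ("Guard", "Centre"),
    ("Guard", "Forward"),
    ("Guard", "Centre"),
    ("Forward", "Guard"),
    ("Centre", "Forward"),
]

# Assists made (AST): how often each player appears as the passer.
ast = Counter(passer for passer, _ in assisted_baskets)
# Assists received (ASTR): how often each player appears as the scorer.
astr = Counter(scorer for _, scorer in assisted_baskets)

# Order players by assists made (ties broken by name), as in the plot.
players = sorted(set(ast) | set(astr), key=lambda p: (-ast[p], p))
for p in players:
    print(f"{p}: AST={ast[p]}, ASTR={astr[p]}")
```

The same pairs are the directed edges of the assist network, with the count of each pair giving the edge weight.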
To look deeper into this issue, we can extract from the network the variables denoting, for each player, the points scored with field goals (FGPTS), those scored by teammates thanks to the player's assists (ASTPTS), and the percentage of field goals scored by the player thanks to a teammate's assist (FGPTS_ASTp). We represent the three variables by means of a scatterplot (Figure 6).
Scatterplot of Dallas Mavericks players: x-axis, points scored with field goals by player (FGPTS); y-axis, points scored by teammates thanks to player's assists (ASTPTS); colour, percentage of field goals scored by the player thanks to a teammate's assist (FGPTS_ASTp).
Once again, Dončić outperforms his teammates and confirms his position as the standout player in terms of points scored by the team, with 1,180 field goal points scored by himself and another 1,048 scored by his teammates thanks to his assists. In addition, the blue colour tells us that he has a low percentage of points scored thanks to a teammate's assist (29.2%), which means that he usually creates his own shooting opportunities.
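The three scatterplot variables can be derived from a list of made field goals. The sketch below is illustrative only: the records are hypothetical, and FGPTS_ASTp is computed here as the share of a player's field goal points that were assisted, which is an assumption about how the package defines it.

```python
from collections import defaultdict

# Hypothetical made field goals: (scorer, assisting teammate or None, points).
made_field_goals = [
    ("Guard", None, 3),
    ("Guard", None, 2),
    ("Guard", "Forward", 2),
    ("Centre", "Guard", 2),
    ("Centre", "Guard", 2),
    ("Forward", "Guard", 3),
    ("Forward", None, 2),
]


def assist_summary(made_field_goals):
    """Return FGPTS, ASTPTS and FGPTS_ASTp for every player involved."""
    fgpts = defaultdict(int)     # points scored with field goals
    astpts = defaultdict(int)    # points teammates scored off this player's assists
    assisted = defaultdict(int)  # own field goal points that were assisted
    for scorer, assister, points in made_field_goals:
        fgpts[scorer] += points
        if assister is not None:
            astpts[assister] += points
            assisted[scorer] += points
    summary = {}
    for p in sorted(set(fgpts) | set(astpts)):
        share = 100 * assisted[p] / fgpts[p] if fgpts[p] else float("nan")
        summary[p] = {"FGPTS": fgpts[p], "ASTPTS": astpts[p], "FGPTS_ASTp": share}
    return summary


summary = assist_summary(made_field_goals)
for player, row in summary.items():
    print(player, row["FGPTS"], row["ASTPTS"], round(row["FGPTS_ASTp"], 1))
```

In this toy data the guard combines high FGPTS and ASTPTS with a low assisted share, which is the same pattern the article reports for Dončić.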
Further analyses
The examples so far are by no means a complete investigation of Dončić's overall performance: several further analyses may be carried out with the BasketballAnalyzeR package.
For example, we may analyse the expected value of points scored with respect to some concurrent variable (e.g., shot distance, time in the match or in the quarter, play length), also comparing different teammates in the same graph or considering different opponents. In Figure 7, for example, we separately consider the games played by the Dallas Mavericks against teams that qualified for the playoffs (top-ranked teams) and those against teams that did not qualify (bottom-ranked teams), and we plot Dončić's expected value of points scored as a function of shot distance. Interestingly, Dončić performs better against top-ranked teams, with the main improvement occurring for shots attempted further away from the basket.
Expected points versus shot distance of Dončić against top-ranked teams (“top”) and bottom-ranked teams (“bot”).
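One way to estimate expected points as a smooth function of shot distance is a kernel-weighted average of the points scored on nearby shots. The sketch below uses a Gaussian-kernel (Nadaraya-Watson) estimator on hypothetical shots; the estimator, the bandwidth and the data are all assumptions made for illustration and may differ from what BasketballAnalyzeR does internally.

```python
import math

# Hypothetical shots: (distance in feet, points scored; 0 means a miss).
shots = [
    (1, 2), (2, 2), (3, 0), (5, 2), (8, 0), (12, 0), (15, 2),
    (18, 0), (24, 3), (25, 0), (26, 3), (27, 0), (28, 0),
]


def expected_points(shots, distance, bandwidth=3.0):
    """Gaussian-weighted mean of points scored on shots near `distance`."""
    weights = [math.exp(-0.5 * ((d - distance) / bandwidth) ** 2) for d, _ in shots]
    total = sum(weights)
    if total == 0:
        raise ValueError("no shots close enough to estimate expected points")
    return sum(w * pts for w, (_, pts) in zip(weights, shots)) / total


for distance in (2, 10, 25):
    print(distance, round(expected_points(shots, distance), 2))
```

To reproduce a comparison like the one in Figure 7, the same function would be applied separately to the shots taken against top-ranked and bottom-ranked opponents and the two curves drawn on the same axes.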
Other options are, for example, to plot charts to investigate spatial performance or to assess the density of two-point and three-point shots with respect to time or space. For example, focusing attention on three-point shots, in Figure 8 we consider the play length (the time elapsed between the shot and the immediately preceding event) and analyse the density and success of Dončić's shots with respect to how close they are to the “shot clock” timer ending – the shot clock providing a 24-second period “within which the team possessing the ball must attempt a field goal” (on.nba.com/3ksDPRZ).
Density of Dončić's three-point shots with respect to play length, separately for games played against bottom-ranked (left) and top-ranked (right) opponents.
The percentage of three-point shots attempted by Dončić in the middle of a play (from 4 to 20 seconds) is almost the same against bottom- and top-ranked teams (70% and 69%, respectively). However, there are differences when comparing the first 4 seconds against the last 4 seconds of a play: when the opponents are bottom-ranked teams, 18% of three-point shots are attempted in the first 4 seconds and 12% in the last 4 seconds; conversely, when the opponents are top-ranked teams, 15% of three-point shots are attempted in the first 4 seconds and 16% in the last 4 seconds. We also see that Dončić performs better against top-ranked teams in terms of percentage of shots made, regardless of what time during a play a shot is attempted.
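The early, middle and late percentages quoted above amount to binning each shot by its play length. A minimal sketch follows, with hypothetical play lengths and assumed bin boundaries (under 4 seconds, 4 to 20 seconds, over 20 seconds); the article does not specify how the boundary values are assigned.

```python
SHOT_CLOCK = 24  # seconds available to attempt a field goal


def play_length_shares(play_lengths, edge=4):
    """Percentage of shots in the first `edge` s, the middle, and the last `edge` s."""
    if not play_lengths:
        raise ValueError("no shots to summarise")
    counts = {"first": 0, "middle": 0, "last": 0}
    for t in play_lengths:
        if t < edge:
            counts["first"] += 1
        elif t <= SHOT_CLOCK - edge:  # boundary handling is an assumption
            counts["middle"] += 1
        else:
            counts["last"] += 1
    n = len(play_lengths)
    return {k: 100 * c / n for k, c in counts.items()}


# Hypothetical play lengths (seconds) of three-point attempts.
play_lengths = [2, 3, 6, 9, 11, 14, 15, 18, 21, 23]
shares = play_length_shares(play_lengths)
print(shares)
```

Running the function on the shots against each group of opponents gives the two sets of three percentages that the text compares.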
Final huddle
The analyses presented here clearly show Luka Dončić's outstanding performance with respect to other players in his team, with reference to several game variables. His achievements per minute are largely above the team average with respect to points scored, shots made, rebounds, assists, and steals. In contrast, he has slightly lower shooting performance. He plays a fundamental role in the network assist shot and, in addition to being the player with the highest number of points scored, he is the one with the highest number of points scored by teammates thanks to his assists.
The percentage of his field goals scored thanks to a teammate's assist is low, meaning that he mostly creates his own shooting opportunities, while his assists create opportunities for his teammates. In terms of expected points with respect to shot distance, we have highlighted that his performance from longer distances improves in difficult situations. Overall, we may say that in three-point shots, Dončić tends to give his best when the going gets tough – that is, when he is playing against stronger opponents.
Analyses such as these can give basketball coaches a better awareness of the main characteristics of players and teams, which should, ultimately, help to define personal training regimes (in order to properly address players’ weaknesses and strengths) as well as game strategies.
Authors’ note
BasketballAnalyzeR is described in the book Basketball Data Science, published by CRC Press. The package is particularly suited for teaching, both in degree courses in statistics and in specific master's and postgraduate courses in sport science, and can also be employed by technical experts who work with professional basketball teams. See bdsports.unibs.it/basketballanalyzer for installation instructions, code, data, FAQs, news and updates.
Disclosure statement
The authors declare no conflicts of interest.










