In this post I will use the NBA API to access shot chart data and use it to make some cool plots based on the shot zone information that is available in the raw data.

I wrote a package to access the NBA API. It can be seen on my GitHub page (https://github.com/eyalshafran/NBAapi). The package also includes some plotting features, as I will show in this post. It is an ongoing project that will be updated as I keep working on this blog.

In [1]:
import NBAapi as nba
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
import sys
from scipy import misc
from scipy.stats import pearsonr
%matplotlib inline 

First let's access the data and preview it:

In [2]:
shotchart,leagueavergae = nba.shotchart.shotchartdetail(season='2016-17') # get shot chart data from NBA.stats
shotchart.head()
Out[2]:
GRID_TYPE GAME_ID GAME_EVENT_ID PLAYER_ID PLAYER_NAME TEAM_ID TEAM_NAME PERIOD MINUTES_REMAINING SECONDS_REMAINING ... SHOT_ZONE_AREA SHOT_ZONE_RANGE SHOT_DISTANCE LOC_X LOC_Y SHOT_ATTEMPTED_FLAG SHOT_MADE_FLAG GAME_DATE HTM VTM
0 Shot Chart Detail 0021600001 2 201565 Derrick Rose 1610612752 New York Knicks 1 11 40 ... Center(C) Less Than 8 ft. 0 4 8 1 1 20161025 CLE NYK
1 Shot Chart Detail 0021600001 3 201567 Kevin Love 1610612739 Cleveland Cavaliers 1 11 26 ... Center(C) Less Than 8 ft. 3 -11 36 1 0 20161025 CLE NYK
2 Shot Chart Detail 0021600001 5 2546 Carmelo Anthony 1610612752 New York Knicks 1 11 16 ... Right Side Center(RC) 16-24 ft. 19 148 129 1 0 20161025 CLE NYK
3 Shot Chart Detail 0021600001 7 204001 Kristaps Porzingis 1610612752 New York Knicks 1 11 15 ... Center(C) Less Than 8 ft. 2 24 -1 1 1 20161025 CLE NYK
4 Shot Chart Detail 0021600001 8 2544 LeBron James 1610612739 Cleveland Cavaliers 1 10 59 ... Left Side(L) 8-16 ft. 11 -79 80 1 1 20161025 CLE NYK

5 rows × 24 columns

Extracting zone-based statistics for each player

Each player has a unique player ID and also a name (which might not be unique). It is possible to work with the player ID alone, but I find it less informative when looking at the data, so I'm creating a new column (called PLAYER) that combines the player name and ID.

I'm going to create a list of tuples with zone names which will be used later.

The shot zone is identified by the combination of the 'SHOT_ZONE_RANGE' and 'SHOT_ZONE_AREA' columns. I will also use the 'SHOT_MADE_FLAG' column to see whether the shot was made or not. I'm going to use the groupby method to get a dataframe with zone-based information for each player. The size aggregator will show us how many shots a player made and missed from each zone:

In [3]:
shotchart['PLAYER'] = list(zip(shotchart['PLAYER_NAME'],shotchart['PLAYER_ID'])) # list() is needed in Python 3, where zip returns an iterator
zones_list = [(u'Less Than 8 ft.', u'Center(C)'),
              (u'8-16 ft.', u'Center(C)'),
              (u'8-16 ft.', u'Left Side(L)'),
              (u'8-16 ft.', u'Right Side(R)'),
              (u'16-24 ft.', u'Center(C)'),
              (u'16-24 ft.', u'Left Side Center(LC)'),
              (u'16-24 ft.', u'Left Side(L)'),
              (u'16-24 ft.', u'Right Side Center(RC)'),
              (u'16-24 ft.', u'Right Side(R)'),
              (u'24+ ft.', u'Center(C)'),
              (u'24+ ft.', u'Left Side Center(LC)'),
              (u'24+ ft.', u'Left Side(L)'),
              (u'24+ ft.', u'Right Side Center(RC)'),
              (u'24+ ft.', u'Right Side(R)'),
              (u'Back Court Shot', u'Back Court(BC)')]
# Create dataframe with PLAYER as index and the rest as columns
zones = shotchart.groupby(['SHOT_ZONE_RANGE','SHOT_ZONE_AREA','SHOT_MADE_FLAG','PLAYER']).size().unstack(fill_value=0).T
zones.head()
Out[3]:
SHOT_ZONE_RANGE 16-24 ft. ... 8-16 ft. Back Court Shot Less Than 8 ft.
SHOT_ZONE_AREA Center(C) Left Side Center(LC) Left Side(L) Right Side Center(RC) Right Side(R) ... Center(C) Left Side(L) Right Side(R) Back Court(BC) Center(C)
SHOT_MADE_FLAG 0 1 0 1 0 1 0 1 0 1 ... 0 1 0 1 0 1 0 1 0 1
PLAYER
(AJ Hammons, 1627773) 0 2 4 2 1 1 2 1 1 1 ... 1 0 1 0 4 0 0 0 6 5
(Aaron Brooks, 201166) 5 0 3 5 7 5 6 0 2 2 ... 5 4 6 7 7 10 7 0 58 40
(Aaron Gordon, 203932) 10 8 15 6 12 5 25 10 14 7 ... 20 15 32 20 19 15 3 0 135 230
(Aaron Harrison, 1626151) 0 0 1 0 1 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
(Adreian Payne, 203940) 1 2 3 1 0 0 1 2 0 0 ... 0 0 1 0 0 3 0 0 13 12

5 rows × 30 columns
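To make the groupby/size/unstack chain easier to follow, here is a minimal sketch on a made-up five-shot table. The player labels and shots are invented for illustration, and I use plain string labels instead of (name, ID) tuples to keep the toy small:

```python
import pandas as pd

# Toy shot log (invented rows, not real NBA data).
toy = pd.DataFrame({
    'PLAYER': ['A Smith', 'A Smith', 'A Smith', 'A Smith', 'B Jones'],
    'SHOT_ZONE_RANGE': ['8-16 ft.', '8-16 ft.', '8-16 ft.', '24+ ft.', '24+ ft.'],
    'SHOT_ZONE_AREA': ['Center(C)'] * 5,
    'SHOT_MADE_FLAG': [1, 1, 0, 1, 0],
})

# size() counts the rows in every (range, area, made flag, player) group.
counts = toy.groupby(['SHOT_ZONE_RANGE', 'SHOT_ZONE_AREA',
                      'SHOT_MADE_FLAG', 'PLAYER']).size()

# unstack() moves the last level (PLAYER) into the columns and fills missing
# combinations with 0; .T then flips the table so players become the rows.
toy_zones = counts.unstack(fill_value=0).T
print(toy_zones)
```

Each row is a player and each column is a (range, area, made flag) combination, with 0 filled in for combinations a player never produced.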

The shot chart data does not say how many games each player played. We will use the player biostats data to get that information:

In [4]:
players = nba.player.biostats(season='2016-17')
players['PLAYER'] = list(zip(players['PLAYER_NAME'],players['PLAYER_ID'])) # list() is needed in Python 3, where zip returns an iterator
players.set_index('PLAYER',inplace=True)
players.head()
Out[4]:
PLAYER_ID PLAYER_NAME TEAM_ID TEAM_ABBREVIATION AGE PLAYER_HEIGHT PLAYER_HEIGHT_INCHES PLAYER_WEIGHT COLLEGE COUNTRY ... GP PTS REB AST NET_RATING OREB_PCT DREB_PCT USG_PCT TS_PCT AST_PCT
PLAYER
(AJ Hammons, 1627773) 1627773 AJ Hammons 1610612742 DAL 24.0 7-0 84 260 Purdue USA ... 22 2.2 1.6 0.2 -0.6 0.049 0.199 0.167 0.472 0.038
(Aaron Brooks, 201166) 201166 Aaron Brooks 1610612754 IND 32.0 6-0 72 161 Oregon USA ... 65 5.0 1.1 1.9 -3.0 0.022 0.064 0.190 0.507 0.216
(Aaron Gordon, 203932) 203932 Aaron Gordon 1610612753 ORL 21.0 6-9 81 220 Arizona USA ... 80 12.7 5.1 1.9 -2.8 0.054 0.141 0.200 0.530 0.097
(Aaron Harrison, 1626151) 1626151 Aaron Harrison 1610612766 CHA 22.0 6-6 78 210 Kentucky USA ... 5 0.2 0.6 0.6 -18.6 0.000 0.200 0.142 0.102 0.375
(Adreian Payne, 203940) 203940 Adreian Payne 1610612750 MIN 26.0 6-10 82 237 Michigan State USA ... 18 3.5 1.8 0.4 0.8 0.069 0.200 0.224 0.505 0.089

5 rows × 23 columns

We will need to merge the GP column from the players dataframe with the zones dataframe that we created earlier. Since both dataframes have the same index, we can use the pandas join method:

In [5]:
GP = players.loc[:,['GP']] # create DataFrame with single GP column
GP.columns = pd.MultiIndex.from_product([GP.columns,[''],['']]) # change column to multiindex before join (prevents join warning)
zones_with_GP = zones.join(GP) # only include games played from players
zones_with_GP.columns = pd.MultiIndex.from_tuples(zones_with_GP.columns.tolist(), 
                                                  names=['SHOT_ZONE_RANGE','SHOT_ZONE_AREA','MADE'])
zones_with_GP = zones_with_GP.sort_index(axis=1) # sort columns for better performance (+ avoid warning); sortlevel was removed from newer pandas versions
zones_with_GP.head()
Out[5]:
SHOT_ZONE_RANGE 16-24 ft. ... 8-16 ft. Back Court Shot GP Less Than 8 ft.
SHOT_ZONE_AREA Center(C) Left Side Center(LC) Left Side(L) Right Side Center(RC) Right Side(R) ... Center(C) Left Side(L) Right Side(R) Back Court(BC) Center(C)
MADE 0 1 0 1 0 1 0 1 0 1 ... 1 0 1 0 1 0 1 0 1
PLAYER
(AJ Hammons, 1627773) 0 2 4 2 1 1 2 1 1 1 ... 0 1 0 4 0 0 0 22 6 5
(Aaron Brooks, 201166) 5 0 3 5 7 5 6 0 2 2 ... 4 6 7 7 10 7 0 65 58 40
(Aaron Gordon, 203932) 10 8 15 6 12 5 25 10 14 7 ... 15 32 20 19 15 3 0 80 135 230
(Aaron Harrison, 1626151) 0 0 1 0 1 0 0 0 0 0 ... 0 0 0 0 0 0 0 5 0 0
(Adreian Payne, 203940) 1 2 3 1 0 0 1 2 0 0 ... 0 1 0 0 3 0 0 18 13 12

5 rows × 31 columns
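The join step can be sketched on a toy table as well. All names and numbers below are made up. One thing the sketch is meant to show is that join is a left join on the index, so a player who is missing from the games-played table ends up with NaN rather than being dropped:

```python
import pandas as pd

# Toy zone counts with three-level columns (invented numbers).
cols = pd.MultiIndex.from_tuples(
    [('8-16 ft.', 'Center(C)', 0), ('8-16 ft.', 'Center(C)', 1)],
    names=['SHOT_ZONE_RANGE', 'SHOT_ZONE_AREA', 'SHOT_MADE_FLAG'])
toy_zones = pd.DataFrame([[4, 6], [1, 0], [2, 2]],
                         index=['A Smith', 'B Jones', 'C Doe'], columns=cols)

# Toy games-played table; 'C Doe' is deliberately missing.
toy_gp = pd.DataFrame({'GP': [22, 65]}, index=['A Smith', 'B Jones'])

# Lift the single GP column to three levels so both frames have the same depth.
toy_gp.columns = pd.MultiIndex.from_product([toy_gp.columns, [''], ['']])

joined = toy_zones.join(toy_gp)  # left join on the shared index
gp_key = ('GP', '', '')
print(joined[gp_key])
```

If the two tables might not cover exactly the same players, it is worth checking the GP column for NaN before dividing by it.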

Let's do some plotting!

Which players take the most shots per zone?

I already included some plotting tools in the package. The court plot is based on the following blog post: http://savvastjortjoglou.com/nba-shot-sharts.html. I made some changes to the court function (the biggest change is working in feet instead of the tenths of a foot that the shot chart locations come in).
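As a quick check of the units, the sketch below takes the LOC_X, LOC_Y and SHOT_DISTANCE values from the five preview rows shown earlier, divides the coordinates by 10, and compares the resulting distance with SHOT_DISTANCE. That the origin is at the basket and that SHOT_DISTANCE is the distance in whole feet, rounded down, are my assumptions. They hold for these five rows, but I have not checked them against the full data set:

```python
import numpy as np
import pandas as pd

# LOC_X, LOC_Y and SHOT_DISTANCE copied from the five preview rows.
preview = pd.DataFrame({
    'LOC_X': [4, -11, 148, 24, -79],
    'LOC_Y': [8, 36, 129, -1, 80],
    'SHOT_DISTANCE': [0, 3, 19, 2, 11],
})

# Convert from tenths of a foot to feet.
preview['X_FT'] = preview['LOC_X'] / 10.0
preview['Y_FT'] = preview['LOC_Y'] / 10.0

# Straight-line distance from the origin (assumed to be the basket), in feet.
preview['DIST_FT'] = np.hypot(preview['X_FT'], preview['Y_FT'])

# Assumption: SHOT_DISTANCE is this distance rounded down to whole feet.
preview['MATCHES'] = np.floor(preview['DIST_FT']) == preview['SHOT_DISTANCE']
print(preview[['X_FT', 'Y_FT', 'DIST_FT', 'SHOT_DISTANCE', 'MATCHES']])
```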

I also have an nba.plot.text_in_zone function, which accepts a text string and a zone tuple and writes the text in the specified zone.

We need to sum over the 0s (missed shots) and 1s (made shots) to get the total shots, and then divide by the number of games played.
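The per-game calculation described above can be sketched on a toy table. The players, counts and games played are invented, and I keep the games played in a separate Series here (instead of as a GP column inside the table) to keep the example small:

```python
import pandas as pd

# Toy zone table: columns are (range, area, made flag), values are shot counts.
cols = pd.MultiIndex.from_tuples(
    [('24+ ft.', 'Center(C)', 0), ('24+ ft.', 'Center(C)', 1),
     ('8-16 ft.', 'Center(C)', 0), ('8-16 ft.', 'Center(C)', 1)],
    names=['SHOT_ZONE_RANGE', 'SHOT_ZONE_AREA', 'MADE'])
toy = pd.DataFrame([[30, 20, 10, 10],
                    [5, 5, 40, 20],
                    [3, 1, 0, 0]],
                   index=['A Smith', 'B Jones', 'C Doe'], columns=cols)
games = pd.Series([20, 25, 4], index=toy.index)  # invented games played

eligible = games > 10                # drop players with 10 games or fewer
zone = ('24+ ft.', 'Center(C)')      # a (range, area) tuple, as in zones_list

# Selecting with the two-level tuple keeps the MADE 0 and 1 columns;
# summing across them gives total attempts from that zone.
attempts = toy.loc[eligible, zone].sum(axis=1)
shots_pg = (attempts / games[eligible]).sort_values(ascending=False)
print(shots_pg)
```

Here 'A Smith' took 50 shots from the zone in 20 games, so he leads with 2.5 shots per game, while 'C Doe' is filtered out by the games-played cutoff.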

In [6]:
path = os.path.dirname(nba.__file__) # get path of the nba module
floor = plt.imread(os.path.join(path,'data','court.jpg')) # load floor template (misc.imread was removed from SciPy; os.path.join keeps the path portable)
plt.figure(figsize=(15,12.5),facecolor='white') # set up figure
ax = nba.plot.court(lw=4,outer_lines=False) # plot NBA court - don't include the outer lines
ax.axis('off')
nba.plot.zones(lw=2,color='white',linewidth=3)
eligible = zones_with_GP.loc[:,'GP'].values > 10 # only include players who played more than 10 games
# we are going to use zones_list to plot information in each zone
for zone in zones_list:
    # calculate shots per game for specific zone and sort from highest to lowest
    shots_PG = (zones_with_GP.loc[eligible,zone].sum(axis=1)/zones_with_GP.loc[eligible,'GP']).sort_values(ascending=False)
    name = [] # will be used to store the text we want to print
    # run a loop to find top 3 players 
    for j in range(3):
        # create text
        name.append(shots_PG.index[j][0].split(' ')[0][0]+'. ' + shots_PG.index[j][0].split(' ')[1]+':%0.1f' %shots_PG.values[j])
    nba.plot.text_in_zone('\n'.join(name),zone,color='black',backgroundcolor = 'white',alpha = 1)
plt.title('Most Shots by Zone',fontsize=16)
plt.imshow(floor,extent=[-30,30,-7,43]) # plot floor
Out[6]: