Much of fan discourse around refereeing is based on emotion. Many claims made by fans, however, implicitly assert a relationship between variables. For instance, the oft-repeated claim that "star players get more calls" asserts a positive correlation between a player's stardom and the number of calls they receive. The goal of this project is to analyze some of the claims commonly repeated by NBA fans and determine whether there is any truth to them.
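As a preview of the kind of check this amounts to, here is a toy example (with made-up numbers, not real NBA data) of turning such a claim into a testable correlation:

```python
import numpy as np
from scipy import stats

# Toy example of turning a fan claim into a testable correlation:
# does "stardom" (a made-up rating here) move with foul calls drawn?
stardom = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
calls = np.array([2.1, 2.0, 2.4, 2.2, 2.3])  # made-up per-game call counts

result = stats.linregress(stardom, calls)
r_squared = result.rvalue ** 2  # near 0 would mean little linear relationship
```

This is the same `scipy.stats.linregress` test used throughout the analysis below.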
In this project, I will attempt to see if commonly held beliefs about referees among fans of the NBA are true. In particular, these beliefs include the opinions that star players receive more favorable calls, that some referees are bad or biased, that home teams receive favorable treatment, and that fouls are called differently in the playoffs.
Using this information, I will then attempt to use k-Nearest Neighbors to see if we can predict the total number of foul calls made in a game.
Before we can even determine if there is a relationship between a player's stardom and the number of foul calls they receive, we first need to define what a "star player" is. To do so, I will be using common single-value player metrics. These metrics attempt to provide a single number that quantifies how talented a player is. The single-value player metrics I will be using are Win Shares per 48 (WS/48) and Value Over Replacement Player (VORP). Notably, I am not using Player Efficiency Rating (PER) here, as it seems to be no longer used in basketball analytics.
The higher these values are, the more of a star the corresponding player is. Therefore, we will attempt to see if there is a relationship between these values and the average number of foul calls per game. I imported one data set from Basketball Reference that included the average number of personal foul calls a player received, along with another data set from Basketball Reference that was primarily focused on single-value player metrics.
# Import data for BBall Ref
import pandas as pd
import numpy as np
url_br_2021_per36 = 'https://raw.githubusercontent.com/NVaduthala3/cmps-6160-project/main/bball_ref_2021_per36.csv' # '2021' here means 2021-22 season
df_br_2021_per36 = pd.read_csv(url_br_2021_per36) # DataFrame containing players' average stats
url_br_2021_adv = 'https://raw.githubusercontent.com/NVaduthala3/cmps-6160-project/main/bball_ref_2021_advanced.csv'
df_br_2021_adv = pd.read_csv(url_br_2021_adv) # DataFrame containing single-value metrics
df_br_2021_per36.head()
After importing the data for the 2021-22 NBA season, we now need to remove any data points that cannot provide us value. To account for the fact that different players play different numbers of minutes, and will therefore most likely accumulate different numbers of foul calls, I imported the data set that proportionally normalizes players' average statistics to a per-36-minute basis. However, for players who averaged fewer than 15 minutes per game or played fewer than 35 games, this normalization is not meaningful, as they have not played enough for a realistic extrapolation. Therefore, I want to remove these players from the dataset.
But before I can do that, I first have to create a new column that describes how many minutes per game a player averaged, because the dataset containing the players' average statistics did not contain this information. It did, however, contain the total number of minutes played ('MP') and the total number of games played ('G'), so from these I was able to compute the average number of minutes played per game ('MPG').
But, because the data set I am importing performs some extrapolation, there are some illogical values in it. I will force every value greater than 6 in the personal fouls ('PF') column of the dataset to be equal to 6, as a player cannot receive more than 6 fouls in an NBA game.
# clean up data for BBall Ref
df_br_2021_per36['MPG'] = df_br_2021_per36.apply(lambda row: row['MP'] / row['G'], axis = 1) # calculate minutes per game
df_br_2021_per36 = df_br_2021_per36.drop(df_br_2021_per36[(df_br_2021_per36['MPG'] < 15) | (df_br_2021_per36['G'] < 35)].index) # remove players with insufficient games/minutes played
df_br_2021_per36.loc[df_br_2021_per36['PF'] > 6, 'PF'] = 6 # impossible to have more than 6 fouls
df_br_2021_adv_pf = df_br_2021_adv.merge(df_br_2021_per36, how = 'inner') # add personal fouls column onto advanced stats dataframe
df_br_2021_adv_pf.head()
From here, we can see some results.
# Plot WS/48 vs PF
import matplotlib.pyplot as plt
from scipy import stats
df_br_2021_adv_pf.plot.scatter(x = 'WS/48', y = 'PF')
result = stats.linregress(x = df_br_2021_adv_pf['WS/48'], y = df_br_2021_adv_pf['PF'])
plt.plot(df_br_2021_adv_pf['WS/48'], result.intercept + result.slope*(df_br_2021_adv_pf['WS/48']), 'r')
plt.text(-0.05, 5.5, 'r^2 = %0.2f' % (result.rvalue) ** 2)
plt.title('WS/48 vs (Adjusted) Personal Fouls/Game')
plt.xlabel('Win Shares/48')
plt.ylabel('Adjusted Personal Fouls/Game')
We can see from the graph that there does not appear to be any relationship between an increased Win Shares per 48--and therefore increased stardom--and the average number of personal fouls called on the player. Although there is a positive correlation, the r-squared value is small enough that we cannot make a claim of there being any relation.
df_br_2021_adv_pf.plot.scatter(x = 'VORP', y = 'PF')
result2 = stats.linregress(x = df_br_2021_adv_pf['VORP'], y = df_br_2021_adv_pf['PF'])
plt.plot(df_br_2021_adv_pf['VORP'], result2.intercept + result2.slope*(df_br_2021_adv_pf['VORP']), 'r')
plt.text(2, 5.5, 'r^2 = %0.2f' % (result2.rvalue) ** 2)
plt.title('VORP vs (Adjusted) Personal Fouls/Game')
plt.xlabel('VORP')
plt.ylabel('Adjusted Personal Fouls/Game')
As with the first graph, we cannot conclude any definitive relationship between a player's stardom and the number of foul calls they receive in a game. In this case the correlation is negative, but the r-squared value is once again too small for us to claim a relationship.
One issue with our analysis is that we are extrapolating from averages. Another is that we are comparing only the number of foul calls to the single-value metrics. It may be the case that a player received a lot of calls in their favor and that all of those calls were correct.
In order to remedy these issues, I will be using the NBA's Last 2 Minutes dataset to look at the actual favorable calls made, and see whether there is a relationship between the single-value player metrics and the number of favorable calls. The dataset I will be using was organized by GitHub user atlhawksfanatic. It lists the foul calls made in the last two minutes of every game and states whether each was the correct call to make or not.
In particular, what I will be counting is the net number of favorable calls. In the NBA's Last 2 Minutes dataset, there are columns that state which player was disadvantaged and which player committed the foul. As a result, I can count how many times each player was in either an advantageous or a disadvantageous position in a foul call. By assigning the value 1 to an advantageous position and -1 to a disadvantageous position, I will sum over all the situations that a player was in, and this will be their net number of favorable calls. From there, I will see if there is a relationship between this variable and the single-value player metrics.
Before I can do this, I must first remove all the rows that contain data not in the 2021-22 season.
# Import data from NBA L2M, provided by atlhawksfanatic
url_l2m = 'https://raw.githubusercontent.com/atlhawksfanatic/L2M/master/1-tidy/L2M/L2M_stats_nba.csv'
df_l2m = pd.read_csv(url_l2m)
df_l2m = df_l2m.drop((df_l2m[(df_l2m['season'] != 2022) | (df_l2m['playoff'] == True)].index)) #'2022' here means 2021-22 season, and only want Regular Season
df_l2m.head()
As one can see, this code currently produces a DtypeWarning. This should not be an issue, as it only affects columns that we are not interested in. However, the dataset also contains violations that are not fouls. We are not interested in these, so we will remove all rows describing a referee call that is not a foul. In addition, some rows contain team names where there should be player names, so those also have to be removed.
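As an aside, the warning itself can be avoided by asking pandas to infer dtypes over the whole file at once rather than in chunks. A minimal sketch on a toy CSV (on the real data, `low_memory=False` would be passed to the `read_csv` call above):

```python
import io
import pandas as pd

# Toy CSV standing in for the L2M file: the second column mixes integers and
# text, the kind of column that triggers a DtypeWarning on large files
csv_text = "decision,period\nCC,4\nINC,4\nCC,OT\n"

# low_memory=False reads the whole file before inferring dtypes, so mixed-type
# columns come back as a single object column with no DtypeWarning
df = pd.read_csv(io.StringIO(csv_text), low_memory=False)
```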
df_l2m = df_l2m.dropna(subset=['call_type', 'committing', 'disadvantaged', # Remove NaN values in columns we care about
'decision'])
df_l2m = df_l2m[df_l2m['committing'] != df_l2m['disadvantaged']] # Avoid possible repeats
# remove calls that are not fouls, like delay of game or a turnover
df_l2m = df_l2m[df_l2m['call_type'].str.contains('Foul', case=False)]
# removes all possible team names, except the Trail Blazers
mask_commit = df_l2m['committing'].str.count(' ').ge(1)
df_l2m = df_l2m[mask_commit]
mask_disadv = df_l2m['disadvantaged'].str.count(' ').ge(1)
df_l2m = df_l2m[mask_disadv]
df_l2m = df_l2m[(df_l2m['committing'] != 'Trail Blazers') & (df_l2m['disadvantaged'] != 'Trail Blazers')]
Now, we say that a player received a favorable call if they are called committing a foul and the decision is seen as an incorrect non-call, or if they are the disadvantaged player and received an incorrect call. We say that a player received an unfavorable call if they are called committing a foul and the decision is seen as an incorrect call, or if they are the disadvantaged player and the decision is seen as an incorrect non-call. All other cases can be considered to be a neutral (correct) call and so they will automatically have a value of 0.
However, we must also take into account that different players play different amounts of time in the last two minutes, so the number of foul situations they are involved in will also differ; it may therefore not be fair to directly compare raw net favorable call counts between players. To mitigate this, we will instead look at the net number of favorable calls per 100 possessions: we find the net number of favorable calls, divide it by the number of possessions in which the player was involved in a foul call, and multiply by 100.
df_fav = pd.DataFrame(pd.concat([df_l2m['committing'], df_l2m['disadvantaged']], ignore_index = True).unique(),
                      columns = ['Player'])
# initialize the count columns with 0 so the later arithmetic stays numeric
df_fav['favorable count'] = 0
df_fav['unfavorable count'] = 0
df_fav['possession count'] = 0
df_fav = df_fav.set_index('Player')
for player in df_fav.index:
    # favorable: an uncalled foul they committed (INC), or an incorrect call
    # in their favor when they were the disadvantaged player (IC)
    fav_count = df_l2m.loc[
        (df_l2m['committing'].eq(player) & df_l2m['decision'].eq('INC')) |
        (df_l2m['disadvantaged'].eq(player) & df_l2m['decision'].eq('IC'))
    ]
    # unfavorable: the mirror image of the above
    unfav_count = df_l2m.loc[
        (df_l2m['committing'].eq(player) & df_l2m['decision'].eq('IC')) |
        (df_l2m['disadvantaged'].eq(player) & df_l2m['decision'].eq('INC'))
    ]
    poss_count = df_l2m['committing'].eq(player).sum() + df_l2m['disadvantaged'].eq(player).sum()
    df_fav.at[player, 'possession count'] = poss_count
    df_fav.at[player, 'favorable count'] = len(fav_count)
    df_fav.at[player, 'unfavorable count'] = len(unfav_count)
df_fav['net favorable count'] = df_fav['favorable count'] - df_fav['unfavorable count']
df_fav['net fav count per 100 poss'] = df_fav['net favorable count'] / df_fav['possession count'] * 100
df_fav = df_fav.reset_index()
df_fav.shape
We also need to consider that some players have not been involved in enough possessions for their net number of favorable calls per 100 possessions to be a fair extrapolation. To fix this, we will drop all players who have been involved in fewer than 5 possessions. We use 5 because any lower and we would no longer be fixing the problem, while any higher and we risk cutting off meaningful data. The choice of 5 as the baseline is also justified by the fact that the 25th percentile is 7 and the 50th percentile is 22, so the vast majority of the data is above this benchmark.
pd.to_numeric(df_fav['possession count'], errors = 'coerce').describe()
df_fav = df_fav[df_fav['possession count'] > 4]
From here, we now need to join this DataFrame with the df_br_2021_adv_pf DataFrame in order to see if there is a relationship between Win Shares per 48 and VORP.
df_br_2021_adv_pf_fav = df_br_2021_adv.merge(df_fav, how = 'inner')
df_br_2021_adv_pf_fav.plot.scatter(x = 'WS/48', y = 'net fav count per 100 poss')
result = stats.linregress(x = df_br_2021_adv_pf_fav['WS/48'], y = pd.to_numeric(df_br_2021_adv_pf_fav['net fav count per 100 poss']))
plt.plot(df_br_2021_adv_pf_fav['WS/48'], result.intercept + result.slope*(df_br_2021_adv_pf_fav['WS/48']), 'r')
plt.text(-0.25, 5.5, 'r^2 = %0.2f' % (result.rvalue) ** 2)
plt.title('WS/48 vs Favorable Calls Per 100 Poss')
plt.xlabel('Win Shares/48')
plt.ylabel('Favorable Calls Per 100 Poss')
df_br_2021_adv_pf_fav.plot.scatter(x = 'VORP', y = 'net fav count per 100 poss')
result = stats.linregress(x = df_br_2021_adv_pf_fav['VORP'], y = pd.to_numeric(df_br_2021_adv_pf_fav['net fav count per 100 poss']))
plt.plot(df_br_2021_adv_pf_fav['VORP'], result.intercept + result.slope*(df_br_2021_adv_pf_fav['VORP']), 'r')
plt.text(5, 6.5, 'r^2 = %0.2f' % (result.rvalue) ** 2)
plt.title('VORP vs Favorable Calls Per 100 Poss')
plt.xlabel('VORP')
plt.ylabel('Favorable Calls Per 100 Poss')
We can see that there seems to be no relationship between the net number of favorable calls per 100 possessions and VORP, nor between it and Win Shares per 48.
Based upon our analysis, we can conclude that there seems to be no evidence of a relationship between a player's stardom and whether they receive more beneficial foul calls.
Our next goal is to determine if the calls made by some referees show evidence of statistical bias, which would make them candidates for being considered bad referees. We will first use the Last 2 Minutes dataset to determine the average number of net correct calls made by the referees, defined as the difference between the number of correct calls and the number of incorrect calls, and from there determine whether any referees are statistical outliers. We will then determine the average number of calls made and again look for statistical outliers.
To find the average number of net correct calls made, we first need to find all the individual games in the dataset, and then find the number of correct and incorrect foul calls within each game. Finally, we take the difference between the correct and incorrect calls in each game and divide the total by the number of games.
# Import data from NBA L2M, provided by atlhawksfanatic
url_l2m = 'https://raw.githubusercontent.com/atlhawksfanatic/L2M/master/1-tidy/L2M/L2M_stats_nba.csv'
df_l2m_ref = pd.read_csv(url_l2m)
df_l2m_ref = df_l2m_ref.drop((df_l2m_ref[(df_l2m_ref['season'] != 2022) | (df_l2m_ref['playoff'] == True)].index)) #'2022' here means 2021-22 season, and only want Regular Season
df_l2m_ref['game_id'].head()
# separate DataFrame into individual games
df_games_list = []
for game_id in df_l2m_ref['game_id'].unique():
    df_games_list.append(df_l2m_ref[df_l2m_ref['game_id'] == game_id])
df_game_info = pd.DataFrame()
df_game_info['game id'] = df_l2m_ref['game_id'].unique()
df_game_info['correct calls'] = ' '
df_game_info['incorrect calls'] = ' '
df_game_info['net correct calls'] = ' '
df_game_info['ref 1'] = ' '
df_game_info['ref 2'] = ' '
df_game_info['ref 3'] = ' '
# calculate calls made in each game
call_list = [pd.DataFrame(game['decision'].value_counts()) for game in df_games_list]
call_list = [call.reset_index() for call in call_list]
for call in call_list:
    call.columns = ['type', 'count']
for call in call_list:
    # make sure every decision type is present, even with a count of 0
    call_types = ['IC', 'INC', 'CC', 'CNC']
    for call_t in call_types:
        if call_t not in call.values:
            call.loc[len(call.index)] = [call_t, 0]
for call in call_list:
    call.set_index('type', inplace=True)
# add ref names to dataframe
correct_call_list = [call.loc['CNC', 'count'] + call.loc['CC', 'count'] for call in call_list]
incorrect_call_list = [call.loc['IC', 'count'] + call.loc['INC', 'count'] for call in call_list]
ref_1_list = [game[['OFFICIAL_1']].loc[game.first_valid_index(), 'OFFICIAL_1'] for game in df_games_list]
ref_2_list = [game[['OFFICIAL_2']].loc[game.first_valid_index(), 'OFFICIAL_2'] for game in df_games_list]
ref_3_list = [game[['OFFICIAL_3']].loc[game.first_valid_index(), 'OFFICIAL_3'] for game in df_games_list]
df_game_info['correct calls'] = correct_call_list
df_game_info['incorrect calls'] = incorrect_call_list
df_game_info['ref 1'] = ref_1_list
df_game_info['ref 2'] = ref_2_list
df_game_info['ref 3'] = ref_3_list
df_game_info['net correct calls'] = df_game_info['correct calls'] - df_game_info['incorrect calls']
df_game_info.head()
What we will now do is associate each referee with the number of net correct calls made in the games they oversaw, along with their average net correct calls made per 2 minutes. We find this latter value in an attempt to take into account the fact that some referees oversee more games than others.
However, it should be mentioned that this analysis is flawed. With this method, I will be "sharing the blame." It could be the case that a referee did not actually make many, or any for that matter, incorrect calls, but they will still be labelled as doing so if they were working with a referee that made incorrect calls. To try and mitigate this issue, I will be dividing the number of correct and incorrect calls associated with each referee by 3. Thus, I am making the assumption that each ref is equally likely to make an incorrect call. This is a faulty assumption.
df_ref_info = pd.DataFrame()
df_ref_info['name'] = pd.concat([df_game_info['ref 1'], df_game_info['ref 2'], df_game_info['ref 3']], ignore_index = True).unique()
ref_info_list = [df_game_info.loc[(df_game_info['ref 1'] == name) | (df_game_info['ref 2'] == name) | (df_game_info['ref 3'] == name)] for name in list(df_ref_info['name'])]
cor_call_list = [ref['correct calls'].sum() for ref in ref_info_list]
df_ref_info['correct calls'] = np.divide(cor_call_list, 3)
incor_call_list = [ref['incorrect calls'].sum() for ref in ref_info_list]
df_ref_info['incorrect calls'] = np.divide(incor_call_list, 3)
net_cor_call_list = [ref['net correct calls'].sum() for ref in ref_info_list]
df_ref_info['net correct calls'] = np.divide(net_cor_call_list, 3)
df_ref_info['average net correct calls per 2 min'] = np.divide([ref['net correct calls'].sum() / len(ref) for ref in ref_info_list], 3)
df_ref_info['difference of individual avg NCC/2 min and overall'] = ( df_ref_info['average net correct calls per 2 min']
- (df_ref_info['average net correct calls per 2 min'].sum() / len(df_ref_info)))
df_ref_info.head()
We now remove all referees with fewer than 10 minutes refereed in this dataset, because there is not sufficient data to make any claims about their possible bias.
df_ref_info['minutes reffed'] = [2*len(ref) for ref in ref_info_list]
df_ref_info = df_ref_info[df_ref_info['minutes reffed'] > 9]
df_ref_info.head()
Let us now visualize how this data looks through the use of a scatter plot.
df_ref_info.plot.scatter(x = 'name', y = 'difference of individual avg NCC/2 min and overall')
plt.title('Difference Between Individual Average Net Correct Calls/2 min and Overall')
plt.tick_params(
axis='x',
which='both',
bottom=True,
top=False,
labelbottom=False)
plt.xlabel('Referee')
plt.ylabel('Difference')
As seen in the scatter plot, the vast majority of referees fall within one net call above or below the average. We cannot attribute such a small deviation to any bias, and so for the vast majority of referees there is not sufficient evidence to claim that they are biased, at least within the last two minutes.
With that being said, it is interesting to note that one data point sits significantly higher than the rest, a referee who consistently makes better calls than most, and another sits at the bottom, a referee who consistently makes worse calls than most.
print(df_ref_info.loc[df_ref_info['difference of individual avg NCC/2 min and overall'].idxmin()]) # idxmin/idxmax return index labels, so use .loc
print(df_ref_info.loc[df_ref_info['difference of individual avg NCC/2 min and overall'].idxmax()])
It appears that Bennie Adams, on average, consistently makes worse calls in the last 2 minutes than most referees, while Scott Twardoski makes better calls in the last two minutes than most referees. But once again, since this analysis does not specifically consider which referee made the incorrect call, and instead views the three referees in the game as one unit, it is not fair for me to say that Bennie Adams can be considered a bad referee.
A possible way to determine if a referee such as Bennie Adams is being negatively impacted in this analysis due to the other referees they are working with could be to see how many net correct calls their most frequent co-referees make, and determine if there is any relationship between the net number of correct calls a referee such as Bennie Adams make and the number that their co-referees make.
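As a rough sketch of that idea (using a toy stand-in for the df_game_info DataFrame built earlier; `co_referee_avg` is a hypothetical helper, not part of the analysis above):

```python
import pandas as pd

# Toy stand-in for the df_game_info DataFrame built earlier
df_game_info = pd.DataFrame({
    'ref 1': ['Bennie Adams', 'Bennie Adams', 'Ref C'],
    'ref 2': ['Ref B', 'Ref C', 'Ref D'],
    'ref 3': ['Ref C', 'Ref D', 'Ref B'],
    'net correct calls': [2, 5, 8],
})

def co_referee_avg(df, name):
    """Average net correct calls in games that `name` worked, i.e. the
    record this referee shares with their co-referees."""
    mask = (df['ref 1'] == name) | (df['ref 2'] == name) | (df['ref 3'] == name)
    return df.loc[mask, 'net correct calls'].mean()

# Compare a referee's shared record with that of their frequent partners
adams_avg = co_referee_avg(df_game_info, 'Bennie Adams')
```

Comparing this per-referee average against the averages of frequent co-referees would hint at whether a low number reflects the referee themselves or the crews they happened to work in.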
In order to develop a better understanding on if some referees exhibit some form of bias or not, we will now consider the overall average number of calls made by each individual referee. Let us first import the data.
url_br_2021_ref = 'https://raw.githubusercontent.com/NVaduthala3/cmps-6160-project/main/BBall-Ref_Ref.csv'
df_br_2021_ref = pd.read_csv(url_br_2021_ref)
df_br_2021_ref.drop(labels=0, axis=0, inplace=True)
df_br_2021_ref.reset_index(drop=True, inplace=True)
df_br_2021_ref['Per Game Relative.2'] = pd.to_numeric(df_br_2021_ref['Per Game Relative.2'])
df_br_2021_ref['Unnamed: 2'] = pd.to_numeric(df_br_2021_ref['Unnamed: 2'])
df_br_2021_ref = df_br_2021_ref[df_br_2021_ref['Unnamed: 2'] >= 40] # need sufficient amount of games
df_br_2021_ref.head()
The only columns we care about are "Unnamed: 0", "Per Game Relative.3", "Home Minus Visitor.3", and "Per Game.2". The first states the name of the referee. The second corresponds to the net difference between the number of foul calls the individual referee made per game and the league-average number. We will use this to see whether any referees are outliers in the average number of fouls called.
The third mentioned column corresponds to the average net difference between fouls called against the home team and fouls called against the away team, so a negative number indicates that away teams are called for more fouls on average. However, consider a situation where Referee 1 calls on average 4 fouls on the home team but 8 fouls on the away team, while Referee 2 calls on average 20 fouls on the home team but 25 fouls on the away team. Strictly comparing the differences would imply that Referee 2 has more of a bias toward home teams than Referee 1, but this does not tell the whole story. Therefore, we will normalize this difference using the fourth mentioned column, which gives the average total number of fouls called. To increase readability, we will represent this number as a percentage.
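To make this concrete, here is the computation for the two hypothetical referees above (these foul counts are illustrative, not values from the dataset):

```python
# Hypothetical Referee 1: 4 fouls on home, 8 on away (total 12)
# Hypothetical Referee 2: 20 fouls on home, 25 on away (total 45)
ref1_pct = 100 * (4 - 8) / (4 + 8)      # home minus visitor, as a share of total fouls
ref2_pct = 100 * (20 - 25) / (20 + 25)
# The raw differences (-4 vs -5) suggest Referee 2 leans more toward the home
# team, but after normalizing by total fouls, Referee 1's calls (about -33%)
# are proportionally far more tilted against the away team than
# Referee 2's (about -11%).
```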
df_br_2021_ref = df_br_2021_ref[['Unnamed: 0', 'Per Game.2', 'Home Minus Visitor.3', 'Per Game Relative.3']]
df_br_2021_ref.columns =['Referee', 'Total Fouls Called', 'Home Minus Visitor Foul Calls', 'Difference between Ref and Avg']
df_br_2021_ref = df_br_2021_ref.astype({'Home Minus Visitor Foul Calls' : "float", 'Total Fouls Called' : "float", "Difference between Ref and Avg" : "float"})
df_br_2021_ref['Percent Normalized Home Minus Visitor'] = 100 * df_br_2021_ref['Home Minus Visitor Foul Calls'] / df_br_2021_ref['Total Fouls Called']
df_br_2021_ref.head()
Now, with this data, we will first look to see whether any referees are outliers in regards to the average number of foul calls per game and the number of fouls called on home teams versus away teams.
import matplotlib.pyplot as plt
df_br_2021_ref.plot.scatter(x='Referee', y = 'Difference between Ref and Avg')
plt.title('Per Game Foul Call Difference With League Average')
plt.tick_params(
axis='x',
which='both',
bottom=True,
top=False,
labelbottom=False)
plt.xlabel('Referee')
plt.ylabel('Difference')
df_br_2021_ref.plot.scatter(x='Referee', y = 'Percent Normalized Home Minus Visitor')
plt.title('Percent Difference of Fouls Called Against Home and Visitor')
plt.tick_params(
axis='x',
which='both',
bottom=True,
top=False,
labelbottom=False)
plt.xlabel('Referee')
plt.ylabel('Difference')
As we can see, on average most referees do not show a significant difference in the total number of fouls they call, nor in the fouls they call against home versus away teams. For fouls called against home versus away teams, the maximum percent difference is approximately $\pm 4\%$, which is not enough to claim evidence of bias in this area for any referee.
However, we do see that some referees tend to call significantly more or fewer total fouls per game. According to Basketball-Reference, the average number of fouls called per game in the 2021-22 NBA season was 40.7. So we will now find the referees who called at least 4 more or 4 fewer fouls per game than average.
print(df_br_2021_ref.loc[(df_br_2021_ref['Difference between Ref and Avg'] >= 4) | (df_br_2021_ref['Difference between Ref and Avg'] <= -4), 'Referee'])
These referees call an average number of total fouls that is at least $10\%$ away from the league average, which is significant. It means these referees either tend to be very quick to call a foul or tend to be more conservative in their foul calling. But, as mentioned above, none of them appear to exhibit bias for or against the home or away team.
Based upon our analysis, we cannot conclude that there exist bad referees. When looking at the number of net correct calls made, all the referees fall within a reasonable window. The same holds when comparing the difference in foul calls against the home team versus the away team.
We can say that some referees call significantly more or significantly fewer fouls than the league average, but they do not appear to do so in a biased manner.
We will now look to determine if there is any relationship between the number of foul calls and if the team is an away team or a home team. We will be looking at data from the 2016-17 NBA season to the 2021-22 season. The reason why we are not looking at data before the 2016 season is because the NBA went through the "3 Point Revolution" through that time, resulting in enormous changes to NBA offenses. As a result, through my own judgement, I believe that it is not helpful to use data from before the 2016 season to make any conclusions about the modern NBA.
We will first look at the average number of fouls called on the home team and compare it to the number called on the away team, to determine if there is any difference in how they are refereed. The data we will be using from this point on comes from TeamRank; luckily, it is already in an extremely clean and neat format and is immediately ready to be used.
url_home_away_2016_22_pf = 'https://raw.githubusercontent.com/NVaduthala3/cmps-6160-project/main/teamrank_2016-22_PF_league_avg.csv'
df_br_2016_22_home_away_pf = pd.read_csv(url_home_away_2016_22_pf)
df_br_2016_22_home_away_pf.columns = ['Season', 'Home PF', 'Away PF']
df_br_2016_22_home_away_pf.set_index('Season', inplace=True)
df_br_2016_22_home_away_pf
Let us see how this looks visually.
ax = df_br_2016_22_home_away_pf.plot.bar(rot=0, title='Average Number of Fouls Called on Home vs Away')
ax.set_ylabel('Foul Calls')
As we can see, there is a negligible difference between the number of fouls called on the home team and the number called on the away team. It should be noted, though, that in every season listed here, home teams did receive more foul calls on average than away teams.
Now, we will look at the difference in foul calls that each team received as a home team versus as an away team. If the net difference is positive, then they received more calls on average as the home team.
url_home_away_pf_diff_2021 = 'https://raw.githubusercontent.com/NVaduthala3/cmps-6160-project/main/teamrank_21_team_PF_avg.csv'
df_br_2021_home_away_pf_diff = pd.read_csv(url_home_away_pf_diff_2021)
df_br_2021_home_away_pf_diff.columns = ['Team', 'PF as Home', 'PF as Away']
df_br_2021_home_away_pf_diff['Net Difference'] = df_br_2021_home_away_pf_diff['PF as Home'] - df_br_2021_home_away_pf_diff['PF as Away']
df_br_2021_home_away_pf_diff.head()
Let us see how this looks graphically.
df_br_2021_home_away_pf_diff.plot.scatter(x='Team', y='Net Difference')
plt.title('Difference Between Avg Number of Fouls Committed as Home vs Away Team')
plt.tick_params(
axis='x',
which='both',
bottom=True,
top=False,
labelbottom=False)
plt.xlabel('Team')
plt.ylabel('Difference')
We can see that on average, the net difference in how teams are called is within the $\pm 1$ range. Some teams do receive fewer calls as the away team, but none of them have a difference of at least $\pm 4$. We will use this threshold of 4 later as well.
Based upon our analysis, we cannot conclude that there is statistical evidence of home teams receiving beneficial treatment, at least based upon foul call counts. It is true that across all the seasons listed above, home teams did receive more foul calls, but the difference is too small to argue that there is bias.
Now, we will look to see if there is any significant difference between the average number of fouls called in the regular season and in the playoffs from the 2016-17 season to the 2021-22 season. We will again be using TeamRank as our source of data, so luckily we will again not have to clean up our data.
url_2016_21_reg_pf = 'https://raw.githubusercontent.com/NVaduthala3/cmps-6160-project/main/teamrank_2016_21_reg_seas_league_PF_avg.csv'
url_2016_21_playoff_pf = 'https://raw.githubusercontent.com/NVaduthala3/cmps-6160-project/main/teamrank_2016_21_playoffs_league_PF_avg.csv'
df_2016_21_reg_pf = pd.read_csv(url_2016_21_reg_pf)
df_2016_21_playoff_pf = pd.read_csv(url_2016_21_playoff_pf)
df_2016_21_reg_pf.columns = ['Season', 'PF in Reg Season']
df_2016_21_playoff_pf.columns = ['Season', 'PF in Playoffs']
df_2016_21_reg_pf.set_index('Season', inplace=True)
df_2016_21_playoff_pf.set_index('Season', inplace=True)
df_2016_21_reg_playoff_pf = df_2016_21_reg_pf.join(df_2016_21_playoff_pf, how='outer')
df_2016_21_reg_playoff_pf
Now let us see how this looks visually.
ax2 = df_2016_21_reg_playoff_pf.plot.bar(rot=0)
plt.title('Average Number of Fouls Called in Reg. Season vs Playoffs')
ax2.set_ylabel('Number of Fouls')
As we can see, there seems to be a negligible difference in the average number of foul calls made in the Playoffs compared to the Regular Season. It should be noted that in every season the number of foul calls actually increased in the Playoffs, so in fact slightly more fouls are called in the Playoffs than in the Regular Season.
We will now attempt to create a predictive model that predicts the number of net correct foul calls that a team will receive in a game based upon whether they are the home or away team and whether the game is in the Regular Season or the Playoffs. We will be using k-Nearest Neighbors for this model, because it is fair to assume that if two teams play games with similar values for the input variables mentioned above, then they will receive a similar number of foul calls.
We will not be including any information about the number of star players or who will be a referee into this model, because as we have just seen above, there seems to be no relationship betwen these variables and the number of net correct foul calls.
We will be using information from the NBA's Last Two Minutes Dataset.
From what we have seen throughout this analysis, there is little evidence of any correlation between home/away status and the number of fouls called, or between regular-season/playoff status and the number of calls made. So we should expect our prediction model to be neither very precise nor very accurate.
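To ground the kNN intuition stated above, here is a minimal sketch on made-up teams and targets (the real pipeline below also standardizes the features, which is omitted here for brevity). With `DictVectorizer` one-hot encoding the categorical inputs, two games with identical (home, away, playoff) values sit at distance zero, so a 2-NN prediction for a repeat matchup is simply the mean of its two matching rows.

```python
import numpy as np
from sklearn.feature_extraction import DictVectorizer
from sklearn.neighbors import KNeighborsRegressor

# Made-up games: identical feature rows should yield each other as nearest neighbors
X_dict = [
    {'Home Team': 'AAA', 'Away Team': 'BBB', 'Playoff or No': 0},
    {'Home Team': 'AAA', 'Away Team': 'BBB', 'Playoff or No': 0},
    {'Home Team': 'CCC', 'Away Team': 'DDD', 'Playoff or No': 1},
    {'Home Team': 'CCC', 'Away Team': 'DDD', 'Playoff or No': 1},
]
y = np.array([2.0, 4.0, -1.0, -3.0])  # hypothetical net correct calls

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(X_dict)  # one-hot encode the team names

model = KNeighborsRegressor(n_neighbors=2)
model.fit(X, y)

# A repeat AAA-vs-BBB regular-season game: its 2 nearest neighbors are the
# two identical rows, so the prediction is their mean, (2.0 + 4.0) / 2 = 3.0
pred = model.predict(vec.transform([{'Home Team': 'AAA', 'Away Team': 'BBB',
                                     'Playoff or No': 0}]))
print(pred[0])
```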
url_l2m2 = 'https://raw.githubusercontent.com/atlhawksfanatic/L2M/master/1-tidy/L2M/L2M_stats_nba.csv'
df_l2m2 = pd.read_csv(url_l2m2)
# Build one row per game: ID, teams, and playoff flag
# ('Net Correct Calls' is filled in below)
games = df_l2m2.drop_duplicates(subset='GameId').reset_index(drop=True)
df_ncc = pd.DataFrame({
    'Game ID': games['GameId'],
    'Home Team': games['home_team'],
    'Away Team': games['away_team'],
    'Playoff or No': games['playoff'],
})
# Score each call: +1 if correct (CC or CNC), -1 if incorrect (IC or INC), 0 otherwise
ncc = []
for call in df_l2m2['decision']:
    if call in ('CC', 'CNC'):
        ncc.append(1)
    elif call in ('IC', 'INC'):
        ncc.append(-1)
    else:
        ncc.append(0)
df_l2m2['Correct Call or No'] = ncc
df_ncc.dropna(subset=['Game ID'], inplace=True)
df_ncc.sort_values(by=['Game ID'], inplace=True)
df_ncc.set_index('Game ID', drop=True, inplace=True)
df_ncc['Net Correct Calls'] = df_l2m2.groupby('GameId')['Correct Call or No'].sum()
df_ncc
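The scoring-and-aggregation step above can be checked on a handful of made-up L2M-style rows (the `GameId` and `decision` values below are hypothetical): summing the per-call scores within each game yields that game's net correct calls.

```python
import pandas as pd

# Toy L2M-style rows (made up) to illustrate the net-correct-calls computation
toy = pd.DataFrame({
    'GameId':   ['g1',  'g1',  'g1',  'g2', 'g2'],
    'decision': ['CC',  'INC', 'CNC', 'IC', None],
})

# Same scoring rule as above, written as a mapping: +1 correct, -1 incorrect, 0 otherwise
score_map = {'CC': 1, 'CNC': 1, 'IC': -1, 'INC': -1}
toy['Correct Call or No'] = toy['decision'].map(score_map).fillna(0).astype(int)

# Net correct calls per game: g1 = 1 - 1 + 1 = 1, g2 = -1 + 0 = -1
net = toy.groupby('GameId')['Correct Call or No'].sum()
print(net)
```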
Now we implement kNN and test the errors.
from sklearn.preprocessing import StandardScaler
from sklearn.feature_extraction import DictVectorizer
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score
def est_test_error(df, features, pred_val, k, iter_count):
    # Average the validation errors over iter_count random train/val splits
    mae_list = []
    rmse_list = []
    for i in range(iter_count):
        error_pair = get_cv_val_error2(df, features, pred_val, k)
        mae_list.append(error_pair[0])
        rmse_list.append(error_pair[1])
    return (np.mean(mae_list), np.mean(rmse_list))
def get_cv_val_error2(df, features, predict_val, k): # 2-fold cross-validation
    # Split the data into random train and validation halves
    train = df.sample(frac=.5)
    val = df.drop(train.index)
    x_train_dict = train[features].to_dict(orient="records")
    x_val_dict = val[features].to_dict(orient="records")
    y_train = train[predict_val]
    y_val = val[predict_val]
    # Convert categorical variables to dummy variables
    vec = DictVectorizer(sparse=False)
    vec.fit(x_train_dict)
    x_train = vec.transform(x_train_dict)
    x_val = vec.transform(x_val_dict)
    # Standardize
    scaler = StandardScaler()
    scaler.fit(x_train)
    x_train_sc = scaler.transform(x_train)
    x_val_sc = scaler.transform(x_val)
    # Fit kNN model
    model = KNeighborsRegressor(n_neighbors=k)
    model.fit(x_train_sc, y_train)
    # Predict on val set
    y_val_pred = model.predict(x_val_sc)
    rmse1 = np.sqrt(((y_val - y_val_pred) ** 2).mean())
    mae1 = ((y_val - y_val_pred).abs()).mean()
    # Swap the folds and repeat, so each half serves as the validation set once
    (mae2, rmse2) = get_val_error_mod(x_val_dict, y_val, x_train_dict, y_train, k)
    # Average the two folds' errors
    rmse_list = [rmse1, rmse2]
    mae_list = [mae1, mae2]
    return (np.mean(mae_list), np.mean(rmse_list))
def get_val_error_mod(X_train_dict, y_train, X_val_dict, y_val, k): # called on the swapped fold
    # convert categorical variables to dummy variables
    vec = DictVectorizer(sparse=False)
    vec.fit(X_train_dict)
    X_train = vec.transform(X_train_dict)
    X_val = vec.transform(X_val_dict)
    # standardize the data
    scaler = StandardScaler()
    scaler.fit(X_train)
    X_train_sc = scaler.transform(X_train)
    X_val_sc = scaler.transform(X_val)
    # Fit a k-nearest neighbors model. (Previously hard-coded to 10 neighbors,
    # which silently ignored the k being swept over elsewhere.)
    model = KNeighborsRegressor(n_neighbors=k)
    model.fit(X_train_sc, y_train)
    # Make predictions on the validation set.
    y_val_pred = model.predict(X_val_sc)
    rmse = np.sqrt(((y_val - y_val_pred) ** 2).mean())
    mae = ((y_val - y_val_pred).abs()).mean()
    return (mae, rmse)
def get_kFoldCV_mae(df, features, predict_val, k_nn, k_cv):
    x_dict = df[features].to_dict(orient="records")
    y = df[predict_val]
    # specify the pipeline
    vec = DictVectorizer(sparse=False)
    scaler = StandardScaler()
    model = KNeighborsRegressor(n_neighbors=k_nn)
    pipeline = Pipeline([("vectorizer", vec), ("scaler", scaler), ("fit", model)])
    # Use neg_mean_absolute_error (not neg_mean_squared_error) so that
    # flipping the sign actually recovers an MAE
    scores = cross_val_score(pipeline, x_dict, y,
                             cv=k_cv, scoring="neg_mean_absolute_error")
    mae = (-scores).mean()
    return mae
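One detail worth a sanity check: scikit-learn's scorer strings like `neg_mean_absolute_error` return *negated* errors (so that higher is always better), and the sign must be flipped to recover the error itself. The toy run below, on made-up team labels where each team's target is perfectly determined by its label, should recover an MAE of zero through the same vectorize/scale/kNN pipeline used above.

```python
import numpy as np
from sklearn.feature_extraction import DictVectorizer
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

# Made-up data: team AAA always scores 1.0, team BBB always 3.0
X_dict = [{'Team': 'AAA' if i % 2 == 0 else 'BBB'} for i in range(20)]
y = np.array([1.0 if i % 2 == 0 else 3.0 for i in range(20)])

pipeline = Pipeline([
    ('vectorizer', DictVectorizer(sparse=False)),
    ('scaler', StandardScaler()),
    ('fit', KNeighborsRegressor(n_neighbors=3)),
])

# Scores come back negated; flip the sign to get the MAE
scores = cross_val_score(pipeline, X_dict, y, cv=5,
                         scoring='neg_mean_absolute_error')
mae = (-scores).mean()
print(mae)  # a perfectly predictable target gives MAE 0
```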
def get_kNN_mae(df, features, predict_val, k): # features is list of strings and predict_val is string
x_train_dict = df[features].to_dict(orient="records")
y_train = df[predict_val]
vec = DictVectorizer(sparse=False)
vec.fit(x_train_dict)
x_train = vec.transform(x_train_dict)
scaler = StandardScaler()
scaler.fit(x_train)
x_train_sc = scaler.transform(x_train)
# Fit model
model = KNeighborsRegressor(n_neighbors=k)
model.fit(x_train_sc, y_train)
# Find predictions
y_train_pred = model.predict(x_train_sc)
# Find mae
mae = ((y_train - y_train_pred).abs()).mean()
return mae
features = ['Home Team', 'Away Team', 'Playoff or No']
predict_val = 'Net Correct Calls'
est_test_error(df_ncc, features, predict_val, 100, 10)
k_val = np.arange(1, 50)
train = df_ncc.sample(frac=.5)
knn_train_mae_list = [get_kNN_mae(train, features, predict_val, k) for k in k_val]
knn_val_mae_list = [est_test_error(df_ncc, features, predict_val, k, 50)[0] for k in k_val]
#kFold_train_mae_list = [get_kFoldCV_mae(train, features, predict_val, k, 5) for k in k_val]
#kFold_val_mae_list = [get_kFoldCV_mae(df_ncc, features, predict_val, k, 5) for k in k_val]
plt.plot(k_val, knn_val_mae_list, label = 'kNN val error')
plt.plot(k_val, knn_train_mae_list, label = 'kNN train error')
#plt.plot(k_val, kFold_train_mae_list, label = '5 Fold train error')
#plt.plot(k_val, kFold_val_mae_list, label = '5 Fold val error')
plt.xlabel('k Value')
plt.ylabel('MAE')
plt.legend()
As the graph over varying $k$ values shows, our expectation was borne out: the constructed model does not predict well, with relatively high error across all values of $k$. The fundamental flaw is that, as we have seen throughout our analysis, the variables used as inputs do not appear to be correlated with the number of foul calls at all.
Our analysis has shown that the commonly held beliefs about NBA referees listed above lack the statistical evidence needed to support them. We found no evidence of a relationship between stardom and favorable treatment; no evidence for the existence of consistently bad referees; no evidence of a significant advantage given to home teams; and no evidence that referees call fewer fouls in the Playoffs than in the Regular Season.