An Analysis of Commonly Held Fan Beliefs About Referees

Nathaniel Vaduthala

Website

Motivation

Much of fan discourse around refereeing is based on emotion. Many of the claims made by fans actually made an implicit relationship between variables. For instance, the oft-repeated claim that "star players get more calls" is making the claim that there is a positive correlation between the stardom of a player and the number of calls they receive. The goal of this project to analyze some of the commonly repeated claims made by NBA fans and determine if there is some truth to it.

Project Goals

In this project, I will attempt to see if commonly held beliefs about referees amongst fans of the NBA are true. In particular, these beliefs include the opinions:

  • "star players always receive preferential treatment when it comes to foul calls"
  • "some referees are fair, while others are biased"
  • "home teams receive favorable treatment compared to the away team"
  • "in the NBA Playoffs, the referees call less fouls"

Using this information, I will then attempt to use k-Nearest Neighbors to see if we can predict the number of total foul calls made in a game.

Do Star Players Receive Favorable Treatment?

Relation Between Single-Value Player Metrics and Personal Foul Call Amounts

Before we can even determine if there is a relationship between a player's stardom and the number of foul calls they receive, we first need to define what a "star player" is. To do so, I will be using common single-value player metrics. These metrics attempt to provide a single number that quantifies how talented a player is. The single-value player metrics I will be using are Win Shares per 48 (WS/48), and Value Over Replacement Plyer (VORP). Notably, I am not using Player Efficiency Rating (PER) here, as it seems to be no longer used in basketball analytics.

The higher these values are, the more of a star the corresponding player is. Therefore, we will attempt to see if there is a relationship between these values and the average number of foul calls per game. I imported one data set from Basketball Reference that included the average number of personal foul calls a player received, along with another data set from Basketball Reference that was primarily focused on single-value player metrics.

In [8]:
# Import data for BBall Ref

import pandas as pd
import numpy as np

url_br_2021_per36 = 'https://raw.githubusercontent.com/NVaduthala3/cmps-6160-project/main/bball_ref_2021_per36.csv' # '2021' here means 2021-22 season
df_br_2021_per36 = pd.read_csv(url_br_2021_per36) # DataFrame containing players' average stats

url_br_2021_adv = 'https://raw.githubusercontent.com/NVaduthala3/cmps-6160-project/main/bball_ref_2021_advanced.csv'
df_br_2021_adv = pd.read_csv(url_br_2021_adv) # DateFrame containing single-value metrics

df_br_2021_per36.head()
Out[8]:
Rk Player Pos Age Tm G GS MP FG FGA ... ORB DRB TRB AST STL BLK TOV PF PTS Player-additional
0 1 Precious Achiuwa C 22 TOR 73 28 1725 5.5 12.6 ... 3.0 6.8 9.9 1.7 0.8 0.9 1.8 3.2 13.9 achiupr01
1 2 Steven Adams C 28 MEM 76 75 1999 3.8 6.9 ... 6.3 7.4 13.7 4.6 1.2 1.1 2.1 2.8 9.5 adamsst01
2 3 Bam Adebayo C 24 MIA 56 56 1825 8.0 14.4 ... 2.7 8.4 11.1 3.7 1.6 0.9 2.9 3.4 21.1 adebaba01
3 4 Santi Aldama PF 21 MEM 32 0 360 5.3 13.2 ... 3.3 5.4 8.7 2.1 0.6 1.0 1.6 3.6 13.2 aldamsa01
4 5 LaMarcus Aldridge C 36 BRK 47 12 1050 8.6 15.7 ... 2.5 6.3 8.8 1.4 0.5 1.6 1.5 2.7 20.8 aldrila01

5 rows × 30 columns

After importing the data for the 2021-22 NBA season, it is necessary now to remove of any data points that cannot provide us any value. In order to take into account that different players play a different number of minutes, and therefore will most likely have a higher number of foul calls, I imported in the data set that proportionately normalized the players' average statistics to if they had played 36 minutes every game. However, in the case of players that had played either less than 15 minutes on average or had played less than 35 games, we cannot make any normalization, as they have not played enough to make a realistic extrapolation. Therefore, I want to remove these players from the dataset

But before I can do that, I have to first create a new column that described how many minutes per game a player played on average. This is because the dataset containing the players' average statistics did not contain this information. But, it did contain the total number of minutes played ('MP') and the total number of games played ('G'), and so with this, I was able to find the average number of minutes played ('MPG').

But, because the data set I am importing performs some extrapolation, there are some illogical values in the dataset. I will force every value that was greater than 6 in the "Personal Fouls" column of the dataset6 to be equal to 6, as a player cannot receive more than 6 fouls in an NBA game.

In [9]:
# clean up data for BBall Ref

df_br_2021_per36['MPG'] = df_br_2021_per36.apply(lambda row: row['MP'] / row['G'], axis = 1) # calculate minutes per game
df_br_2021_per36 = df_br_2021_per36.drop(df_br_2021_per36[(df_br_2021_per36['MPG'] < 15) | (df_br_2021_per36['G'] < 35)].index) # remove players with insufficient games/min playes
df_br_2021_per36.loc[df_br_2021_per36['PF'] > 6, 'PF'] = 6 # impossible to have more than 6 fouls 

df_br_2021_adv_pf = df_br_2021_adv.merge(df_br_2021_per36, how = 'inner') # add personal fouls column onto advanced stats dataframe

df_br_2021_adv_pf.head()
Out[9]:
Rk Player Pos Age Tm G MP PER TS% 3PAr ... ORB DRB TRB AST STL BLK TOV PF PTS MPG
0 1 Precious Achiuwa C 22 TOR 73 1725 12.7 0.503 0.259 ... 3.0 6.8 9.9 1.7 0.8 0.9 1.8 3.2 13.9 23.630137
1 2 Steven Adams C 28 MEM 76 1999 17.6 0.560 0.003 ... 6.3 7.4 13.7 4.6 1.2 1.1 2.1 2.8 9.5 26.302632
2 3 Bam Adebayo C 24 MIA 56 1825 21.8 0.608 0.008 ... 2.7 8.4 11.1 3.7 1.6 0.9 2.9 3.4 21.1 32.589286
3 5 LaMarcus Aldridge C 36 BRK 47 1050 19.6 0.604 0.100 ... 2.5 6.3 8.8 1.4 0.5 1.6 1.5 2.7 20.8 22.340426
4 6 Nickeil Alexander-Walker SG 23 TOT 65 1466 10.5 0.475 0.497 ... 0.9 3.7 4.6 3.8 1.1 0.6 2.3 2.5 17.0 22.553846

5 rows × 53 columns

Compare Overall Fouls to Single-Value Player Metrics

From here, we can see some results.

In [10]:
# Plot WS/48 vs PF
import matplotlib.pyplot as plt
from scipy import stats

df_br_2021_adv_pf.plot.scatter(x = 'WS/48', y = 'PF')

result = stats.linregress(x = df_br_2021_adv_pf['WS/48'], y = df_br_2021_adv_pf['PF'])

plt.plot(df_br_2021_adv_pf['WS/48'], result.intercept + result.slope*(df_br_2021_adv_pf['WS/48']), 'r')
plt.text(-0.05, 5.5, 'r^2 = %0.2f' % (result.rvalue) ** 2)
plt.title('WS/48 vs (Adjusted) Personal Fouls/Game')
plt.xlabel('Win Shares/48')
plt.ylabel('Adjusted Personal Fouls/Game')
Out[10]:
Text(0, 0.5, 'Adjusted Personal Fouls/Game')

We can see from the graph that there does not appear to be any relationship between an increased Win Shares per 48--and therefore increased stardom--and the average number of personal fouls called on the player. Although there is a positive correlation, the r-squared value is small enough that we cannot make a claim of there being any relation.

In [11]:
df_br_2021_adv_pf.plot.scatter(x = 'VORP', y = 'PF')

result2 = stats.linregress(x = df_br_2021_adv_pf['VORP'], y = df_br_2021_adv_pf['PF'])

plt.plot(df_br_2021_adv_pf['VORP'], result2.intercept + result2.slope*(df_br_2021_adv_pf['VORP']), 'r')
plt.text(2, 5.5, 'r^2 = %0.2f' % (result2.rvalue) ** 2)
plt.title('VORP vs (Adjusted) Personal Fouls/Game')
plt.xlabel('VORP')
plt.ylabel('Adjusted Personal Fouls/Game')
Out[11]:
Text(0, 0.5, 'Adjusted Personal Fouls/Game')

In a similar manner as in the first graph, we cannot conclude any definitive relationship between the stardom of a player and the number of foul call received in a game. In this case, the correlation is negative, but the r-squared value is once again too small for us to state a relationship.

Compare Actual Number of Favorable Calls to Single-Value Player Metrics

One issue in our analysis is that we are extrapolating information off of the average. Another issue is that we are just comparing the number of foul calls to the single value metrics. It may be the case that a player received a lot of calls in their favor, and that all these calls were correct.

In order to try and remedy these issues, I will using the NBA's Last 2 Minutes dataset to look at the actual favorable calls made, and see if there is a relationship between the single-value player metrics and the number of favorable calls made. The dataset I will be using was organized by Github user athawksfanatic. This dataset states the fouls called made in the last two minutes of every game and states if they were the correct call to make or the incorrect one.

In particular, what I will be counting is the net number of favorable calls. In the NBA's Last 2 Minutes dataset, there are columns that outlines which player was disadvantaged and which player committed the foul. As a result, I can count how many times every player was either at an advantageous position in a foul call or at a a disadvantageous position. By assigning the value 1 to an advantageous position, and a -1 to a disadvantageous position, I will count up all the situations that a player was in, and this will be their net number of favorable calls. From here, I will then see if there is a relationship between this variable and the single-value player metrics.

Before I can do this, I must first remove all the rows that contain data not in the 2021-22 season.

In [12]:
# Import data from NBA L2M, provided by atlhawksfanatic

url_l2m = 'https://raw.githubusercontent.com/atlhawksfanatic/L2M/master/1-tidy/L2M/L2M_stats_nba.csv'
df_l2m = pd.read_csv(url_l2m)
df_l2m = df_l2m.drop((df_l2m[(df_l2m['season'] != 2022) | (df_l2m['playoff'] == True)].index)) #'2022' here means 2021-22 season, and only want Regular Season
df_l2m.head()
/usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py:3326: DtypeWarning: Columns (7,23,24,26,30,31,37,40,51) have mixed types.Specify dtype option on import or set low_memory=False.
  exec(code_obj, self.user_global_ns, self.user_ns)
Out[12]:
period time call_type committing disadvantaged decision comments game_details page file ... committing_min committing_team committing_side disadvantaged_min disadvantaged_team disadvantaged_side type2 time_min time_sec time2
54498 Q4 01:33.8 Violation: Delay of Game Aaron Henry Aaron Henry IC Called + No Infraction NaN NaN 0012100023.csv ... 7.366667 PHI home 7.366667 PHI home DELAY OF GAME 1 33.8 1.563333
54499 Q4 00:53.2 Violation: Delay of Game Aaron Henry Aaron Henry CC Called + Infraction NaN NaN 0012100023.csv ... 7.366667 PHI home 7.366667 PHI home DELAY OF GAME 0 53.2 0.886667
54500 Q4 00:30.9 Violation: Delay of Game Aaron Henry Shaquille Harrison CNC No Called + No Infraction NaN NaN 0012100023.csv ... 7.366667 PHI home 7.366667 PHI home DELAY OF GAME 0 30.9 0.515000
54501 Q4 00:23.3 Turnover: Traveling Jaden Springer Raptors INC No Called + Infraction NaN NaN 0012100023.csv ... 6.916667 PHI home NaN TOR away TRAVELING 0 23.3 0.388333
54502 Q4 01:45 Foul: Shooting Obi Toppin Jayson Tatum CNC Toppin (NY) makes clean contact with the ball ... NaN NaN 0022100005.csv ... 28.383333 NYK home 44.650000 BOS away SHOOTING 1 45.0 1.750000

5 rows × 64 columns

As one can see, this code currently produces a DtypeWarning. This should not be an issue, as this only affects columns that I am not interested in. However, we see that this dataset contains violations that aren't fouls. We are not interested in this, and so we will remove all rows that contain information about a referee call made that is not a foul. Along with that, some rows contain team names where there should be player names, and so those also have to be removed.

In [13]:
df_l2m = df_l2m.dropna(subset=['call_type', 'committing', 'disadvantaged', # Remove NaN values in columns we care about
                               'decision'])

df_l2m = df_l2m[df_l2m['committing'] != df_l2m['disadvantaged']] # Avoid possible repeats

# remove calls that are not fouls, like delay of game or a turnover
df_l2m = df_l2m[df_l2m['call_type'].str.contains('Foul', case=False)]

# removes all possible team names, except the Trail Blazers
mask_commit = df_l2m['committing'].str.count(' ').ge(1)
df_l2m = df_l2m[mask_commit]

mask_disadv = df_l2m['disadvantaged'].str.count(' ').ge(1)
df_l2m = df_l2m[mask_disadv]

df_l2m = df_l2m[(df_l2m['committing'] != 'Trail Blazers') & (df_l2m['disadvantaged'] != 'Trail Blazers')]

Now, we say that a player received a favorable call if they are called committing a foul and the decision is seen as an incorrect non-call, or if they are the disadvantaged player and received an incorrect call. We say that a player received an unfavorable call if they are called committing a foul and the decision is seen as an incorrect call, or if they are the disadvantaged player and the decision is seen as an incorrect non-call. All other cases can be considered to be a neutral (correct) call and so they will automatically have a value of 0.

However, we must also take into account the fact that different players will play different amounts of time in the last two minutes and so the number of foul situations they were involved in will also differ, and therefore it may not be fair to directly compare the net number of favorable calls between players. In order to mitigate this issue, we will instead look at the net number of favorable calls per 100 possessions. In order to find this value, we will first find the net number of favorable calls, and then we will find the number of possessions that a player was involved in a foul call. We will then divide the net number of favorable calls by the number of possessions in order to find the net number of favorable calls per possion, and then multiply this value by 100.

In [14]:
df_fav = pd.DataFrame()
df_fav = pd.DataFrame(pd.concat([df_l2m['committing'], df_l2m['disadvantaged']], ignore_index = True).unique())
df_fav['favorable count'] = ' '
df_fav['unfavorable count'] = ' '
df_fav['possession count'] = ' '
df_fav.columns = ['Player', 'favorable count', 'unfavorable count', 'possession count']
df_fav = df_fav.set_index('Player')

for player in df_fav.index:
  fav_count = df_l2m.loc[
    (df_l2m['committing'].eq(player) & df_l2m['decision'].eq('INC')) |
    (df_l2m['disadvantaged'].eq(player) & df_l2m['decision'].eq('IC'))
  ]
  unfav_count = df_l2m.loc[
    (df_l2m['committing'].eq(player) & df_l2m['decision'].eq('IC')) |
    (df_l2m['disadvantaged'].eq(player) & df_l2m['decision'].eq('INC'))
  ]
  poss_count = df_l2m['committing'].tolist().count(player) + df_l2m['disadvantaged'].tolist().count(player)
  df_fav.at[player, 'possession count'] = poss_count
  df_fav.at[player, 'favorable count'] = len(fav_count)
  df_fav.at[player, 'unfavorable count'] = len(unfav_count)

df_fav['net favorable count'] = df_fav['favorable count'] - df_fav['unfavorable count']
df_fav['net fav count per 100 poss'] = df_fav['net favorable count'] / df_fav['possession count'] * 100

df_fav = df_fav.reset_index()

df_fav.shape
Out[14]:
(435, 6)

We also need to consider the fact that some players have not been involved in enough possesions to where we can say that their net number of favorable calls per 100 possessions is a fair extrapolation. To fix this issue, we will drop all the players that have beenn involved in less than 5 possesions. The reason we use 5 is because if we go any lower, then we simply no longer will be fixing the problem, but if we go any higher, then there is a chance that we start cutting off meaningful data. The choice of having 5 as the baseline is also justified by the fact that the 25% percentile is 7 and the 50% percentile is 22, and so the vast majority of the data here is above the 5 benchmark.

In [15]:
pd.to_numeric(df_fav['possession count'], errors = 'coerce').describe()
Out[15]:
count    435.000000
mean      32.611494
std       31.512463
min        1.000000
25%        7.000000
50%       22.000000
75%       52.000000
max      173.000000
Name: possession count, dtype: float64
In [16]:
df_fav = df_fav[df_fav['possession count'] > 4]

From here, we now need to join this DataFrame with the df_br_2021_adv_pf DataFrame in order to see if there is a relationship between Win Shares per 48 and VORP.

In [17]:
df_br_2021_adv_pf_fav = df_br_2021_adv.merge(df_fav, how = 'inner') 

df_br_2021_adv_pf_fav.plot.scatter(x = 'WS/48', y = 'net fav count per 100 poss')

result = stats.linregress(x = df_br_2021_adv_pf_fav['WS/48'], y = pd.to_numeric(df_br_2021_adv_pf_fav['net fav count per 100 poss']))

plt.plot(df_br_2021_adv_pf_fav['WS/48'], result.intercept + result.slope*(df_br_2021_adv_pf_fav['WS/48']), 'r')
plt.text(-0.25, 5.5, 'r^2 = %0.2f' % (result.rvalue) ** 2)
plt.title('WS/48 vs Favorable Calls Per 100 Poss')
plt.xlabel('Win Shares/48')
plt.ylabel('Favorable Calls Per 100 Poss')

df_br_2021_adv_pf_fav.plot.scatter(x = 'VORP', y = 'net fav count per 100 poss')

result = stats.linregress(x = df_br_2021_adv_pf_fav['VORP'], y = pd.to_numeric(df_br_2021_adv_pf_fav['net fav count per 100 poss']))

plt.plot(df_br_2021_adv_pf_fav['VORP'], result.intercept + result.slope*(df_br_2021_adv_pf_fav['WS/48']), 'r')
plt.text(5, 6.5, 'r^2 = %0.2f' % (result.rvalue) ** 2)
plt.title('VORP vs Favorable Calls Per 100 Poss')
plt.xlabel('VORP')
plt.ylabel('Favorable Calls Per 100 Poss')
Out[17]:
Text(0, 0.5, 'Favorable Calls Per 100 Poss')

We can see that there seems to be no relationship between the net number of favorable calls per 100 possesions and VORP, and the net number of favorable calls per 100 possesions and Win Shares per 48.

Conclusion Regarding The Relationship Between Star Players and Foul Calls

Based upon our analysis, we can conclude that there seems to be no evidence indicating that there exists a relationship between if a player is a star player and if they receive more beneficial foul calls.

Are Some Referees Bad?

Compare the Actual Number of Correct Calls Made to the Average

Our next goal is to determine if the calls made by some referees show evidence of statistical bias, which would then make them be considered bad. We will first use the Last 2 Minutes Dataset to determine the average number of the net correct calls made by the referees, which is defined to be the diffeence between the number of correct calls and the number of incorrect calls, and from there we will determine if there are any referees that can be considered statistical outliers. We will then determine the average number of correct calls made and also see if there are any statistical outliers.

To find the average number of net correct calls made, we first need to find all the individual games that have their data in the DataSet, and then from there find the number of correct and incorrect foul calls that happened within those games. Finally, we will sum up all the incorrect foul calls and then divide this value by the number of games.

In [18]:
# Import data from NBA L2M, provided by atlhawksfanatic

url_l2m = 'https://raw.githubusercontent.com/atlhawksfanatic/L2M/master/1-tidy/L2M/L2M_stats_nba.csv'
df_l2m_ref = pd.read_csv(url_l2m)
df_l2m_ref = df_l2m_ref.drop((df_l2m_ref[(df_l2m_ref['season'] != 2022) | (df_l2m_ref['playoff'] == True)].index)) #'2022' here means 2021-22 season, and only want Regular Season
df_l2m_ref['game_id'].head()

# separate DataFrame into individual games

df_games_list = []
for id in df_l2m_ref['game_id'].unique():
  df_games_list.append(df_l2m_ref[df_l2m_ref['game_id'] ==id])

df_game_info = pd.DataFrame()
df_game_info['game id'] = df_l2m_ref['game_id'].unique()
df_game_info['correct calls'] = ' '
df_game_info['incorrect calls'] = ' '
df_game_info['net correct calls'] = ' '
df_game_info['ref 1'] = ' '
df_game_info['ref 2'] = ' '
df_game_info['ref 3'] = ' '

# calculate calls made in each game

call_list = [pd.DataFrame(game['decision'].value_counts()) for game in df_games_list]
call_list = [call.reset_index() for call in call_list]
for call in call_list:
  call.columns = ['type', 'count']

for call in call_list:
  call_type = ['IC', 'INC', 'CC', 'CNC']
  for call_t in call_type:
    if (call_t not in call.values):
      call.loc[len(call.index)] = [call_t, 0]
  
for call in call_list:
  call.set_index('type', inplace=True)

# add ref names to dataframe

correct_call_list = [call.loc['CNC', 'count'] + call.loc['CC', 'count'] for call in call_list]
incorrect_call_list = [call.loc['IC', 'count'] + call.loc['INC', 'count'] for call in call_list]
ref_1_list = [game[['OFFICIAL_1']].loc[game.first_valid_index(), 'OFFICIAL_1'] for game in df_games_list]
ref_2_list = [game[['OFFICIAL_2']].loc[game.first_valid_index(), 'OFFICIAL_2'] for game in df_games_list]
ref_3_list = [game[['OFFICIAL_3']].loc[game.first_valid_index(), 'OFFICIAL_3'] for game in df_games_list]

df_game_info['correct calls'] = correct_call_list
df_game_info['incorrect calls'] = incorrect_call_list
df_game_info['ref 1'] = ref_1_list
df_game_info['ref 2'] = ref_2_list
df_game_info['ref 3'] = ref_3_list
df_game_info['net correct calls'] = df_game_info['correct calls'] - df_game_info['incorrect calls']

df_game_info.head()
/usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py:3326: DtypeWarning: Columns (7,23,24,26,30,31,37,40,51) have mixed types.Specify dtype option on import or set low_memory=False.
  exec(code_obj, self.user_global_ns, self.user_ns)
Out[18]:
game id correct calls incorrect calls net correct calls ref 1 ref 2 ref 3
0 12100023.0 2 2 0 Ray Acosta Evan Scott Mark Lindsay
1 22100005.0 34 2 32 Scott Foster Lauren Holtkamp Ed Malloy
2 22100004.0 22 1 21 Derek Richardson Mousa Dagher Tyler Ford
3 22100007.0 15 0 15 Michael Smith John Conley Kevin Scott
4 22100003.0 33 3 30 Rodney Mott Scott Wall Nate Green

What we will now do is associate each referee with the number of net correct calls made in the games they oversaw, along with their average net correct calls made per 2 minutes. We find this latter value in an attempt to take into account the fact that some referees oversee more games than others.

However, it should be mentioned that this analysis is flawed. With this method, I will be "sharing the blame." It could be the case that a referee did not actually make many, or any for that matter, incorrect calls, but they will still be labelled as doing so if they were working with a referee that made incorrect calls. To try and mitigate this issue, I will be dividing the number of correct and incorrect calls associated with each referee by 3. Thus, I am making the assumption that each ref is equally likely to make an incorrect call. This is a faulty assumption.

In [19]:
df_ref_info = pd.DataFrame()
df_ref_info['name'] = ' '
df_ref_info['correct calls'] = ' '
df_ref_info['incorrect calls'] = ' '
df_ref_info['net correct calls'] = ' '

df_ref_info['name'] = pd.DataFrame(pd.concat([df_game_info['ref 1'], df_game_info['ref 2'], df_game_info['ref 3']], ignore_index = True).unique())
ref_info_list = [df_game_info.loc[(df_game_info['ref 1'] == name) | (df_game_info['ref 2'] == name) | (df_game_info['ref 3'] == name)] for name in list(df_ref_info['name'])]

cor_call_list = [ref['correct calls'].sum() for ref in ref_info_list]
df_ref_info['correct calls'] = np.divide(cor_call_list, 3)

incor_call_list = [ref['incorrect calls'].sum() for ref in ref_info_list]
df_ref_info['incorrect calls'] = np.divide(incor_call_list, 3)

net_cor_call_list = [ref['net correct calls'].sum() for ref in ref_info_list]
df_ref_info['net correct calls'] = np.divide(net_cor_call_list, 3)

df_ref_info['average net correct calls per 2 min'] = np.divide([ref['net correct calls'].sum() / len(ref) for ref in ref_info_list], 3)
df_ref_info['difference of individual avg NCC/2 min and overall'] = ( df_ref_info['average net correct calls per 2 min'] 
                                                                    - (df_ref_info['average net correct calls per 2 min'].sum() / len(df_ref_info)))

df_ref_info.head()
Out[19]:
name correct calls incorrect calls net correct calls average net correct calls per 2 min difference of individual avg NCC/2 min and overall
0 Ray Acosta 113.000000 12.333333 100.666667 4.575758 -1.027148
1 Scott Foster 105.666667 7.000000 98.666667 6.166667 0.563761
2 Derek Richardson 133.333333 7.333333 126.000000 6.631579 1.028673
3 Michael Smith 134.000000 11.000000 123.000000 5.347826 -0.255080
4 Rodney Mott 79.000000 5.333333 73.666667 6.696970 1.094064

We now remove all referees that have less than 10 minutes reffed in this dataset. We remove them because there is not sufficient data to make any claims about possible bias.

In [20]:
df_ref_info['minutes reffed'] = [2*len(ref) for ref in ref_info_list]

df_ref_info = df_ref_info[df_ref_info['minutes reffed'] > 9]

df_ref_info.head()
Out[20]:
name correct calls incorrect calls net correct calls average net correct calls per 2 min difference of individual avg NCC/2 min and overall minutes reffed
0 Ray Acosta 113.000000 12.333333 100.666667 4.575758 -1.027148 44
1 Scott Foster 105.666667 7.000000 98.666667 6.166667 0.563761 32
2 Derek Richardson 133.333333 7.333333 126.000000 6.631579 1.028673 38
3 Michael Smith 134.000000 11.000000 123.000000 5.347826 -0.255080 46
4 Rodney Mott 79.000000 5.333333 73.666667 6.696970 1.094064 22

Let us now visualize how this data looks through the use of a scatter plot.

In [21]:
df_ref_info.plot.scatter(x = 'name', y = 'difference of individual avg NCC/2 min and overall')

plt.title('Difference Between Individual Average Net Correct Calls/2 min and Overall')
plt.tick_params(
    axis='x',          
    which='both',      
    bottom=True,      
    top=False,         
    labelbottom=False) 
plt.xlabel('Referee')
plt.ylabel('Difference')
Out[21]:
Text(0, 0.5, 'Difference')

As seen in the scatter plot, the vast majority of referees fall within committing either 1 incorrect or 1 correct call more than the average. We cannot attribute this to any bias, and so we can say that for the vast majority of referees, there is not sufficient evidence to claim that they are biased, at least within the last two minutes.

With that being said, it is interesting to note that we can see clearly that there is one data point that is significantly higher than the rest, and therefore consistently makes better calls than most referees, and there is another at the bottom that consistently makes worse calls than most referees.

In [22]:
print(df_ref_info.iloc[df_ref_info['difference of individual avg NCC/2 min and overall'].idxmin()])
print(df_ref_info.iloc[df_ref_info['difference of individual avg NCC/2 min and overall'].idxmax()])
name                                                  Bennie Adams
correct calls                                                 58.0
incorrect calls                                                6.0
net correct calls                                             52.0
average net correct calls per 2 min                       3.714286
difference of individual avg NCC/2 min and overall        -1.88862
minutes reffed                                                  28
Name: 29, dtype: object
name                                                  Scott Twardoski
correct calls                                               91.333333
incorrect calls                                                   6.0
net correct calls                                           85.333333
average net correct calls per 2 min                          6.095238
difference of individual avg NCC/2 min and overall           0.492332
minutes reffed                                                     28
Name: 64, dtype: object

It appears that Bennie Adams, on average, consistently makes worse calls in the last 2 minutes than most referees, while Scott Twardoski makes better calls in the last two minutes than most referees. But once again, since this analysis does not specifically consider which referee made the incorrect call, and instead views the three referees in the game as one unit, it is not fair for me to say that Bennie Adams can be considered a bad referee.

A possible way to determine if a referee such as Bennie Adams is being negatively impacted in this analysis due to the other referees they are working with could be to see how many net correct calls their most frequent co-referees make, and determine if there is any relationship between the net number of correct calls a referee such as Bennie Adams make and the number that their co-referees make.

Calculate Number of Overall Calls Made

In order to develop a better understanding on if some referees exhibit some form of bias or not, we will now consider the overall average number of calls made by each individual referee. Let us first import the data.

In [126]:
url_br_2021_ref = 'https://raw.githubusercontent.com/NVaduthala3/cmps-6160-project/main/BBall-Ref_Ref.csv'
df_br_2021_ref = pd.read_csv(url_br_2021_ref)

df_br_2021_ref.drop(labels=0, axis=0, inplace=True)
df_br_2021_ref.reset_index(drop=True, inplace=True)

df_br_2021_ref['Per Game Relative.2'] = pd.to_numeric(df_br_2021_ref['Per Game Relative.2'])
df_br_2021_ref['Unnamed: 2'] = pd.to_numeric(df_br_2021_ref['Unnamed: 2'])


df_br_2021_ref = df_br_2021_ref[df_br_2021_ref['Unnamed: 2'] >= 40] # need sufficient amount of games

df_br_2021_ref.head()
Out[126]:
Unnamed: 0 Unnamed: 1 Unnamed: 2 Per Game Per Game.1 Per Game.2 Per Game.3 Per Game Relative Per Game Relative.1 Per Game Relative.2 ... Home Minus Visitor Home Minus Visitor.1 Home Minus Visitor.2 Home Minus Visitor.3 Home Minus Visitor.4 Relative to Average* Relative to Average*.1 Relative to Average*.2 Relative to Average*.3 Relative to Average*.4
0 Ray Acosta NBA 65 176.9 45.0 39.4 222.7 +0.7 +1.3 0.1 ... -.108 +0.1 +0.7 -0.2 -1.1 -.201 0.0 +0.6 -0.2 -2.9
1 Brandon Adair NBA 59 176.6 41.9 37.7 219.5 +0.4 -1.8 -1.6 ... +.186 -2.6 +2.1 -1.0 +1.6 +.093 -2.7 +2.0 -1.0 -0.2
2 Bennie Adams NBA 46 174.3 41.9 39.0 218.0 -1.9 -1.8 -0.3 ... +.174 -0.6 -0.3 +0.1 +3.0 +.081 -0.7 -0.4 +0.1 +1.2
3 Brent Barnaky NBA 64 178.9 41.8 37.6 220.8 +2.7 -1.9 -1.7 ... -.063 +1.1 -0.8 +1.5 +1.2 -.156 +1.0 -0.9 +1.5 -0.6
4 Curtis Blair NBA 61 176.2 41.2 38.7 217.5 0.0 -2.5 -0.6 ... +.148 +0.1 +0.4 -0.5 +2.1 +.055 0.0 +0.3 -0.5 +0.3

5 rows × 31 columns

The only columns we care about are "Unnamed: 0", "Per Game Relative.3", "Home Minus Visitor.3", and "Per Game.2". The first mentioned column states the name of the referee. The second mentioned one corresponds to the net difference between the number of foul calls the individual referee made and the number of foul calls on average made. We will use this to see if there are any referees that are outliers when it comes to the number of fouls called on average.

The third mentioned column corresponds to the average net difference in fouls called against the home team and the fouls called against the away team. So a negative number indicates that away teams are called for more fouls on average. However, consider a situation where Referee 1 calls on average 4 fouls on the home team, but 8 fouls on the away team. Now, suppose Referee 2 calls on average 20 fouls on the home team, but 25 fouls on the away team. Strictly comparing the difference would imply that Referee 2 has more of a bias to home teams than Referee 1, but this does not contain the entire information. Therefore, we will normalize this difference using the fourth mentioned column, which says the number of fouls called in total on average. To increase readability, we will have this number represented as a percentage.

In [127]:
df_br_2021_ref = df_br_2021_ref[['Unnamed: 0', 'Per Game.2', 'Home Minus Visitor.3', 'Per Game Relative.3']]

df_br_2021_ref.columns  =['Referee', 'Total Fouls Called', 'Home Minus Visitor Foul Calls', 'Difference between Ref and Avg']

df_br_2021_ref = df_br_2021_ref.astype({'Home Minus Visitor Foul Calls' : "float", 'Total Fouls Called' : "float", "Difference between Ref and Avg" : "float"})

df_br_2021_ref['Percent Normalized Home Minus Visitor'] = 100 * df_br_2021_ref['Home Minus Visitor Foul Calls'] / df_br_2021_ref['Total Fouls Called']

df_br_2021_ref.head()
Out[127]:
Referee Total Fouls Called Home Minus Visitor Foul Calls Difference between Ref and Avg Percent Normalized Home Minus Visitor
0 Ray Acosta 39.4 -0.2 1.5 -0.507614
1 Brandon Adair 37.7 -1.0 -1.7 -2.652520
2 Bennie Adams 39.0 0.1 -3.2 0.256410
3 Brent Barnaky 37.6 1.5 -0.4 3.989362
4 Curtis Blair 38.7 -0.5 -3.7 -1.291990

Now, with this data, we will first look to see if there are any referees that are outliers in reagrds to average number of foul calls per game and the amount of fouls called on home teams versus away teams.

In [128]:
import matplotlib.pyplot as plt

df_br_2021_ref.plot.scatter(x='Referee', y = 'Difference between Ref and Avg')

plt.title('Per Game Foul Call Difference With League Average')
plt.tick_params(
    axis='x',          
    which='both',      
    bottom=True,      
    top=False,         
    labelbottom=False) 
plt.xlabel('Referee')
plt.ylabel('Difference')

df_br_2021_ref.plot.scatter(x='Referee', y = 'Percent Normalized Home Minus Visitor')
plt.title('Percent Difference of Fals Called Against Home and Visitor')
plt.tick_params(
    axis='x',          
    which='both',      
    bottom=True,      
    top=False,         
    labelbottom=False) 
plt.xlabel('Referee')
plt.ylabel('Difference')
Out[128]:
Text(0, 0.5, 'Difference')

As we can see, it appears that on average, most referees do not have a significant difference in regards to the number of fouls they call in total, and the fouls they call between home and away teams. In the case for the fouls called between home and away team, the max percent difference is approximately $\pm \% 4$, which is not enough to claim any evidence of bias in this area for any referee.

However, we do see that there are some referees that tend to call significantly more or less total amount of foul calls per game. According to Basketball-Reference, the average number of fouls called per game in the 2021-22 NBA season was 40.7. So we will now look to find out which referees called 4 more foul calls on average, or 4 less foul calls on average.

In [131]:
print(df_br_2021_ref.loc[(df_br_2021_ref['Difference between Ref and Avg'] >= 4) | (df_br_2021_ref['Difference between Ref and Avg'] <= -4), 'Referee'])
28          David Guthrie
45             Matt Myers
49    Gediminas Petraitis
56             Evan Scott
58            Aaron Smith
59          Michael Smith
62          Dedric Taylor
Name: Referee, dtype: object

These referees call an average number of total foul calls that is at least $\% 10$ away from the league average. This is a significant amount. This means that these referees either tend to be very quick to call a foul, or they tend to be more conservative in their foul calling. But, as mentioned above, it does not seem to be the case that any of these referees exhibit any bias in favor or against the home team or away team.

Conclusion Regarding Existence of Bad Referees

Based upon our analysis, we cannot conclude that there exists bad referees. When looking at the number of net correct calls made, all the referees fall within a reasonable window. The same occurs when comparing the difference in foul calls made to the home team versus the away team.

We can say that there are some referees who either call significantly more fouls or significantly less fouls than the league average, but they do not appear to call or not call these fouls in a biased manner.

Do Home Teams Receive Beneficial Treatment?

We will now look to determine if there is any relationship between the number of foul calls and if the team is an away team or a home team. We will be looking at data from the 2016-17 NBA season to the 2021-22 season. The reason why we are not looking at data before the 2016 season is because the NBA went through the "3 Point Revolution" through that time, resulting in enormous changes to NBA offenses. As a result, through my own judgement, I believe that it is not helpful to use data from before the 2016 season to make any conclusions about the modern NBA.

We will first look to see what are the average number of fouls called on the home team and compare it to the amount called on the away team to determine if there is any difference in how they are referred. The data we will be using from this point on comes from TeamRank, and luckily the data is already in an extremely clean and neat format and is immediately ready to be used.

In [149]:
url_home_away_2016_22_pf = 'https://raw.githubusercontent.com/NVaduthala3/cmps-6160-project/main/teamrank_2016-22_PF_league_avg.csv'
df_br_2016_22_home_away_pf = pd.read_csv(url_home_away_2016_22_pf)
df_br_2016_22_home_away_pf.columns = ['Season', 'Home PF', 'Away PF']
df_br_2016_22_home_away_pf.set_index('Season', inplace=True)
df_br_2016_22_home_away_pf
Out[149]:
Home PF Away PF
Season
16-17 20.1 19.9
17-18 20.2 20.1
18-19 21.3 20.7
19-20 20.9 20.1
20-21 20.0 20.1
21-22 19.9 19.4

Let us see how this looks visually.

In [151]:
ax = df_br_2016_22_home_away_pf.plot.bar(rot=0, title='Average Number of Fouls Called on Home vs Away')
ax.set_ylabel('Foul Calls')
Out[151]:
Text(0, 0.5, 'Foul Calls')

As we can see, there is a negligible difference between the number of fouls called ont the home team versus the amount called on the away team. But, it should be noted that for all seasons listed here, home teams did receive more foul calls on average than away teams.

Now, we will look at the difference in foul calls that each team received as a home team versus as an away team. If the net difference is positive, then they received more calls on average as the home team.

In [153]:
url_home_away_pf_diff_2021 = 'https://raw.githubusercontent.com/NVaduthala3/cmps-6160-project/main/teamrank_21_team_PF_avg.csv'
df_br_2021_home_away_pf_diff = pd.read_csv(url_home_away_pf_diff_2021)

df_br_2021_home_away_pf_diff.columns = ['Team', 'PF as Home', 'PF as Away']
df_br_2021_home_away_pf_diff['Net Difference'] = df_br_2021_home_away_pf_diff['PF as Home'] - df_br_2021_home_away_pf_diff['PF as Away']

df_br_2021_home_away_pf_diff.head()
Out[153]:
Team PF as Home PF as Away Net Difference
0 Atlanta 20.8 19.3 1.5
1 Boston 19.5 21.7 -2.2
2 Brooklyn 18.3 21.4 -3.1
3 Charlotte 21.4 19.4 2.0
4 Chicago 20.8 19.9 0.9

Let us see how this looks graphically.

In [154]:
df_br_2021_home_away_pf_diff.plot.scatter(x='Team', y='Net Difference')
plt.title('Difference Between Avg Number of Fouls Committed as Home vs Away Team')
plt.tick_params(
    axis='x',          
    which='both',      
    bottom=True,      
    top=False,         
    labelbottom=False) 
plt.xlabel('Team')
plt.ylabel('Difference')
Out[154]:
Text(0, 0.5, 'Difference')

We can see that on average, the net difference in how teams are called is within the $\pm 1$ range. Some teams do receive less calls as the away team, but none of them have a difference of at least $\pm 4$. We will use this threshold of "$4$" later as well.

Conclusion Regarding Potential Advantage for Home Teams

Based upon our analysis, we cannot conlude that there is statistical evidence for home teams receiving beneficial treatment, at least based upon foul call count. it is true that throughout all the listed seasons above, home teams did receive more foul calls, but the difference in foul calls made is too small to argue that there is bias.

Are Less Fouls Called in the Playoffs?

Now, we will look to see if there is any significant difference between the number of fouls called on average in the regular season and the playoffs from the 2016-17 season to the 2021-22 sseason. We will again be using TeamRank as our source of data, and so luckily again we will not have to perform any cleaning up of our data.

In [156]:
url_2016_21_reg_pf = 'https://raw.githubusercontent.com/NVaduthala3/cmps-6160-project/main/teamrank_2016_21_reg_seas_league_PF_avg.csv'
url_2016_21_playoff_pf = 'https://raw.githubusercontent.com/NVaduthala3/cmps-6160-project/main/teamrank_2016_21_playoffs_league_PF_avg.csv'

df_2016_21_reg_pf = pd.read_csv(url_2016_21_reg_pf)
df_2016_21_playoff_pf = pd.read_csv(url_2016_21_playoff_pf)

df_2016_21_reg_pf.columns = ['Season', 'PF in Reg Season']
df_2016_21_playoff_pf.columns = ['Season', 'PF in Playoffs']

df_2016_21_reg_pf.set_index('Season', inplace=True)
df_2016_21_playoff_pf.set_index('Season', inplace=True)

df_2016_21_reg_playoff_pf = df_2016_21_reg_pf.join(df_2016_21_playoff_pf, how='outer')

df_2016_21_reg_playoff_pf
Out[156]:
PF in Reg Season PF in Playoffs
Season
16-17 40.1 41.8
17-18 40.4 42.1
18-19 41.4 43.9
19-20 41.3 42.0
20-21 40.2 40.9
21-22 39.7 41.6

Now let us see how this looks visually.

In [155]:
ax2 = df_2016_21_reg_playoff_pf.plot.bar(rot=0)
plt.title('Average Number of Fouls Called in Reg. Season vs Playoffs')
ax2.set_ylabel('Number of Fouls')
Out[155]:
Text(0, 0.5, 'Number of Fouls')

Conclusion regarding Number of Foul Calls in the Playoffs

As we can see, there seems to be a negligible difference in the average number of fouls calls made in the Playoffs compared to the regular season. It should be noted that for every season, the number of foul calls actually increased in the Playoffs compared to the Regular Season, and so in fact more fouls are called in the Playoffs than the Regular Season

Prediction Model

We will now attempt to create a predictive model that will predict the number of net correct foul calls that a team will receive in a game based upon if they are a home team or away team and if the game is in the Regular Season or the Playoffs. We will be using k-Nearest Neighbor in this model. The reason why we will use k-Nearest Neighbors is because it is fair to assume that if two teams play games that have similar values for the input variables mentioned above, then they will receive a similar amount of foul calls.

We will not be including any information about the number of star players or who will be a referee into this model, because as we have just seen above, there seems to be no relationship betwen these variables and the number of net correct foul calls.

We will be using information from the NBA's Last Two Minutes Dataset.

From what we can see throughout this entire analysis, there is close to no evidence for any sense of correlation between if a team is the home or away one and the number of fouls called, and if it a game is during the regular season or playoffs and the number of calls made. So our prediction model most likely will not be very precise or accurate.

In [239]:
url_l2m2 = 'https://raw.githubusercontent.com/atlhawksfanatic/L2M/master/1-tidy/L2M/L2M_stats_nba.csv'
df_l2m2 = pd.read_csv(url_l2m)


df_ncc = pd.DataFrame()
df_ncc['Game ID'] = ' '
df_ncc['Home Team'] = ' '
df_ncc['Away Team'] = ' '
df_ncc['Playoff or No'] = ' '
df_ncc['Net Correct Calls'] = ' '

df_ncc['Game ID'] = df_l2m2['GameId'].unique()
df_ncc['Home Team'] = (df_l2m2.drop_duplicates(subset='GameId'))['home_team'].reset_index(drop=True)
df_ncc['Away Team'] = (df_l2m2.drop_duplicates(subset='GameId'))['away_team'].reset_index(drop=True)
df_ncc['Playoff or No'] = (df_l2m2.drop_duplicates(subset='GameId'))['playoff'].reset_index(drop=True)

ncc = []
for call in df_l2m2['decision']:
  if (call == 'CC') | (call == 'CNC'):
    ncc.append(1)
  elif (call == 'IC') | (call == 'INC'):
    ncc.append(-1)
  else:
    ncc.append(0)

df_l2m2['Correct Call or No'] = ncc

df_l2m2.groupby('GameId')['Correct Call or No'].sum()
df_ncc.dropna(subset=['Game ID'], inplace=True)
df_ncc.sort_values(by=['Game ID'], inplace=True)
df_ncc.set_index('Game ID', drop=True, inplace=True)

df_ncc['Net Correct Calls'] = df_l2m2.groupby('GameId')['Correct Call or No'].sum()

df_ncc
/usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py:3326: DtypeWarning: Columns (7,23,24,26,30,31,37,40,51) have mixed types.Specify dtype option on import or set low_memory=False.
  exec(code_obj, self.user_global_ns, self.user_ns)
Out[239]:
Home Team Away Team Playoff or No Net Correct Calls
Game ID
12000007.0 Cavaliers Pacers False 13
12100023.0 76ers Raptors False 0
21800552.0 Kings Trail Blazers False 26
21800561.0 Suns 76ers False 27
21800564.0 Kings Nuggets False 19
... ... ... ... ...
52000131.0 Grizzlies Spurs True 20
52000211.0 Warriors Grizzlies True 28
52100121.0 Timberwolves Clippers True 23
52100201.0 Cavaliers Hawks True 26
52100211.0 Clippers Pelicans True 23

1628 rows × 4 columns

Now we implement kNN and test the errors.

In [253]:
from sklearn.preprocessing import StandardScaler
from sklearn.feature_extraction import DictVectorizer
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score


def est_test_error(df, features, pred_val, k, iter_count): # iter_count must be at least 2
  mae_list = []
  rmse_list = []
  for i in range(1, iter_count):
    error_pair = get_cv_val_error2(df, features, pred_val, k)
    mae_list.append(error_pair[0])
    rmse_list.append(error_pair[1])
  
  return (np.mean(mae_list), np.mean(rmse_list))

def get_cv_val_error2(df, features, predict_val, k): # cross-validation model
    # define train and val sets
    train = df.sample(frac = .5)
    val = df.drop(train.index)

    x_train_dict = train[features].to_dict(orient="records")
    x_val_dict = val[features].to_dict(orient = "records")
    y_train = train[predict_val]
    y_val = val[predict_val]
  
    vec = DictVectorizer(sparse=False)
    vec.fit(x_train_dict)
    x_train = vec.transform(x_train_dict)
    x_val = vec.transform(x_val_dict)

    # standardize 
    scaler = StandardScaler()
    scaler.fit(x_train)
    x_train_sc = scaler.transform(x_train)
    x_val_sc = scaler.transform(x_val)
    
    # Fit kNN model
    model = KNeighborsRegressor(n_neighbors=k)
    model.fit(x_train_sc, y_train)
    
    # Predict on val set
    y_val_pred = model.predict(x_val_sc)
    rmse1 = np.sqrt(((y_val - y_val_pred) ** 2).mean())
    mae1 = ((y_val - y_val_pred).abs()).mean()

    # Perform cross-validation 
    (mae2, rmse2) = get_val_error_mod(x_val_dict, y_val, x_train_dict, y_train)

    # Estimate test error
    rmse_list = [rmse1, rmse2]
    mae_list = [mae1, mae2]
    return (np.mean(mae_list), np.mean(rmse_list))

def get_val_error_mod(X_train_dict, y_train, X_val_dict, y_val): # called back when swapping for cross-val
    
    # convert categorical variables to dummy variables
    vec = DictVectorizer(sparse=False)
    vec.fit(X_train_dict)
    X_train = vec.transform(X_train_dict)
    X_val = vec.transform(X_val_dict)

    # standardize the data
    scaler = StandardScaler()
    scaler.fit(X_train)
    X_train_sc = scaler.transform(X_train)
    X_val_sc = scaler.transform(X_val)
    
    # Fit a 10-nearest neighbors model.
    model = KNeighborsRegressor(n_neighbors=10)
    model.fit(X_train_sc, y_train)
    
    # Make predictions on the validation set.
    y_val_pred = model.predict(X_val_sc)
    rmse = np.sqrt(((y_val - y_val_pred) ** 2).mean())
    mae = ((y_val - y_val_pred).abs()).mean()
    
    return (mae, rmse)

def get_kFoldCV_mae(df, features, predict_val, k_nn, k_cv):
  x_dict = df[features].to_dict(orient="records")
  y = df[predict_val]

  # specify the pipeline
  vec = DictVectorizer(sparse=False)
  scaler = StandardScaler()
  model = KNeighborsRegressor(n_neighbors=k_nn)
  pipeline = Pipeline([("vectorizer", vec), ("scaler", scaler), ("fit", model)])

  scores = cross_val_score(pipeline, x_dict, y, 
                         cv=k_cv, scoring="neg_mean_squared_error")
  mae = (np.abs(-scores)).mean()
  return mae
  
def get_kNN_mae(df, features, predict_val, k): # features is list of strings and predict_val is string
  x_train_dict = df[features].to_dict(orient="records")
  y_train = df[predict_val]

  vec = DictVectorizer(sparse=False)
  vec.fit(x_train_dict)
  x_train = vec.transform(x_train_dict)

  scaler = StandardScaler()
  scaler.fit(x_train)
  x_train_sc = scaler.transform(x_train)

  # Fit model
  model = KNeighborsRegressor(n_neighbors=k)
  model.fit(x_train_sc, y_train)

  # Find predictions
  y_train_pred = model.predict(x_train_sc)

  # Find mae
  mae = ((y_train - y_train_pred).abs()).mean()
  return mae

features = ['Home Team', 'Away Team', 'Playoff or No']
predict_val = 'Net Correct Calls'

est_test_error(df_ncc, features, predict_val, 100, 10)

k_val = np.arange(1, 50)
train = df_ncc.sample(frac=.5)
knn_train_mae_list = [get_kNN_mae(train, features, predict_val, k) for k in k_val]
knn_val_mae_list = [est_test_error(df_ncc, features, predict_val, k, 50)[0] for k in k_val] 

#kFold_train_mae_list = [get_kFoldCV_mae(train, features, predict_val, k, 5) for k in k_val]
#kFold_val_mae_list = [get_kFoldCV_mae(df_ncc, features, predict_val, k, 5) for k in k_val]

plt.plot(k_val, knn_val_mae_list, label = 'kNN val error')
plt.plot(k_val, knn_train_mae_list, label = 'kNN train error')
#plt.plot(k_val, kFold_train_mae_list, label = '5 Fold train error')
#plt.plot(k_val, kFold_val_mae_list, label = '5 Fold val error')

plt.xlabel('k Value')
plt.ylabel('MAE')
plt.legend()
Out[253]:
<matplotlib.legend.Legend at 0x7fadeffb80a0>

As we can see in our graph for varying $k$ values, our prediction was correct in the sense that our constructed model was not very good at prediction. There seems to be a relatively high error. The major flaw is that, as we have seen in our analysis so far, the variables we used in our prediction model do not seem to be correlated to any information about the number of foul calls.

Conclusion

Our analysis has shown that the commonly held beliefs listed above about NBA referees do not have enough statistical evidence to be believed. There seems to be no evidence of any relation between stardom and favorable treatment; there seems to be no evidence of existence of bad referees; there seems to be no evidence of a significant advantage given to home teams; and there seems to be no evidence of referees calling less fouls in the Playoffs compared to the Regular Season.