How do professional basketball players ever miss free throws? NBA players sign multimillion-dollar deals every single year, and there are still dozens of players who shoot an underwhelming percentage from the free throw line. How in the world does this happen? Why don’t they just practice their free throws? Well, their inability to shoot free throws might be caused by something outside of practice. Today, I pose the question: is there a statistically significant relationship between an NBA player’s height and his free throw percentage? Let’s take a look.

Data

I found a dataset on Kaggle that contains aggregate individual stats for 67 NBA seasons, scraped from Basketball Reference. It includes basic box-score attributes such as points, assists, rebounds, and steals, as well as each player’s height, weight, position, and other details like the college he attended. The data is split across three CSV files, so it requires some filtering and merging of dataframes. One limitation of the dataset is that it contains some extreme outliers that I have to be careful with in my analysis: players with a career free throw percentage of exactly 0% or 100%, simply because they attempted very few free throws over the course of their careers.

Methodology

For this analysis I am going to use a simple linear regression, because we are testing for a statistically significant relationship between two quantitative variables. Our null hypothesis is that there is no linear relationship between a player’s height and his free throw percentage. Our alternative hypothesis is that there is a statistically significant relationship between a player’s height and his free throw percentage.
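To make the hypothesis test concrete, here is a minimal sketch of how the slope of a regression line can be tested against zero. It uses `scipy.stats.linregress`, which may or may not be the tool used for the analysis in this post, and the heights and percentages below are made-up toy values, not rows from the Kaggle dataset.

```python
from scipy import stats

# Hypothetical toy data (NOT from the dataset): heights in cm, career FT% as a fraction.
heights = [185, 190, 196, 201, 206, 211, 216]
ft_pct = [0.86, 0.82, 0.80, 0.77, 0.74, 0.69, 0.62]

# Least squares fit of ft_pct on height. The reported p-value is for a
# two-sided test of the null hypothesis that the slope is zero.
result = stats.linregress(heights, ft_pct)

alpha = 0.05  # significance level
reject_null = result.pvalue < alpha

print(f"slope={result.slope:.4f}, r={result.rvalue:.3f}, p={result.pvalue:.4g}")
print("reject H0" if reject_null else "fail to reject H0")
```

A negative slope with a p-value below the chosen significance level is what would lead us to reject the null hypothesis of no relationship.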

The first bit of data cleaning came when I noticed that if a player was traded during a season, there were rows for each team he played on that season as well as a row for his total stats for that season. However, if a player was not traded, he did not have this total stats row. I elected to drop the totals rows for traded players. After that, I filtered the Season_Stats dataframe for years from 1976 onwards, because that is the year the ABA merged with the NBA. It’s common practice in the NBA today to start analytics from 1976.

I further filtered the dataframe to include only the columns I was looking at, like free throws attempted and free throws made. I filtered this down because I knew I had to merge the season data with another dataframe that had the players’ heights. Before merging, I grouped by player and summed his free throws attempted and free throws made over his entire career so that I could calculate his career free throw percentage. Once I had the career free throw percentage, I checked for and dropped any null values, which came from players who did not attempt any free throws over the course of their careers.

The last step was to merge the dataframes so that I had each player’s height and career free throw percentage in the same dataframe for my analysis. I then measured the relationship between these two variables by fitting a least squares regression line to the data.
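The cleaning steps described above can be sketched in pandas roughly as follows. This is an illustration on a tiny made-up table, not the actual code behind this post: the column names (`Player`, `Year`, `Tm`, `FT`, `FTA`, `height`) and the `"TOT"` label for a traded player’s totals row are assumptions about how the files are laid out.

```python
import pandas as pd

# Toy stand-in for the season-level file. Column names and the "TOT" team
# label marking a traded player's totals row are assumptions.
season_stats = pd.DataFrame({
    "Player": ["A", "A", "A", "B", "B", "C", "D"],
    "Year":   [1980, 1980, 1980, 1975, 1981, 1990, 1991],
    "Tm":     ["TOT", "BOS", "LAL", "NYK", "NYK", "CHI", "UTA"],
    "FT":     [30, 10, 20, 50, 40, 0, 9],
    "FTA":    [40, 15, 25, 60, 50, 0, 10],
})
# Toy stand-in for the player-level file with heights (in cm here).
players = pd.DataFrame({
    "Player": ["A", "B", "C", "D"],
    "height": [198, 211, 185, 190],
})

# 1. Drop totals rows so traded players are not double counted.
season_stats = season_stats[season_stats["Tm"] != "TOT"]
# 2. Keep seasons from the ABA-NBA merger onwards.
season_stats = season_stats[season_stats["Year"] >= 1976]
# 3. Keep only the columns needed for free throw percentage.
season_stats = season_stats[["Player", "FT", "FTA"]]

# 4. Sum makes and attempts per player, then compute career percentage.
career = season_stats.groupby("Player", as_index=False)[["FT", "FTA"]].sum()
career["FT_pct"] = career["FT"] / career["FTA"]  # 0 attempts gives NaN
# 5. Drop players with no attempts (null percentage).
career = career.dropna(subset=["FT_pct"])

# 6. Attach heights so both variables live in one dataframe.
merged = career.merge(players, on="Player", how="inner")
print(merged)
```

Summing makes and attempts before dividing gives a true career percentage, rather than an average of season percentages that would weight a 10-attempt season the same as a 500-attempt one.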

Results

The results of my analysis showed that height was negatively correlated with free throw percentage, with a correlation coefficient of -0.3789. The p-value was reported as 0.000 (that is, it rounds to zero at three decimal places), which is less than a significance level of either 0.05 or 0.01. Thus, we can reject the null hypothesis in favor of the alternative hypothesis that there is a statistically significant relationship between an NBA player’s height and his free throw percentage. However, my R^2 value was only 0.093, meaning that only 9.3% of the variability in free throw percentage is explained by height. In an effort to see whether other factors might improve the model’s predictive ability, I fit the least squares regression again, this time including the player’s position. My adjusted R^2 value increased to 0.134. Thus, even after taking a player’s height into account, his position appears to be associated with free throw percentage.
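For readers who want to see how a comparison like this works, here is a rough sketch that fits a height-only model and a height-plus-position model and computes R^2 and adjusted R^2 for each. The data is made up, the position labels (`G`, `F`, `C`) are simplified placeholders, and `fit_r2` is a hypothetical helper written for this illustration, so the numbers it prints have nothing to do with the results above.

```python
import numpy as np
import pandas as pd

def fit_r2(X, y):
    """Fit OLS with an intercept; return (R^2, adjusted R^2)."""
    n, p = X.shape  # n observations, p predictors (intercept not counted)
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R^2 penalizes each extra predictor.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adj_r2

# Hypothetical toy data (NOT from the dataset).
df = pd.DataFrame({
    "height": [185, 188, 191, 196, 198, 201, 203, 206, 208, 211, 213, 216],
    "pos":    ["G", "G", "G", "G", "F", "F", "F", "F", "C", "C", "C", "C"],
    "ft_pct": [0.85, 0.80, 0.88, 0.79, 0.78, 0.74,
               0.80, 0.72, 0.70, 0.62, 0.68, 0.58],
})

y = df["ft_pct"].to_numpy(dtype=float)
X_height = df[["height"]].to_numpy(dtype=float)
# One-hot encode position, dropping one level to avoid collinearity
# with the intercept.
pos_dummies = pd.get_dummies(df["pos"], drop_first=True).to_numpy(dtype=float)
X_both = np.column_stack([X_height, pos_dummies])

r2_height, adj_height = fit_r2(X_height, y)
r2_both, adj_both = fit_r2(X_both, y)
print(f"height only:       R^2={r2_height:.3f}, adjusted R^2={adj_height:.3f}")
print(f"height + position: R^2={r2_both:.3f}, adjusted R^2={adj_both:.3f}")
```

Plain R^2 can never go down when a predictor is added, which is why adjusted R^2 is the fairer number for comparing the two models.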

Conclusion

Based on this analysis, I have found my answer: there is a statistically significant relationship between an NBA player’s height and his free throw percentage, and more of the variance can be explained once the player’s position is included alongside his height. Not surprisingly, shorter players who tend to shoot more, like point guards and shooting guards, tended to have higher free throw percentages, and taller players who tend to play their game in the paint, like power forwards and centers, had lower free throw percentages. But still, only 13.4% of the variability in free throw percentage is explained by a player’s height and position together. That is a lot of variability left unexplained.

Is it as simple as the shorter players, who play the positions that require more shooting, just practicing more? They’re more used to shooting than dunking, whereas the taller guys don’t need to practice shooting as much? How many more games would your team win if everyone was making those free throws, those free points! Think about all the games that are won or lost by one point. Finding the answer to this could be the difference between a championship parade and a long airplane ride home.