Using Regression to Adjust "Adjusted Points" for Top Tier Players '68-12
11-09-2012, 01:17 PM
Join Date: Apr 2007
I'll try my best to get this back on track.
Originally Posted by Czech Your Math
I ran a linear regression for '80 to '12 using data that I already had, as follows:
Y = avg. adjusted scoring of top N players (N = # teams in NHL)
Xn = Number of teams in NHL
Xg = Avg. GPG in NHL
Xe = % of top N forwards who were born outside Canada (Canadian-trained players from Europe, such as Heatley & Nolan, were considered Canadian)
Xp = % of total goals recorded as special teams (PP & SH) goals
Using all 4 variables, the R-squared was 99.8% and the values for each X were as follows:
Using 3 variables (Xn excluded), the R-squared was 99.7% and the values of each X were as follows:
Both appear to be very solid models for predicting the avg. adjusted scoring of the top N players each season. The average for the 32 seasons was 88.95 adj. points with a standard deviation of 3.59. With 4 variables, the predicted Y had a mean of 88.87 with the avg. absolute value of the error being 3.13, and 21/32 seasons had errors of < 1 stdev. With 3 variables, the predicted Y had a mean of 88.71 with the avg. absolute value of the error being 3.86, and 18/32 seasons had errors of < 1 stdev.
It's important to note that in both models there was a positive coefficient for Xg (league GPG), meaning that as league scoring decreased, the model predicted avg. adj. points of the top N players to decrease as well (by ~7-8 adj. points per 1.0 goal drop in league GPG).
For those who understand this type of study, I certainly welcome comments, suggestions and even follow-up studies which may expand, improve or verify the results. This is what I meant by identifying, analyzing and quantifying various factors that may affect the difficulty of top level players to score adjusted points in various seasons. It can be done, and I have taken a step in that direction. I look forward to others taking further steps forward, instead of steps backward using improper analysis and/or pure speculation.
You report coefficients on each regressor, but it's hard to really make sense of the results without the t-statistics. Given the negligible drop in R-squared when you drop Xn, I would assume that the 1.05 coefficient on Xn is statistically insignificant, but I'd like to see the others.
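For anyone who wants to check this on their own data, here is a rough sketch of how the t-statistics could be computed. It uses made-up data (not the actual series from the quoted post), and the function name ols_t_stats and the variables x_relevant and x_noise are just placeholders I picked for illustration.

```python
import numpy as np

def ols_t_stats(X, y):
    """OLS with an intercept: return coefficients, standard errors, t-statistics."""
    X = np.column_stack([np.ones(len(y)), X])  # prepend the intercept column
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)           # unbiased residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # classical OLS covariance matrix
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se

# Synthetic data only (assumption): 32 "seasons", one regressor that matters
# and one that is pure noise.
rng = np.random.default_rng(0)
n = 32
x_relevant = rng.normal(size=n)
x_noise = rng.normal(size=n)
y = 89.0 + 3.0 * x_relevant + rng.normal(scale=1.0, size=n)

beta, se, t = ols_t_stats(np.column_stack([x_relevant, x_noise]), y)
for name, b, s, tv in zip(["const", "x_relevant", "x_noise"], beta, se, t):
    print(f"{name:>10}: coef={b:8.3f}  se={s:6.3f}  t={tv:7.2f}")
```

As a rule of thumb, |t| below about 2 means the coefficient can't be distinguished from zero at the usual 5% level, which is the check I'd want to see for each X.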
I don't think you can make a judgement on how solid those models are for predicting anything based on R-squared, as that Y series is probably fairly stable. If you regressed Y on a constant you'd get a pretty high R-squared too.
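To put a number on that, here is a quick sketch using only the mean (88.95) and stdev (3.59) reported in the quoted post. The series itself is synthetic, and I'm assuming the high figure comes from an uncentered R-squared (the kind some software reports when there is no intercept); the post doesn't say how the R-squared was computed, so treat this as one possible reading.

```python
import numpy as np

# Synthetic stand-in for Y (assumption): 32 values rescaled to have exactly
# the mean and stdev reported in the quoted post.
rng = np.random.default_rng(1)
z = rng.normal(size=32)
z = (z - z.mean()) / z.std()          # mean 0, population stdev 1
y = 88.95 + 3.59 * z

# "Regressing Y on a constant": the fitted value is just the mean of Y.
fitted = np.full_like(y, y.mean())
ss_res = np.sum((y - fitted) ** 2)

# Centered R-squared compares against the mean, so a constant explains nothing.
r2_centered = 1 - ss_res / np.sum((y - y.mean()) ** 2)

# Uncentered R-squared compares against zero, so a stable series far from
# zero looks almost perfectly "explained".
r2_uncentered = 1 - ss_res / np.sum(y ** 2)

print(f"centered R-squared:   {r2_centered:.4f}")
print(f"uncentered R-squared: {r2_uncentered:.4f}")  # roughly 0.998
```

So a constant alone gets to about the same R-squared level under that definition, which is why I wouldn't read much into it.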
My main takeaway:
Y is adjusted to 6 GPG (HR method), right? If your regression used all the players in the league, by definition you'd get a coefficient of 0 on Xg, because that's what the adjustment does. Instead you have a positive coefficient on Xg for the top 5% of players, which means the top guys are further away from the mean in high-scoring seasons than in low-scoring seasons. In other words, the adjustment may not bring down the top guys enough in high-scoring seasons.
Am I correct?
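Here is a toy version of the argument above, in case it helps. Everything in it is hypothetical: the talent levels in base, the widening factor k, the list of league GPG values, and the adjusted_points function, which is a simplified stand-in that only rescales by league GPG (the real HR method has more terms). It builds in the assumption that the spread between players widens in high-scoring seasons and then shows what the adjustment does with that.

```python
import numpy as np

def adjusted_points(raw, league_gpg, target_gpg=6.0):
    """Simplified adjustment (assumption): rescale raw points to a 6 GPG league."""
    return raw * target_gpg / league_gpg

# Hypothetical league of 200 players with evenly spaced scoring levels.
base = np.linspace(10, 100, 200)
mean_base = base.mean()
k = 0.05  # hypothetical: spread around the mean widens 5% per extra league goal

gpgs = np.array([5.2, 5.6, 6.0, 6.8, 7.4, 8.0])
league_avg, top_avg = [], []
for g in gpgs:
    # Raw points scale with league GPG, and deviations from the league mean
    # are stretched in high-scoring seasons and compressed in low-scoring ones.
    raw = (g / 6.0) * (base + k * (g - 6.0) * (base - mean_base))
    adj = adjusted_points(raw, g)
    league_avg.append(adj.mean())
    top_avg.append(np.sort(adj)[-10:].mean())  # top 10 players (top 5%)

# Slope of average adjusted points on league GPG, for everyone vs. the top 5%.
slope_league = np.polyfit(gpgs, league_avg, 1)[0]
slope_top = np.polyfit(gpgs, top_avg, 1)[0]
print(f"slope, all players: {slope_league:.4f}")
print(f"slope, top 5%:      {slope_top:.4f}")
```

The league-wide slope comes out at zero by construction, while the top group keeps a positive slope, which is the pattern a positive coefficient on Xg would be picking up.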