I ran a linear regression for '80 to '12 using data that I already had, as follows:
Y = avg. adjusted scoring of top N players (N = # teams in NHL)
Xn = Number of teams in NHL
Xg = Avg. GPG in NHL
Xe = % of top N forwards who were born outside Canada (Canadian-trained players from Europe, such as Heatley & Nolan, were considered Canadian)
Xp = % of total goals recorded as special teams (PP & SH) goals
Using all 4 variables, the R-squared was 99.8% and the coefficients for each X were as follows:
Xn= 1.05
Xg= 6.77
Xe= 16.4
Xp= 49.4
Using 3 variables (Xn excluded), the R-squared was 99.7% and the coefficients for each X were as follows:
Xg= 7.83
Xe= 39.5
Xp= 92.8
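For anyone who wants to repeat this kind of fit, here is a minimal sketch in Python using NumPy's least-squares solver. The season data below is randomly generated placeholder data, not the actual '80 to '12 numbers, and the function name fit_linear_model is made up for illustration. The post does not say whether the fit included an intercept, so that is left as a switch; treat this as one way to set it up, not the exact method used above.

```python
import numpy as np

def fit_linear_model(X, y, intercept=True):
    """Ordinary least squares fit; returns (coefficients, r_squared).

    If intercept is True, the first coefficient is the constant term.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if intercept:
        X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    pred = X @ coef
    ss_res = np.sum((y - pred) ** 2)      # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)  # total variation around the mean
    return coef, 1.0 - ss_res / ss_tot

# PLACEHOLDER data for 32 seasons (synthetic, for illustration only).
rng = np.random.default_rng(0)
n_seasons = 32
xn = rng.integers(21, 31, size=n_seasons).astype(float)  # teams in league
xg = rng.uniform(5.0, 8.0, size=n_seasons)               # league GPG
xe = rng.uniform(0.1, 0.5, size=n_seasons)               # share of non-Canadian forwards
xp = rng.uniform(0.2, 0.35, size=n_seasons)              # share of special teams goals
# Made-up relationship plus noise, just so there is something to fit.
y = 20 + 1.0 * xn + 5.0 * xg + 15.0 * xe + 40.0 * xp + rng.normal(0, 2, n_seasons)

X4 = np.column_stack([xn, xg, xe, xp])
X3 = X4[:, 1:]  # same model with Xn excluded

coef4, r2_4 = fit_linear_model(X4, y)
coef3, r2_3 = fit_linear_model(X3, y)
print("4-variable coefficients:", coef4, "R-squared:", r2_4)
print("3-variable coefficients:", coef3, "R-squared:", r2_3)
```

Swapping in the real per-season values for xn, xg, xe, xp and y would give coefficients that can be compared against the ones listed above.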
Both appear to be very solid models for predicting the avg. adjusted scoring of the top N players each season. The average for the 32 seasons was 88.95 adj. points with a standard deviation of 3.59. With 4 variables, the predicted Y had a mean of 88.87 with the avg. absolute value of the error being 3.13, and 21/32 seasons had errors of < 1 stdev. With 3 variables, the predicted Y had a mean of 88.71 with the avg. absolute value of the error being 3.86, and 18/32 seasons had errors of < 1 stdev.
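The error checks described above (average absolute error and the number of seasons within 1 stdev) can be sketched as follows. The four actual/predicted values are placeholders, not real seasons, and the helper name summarize_errors is hypothetical. It also assumes the sample standard deviation (n - 1 in the denominator), since the post does not say which version was used.

```python
import numpy as np

def summarize_errors(actual, predicted):
    """Compare predicted vs. actual Y, season by season."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    abs_err = np.abs(actual - predicted)
    sd = actual.std(ddof=1)  # ASSUMPTION: sample standard deviation of actual Y
    return {
        "mean_actual": actual.mean(),
        "mean_predicted": predicted.mean(),
        "mean_abs_error": abs_err.mean(),
        "within_1_sd": int(np.sum(abs_err < sd)),  # seasons with error < 1 stdev
        "n": len(actual),
    }

# PLACEHOLDER values, for illustration only.
actual = [88.0, 92.5, 85.0, 90.0]
predicted = [87.0, 95.0, 85.5, 80.0]
summary = summarize_errors(actual, predicted)
print(summary)
```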
It's important to note that in both models there was a positive coefficient for Xg (league GPG), meaning that as league scoring decreased, the model predicted avg. adj. points of the top N players to decrease as well (by ~7-8 points per 1.0 goal drop in league GPG, holding the other variables constant).
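As a quick check of what the Xg coefficient implies, the arithmetic can be written out using the two Xg values reported above (6.77 and 7.83). The function name predicted_change is just for illustration, and the calculation assumes the other variables are held fixed.

```python
# Xg coefficients as reported in the post for each model.
XG_COEF = {"4-variable": 6.77, "3-variable": 7.83}

def predicted_change(coef, delta_gpg):
    """Predicted change in avg. adj. points for a given change in league GPG."""
    return coef * delta_gpg

# Effect of a 1.0 goal drop in league GPG under each model.
changes = {name: predicted_change(c, -1.0) for name, c in XG_COEF.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.2f} adj. points")
```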
I did this rather quickly, since I was using data readily available to me. One small flaw is that Xe measures the % of non-Canadian forwards among the top N forwards in points, rather than the % of non-Canadians among the top N players. It would be better if this variable and Y were aligned completely, but given how few defensemen appear in the top N in scoring, I highly doubt there would be a major effect on the results. If anything, properly aligning Y & Xe may only strengthen the relationship between them, since in more recent years, when avg. adj. scoring of top N players has increased, there have been substantially more Euro/US d-men (Lidstrom, Leetch, Zubov, Gonchar, etc.). Still, defensemen make up a small % of the top N, so I expect any distortions were relatively minor.
For those who understand this type of study, I certainly welcome comments, suggestions and even follow-up studies which may expand, improve or verify the results. This is what I meant by identifying, analyzing and quantifying various factors that may affect how difficult it is for top-level players to score adjusted points in various seasons. It can be done, and I have taken a step in that direction. I look forward to others taking further steps forward, instead of steps backward using improper analysis and/or pure speculation.
Last edited by Czech Your Math: 11-06-2012 at 01:34 PM.
Reason: spelling
