11-05-2012, 04:07 PM
Czech Your Math
I ran a linear regression covering '80 to '12 (32 seasons), using data that I already had, with the following variables:

Y = avg. adjusted scoring of top N players (N = # teams in NHL)

Xn = Number of teams in NHL

Xg = Avg. GPG in NHL

Xe = % of top N forwards who were born outside Canada (Canadian-trained players born in Europe, such as Heatley & Nolan, were considered Canadian)

Xp = % of total goals recorded as special teams (PP & SH) goals
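
For anyone who wants to try a regression like this themselves, here is a minimal sketch in Python with NumPy. This is not necessarily the tool or exact setup used for the numbers in this post. The data below are random placeholders (not real NHL figures), the "true" coefficients are made up, and the function name fit_least_squares is only illustrative. Whether to include a constant term is left as an option, since none is stated here.

```python
import numpy as np

def fit_least_squares(X, y, intercept=False):
    """Ordinary least squares via numpy.linalg.lstsq.

    X: (seasons, predictors) array; y: (seasons,) array.
    Returns (coefficients, r_squared). With intercept=True the first
    coefficient is the constant term.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if intercept:
        X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ss_res = float(resid @ resid)
    # With an intercept, R^2 is measured against the mean of y;
    # without one, the usual convention measures it against zero.
    baseline = y - y.mean() if intercept else y
    r2 = 1.0 - ss_res / float(baseline @ baseline)
    return coef, r2

# Placeholder data: 32 made-up "seasons", NOT real NHL numbers.
rng = np.random.default_rng(0)
seasons = 32
Xn = rng.integers(21, 31, size=seasons).astype(float)  # number of teams
Xg = rng.uniform(5.0, 8.0, size=seasons)               # league GPG
Xe = rng.uniform(0.1, 0.5, size=seasons)               # share as a fraction
Xp = rng.uniform(0.2, 0.35, size=seasons)              # share as a fraction
X4 = np.column_stack([Xn, Xg, Xe, Xp])

true_coef = np.array([1.0, 7.0, 20.0, 50.0])  # made up for the demo
y = X4 @ true_coef                            # noise-free, so the fit is exact

coef4, r2_4 = fit_least_squares(X4, y)         # all 4 variables
coef3, r2_3 = fit_least_squares(X4[:, 1:], y)  # Xn excluded
```

Dropping a column, as in the last line, is all it takes to go from the 4-variable model to the 3-variable one.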

Using all 4 variables, the R-squared was 99.8% and the coefficients for each X were as follows:

Xn= 1.05
Xg= 6.77
Xe= 16.4
Xp= 49.4

Using 3 variables (Xn excluded), the R-squared was 99.7% and the coefficients for each X were as follows:

Xg= 7.83
Xe= 39.5
Xp= 92.8
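
To show how coefficients like these turn into a predicted Y, here is a small sketch. Two things are assumptions on my part and not stated above: that there is no constant term (none is listed), and that Xe and Xp enter as fractions (0.40 for 40%). The input values are hypothetical and do not describe any real season.

```python
# Coefficients as listed above.
COEF_4VAR = {"Xn": 1.05, "Xg": 6.77, "Xe": 16.4, "Xp": 49.4}
COEF_3VAR = {"Xg": 7.83, "Xe": 39.5, "Xp": 92.8}

def predict(coefs, inputs, intercept=0.0):
    """Linear prediction: intercept + sum of coefficient * value.

    intercept defaults to 0.0, which is an ASSUMPTION (no constant
    term is reported). Only variables present in coefs are used.
    """
    return intercept + sum(c * inputs[name] for name, c in coefs.items())

# Hypothetical inputs, NOT a real season; shares given as fractions.
example = {"Xn": 30, "Xg": 5.5, "Xe": 0.40, "Xp": 0.25}

y4 = predict(COEF_4VAR, example)  # 4-variable model
y3 = predict(COEF_3VAR, example)  # 3-variable model (ignores Xn)
```

If the fit did include a constant, it can be passed via the intercept argument.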

Both appear to be very solid models for predicting the avg. adjusted scoring of the top N players each season. The actual average for the 32 seasons was 88.95 adj. points with a standard deviation of 3.59. With 4 variables, the predicted Y had a mean of 88.87 and a mean absolute error of 3.13, and 21/32 seasons had errors smaller than 1 standard deviation. With 3 variables, the predicted Y had a mean of 88.71 and a mean absolute error of 3.86, and 18/32 seasons had errors smaller than 1 standard deviation.
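
The summary figures above (mean of actual and predicted Y, mean absolute error, and the count of seasons within 1 standard deviation) can be computed along these lines. This is only a sketch: the four values are toy numbers, not the real season data, and using the population standard deviation of the actual values as the yardstick is an assumption.

```python
import statistics

def error_summary(actual, predicted):
    """Summarize prediction errors against the spread of the actual values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    abs_errors = [abs(p - a) for a, p in zip(actual, predicted)]
    # ASSUMPTION: population SD of the actual values is the benchmark.
    sd = statistics.pstdev(actual)
    return {
        "mean_actual": statistics.fmean(actual),
        "mean_predicted": statistics.fmean(predicted),
        "mean_abs_error": statistics.fmean(abs_errors),
        "stdev_actual": sd,
        "within_one_sd": sum(1 for e in abs_errors if e < sd),
        "n": len(actual),
    }

# Toy values for illustration only.
actual = [85.0, 90.0, 95.0, 90.0]
predicted = [86.0, 89.0, 99.0, 90.0]
summary = error_summary(actual, predicted)
```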

It's important to note that in both models there was a positive coefficient for Xg (league GPG), meaning that as league scoring decreased, the model predicted avg. adj. points of the top N players to decrease as well (by ~7-8 adj. points per 1.0-goal drop in league GPG, with the other variables held constant).
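
The ~7-8 point figure is just the Xg coefficient times the change in league GPG, with everything else held fixed. As a quick sketch (the helper name is only illustrative):

```python
# Xg coefficients from the two models above.
COEF_XG = {"4-variable": 6.77, "3-variable": 7.83}

def change_in_prediction(coef, delta_x):
    """Change in predicted Y when one predictor moves by delta_x,
    with all other predictors held constant."""
    return coef * delta_x

drop = -1.0  # league GPG falls by one goal per game
effects = {model: change_in_prediction(c, drop) for model, c in COEF_XG.items()}
```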

I did this rather quickly, since I was using data readily available to me. One small flaw is that Xe measures the % of non-Canadians among the top N forwards in points, rather than among the top N players. It would be better if this variable and Y were aligned completely, but given the relatively few defensemen who appear in the top N in scoring, I highly doubt there would be a major effect on the results. If anything, properly aligning Y & Xe may only strengthen the relationship between them, since in more recent years, when avg. adj. scoring of top N players has increased, there have been substantially more Euro/US d-men (Lidstrom, Leetch, Zubov, Gonchar, etc.). Still, the actual % of d-men in the top N is quite small, so I expect any distortions were relatively minor.
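
To make the Xe alignment issue concrete, here is a sketch of computing the non-Canadian share both ways: among the top N forwards, and among the top N players at any position. The Skater record and the five entries are hypothetical and stand in for a real scoring table.

```python
from dataclasses import dataclass

@dataclass
class Skater:
    name: str
    points: float
    position: str   # "F" for forward, "D" for defenseman
    canadian: bool  # True if counted as Canadian (born or trained)

def non_canadian_share(skaters, n, forwards_only):
    """Fraction of non-Canadians among the top n scorers.

    forwards_only=True restricts the pool to forwards before ranking;
    forwards_only=False ranks all skaters together.
    """
    pool = [s for s in skaters if s.position == "F"] if forwards_only else list(skaters)
    top = sorted(pool, key=lambda s: s.points, reverse=True)[:n]
    if not top:
        raise ValueError("no skaters to rank")
    return sum(1 for s in top if not s.canadian) / len(top)

# Hypothetical scoring table, for illustration only.
skaters = [
    Skater("F1", 100, "F", True),
    Skater("F2", 95, "F", False),
    Skater("D1", 90, "D", False),
    Skater("F3", 85, "F", True),
    Skater("F4", 80, "F", True),
]

share_forwards = non_canadian_share(skaters, 4, forwards_only=True)
share_all = non_canadian_share(skaters, 4, forwards_only=False)
```

In this toy table a single high-scoring non-Canadian d-man changes the share, which is the kind of shift a fully aligned Xe would pick up.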

For those who understand this type of study, I certainly welcome comments, suggestions and even follow-up studies which may expand, improve or verify the results. This is what I meant by identifying, analyzing and quantifying various factors that may affect how difficult it is for top-level players to score adjusted points in various seasons. It can be done, and I have taken a step in that direction. I look forward to others taking further steps forward, instead of steps backward using improper analysis and/or pure speculation.

Last edited by Czech Your Math: 11-06-2012 at 12:34 PM. Reason: spelling