I'll try my best to get this back on track.
Quote:
Originally Posted by Czech Your Math
I ran a linear regression for '80 to '12 using data that I already had, as follows:
Y = avg. adjusted scoring of top N players (N = # teams in NHL)
Xn = Number of teams in NHL
Xg = Avg. GPG in NHL
Xe = % of top N forwards who were born outside Canada (Canadian trained players from Europe, such as Heatley & Nolan were considered Canadian)
Xp = % of total goals recorded as special teams (PP & SH) goals
Using all 4 variables, the R-squared was 99.8% and the values for each X were as follows:
Xn= 1.05
Xg= 6.77
Xe= 16.4
Xp= 49.4
Using 3 variables (Xn excluded), the R-squared was 99.7% and the values of each X were as follows:
Xg= 7.83
Xe= 39.5
Xp= 92.8
Both appear to be very solid models for predicting the avg. adjusted scoring of the top N players each season. The average for the 32 seasons was 88.95 adj. points with a standard deviation of 3.59. With 4 variables, the predicted Y had a mean of 88.87 with the avg. absolute value of the error being 3.13, and 21/32 seasons had errors of < 1 stdev. With 3 variables, the predicted Y had a mean of 88.71 with the avg. absolute value of the error being 3.86, and 18/32 seasons had errors of < 1 stdev.
It's important to note that in both models there was a positive coefficient for Xg (league GPG), meaning that as league scoring decreased, the model predicted avg. adj. points of the top N players to decrease as well (by ~7-8 points per 1.0 point drop in league scoring).
(...)
For those who understand this type of study, I certainly welcome comments, suggestions and even followup studies which may expand, improve or verify the results. This is what I meant by identifying, analyzing and quantifying various factors that may affect the difficulty of top level players to score adjusted points in various seasons. It can be done, and I have taken a step in that direction. I look forward to others taking further steps forward, instead of steps backward using improper analysis and/or pure speculation.

You report coefficients on each regressor, but it's hard to really make sense of the results without the t-statistics. Given the insignificant drop in R-squared when you drop Xn, I would assume that 1.05 coefficient is insignificant, but I'd like to see the others.
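For anyone who wants to produce those t-statistics, here is a minimal sketch of how they could be computed with NumPy. The data below is synthetic stand-in data, not the real series, and the function name `ols_t_stats` is just made up for illustration. I'm also assuming no intercept, since the quoted post lists only four coefficients; add a column of ones to X if the original fit had one.

```python
import numpy as np

def ols_t_stats(X, y):
    """Return (coefficients, standard errors, t-statistics) for y = X b + e."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares coefficients
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)               # residual variance, n - k degrees of freedom
    cov = sigma2 * np.linalg.inv(X.T @ X)          # classical (homoskedastic) covariance
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se

# Synthetic stand-in data (NOT the real series): 32 seasons, 4 regressors.
rng = np.random.default_rng(0)
n = 32
Xn = rng.integers(21, 31, size=n).astype(float)   # number of teams
Xg = rng.uniform(5.0, 8.0, size=n)                # league GPG
Xe = rng.uniform(0.05, 0.45, size=n)              # share of non-Canadian top forwards
Xp = rng.uniform(0.20, 0.35, size=n)              # share of special-teams goals
X = np.column_stack([Xn, Xg, Xe, Xp])
y = 1.05 * Xn + 6.77 * Xg + 16.4 * Xe + 49.4 * Xp + rng.normal(0, 3.0, size=n)

beta, se, t = ols_t_stats(X, y)
for name, b, s, tv in zip(["Xn", "Xg", "Xe", "Xp"], beta, se, t):
    print(f"{name}: coef={b:8.3f}  se={s:7.3f}  t={tv:7.2f}")
```

As a rough rule of thumb, an absolute t-statistic below about 2 means the coefficient is not distinguishable from zero at the usual 5% level.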
I don't think you can make a judgement on how solid those models are for predicting anything based on R-squared, as that Y series is probably fairly stable. If you regressed Y on just a constant you'd get a pretty high R-squared too, at least with the uncentered R-squared that gets reported for fits without an intercept.
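To make the R-squared point concrete, here is a small sketch comparing the usual (centered) R-squared with the uncentered version for a regression of Y on nothing but a constant. The series is synthetic, drawn to have roughly the quoted mean (88.95) and standard deviation (3.59). Whether the quoted fit actually reported an uncentered R-squared is my assumption; the post doesn't say.

```python
import numpy as np

def r_squared(y, y_hat, centered=True):
    """R-squared = 1 - SS_res / SS_tot, with SS_tot taken about the mean (centered) or about zero."""
    y = np.asarray(y, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2) if centered else np.sum(y ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in for Y (NOT the real series): 32 seasons around 88.95 +/- 3.59.
rng = np.random.default_rng(1)
y = rng.normal(88.95, 3.59, size=32)

# "Regressing Y on a constant" just predicts the sample mean every season.
y_hat = np.full_like(y, y.mean())

r2_centered = r_squared(y, y_hat, centered=True)
r2_uncentered = r_squared(y, y_hat, centered=False)
print(f"centered R^2:   {r2_centered:.4f}")
print(f"uncentered R^2: {r2_uncentered:.4f}")
```

With the quoted figures, the uncentered value for a constant-only fit works out to roughly 88.95^2 / (88.95^2 + 3.59^2), or about 0.998, while the centered value is zero by construction. So an R-squared in the 99.7-99.8% range says very little by itself if it is the uncentered kind.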
My main takeaway:
Y is adjusted to 6 GPG (HR method), right? If your regression used all the players in the league, by definition you'd get Xg=0, because that's what the adjustment does. You have Xg>0 for the top 5% of players, which means the top guys are further away from the mean in high-scoring seasons than in low-scoring seasons. The adjustment may not bring down the top guys enough in high-scoring seasons.
Am I correct?
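Here is a toy simulation of what I mean. It is only a sketch under loud assumptions: the adjustment is modeled as simple proportional scaling to 6 GPG (the real HR method has more detail), the player pool is invented, and the idea that the scoring spread widens in high-scoring seasons is built in by hand through a hypothetical exponent, purely to show what a positive Xg for the top N would look like.

```python
import numpy as np

BASE_GPG = 6.0

def adjust(points, league_gpg):
    """Scale raw points to a 6 GPG environment (simplified proportional adjustment)."""
    return points * BASE_GPG / league_gpg

def slope(x, y):
    """Slope of a simple linear fit of y on x."""
    return np.polyfit(x, y, 1)[0]

rng = np.random.default_rng(2)
gpg = np.linspace(5.2, 8.0, 32)          # hypothetical league GPG by season
n_players, top_n = 600, 30               # invented league size; top N = 30
talent = rng.lognormal(mean=3.0, sigma=0.6, size=n_players)

all_avg, top_avg = [], []
for g in gpg:
    # ASSUMPTION for illustration: the spread between players widens as GPG rises.
    spread = (talent / talent.mean()) ** (1.0 + 0.1 * (g - BASE_GPG))
    # League-average raw scoring scales exactly with league GPG.
    raw = talent.mean() * spread / spread.mean() * g / BASE_GPG
    adj = adjust(raw, g)
    all_avg.append(adj.mean())
    top_avg.append(np.sort(adj)[-top_n:].mean())

slope_all = slope(gpg, np.array(all_avg))
slope_top = slope(gpg, np.array(top_avg))
print(f"slope vs GPG, all players: {slope_all:.6f}")
print(f"slope vs GPG, top {top_n}:     {slope_top:.3f}")
```

The league-wide adjusted average doesn't move with GPG at all, because the adjustment removes it by construction, while the top-N average still rises with GPG whenever the top end pulls away from the pack in high-scoring years.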