Quote:
Originally Posted by Czech Your Math
Using all 4 variables, the Rsquared was 99.8% and the values for each X were as follows:
Xn= 1.05
Xg= 6.77
Xe= 16.4
Xp= 49.4
Using 3 variables (Xn excluded), the Rsquared was 99.7% and the values of each X were as follows:
Xg= 7.83
Xe= 39.5
Xp= 92.8

Let me explain this process for those unfamiliar with it. Each model calculates a coefficient for each variable, chosen so that together the coefficients produce the least total error (strictly, the smallest sum of the squared errors). The equation for the first (4 variable) model is:
Y = 1.05*Xn + 6.77*Xg + 16.4*Xe + 49.4*Xp
or
Avg. Adj. Pts. of Top N Players = (1.05 * # Teams) + (6.77 * League Avg. GPG) + (16.4 * Ratio of nonCanadian Top N to Total Top N) + (49.4 * Ratio of PP & SH Goals to Total Goals)
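For anyone who wants to plug numbers into that equation, here is a minimal Python sketch. The coefficients are the ones from the 4 variable model above. The dictionary keys, the function names, and the example season are my own and made up for illustration; the example season is not real league data.

```python
# Coefficients of the 4 variable model, copied from the quoted post.
COEF_4VAR = {
    "teams": 1.05,               # Xn: number of teams
    "gpg": 6.77,                 # Xg: league avg. goals per game
    "non_canadian_ratio": 16.4,  # Xe: nonCanadian top N / total top N
    "pp_sh_ratio": 49.4,         # Xp: PP & SH goals / total goals
}

def predict(coefs, season):
    """Predicted avg. adj. pts. of the top N players (no intercept term)."""
    return sum(coefs[name] * season[name] for name in coefs)

def sum_squared_error(coefs, seasons, actual):
    """The quantity the regression minimizes: the sum of squared errors."""
    return sum((predict(coefs, s) - y) ** 2 for s, y in zip(seasons, actual))

# Hypothetical season, invented for illustration only.
example_season = {
    "teams": 21,
    "gpg": 7.0,
    "non_canadian_ratio": 0.25,
    "pp_sh_ratio": 0.25,
}
predicted = predict(COEF_4VAR, example_season)
print(round(predicted, 2))
```

Note that the two ratio variables go in as fractions (0.25, not 25), which is why their coefficients look so large next to the others.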
In the second (3 variable) model, the coefficient values of Xe and Xp increase dramatically as a result of Xn being excluded. This is because in the first model, Xn captured a lot of the effect present in Xe and Xp. IOW, in most of the same seasons where there was an increase in teams (due to expansion), there was also an increased representation by Euro/US players in the top N scorers, and an increased number of PP opportunities. I would think that a lot of the effects caused by increased nonCanadian players and increased PP opportunities were mistakenly attributed to the increase in the number of teams, because all three variables increased in most of the same seasons.
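To show the general mechanism, here is a small sketch with made-up data (not the actual league numbers, and not the fitting method used for the models above, which I am only assuming was ordinary least squares without an intercept). Two variables, x1 and x2, rise together, and y is built from known effects of 1.0 and 2.0. Fitting both recovers those effects; dropping x1 pushes its share onto x2, so the x2 coefficient balloons. The helper functions are my own, written in plain Python so nothing needs to be installed.

```python
def solve(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest entry in this column into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Clear this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def least_squares(columns, y):
    """No-intercept least squares via the normal equations (X'X) b = X'y."""
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns]
           for ci in columns]
    xty = [sum(p * t for p, t in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)

# Made-up data: x1 and x2 tend to increase in the same "seasons".
x1 = [6, 12, 14, 16, 18, 21, 21, 24, 26, 30]
x2 = [0.5, 1.1, 1.2, 1.7, 1.6, 2.2, 2.0, 2.5, 2.4, 3.1]
# True effects by construction: 1.0 per unit of x1, 2.0 per unit of x2.
y = [1.0 * a + 2.0 * b for a, b in zip(x1, x2)]

both = least_squares([x1, x2], y)   # close to [1.0, 2.0]
only_x2 = least_squares([x2], y)    # x2 now absorbs the effect of x1
print(both, only_x2)
```

The same thing works in either direction: whichever correlated variables are in the model split the credit between them, and the split changes when one is removed.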
Here's what the second model would predict:
A) For each .10 increase in league gpg, a .78 increase in avg. adj. pts. of top N scorers
B) For each 10 percentage point increase in top N forwards which were nonCanadian (e.g. from 20% to 30%), a 3.95 increase in avg. adj. pts. of top N scorers
C) For each 1.0 percentage point increase (e.g. from 22% to 23%) in PP/SH goals as a % of total goals, a .93 increase in the avg. adj. pts. of top N scorers
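Each of those predictions is just the second model's coefficient multiplied by the size of the change. A quick sketch of the arithmetic, using the 3 variable coefficients from the quoted post (the dictionary keys are my own labels):

```python
# Coefficients of the 3 variable model, copied from the quoted post.
COEF_3VAR = {"gpg": 7.83, "non_canadian_ratio": 39.5, "pp_sh_ratio": 92.8}

# Size of each change. The ratio variables are fractions, so
# 10 percentage points = 0.10 and 1.0 percentage point = 0.01.
CHANGES = {"gpg": 0.10, "non_canadian_ratio": 0.10, "pp_sh_ratio": 0.01}

# Predicted change in avg. adj. pts. = coefficient * change in the variable.
effects = {name: COEF_3VAR[name] * CHANGES[name] for name in COEF_3VAR}
for name, delta in effects.items():
    print(f"{name}: {delta:+.2f}")
```

Because the model is linear, these effects simply add up if more than one variable changes at once.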
There may be some small rounding or other errors present in each of the variables, but these shouldn't significantly affect the results. There are some alternative models that could be studied, but I would guess the most interesting modifications would be to the quality of the Y variable (using different quality or quantity of tiers), rather than to the X variables (I can't think of many other important X variables, except maybe a variable that measures parity between teams and/or players).