I ran another regression over 1968-2012, and this time included two variables for parity: Xf & Xa, which are the standard deviations of team GF and of team GA, each divided by the corresponding mean across teams.
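A minimal sketch of how a parity variable like this could be computed for one season: the standard deviation of per-team totals divided by their mean (a coefficient of variation). The team totals below are made-up numbers, and using the population standard deviation (pstdev) is my assumption; the post doesn't say whether a population or sample SD was used.

```python
from statistics import mean, pstdev

def parity_index(team_values):
    """Std. dev. of per-team values divided by their mean (coefficient of variation)."""
    return pstdev(team_values) / mean(team_values)

# Hypothetical per-team goal totals for a single season (not real data)
team_gf = [250, 230, 210, 270, 240]
team_ga = [220, 260, 240, 230, 250]

xf = parity_index(team_gf)  # parity of goals for
xa = parity_index(team_ga)  # parity of goals against
print(f"Xf = {xf:.3f}, Xa = {xa:.3f}")
```

A lower value means teams were more evenly matched in that category; swapping pstdev for stdev gives the sample-SD version.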
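The new results are (B is the intercept, each M is the coefficient on the matching variable, and parentheses denote negative values):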
B = 82
Mn = (0.44)
Mf = 87
Ma = (18)
Me = 8.1
Mg = (.74)
Mp = 42
R^2 = .56
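Written out as an equation, the fit looks like this. Only Xf and Xa are named in the post; Xn (number of teams), Xe (non-Canadian share of the top N), Xg (league GPG) and Xp (special-teams goals as a share of all goals) are hypothetical labels inferred from the interpretation list at the end, with percentage variables entered as fractions and signs taken from that list.

```latex
\hat{Y} = 82 - 0.44\,X_n + 87\,X_f - 18\,X_a + 8.1\,X_e - 0.74\,X_g + 42\,X_p
```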
All variables appear significant, with Xa having the lowest t-score, ~4.8 (Ma/SEa = .73 and N^.5 = 6.6, so t ≈ .73 × 6.6 ≈ 4.8).
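The t-score arithmetic above can be reproduced in a couple of lines. This only mirrors the post's own formula, t ≈ (Ma/SEa) × N^.5; taking N = 44 is an assumption based on the "3/44 predicted values" figure below.

```python
import math

ratio_a = 0.73    # Ma / SEa as reported in the post
n_seasons = 44    # N, assumed from the 44 predicted values mentioned later

t_a = ratio_a * math.sqrt(n_seasons)  # sqrt(44) is about 6.63
print(f"t(Xa) ≈ {t_a:.2f}")
```

Using the unrounded square root gives about 4.84, consistent with the ~4.8 quoted.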
I think this model holds a lot of promise, given R^2 > .5 and all variables significant. I also found this interesting:
Standard deviation of Y (avg. adjusted points of top N players) was 3.86, and only 3 of 44 predicted values of Y differed from the actual value by more than this (the largest miss was ~1.6 std dev). All three of those predicted values were lower than the actual value, possibly in part because several of the best players were in the league and having strong seasons (Orr, Espo, etc. in '72... Lemieux, Jagr, etc. in '96 & '97).
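A minimal sketch of that residual check: count the seasons where the prediction misses the actual Y by more than one standard deviation of Y, and express the worst miss in SD units. The season values are made up, and using the sample SD (stdev) is an assumption; the post doesn't say which SD was used.

```python
from statistics import stdev

def residual_summary(actual, predicted):
    """Indices where |actual - predicted| exceeds the SD of actual Y, plus the worst miss in SDs."""
    sd_y = stdev(actual)  # sample SD of the actual Y values (assumption)
    residuals = [a - p for a, p in zip(actual, predicted)]
    big = [i for i, r in enumerate(residuals) if abs(r) > sd_y]
    worst = max(abs(r) for r in residuals) / sd_y
    return big, worst

# Hypothetical seasons (not the post's data)
actual = [80, 85, 90, 78, 88]
predicted = [81, 84, 83, 79, 87]

big, worst = residual_summary(actual, predicted)
print(f"{len(big)}/{len(actual)} misses exceed 1 SD; worst = {worst:.2f} SD")
```

A positive residual (actual minus predicted) in the flagged seasons corresponds to the model under-predicting, as with the three seasons noted above.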
Some of the variables may improve with further refinement. It may be useful to define a variable that captures the effect of having so much top talent in the league at once, but I'm not sure what the best and fairest way to do that would be. Any suggestions welcome.
For those not familiar with regression, this is what the model suggests at this stage (Y is avg. adjusted points of top N players, where N = number of teams in league):
For each additional team, Y decreases by .44
For each 1 percentage point increase in Xf (standard deviation of teams' GF relative to the mean), Y increases by .87
For each 1 percentage point increase in Xa (standard deviation of teams' GA relative to the mean), Y decreases by .18
For each 10 percentage point increase in % of non-Canadians in top N, Y increases by .81
For each .10 increase in league GPG, Y decreases by .07
For each 1 percentage point increase in special teams goals as % of total goals, Y increases by .42
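Each line above is just the coefficient multiplied by the size of the change, with percentage points written as fractions. A small sketch that reproduces the list; the dictionary keys are my own hypothetical labels for the variables.

```python
# Coefficients from the post (parenthesized values entered as negatives)
coef = {
    "teams": -0.44,        # Mn, per additional team
    "gf_parity": 87,       # Mf
    "ga_parity": -18,      # Ma
    "non_canadian": 8.1,   # Me
    "gpg": -0.74,          # Mg
    "special_teams": 42,   # Mp
}

# Size of each change used in the list (percentage points as fractions)
changes = {
    "teams": 1,
    "gf_parity": 0.01,
    "ga_parity": 0.01,
    "non_canadian": 0.10,
    "gpg": 0.10,
    "special_teams": 0.01,
}

def effects(coef, changes):
    """Change in predicted Y for each variable's change, holding the others fixed."""
    return {name: coef[name] * changes[name] for name in coef}

for name, delta in effects(coef, changes).items():
    print(f"{name}: change in Y = {delta:+.3f}")
```

The GPG line works out to -0.074, which the list rounds to .07.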
Last edited by Czech Your Math: 11-17-2012 at 12:22 AM.
