11-19-2012, 07:52 AM
Originally Posted by Czech Your Math
I added one more variable. This variable is intended to capture some of the effect of "offensive powerhouses". A player like Orr, Gretzky or Lemieux may elevate the point totals of teammate(s), which would increase Y. A lack of parity in the league may also aid this process, but much of this should be captured in the parity variables (Xf & Xa). The variable is Xt and is defined as follows: the GF for the top 2N teams are added (for 21 teams, the top 4 teams, plus .2 * 5th team) and divided by 2N. This number is divided by league avg. GF, and one is subtracted from the result (this is the ratio to avg. by which the "powerhouses" differed from avg.). This result is then divided by Xf to scale the result based on parity (I thought this should help separate the variable from Xf and prevent much of the potential overlapping).
I'm not sure how to interpret those 3 extra variables. The R-squared doesn't increase all that much when you add Xt, but the coefficients on the main variables are all over the place. What's your rationale for a link between the concepts "parity" and "top player scoring"? Define that hypothesis, then pick the one construct that you think represents it best -- 3 variables make it too tough to interpret IMO.

As for your Xt variable definition, I'm assuming you mean "the GF for the top N/5 teams"? That gives 4.2 teams for a 21-team league, which matches your example. "Top 2N" would mean the top 42 teams in a 21-team league, which doesn't make sense.
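
To make sure we're talking about the same thing, here is a rough sketch of Xt as I read your description, with "top 2N" replaced by "top N/5". The function name, the `divisor` parameter and the GF numbers are made up for illustration, and Xf is simply passed in since its definition is in your earlier post.

```python
def powerhouse_ratio(team_gf, xf, divisor=5):
    """Xt as I read the quoted definition, using the top N/divisor teams."""
    n = len(team_gf)
    k = n / divisor                      # e.g. 21 teams -> 4.2 "teams"
    ranked = sorted(team_gf, reverse=True)
    whole = int(k)                       # full teams counted (4 of 21)
    frac = k - whole                     # weight on the next team (0.2 of the 5th)
    top_sum = sum(ranked[:whole])
    if frac > 0 and whole < n:
        top_sum += frac * ranked[whole]
    top_avg = top_sum / k                # average GF of the "powerhouses"
    league_avg = sum(team_gf) / n
    # ratio by which the powerhouses exceed average, scaled by parity (Xf)
    return (top_avg / league_avg - 1) / xf


# Hypothetical 21-team season and a hypothetical Xf value, not real data.
made_up_gf = [310, 295, 280, 270, 260] + [240] * 16
print(powerhouse_ratio(made_up_gf, xf=0.10))
```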

Originally Posted by Czech Your Math
I still believe one of the most important missing "variables" is the presence or absence of certain great players at different times. For instance, how does one measure the fact that Ovechkin, Malkin, Crosby, and Thornton were all mostly healthy and at/near the top of their game in 2008... but in 2011, these players were mostly injured and/or off their games? It seems that incorporating discrete variables is more difficult than I initially thought, so I'm not sure how to measure this aspect of each season.
How about a variable for the % of games missed by the players who were included in Y the previous year? Or the % of games missed by the top x% of players by points/game?
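
Something along these lines, as one possible reading of that idea: rank players by their previous-season points/game, take the top few, and measure the share of this season's schedule they missed. The function name, the dict layout and the 82-game default are my assumptions, and the players below are placeholders rather than real data.

```python
def games_missed_share(prev_ppg, games_played, top_n, schedule=82):
    """Share of scheduled games missed this season by last season's top scorers.

    prev_ppg:     {player: points per game in the previous season}
    games_played: {player: games played in the current season}
    """
    ranked = sorted(prev_ppg, key=prev_ppg.get, reverse=True)[:top_n]
    if not ranked:
        raise ValueError("no players to rank")
    # Assumption: a top player absent from games_played missed the whole season.
    missed = sum(schedule - games_played.get(name, 0) for name in ranked)
    return missed / (schedule * len(ranked))


# Placeholder players: the top two by previous-season points/game are A and B.
prev = {"A": 1.5, "B": 1.2, "C": 0.8}
now = {"A": 82, "B": 41, "C": 82}
print(games_missed_share(prev, now, top_n=2))
```

Ranking on the previous season avoids the problem of a star who misses most of the current year dropping out of the group you're trying to measure.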

Originally Posted by Czech Your Math
There is a large cross-correlation between many of the variables:

Xn & Xe = 88% ... did NHL expand in response to Euro influx? I think this is largely coincidental.
Xn & Xg = (83%) ... did expansion make goal scoring decrease? This wouldn't be the expected observation IMO.
Xn & Xp = 45% ... I don't see a logical relationship between Euro influx and increased power plays, but it's possible.
Xe & Xg = (79%) ... did Euro influx cause goal scoring to decrease? Considering Euros were disproportionately scoring forwards, this seems odd, although talent compression tends to decrease scoring IMO.
Xe & Xp = 50% ... did Euro influx cause an increase in power plays? I don't see why, esp. as it contradicts other correlation(s).
Xg & Xp = (18%)... not much of a correlation, but why would power plays and goal scoring be negatively correlated?

BTW, in case I wasn't clear before, Y = avg. of top N players' adjusted points. This means Y is per 82 games, adjusted to 6.00 gpg league avg., and assist/goal ratio of 5/3.

My main concerns with this model are that it appears there may be important variables missing (given the low R^2) and that the variables are mostly cross-correlated. I don't think roster size is an issue during this period. What other variables may be missing? I think the cross-correlation of variables is largely coincidental, but can the variables be better defined to prevent this?
The 'source' of the correlation ('coincidental' or otherwise) doesn't really matter -- as I previously wrote, multicollinearity makes the coefficients unstable and sensitive to the variables you include next: look at what happened with Mn, Mg and Me when you added Xt as a 7th regressor.
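
To put a rough number on it: with only the pairwise correlations to go on, you can still get a floor on each variable's variance inflation factor (VIF), since the R^2 from regressing one regressor on all the others is at least its squared correlation with any single one of them. The sketch below uses the correlations you listed; I'm reading the parenthesised values as negative, and the helper names are mine.

```python
# Pairwise correlations from the quoted list; parentheses read as negative.
pairwise_r = {
    ("Xn", "Xe"): 0.88,
    ("Xn", "Xg"): -0.83,
    ("Xn", "Xp"): 0.45,
    ("Xe", "Xg"): -0.79,
    ("Xe", "Xp"): 0.50,
    ("Xg", "Xp"): -0.18,
}


def vif_floor(r):
    """Lower bound on the VIF of either variable in a pair with correlation r."""
    return 1.0 / (1.0 - r * r)


def worst_vif_floor(pairs):
    """For each variable, the largest pairwise VIF floor it appears in."""
    floors = {}
    for (a, b), r in pairs.items():
        v = vif_floor(r)
        floors[a] = max(floors.get(a, 1.0), v)
        floors[b] = max(floors.get(b, 1.0), v)
    return floors


for name, v in sorted(worst_vif_floor(pairwise_r).items()):
    print(f"{name}: VIF >= {v:.2f}")
```

Xn and Xe both come out above 4, so their coefficient standard errors are at least about twice what they would be with uncorrelated regressors, and the true VIFs from the full regression can only be higher.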

Let me suggest: (-> means leads to)
90s expansion -> the trap -> decreased scoring
can't trap on special teams -> increased % of PP scoring (vs total scoring)

If you don't like "the trap", you can substitute "video analysis" or something similar. That would explain all the correlations except those involving Euros. Then,

salary increases (late 80s) -> Euro influx
fall of the Iron Curtain (89-91) -> Euro influx

You can't really operationalize all this stuff into regressions, but if you used data points from 1960 onward (therefore including the first expansions), all those correlations would decrease significantly. I would assume the early-to-mid-70s expansion and the addition of the WHA teams led to increased scoring in the NHL, which would reverse that negative Xn/Xg relationship.

barneyg