Quote:
Originally Posted by Czech Your Math
With Xt, I was trying to capture the "teammate/linemate" effect in an objective manner, as well as how easy it was for teams to become offensive juggernauts (even after factoring out general parity). How does one factor in the fact that Gretzky or Lemieux, e.g., may have significantly elevated one or more teammates' point totals? Also, how does one factor in the presence of phenoms such as Gretzky and Lemieux in the league? I'm just not sure how to capture these effects objectively. Eliminating such "outliers" from the Y population is changing the Y population subjectively... and where do you draw the line and stop eliminating "outliers"?... and what about their teammates? This is one reason I was trying to build the model with the dummy variables, because it would bypass this problem.
My hunch is that the "Gretzky, Lemieux & friends" effect is going to be much larger than the injury factor. It's really difficult to measure why Ovechkin went from top of the league to just another very good scorer (and the effects on Backstrom, Semin, etc.). He wasn't injured, he's still in his prime. I don't think it's easy to objectively measure that.

Your math problem (Gretzky's effect on Anderson, Lemieux's effect on Rob Brown...) is the anti-stats crowd's complaint in a nutshell: "hockey's a team game, so individual stats have to be flawed and you can't predict them." My personal opinion is exactly the same if you replace "you can't" with "it's tough to".
Not sure your model with the dummy variables would achieve your objective per se: the Gretzky dummy would only be equal to 1 for observations pertaining to Gretzky, Coffey would only have the Coffey dummy, Kurri the Kurri dummy... If Anderson had been free-riding on all those guys, you still wouldn't catch it.
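To make the dummy-variable point concrete, here's a minimal sketch assuming one observation per player-season. A per-player dummy only lights up on that player's own rows, so a teammate's boost goes unflagged; catching it would take something like a "played with the star" indicator. The `Obs` class, both function names and the three sample rows are hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Obs:
    """One hypothetical player-season observation."""
    player: str
    team: str
    season: str


def player_dummy(obs, star):
    """1 only on the star's own rows: all a per-player dummy can capture."""
    return 1 if obs.player == star else 0


def teammate_indicator(obs, star, observations):
    """1 on rows of other players who shared a team-season with the star."""
    star_team_seasons = {(o.team, o.season) for o in observations if o.player == star}
    return 1 if obs.player != star and (obs.team, obs.season) in star_team_seasons else 0


rows = [
    Obs("Gretzky", "EDM", "1983-84"),
    Obs("Anderson", "EDM", "1983-84"),
    Obs("Goulet", "QUE", "1983-84"),
]
for r in rows:
    # Anderson gets 0 from the Gretzky dummy but 1 from the teammate indicator.
    print(r.player, player_dummy(r, "Gretzky"), teammate_indicator(r, "Gretzky", rows))
```

A teammate indicator like this pushes you toward the big every-player, every-year dataset, which is exactly the cost of going that route.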
You're basically looking for a "quality of teammates" measure. Instead of going all the way to a different model with thousands of observations (every player, every year), couldn't you come up with a measure of how "concentrated" Y is? I.e., if Y is the adjusted scoring of the top N players, what percentage of those N players play on the top X teams? For example, if in 1983-84 you have 5 Oilers and 3 Bruins among the top 21 players, that's a "top 2 team" concentration of 8/21 = 38%. Or maybe some measure of the distance between the very top player (Gretzky, 205 points) and the best player not on the Oilers (Goulet, 121).
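Both measures are cheap to compute once you have the top-N list for a season. Here's a rough sketch. How the "top X teams" are chosen (standings, or simply the teams with the most top-N scorers) is left to the caller as an assumption. The function names are hypothetical, and the 13 "OTH" labels are placeholders rather than the real 1983-84 list.

```python
def top_team_concentration(top_player_teams, top_teams):
    """Share of the top-N scorers who play for any of the given top teams."""
    if not top_player_teams:
        raise ValueError("need at least one player")
    on_top_teams = sum(1 for team in top_player_teams if team in top_teams)
    return on_top_teams / len(top_player_teams)


def leader_gap(points, team_of, team):
    """Points between the league leader and the best scorer not on `team`.

    Assumes the leader plays for `team` (e.g. Gretzky and the Oilers).
    """
    leader = max(points.values())
    best_outside = max(pts for p, pts in points.items() if team_of[p] != team)
    return leader - best_outside


# Team labels for a top-21 list shaped like the 1983-84 example (5 Oilers, 3 Bruins);
# the 13 "OTH" entries are placeholders, not real data.
top_21 = ["EDM"] * 5 + ["BOS"] * 3 + ["OTH"] * 13
print(round(top_team_concentration(top_21, {"EDM", "BOS"}), 2))  # 0.38

points = {"Gretzky": 205, "Goulet": 121}
team_of = {"Gretzky": "EDM", "Goulet": "QUE"}
print(leader_gap(points, team_of, "EDM"))  # 84
```

Either number gives you one value per season, so it can go straight into the existing season-level model as an extra regressor.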
Quote:
Originally Posted by Czech Your Math
The problem is that a lot of these changes happened in a short time: increased PPOs starting in the late 80s... ever-increasing salaries... the fall of the Iron Curtain in the early 90s... expansion beginning in the early 90s and continuing through the decade... a large Euro/Russian influx in the early and mid-90s... scoring decreasing in the mid-90s... increased parity from the mid-90s on... increased use of defensive systems and better/larger goalie equipment, etc.
Some of these would be expected to show substantial correlation and do. Some would be expected, but show a much smaller correlation or even one opposite in direction to that expected. Some wouldn't be expected to have a large correlation, yet do. It's not easy to objectively determine how these changes are influencing each other, even if we agree that the effect on the model is to make it more sensitive to additional variables.

I would rephrase the bolded part as: I don't really care how these changes influenced each other; all I want is to find a way to make the model more stable with respect to the implied relationship between each of those variables and adjusted scoring.
I think extending the sample further back in time mitigates this problem. If that means dropping PPG or PPO because of data availability, so be it: you can always compare the other coefficients for (say) 1950-2010 with those from the model for 1964-2010, which will include it.
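A quick way to do that comparison is to fit both versions and put the shared coefficients side by side. The sketch below uses made-up series: `parity` stands in for any variable available for every season, `ppo` for one treated as known only from 1964. It's plain OLS via numpy's least-squares solver, not necessarily the method behind the actual model.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients, with an intercept column prepended."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
seasons = np.arange(1950, 2011)
parity = rng.normal(size=seasons.size)  # made-up series available for all seasons
ppo = rng.normal(size=seasons.size)     # made-up series, treated as known only from 1964
y = 3.0 - 1.5 * parity + 0.8 * ppo + rng.normal(scale=0.1, size=seasons.size)

# Long window, PPO dropped: [intercept, parity]
long_fit = ols(parity[:, None], y)

# Short window, PPO included: [intercept, parity, ppo]
late = seasons >= 1964
short_fit = ols(np.column_stack([parity[late], ppo[late]]), y[late])

print("parity coef, 1950-2010 without PPO:", round(long_fit[1], 2))
print("parity coef, 1964-2010 with PPO:   ", round(short_fit[1], 2))
```

If the shared coefficient barely moves between the two fits, dropping PPO to gain the extra seasons is probably a fair trade. A big shift suggests the dropped variable was correlated with what stayed in.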
R^2 went from 0.32 to 0.56; is that due to the addition of 1968-1976 or to those parity indicators? I would assume both.
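One way to separate the two effects is a 2x2 grid: each window, with and without the new indicators. The sketch below assumes the original window was 1977-2010 (the end year is a guess) and uses made-up `base` and `parity` series in place of the real variables.

```python
import numpy as np


def r_squared(X, y):
    """In-sample R^2 of an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


rng = np.random.default_rng(1)
seasons = np.arange(1968, 2011)
base = rng.normal(size=seasons.size)    # stand-in for the original regressors (made up)
parity = rng.normal(size=seasons.size)  # stand-in for the new parity indicators (made up)
y = base + parity + rng.normal(size=seasons.size)

recent = seasons >= 1977  # assumed original window, before adding 1968-1976
grid = {
    ("1977-2010", "original vars"): r_squared(base[recent][:, None], y[recent]),
    ("1977-2010", "+ parity"): r_squared(np.column_stack([base[recent], parity[recent]]), y[recent]),
    ("1968-2010", "original vars"): r_squared(base[:, None], y),
    ("1968-2010", "+ parity"): r_squared(np.column_stack([base, parity]), y),
}
for (window, spec), r2 in grid.items():
    print(f"{window:10s} {spec:14s} R^2 = {r2:.2f}")
```

Within the same window, adding regressors can never lower in-sample R^2, so the "+ parity" rows are always at least as high. Adjusted R^2 is the fairer yardstick for the variable effect. Comparing across windows also mixes in the change in sample.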