11-19-2012, 02:14 PM
#15
Registered User

Join Date: Jan 2006
Location: bohemia
Posts: 4,845
Quote:
 Originally Posted by barneyg
 If I understand correctly, your model is
 Y = b1*X(s1980) + ... + b32*X(s2011) + b33*X(a18) + ... + b55*X(a40) + b56*X(p1) + ... + b67*X(p12)
 where X(s1980)...X(s2011) are season dummy variables (also called indicator variables), X(a18)...X(a40) are player age dummies, and X(p1)...X(p12) are player (name) dummies. I'm not sure why you want those player dummies in there (p1..p12).
Yes, that is exactly the model I was trying to build.

The reason for the p (player) dummy variables was to control for each player's general value/skill, so that the coefficients on the other variables (age, season) would not be distorted by differences between players.

Each observation of Y is one player's season. So '80 Gretzky would be: 117 = b1*X(s1980) + b34*X(a19) + b56*X(p1), where p1=Gretzky. I was hoping to simultaneously measure the effects of age and different seasons on top players' scoring, so that it could be applied to any top player in any season.
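To make the encoding concrete, here is a minimal sketch in Python of how one observation maps onto the 67 dummy columns. It assumes the column order from the quoted model (seasons first, then ages, then players); the function name encode_observation and the use of integers 1..12 for the players are made up for illustration.

```python
# Column layout assumed from the quoted model:
#   b1..b32  -> seasons 1980..2011
#   b33..b55 -> ages 18..40
#   b56..b67 -> players p1..p12 (represented here as integers 1..12)
SEASONS = list(range(1980, 2012))
AGES = list(range(18, 41))
PLAYERS = list(range(1, 13))

def encode_observation(season, age, player):
    """Return the 0/1 dummy row for one player-season (hypothetical helper)."""
    row = [0] * (len(SEASONS) + len(AGES) + len(PLAYERS))
    row[SEASONS.index(season)] = 1
    row[len(SEASONS) + AGES.index(age)] = 1
    row[len(SEASONS) + len(AGES) + PLAYERS.index(player)] = 1
    return row

# The '80 Gretzky example: season 1980, age 19, player p1.
row = encode_observation(1980, 19, 1)

# 1-based positions of the dummies that equal 1, i.e. which b's are active.
active = [i + 1 for i, x in enumerate(row) if x == 1]
print(active)  # [1, 34, 56]
```

Every row has exactly three 1s, one per category, which is why each observation reduces to a sum of three coefficients.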

Quote:
 Originally Posted by barneyg
 But to get back to your question, it's not a question of degrees of freedom. The regressors in your model must be linearly independent, and they aren't. For example, right now for every player you have X(s2011) = 1 - X(s1980) - X(s1981) - ... - X(s2010), i.e. the sum of those 32 dummy variables is 1... same thing for the other 2 types, the sum of all variables of the same type for a given player is 1. A simple solution is to drop one of the dummies for each type, i.e. drop X(s1980), X(a18), and X(p1). You will still get an error for some age coefficients if your sample doesn't have anyone playing up to age 40 or as early as 18 but the rest of the model should work.
Thanks, this is very insightful. You are right that the variables are not independent: in each category exactly one dummy variable has value = 1 and the rest are 0, so the dummies in a category always sum to 1.
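A rough sketch in Python of what that sum-to-1 property implies. The data is a made-up toy sample, not my real one, and the labels are only placeholders. Because the season dummies and the age dummies both sum to 1 in every row, the combination "+1 on every season column, -1 on every age column" is zero for every observation, so the coefficients cannot be pinned down uniquely.

```python
# Toy observations, invented for illustration: (season, age, player) labels.
observations = [
    ("s1980", "a19", "p1"),
    ("s1981", "a20", "p1"),
    ("s1982", "a21", "p1"),
    ("s1980", "a25", "p2"),
    ("s1981", "a26", "p2"),
]

seasons = sorted({s for s, _, _ in observations})
ages = sorted({a for _, a, _ in observations})
players = sorted({p for _, _, p in observations})
columns = seasons + ages + players

# Design matrix: one 0/1 column per label, one row per observation.
X = [[1 if label in obs else 0 for label in columns] for obs in observations]

# Combination c: +1 on season columns, -1 on age columns, 0 on player columns.
c = [1] * len(seasons) + [-1] * len(ages) + [0] * len(players)

def predict(matrix, coefs):
    """Fitted value for each row: the sum of the active coefficients."""
    return [sum(x * b for x, b in zip(row, coefs)) for row in matrix]

print(predict(X, c))  # [0, 0, 0, 0, 0] -> the columns are linearly dependent

# Consequence: shifting any coefficient vector along c leaves every fitted
# value unchanged, so least squares has no unique solution.
b = [float(i) for i in range(len(columns))]   # arbitrary coefficients
b_shifted = [bi + 10 * ci for bi, ci in zip(b, c)]
print(predict(X, b) == predict(X, b_shifted))  # True
```

The same argument works with the player columns in place of the age columns, so there is more than one such combination.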

If I understand you correctly, you suggest that eliminating one dummy variable in each category will solve that problem, but I don't see how that would change the fact that the variables within each category are not independent. BTW, in the simplified (small scale) model, I still had 3+ observations for each dummy variable. IOW, there were at least 3 observations at each age, at least 3 for each season, and at least 3 for each player.

It sounds to me like this type of model just isn't possible.