Using Regression to Make Fantasy Picks
1 Attachment(s)
With the new season announced, the fantasy season has begun as well.
I went and found the career stats, broken down by season, for each of the top 150 or so skaters (no goalies). I've been trying to run a regression analysis to figure out which picks will be the best ones. I've been measuring each player's goals, assists and power play points (the only skater stat categories my league measures) against the adjusted stats for each category. As you can see in the attachment, I've been getting an R^2 value of 0.90 to 0.99. I feel that this is way too high, but it makes sense given the direct correlation between the raw stat and the adjusted stat. I'm not sure whether I'm measuring the right variables and, if not, which ones I should change. Thanks
Can you elaborate on what your dependent variables and independent variables are?
Alternatively, how do you calculate column I? I'd expect your R^2 to be high, since you're doing a concurrent analysis (as far as I can tell). 
My dependent variables are the following:
$ESG - Even strength goals, adjusted to a league scoring level of 200 ESG per team.
$ESA - Even strength assists, adjusted to a league scoring level of 200 ESG per team.
$PPP - Power play points, adjusted to a league scoring level of 70 PPG per team and a league-average number of power play opportunities.
My independent variables are even strength goals, even strength assists and power play points.
Without knowing more about what you're doing, I would expect a high R^2, since you seem to be (essentially) translating what actually happened and using it to predict the same season.
So I guess the follow-up question is this: how does this help you to predict who's going to do well in 2013?
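To illustrate why a same-season comparison gives such a high R^2, here is a minimal sketch with made-up numbers (the goal totals and adjustment factors are hypothetical, not taken from the attachment). It assumes the adjusted stat is roughly the raw stat times a season-level factor, then computes R^2 between the two.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation, which equals R^2 for a one-variable fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical raw goal totals and per-season adjustment factors.
raw_goals = [50, 32, 28, 41, 19, 25]
factors = [1.05, 0.98, 1.02, 1.00, 0.97, 1.04]
adjusted_goals = [g * f for g, f in zip(raw_goals, factors)]

r2 = r_squared(raw_goals, adjusted_goals)
print(round(r2, 3))
```

Because the adjusted column is almost a rescaled copy of the raw column, the fit is nearly perfect, and it says nothing about next season.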
That's really the issue I'm running into: I'm not certain how it will help.
I'm really not sure which stats I should measure in order to get a better representation of future performance.
What I'd recommend, if you prefer this style of approach, is to use season N variables (whichever you think might work, probably including age) as the independent variables, and use season N+1 goals/assists/points as the dependent variables.
And the solution for not knowing which variables will work? Try lots of things. That's half of the fun :)
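The season N to season N+1 setup can be sketched as below. This is only an illustration: the field names (`age`, `goals`) and the career numbers are hypothetical, and goals plus age are just one possible choice of season N predictors.

```python
def next_season_pairs(seasons):
    """Pair each season's stats with the following season's goals.

    `seasons` is a list of dicts for one player, ordered oldest to newest.
    """
    pairs = []
    for current, following in zip(seasons, seasons[1:]):
        x = (current["goals"], current["age"])  # season N predictors
        y = following["goals"]                  # season N+1 outcome
        pairs.append((x, y))
    return pairs

# Hypothetical career line for one player.
career = [
    {"year": 2009, "age": 23, "goals": 30},
    {"year": 2010, "age": 24, "goals": 35},
    {"year": 2011, "age": 25, "goals": 28},
    {"year": 2012, "age": 26, "goals": 33},
]
pairs = next_season_pairs(career)
print(pairs)
```

Stacking these pairs across all players gives the rows for the regression; the most recent season never appears as a predictor row because its N+1 outcome does not exist yet.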
I included age as an independent variable, and the graphs I get really don't show me too much.
I can send them to you, maybe I'm doing something wrong. Not too sure how I can use this data to help look forward. Would I want to analyze each individual player, and extrapolate from there? Or just look at groups of players and see which are the standouts? 
What categories are counted in your fantasy league? I tend to think this is sort of like building a laser device to measure shoe size... much more complicated than it needs to be. Probably you should focus on situations where you expect players to outperform their past seasons, due to being at peak age, getting increased playing time, or having better linemates than in the past. Remember that goal scorers tend to peak earlier (many in their early 20s and most by their mid 20s) than playmakers. 
It's goals, assists, power play points, and hits. And it's an 8-man league. What I've figured I'll do is sort the players into groups of 20 or so of similar ranking, giving me a better look at which players to look for round by round.

It would be limited, but you could control for age (say 22-30), take a per-game average of the past three seasons per player and then look at the delta between last season and that average. You'd need to do some weeding out for injuries, etc., but for the most part you'd have a decent time finding guys who underperformed compared to expectations last season, who would then be "buy low" candidates on the draft board.
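A rough sketch of that delta calculation follows. The players and numbers are hypothetical, the three-season average is pooled (total points over total games, which is one of a couple of reasonable readings), and the `MIN_GAMES` cutoff is an assumed stand-in for the injury weeding.

```python
MIN_GAMES = 40  # assumed cutoff to weed out injury-shortened seasons

def per_game(total, games):
    return total / games if games else 0.0

def buy_low_delta(seasons):
    """Last season's points per game minus the three-season per-game average.

    `seasons` holds the past three seasons, oldest first, as (points, games).
    A negative value means the player fell short of his recent norm.
    """
    total_points = sum(points for points, _ in seasons)
    total_games = sum(games for _, games in seasons)
    last_points, last_games = seasons[-1]
    return per_game(last_points, last_games) - per_game(total_points, total_games)

# Hypothetical players.
players = {
    "Player A": {"age": 26, "seasons": [(70, 80), (75, 82), (50, 78)]},
    "Player B": {"age": 24, "seasons": [(40, 82), (45, 80), (55, 82)]},
    "Player C": {"age": 34, "seasons": [(80, 82), (70, 82), (45, 82)]},
}

candidates = sorted(
    (
        name
        for name, p in players.items()
        if 22 <= p["age"] <= 30
        and all(games >= MIN_GAMES for _, games in p["seasons"])
        and buy_low_delta(p["seasons"]) < 0
    ),
    key=lambda name: buy_low_delta(players[name]["seasons"]),
)
print(candidates)
```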
You could also go to behindthenet.ca and just look for guys above a certain TOI/game who have a low PDO. Beyond looking in terms of value based on variance, I don't see a particularly good way of using statistical analysis to find good picks. Everything else is a dog's breakfast of assumptions about how the player will be used the coming season, who their linemates may be, their O-zone starts, etc.
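The TOI/PDO screen could look something like this. PDO is on-ice shooting percentage plus on-ice save percentage; the thresholds, field names and player numbers below are all hypothetical, and real values would have to be pulled from a stats site by hand or by whatever export it offers.

```python
MIN_TOI = 17.0   # assumed minutes-per-game cutoff
MAX_PDO = 99.0   # assumed "low PDO" cutoff

# Hypothetical on-ice numbers.
skaters = [
    {"name": "Player A", "toi_per_game": 19.5, "on_ice_sh_pct": 6.8, "on_ice_sv_pct": 90.9},
    {"name": "Player B", "toi_per_game": 11.0, "on_ice_sh_pct": 6.0, "on_ice_sv_pct": 90.0},
    {"name": "Player C", "toi_per_game": 20.1, "on_ice_sh_pct": 10.2, "on_ice_sv_pct": 92.5},
]

def pdo(skater):
    """On-ice shooting percentage plus on-ice save percentage."""
    return skater["on_ice_sh_pct"] + skater["on_ice_sv_pct"]

low_pdo = [
    s["name"]
    for s in skaters
    if s["toi_per_game"] >= MIN_TOI and pdo(s) < MAX_PDO
]
print(low_pdo)
```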
Y = B0 + M1*X1 + M2*X2 + ...
where X1 is the value at T-1, X2 is the value at T-2, and so on.

I.e., Y could be goals in 2012, X1 is goals in 2011, X2 is goals in 2010, etc., all for the same player. Another Y would be goals in 2011, with X1 then being goals in 2010, X2 goals in 2009, etc. You can try different combos, but I would guess doing separate studies for each category would work best. Rather than just use raw goals, using adjusted GPG is probably going to yield more useful coefficients (otherwise variability in games played may affect them as much as or more than skill level).

I did a quick, simple study as an example, using (when possible) Y seasons of 2008-2012 for each of several players (Crosby, Malkin, Ovechkin, Stamkos, St. Louis, H. Sedin, Kovalchuk, Thornton, and Iginla):

Adjusted Total GPG: Y = .128 + .406*X1 + .33*X2
Adjusted Total APG: Y = .292 + .531*X1 + .078*X2
(Y = per-game metric in Year T, X1 = same metric in Year T-1, X2 = same metric in Year T-2)

The further back you lag the series, the more observations you lose, and the more likely it is that those further lagged variables will be insignificant. Also the Y-intercept (e.g. .128 for GPG in the above example) is going to vary with skill level, so you may have to either group players by general skill level in each category, or not use a Y-intercept. For GPG, lagged independent variables such as shots/game or Sh% might also be useful.
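As a small illustration, the two example equations can be applied like this. The coefficients are the ones quoted above, fitted on a handful of elite players, so they are unlikely to transfer to the whole draft pool; the input values are hypothetical.

```python
def project_gpg(prev, prev2):
    """Adjusted goals per game in Year T from Years T-1 and T-2."""
    return 0.128 + 0.406 * prev + 0.33 * prev2

def project_apg(prev, prev2):
    """Adjusted assists per game in Year T from Years T-1 and T-2."""
    return 0.292 + 0.531 * prev + 0.078 * prev2

# Hypothetical player: 0.60 adjusted GPG last season, 0.50 the season before;
# 0.70 adjusted APG last season, 0.60 the season before.
gpg = project_gpg(0.60, 0.50)
apg = project_apg(0.70, 0.60)
print(round(gpg, 4), round(apg, 4))
```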
I did something like this for an econometrics project once. I found Sh% to be a very, very effective variable.

Here's one hint: let someone who doesn't know any better draft Radim Vrbata 
I did go ahead though and sort players into groups of similar skill level/draft position. The groups range in size from 4-8 players. I also have a draft strategy, where each round I am picking from a corresponding group. I have 6 groups for forwards, 2 for defensemen and 2 for goalies. This strategy gives me a good grasp of which players to look out for, and when. The only really big issue is that they aren't organized by position, so I have to monitor that on the fly so that I don't draft 4 centremen and only 1 winger (our league has 2 spots each for C, LW and RW, plus 4 D, 2 goalies and 1 utility spot). All in all I feel comfortable and confident going into this draft.
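For the on-the-fly position monitoring, a tiny tracker along these lines might help. The slot counts come from the roster described above; treating the utility spot as open to any skater but not to goalies is an assumption, since the league rule isn't stated.

```python
ROSTER_SLOTS = {"C": 2, "LW": 2, "RW": 2, "D": 4, "G": 2}
UTILITY_SLOTS = 1  # assumed to accept any skater, but not goalies

def open_slots(drafted_positions):
    """Return the slots still open after the given picks, in draft order."""
    remaining = dict(ROSTER_SLOTS)
    utility = UTILITY_SLOTS
    for pos in drafted_positions:
        if remaining[pos] > 0:
            remaining[pos] -= 1
        elif pos != "G" and utility > 0:
            utility -= 1  # overflow skater goes to the utility spot
        else:
            raise ValueError(f"no open slot for {pos}")
    remaining["UTIL"] = utility
    return remaining

# Hypothetical first five picks.
print(open_slots(["C", "C", "C", "LW", "D"]))
```

A fourth centre in this example would raise an error, which is exactly the situation to avoid during the draft.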
For each player included in the study, do the following. Let's use Ovechkin's 7 seasons from '06-'12 as an example.

Calculate the Y variable in the manner you believe is most reliable for your study. For goals, I might suggest using "adjusted GPG." Once you have calculated this, it will be your dependent (Y) variable and should be in one column. So Ovechkin's adjusted GPG for seasons '06-'12 might be in cells C1-C7.

Next, label your independent (X) variables, which will be time lagged from your Y variable. You might label them T-1, T-2, etc. You would then copy and paste cells C1-C6 into cells D2-D7 for variable T-1. You don't copy cell C7, because that would be his adjusted GPG for 2012, and couldn't be used until at least 2013. You don't copy anything into cell D1, because he has no data before 2006. For variable T-2, you would copy cells C1-C5 (or D2-D6) into cells E3-E7.

Again, for each season you lag, you would lose one observation (i.e., if only using T-1, then lose Y for 2006... if using T-2 also, then also lose Y for 2007). You don't want any gaps in your X variables (e.g., having a T-1, but no T-2), as this will affect your results. If you only used T-1 & T-2 as X variables, and found T-2 to be an insignificant variable, then you would recalculate the regression using only T-1.

Hope this makes some sense and may be useful to someone.
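The copy-and-paste steps above can also be sketched in code. This is an illustrative version, not the spreadsheet method itself: `lagged_rows` builds the T-1/T-2 columns and drops the seasons with gaps, and `fit_ols` is a small hand-rolled least-squares solver (normal equations) standing in for the spreadsheet's regression tool. The seven per-game values are hypothetical.

```python
def lagged_rows(values, lags=2):
    """Turn season-ordered values into (y, [T-1, T-2, ...]) rows.

    The first `lags` seasons are dropped because they lack a full set of
    lagged values, like the blank cells in the spreadsheet layout.
    """
    rows = []
    for i in range(lags, len(values)):
        xs = [values[i - k] for k in range(1, lags + 1)]
        rows.append((values[i], xs))
    return rows

def fit_ols(rows):
    """Least-squares fit of y = b0 + b1*x1 + ... via the normal equations.

    Fails with a division by zero if the predictors are perfectly collinear.
    """
    design = [[1.0] + list(xs) for _, xs in rows]
    ys = [y for y, _ in rows]
    k = len(design[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in design) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(design, ys))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Hypothetical adjusted GPG for one player's seven seasons, oldest first.
gpg_by_season = [0.64, 0.56, 0.79, 0.71, 0.44, 0.49, 0.41]
rows = lagged_rows(gpg_by_season, lags=2)
coefficients = fit_ols(rows)  # [intercept, T-1 coefficient, T-2 coefficient]
print(len(rows), coefficients)
```

Seven seasons with two lags leave only five usable rows, so in practice the rows from many players would be pooled before fitting.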
The main use I find is to spot inconsistencies between your data, pre-draft rankings and actual draft order (i.e. in real time, if you can organise your spreadsheet to allow you to do so). This hopefully allows you to pick up a few 'bargains'.
I'm not a real sophisticated stats guy, but what about using your data for the years '11, '10 and '09 and running a "projection" for '12?
At the very least you could find out how close to being accurate it might be and you could do so going back as far as we have stats for as well. 
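That kind of check is easy to sketch. The projection used here (a plain average of the three prior seasons) is only a placeholder for whatever model you settle on, and the player numbers are hypothetical.

```python
def project(prior_three):
    """Placeholder projection: plain average of the three prior seasons."""
    return sum(prior_three) / len(prior_three)

def mean_absolute_error(history):
    """Average gap between projected and actual totals across players."""
    errors = [abs(project(prior) - actual) for prior, actual in history.values()]
    return sum(errors) / len(errors)

# Hypothetical goals for 2009, 2010 and 2011, plus the actual 2012 total.
history = {
    "Player A": ([30, 36, 33], 35),
    "Player B": ([20, 25, 24], 20),
}
mae = mean_absolute_error(history)
print(mae)
```

Running the same check on earlier seasons as the target gives a rough idea of how far off the projections tend to be.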