HFBoards

Using Regression to Make Fantasy Picks

01-10-2013, 01:33 PM
  #1
footitt
Registered User
 
Join Date: Jan 2013
Posts: 7
vCash: 500
Using Regression to Make Fantasy Picks

With the new season announced, the fantasy season has begun as well.

I went and found the career stats, broken down by season, for each of the top 150 or so skaters (no goalies). I've been trying to run a regression analysis to figure out which picks will be the best ones.

I've been measuring the players' goals, assists and power play points (the only skater stat categories my league measures) against the adjusted stats for each category.

As you can see in the attachment, I've been getting an R^2 value of 0.90 to 0.99. I feel that this is way too high, but it makes sense due to the direct correlation of the raw stat to the adjusted stat.

I am not sure whether I am measuring the correct variables and, if not, which ones I should change.

Thanks
Attached Files
File Type: xlsx forwardsdefensemen.xlsx (194.1 KB, 55 views)

01-10-2013, 01:42 PM
  #2
Doctor No
Mod Supervisor
Retired?
 
Join Date: Sep 2005
Posts: 24,354
vCash: 500
Can you elaborate on what your dependent variables and independent variables are?

Alternatively, how do you calculate column I?

I'd expect your R^2 to be high, since you're doing a concurrent analysis (as far as I can tell).

01-10-2013, 01:46 PM
  #3
footitt
Registered User
 
Join Date: Jan 2013
Posts: 7
vCash: 500
My dependent variables are the following:

$ESG - Even strength goals, adjusted to a league scoring level of 200 ESG per team.
$ESA - Even strength assists, adjusted to a league scoring level of 200 ESG per team.
$PPP - Power play points, adjusted to a league scoring level of 70 PPG per team and a league-average number of power play opportunities.

My independent variables are even strength goals, even strength assists and power play points.

01-10-2013, 06:59 PM
  #4
Doctor No
Mod Supervisor
Retired?
 
Join Date: Sep 2005
Posts: 24,354
vCash: 500
Without knowing more about what you're doing, I would expect a high R^2, since you seem to be (essentially) translating what actually happened and using it to predict the same season.

So I guess the follow-up question is this - how does this help you to predict who's going to do well in 2013?
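
As a toy illustration of why a same-season fit looks so good, here is a rough Python sketch. It regresses an adjusted stat on the raw stat from the same season and reports R^2. The numbers, the adjustment factors and the `r_squared` helper are all invented for the example and are not taken from the attached spreadsheet.

```python
# Toy illustration: regress an adjusted stat on the raw stat from the SAME season.
# All numbers below are invented for the example, not taken from any real data.

def r_squared(x, y):
    """R^2 of a one-variable least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

# Raw even-strength goals for six hypothetical player-seasons.
raw_esg = [12, 18, 25, 31, 9, 22]
# Hypothetical era-adjustment factors (close to 1.0, varying by season).
factor = [1.05, 0.98, 1.02, 0.97, 1.04, 1.00]
adjusted_esg = [g * f for g, f in zip(raw_esg, factor)]

concurrent_r2 = r_squared(raw_esg, adjusted_esg)
print(round(concurrent_r2, 3))
```

Because the adjusted value is just the raw value times a factor near 1, the fit is close to perfect, and it says nothing about the following season.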

01-10-2013, 10:59 PM
  #5
footitt
Registered User
 
Join Date: Jan 2013
Posts: 7
vCash: 500
That's really the issue I'm running into: I'm not certain how it will help.

I'm really not sure which stats I should measure to get a better representation of future performance.

01-11-2013, 12:42 AM
  #6
Doctor No
Mod Supervisor
Retired?
 
Join Date: Sep 2005
Posts: 24,354
vCash: 500
What I'd recommend, if you prefer this style of approach, is to use season N variables (whichever you think might work, probably including age) as the independent variables, and season N+1 goals/assists/points as the dependent variables.

And the solution for not knowing which variables will work? Try lots of things - that's half of the fun.
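
A minimal sketch of the season N to season N+1 setup, in Python with a single predictor. The player names, the numbers and the helper names `make_pairs` and `fit_line` are placeholders invented for the illustration. Adding age or other variables would need a multiple regression rather than this one-variable fit.

```python
# Sketch: pair each player's season-N stat with the same stat in season N+1,
# then fit y = intercept + slope * x by ordinary least squares.
# The player names and numbers are placeholders, not real data.

def make_pairs(seasons_by_player):
    """Return (season N value, season N+1 value) pairs across all players."""
    pairs = []
    for values in seasons_by_player.values():
        for current, following in zip(values, values[1:]):
            pairs.append((current, following))
    return pairs

def fit_line(pairs):
    """Closed-form least-squares intercept and slope for one predictor."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Goals per game in consecutive seasons (oldest first), invented values.
goals_per_game = {
    "player_a": [0.30, 0.35, 0.33, 0.40],
    "player_b": [0.50, 0.45, 0.48],
    "player_c": [0.20, 0.25, 0.22, 0.24],
}

pairs = make_pairs(goals_per_game)
intercept, slope = fit_line(pairs)
print(len(pairs), round(intercept, 3), round(slope, 3))
```

A projection for next season is then `intercept + slope * (this season's value)`.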

01-11-2013, 01:34 PM
  #7
footitt
Registered User
 
Join Date: Jan 2013
Posts: 7
vCash: 500
I included age as my dependent variable, and the graphs I get really don't show me too much.

I can send them to you, maybe I'm doing something wrong.

Not too sure how I can use this data to help look forward. Would I want to analyze each individual player, and extrapolate from there? Or just look at groups of players and see which are the standouts?

01-11-2013, 04:55 PM
  #8
Czech Your Math
Registered User
 
Join Date: Jan 2006
Location: bohemia
Country: Czech_ Republic
Posts: 3,721
vCash: 500
Quote:
Originally Posted by footitt View Post
I included age as my dependent variable, and the graphs I get really don't show me too much.

I can send them to you, maybe I'm doing something wrong.

Not too sure how I can use this data to help look forward. Would I want to analyze each individual player, and extrapolate from there? Or just look at groups of players and see which are the standouts?
If you include age, it should be an independent (X) variable.

What categories are counted in your fantasy league?

I tend to think this is sort of like building a laser device to measure shoe size... much more complicated than it needs to be. Probably you should focus on situations where you expect players to outperform their past seasons, due to being at peak age, getting increased playing time, or having better linemates than in the past. Remember that goal scorers tend to peak earlier (many in their early 20s and most by their mid 20s) than playmakers.

01-11-2013, 05:34 PM
  #9
footitt
Registered User
 
Join Date: Jan 2013
Posts: 7
vCash: 500
Quote:
Originally Posted by Czech Your Math View Post
If you include age, it should be an independent (X) variable.

What categories are counted in your fantasy league?
Yes, that's what I have it set as. The problem is that when I have 150 players' stats over the majority of their careers, it doesn't tell me much when I look at it all at once.

It's goals, assists, power play points, and hits. And it's an 8-man league.

What I've figured I'll do is sort the players into groups of 20 or so with similar rankings, which should give me a better idea of which players to look for in each draft round.

Quote:
I tend to think this is sort of like building a laser device to measure shoe size... much more complicated than it needs to be. Probably you should focus on situations where you expect players to outperform their past seasons, due to being at peak age, getting increased playing time, or having better linemates than in the past. Remember that goal scorers tend to peak earlier (many in their early 20s and most by their mid 20s) than playmakers.
Ya I realize all of that. I just felt that there would be a way to better back up my intuition, and I feel that having some numbers behind it really adds to it.

01-11-2013, 06:20 PM
  #10
SmellOfVictory
Registered User
 
Join Date: Jun 2011
Posts: 5,186
vCash: 50
It would be limited, but you could control for age (say 22-30), take a per-game average of the past three seasons per player and then look at the delta between last season and that average. You'd need to do some weeding out for injuries, etc., but for the most part you'd have a decent time finding guys who underperformed compared to expectations last season, who would then be "buy low" candidates on the draft board.
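
A rough Python sketch of that three-season comparison, with made-up players and figures. The helper names are hypothetical, and pooling points and games across the three seasons (rather than averaging three separate per-game rates) is an assumption of this sketch. The age filter and the injury weeding are left out.

```python
# Sketch of the "buy low" screen: compare last season's per-game rate with the
# per-game rate over the past three seasons. All figures are hypothetical.

def per_game(points, games):
    """Points per game, or 0.0 when no games were played."""
    return points / games if games else 0.0

def buy_low_delta(seasons):
    """seasons: list of (points, games) tuples, oldest first; last three used.

    Returns last season's per-game rate minus the pooled three-season rate.
    A negative value means the player was below his recent norm last season.
    """
    recent = seasons[-3:]
    total_points = sum(p for p, _ in recent)
    total_games = sum(g for _, g in recent)
    last_points, last_games = recent[-1]
    return per_game(last_points, last_games) - per_game(total_points, total_games)

players = {
    "player_a": [(70, 82), (75, 80), (50, 78)],   # dipped last season
    "player_b": [(40, 82), (42, 82), (60, 82)],   # jumped last season
}

deltas = {name: buy_low_delta(s) for name, s in players.items()}
candidates = sorted(name for name, d in deltas.items() if d < 0)
print(candidates)
```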

You could also go to behindthenet.ca and just look for guys above a certain TOI/game who have a low PDO. Beyond looking in terms of value based on variance, I don't see a particularly good way of using statistical analysis to find good picks. Everything else is a dog's breakfast of assumptions about how the player will be used the coming season, who their linemates may be, their ozone starts, etc.

01-11-2013, 06:30 PM
  #11
Czech Your Math
Registered User
 
Join Date: Jan 2006
Location: bohemia
Country: Czech_ Republic
Posts: 3,721
vCash: 500
Quote:
Originally Posted by footitt View Post
Yes, that's what I have it set as. The problem is that when I have 150 players' stats over the majority of their careers, it doesn't tell me much when I look at it all at once.

It's goals, assists, power play points, and hits. And it's an 8-man league.

What I've figured I'll do is sort the players into groups of 20 or so with similar rankings, which should give me a better idea of which players to look for in each draft round.

Ya I realize all of that. I just felt that there would be a way to better back up my intuition, and I feel that having some numbers behind it really adds to it.
Perhaps the most useful type of regression would be a time series. Basically, your dependent and independent variables are the same, except the independent variables are time lagged. For instance, for goals:

Y = B0 + M1*X1 + M2*X2 + ..., where X1 is the value at T-1 and X2 is the value at T-2

I.e., Y could be goals in 2012, X1 is goals in 2011, X2 is goals in 2010, etc., all for the same player. Another Y would be goals in 2011, with X1 then being goals in 2010, X2 goals in 2009, etc.

You can try different combos, but I would guess doing separate studies for each category would work best. Rather than just use raw goals, using adjusted GPG is probably going to yield more useful coefficients (otherwise variability in games may affect them as much or more than skill level).

I did a quick, simple study as an example, using (when possible) Y seasons of 2008-2012 for each of several players (Crosby, Malkin, Ovechkin, Stamkos, St. Louis, H.Sedin, Kovalchuk, Thornton, and Iginla):

Adjusted Total GPG: Y = .128 + .406*X1 + .33*X2
Adjusted Total APG: Y = .292 + .531*X1 + .078*X2

(Y = per-game metric in Year T, X1 = same metric in Year T-1, X2 = same metric in Year T-2).

The further back you lag the series, the more observations you lose, and the more likely it is that those further lagged variables will be insignificant. Also the Y-intercept (e.g. .128 for GPG in the above example) is going to vary with skill level, so you may have to either group players by general skill level in each category, or not use a Y-intercept.

For GPG, lagged independent variables such as shots/game or Sh% might also be useful.
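
To show how the two fitted equations would be used, here is a small Python sketch. The coefficients are the ones from my quick study above. The input rates, the `project` helper and the constant names are invented for the example. Keep in mind the fit came from a handful of elite scorers, so the intercepts would likely overstate things for lesser players.

```python
# Applying the two fitted equations quoted in the post to project a per-game rate.
# The coefficients come from the post; the input rates are made up.

GPG_COEFFS = (0.128, 0.406, 0.33)    # intercept, weight on T-1, weight on T-2
APG_COEFFS = (0.292, 0.531, 0.078)

def project(coeffs, last_season, two_seasons_ago):
    """Y = intercept + w1 * X1 + w2 * X2 for one player."""
    intercept, w1, w2 = coeffs
    return intercept + w1 * last_season + w2 * two_seasons_ago

# Hypothetical player: 0.50 adjusted GPG last season, 0.60 the season before.
projected_gpg = project(GPG_COEFFS, 0.50, 0.60)
print(round(projected_gpg, 3))   # 0.529
```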


Last edited by Czech Your Math: 01-11-2013 at 06:36 PM.
01-11-2013, 09:29 PM
  #12
Gibson Les Palms
Registered User
 
Join Date: Jul 2010
Location: Calgary
Country: Canada
Posts: 10,755
vCash: 814
I did something like this for an econometrics project once. I found sh% to be a very, very effective variable..

01-11-2013, 09:40 PM
  #13
GKJ
Global Moderator
Entertainment
 
Join Date: Feb 2002
Location: Do not trade plz
Country: United States
Posts: 112,496
vCash: 50
Quote:
Originally Posted by Jesus Teemu View Post
I did something like this for an econometrics project once. I found sh% to be a very, very effective variable..
You can make a lot of sense fantasy-wise using shooting percentage.


Here's one hint: let someone who doesn't know any better draft Radim Vrbata

01-12-2013, 10:56 AM
  #14
footitt
Registered User
 
Join Date: Jan 2013
Posts: 7
vCash: 500
Quote:
Originally Posted by Czech Your Math View Post
Perhaps the most useful type of regression would be a time series. Basically, your dependent and independent variables are the same, except the independent variables are time lagged. For instance, for goals:

Y = B0 + M1*X1 + M2*X2 + ..., where X1 is the value at T-1 and X2 is the value at T-2

I.e., Y could be goals in 2012, X1 is goals in 2011, X2 is goals in 2010, etc., all for the same player. Another Y would be goals in 2011, with X1 then being goals in 2010, X2 goals in 2009, etc.

You can try different combos, but I would guess doing separate studies for each category would work best. Rather than just use raw goals, using adjusted GPG is probably going to yield more useful coefficients (otherwise variability in games may affect them as much or more than skill level).

I did a quick, simple study as an example, using (when possible) Y seasons of 2008-2012 for each of several players (Crosby, Malkin, Ovechkin, Stamkos, St. Louis, H.Sedin, Kovalchuk, Thornton, and Iginla):

Adjusted Total GPG: Y = .128 + .406*X1 + .33*X2
Adjusted Total APG: Y = .292 + .531*X1 + .078*X2

(Y = per-game metric in Year T, X1 = same metric in Year T-1, X2 = same metric in Year T-2).

The further back you lag the series, the more observations you lose, and the more likely it is that those further lagged variables will be insignificant. Also the Y-intercept (e.g. .128 for GPG in the above example) is going to vary with skill level, so you may have to either group players by general skill level in each category, or not use a Y-intercept.

For GPG, lagged independent variables such as shots/game or Sh% might also be useful.
I understand the concept behind this, but I do not know how to execute it in Excel. Given that my draft is in 4 hours, I don't think I'm going to have enough time to do this.

I did go ahead though and sort players into groups of similar skill level/draft position. The groups range in size from 4-8 players. I also have a draft strategy, where each round I am picking from a corresponding group. I have 6 groups for forwards, 2 for defensemen and 2 for goalies.

This strategy gives me a good grasp of which players to look out for, and when. The only really big issue is that they aren't organized by position, so I have to monitor that on the fly so that I don't draft 4 centremen and only 1 winger (our league has 2 spots each at C, LW and RW, plus 4 D, 2 goalies and 1 utility spot).

All in all I feel comfortable and confident going into this draft.

01-12-2013, 01:24 PM
  #15
Czech Your Math
Registered User
 
Join Date: Jan 2006
Location: bohemia
Country: Czech_ Republic
Posts: 3,721
vCash: 500
Quote:
Originally Posted by footitt View Post
I understand the concept behind this, but I do not know how to execute it in Excel. Given that my draft is in 4 hours, I don't think I'm going to have enough time to do this.

I did go ahead though and sort players into groups of similar skill level/draft position. The groups range in size from 4-8 players. I also have a draft strategy, where each round I am picking from a corresponding group. I have 6 groups for forwards, 2 for defensemen and 2 for goalies.

This strategy gives me a good grasp of which players to look out for, and when. The only really big issue is that they aren't organized by position, so I have to monitor that on the fly so that I don't draft 4 centremen and only 1 winger (our league has 2 spots each at C, LW and RW, plus 4 D, 2 goalies and 1 utility spot).

All in all I feel comfortable and confident going into this draft.
I think the main thing is to have a range for each player based on past performance. Using regression for this is probably more trouble than it's worth. However, for future reference, this is how you would construct a time series.

For each player included in the study, do the following:

Let's use Ovechkin's 7 seasons from '06-'12 as an example. Calculate the Y variable in the manner you believe is most reliable for your study. For goals, I might suggest using "adjusted GPG." Once you have calculated this, it will be your dependent (Y) variable and should be in one column. So Ovechkin's adjusted GPG for seasons '06-'12 might be in cells C1-C7.

Next, label your independent (X) variables, which will be time lagged from your Y variable. You might label them T-1, T-2, etc. You would then copy and paste cells C1-C6 into cells D2-D7 for variable T-1. You don't copy cell C7, because that would be his adjusted GPG for 2012, and couldn't be used until at least 2013. You don't copy anything into cell D1, because he has no data before 2006. For variable T-2, you would copy cells C1-C5 (or D2-D6) into cells E3-E7.

Again, for each season you lag, you would lose one observation (i.e., if only using T-1, then lose Y for 2006... if using T-2 also, then also lose Y for 2007). You don't want any gaps in your X variables (e.g., having a T-1, but no T-2), as this will affect your results. If you only used T-1 & T-2 as X variables, and found T-2 to be an insignificant variable, then you would recalculate the regression only using T-1.

Hope this makes some sense and may be useful to someone.
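
For anyone who would rather script it than copy cells around, here is a rough Python sketch of the same layout. It shifts each player's Y column by one and two seasons, drops the seasons that lack full lags, pools the rows and fits Y on T-1 and T-2. The players and values are invented, and the helper names (`lagged_rows`, `solve`, `fit`) are hypothetical. The normal-equation solver is only one simple way to get the coefficients, not necessarily how a spreadsheet computes them.

```python
# Sketch of the spreadsheet layout in code: one column of Y values per player,
# shifted down by one row for T-1 and two rows for T-2. Values are invented.

def lagged_rows(values, n_lags=2):
    """Return (y, [T-1, T-2, ...]) rows, dropping seasons without full lags."""
    rows = []
    for i in range(n_lags, len(values)):
        lags = [values[i - k] for k in range(1, n_lags + 1)]
        rows.append((values[i], lags))
    return rows

def solve(a, b):
    """Solve a small linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit(rows):
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations."""
    design = [[1.0] + lags for _, lags in rows]
    ys = [y for y, _ in rows]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)

# Seven seasons of adjusted GPG for two hypothetical players, oldest first.
adjusted_gpg = {
    "player_a": [0.62, 0.55, 0.78, 0.70, 0.66, 0.41, 0.49],
    "player_b": [0.40, 0.47, 0.44, 0.52, 0.50, 0.46, 0.43],
}

rows = []
for values in adjusted_gpg.values():
    rows.extend(lagged_rows(values))   # lag within each player, then pool

coefficients = fit(rows)
print(len(rows), [round(c, 3) for c in coefficients])
```

Each player with seven seasons contributes five usable rows, matching the two observations lost to the T-1 and T-2 lags.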

01-12-2013, 04:53 PM
  #16
footitt
Registered User
 
Join Date: Jan 2013
Posts: 7
vCash: 500
Quote:
Let's use Ovechkin's 7 seasons from '06-'12 as an example. Calculate the Y variable in the manner you believe is most reliable for your study. For goals, I might suggest using "adjusted GPG." Once you have calculated this, it will be your dependent (Y) variable and should be in one column. So Ovechkin's adjusted GPG for seasons '06-'12 might be in cells C1-C7. Next, label your independent (X) variables, which will be time lagged from your Y variable. You might label them T-1, T-2, etc. You would then copy and paste cells C1-C6 into cells D2-D7 for variable T-1. You don't copy cell C7, because that would be his adjusted GPG for 2012, and couldn't be used until at least 2013. You don't copy anything into cell D1, because he has no data before 2006. For variable T-2, you would copy cells C1-C5 (or D2-D6) into cells E3-E7. Again, for each season you lag, you would lose one observation (i.e., if only using T-1, then lose Y for 2006... if using T-2 also, then also lose Y for 2007). You don't want any gaps in your X variables (e.g., having a T-1, but no T-2), as this will affect your results. If you only used T-1 & T-2 as X variables, and found T-2 to be an insignificant variable, then you would recalculate the regression only using T-1.
Well, turns out my draft got postponed till tomorrow. Perfect, gives me time to try this out.

01-13-2013, 11:10 AM
  #17
Yossarian54
Registered User
 
Join Date: Oct 2011
Location: Perth, WA
Country: Australia
Posts: 1,102
vCash: 500
Quote:
Originally Posted by footitt View Post
Not too sure how I can use this data to help look forward. Would I want to analyze each individual player, and extrapolate from there? Or just look at groups of players and see which are the standouts?
I use the previous three years' data in a similar way (adjusted depending on age, sh%, etc.).

The main use I find is to spot inconsistencies between your data, pre-draft rankings and actual draft order (i.e. in real time if you can organise your spreadsheet to allow you to do so). This hopefully allows you to pick up a few 'bargains'.
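
A minimal sketch of that comparison in Python, assuming you have your own ranking and a pre-draft ranking keyed by player. The names, ranks, the `find_bargains` helper and the `min_gap` threshold are all placeholders invented for the example.

```python
# Sketch: flag players whose rank in your own projections is well ahead of
# their pre-draft (consensus) rank. Names and ranks are placeholders.

def find_bargains(my_rank, consensus_rank, min_gap=5):
    """Return (player, gap) pairs sorted by gap, largest first.

    gap = consensus rank - my rank, so a positive gap means the player is
    ranked later by consensus than by your own numbers.
    """
    gaps = []
    for player, mine in my_rank.items():
        if player in consensus_rank:
            gap = consensus_rank[player] - mine
            if gap >= min_gap:
                gaps.append((player, gap))
    return sorted(gaps, key=lambda item: item[1], reverse=True)

my_rank = {"player_a": 12, "player_b": 30, "player_c": 45}
consensus_rank = {"player_a": 25, "player_b": 28, "player_c": 60}

bargains = find_bargains(my_rank, consensus_rank)
print(bargains)   # [('player_c', 15), ('player_a', 13)]
```

During the draft itself, removing already-picked players from both dictionaries before each call keeps the list current.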

01-13-2013, 06:37 PM
  #18
Hardyvan123
tweet@HardyintheWack
 
Join Date: Jul 2010
Location: Vancouver
Country: Canada
Posts: 13,118
vCash: 500
I'm not a real sophisticated stats guy, but what about using your data for the years '11, '10 and '09 and running a "projection" for '12?

At the very least you could find out how accurate it is, and you could repeat the test going back as far as we have stats.
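
Something like this rough Python sketch is what I have in mind: project a season that has already been played from the three before it, then measure the miss. The weights, the players and every number here are assumptions made up for the illustration, and the weighted average is only a stand-in for whatever projection method you actually use.

```python
# Sketch of a backtest: project a season that has already been played from the
# three seasons before it, then compare with what actually happened.
# The weights and all stats below are assumptions chosen for illustration.

WEIGHTS = (0.5, 0.3, 0.2)   # most recent season first; hypothetical weights

def project(prior_seasons):
    """prior_seasons: per-game rates for '11, '10, '09 (most recent first)."""
    return sum(w * v for w, v in zip(WEIGHTS, prior_seasons))

def mean_absolute_error(players):
    """Average absolute gap between projected and actual rates."""
    errors = [abs(project(prior) - actual) for prior, actual in players.values()]
    return sum(errors) / len(errors)

# player: ((rate in '11, '10, '09), actual rate in '12), all invented.
history = {
    "player_a": ((0.90, 1.00, 1.10), 0.95),
    "player_b": ((0.60, 0.50, 0.40), 0.65),
}

mae = mean_absolute_error(history)
print(round(mae, 3))   # 0.07
```

Running the same check on earlier seasons would show whether the error is stable from year to year.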
