
 By The Numbers Hockey Analytics... the Final Frontier. Explore strange new worlds, to seek out new algorithms, to boldly go where no one has gone before.

08-18-2011, 06:07 PM
#151
seventieslord
Student Of The Game

Join Date: Mar 2006
Location: Regina, SK
Country:
Posts: 30,158
vCash: 500
Quote:
 Originally Posted by plusandminus Let's see what that results in... Made up example: Player plays 80 games. ESGF=80. ESGA=40. 80-40 = +40 (+0.5 per game). 80/40 = 2.0 (that's R-On). 2.0 * 80 = +160. +40 = +160 ?? (Doesn't sound logical, but maybe I've missed something.) And then we multiply that by the scoring equivalent.
No, you're taking me literally. +/- is ultimately the player's R-On times how long they maintained that R-On, and what kind of scoring environment they played in. I can demonstrate this without even using a number.

Say players A and B both played in an era with the exact same scoring level (in the games in which they each played) and both achieved the same R-On, but player A played twice as many games. Player A's +/- would obviously be exactly double the +/- of player B's.

This is the important part, but it's the same concept: Say players A and B both had the exact same R-On and the same number of games played, but there were twice as many goals, on average, in player A's era. Player A's +/- would obviously be exactly double the +/- of player B, despite both having the same R-On. In other words, under your proposed formula, player A gets an advantage by playing in a time when goals were more abundant.
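
A quick numerical check of the scaling argument above, as a sketch only. It assumes the player is on the ice for a fixed number of even-strength goals per game (the hypothetical parameter `goals_per_game`, standing in for the scoring environment); the function name is illustrative and not something used in this thread.

```python
def plus_minus(r_on: float, games: int, goals_per_game: float) -> float:
    """Plus-minus implied by an on-ice GF/GA ratio.

    Assumption: total on-ice goals (GF + GA) = games * goals_per_game.
    With GF / GA = r_on, GF = total * r_on / (1 + r_on) and GA = total / (1 + r_on).
    """
    total = games * goals_per_game
    goals_for = total * r_on / (1 + r_on)
    goals_against = total / (1 + r_on)
    return goals_for - goals_against


# Same R-On throughout; only duration or scoring level changes.
base = plus_minus(r_on=2.0, games=80, goals_per_game=1.5)
double_games = plus_minus(r_on=2.0, games=160, goals_per_game=1.5)
double_scoring = plus_minus(r_on=2.0, games=80, goals_per_game=3.0)

print(base, double_games, double_scoring)
```

The base case reproduces the made-up player quoted above (80 ESGF, 40 ESGA, +40). Doubling either the games or the scoring level doubles the +/- while R-On stays at 2.0.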

Quote:
 I thought the +/- used in the division was a +/- adjusted for era, so that shouldn't make any difference. If R-Off is OK too, then I think my points stand. We'll see what overpass and others say, if they comment.
No, you were using their raw +/- in your calculation.

Quote:
 To me it does.
The above should conclusively demonstrate to you that it doesn't.

Quote:
 OK. You started this dialogue by claiming I was wrong. Since it was the xth time during the last weeks, and I think you've been wrong (at least most of) the earlier times, I took the time to argue with you.
Well, of course you think I'm wrong if my opinion is not yours! What a surprise.

08-18-2011, 06:28 PM
#152
overpass
Registered User

Join Date: Jun 2007
Posts: 3,907
vCash: 500
Looks like there's a lot of interest in the exact calculation, so I'll lay it out clearly here.

Calculate XEV+/- by

1. Keep (\$ESGF+\$ESGA) constant. (This is an assumption that probably works slightly to the disadvantage of great defensive players and to the advantage of great offensive players, when you think about it. But I don't think I can make any other assumptions with available data and it's not a big deal.)

2. Take R-OFF to the exponent 0.65 to find the GF/GA ratio used for XEV+/-. Calculate a plus-minus figure given the total number of goals from step 1 and this ratio.

XEV+/- = ((\$ESGF+\$ESGA)/(1+R-OFF^0.65)*R-OFF^0.65) - (\$ESGF+\$ESGA)/(1+R-OFF^0.65)

So if the player has an R-OFF of 0.90, 250 \$ESGF, and 250 \$ESGA,

=(250+250)/(1+0.933)*0.933 - (250+250)/(1+0.933)
=241 - 259
=-18

And the adjusted plus-minus figure is just (EV+/- adjusted for era) - (XEV+/-)

or \$ESGF - \$ESGA - XEV+/-
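
The two steps above can be sketched in a few lines of Python. The function and parameter names are illustrative assumptions (they do not come from the post); the arithmetic follows the formula as written, with the era-adjusted goal totals passed in directly.

```python
def expected_ev_plus_minus(adj_esgf: float, adj_esga: float,
                           r_off: float, exponent: float = 0.65) -> float:
    """XEV+/-: the plus-minus expected from team strength alone."""
    total = adj_esgf + adj_esga        # step 1: total on-ice goals held constant
    ratio = r_off ** exponent          # step 2: R-OFF regressed toward 1.0
    expected_gf = total / (1 + ratio) * ratio
    expected_ga = total / (1 + ratio)
    return expected_gf - expected_ga


def adjusted_plus_minus(adj_esgf: float, adj_esga: float,
                        r_off: float, exponent: float = 0.65) -> float:
    """Era-adjusted EV+/- minus XEV+/-."""
    xev = expected_ev_plus_minus(adj_esgf, adj_esga, r_off, exponent)
    return (adj_esgf - adj_esga) - xev


# The worked example from the post: R-OFF 0.90, 250 adjusted ESGF, 250 adjusted ESGA.
xev = expected_ev_plus_minus(250, 250, 0.90)
aev = adjusted_plus_minus(250, 250, 0.90)
print(round(xev, 1), round(aev, 1))
```

Without intermediate rounding this comes out at roughly -17 rather than the hand-rounded -18 shown above; the gap is purely rounding of the two goal totals to 241 and 259.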

Why choose 0.65 as the exponent?

Let's look at the all-time highest and lowest players in XEV+/-. If it's calculated correctly, the two lists should be similar in adjusted plus-minus.

To try to make the lists similar in quality, I'll limit them to players with between 1000 and 1200 NHL regular season games played (a rough measure of career quality that is independent from plus-minus except in that they both measure quality to some degree).

Player              R-ON  R-OFF  XEV+/-  EV+/-  AEV+/-
Serge Savard        1.44   1.52     318    425     107
Wayne Cashman       1.45   1.43     199    316     117
Guy Lafleur         1.67   1.35     193    492     299
Bob Gainey          1.23   1.51     177    136     -41
Phil Esposito       1.25   1.25     162    253      91
Eric Desjardins     1.27   1.25     157    255      98
Denis Potvin        1.49   1.23     145    420     275
Brad Park           1.40   1.20     144    400     256
Gary Suter          1.19   1.24     143    183      40
Jean Ratelle        1.51   1.27     135    353     218
Glenn Anderson      1.24   1.21     116    201      85
John Tonelli        1.34   1.25     112    221     109
Kris Draper         1.04   1.31     105     21     -84
Charlie Huddy       1.17   1.16     100    156      56
Bobby Carpenter     0.90   1.20      98    -87    -185
Brad Marsh          0.96   1.18      97    -32    -130
Rick Middleton      1.20   1.20      96    148      51
Bobby Clarke        1.79   1.20      93    454     361
Sergei Zubov        1.25   1.13      86    237     152
Gilbert Perreault   1.08   1.12      83     85       3
Mike Keane          1.06   1.21      81     40     -41
Average             1.26   1.25     135    223      87

Player              R-ON  R-OFF  XEV+/-  EV+/-
Dale Hawerchuk      1.00   0.93     -50     -2
Sergei Gonchar      1.13   0.92     -54    121
Mike Sillinger      0.69   0.89     -54   -258
Jay Wells           0.92   0.90     -56    -69
Bernie Federko      0.94   0.90     -58    -50
Ivan Boldirev       0.85   0.90     -59   -139
Andrew Cassels      0.92   0.88     -61    -65
Doug Wilson         1.08   0.90     -65     80
Jarome Iginla       1.14   0.90     -65    126
Harold Snepsts      0.90   0.88     -68    -86
Dave Ellett         0.97   0.90     -68    -34
Fredrik Olausson    1.04   0.88     -70     34
Ray Whitney         0.97   0.88     -76    -25
Andrew Brunette     1.01   0.84     -79      7
Rob Ramage          0.88   0.88     -83   -124
Geoff Sanderson     0.90   0.83     -92    -76
Don Lever           0.77   0.82     -96   -203
Dave Taylor         1.31   0.84     -99    234
Randy Carlyle       0.92   0.86    -102    -88
Borje Salming       1.14   0.82    -169    172
Average             0.96   0.88     -76    -22

Notice that the players with the higher XEV+/- (due to their higher R-OFF) still have a higher adjusted plus-minus than the players with the lower XEV+/- and lower R-OFF, suggesting that the adjustment is not too high.

I also ran a regression predicting R-ON using R-OFF for all players with between 500 and 1000 GP between 1968 and 2011, n=881. I chose those GP cutoffs in an attempt to pick players with a decent NHL career, but leaving out most of the great players. This regression gave me the exponent 0.736 to apply to R-OFF to predict R-ON.
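
The post does not say how the regression was specified. One plausible reading, sketched below purely as an assumption, is a log-log fit through the origin: if R-ON = R-OFF^k, then ln(R-ON) = k * ln(R-OFF), and k is the least-squares slope. `fit_exponent` is a hypothetical helper, and the handful of (R-OFF, R-ON) pairs taken from the tables above is only there to show usage; it is not the n=881 sample.

```python
import math


def fit_exponent(pairs):
    """Least-squares k for ln(R-ON) = k * ln(R-OFF), fitted through the origin.

    pairs: iterable of (r_off, r_on) tuples, all values > 0.
    """
    xs = [math.log(r_off) for r_off, _ in pairs]
    ys = [math.log(r_on) for _, r_on in pairs]
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("every R-OFF equals 1.0, so the exponent is undetermined")
    return sum(x * y for x, y in zip(xs, ys)) / sxx


# Sanity check on synthetic data that follows an exponent of 0.7 exactly.
synthetic = [(r, r ** 0.7) for r in (0.8, 0.9, 1.1, 1.3)]
print(round(fit_exponent(synthetic), 3))

# Illustrative only: (R-OFF, R-ON) for Savard, Lafleur, Doug Wilson and Salming.
sample = [(1.52, 1.44), (1.35, 1.67), (0.90, 1.08), (0.82, 1.14)]
print(round(fit_exponent(sample), 3))
```

A fit with an intercept, or a fit weighted by games played, would be equally defensible readings and would give different exponents.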

Both these tests may be affected by the fact that NHL talent distribution has been fairly uneven at times since 1968, and good players have sometimes tended to end up on teams with other good players due to disparities in management quality. As a result I don't adjust as much as either of these tests suggest. And maybe I should adjust even less - it's hard to say for sure.

08-19-2011, 08:16 AM
#153
Czech Your Math
Registered User

Join Date: Jan 2006
Location: bohemia
Country:
Posts: 4,841
vCash: 500
Even Strength "win shares"

I have an alternative that might be fairer to players on great teams without using somewhat arbitrary regressions to the mean. It's not exactly comparable to adjusted plus-minus, but it uses much of the same methodology. For lack of a better term, I might call it "even strength value".

It has two primary components:

1. Player's share of team success at even strength
2. Player's marginal (additional) success at even strength

Once you calculate each component, simply add them together.

Player's share of team success at ES is calculated as:

82 * (Team Exp. ES Win %) * (Player's ESGF + ESGA) / (Team's ESGF + ESGA)

where Team Expected ES Win % = (ESGF)^N / (ESGF^N + ESGA^N)

this is the pythagorean win formula; N = 2 (or another number if supported by data)

Player's marginal contribution to team success is calculated as follows:

Subtract the player's ESGF and ESGA from the team's totals.
Recalculate the ES Win % from the new numbers (this is ES Win % without player).
Subtract Team Exp. ES Win % from ES Win % without player.
Multiply the difference in Win % by 82 to yield player's marginal contribution.

Then add player's share of team success at ES and player's marginal contribution to team success at ES to get "ES Value" (whatever is proper term).

The results are in the same ballpark as plus-minus. Some career results that I just calculated:

Bossy 220 (24.0/82)
Potvin 324 (25.1/82)
Gretzky 479 (26.4/82)
Lemieux 277 (24.8/82)
Forsberg 223 (25.9/82)
Lindros 243 (26.3/82)
Jagr 393 (25.3/82)

Last edited by Czech Your Math: 08-19-2011 at 08:21 AM.
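
A minimal sketch of the "ES Value" recipe in post #153, under two stated assumptions: the function names and the sample team below are made up for illustration, and the marginal term is computed as (team win % minus win % without the player), so that a player who lifts his team gets a positive number. The post words the subtraction the other way round, so treat the sign as an interpretation.

```python
def pythagorean_win_pct(gf: float, ga: float, n: float = 2.0) -> float:
    """Expected win % from goals for and against (Pythagorean formula)."""
    return gf ** n / (gf ** n + ga ** n)


def es_value(player_gf: float, player_ga: float,
             team_gf: float, team_ga: float,
             games: int = 82, n: float = 2.0):
    """Return (share, marginal) for one player-season at even strength."""
    team_pct = pythagorean_win_pct(team_gf, team_ga, n)

    # Component 1: the player's share of team success, by share of on-ice goals.
    share = games * team_pct * (player_gf + player_ga) / (team_gf + team_ga)

    # Component 2: how much the team's expected win % drops without the player.
    # Sign convention is an assumption (see the note above the code).
    without_pct = pythagorean_win_pct(team_gf - player_gf, team_ga - player_ga, n)
    marginal = games * (team_pct - without_pct)

    return share, marginal


# Hypothetical team: 200 ESGF, 160 ESGA; player on the ice for 80 GF, 50 GA.
share, marginal = es_value(80, 50, 200, 160)
print(round(share, 2), round(marginal, 2), round(share + marginal, 2))
```

A break-even player on a break-even team gets a marginal term of zero, so his value is just his share of the on-ice goals times 41 expected wins.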
08-19-2011, 08:22 AM
#154
BraveCanadian
Registered User

Join Date: Jun 2010
Country:
Posts: 11,253
vCash: 500
Quote:
 Originally Posted by overpass Why choose 0.65 as the exponent? Let's look at the all-time highest and lowest players in XEV+/-. If it's calculated correctly, the two lists should be similar in adjusted plus-minus. To try to make the lists similar in quality, I'll limit them to players with between 1000 and 1200 NHL regular season games played (a rough measure of career quality that is independent from plus-minus except in that they both measure quality to some degree). Notice that the players with the higher XEV+/- (due to their higher R-OFF) still have a higher adjusted plus-minus than the players with the lower XEV+/- and lower R-OFF, suggesting that the adjustment is not too high. I also ran a regression predicting R-ON using R-OFF for all players with between 500 and 1000 GP between 1968 and 2011, n=881. I chose those GP cutoffs in an attempt to pick players with a decent NHL career, but leaving out most of the great players. This regression gave me the exponent 0.736 to apply to R-OFF to predict R-ON. Both these tests may be affected by the fact that NHL talent distribution has been fairly uneven at times since 1968, and good players have sometimes tended to end up on teams with other good players due to disparities in management quality. As a result I don't adjust as much as either of these tests suggest. And maybe I should adjust even less - it's hard to say for sure.
So this adjustment is the way it is, and has been updated a couple of times by you since it was first posted, to get it to look right?

I get that you chose the exponent for what look like good reasons but aren't we in essence saying here that we are massaging this into giving us what we went looking for?

08-19-2011, 12:34 PM
#155
seventieslord
Student Of The Game

Join Date: Mar 2006
Location: Regina, SK
Country:
Posts: 30,158
vCash: 500
Quote:
 Originally Posted by BraveCanadian So this adjustment is the way it is, and has been updated a couple of times by you since it was first posted, to get it to look right? I get that you chose the exponent for what look like good reasons but aren't we in essence saying here that we are massaging this into giving us what we went looking for?
It sounds arbitrary, but it is well backed up statistically, and more importantly it is logical. Two players who both outperformed their team's ratio by 20% (one who took it from 0.80 to 0.96 and another who took it from 1.10 to 1.32) should not be viewed equally. It's tougher and more impressive to outperform your team's already strong results.
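
To put numbers on that example, here is a small sketch that compares each player's R-ON with the expected ratio R-OFF^0.65 from overpass's formula. Dividing R-ON by that expected ratio is an illustrative measure chosen for this sketch (overpass's actual stat is a goal difference), and the function names are assumptions.

```python
EXPONENT = 0.65  # overpass's exponent from post #152


def expected_ratio(r_off: float, exponent: float = EXPONENT) -> float:
    """GF/GA ratio expected from team strength, regressed toward 1.0."""
    return r_off ** exponent


def outperformance(r_on: float, r_off: float, exponent: float = EXPONENT) -> float:
    """R-ON relative to the regressed team baseline (illustrative measure)."""
    return r_on / expected_ratio(r_off, exponent)


weak_team_raw = 0.96 / 0.80      # both players beat raw R-OFF by 20%
strong_team_raw = 1.32 / 1.10

weak_team = outperformance(0.96, 0.80)
strong_team = outperformance(1.32, 1.10)

print(round(weak_team_raw, 2), round(strong_team_raw, 2))
print(round(weak_team, 2), round(strong_team, 2))
```

Against the raw R-OFF the two players are tied at 1.2. Against the regressed baseline the player on the strong team comes out ahead, which is the direction argued for above.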

08-19-2011, 02:51 PM
#156
BraveCanadian
Registered User

Join Date: Jun 2010
Country:
Posts: 11,253
vCash: 500
Quote:
 Originally Posted by seventieslord it sounds arbitrary, but it is well backed up statistically, and more importantly it is logical. Simply because two players who both outperformed their team's ratio by 20% (one who took it from 0.80 to 0.96 and another who took it from 1.10 to 1.32) should not be viewed equally. It's tougher and more impressive to outperform your team's already strong results.
It doesn't just sound arbitrary - it is arbitrary - even if it is an educated guess that we think "looks right".

Secondly, going from 0.80 to 0.96 might actually be more difficult considering what that player has to work with in comparison to the guy on the stronger overall team.

You certainly can't say for sure based on a purely hypothetical case with just those numbers. That could go either way.

It could be harder for either one dependent on the environment they play in as well as the competition.

08-19-2011, 03:30 PM
#157
plusandminus
Registered User

Join Date: Mar 2011
Posts: 980
vCash: 500
Quote:
 Originally Posted by seventieslord As someone who understands numbers you should understand why an accumulated total should not just be divided by a ratio for no good statistical reason other than you think it looks good. A ratio over a ratio at least makes sense logically. ... I am sure the "best guys" will [explain it] eventually though.
Here is what overpass himself wrote yesterday (with me bolding some text):
Quote:
 Originally Posted by overpass ... 2. Take R-OFF to the exponent 0.65 to find the GF/GA ratio used for XEV+/-. Calculate a plus-minus figure given the total number of goals from step 1 and this ratio. XEV+/-=(\$ESGF+\$ESGA)/(1+R-OFF^0.65)*R-OFF^0.65) - (\$ESGF+\$ESGA)/(1+R-OFF^0.65)
You do see that he involves R-Off in the division?
He did post that after you several times had tried to lecture me that one can't.
He also, and you too in your last post, admits that it is even being done based on (at least partly) what we think "looks good".

In all honesty, I do hope you recognize things like these. Or will you continue this dialogue by continuing to lecture me about how I don't understand how to use stats, etc.?

Several times in this thread you have tried to correct me regarding something I apparently was right about, and which I still think you were pretty wrong about. I don't write statements randomly. I actually have long experience of handling stats (including hockey stats), may have done more in-depth studies than you have done, and if you think I'm young and inexperienced I can inform you that I seem to be about ten years older than you.

What I just wrote doesn't necessarily say (and definitely doesn't "prove") anything. I am sometimes wrong about things. Others are too, including the "best" ones. I should of course be "judged" on what I write, and single statements be "judged" by their validity/correctness. But in your case, it seems you just "think" things are a certain way - perhaps because of some preconception - rather than have an open mind.
To try an analogy (maybe not the best one):
For a stats-experienced guy like me, it feels like when people make dead-sure statements about players they really don't know much about (as a guy with an interest in the history of hockey, I'm sure you've seen many examples of that). Just as veterans here might think it's a bit disrespectful when, e.g., young people make dead-sure statements about things they seem to know little about, you may do well to consider that stats-experienced guys may feel it's a bit disrespectful to get "dead sure" statements like the ones you made in this case.

Everyone and everything should of course be open for questioning. Everyone can be wrong about everything, and even if not wrong, the debate can be good. I've just reacted to what I've experienced as some kind of "pattern" from your side. Of course I am free to leave the board if I don't like it (in case anyone would suggest that).

Wonder if I get even more "bashed" after writing this.

Last edited by plusandminus: 08-19-2011 at 03:31 PM. Reason: spelling

08-19-2011, 05:25 PM
#158
seventieslord
Student Of The Game

Join Date: Mar 2006
Location: Regina, SK
Country:
Posts: 30,158
vCash: 500
Quote:
 Originally Posted by plusandminus Here is what overpass himself wrote yesterday (with me bolding some text): (Continue replying to seventieslord) You do see that he involves R-Off in the division? He did post that after you several times had tried to lecture me that one can't.
OK, so now you just proved you're not even paying attention.

I never said R-Off shouldn't be included - R-off is very important to include! Otherwise, you wouldn't have anything to compare the player's on-ice performance to, duh. I did, however, algebraically demonstrate that including raw +/- in the way that you did is poor use of statistics. Sorry if that didn't sit right with you.

It was demonstrated quite clearly that +/- (basically, the degree to which it can swing both up and down) is directly impacted by the league scoring level. R-On, on the other hand, shouldn't be. You introduced a biased stat into the equation, taking out a good one that made logical sense to begin with, for no good reason other than you felt it looked good. You were also called on this by der kaiser quite eloquently (he's one of those people who can call out bad use of stats much better than I can) and haven't explained any real reasoning behind it.

Quote:
 He also, and you too in your last post, admits that it is even being done based on (at least partly) what we think "looks good".
Not just what "we think". Not sure what Brave Canadian might have been thinking to make his last post, but obviously a great player on a bad team is going to outperform his R-off to a greater degree than a great player on a good team (look how little Lidstrom outperforms his R-off by, and an elite defensive forward like Gainey couldn't even outperform his R-off, while Ramsay, who most consider inferior, easily outperformed Buffalo's.). Clearly a team adjustment is needed. overpass spoke already to its creation and I don't need to go on about that.

Quote:
OK, disprove what I proved to you, then.

to put it another way - GAA is a terrible stat to use for goalies. Even if one hates sv%, at least it is more pure. To anyone who uses GAA, I tell them "GAA is just the inverse of sv%, times shots against per game, why include something the goalie has little to no control over?" Same thing here. +/- is just R-on times duration times league offense level (assuming games in which the player played were in line with league average). So why take R-on and bastardize it with things the player can't control? Just use R-on! Like overpass already did.
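
The GAA point can be checked with a couple of lines, reading "the inverse of sv%" as (1 - sv%) and measuring shots per 60 minutes rather than per game. The goalie numbers are made up for illustration.

```python
def goals_against_average(save_pct: float, shots_against_per_60: float) -> float:
    """GAA as the share of shots that go in, times shot volume per 60 minutes."""
    return (1.0 - save_pct) * shots_against_per_60


# Two hypothetical goalies with identical save percentages behind different defences.
sheltered = goals_against_average(save_pct=0.910, shots_against_per_60=25.0)
bombarded = goals_against_average(save_pct=0.910, shots_against_per_60=35.0)

print(round(sheltered, 2), round(bombarded, 2))
```

Both goalies stop shots at the same rate, yet their GAA differs by almost a goal per game because of shot volume alone. That is the same objection being raised against folding duration and scoring level into R-On.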

Now disprove that, if you are still sure I'm so wrong.

Quote:
 Everyone and everything should of course be open for questioning. Everyone can be wrong about everything, and even if not wrong, the debate can be good. I've just reacted to what I've experienced as some kind of "pattern" from your side. Of course I am free to leave the board if I don't like it (in case anyone would suggest that). Wonder if I get even more "bashed" after writing this.
You are pretty hypersensitive if you think you are being bashed, bro.

08-19-2011, 06:21 PM
#159
overpass
Registered User

Join Date: Jun 2007
Posts: 3,907
vCash: 500
Quote:
 Originally Posted by BraveCanadian So this adjustment is the way it is, and has been updated a couple of times by you since it was first posted, to get it to look right? I get that you chose the exponent for what look like good reasons but aren't we in essence saying here that we are massaging this into giving us what we went looking for?
There's definitely a subjective element to the size of the adjustment. But there always is when working with numbers. It's important to have some subject-matter knowledge so you know what the numbers mean, and from that you decide what to do with them.

I've done my best to find a number that fits best for all situations, and have presented some reasons why I chose the one I did. It's definitely possible that I'm picking the numbers to fit my bias. I have tried to avoid that.

Do you think the adjustment for team strength is too high?

08-19-2011, 06:59 PM
#160
BraveCanadian
Registered User

Join Date: Jun 2010
Country:
Posts: 11,253
vCash: 500
Quote:
 Originally Posted by overpass There's definitely a subjective element to the size of the adjustment. But there always is when working with numbers. It's important to have some subject-matter knowledge so you know what the numbers mean, and from that you decide what to do with them. I've done my best to find a number that fits best for all situations, and have presented some reasons why I chose the one I did. It's definitely possible that I'm picking the numbers to fit my bias. I have tried to avoid that. Do you think the adjustment for team strength is too high?
To be honest I'm not sure what I think!

I have to let it stew some more. I didn't realize before you spelled it out more thoroughly that was how the numbers were arrived at.

Not saying that there is anything wrong with what you have done because obviously you have made reasonable moves to try and level the uneven playing field.

I just wonder, similarly to Czech Your Math, if there is some way to do the leveling without introducing a component to the equation that is plugged in because it seems to fit.

Or maybe a different approach altogether might shed some light on things.

08-19-2011, 07:03 PM
#161
plusandminus
Registered User

Join Date: Mar 2011
Posts: 980
vCash: 500
Quote:
 Originally Posted by seventieslord I never said R-Off shouldn't be included - R-off is very important to include! Otherwise, you wouldn't have anything to compare the player's on-ice performance to, duh.
If so, we agree on that. (I've been saying that from the beginning.)

Quote:
 Originally Posted by seventieslord I did, however, algebraically demonstrate that including raw +/- in the way that you did is poor use of statistics. Sorry if that didn't sit right with you.
I didn't think you did demonstrate it, obviously.

Quote:
 Originally Posted by seventieslord It was demonstrated quite clearly that +/- (basically, the degree to which it can swing both up and down) is directly impacted by the league scoring level.
Yes, one needs to calculate an adjusted +/-.

Quote:
 Originally Posted by seventieslord R-On, on the other hand, shouldn't be.

Quote:
 Originally Posted by seventieslord You introduced a biased stat into the equation, taking out a good one that made logical sense to begin with, for no good reason other than you felt it looked good.
I did it based on the presumption that the +/- I used were adjusted in the same way as R-Off. That presumption was wrong, as it was a "raw" +/-.
It was simply a misunderstanding. You just could have mentioned that it wasn't, instead of going at me the way you did.

(Edit: Instead you started talking about me doing an error by "dividing an accumulated number by an average". Had you said that "You are dividing an unadjusted number by an adjusted", I would likely have understood, and likely agreed.)

Quote:
 Originally Posted by seventieslord You were also called on this by der kaiser quite eloquently (he's one of those people who can call out bad use of stats much better than I can) and haven't explained any real reasoning behind it.
I didn't interpret his post that way. Still don't after just having re-read it. I interpret him as talking rather about eye-view and the seemingly arbitrary use of the exponent.

Our current dialogue started when I commented on overpass' adjustments, based on "eye-view".
You reacted upon a table put there as a quick example, in which I made a calculation based on the assumption that both the dividend and divisor were of the "same kind".
That wasn't due to me being a poor statistician (or whatever you call me, as opposed to the guys you mention you think highly of). It was simply a misunderstanding of what the different abbreviations that overpass used stood for.

Quote:
 Originally Posted by seventieslord to put it another way - GAA is a terrible stat to use for goalies. Even if one hates sv%, at least it is more pure. To anyone who uses GAA, I tell them "GAA is just the inverse of sv%, times shots against per game, why include something the goalie has little to no control over?"
I used to think so too.

Quote:
 Originally Posted by seventieslord Same thing here. +/- is just R-on times duration times league offense level (assuming games in which the player played were in line with league average). So why take R-on and bastardize it with things the player can't control? Just use R-on! Like overpass already did.
I still think you overrate the method. It is still arbitrary, based on assumptions, and definitely is much biased by things the player cannot control.
That goes for R-On and R-Off too.

Last edited by plusandminus: 08-19-2011 at 07:49 PM.

08-19-2011, 08:36 PM
#162
seventieslord
Student Of The Game

Join Date: Mar 2006
Location: Regina, SK
Country:
Posts: 30,158
vCash: 500
Quote:
 Originally Posted by plusandminus I didn't think you did demonstrate it, obviously.
Then go back and explain what is wrong with that logic.

Quote:
 Yes, one needs to calculate an adjusted +/-.
But you only need the raw +/- in the final step, not at the point that you tried to insert it.

Quote:
 I did it based on the presumption that the +/- I used were adjusted in the same way as R-Off. That presumption was wrong, as it was a "raw" +/-. It was simply a misunderstanding. You just could have mentioned that it wasn't, instead of going at me the way you did.
Even if using an adjusted +/-, I don't see the logic of that formula. Why take the square root, for example?

Quote:
 (Edit: Instead you started talking about me doing an error by "dividing an accumulated number by an average". Had you said that "You are dividing an unadjusted number by an adjusted", I would likely have understood, and likely agreed.)
As per above, I don't think that makes the calculation any more sound.

Quote:
 I didn't interpret his post that way. Still don't after just having re-read it. I interpret him as talking rather about eye-view and the seemingly arbitrary use of the exponent.
Not "seemingly".

Quote:
 Our current dialogue started when I commented on overpass' adjustments, based on "eye-view". You reacted upon a table put there as a quick example, in which I made a calculation based on the assumption that both the dividend and divisor were of the "same kind". That wasn't due to me being a poor statistician (or whatever you call me, as opposed to the guys you mention you think highly of). It was simply a misunderstanding of what the different abbreviations that overpass used stood for.
I don't understand how this makes it any better. You proposed a baseless adjustment, which was the point of my "discrediting", and now you say you were adjusting without fully understanding what it was you were adjusting, and what you say you actually meant to do is no less baseless!

Quote:
 I used to think so too.
You don't anymore?

Quote:
 I still think you overrate the method. It is still arbitrary, based on assumptions, and definitely is much biased by things the player cannot control. That goes for R-On and R-Off too.
Absolutely. I don't overrate the method. But what you did to it in that little example didn't make it better, it made it worse by making it even more of the above (much biased by things the player cannot control) in addition to being baseless. Here we are a page later.

08-20-2011, 10:28 AM
#163
plusandminus
Registered User

Join Date: Mar 2011
Posts: 980
vCash: 500
Quote:
 Originally Posted by seventieslord ...
For the soundness of the debate, I don't think the "You said!", "No, I didn't!", "He said!", "No, he didn't!" way of discussing is very fruitful. So I'll just let my previous written words stand, just like you let yours stand, and try to focus on what I mean.

If attempting to calculate an "environment and era adjusted career +/-", for me it would be natural to think:

1. Adjust all seasonal ESGF, ESGA, +/-, time units for each player so that different seasons are comparable.
2. Calculate seasonal "without" ESGF, ESGA, +/-, time units for each player. Adjust them so that they are based on the same minutes on ice as the player.
3. (We now have the seasonal "with" and "without", both in the same "unit".)
4. Aggregate season by season. Handle time units wisely.
5. (We now have career "with" and "without".)
6. Combine "with" and "without" to produce an "environment adjusted" +/-.
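
One way to read steps 1-6 is sketched below. Everything specific in it is an assumption made for illustration: the `Season` fields, the `era_factor` multiplier standing in for step 1, minutes as the time unit, and above all the plain "with minus without" used for step 6, which the list leaves open.

```python
from dataclasses import dataclass


@dataclass
class Season:
    gf_with: float       # ES goals for with the player on the ice
    ga_with: float       # ES goals against with the player on the ice
    min_with: float      # the player's ES minutes
    gf_without: float    # team ES goals for with the player off the ice
    ga_without: float    # team ES goals against with the player off the ice
    min_without: float   # team ES minutes with the player off the ice
    era_factor: float    # hypothetical multiplier that normalises league scoring


def career_with_without(seasons):
    """Steps 1-5: era-adjust, rescale 'without' to the player's minutes, aggregate."""
    pm_with = 0.0
    pm_without = 0.0
    for s in seasons:
        # Step 1: put every season on the same scoring scale.
        adj_with = (s.gf_with - s.ga_with) * s.era_factor
        # Step 2: express the off-ice result over the player's own ice time.
        scale = s.min_with / s.min_without
        adj_without = (s.gf_without - s.ga_without) * s.era_factor * scale
        # Step 4: aggregate season by season.
        pm_with += adj_with
        pm_without += adj_without
    return pm_with, pm_without


def environment_adjusted(pm_with: float, pm_without: float) -> float:
    """Step 6 placeholder: a plain difference, chosen only for illustration."""
    return pm_with - pm_without


seasons = [
    Season(60, 40, 1200, 120, 130, 2400, 1.0),
    Season(50, 45, 1000, 100, 100, 2000, 0.9),
]
with_pm, without_pm = career_with_without(seasons)
print(with_pm, without_pm, environment_adjusted(with_pm, without_pm))
```

Swapping the body of `environment_adjusted` for a ratio, an exponent, or a regression is where the competing methods in this thread would differ.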

We then show our results to the reader:
(Difference is between "environment adjusted" and "non-environment adjusted". For example it may differ by -40, or by + 140 %, or -25 %.)

If readers would be curious to instead/also look at ESGF (or ESGA), it would be easy to do so. One just instead shows:

My list of steps above does not detail everything about each step.

Now, I'll turn to R-On and R-Off, and my objections regarding them.

Instead of looking at them as something almost "divine" that one trusts more or less "blindly" ("because overpass knows what he's doing"), it should be OK to question them. They may be great; if so, it might soon be shown.

I would say that R-On and R-Off should be attempted to be calculated after steps 1-6 above.
What they turn out to be, depends on how we use "adj +/- with" and "adj +/- without". Considering all the different ways they can be combined in to give the "environment adjusted +/-", I would consider the results (R-On and R-Off) very arbitrary.

R-On is an arbitrary calculated equivalent of my "adj +/- with".
R-Off is an arbitrary calculated equivalent of my "adj +/- without".
Overpass environment adjusted +/- is a result of an arbitrary calculation using those two arbitrary values adjusted by an arbitrary exponent.

One might look upon R-On and R-Off as the "diff %" column above. It should be calculated last, and is the result of our arbitrary tweaking of "adj +/- with".

I admit it may look appealing to show "with" and "without" as two ratios (R-On and R-Off).

But in reality, all we know is that R-On above 1 is basically "better than average" while R-On below 1 is "worse than average". The rest appears very unreliable. 1.2, 1.6, 1.1, 1.4 are all just numbers saying "above average", we can't really rank them saying one is better than the other. Same with R-Off, we only can see if above, at, or below average, not really say much more.
Also, comparing R-On with R-Off would be slightly like apples and oranges. We may not really know how an R-On of 1.3 compares to an R-Off of 1.2.

The same thing can be decided just by looking at the "adj +/- with" and "adj +/- without".
Actually, I think looking at them instead would be more telling.

If R-On and R-Off are more advanced than I've realized yet, just try to make me aware of that by simply trying to explain what I've missed.

Now I'll continue to try to explain more about why using adjusted +/-.

I think instead of showing those R-On and R-Off (which I consider unreliable, arbitrary and flawed), why not instead show "adj +/-" divided by the time unit.

It would be similar to those "per 60 min" stats that are around on the Internet. In this case, one can adjust it to "per season".

Get rid of R-On and R-Off.

Then we'll have an even clearer picture:

The "/seas" columns will be the equivalents of the ratios. Instead of showing values from say "2.0 to 0.1", we will show +/- in the range of say "+60 to -60". It will be as easy for the reader to see how players compare to average (which is 0 now, instead of 1 as in the case of the ratios). The difference is that instead of showing ESGF divided by ESGA, we show something based on ESGF-ESGA (+/-).

If I haven't already said it, I see no need to divide ESGF by ESGA.
Example (both players having same ice time):
ESGF=10, ESGA=6, ES+/- = +4. Quota = 1.667
ESGF=4, ESGA=2, ES+/- = +2. Quota = 2.000
I think +4 is better than +2. I think using quotas should be avoided.
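
The two example players, side by side, as a small sketch (the helper name and the A/B labels are illustrative):

```python
def difference_and_quota(gf: int, ga: int):
    """Return (plus-minus, GF/GA ratio) for one player."""
    return gf - ga, gf / ga


players = {"A": (10, 6), "B": (4, 2)}  # (ESGF, ESGA), same ice time
for name, (gf, ga) in players.items():
    diff, quota = difference_and_quota(gf, ga)
    print(name, diff, round(quota, 3))
```

The difference ranks player A first and the quota ranks player B first, so the choice between them is not cosmetic.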

I have done these things myself previously, for NHL mostly in 2002-03. One can very well argue that just comparing "with" and "without" makes for comparing apples with oranges.

One just can look at stats for single seasons. What would different players' "environment adjusted +/-" look like? A simple assumption might be that ideally a player's tadj+/- would look pretty similar from year to year. Does it look similar from year to year based on overpass' method?
(I know aggregating seasons might make bias become lesser, but anyway.)

Quote:
 Even if using an adjusted +/-, I don't see the logic of that formula. Why take the square root, for example?
It was just an example.
Why take the .65 exponent like Overpass does? (I know he explained his reasons.)

Quote:
 But you only need the raw +/- in the final step, not at the point that you tried to insert it.
It may depend on how one does things, including in which order.

Last edited by plusandminus: 08-20-2011 at 10:44 AM.

08-20-2011, 10:40 AM
#164
overpass
Registered User

Join Date: Jun 2007
Posts: 3,907
vCash: 500
Quote:
Originally Posted by plusandminus
For the soundness of the debate, I don't think the "You said!", "No, I didn't!", "He said!", "No, he didn't!" way of discussing is very fruitful. So I'll just let my previous written words stand, just like you let yours stand, and try to focus on what I mean.

If attempting to calculate an "environment and era adjusted career +/-", for me it would be natural to think:

1. Adjust all seasonal ESGF, ESGA, +/-, time units for each player so that different seasons are comparable.
2. Calculate seasonal "without" ESGF, ESGA, +/-, time units for each player. Adjust them so that they are based on the same minutes on ice as the player.
3. (We now have the seasonal "with" and "without", both in the same "unit".)
4. Aggregate season by season. Handle time units wisely.
5. (We now have career "with" and "without".)
6. Combine "with" and "without" to produce an "environment adjusted" +/-.

We then show our results to the reader:
(Difference is between "environment adjusted" and "non-environment adjusted". For example it may differ by -40, or by + 140 %, or -25 %.)

If readers are curious to instead/also look at ESGF (or ESGA), it would be easy to do so. One just instead shows:

My list of steps above does not detail everything about each step.

Now, I'll turn to R-On and R-Off, and my objections regarding them.

Instead of looking at them as something almost "divine" that one trusts more or less "blindly" ("because overpass knows what he's doing"), it should be OK to question them. They may be great; if so, it might soon be shown.

I would say that R-On and R-Off should be attempted to be calculated after steps 1-6 above.
What they turn out to be, depends on how we use "adj +/- with" and "adj +/- without". Considering all the different ways they can be combined in to give the "environment adjusted +/-", I would consider the results (R-On and R-Off) very arbitrary.

R-On is an arbitrary calculated equivalent of my "adj +/- with".
R-Off is an arbitrary calculated equivalent of my "adj +/- without".
Overpass environment adjusted +/- is a result of an arbitrary calculation using those two arbitrary values adjusted by an arbitrary exponent.

One might look upon R-On and R-Off as the "diff %" column above. It should be calculated last, and is the result of our arbitrary tweaking of "adj +/- with".

I admit it may look appealing to show "with" and "without" as two ratios (R-On and R-Off).

But in reality, all we know is that R-On above 1 is basically "better than average" while R-On below 1 is "worse than average". The rest appears very unreliable. 1.2, 1.6, 1.1, 1.4 are all just numbers saying "above average", we can't really rank them saying one is better than the other. Same with R-Off, we only can see if above, at, or below average, not really say much more.

The same thing can be decided just by looking at the "adj +/- with" and "adj +/- without".
Actually, I think looking at them instead would be more telling.

Now I'll continue to try to explain more about why using adjusted +/-.

I think instead of showing those R-On and R-Off (which I consider unreliable, arbitrary and flawed), why not instead show "adj +/-" divided by the time unit.

It would be similar to those "per 60 min" stats that are around on the Internet. In this case, one can adjust it to "per season".

Get rid of R-On and R-Off.

Then we'll have an even clearer picture:

The "/seas" columns will be the equivalents of the ratios. Instead of showing values from say "2.0 to 0.1", we will show +/- in the range of say "+60 to -60". It will be as easy for the reader to see how players compare to average (which is 0 now, instead of 1 as in the case of the ratios). The difference is that instead of showing ESGF divided by ESGA, we show something based on ESGF-ESGA (+/-).

If I haven't already said it, I see no need to divide ESGF by ESGA.
Example (both players having same ice time):
ESGF=10, ESGA=6, ES+/- = +4. Quota = 1.667
ESGF=4, ESGA=2, ES+/- = +2. Quota = 2.000
I think +4 is better than +2. I think using quotas should be avoided.

I have done these things myself previously, for NHL mostly in 2002-03. One can very well argue that just comparing "with" and "without" makes for comparing apples with oranges.

One can also just look at stats for single seasons. What would different players' "environment adjusted +/-" look like? A simple assumption might be that, ideally, a player's tadj+/- would look pretty similar from year to year. Does it look similar from year to year based on overpass' method?
(I know aggregating seasons might reduce the bias, but anyway.)

It was just an example.
Why take the .65 exponent like Overpass does? (I know he explained his reasons.)

It may depend on how one does things, including in which order.
FYI, time on ice is not available for seasons before 1997-98. If you think about it, that explains why I chose to calculate as I did. I wanted a method that didn't require time on ice data.

I can go into more detail at a later time.

08-20-2011, 10:53 AM
#165
plusandminus
Registered User

Join Date: Mar 2011
Posts: 980
vCash: 500
Quote:
 Originally Posted by overpass FYI, time on ice is not available for seasons before 1997-98. If you think about it, that explains why I chose to calculate as I did. I wanted a method that didn't require time on ice data. I can go into more detail at a later time.
Yes, you are welcome. And I don't think you're stupid or anything, just trying to sort things out.

I know TOI wasn't available. (And I know about using GF and GA to try to estimate ice times.) I'm not sure if that changes most of my points, but we'll see.
We'll see when/if you have read more carefully, and perhaps responded, if there is something either of us seem to have missed.

Edit: If it was my mention of "time units" you replied to, I think they don't necessarily have to be ice time. I think GP can be used. Maybe (ESGF+ESGA) / (teamESGF+teamESGA) would be good enough.

Last edited by plusandminus: 08-20-2011 at 08:10 PM. Reason: spelling

08-20-2011, 12:15 PM
#166
Czech Your Math
Registered User

Join Date: Jan 2006
Location: bohemia
Country:
Posts: 4,841
vCash: 500
"even strength value"

Here's some calculations I made for "even strength value":

Career Value (per 82 games)
----------------
Bourque    491 (24.5)
Gretzky    479 (25.8)
Lidstrom   434 (23.2)
Jagr       406 (25.5)
Potvin     324 (25.1)
Orr        283 (39.0)
Lafleur    286 (20.8)
Clarke     282 (20.2)
Lemieux    278 (24.9)
Lindros    261 (27.0)
Forsberg   231 (25.5)
Bossy      220 (24.0)
Ovechkin   152 (26.3)
Crosby     130 (25.8)

What I like about this metric:
- results are fairly consistent from season to season
- no need to additionally adjust for league scoring level
- career results look reasonable

Last edited by Czech Your Math: 08-21-2011 at 12:37 AM.

08-20-2011, 01:41 PM
#167
matnor
Registered User

Join Date: Oct 2009
Location: Boston
Country:
Posts: 512
vCash: 500
Quote:
Originally Posted by overpass
Looks like there's a lot of interest in the exact calculation, so I'll lay it out clearly here.

Calculate XEV+/- by

1. Keep (\$ESGF+\$ESGA) constant. (This is an assumption that probably works slightly to the disadvantage of great defensive players and to the advantage of great offensive players, when you think about it. But I don't think I can make any other assumptions with available data and it's not a big deal.)

2. Take R-OFF to the exponent 0.65 to find the GF/GA ratio used for XEV+/-. Calculate a plus-minus figure given the total number of goals from step 1 and this ratio.

XEV+/- = (\$ESGF+\$ESGA)/(1+R-OFF^0.65)*R-OFF^0.65 - (\$ESGF+\$ESGA)/(1+R-OFF^0.65)

So if the player has an R-OFF of 0.90, 250 \$ESGF, and 250 \$ESGA,

=(250+250)/(1+0.933)*0.933 - (250+250)/(1+0.933)
=241 - 259
=-18

And the adjusted plus-minus figure is just (EV+/- adjusted for era) - (XEV+/-)

or \$ESGF - \$ESGA - XEV+/-

Why choose 0.65 as the exponent?

Let's look at the all-time highest and lowest players in XEV+/-. If it's calculated correctly, the two lists should be similar in adjusted plus-minus.

To try to make the lists similar in quality, I'll limit them to players with between 1000 and 1200 NHL regular season games played (a rough measure of career quality that is independent from plus-minus except in that they both measure quality to some degree).

Player             R-ON  R-OFF  XEV+/-  EV+/-  AEV+/-
Serge Savard       1.44   1.52     318    425     107
Wayne Cashman      1.45   1.43     199    316     117
Guy Lafleur        1.67   1.35     193    492     299
Bob Gainey         1.23   1.51     177    136     -41
Phil Esposito      1.25   1.25     162    253      91
Eric Desjardins    1.27   1.25     157    255      98
Denis Potvin       1.49   1.23     145    420     275
Brad Park          1.40   1.20     144    400     256
Gary Suter         1.19   1.24     143    183      40
Jean Ratelle       1.51   1.27     135    353     218
Glenn Anderson     1.24   1.21     116    201      85
John Tonelli       1.34   1.25     112    221     109
Kris Draper        1.04   1.31     105     21     -84
Charlie Huddy      1.17   1.16     100    156      56
Bobby Carpenter    0.90   1.20      98    -87    -185
Brad Marsh         0.96   1.18      97    -32    -130
Rick Middleton     1.20   1.20      96    148      51
Bobby Clarke       1.79   1.20      93    454     361
Sergei Zubov       1.25   1.13      86    237     152
Gilbert Perreault  1.08   1.12      83     85       3
Mike Keane         1.06   1.21      81     40     -41
Average            1.26   1.25     135    223      87

Player             R-ON  R-OFF  XEV+/-  EV+/-
Dale Hawerchuk     1.00   0.93     -50     -2
Sergei Gonchar     1.13   0.92     -54    121
Mike Sillinger     0.69   0.89     -54   -258
Jay Wells          0.92   0.90     -56    -69
Bernie Federko     0.94   0.90     -58    -50
Ivan Boldirev      0.85   0.90     -59   -139
Andrew Cassels     0.92   0.88     -61    -65
Doug Wilson        1.08   0.90     -65     80
Jarome Iginla      1.14   0.90     -65    126
Harold Snepsts     0.90   0.88     -68    -86
Dave Ellett        0.97   0.90     -68    -34
Fredrik Olausson   1.04   0.88     -70     34
Ray Whitney        0.97   0.88     -76    -25
Andrew Brunette    1.01   0.84     -79      7
Rob Ramage         0.88   0.88     -83   -124
Geoff Sanderson    0.90   0.83     -92    -76
Don Lever          0.77   0.82     -96   -203
Dave Taylor        1.31   0.84     -99    234
Randy Carlyle      0.92   0.86    -102    -88
Borje Salming      1.14   0.82    -169    172
Average            0.96   0.88     -76    -22

Notice that the players with the higher XEV+/- (due to their higher R-OFF) still have a higher adjusted plus-minus than the players with the lower XEV+/- and lower R-OFF, suggesting that the adjustment is not too high.

I also ran a regression predicting R-ON using R-OFF for all players with between 500 and 1000 GP between 1968 and 2011, n=881. I chose those GP cutoffs in an attempt to pick players with a decent NHL career, but leaving out most of the great players. This regression gave me the exponent 0.736 to apply to R-OFF to predict R-ON.

Both these tests may be affected by the fact that NHL talent distribution has been fairly uneven at times since 1968, and good players have sometimes tended to end up on teams with other good players due to disparities in management quality. As a result I don't adjust as much as either of these tests suggest. And maybe I should adjust even less - it's hard to say for sure.
Thanks for explaining it. Btw, something looks a bit off with Langway's XEV+/-. I assume the R-OFF and R-ON numbers are a bit too low?
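
For anyone who wants to play with the numbers, the quoted XEV+/- steps can be sketched like this. It is only my reading of the quoted formula, not overpass's actual spreadsheet; the function and argument names are made up, and the 0.65 exponent is left as a parameter so other values (such as the 0.736 from the regression) can be tried.

```python
def xev_plus_minus(adj_esgf, adj_esga, r_off, exponent=0.65):
    """Expected EV +/- implied by the off-ice ratio, as in the quoted steps.

    Step 1: keep the era-adjusted total ($ESGF + $ESGA) constant.
    Step 2: split that total using the ratio R-OFF ** exponent.
    """
    total = adj_esgf + adj_esga
    ratio = r_off ** exponent
    expected_ga = total / (1 + ratio)
    expected_gf = expected_ga * ratio
    return expected_gf - expected_ga

def adjusted_plus_minus(adj_esgf, adj_esga, r_off, exponent=0.65):
    """$ESGF - $ESGA - XEV+/-."""
    return (adj_esgf - adj_esga) - xev_plus_minus(adj_esgf, adj_esga, r_off, exponent)

# The quoted example: R-OFF 0.90, 250 $ESGF, 250 $ESGA.
xev = xev_plus_minus(250, 250, 0.90)
aev = adjusted_plus_minus(250, 250, 0.90)
print(round(xev, 1), round(aev, 1))
```

Without intermediate rounding this comes out at roughly -17; the -18 in the quoted example comes from rounding the two terms to 241 and 259 before subtracting.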

08-20-2011, 07:13 PM
#168
overpass
Registered User

Join Date: Jun 2007
Posts: 3,907
vCash: 500
Quote:
 Originally Posted by Czech Your Math
I have an alternative that might be fairer to players on great teams without using somewhat arbitrary regressions to the mean. It's not exactly comparable to adjusted plus-minus, but it uses much of the same methodology.

For lack of a better term, I might call it "even strength value". It has two primary components:

1. Player's share of team success at even strength
2. Player's marginal (additional) success at even strength

Once you calculate each component, simply add them together.

Player's share of team success at ES is calculated as:

82 * (Team Exp. ES Win %) * (Player's ESGF + ESGA) / (Team's ESGF + ESGA)

where Team Expected ES Win % = (ESGF)^N / (ESGF^N + ESGA^N)

this is the pythagorean win formula; N = 2 (or another number if supported by data)

Player's marginal contribution to team success is calculated as follows:

Subtract the player's ESGF and ESGA from the team's totals.
Recalculate the ES Win % from the new numbers (this is ES Win % without player).
Subtract Team Exp. ES Win % from ES Win % without player.
Multiply the difference in Win % by 82 to yield player's marginal contribution.

Then add player's share of team success ES and player's marginal contribution to team success at ES to get "ES Value" (whatever is proper term).

The results are in the same ballpark as plus-minus. Some career results that I just calculated:

Bossy 220 (24.0/82)
Potvin 324 (25.1/82)
Gretzky 479 (26.4/82)
Lemieux 277 (24.8/82)
Forsberg 223 (25.9/82)
Lindros 243 (26.3/82)
Jagr 393 (25.3/82)
Interesting.

Share of team success is giving the player full credit for what happened while they are on the ice, right? Similar to plus-minus.

So a player who is on the ice for 1/3 of his team's ESGF and ESGA, on a team with a 0.500 team expected win% gets 41/3 points from this part.

And if the team's expected win% was 0.450 after subtracting the player's on-ice GF and GA, he gets additional credit for (0.500 - 0.450)*82, or 4.1.
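
To make sure I have the arithmetic right, here is the same example as a short sketch. The function names are just labels I picked, and the win percentages are taken as given rather than computed from goals.

```python
GAMES = 82  # season length used in the metric

def team_share_value(team_win_pct, share_of_team_goals, games=GAMES):
    """Player's share of team success: games * team win% * share of on-ice goals."""
    return games * team_win_pct * share_of_team_goals

def marginal_value(team_win_pct, win_pct_without_player, games=GAMES):
    """Extra credit for how much the team win% drops without the player."""
    return (team_win_pct - win_pct_without_player) * games

# On the ice for 1/3 of the team's ESGF + ESGA, team at 0.500, 0.450 without him.
share_part = team_share_value(0.500, 1 / 3)
marginal_part = marginal_value(0.500, 0.450)
es_value = share_part + marginal_part
print(round(share_part, 2), round(marginal_part, 2), round(es_value, 2))
```

So in this example about three quarters of the total comes from the share component, which is the part I comment on below.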

My thoughts:

The first part of the calculation (dividing team value) seems very generous. I get that you want to give value to all players in a way that adjusted plus-minus does not, which makes sense because NHL coaches only play players who are contributing, at least in the medium to long run. But the amount of value being divided seems like a lot.

In baseball metrics of this type, it's common to subtract the win% of a team of replacement level players before dividing up value. That win% is probably lower in hockey than in baseball. We've seen teams like the 1992-93 Ottawa Senators that had some NHL players and still finished with an 0.150 winning percentage. But it might make sense to subtract 0.100 or 0.150 from the team win% before dividing up the value.

I just calculated the career results for all players (since I have the numbers in a spreadsheet, this takes a minute). The numbers are slightly different than the ones you posted, probably because I used a different method to estimate individual ESGA and ESGF.

I also included the two different parts of the calculations, labeled TeamEV and IndEV(for Individual EV). Not trying to name your stats for you, just giving the columns a label

Player               GP  TeamEV  IndEV   EV
Ray Bourque        1612     379    123  502
Larry Robinson     1384     408     50  458
Scott Stevens      1635     424     31  455
Wayne Gretzky      1487     395     53  449
Nicklas Lidstrom   1494     376     51  427
Chris Chelios      1651     390     25  415
Larry Murphy       1615     357     51  408
Jaromir Jagr       1273     296    100  395
Al MacInnis        1416     328     64  392
Paul Coffey        1409     377      8  385
Ron Francis        1731     290     62  352
Denis Potvin       1060     296     39  335
Steve Yzerman      1514     314     18  332
Brad Park          1115     296     33  329
Borje Salming      1148     238     84  322
Mark Recchi        1652     308     12  320
Mark Messier       1756     331    -11  320
Phil Housley       1495     293     26  319
Bryan Trottier     1279     269     49  318
Joe Sakic          1378     286     30  316
Teemu Selanne      1259     230     85  315
Scott Niedermayer  1263     305      6  311
Mike Modano        1499     268     40  308
Bobby Orr           596     211     95  306
Serge Savard       1038     315    -10  305
Brian Leetch       1205     281     22  303
Brad Mccrimmon     1222     260     43  303
Brendan Shanahan   1524     276     25  301
Luc Robitaille     1431     260     37  298
Guy Lafleur        1126     264     33  297
Doug Gilmour       1474     265     32  297
Bobby Clarke       1147     231     63  294
Marcel Dionne      1348     222     71  294
Chris Pronger      1154     249     45  294
Mike Gartner       1432     264     24  287
Sergei Zubov       1068     265     22  287
Mark Howe           929     207     78  284
Eric Desjardins    1143     279      5  284
Mathieu Schneider  1289     259     23  282
Adam Oates         1337     266     16  282
Jeremy Roenick     1363     247     32  279
Dave Andreychuk    1639     249     28  276
Pierre Turgeon     1294     237     39  276
Mats Sundin        1346     241     34  275
Sergei Fedorov     1249     234     38  272
Teppo Numminen     1372     235     36  271
John Leclair        967     209     56  266
Brett Hull         1269     270     -7  263
Roman Hamrlik      1311     252     10  262
Mario Lemieux       915     201     60  261

Looking at the results, I think it gives too much value for simply showing up. Defencemen dominate the results because they play more ice time, and the majority of the value in the metric comes from playing a lot for good teams. On the other hand, if we're comparing metrics, plus-minus (adjusted or not) penalizes players for below-average performance which isn't ideal.

It's definitely another way of looking at things, and worth pursuing.

08-20-2011, 08:35 PM
#169
Czech Your Math
Registered User

Join Date: Jan 2006
Location: bohemia
Country:
Posts: 4,841
vCash: 500
Quote:
 Originally Posted by overpass
Interesting.

Share of team success is giving the player full credit for what happened while they are on the ice, right? Similar to plus-minus.

So a player who is on the ice for 1/3 of his team's ESGF and ESGA, on a team with a 0.500 team expected win% gets 41/3 points from this part.

And if the team's expected win% was 0.450 after subtracting the player's on-ice GF and GA, he gets additional credit for (0.500 - 0.450)*82, or 4.1.

My thoughts:

The first part of the calculation (dividing team value) seems very generous. I get that you want to give value to all players in a way that adjusted plus-minus does not, which makes sense because NHL coaches only play players who are contributing, at least in the medium to long run. But the amount of value being divided seems like a lot.

In baseball metrics of this type, it's common to subtract the win% of a team of replacement level players before dividing up value. That win% is probably lower in hockey than in baseball. We've seen teams like the 1992-93 Ottawa Senators that had some NHL players and still finished with an 0.150 winning percentage. But it might make sense to subtract 0.100 or 0.150 from the team win% before dividing up the value.

I just calculated the career results for all players (since I have the numbers in a spreadsheet, this takes a minute). The numbers are slightly different than the ones you posted, probably because I used a different method to estimate individual ESGA and ESGF.

I also included the two different parts of the calculations, labeled TeamEV and IndEV(for Individual EV). Not trying to name your stats for you, just giving the columns a label

Looking at the results, I think it gives too much value for simply showing up. Defencemen dominate the results because they play more ice time, and the majority of the value in the metric comes from playing a lot for good teams. On the other hand, if we're comparing metrics, plus-minus (adjusted or not) penalizes players for below-average performance which isn't ideal.

It's definitely another way of looking at things, and worth pursuing.
Thanks for calculating and posting that. I just came up with that in the last couple of days, and was surprised at the results.

I agree it seems to give players too much credit for "just showing up", especially defensemen who play a lot for good teams during long careers. Still, those tend to be very good players.

I'll have to take another look at this metric and try to improve it if possible.

A couple questions about your methodology: Is XEV+/- in a 200 ESG (or 400 ESGF + ESGA) environment? Do you adjust the XEV+/- to reflect how many ESGF + GA for which the player was on ice?

Also, could you post a top 50 Adjusted Plus-Minus without the regression to the mean? I think it might give some insight into how it could be improved, possibly without using an arbitrary "smoothing" factor. Thanks again for your help.

08-20-2011, 09:55 PM
#170
plusandminus
Registered User

Join Date: Mar 2011
Posts: 980
vCash: 500
This reply is to both overpass and czechyourmath.

Quote:
 Originally Posted by overpass ...checkyourmath...
Would it be possible for you to for example give the 2002-03 and/or 2010-11 values for comparison? Like listing the top 20 or so for each?

If I understand this right (I know nothing about baseball), guys having much ES ice time on teams that have good ES+/- would end up high?
Sort of:
                    Poor team ES+/-    Average team ES+/-  Good team ES+/-
Much ES icetime     Average results    Good results        Very good results
Average ES icetime  Poor results       Average results     Good results
Little ES icetime   Very poor results  Poor results        Average results

If I'm right, I think we seem to have three components:
1. How good was the team?
2. How much did the player play on that team?
3. How to combine 1 and 2, to produce a result.
I think we'll end up with arbitrary formulas here too. For example, I think the exponent in the "Pythagorean" calculation is arbitrary. We also more or less arbitrarily weight 1 and 2 against each other to produce the end result.
(This is not negatively meant, as I at first glance think the methods look promising.)

I analyzed the 2002-03 season pretty much, and had access to actual icetime, etc. Forsberg dominated the ES+/- very much, scored lots of ESpts, and was according to me seemingly in a class of his own that season ES wise. (See my thread about it, if you find it - it's about Naslund worthy of his MVP). I would expect Forsberg's ES domination to be shown here. (We all have our expectations. This is a case of one of mine.)

I did similar things as the ones mentioned here in this thread, doing them for PP and SH too, as well as combining them for an overall view. In Forsberg's case, he hardly played any SH, which made me think about handling those cases.

By calculating for example SH+/- per 60 min, I could rank the players. By dividing by league average, I could get values normalized to 0. A stat above 0 meant above league average, below 0 meant below league average. (Another problem was to handle the extreme results that small SH ice times might produce. So an arbitrary "the lower, the more adjusted to the average" adjustment was needed there.)
But what about those hardly playing SH at all? Should they really end up in the middle, having a stat of about 0? It didn't feel right.

I ended up thinking of a system basically combining "showing up" with "+/-" in those different situations.
But how?

One of the main problems I ended up with was "philosophical" thoughts about "What is best?" Below, when I write +/-, it's not the usual one, but one for each situation. If, during the time one plays SH, one is on the ice for 1 goal for and 11 goals against, the player's SH+/- is -10.
For example, consider these SH stats:
A: 100 min SH, -9, = 11.11 min for every -1
B: 100 min SH, -10 = 10 min for every -1
C: 12 min SH, -1 = 12 min for every -1
D: 10 min SH, 0 = 10 min for every -1
How do we rank them? What "goodness" value should we give each player?

Each minute of SH played without allowing an SH goal is good and should be rewarded. For example, in a new system, give +1 for each such minute.
But what do we do with the minutes when a goal is scored? A goal for should be rewarded by more than +1, perhaps +9, +10 or +11.
What about a goal against? Should one add something like -9, -10 or -11?
And then we come up with a model that looks good in theory, try it out in practice and watch the results, just to find some strange unwanted effect.

I mention this because it can be used when dealing with ES too. Reward a player for every minute played, because it is an accomplishment just to play at NHL level without allowing a goal. Reward for goals for. Deduct for goals against.
In theory I think it makes sense. In practice, it becomes harder.
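
A rough sketch of such a reward system, applied to players A to D above. The weights (+1 per clean minute, +10 per goal for, -10 per goal against) are just one of the possible choices mentioned, and two things are pure assumptions for the sake of the example: each goal is treated as using up one minute, and each player's SH+/- is treated as consisting only of goals against.

```python
def sh_score(minutes, goals_for, goals_against,
             per_minute=1, per_goal_for=10, per_goal_against=-10):
    """Score SH play: reward clean minutes and goals for, deduct goals against.

    Assumption: every goal "uses up" one minute, so the remaining
    minutes are the ones played without any goal being scored.
    """
    clean_minutes = minutes - goals_for - goals_against
    return (clean_minutes * per_minute
            + goals_for * per_goal_for
            + goals_against * per_goal_against)

# (SH minutes, goals for, goals against); SH+/- treated as goals against only.
players = {
    "A": (100, 0, 9),
    "B": (100, 0, 10),
    "C": (12, 0, 1),
    "D": (10, 0, 0),
}

scores = {name: sh_score(*stats) for name, stats in players.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranking)
```

With these weights D comes out on top after only 10 minutes, and A and C tie despite very different workloads, which is the kind of strange effect I mean.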

Similar treatment could be done when "without" the player (team stats minus player stats).

I don't know how much I should attempt to guard myself regarding negative comments. I just cannot write all about everything, so it may appear a bit simplified.

08-20-2011, 11:25 PM
#171
Czech Your Math
Registered User

Join Date: Jan 2006
Location: bohemia
Country:
Posts: 4,841
vCash: 500
Quote:
 Originally Posted by plusandminus Would it be possible for you to for example give the 2002-03 and/or 2010-11 values for comparison? Like listing the top 20 or so for each?
I don't write code, nor do I have a comprehensive database, so I can't easily do what you ask. I will post separately some individual seasons for you.

Quote:
Originally Posted by plusandminus
If I understand this right (I know nothing about baseball), guys having much ES ice time on teams that have good ES+/- would end up high?
Sort of:
                    Poor team ES+/-    Average team ES+/-  Good team ES+/-
Much ES icetime     Average results    Good results        Very good results
Average ES icetime  Poor results       Average results     Good results
Little ES icetime   Very poor results  Poor results        Average results
A couple points here. First, since ice time is not available for nearly as many years (and ES ice time to boot), I apportioned team value based purely on player's % of team's (ESGF + ESGA). Players get nothing for showing up... unless there are goals scored while they're on the ice at even strength. They are assigned that portion of the team's "wins" based on how many ES goals for/against they were on the ice for.

Second, players gain/lose additional value based on whether their ES GF/GA ratio was better or worse than the team's. On a good team, a player will generally receive more from the team value component, but it's going to be harder for the player to perform better than the team and receive positive value from the player component.

Quote:
 Originally Posted by plusandminus
If I'm right, I think we seem to have three components:
1. How good was the team?
2. How much did the player play on that team?
3. How to combine 1 and 2, to produce a result.
I think we'll end up with arbitrary formulas here too. For example, I think the exponent in the "Pythagorean" calculation is arbitrary. We also more or less arbitrarily weight 1 and 2 against each other to produce the end result.
(This is not negatively meant, as I at first glance think the methods look promising.)
You basically have the first part of the metric correct.

The exponent in the pythagorean is somewhat arbitrary. The fact that there is an exponent, however, is NOT arbitrary. Also, there is evidence from other sports that ~2 is the right exponent. If there is a definitive study that shows a better exponent for the NHL (would probably have to be pre-lockout, so pre-shootout), then I have no objection to changing the exponent. In the absence of such a study, I would leave it at 2. I have read studies in the past that seemed less than conclusive.

Also, the two components (team and player) are not arbitrarily weighted against each other. Basically, the assumption for the purpose of this metric is that all value (wins) are attributed to even strength skaters. One measures the player's portion of the team's success as a whole (based solely on how MUCH he played). The other measures the player's portion of marginal success above/below the team average (based mainly on how WELL he played).

Both components (just like +/-) are about 5x higher than they should be in unit terms, since there are five skaters on the ice for each goal.

Using Forsberg 2003 for an example:

I calculated him being on ice for 94 ESGF and 38 ESGA.

Pythagorean formula gives them estimated ES win% of .670 which is 54.9 wins (note Colorado's total GF/GA estimates at .626 and actual was .640). Forsberg's on ice for 41.7% of the team's even strength goals for/against, so is credited with 22.9 of the team's 54.9 wins.

Deducting his on-ice ES goals from the team's ES totals gives 84 ESGF and 87 ESGA without him on the ice. So, using the pythagorean formula, Colorado is estimated to have been a .483 win% team without him. Since their estimated ES win% with all players was .670, Forsberg is estimated to have added .187 to the team's ES win% through his stellar play. Multiplying .187 by the 82-game season credits him with 15.3 wins.

Adding 22.9 and 15.3 gives 38.2 total wins. Wins is probably not the best unit, but want something to connote value. If you divided it by 5 (5 skaters on the ice), then 7.6 wins might be closer to reality.
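
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the Forsberg 2003 example. The variable and function names are just labels for this sketch, and the exponent of 2 is the default discussed above.

```python
GAMES = 82

def pythagorean_win_pct(gf, ga, exponent=2):
    """Expected win% from goals for and against: GF^N / (GF^N + GA^N)."""
    return gf ** exponent / (gf ** exponent + ga ** exponent)

# Forsberg on the ice, and Colorado without him on the ice (figures from this post).
on_gf, on_ga = 94, 38
off_gf, off_ga = 84, 87
team_gf, team_ga = on_gf + off_gf, on_ga + off_ga

team_pct = pythagorean_win_pct(team_gf, team_ga)      # about .670
without_pct = pythagorean_win_pct(off_gf, off_ga)     # about .483
team_wins = GAMES * team_pct                          # about 54.9

# Share of team ES goals as quoted in this post (41.7%). It is kept as an
# input here; the four goal totals above would give roughly 43.6% instead.
share = 0.417

share_wins = share * team_wins
marginal_wins = GAMES * (team_pct - without_pct)
total_wins = share_wins + marginal_wins
print(round(team_pct, 3), round(without_pct, 3))
print(round(share_wins, 1), round(marginal_wins, 1), round(total_wins, 1))
```

Because nothing is rounded along the way, the marginal and total figures can differ from the ones above in the first decimal.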

Quote:
 Originally Posted by plusandminus I analyzed the 2002-03 season pretty much, and had access to actual icetime, etc. Forsberg dominated the ES+/- very much, scored lots of ESpts, and was according to me seemingly in a class of his own that season ES wise. (See my thread about it, if you find it - it's about Naslund worthy of his MVP). I would expect Forsberg's ES domination to be shown here. (We all have our expectations. This is a case of one of mine.)
I'm sure Forsberg will be at/near the top for 2003.

For that season I have him as 22.9 team + 15.3 player = 38.2 total
His next best season is 1999 at 27.4 total.

I calculated Naslund for this response, the others I already had calculated:

Naslund 19.7
Lidstrom 30.4
Lindros 18.5
Jagr 17.8
Lemieux 6.4 (.70 R-On, 1.10 R-Off)

I only developed this while thinking about Overpass' concern that his metric (adjusted plus-minus) tended to underrate players on great teams. My metric tends to do the opposite. I think they each have their flaws and it may not be possible to really eliminate those without compromising the whole metric, although I don't know that to be the case. I have a feeling that if you took his metric (and maybe eliminated the arbitrary exponent?) and my metric (and maybe deducted ~.12-.13 from the team EV win%?) and then averaged them you might come up with a list that was better than either at including most of the greats.

In making this post, I realized another flaw (using the Forsberg 2003 example): the player is double credited for some value. In this case, Forsberg is receiving 42% of the team's ES value (wins), including those wins between .483 and .670. He's also given full credit for the wins in that same range due to his performance. So this amount should be deducted from his total:

82 * .417 * (.670 - .483) = 82 * .417 * .187 = 6.4 wins

That's going to hurt the best players, especially those on weaker teams, which will probably make it even more important to use Overpass' suggestion of deducting replacement level win% from the team win% for the Team EV calculation. The worst team since expansion was 1975 Washington (.131 actual win % and .123 ES pythagorean win%). I guess I would deduct .125 win%, since that is between the two numbers and would mean that 1/4 of a .500 team's win% is not distributed to players for just showing up. This would give a metric of "ES value above replacement level".
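
A sketch of how the two adjustments might be applied together, using the rounded Forsberg 2003 figures from above. How exactly the double-counting deduction and the replacement-level deduction should combine is not settled in this post, so the formula in the function is an assumption, and the names are only labels.

```python
GAMES = 82

def es_value_above_replacement(share, team_pct, without_pct,
                               replacement_pct=0.0, games=GAMES):
    """ES value with the double-counted wins removed (assumed combination).

    team_part:      share of team wins above the replacement-level win%
    marginal_part:  wins added relative to the team without the player
    double_counted: the player's share of those same marginal wins
    """
    team_part = games * share * (team_pct - replacement_pct)
    marginal_part = games * (team_pct - without_pct)
    double_counted = games * share * (team_pct - without_pct)
    return team_part + marginal_part - double_counted

# Rounded figures from the Forsberg 2003 example.
share, team_pct, without_pct = 0.417, 0.670, 0.483

double_counted = GAMES * share * (team_pct - without_pct)   # the 6.4 wins above
value_no_replacement = es_value_above_replacement(share, team_pct, without_pct)
value_arp_125 = es_value_above_replacement(share, team_pct, without_pct,
                                           replacement_pct=0.125)
print(round(double_counted, 1),
      round(value_no_replacement, 1),
      round(value_arp_125, 1))
```

With a replacement level of 0 this is just the original 38.2 minus the 6.4 double-counted wins; the .125 deduction then lowers only the team part.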

Last edited by Czech Your Math: 08-21-2011 at 12:00 AM.

08-21-2011, 02:08 AM
#172
Czech Your Math
Registered User

Join Date: Jan 2006
Location: bohemia
Country:
Posts: 4,841
vCash: 500
new & improved "ES value"

I made the correction to prevent double counting of value.

I have run some players' career ES Values with deductions for replacement level win% of 0, .125 and .250 (note: .250 is approx. the worst win% for non-expansion teams).

Career ES Value
---------------------
Bourque    451
Gretzky    446
Robinson   423
Lidstrom   413
Jagr       362
Potvin     318
Lafleur    282
Clarke     271
Orr        258
Lemieux    255
Lindros    232
Forsberg   210
Bossy      208
Ovechkin   136
Crosby     117

Career ES Value ARP .125
------------------------
Bourque    365
Robinson   349
Gretzky    355
Lidstrom   336
Jagr       294
Potvin     260
Lafleur    233
Clarke     227
Orr        220
Lemieux    202
Lindros    191
Forsberg   176
Bossy      174
Ovechkin   111
Crosby      95

Career ES Value ARP .250
-------------------------
Bourque    280
Robinson   275
Gretzky    265
Lidstrom   258
Jagr       226
Potvin     201
Lafleur    185
Clarke     183
Orr        182
Lindros    149
Lemieux    148
Forsberg   142
Bossy      141
Ovechkin    85
Crosby      72

I'm not sure that even with a .250 team win% deduction (which is the most I would consider) it would really solve the issue of defensemen with long careers on good teams being possibly over-represented at the top of the list. One solution could be to separate d-men and forwards for comparison purposes. Another might be to use a player's ES points as a % of ESGF on-ice for, to give more talented offensive players their just due.

Last edited by Czech Your Math: 08-21-2011 at 02:18 AM.

08-21-2011, 07:01 AM
#173
plusandminus
Registered User

Join Date: Mar 2011
Posts: 980
vCash: 500
Quote:
 Originally Posted by Czech Your Math A couple points here. First, since ice time is not available for nearly as many years (and ES ice time to boot), I apportioned team value based purely on player's % of team's (ESGF + ESGA). Players get nothing for showing up... unless there are goals scored while they're on the ice at even strength. They are assigned that portion of the team's "wins" based on how many ES goals for/against they were on the ice for.
OK.
I've studied correlations between ES icetime and ESGF+ESGA and they don't necessarily correspond very well, perhaps especially regarding forwards. For example, during 2002-03, a guy like Kovalchuk (who had a reputation as an offensive minded player) was on the ice for about 1.5 times as many ESGF+ESGA as the average player.
So here we have our first bias: ESGF=60 and ESGA=50 will give a higher "ice time" than ESGF=50 and ESGA=40, even if the real ice time is the same?

Quote:
 Originally Posted by Czech Your Math Second, player's gain/lose additional value based on whether their ES GF/GA ratio was better or worse than the team's. On a good team, a player will generally receive more from the team value component, but it's going to be harder for the player to perform better than the team and receive positive value from the player component.
(OK, I couldn't really include that easily in my two-dimensional table.)
I'm a bit against dividing ESGF by ESGA. As I wrote in another reply a few days ago, I think there are better ways to do it. To use the above example, I would say that 60-50 and 50-40 are equally good, despite the latter getting a slightly higher ratio (if I understand you right).
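
To make the ratio-versus-difference point concrete, here is a minimal Python sketch using the two made-up lines from the example above (60-50 and 50-40). The function names are only illustrative, not part of anyone's actual method.

```python
def goal_ratio(gf, ga):
    """ESGF divided by ESGA (the R-On style measure)."""
    return gf / ga


def goal_diff(gf, ga):
    """ESGF minus ESGA (plain plus-minus)."""
    return gf - ga


# The two hypothetical players from the example: 60-50 and 50-40.
for gf, ga in [(60, 50), (50, 40)]:
    print(f"{gf}-{ga}: ratio={goal_ratio(gf, ga):.2f}, diff={goal_diff(gf, ga):+d}")
```

Both lines are +10 by difference, but the 50-40 line gets the higher ratio, which is the disagreement between the two measures.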

Quote:
 Originally Posted by Czech Your Math The exponent in the pythagorean is somewhat arbitrary. The fact that there is an exponent, however, is NOT arbitrary. Also, there is evidence from other sports that ~2 is the right exponent. If there is a definitive study that shows a better exponent for the NHL (would probably have to be pre-lockout, so pre-shootout), then I have no objection to changing the exponent. In the absence of such a study, I would leave it at 2. I have read studies in the past that seemed less than conclusive.
I googled, and from Wikipedia I got the impression that 1.8, rather than 2, was the "right exponent", at least in baseball?
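
As a rough check on how much the exponent matters, here is a small Python sketch of the pythagorean estimate with the exponent as a parameter. It uses the Colorado 2002-03 ES totals quoted further down in this post (178 ESGF, 125 ESGA); the function name and the choice of 1.8 as the alternative are assumptions for illustration only.

```python
def pythagorean_win_pct(gf, ga, exponent=2.0):
    """Estimated win% from goals for and against: GF^x / (GF^x + GA^x)."""
    return gf ** exponent / (gf ** exponent + ga ** exponent)


# Colorado 2002-03 at even strength, as quoted in this thread.
team_gf, team_ga = 178, 125

for x in (2.0, 1.8):
    print(f"exponent {x}: win% = {pythagorean_win_pct(team_gf, team_ga, x):.3f}")
```

With exponent 2 this gives the .670 quoted in the thread; a smaller exponent pulls the estimate toward .500.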

Wouldn't shootout goals be excluded from the stats?

Quote:
 Originally Posted by Czech Your Math Also, the two components (team and player) are not arbitrarily weighted against each other. Basically, the assumption for the purpose of this metric is that all value (wins) are attributed to even strength skaters. One measures the player's portion of the team's success as a whole (based solely on how MUCH he played). The other measures the player's portion of marginal success above/below the team average (based mainly on how WELL he played).
OK.
Getting as good "how much did he play" and "how good was he compared to team average" as possible seems important.
Also, to find the ideal way of combining them.

Quote:
 Originally Posted by Czech Your Math Both components (just like +/-) are about 5x higher than they should be in unit terms, since there are five skaters on the ice for each goal.
Yes. One might also want to consider that the less playing time (games, minutes, etc.) players have, the more extreme their results tend to be.

Quote:
 Using Forsberg 2003 for an example: I calculated him being on ice for 94 ESGF and 38 ESGA. Colorado had 178 ESGF and 125 ESGA.
That's 94-38 = +56 with Forsberg. And 84-87 = -3 without.
Not only did he have by far the league's best ES+/- and the most ESpts, he did it on a team that had a negative +/- without him (and the guys who were on the ice with him).

Adding ESGF+ESGA, we get 94+38=132 for him. 178+125=303 for the team.
That's an "ice time" of 43.56 % according to ESGF+ESGA.

Quote:
 Originally Posted by Czech Your Math Pythagorean formula gives them estimated ES win% of .670 which is 54.9 wins (note Colorado's total GF/GA estimates at .626 and actual was .640). Forsberg's on ice for 41.7% of the team's even strength goals for/against, so is credited with 22.9 of the team's 54.9 wins.
I got 43.56 %, so at least one of us (perhaps I) may be wrong.

In reality, the correct answer seems to be 28.84 %. (if my data is correct)
So, the estimated percentage is about 1.5 times higher.
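
The comparison above can be sketched in a few lines of Python. The goal totals and the 28.84 % real ES ice-time share are the figures given in this post; the function name is made up for the example.

```python
def goal_event_share(player_gf, player_ga, team_gf, team_ga):
    """Player's share of the team's ESGF+ESGA, used as a stand-in for ice time."""
    return (player_gf + player_ga) / (team_gf + team_ga)


# Forsberg 2002-03 and Colorado ES totals, as given in this thread.
proxy_share = goal_event_share(94, 38, 178, 125)

# Real ES ice-time share from the post above ("if my data is correct").
actual_share = 0.2884

overestimate = proxy_share / actual_share
print(f"proxy share  = {proxy_share:.2%}")
print(f"actual share = {actual_share:.2%}")
print(f"proxy / actual = {overestimate:.2f}")
```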

And I think three players with identical ES time, having ESGF-ESGA of 40-40, 30-30 and 20-20, contribute equally much.

One can also study ESGF in itself, and ESGA in itself.
What?                      ES ice time  ESGF  ESGA  ESGF/time  ESGA/time  +/- /time
Colorado without Forsberg        .7116    84    87     118.04     122.26      -4.22
Colorado with Forsberg           .2884    94    38     325.94     131.76    +194.18
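
The per-ice-time rates in the table above are just each goal total divided by the share of ES ice time. A minimal Python sketch, with a hypothetical helper name and the table's own inputs:

```python
def per_time_rates(gf, ga, time_share):
    """Return (GF per unit of ice-time share, GA per unit, net per unit)."""
    gf_rate = gf / time_share
    ga_rate = ga / time_share
    return gf_rate, ga_rate, gf_rate - ga_rate


# Inputs from the table: share of ES ice time, ESGF, ESGA.
situations = {
    "Colorado without Forsberg": (0.7116, 84, 87),
    "Colorado with Forsberg": (0.2884, 94, 38),
}

for label, (share, gf, ga) in situations.items():
    gf_rate, ga_rate, net = per_time_rates(gf, ga, share)
    print(f"{label}: GF/time={gf_rate:.2f} GA/time={ga_rate:.2f} net={net:+.2f}")
```

The net figure here is computed from unrounded rates, so it can differ from the table in the last decimal.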

Quote:
 Originally Posted by Czech Your Math Deducting his on-ice ES goals from the team's ES totals gives 84 ESGF and 87 ESGA without him on the ice. So without him, using pythagorean formula, Colorado is estimated to have been a .483 win% team without him on the ice. Since their estimated ES win% with all players was .670, Forsberg is estimated to have added .187 to the team's ES win% through his stellar play. Multiplying .187 by 82 game season credits him with 15.3 wins.
It would be interesting to see this stat listed for all the players on a team.
(I can do it myself, but not right now.)
At the risk of being called an idiot, does the sum of all players equal 100?
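
For anyone who wants to run this for a whole roster, here is a rough Python sketch of the two components as they are described in the quoted posts: a team component (the player's share of ESGF+ESGA times estimated team wins) and a player component (the change in pythagorean win% with and without the player, times games). The function names, the 82-game default and the exponent of 2 are assumptions taken from the quoted description, and the later double-counting correction mentioned in post #172 is not included.

```python
def pythagorean_win_pct(gf, ga, exponent=2.0):
    """Estimated win% from goals for and against."""
    return gf ** exponent / (gf ** exponent + ga ** exponent)


def es_value_components(player_gf, player_ga, team_gf, team_ga, games=82):
    """Return (team_component, player_component) in 'wins'."""
    team_pct = pythagorean_win_pct(team_gf, team_ga)
    # Team component: share of the team's ES goal events times estimated wins.
    share = (player_gf + player_ga) / (team_gf + team_ga)
    team_component = share * team_pct * games
    # Player component: team win% minus win% with the player's goals removed.
    off_pct = pythagorean_win_pct(team_gf - player_gf, team_ga - player_ga)
    player_component = (team_pct - off_pct) * games
    return team_component, player_component


# Forsberg 2002-03 with the totals quoted in this thread.
team_part, player_part = es_value_components(94, 38, 178, 125)
print(f"team={team_part:.1f} player={player_part:.1f} total={team_part + player_part:.1f}")
```

With these inputs the share is the 43.56 % computed above rather than the quoted 41.7 %, so the team component comes out about one win higher than the quoted 22.9; the player component matches the quoted 15.3 to within rounding.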

Quote:
 Originally Posted by Czech Your Math Adding 22.9 and 15.3 gives 38.2 total wins. Wins is probably not the best unit, but want something to connote value. If you divided it by 5 (5 skaters on the ice), then 7.6 wins might be closer to reality.

Quote:
 I'm sure Forsberg will be at/near the top for 2003. For that season I have him as 22.9 team + 15.3 player = 38.2 total His next best season is 1999 at 27.4 total. I calculated Naslund for this response, the others I already had calculated: Naslund 19.7 Lidstrom 30.4 Lindros 18.5 Jagr 17.8 Lemieux 6.4 (.70 R-On, 1.10 R-Off)
Pittsburgh was a bit special that year, starting the season with some very good players, only to see them drop off one by one. So Mario's stats sank deeper and deeper during the season. (If I remember right.)

Quote:
 I only developed this while thinking about Overpass' concern that his metric (adjusted plus-minus) tended to underrate players on great teams. My metric tends to do the opposite. I think they each have their flaws and it may not be possible to really eliminate those without compromising the whole metric, although I don't know that to be the case. I have a feeling that if you took his metric (and maybe eliminated the arbitrary exponent?) and my metric (and maybe deducted ~.12-.13 from the team EV win%?) and then averaged them you might come up with a list that was better than either at including most of the greats.
A thought I have is that one might want to separate forwards and defencemen, since they may not be easily comparable.

An additional way of improving (or not) the method could be to include ESpts in the calculations, to estimate how much different players contributed to their ESGF. I'm mainly thinking of doing it for forwards, to help separate the offensive contributions of linemates, although it might be useful to apply (perhaps in a different form) to defencemen as well. Doing something similar for ESGA would of course be basically impossible (unless one applies an assumption like "defencemen are more responsible for ESGA, while forwards are more responsible for ESGF").

Finally, as you are likely aware, this stat only tells us about players' contributions during ES. So the "rankings" here are ES only.
Creating similar stats for PP and SH would rank players differently.
For example, while Forsberg had "much better" ES stats than Naslund in 2002-03, Naslund had better PP stats.

Last edited by plusandminus: 08-21-2011 at 01:39 PM.

08-21-2011, 08:26 AM
#174
overpass
Registered User

Join Date: Jun 2007
Posts: 3,907
vCash: 500
Quote:
 Originally Posted by Czech Your Math A couple questions about your methodology: Is XEV+/- in a 200 ESG (or 400 ESGF + ESGA) environment? Do you adjust the XEV+/- to reflect how many ESGF + GA for which the player was on ice? Also, could you post a top 50 Adjusted Plus-Minus without the regression to the mean? I think it might give some insight into how it could be improved, possibly without using an arbitrary "smoothing" factor. Thanks again for your help.
XEV+/- is in a 200 ESG environment. It's calculated using the player's (ESGF+ESGA) adjusted to the 200 ESG environment, holding that number constant, and applying the regressed R-OFF number. So yes, the number of ESGF+ESGA is a big factor in the player's XEV+/-.

Player             $ESGF  $ESGA  R-ON  R-OFF  XEV+/-  EV+/-  AEV+/-
Ray Bourque         1933   1410  1.37   0.95     -83    523     606
Bobby Orr           1138    528  2.15   1.09      73    610     537
Jaromir Jagr        1630   1186  1.37   0.95     -76    444     519
Borje Salming       1375   1204  1.14   0.82    -258    172     430
Teemu Selanne       1255   1018  1.23   0.85    -184    237     421
Dave Taylor          997    763  1.31   0.84    -151    234     385
Eric Lindros         979    645  1.52   0.95     -45    334     379
Mark Howe           1065    718  1.48   0.97     -31    347     377
Marcel Dionne       1276   1166  1.09   0.81    -261    110     371
Mario Lemieux       1169    993  1.18   0.87    -147    176     323
Ron Francis         1573   1416  1.11   0.90    -162    157     319
Al MacInnis         1596   1130  1.41   1.12     150    467     317
Bobby Clarke        1028    574  1.79   1.20     142    454     312
Wayne Gretzky       2124   1721  1.23   1.05      98    404     305
Peter Forsberg       833    493  1.69   1.09      57    340     283
John Leclair        1007    674  1.49   1.07      58    333     275
Zigmund Palffy       700    570  1.23   0.80    -139    131     269
Larry Murphy        1771   1473  1.20   1.02      31    298     267
Nicklas Lidstrom    1795   1278  1.40   1.18     252    517     265
Joe Thornton         986    749  1.32   0.97     -27    237     265
Larry Robinson      1862   1166  1.60   1.33     433    696     263
Mike Bossy           830    464  1.79   1.18     105    365     260
Keith Tkachuk       1129   1026  1.10   0.87    -154    103     256
Steve Larmer         872    670  1.30   0.94     -45    202     247
Charlie Simmer       597    450  1.33   0.83     -98    147     245
Bryan Trottier      1223    823  1.49   1.17     160    400     240
Ron Stackhouse      1026    981  1.05   0.82    -193     45     238
Alex Ovechkin        578    402  1.44   0.90     -49    176     225
Jarome Iginla       1028    902  1.14   0.90     -99    126     225
Brian Rafalski       914    647  1.41   1.05      42    266     225
Chris Pronger       1189    976  1.22   0.99      -9    213     222
Lubomir Visnovsky    708    633  1.12   0.81    -139     75     214
Dmitri Khristich     693    542  1.28   0.90     -62    151     213
Michel Goulet        922    781  1.18   0.92     -69    141     210
Patrik Elias         831    565  1.47   1.09      57    267     209
Alex Tanguay         837    592  1.41   1.06      39    244     206
Joe Reekie           823    693  1.19   0.91     -74    130     203
Sergei Gonchar      1079    958  1.13   0.92     -82    121     203
Denis Potvin        1280    860  1.49   1.23     221    420     199
Guy Lafleur         1232    740  1.67   1.35     293    492     199
Brad McCrimmon      1192    830  1.44   1.18     163    362     199
Michael Nylander     799    650  1.23   0.93     -50    149     199
Milan Hejduk         873    656  1.33   1.03      20    217     197
Luc Robitaille      1382   1180  1.17   1.00       5    202     197
Sergei Fedorov      1160    873  1.33   1.09      91    287     196
Mike Foligno         758    633  1.20   0.90     -71    125     196
Steve Sullivan       752    606  1.24   0.93     -47    146     193
Brian Propp          873    595  1.47   1.12      85    277     192
Craig Ramsay         793    529  1.50   1.12      72    264     192
Marian Gaborik       570    454  1.26   0.87     -74    116     190

And here's the opposite, the top 50 with no team adjustment at all, just SH goals removed and scoring level adjusted.

Player             $ESGF  $ESGA  R-ON  R-OFF  XEV+/-  EV+/-  AEV+/-
Larry Robinson      1862   1166  1.60   1.34       0    696     696
Bobby Orr           1138    528  2.15   1.09       0    610     610
Ray Bourque         1933   1410  1.37   0.95       0    523     523
Nicklas Lidstrom    1795   1278  1.40   1.18       0    517     517
Guy Lafleur         1232    740  1.67   1.35       0    492     492
Al Macinnis         1596   1130  1.41   1.12       0    467     467
Bobby Clarke        1028    574  1.79   1.20       0    454     454
Jaromir Jagr        1630   1186  1.37   0.95       0    444     444
Scott Stevens       1894   1452  1.31   1.19       0    443     443
Serge Savard        1385    959  1.44   1.52       0    425     425
Denis Potvin        1280    860  1.49   1.23       0    420     420
Wayne Gretzky       2124   1721  1.23   1.05       0    404     404
Steve Shutt          901    498  1.81   1.45       0    404     404
Bryan Trottier      1223    823  1.49   1.17       0    400     400
Brad Park           1398    998  1.40   1.20       0    400     400
Jacques Lemaire      945    565  1.67   1.51       0    379     379
Mike Bossy           830    464  1.79   1.18       0    365     365
Brad Mccrimmon      1192    830  1.44   1.18       0    362     362
Chris Chelios       1671   1312  1.27   1.18       0    359     359
Paul Coffey         1879   1522  1.23   1.21       0    357     357
Jean Ratelle        1052    699  1.51   1.27       0    353     353
Mark Howe           1065    718  1.48   0.97       0    347     347
Peter Forsberg       833    493  1.69   1.09       0    340     340
Eric Lindros         979    645  1.52   0.95       0    334     334
John Leclair        1007    674  1.49   1.07       0    333     333
Guy Lapointe        1123    799  1.41   1.66       0    324     324
Wayne Cashman       1018    702  1.45   1.43       0    316     316
Dallas Smith        1131    817  1.38   1.45       0    314     314
Yvan Cournoyer       797    488  1.63   1.50       0    308     308
Larry Murphy        1771   1473  1.20   1.02       0    298     298
Ken Hodge            873    581  1.50   1.35       0    292     292
Sergei Fedorov      1160    873  1.33   1.09       0    287     287
Bill Barber          767    490  1.57   1.50       0    278     278
Brian Propp          873    595  1.47   1.12       0    277     277
Patrik Elias         831    565  1.47   1.09       0    267     267
Brian Rafalski       914    647  1.41   1.05       0    266     266
Gary Roberts        1074    808  1.33   1.10       0    266     266
Scott Niedermayer   1339   1074  1.25   1.22       0    265     265
Craig Ramsay         793    529  1.50   1.12       0    264     264
Bill Hajt            925    665  1.39   1.19       0    260     260
Eric Desjardins     1210    955  1.27   1.25       0    255     255
Phil Esposito       1254   1002  1.25   1.25       0    253     253
Andre Dupont         841    592  1.42   1.33       0    250     250
Clark Gillies        744    495  1.50   1.33       0    249     249
Pavel Datsyuk        671    423  1.58   1.12       0    247     247
Alex Tanguay         837    592  1.41   1.06       0    244     244
Joe Nieuwendyk      1099    859  1.28   1.21       0    239     239
Joe Thornton         986    749  1.32   0.97       0    237     237
Sergei Zubov        1181    944  1.25   1.13       0    237     237
Teemu Selanne       1255   1018  1.23   0.85       0    237     237

Notice that the first list has 9 of the top 10 with an R-OFF below 1. On the second list, 8 of the top 10 have an R-OFF above 1.
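
For readers checking the first list, the columns appear to be related by AEV+/- = EV+/- minus XEV+/-. That relationship is inferred from the posted numbers, not stated in the post. A short Python sketch over the top 10 rows of the first table checks it to within rounding and counts the R-OFF values below 1:

```python
# (player, XEV+/-, EV+/-, AEV+/-, R-OFF), copied from the top 10 of the first table.
rows = [
    ("Ray Bourque", -83, 523, 606, 0.95),
    ("Bobby Orr", 73, 610, 537, 1.09),
    ("Jaromir Jagr", -76, 444, 519, 0.95),
    ("Borje Salming", -258, 172, 430, 0.82),
    ("Teemu Selanne", -184, 237, 421, 0.85),
    ("Dave Taylor", -151, 234, 385, 0.84),
    ("Eric Lindros", -45, 334, 379, 0.95),
    ("Mark Howe", -31, 347, 377, 0.97),
    ("Marcel Dionne", -261, 110, 371, 0.81),
    ("Mario Lemieux", -147, 176, 323, 0.87),
]


def adjusted_plus_minus(ev, xev):
    """Assumed relationship: actual EV+/- minus expected EV+/-."""
    return ev - xev


# Allow a difference of 1 because the posted columns are rounded.
mismatches = [
    name for name, xev, ev, aev, _ in rows
    if abs(adjusted_plus_minus(ev, xev) - aev) > 1
]
below_one = sum(1 for *_, r_off in rows if r_off < 1)

print("rows off by more than rounding:", mismatches)
print("top-10 players with R-OFF below 1:", below_one)
```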

08-21-2011, 08:34 AM
#175
overpass
Registered User

Join Date: Jun 2007
Posts: 3,907
vCash: 500
I've edited the OP to add the most recent adjusted plus-minus numbers I have, using my current method of calculating, for anyone who is interested.

Generally, the updated numbers are slightly more positive towards players on good teams, since the team adjustment is a little weaker.

I've also changed the SHGF estimator to include actual SH points, so players who scored a lot of SH points have had those moved out of the ESGF column and into the SHGF column. This change is small to non-existent for most forwards, and probably only affects Paul Coffey and Mark Howe among defencemen. It has a major effect on Wayne Gretzky's numbers.

matnor, I see what you mean about the Langway numbers being off. I don't have the 2008 numbers anymore to replace them with, so I just deleted him from the first table and included him in the second table.

plusminus, I'm still planning to get to what you posted. Not ignoring you, just busy


All times are GMT -5. The time now is 09:58 PM.
