By The Numbers: Hockey Analytics... the Final Frontier. Explore strange new worlds, to seek out new algorithms, to boldly go where no one has gone before.

First of all, a guy scores x goals over a certain time frame in a specific set of circumstances. It is pure speculation, rife with questionable assumptions, to assume that he would score y goals in a different time frame or set of circumstances.

This is the most pernicious misapprehension about adjusted statistics. It's not about transporting a player to a different time and place and assuming a particular level of production; it's just an adjustment to the scaling of the stats to allow for fairer comparisons. That's all.

But he has a good point, no?

Players have unique skill sets, often tailored for success in a given era. Adjusted stats immediately ignore that feature, don't they?

I'm not trying to be a smart aleck with my questions, by the way; they're honest questions.

No, it doesn't. It doesn't take a player out of the context in which he played. It merely stretches or compresses that context a bit, to level the playing field so to speak.

Player stats are still based on the competition they actually played against, but the scale is tweaked a bit.

As you point out, the player never actually accomplished what the adjusted stats claim. He never actually played the extra games or competed in the other era. The data is not real.

Levelling the playing field is the error. Means are not the best way to analyze human performance data. The error is several orders of magnitude greater than with a power curve. The method of calculating the imaginary data may be wrong.

Questions remain: "Who defines fairness, or the context of the comparison?" Then there is the vague line between adjustments and projections.

Take Maurice Richard and his adjusted stats per HR as an example.

Adjusted, he gets credit for six 50-goal seasons. In 1957-58 he was injured after his best career start, on pace for approximately 115 points and 50 goals, and the adjustments do not reflect this. That could be viewed as a projection, but then context is lost.

Simply put, the methodology and definitions have to be tightened significantly to sustain accuracy and fairness.

There's no way of determining exactly how many goals/points a player would score in a different season. However, adjusted stats put the data in the context of value. 50 goals in a 7.5 gpg environment has equivalent value to 40 goals in a 6.0 gpg environment. You say KISS, and it's really that simple.
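A bare-bones sketch of the proportional scaling described above, assuming a goal's value is simply inversely proportional to league goals per game. The function name is hypothetical, and published adjusted-stat methods may apply further corrections (such as schedule length) that this sketch omits.

```python
def scale_to_environment(goals: float, source_gpg: float, target_gpg: float) -> float:
    """Rescale a goal total from one scoring environment to another.

    Assumption: a goal's value is inversely proportional to league goals
    per game (gpg), so the total is multiplied by target_gpg / source_gpg.
    No schedule-length or roster-size correction is applied.
    """
    if source_gpg <= 0 or target_gpg <= 0:
        raise ValueError("goals per game must be positive")
    return goals * target_gpg / source_gpg


# 50 goals in a 7.5 gpg league scale to 40 goals in a 6.0 gpg league.
print(scale_to_environment(50, source_gpg=7.5, target_gpg=6.0))  # → 40.0
```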

The principle is very simple, whether or not you choose to acknowledge it.

Did you account for the fact that there are 2.7-2.75 points awarded per goal? To say a player was involved in X% of the goals is not the same as saying that he was responsible for X% of the goals.

One or two extreme outliers are not going to make a huge difference in a 20+ team league. Yes, outliers (whether players or teams) do affect the mean. However, there have been much larger fluctuations in league gpg than can be attributed to one or two players, or even one or two teams.

Let's look at last season instead of speculating: goals, not points.

I dl'd all the skater stats from TSN.com. It appears that goalies weren't included.

I adjust nothing and eliminate no data, so yes, a sizeable number of players score very few points. I use the thinking presented in that study: a power curve applies, and it also applies within smaller subsets. I use percentages. I hope my numbers and arithmetic are correct.

6,542 goals were scored by 892 players on 30 teams.
2,460 team games played (1,230 games) gives 5.31 gpg.
2,589 were scored by the top 10% (89 players), about 40% of the total.
That's a 2.1 gpg contribution.

The next 10% added 1,570 goals, for a total of 4,159 goals, about 64% of the total. That's not quite the 80-20 rule, but clearly the output of only 10% or 20% of the players is pretty dominant.
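The percentages above can be reproduced from a full skater list by ranking players and summing the top slice. A minimal sketch, assuming the group size is rounded to the nearest whole player (892 × 10% gives 89, as in the post); the function name and the toy data are hypothetical, not TSN's numbers.

```python
def top_share(goals: list[int], fraction: float) -> float:
    """Share of all goals scored by the top `fraction` of players.

    Players are ranked by goals; the group size is rounded to the nearest
    whole player (at least one). No players are dropped from the total.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(goals, reverse=True)
    group_size = max(1, round(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:group_size]) / total if total else 0.0


# Hypothetical ten-player league, 100 goals in all.
toy_league = [40, 30, 10, 8, 5, 4, 2, 1, 0, 0]
print(top_share(toy_league, 0.10))  # → 0.4
print(top_share(toy_league, 0.20))  # → 0.7
```

Fed the full 2011-12 skater list, the same function should come out near 0.396 and 0.636 (2,589 and 4,159 of 6,542 goals).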

I looked at the top 10 players from Gretzky's 92-goal year. They scored about 9% of the league's goals that season. Last season the top ten goal scorers scored only about 6% of the league's total. If the top 10 of last year had scored at the same pace as the top ten of 81-82, we would have seen NHL scoring rise from 5.31 gpg to 5.43 gpg. I don't have the data to compare the top 10% or 20% of 81-82, but if the same multiplier were applied to last season's top 20%, the league would have seen a 1.2 gpg increase, to over 6 gpg. That's from just 20% of the players. The other 80% contribute no more and no less, yet the average goals scored per player and per game rise quite a bit.

1981-82 top 10: 578/6737 = .0858
2011-12 top 10: 413/6542 = .0631
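The what-if above can be checked in a few lines. This sketch assumes, as the post does, that only the top group's goals are scaled by the ratio of the two seasons' top-10 shares while everyone else's stay fixed; 1,230 games is 30 teams × 82 games / 2. With unrounded inputs it lands at about 5.44 and 6.53 gpg against a 5.32 baseline, within rounding of the figures above. The helper name is hypothetical.

```python
GAMES_2011_12 = 30 * 82 // 2  # 1,230 games (each game counted once)


def gpg_with_scaled_group(total_goals: int, group_goals: int,
                          multiplier: float, games: int) -> float:
    """League gpg if one group's goals were multiplied and all others held fixed."""
    extra_goals = group_goals * (multiplier - 1)
    return (total_goals + extra_goals) / games


# Ratio of the top-10 shares: 1981-82 (578 of 6737) over 2011-12 (413 of 6542).
multiplier = (578 / 6737) / (413 / 6542)  # about 1.36

baseline = 6542 / GAMES_2011_12
top10_case = gpg_with_scaled_group(6542, 413, multiplier, GAMES_2011_12)
top20_case = gpg_with_scaled_group(6542, 4159, multiplier, GAMES_2011_12)
print(f"{baseline:.2f} {top10_case:.2f} {top20_case:.2f}")
```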

The point is that a small number of players can affect the means. In other words, 50 goals in a 7.75 gpg environment tells us nothing about its value in a 6.0 gpg environment. More specifically, 60 goals in an 8.02 gpg season cannot be compared to 60 in a 5.31 gpg season without looking at the impact of the offensive and defensive outliers in those seasons. Of course, we haven't considered unbalanced schedules, the power play, the effect of specific outliers in head-to-head competition, etc. I'm just looking at a single variable: the effect of outliers.

Within the top ten scorers, Gretzky's 92 represents .159 of the group's total in that season, which equals 66 goals last season.
Stamkos's 60 goals last year represent .145 of the top-10 total last season, which equals 84 goals in 81-82.
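The within-group translation above amounts to carrying a player's share of his season's top-10 total over to another season's top-10 total. A minimal sketch using the post's totals (578 for 1981-82, 413 for 2011-12); the function name is hypothetical.

```python
TOP10_1981_82 = 578  # goals by the top 10 scorers, 1981-82
TOP10_2011_12 = 413  # goals by the top 10 scorers, 2011-12


def translate_within_group(goals: float, group_from: float, group_to: float) -> float:
    """Keep a player's share of his group's goals, applied to another group's total."""
    return goals / group_from * group_to


print(round(translate_within_group(92, TOP10_1981_82, TOP10_2011_12)))  # → 66
print(round(translate_within_group(60, TOP10_2011_12, TOP10_1981_82)))  # → 84
```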

Gretzky was more dominant compared to his peers in a season in which the top 10 scorers were more dominant against their peers than Stamkos was in a season in which the top 10 scorers were less dominant against their peers.

Taking outliers into account gives quite a different result than when one ignores them.

Using means:
In the 81-82 season, 60 goals = .0089 of all goals scored; 92 = .0137.
Last year, 60 goals = .0092 of all goals scored.
.0137 equals 89 goals last season, while .0092 equals 62 goals in 81-82.

89 > 66 and 62 < 84, of course. These results don't match up. It seems that when we take outliers into account we add a context not present when we ignore them. Imagine the impact of dropping them from the data set altogether, as some have proposed!
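For contrast, the mean-based version carries a player's share of all league goals across seasons (6,737 goals in 1981-82, 6,542 in 2011-12). Run this way, Gretzky's 92 comes out at about 89 in 2011-12 terms and Stamkos's 60 at about 62 in 1981-82 terms, which is where the two lenses part ways. The function name is hypothetical.

```python
LEAGUE_1981_82 = 6737  # total goals, 1981-82 (21 teams)
LEAGUE_2011_12 = 6542  # total goals, 2011-12 (30 teams)


def translate_by_league_share(goals: float, league_from: float, league_to: float) -> float:
    """Keep a player's share of all league goals, applied to another season's total."""
    return goals / league_from * league_to


print(round(translate_by_league_share(92, LEAGUE_1981_82, LEAGUE_2011_12)))  # → 89
print(round(translate_by_league_share(60, LEAGUE_2011_12, LEAGUE_1981_82)))  # → 62
```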

In your example, using last season's total of 1,230 games, 7.5 gpg represents 9,225 goals scored. 50 goals = .0054 of the total.

6 gpg = 7,380 goals scored, so 40 goals = .0054 of the total.

But an important context is missing. How much of the difference in gpg was due to the impact of outliers? What if the 50-goal scorer accomplished this in a season in which the top ten scorers got 6% of all goals, but the 40-goal scorer accomplished his in a season in which his top-ten scoring peers achieved 9% of all goals?

6% = 554 of 9225, 50gs = .090
9% = 664 of 7380, 40gs = .060

These numbers are clearly not the same. The 50 goals represent 60 goals in the 9% season, while the 40 goals appear equivalent to 33 goals in the 6% season.
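The hypothetical above can be laid out end to end. This sketch assumes the post's setup: 1,230 games, a 7.5 gpg season in which the top ten take 6% of goals, and a 6.0 gpg season in which they take 9%. The seasons and the helper are illustrative, not real data.

```python
GAMES = 1230


def season_pools(gpg: float, top10_share: float, games: int = GAMES) -> tuple[float, float]:
    """Return (league goals, top-10 goals) for a hypothetical season."""
    league_goals = gpg * games
    return league_goals, top10_share * league_goals


league_a, top10_a = season_pools(7.5, 0.06)  # 9,225 and about 554
league_b, top10_b = season_pools(6.0, 0.09)  # 7,380 and about 664

# Mean-based lens: both totals are the same fraction of league goals.
print(round(50 / league_a, 4), round(40 / league_b, 4))  # → 0.0054 0.0054

# Within-top-10 lens: carry each total into the other season's top-10 pool.
print(round(50 / top10_a * top10_b))  # → 60
print(round(40 / top10_b * top10_a))  # → 33
```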

I understand that in different eras the opportunities to score were higher for some players than others, but consider that looking within the outliers themselves (the percentage of goals scored by a player among the top 10, or the top 10%, or whatever) implicitly takes that into account. The top 10 scored x%, but a player in that group scored y% of the group's total. In fact, Gretzky (hypothetically) scores only 66 goals last season in that context, as opposed to 89 when that context is removed.

So I've found that 92 has a value of 66 last season. I don't agree with that, but I didn't set out to prove anything about Gretzky's 92 goals in the context of this year. IMHO that's not possible, since we are talking about talent and people. I would argue that an average Gretzky would score 66 last year. Such was his talent. But if his ice time were reduced, then perhaps that would be the result.

Don't get me started on goalie outliers, which surely affect all this as well. I don't think looking at scorers alone tells the tale of two very different seasons.

I appreciate being directed to sites where I could download data. TY to both of you. Sorry it took so long to respond. It took two hours to compose this post, and I just can't do that every day. Hopefully I've added to the debate.

Edit: you would need to add 5 more goal scorers from last season to match the total of the top ten from 81-82. This means that last season, with 30 teams, would need 50% more players than the 1981-82 season, with 21 teams, for the two top groups to represent equal proportions of players.

Last edited by Dalton: 09-22-2012 at 11:40 AM.
Reason: Edit- see next post.

You're forgetting to consider the increase in league size from 1982 to 2012. 21 teams vs 30 teams. With 30 total teams, there will be many more games and thus many more total goals, so of course the top 10 players will score a significantly lower percentage of the total goals. A study like this needs to be based on average goals per game, not total goals.

Appreciate the effort. I do think the effect of outliers on the group is something worth studying.

TY. That's another reason why I like percentages: 10% is 10% regardless of the size of each group. It's unfortunate that I didn't have the data to use a percentage instead of a number to compare those seasons. But my point was to illustrate the value of looking at outliers. IMHO looking at, and within, the top 10 or top 10% of a single season should make that point.

Quote:

Originally Posted by Dalton

As you point out, the player never actually accomplished what the adjusted stats claim. He never actually played the extra games or competed in the other era. The data is not real.

You're rebutting things that have never been claimed. No one's claiming these things are true (or I suppose I should say, those who do claim it don't really know what they're talking about).

Again, it has nothing to do with assuming the player played in another era; that's the pernicious misapprehension that needs to be squashed. The player's statistics are those compiled against the players he actually played against, looked at through a different lens.

Quote:

Originally Posted by Dalton

Levelling the playing field is the error. Means are not the best way to analyze human performance data.

Again, rebutting a non-existent claim. Who has said it's the best way? But it is one way, and for the great majority of players it works quite well.

Quote:

Originally Posted by Canadiens1958

Questions remain: "Who defines fairness, or the context of the comparison?"

Whoever's doing the work, surely. And then if the definition is off-base, the work is open to criticism.

Quote:

Originally Posted by Canadiens1958

Then there is the vague line between adjustments and projections.

That's a big thick black line. Adjustments are not projections, assuming you understand what each means.

Quote:

Originally Posted by Canadiens1958

Adjusted, he gets credit for six 50-goal seasons. In 1957-58 he was injured after his best career start, on pace for approximately 115 points and 50 goals, and the adjustments do not reflect this. That could be viewed as a projection, but then context is lost.

No, if the adjustments did consider his injury, then it would be a projection (what might have happened if he had not been injured). The adjustment only considers what he did when he actually played, not what might have happened had circumstances been different.

As it is, he is not credited with 50 adjusted goals in 57/58, but 17.

And he has seven adjusted 50-goal seasons. However, I would recommend not focusing on arbitrary lines like 50 goals. There's no effective difference between a 50-goal season and a 49-goal season, and two 49-goal seasons are surely better than a 50-goal season and a 30-goal season.

Quote:

Originally Posted by Canadiens1958

Simply put, the methodology and definitions have to be tightened significantly to sustain accuracy and fairness.

Go ahead! If you've got some improvements, feel free to suggest them. I don't personally use adjusted stats in my work because of their inherent limitations, which are fairly obvious but usually misstated. The biggest is the assumption that "average" does not change in meaning through NHL history. It obviously does not mean the same in a 6-team league as it does in a 30-team league.

Hopefully this forum sheds some light on what is being claimed and what isn't. It is not at all easy to figure out what some of these adjustment methods are saying, or even what methods you are referring to. Perhaps you need to include examples instead of talking in such sweeping terms.

It is, or should be, clear that simply using league gpg to compare players' results over eras is in no way accurate. That is the method I'm debating.

I don't agree that one can state with any confidence that x goals in one era is equivalent to y goals in another. I think it's more accurate to look at players in diminishing subsets of their peers while looking at the performance of each subset within the whole set. My 'number' is actually a curve.

Let me try another example. I've said this before and I'd remind readers that I am not the best one to argue this POV. LOL

I've dl'd 2010-11 stats from NHL.com: skaters only, but all skaters. I'll compare them to TSN's 2011-2012 data. I have to assume the data is accurate, and I'll risk that my calculations are accurate; I'm 0 for 2 so far when posting spreadsheet results.

The league scored more goals in 2010-11 than last season: 6721 to 6542. Each season had a 50-goal scorer. Using means, one would be tempted to lower the value of Perry's 50, since the gpg was higher in 2010-11.

But looking closer, we can see that the top 5%, 10% and 20% of skaters last season scored more than the same groups in 2010-11. The top 5% of 2010-11 scored 21.5% of the league's goals; the 2011-12 group scored 23%. For the top 10% of skaters it is 39.6% to 36.8%, and for the top 20%, 63.6% to 60%, both in favour of the 2011-12 season. Stamkos' 60 goals are not the whole reason, but they obviously contributed. However, I notice that the difference increases as the groups get larger.

So while scoring went down overall last season, the top 20% of skaters in 2011-12 actually outproduced the top 20% of 2010-11 skaters. In this context the value of Perry's goals is higher than the value of Malkin's. It seems it was easier for the top 20% of skaters to score last season than in the previous season, contrary to what gpg suggests. In fact, Perry's 50 represent .0348 of the top 5%'s goals, then .0202 of the top 10%'s and .0124 of the top 20%'s. Malkin's 50 represent .0327, .0193 and .0120.
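The nested-subset "curve" can be sketched from the group percentages quoted above. This is a rough reconstruction, assuming each group's goal total is its rounded percentage times the league total, so the results drift slightly from the post's figures (about .0346 rather than .0348 for Perry's top-5% share, for example); with full skater lists the group totals would be summed directly. The function name is hypothetical.

```python
def share_curve(goals: int, league_total: int,
                group_shares: dict[str, float]) -> dict[str, float]:
    """A player's goals as a fraction of each top group's goals.

    group_shares maps a label to that group's fraction of all league goals.
    """
    return {label: goals / (share * league_total)
            for label, share in group_shares.items()}


perry_2010_11 = share_curve(50, 6721, {"top 5%": 0.215, "top 10%": 0.368, "top 20%": 0.60})
malkin_2011_12 = share_curve(50, 6542, {"top 5%": 0.23, "top 10%": 0.396, "top 20%": 0.636})

for label in perry_2010_11:
    print(f"{label}: Perry {perry_2010_11[label]:.4f}  Malkin {malkin_2011_12[label]:.4f}")
```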

To use the mean to adjust 2010-11 downward would not accurately reflect what happened. Perry scored 50 in a season in which his peers scored less overall, compared to Malkin, who scored his 50 in a season in which all his peers scored more. The value of Perry's 50 should be more than Malkin's, not less.

I would argue that any formula that uses means ignores the effect of outliers and gives unreliable results compared to calculations that take outliers into account. If you are not talking about using means then I'm not sure how to respond to you since that is what I'm talking about.

The fact that Perry didn't score more this year simply reflects the value of taking these comparisons seriously. Scoring x goals in a season really has no bearing on what the player would score in another season or era, no matter how the data is presented. To state that gx goals in season sx are comparable to gy goals in season sy just isn't accurate. To say that player px in season sx compared similarly to his peers as player py in season sy has more meaning IMHO. In my example, Perry performed at a higher level compared to his peers than Malkin did. It appears that it was somewhat easier to score 50 goals last year than in the previous season. For what it's worth, using percentages, Perry would score 48.6 goals last season according to the percentage of all the league's goals that he scored, or maybe 53 looking only at his percentage of the top 5% of all skaters. That's almost a 5-goal difference. Adjusting raw data just doesn't work IMHO.
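The two estimates for Perry can be reproduced from the post's own figures. This sketch backs out the 2011-12 top-5% goal pool from Malkin's 50 goals being .0327 of it, which is an inference from the numbers above rather than a count from the data; with these inputs it gives about 48.7 and 53.2 goals.

```python
PERRY_GOALS = 50
LEAGUE_2010_11, LEAGUE_2011_12 = 6721, 6542

# Lens 1: Perry's share of all 2010-11 goals, applied to 2011-12.
by_league_share = PERRY_GOALS / LEAGUE_2010_11 * LEAGUE_2011_12

# Lens 2: Perry's share of the 2010-11 top-5% pool (.0348), applied to the
# 2011-12 top-5% pool, backed out from Malkin's 50 goals being .0327 of it.
top5_pool_2011_12 = 50 / 0.0327
by_top5_share = 0.0348 * top5_pool_2011_12

print(f"{by_league_share:.1f} vs {by_top5_share:.1f}")
```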

I should also point out that for about 70% of the players, using a mean would not be accurate, because they scored less as a subgroup of the league last year than in the previous season. Applying means to adjust 2010-11 players to the lower-scoring season of 2011-12 would not be accurate. Their results would be inflated compared to the 2011-12 reality, just as Perry's results would be deflated. Means just don't work IMHO.

Is there any rational basis for why it would be easier for elite players to score in 2011-12 (as compared to 2010-11), and yet more difficult for players as a whole?

If not, it's probably just randomness.

Last edited by Master_Of_Districts: 09-23-2012 at 02:49 AM.

Quote:

Originally Posted by Dalton

It is, or should be, clear that simply using league gpg to compare players' results over eras is in no way accurate.

Accurate? Fair is a better word, I would suggest. There's no accuracy involved.

Quote:

Originally Posted by Dalton

I don't agree that one can state with any confidence that x goals in one era is equivalent to y goals in another.

You can say it with a good degree of confidence. The lower the average goals per game, the greater value each goal has in terms of winning games, which of course is the point of scoring goals in the first place. The number of goals required to add a win for an average team is easily calculated.
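The reply doesn't say how the goals-per-win figure is calculated. One common approach, used here purely as an assumption, is Pythagorean expectation with exponent 2: for a league-average team, the marginal goals needed for one more win then come out equal to the league's total goals per game. The sketch derives that closed form and checks it numerically; ties, overtime losses and shootouts are ignored, and the function names are hypothetical.

```python
def goals_per_win(league_gpg: float, exponent: float = 2.0) -> float:
    """Marginal goals per extra win for a league-average team (closed form).

    Assumes Pythagorean expectation W% = GF^k / (GF^k + GA^k). At GF == GA,
    dW%/dGF = k / (4 * GF), which gives goals per win = 4 * g / k, where g
    is one team's goals per game (half the league's goals per game).
    """
    team_gpg = league_gpg / 2
    return 4 * team_gpg / exponent


def goals_per_win_numeric(league_gpg: float, games: int = 82,
                          exponent: float = 2.0) -> float:
    """Same quantity, found by adding one goal to an average team's season."""
    goals_for = goals_against = league_gpg / 2 * games

    def win_pct(gf: float, ga: float) -> float:
        return gf ** exponent / (gf ** exponent + ga ** exponent)

    extra_wins = (win_pct(goals_for + 1, goals_against)
                  - win_pct(goals_for, goals_against)) * games
    return 1 / extra_wins


print(goals_per_win(5.0), round(goals_per_win_numeric(5.0), 2))
```

Under this assumption, a lower-scoring league needs fewer goals per win, so each goal is worth more, which is the point being made above.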

Quote:

Originally Posted by Dalton

But looking closer we can see that the top 5, 10 and 20% of skaters last season scored more than the same groups of the 2010-11 season. The top 5% of 2010-11 scored 21.5% of the league's goals. The 2011-12 group scored 23%. The Top 10% of skaters are 39.6% to 36.8% and the top 20% of skaters 63.6% to 60%. Both in favour of the 2011-12 season. Stamkos' 60 goals are not responsible but obviously contributed. However I notice that the difference increases as the groups get larger.

You'll need some more work to develop this idea. Have you considered whether forwards and defencemen should be analyzed separately? Have you considered the effect of ice time (should even out over a large enough number of players, but you never know)?

Quote:

Originally Posted by Dalton

To use the mean to adjust 2010-11 downward would not accurately reflect what happened.

Again, the only accurate thing was what actually happened. I can't see how you can call any adjustment accurate. The adjustment is not to make the numbers more "accurate", just more comparable, or perhaps more meaningful.

Quote:

Originally Posted by Dalton

Perry scored 50 in a season in which his peers scored less overall compared to Malkin who scored his 50 in a season that all his peers scored more. The value of Perry's 50 should be more than Malkin's not less.

Since goals were easier to come by in 2010/11, arguably Perry's 50 goals were worth less than Malkin's, since Malkin's did more to help his team win.

Quote:

Originally Posted by Dalton

I would argue that any formula that uses means ignores the effect of outliers and gives unreliable results compared to calculations that take outliers into account. If you are not talking about using means then I'm not sure how to respond to you since that is what I'm talking about.

That's why I said it works well for the great majority of players. Outliers do not fall within the great majority.

Quote:

Originally Posted by Dalton

To state that gx goals in season sx are comparable to gy goals in season sy just isn't accurate.

You keep saying that, and I still don't know what you mean. Since we're not dealing with true values, I don't see how accuracy is an issue.

However, again, comparing goals in one season, where it takes 5.00 goals to earn a win for your team, to another season, where it takes 4.00 goals to earn a win, is certainly valid. The easier goals are to come by, the less value they have in winning games.
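Put in win terms, the comparison is a single division. A minimal sketch with an illustrative pairing (not taken from the thread's data): 50 goals where a win costs 5.00 goals and 40 goals where it costs 4.00 are both worth 10 wins. The function name is hypothetical.

```python
def goals_to_wins(goals: float, goals_per_win: float) -> float:
    """Convert a goal total into wins, given how many goals one win costs."""
    if goals_per_win <= 0:
        raise ValueError("goals_per_win must be positive")
    return goals / goals_per_win


print(goals_to_wins(50, 5.00))  # → 10.0
print(goals_to_wins(40, 4.00))  # → 10.0
```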

Quote:

Originally Posted by Dalton

For what it's worth, using percentages, Perry would score 48.6 goals last season according to the percentage of all the league's goals that he scored, or maybe 53 looking only at his percentage of the top 5% of all skaters. Almost 5 goals difference. Adjusting raw data just doesn't work IMHO.

Almost five goals difference is almost nothing. If you assign a large degree of confidence to a 53-goal player being better than a 48-goal player, you're making too fine a distinction.

I can certainly see how from your perspective, your method is more accurate. That does not mean that using means is not accurate, it just means it's somewhat less accurate.

Quote:

Originally Posted by Dalton

I should also point out that for about 70% of the players, using a mean would not be accurate, because they scored less as a subgroup of the league last year than in the previous season. Applying means to adjust 2010-11 players to the lower-scoring season of 2011-12 would not be accurate. Their results would be inflated compared to the 2011-12 reality, just as Perry's results would be deflated. Means just don't work IMHO.

Inflated by what amount? Is it large enough that mean-based analysis is completely useless? I doubt it.

Quote:

Originally Posted by Master_Of_Districts

Is there any rational basis for why it would be easier for elite players to score in 2011-12 (as compared to 2010-11), and yet more difficult for players as a whole?

If not, it's probably just randomness.

Possibly, and this is a danger of only using two seasons' worth of data.

I can believe, in a situation where goal-scoring as a whole declines, it declines less for the elite players than for the scrubs. But it is incongruous for one to go up while the other goes down, except for an effect of the distribution of ice time, and especially PP time, which is possible.

Quote:

Originally Posted by Master_Of_Districts

Is there any rational basis for why it would be easier for elite players to score in 2011-12 (as compared to 2010-11), and yet more difficult for players as a whole?

If not, it's probably just randomness.

I am not trying to explain it. I am just pointing it out. For my purposes the reason why doesn't really matter.

A human resources study that I've referred to has demonstrated that means have an error several orders of magnitude above that of a method that looks at the influence of outliers. One of the groups they studied was hockey players: goals by RWs, IIRC.

I think the fact that a small percentage of players are responsible for so much of the output falls in line with the conclusions of the HR study I've referenced. It says nothing at all about methods that don't use or rely on means or bell curving.

I think breaking things down by position, TOI, etc. is premature at this point. I am slowly collecting data to compare seasons and see if my idea has value in comparing players across seasons and eras. Also, I was just looking at goals, so IMHO all skaters must be accounted for.

I am not trying to explain it. I am just pointing it out. For my purposes the reason why doesn't really matter.

It very much does matter, if the reason turns out to be simple random variance.

Quote:

Originally Posted by Dalton

I think breaking things down by position, TOI, etc is premature at this point. I am slowly collecting data to compare seasons and see if my idea has value in comparing players across seasons and eras.

If you're still collecting data perhaps you should stay away from declarations that other methods don't work, and your idea is better. With more data perhaps your method won't work either.

TOI may be important because you're working with raw numbers, which are heavily influenced by ice time. Have you at least considered games played, if not ice time? If your top 5% sees their percentage of goals increase by 3% from the previous year, and it turns out that the same group also received 3% more ice time (or played 3% more games) than the previous year, that explains something.

Possibly, and this is a danger of only using two seasons' worth of data.

I can believe that, in a situation where goal-scoring as a whole declines, it declines less for the elite players than for the scrubs. But it is incongruous for one to go up while the other goes down, unless the distribution of ice time, and especially PP time, has shifted, which is possible.

This is why, for comparing elite scorers, some people use vs2 (the player's total as a percentage of the second-best scorer's total in the league) instead of adjusted points. You can also use vs5 if you think #2 fluctuates too much.
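A minimal sketch of the vs2/vs5 idea as described: sort the league's totals, take the n-th best as the benchmark, and express a player's total as a percentage of it. This follows the simple reading given above; published vsX systems may handle ties or the player's own rank differently. The function name and season totals are hypothetical.

```python
def vs_n(player_points: int, league_points: list[int], n: int = 2) -> float:
    """Player's total as a percentage of the n-th best total in the league (vs2, vs5, ...)."""
    if n < 1 or n > len(league_points):
        raise ValueError("n must be between 1 and the number of players")
    benchmark = sorted(league_points, reverse=True)[n - 1]
    return 100.0 * player_points / benchmark


# Hypothetical season totals, not real data.
season = [120, 100, 95, 90, 88, 85, 60]
print(vs_n(120, season, n=2))  # the scoring leader against #2
print(vs_n(95, season, n=5))   # a vs5 score uses the 5th-best total as the benchmark
```

Using a deeper benchmark like vs5 trades a little sensitivity for stability, since one unusual season from the #2 scorer moves the whole scale less.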

With respect to adjusted scoring being mean-based, I don't think that's actually accurate. References to bell curves are out of place as well. Adjusted scoring does not care about the distribution of goals among players. If it's a bell curve or a power-law curve, the same adjustment will be applied.

For instance:

Here the actual results are in blue. The red is the curve transformed by traditional adjustment (assuming the scoring level is adjusted downward). The purple curve is what should happen with percentile-based adjustment, where the best players are less affected than other players.

Adjusted scoring does not assume a bell curve, and it doesn't use a bell curve. Its adjustments are distribution-neutral.
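The distribution-neutral point can be checked directly. In the simplified model below, the scoring-level part of an adjustment multiplies every player's total by the same factor (standard scoring level divided by the season's scoring level), so each player's share of the total is unchanged no matter how skewed the distribution is. This sketch only models that scoring-level factor; the function names, league averages, and goal totals are hypothetical.

```python
import math


def scale_adjust(goals: list[float], season_gpg: float, standard_gpg: float) -> list[float]:
    """Simplified era adjustment: multiply every total by the same scoring-level ratio."""
    factor = standard_gpg / season_gpg
    return [g * factor for g in goals]


def shares(goals: list[float]) -> list[float]:
    """Each player's fraction of the group's total goals."""
    total = sum(goals)
    return [g / total for g in goals]


# Hypothetical, deliberately skewed (power-law-like) goal totals.
raw = [60, 30, 20, 15, 12, 10]
adjusted = scale_adjust(raw, season_gpg=3.5, standard_gpg=3.0)

# Every player's share of the total is unchanged, whatever the shape of the distribution.
print(all(math.isclose(a, b) for a, b in zip(shares(raw), shares(adjusted))))
```

A percentile-based adjustment, by contrast, would deliberately apply different factors at different points of the distribution, which is exactly what a single scaling factor does not do.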

It very much does matter, if the reason turns out to be simple random variance.

If you're still collecting data perhaps you should stay away from declarations that other methods don't work, and your idea is better. With more data perhaps your method won't work either.

TOI may be important because you're working with raw numbers, which are heavily influenced by ice time. Have you at least considered games played, if not ice time? If your top 5% sees their percentage of goals increase by 3% from the previous year, and it turns out that the same group also received 3% more ice time (or played 3% more games) than the previous year, that explains something.

This looks like it might explain the effect.

Dalton makes much of the fact that the league's elite players scored more in 2011-12 as compared to 2010-11, even though the league as a whole saw a slight decrease in scoring. And he's right - the top 100 players collectively scored 5949 points in 2011-12 versus 5888 in 2010-11.

But the top 100 players also collectively played more games in 2011-12: 7754 versus 7668. So if you look at points per game, there is no significant difference between the two groups (0.767 versus 0.768).
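The per-game comparison can be reproduced from the top-100 totals quoted above. This is just that arithmetic; the function name is illustrative.

```python
def points_per_game(points: int, games: int) -> float:
    """Collective points per game for a group of players."""
    return points / games


# Top-100 totals (points, games played) for each season, from the figures above.
top_100 = {
    "2010-11": (5888, 7668),
    "2011-12": (5949, 7754),
}

for season, (points, games) in top_100.items():
    print(f"{season}: {points_per_game(points, games):.3f} points per game")

(p0, g0), (p1, g1) = top_100["2010-11"], top_100["2011-12"]
print(f"points up {100 * (p1 / p0 - 1):.1f}%, games up {100 * (g1 / g0 - 1):.1f}%")
```

The group's point total rose by about 1.0%, but its games played rose by about 1.1%, which is why the per-game rates come out essentially flat.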

To be fair, scoring as a whole decreased in 2011-12, so elite players still scored more proportionately.

But the difference is easily explained by random variation.

Random variation when the sample is 100 players vs the rest of the league (600 or so more players)? I don't expect some big revelation about why top scorers in 2011-12 maintained their pace while the rest of the league declined somewhat, but that's a large number of players to be random.

It looks like I misread your post when I read it the first time.

And in thinking about things more carefully, perhaps it's more of an issue than I acknowledged.

It's not so much that the sample (100 players in each season) is large enough to rule out random variation as an explanation; it's that those two groups accounted for a disproportionate share of league scoring (~40%).

With respect to adjusted scoring being mean-based, I don't think that's actually accurate. References to bell curves are out of place as well. Adjusted scoring does not care about the distribution of goals among players. If it's a bell curve or a power-law curve, the same adjustment will be applied.

For instance:

Here the actual results are in blue. The red is the curve transformed by traditional adjustment (assuming the scoring level is adjusted downward). The purple curve is what should happen with percentile-based adjustment, where the best players are less affected than other players.

Adjusted scoring does not assume a bell curve, and it doesn't use a bell curve. Its adjustments are distribution-neutral.

Thank FSM that Iain is here, talking to Dalton.

This is the exact post I had buried in my logic somewhere but was unable to properly demonstrate or verbalize.