HFBoards

Adjusted stats - how valuable?

10-23-2012, 09:34 PM
#76
Czech Your Math
Registered User
Join Date: Jan 2006
Location: bohemia
Country: Czech Republic
Posts: 3,699
I did a study of a fixed group of players and examined how the adjusted PPG changed from season to season for a median group within that fixed group.

I'd like to see someone do a similar study with a fixed group of goalies (using a min. career adjusted GP or something), looking at their adjusted GAA from season to season. That would help in determining how quality of goaltending has changed over the years.

10-24-2012, 10:50 AM
#77
Dalton
Registered User
Join Date: Aug 2009
Location: Ho Chi Minh City
Country: Vietnam
Posts: 2,096
Quote:
Originally Posted by Iain Fyffe
This is a joke, right? It has to be. Please tell me it is.

Look at the red line in my graph. That's the actual number of goals scored. And yet it also has this "bell curve" that you see. Notice that in this case, the adjusted stats actually reduce the apparent bell curve. Which is, of course, not a bell curve at all, but a figment created by a relatively small sample size.

Please, please, please explain how the adjusted scoring method used by HR "adds a bell curve to a power curve". Demonstrate how it does that. You keep saying it, without showing how. Tell us. Stop asserting and start proving.

If you see a bell curve, it's apparently because you want to see a bell curve. You accuse my analysis of being biased, but your bias is showing in great big neon letters.


If you drop a very large number of observations at the bottom and a few at the top, yes you probably could as it happens for this set of data. But of course, this would also happen if you did the same to the raw data. As such it has nothing to do with the adjusted scoring method, and would merely be a reflection of intentionally-misleading data manipulation.

If you define "outlier" broadly enough, you can turn any power curve into a bell curve. But that's disingenuous. It's not what "outlier" means. If you think that you can remove a few outliers and transform my graph into a normal distribution, I fear you don't know what outlier means. If you consider the mode of a population (the most frequent observation) to be an outlier, you're going to get wonky results.

A definition of an outlier provided by a statistician is "An outlying observation...is one that appears to deviate markedly from other members of the sample in which it occurs." (Emphasis added). It does not mean a "tail value". That is, you can't just remove the zero values in my graph as outliers, because they clearly do not deviate markedly from other members of the sample - they are in fact the most common member of the sample. The bottom and top values are not automatically outliers.

As such, your assertion has no merit.

I think we're done here.
Look closely at your graph. It follows the raw data at the extremes of course. First it clearly decreases compared to the raw data. That means your data takes on smaller values. Next it clearly increases compared to the raw data. That means your data takes on bigger values. Then it decreases compared to the raw data. That means your data takes on smaller values. A bell curve. Lower values then higher values then lower values.

It's fricken obvious, man. This is the kind of behaviour one would expect when applying a function based on a bell curve (normalization) to another curve. Your equation is based on average scoring as if the distribution of goals amongst the players was uniform. I've already shown that it isn't. I think most of us, including you, knew that already without math.

QED. Prove me wrong. Observing the behaviour of graphs is math.

I know the verbal argument. Use math.

Edit: A single season could show any behaviour.


Last edited by Dalton: 10-24-2012 at 10:57 AM.
10-24-2012, 11:00 AM
#78
Iain Fyffe
Hockey fact-checker
Join Date: Feb 2009
Location: Fredericton, NB
Country: Canada
Posts: 2,795
Quote:
Originally Posted by Dalton
I know the verbal argument. Use math.
I'll obviously have more to say on this later, but for now I'll just say this:

Ditto. I know the verbal argument. Use math.

Show that HR's adjusted scoring applies a normalization. You keep saying it does, without demonstrating that it actually does. Show us the math. Show us the normalization.

Stop asking for something you're not willing to do yourself.

10-24-2012, 11:00 AM
#79
Dalton
Registered User
Join Date: Aug 2009
Location: Ho Chi Minh City
Country: Vietnam
Posts: 2,096
Quote:
Originally Posted by Iain Fyffe
That would be nonsense, if that's what adjusted scoring said. But it's not.


Then do it already, and be prepared to demonstrate that the results are "better" than adjusted scoring. I'd wager they're simply not going to be much different.
I have and they are. Read the thread.

10-24-2012, 12:43 PM
#80
barneyg
HFBoards Sponsor
Join Date: Apr 2007
Posts: 2,282
Quote:
Originally Posted by Dalton
Look closely at your graph. It follows the raw data at the extremes of course. First it clearly decreases compared to the raw data. That means your data takes on smaller values. Next it clearly increases compared to the raw data. That means your data takes on bigger values. Then it decreases compared to the raw data. That means your data takes on smaller values. A bell curve. Lower values then higher values then lower values.

It's fricken obvious, man. This is the kind of behaviour one would expect when applying a function based on a bell curve (normalization) to another curve. Your equation is based on average scoring as if the distribution of goals amongst the players was uniform. I've already shown that it isn't. I think most of us, including you, knew that already without math.


If it's uniform it's not normal. There is no normalization. Unless you're mathematically illiterate, you can easily convince yourself by looking at the damn formula.

Adj(1952-53) = Raw(1952-53) * F(1952-53)

Where's the "normalization" here?

Any shape in the original distribution will be reproduced in the adjusted distribution. There is no bell curve. If Iain had 2 graphs, one with the original data and the original "bins" (16-19 etc), and a 2nd one with adjusted data and scaled bins (16*F to 19*F etc), your bell curve hallucination would vanish.
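
To make the binning point concrete, here is a minimal Python sketch. The goal totals are made up (a simple skewed list, not the 1952-53 data) and the factor `F = 1.2` is a hypothetical value chosen only for illustration. It bins the raw totals, then bins the adjusted totals twice: once with the original bin edges and once with edges scaled by `F`.

```python
def histogram(values, edges):
    """Count values falling in half-open bins [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts

# Hypothetical skewed data: the goal total g appears (40 - g) times,
# so low totals are common and high totals are rare.
raw = [g for g in range(40) for _ in range(40 - g)]

F = 1.2  # hypothetical season factor, not a real HR coefficient
adjusted = [g * F for g in raw]

edges = list(range(0, 52, 4))        # original bins: 0-3, 4-7, 8-11, ...
scaled_edges = [e * F for e in edges]  # the same bins stretched by F

raw_counts = histogram(raw, edges)
adj_counts_same_bins = histogram(adjusted, edges)
adj_counts_scaled_bins = histogram(adjusted, scaled_edges)

print("raw, original bins:     ", raw_counts)
print("adjusted, original bins:", adj_counts_same_bins)
print("adjusted, scaled bins:  ", adj_counts_scaled_bins)
```

With the scaled edges the adjusted counts match the raw counts bin for bin. With the original edges they differ, because players cross fixed bin boundaries; that difference comes from the bins, not from any change in the shape of the distribution.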

10-24-2012, 01:33 PM
#81
Iain Fyffe
Hockey fact-checker
Join Date: Feb 2009
Location: Fredericton, NB
Country: Canada
Posts: 2,795
Quote:
Originally Posted by Dalton
Prove me wrong.
Prove yourself right first. You're making a specific claim about adjusted scoring, it's up to you to prove that it's right.

All NHL games are decided by invisible ice gnomes who magically control where the puck goes. Prove me wrong.

10-24-2012, 01:35 PM
#82
Iain Fyffe
Hockey fact-checker
Join Date: Feb 2009
Location: Fredericton, NB
Country: Canada
Posts: 2,795
Quote:
Originally Posted by Dalton
I have and they are. Read the thread.
Where? I see a couple of examples you've calculated, but I see nothing systematic.

10-24-2012, 03:47 PM
#83
Iain Fyffe
Hockey fact-checker
Join Date: Feb 2009
Location: Fredericton, NB
Country: Canada
Posts: 2,795
Quote:
Originally Posted by Dalton
Look closely at your graph. It follows the raw data at the extremes of course.
No it doesn't. At the zero end it follows very closely, because the adjustment essentially applies a coefficient to the raw data (which is, of course, not a normalization), and the lower the raw value, the less effect the coefficient has.

The adjusted curve actually fits the worst at the upper extreme, because at higher raw values the coefficient will have a greater effect.

That is, 1 times 1.1 is 1.1, a change of 0.1 (which will show as no change in adjusted scoring, which displays results in integers), while 50 times 1.1 is 55, a change of 5.0.

Quote:
Originally Posted by Dalton
First it clearly decreases compared to the raw data. That means your data takes on smaller values. Next it clearly increases compared to the raw data. That means your data takes on bigger values.
And if it happened to be a season that generally decreased raw values in the adjustment (such as the early 80s), you'd see the adjusted data take on larger values, then smaller ones. If same-smaller-larger-smaller is a bell curve, then what is larger-smaller? If the former is the result of the adjustment applying a bell curve (it's not), what is the latter a result of?

Quote:
Originally Posted by Dalton
Then it decreases compared to the raw data. That means your data takes on smaller values. A bell curve. Lower values then higher values then lower values.
You have, of course, left off the final instance where the adjusted data is higher than the raw. Meaning that even from this bizarre perspective, it's lower, then higher, then lower, then higher. How is that a bell curve?

Quote:
Originally Posted by Dalton
It's fricken obvious, man.
If it's so fricken obvious man, you should be able to give us a simple function to demonstrate that it's true. Something, anything other than an assertion.

If you respond to nothing else, please finally answer this question: how does adjusted scoring normalize scoring numbers? You have still not shown this to be true.

Quote:
Originally Posted by Dalton
Your equation is based on average scoring as if the distribution of goals amongst the players was uniform.
No, it doesn't. The adjustment doesn't care about distribution. It applies a coefficient to a player's raw totals to get an adjusted one. The coefficient varies a small amount from player to player, but it is not calculated to move players closer to the mean, which is what normalization means.

There are times that normalization is used in hockey analysis (regressing single-season scoring percentages to the mean, for example). But adjusted stats are not one of those times.

Quote:
Originally Posted by Dalton
I've already shown that it isn't. I think most of us including you knew that already without math.
The distribution of raw stats is irrelevant to adjusted stats. So this is a moot point.

If adjusted scoring actually normalized the data, then players with low goal totals would be adjusted upward, and players with high goal totals would be adjusted downward, both toward the mean. This does not happen.

In general terms, in high-scoring-environment seasons, all players have their totals adjusted downward (not toward the mean, just down regardless of where the mean is), and in low-scoring-environment seasons, all players have their totals adjusted upward (not toward the mean, just up regardless of where the mean is).

This is not normalization.
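
A minimal Python sketch of the distinction, using made-up goal totals. The factor `0.8` and the shrink weight `0.5` are hypothetical values picked for illustration, and `shrink_toward_mean` is only one simple way to model "moving players closer to the mean"; it is not a formula taken from HR.

```python
def scale(values, factor):
    """Multiply every total by the same coefficient."""
    return [v * factor for v in values]

def shrink_toward_mean(values, weight):
    """Pull every total part of the way toward the group mean."""
    mean = sum(values) / len(values)
    return [mean + weight * (v - mean) for v in values]

raw = [2, 10, 20, 50]                   # hypothetical raw goal totals
scaled = scale(raw, 0.8)                # hypothetical high-scoring-season factor
shrunk = shrink_toward_mean(raw, 0.5)   # hypothetical regression weight

for r, s, n in zip(raw, scaled, shrunk):
    print(f"raw {r:>2}  scaled {s:>5.2f}  shrunk toward mean {n:>5.2f}")
```

Under scaling every total moves in the same direction and the ratio between any two players is unchanged. Under shrinkage the low totals rise and the high totals fall, which is the behaviour the word "normalization" is being used for in this post.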

10-24-2012, 03:50 PM
#84
Iain Fyffe
Hockey fact-checker
Join Date: Feb 2009
Location: Fredericton, NB
Country: Canada
Posts: 2,795
Quote:
Originally Posted by Dalton
Edit: A single season could show any behaviour.
And yet, you're using this single season to "prove" that adjusted scoring adds a bell curve to the raw data.

If a single season can show any behaviour, you're going to need a hell of a lot more than that to prove your assertion.

10-24-2012, 08:42 PM
#85
Morgoth Bauglir
Master Of The Fates
Join Date: Aug 2012
Location: Angband via Utumno
Posts: 3,258
Quote:
Originally Posted by Iain Fyffe
All NHL games are decided by invisible ice gnomes who magically control where the puck goes. Prove me wrong.
I have to admit I spit my drink when I read that

10-27-2012, 06:04 AM
#86
Dalton
Registered User
Join Date: Aug 2009
Location: Ho Chi Minh City
Country: Vietnam
Posts: 2,096
Quote:
Originally Posted by Iain Fyffe
And yet, you're using this single season to "prove" that adjusted scoring adds a bell curve to the raw data.

If a single season can show any behaviour, you're going to need a hell of a lot more than that to prove your assertion.
That's absurd. I present an argument, and you present a single example that you purport refutes my argument. I show your data is in error and you think I have something to prove?

10-27-2012, 06:22 AM
#87
Dalton
Registered User
Join Date: Aug 2009
Location: Ho Chi Minh City
Country: Vietnam
Posts: 2,096
Quote:
Originally Posted by Iain Fyffe
Prove yourself right first. You're making a specific claim about adjusted scoring, it's up to you to prove that it's right.

All NHL games are decided by invisible ice gnomes who magically control where the puck goes. Prove me wrong.
I can't prove anything to someone who has made up their mind. Read the study. NHL goal scoring data and penalties were used to prove the author's thesis. They proved it. Prove them wrong. You are just ignoring everything I say anyway.

"The researchers amassed a database of more than 600,000 individuals and conducted separate studies applying normal and power-law distributions to assess performers in four carefully chosen fields:

Academics in 50 disciplines, based on publishing frequency in the most pre-eminent discipline-specific journals.
Entertainers, such as actors, musicians and writers, and the number of prestigious awards, nominations or distinctions received.
Politicians in 10 nations and election/re-election results.
Collegiate and professional athletes looking at the most individualized measures available, such as the number of home runs, receptions in team sports and total wins in individual sports.

"We saw a clear and consistent power-law distribution unfold in each study, regardless of how narrowly or broadly we analyzed the data," said Aguinis, who also is director of the Institute for Global Organizational Effectiveness at Kelley. "For example, with the athletes we could look at performance within leagues, within teams or specific positions, but the shape of the distribution was constant."

Aguinis and O'Boyle believed that the power-law distribution would also identify outliers at the other end of the performance spectrum -- those likely to engage in unethical or illegal behavior. "Counterproductive work behaviors" are often covert and thus challenging to assess, so they again used sports samples, examining such elements as yellow cards in soccer and first-base errors in baseball to find negative performance largely attributable to an individual. Here too, the results conformed to the power law.

"All five of our studies suggest that organizational success depends on tending to the few who fall at the 'tails' of this distribution, rather than worrying too much about the productivity of the 'necessary many' in the middle," Aguinis said. "


http://info.music.indiana.edu/news/p...mal/21237.html

Your bumpy curve does not conform to the power curve of the raw data. It is not a simple translation of the raw data. It is a different curve. The end points don't even conform.

Stop arguing with me and try out the methods the author suggested. I'm just a messenger.

10-27-2012, 07:27 AM
#88
Dalton
Registered User
Join Date: Aug 2009
Location: Ho Chi Minh City
Country: Vietnam
Posts: 2,096
Comparing Stamkos's 60 goals in 2011/12 to Selanne's 52 in 1997/98.

First off, IMHO 60 goals is an outstanding, rare achievement. Few have done this. I consider it irrelevant if stats show that it was easier to do in this year than that year. I'd bet the same analysis would reveal how many failed to do so under equal circumstances.

I am ignoring the psychological value of the number and simply comparing the performance of both players to their peers in specific subsets.

52 goals represents 4.1% of the production of the top 5% of players, who scored 22.5% of the league's goals that season.

60 goals represents 4.0% of the production of the top 5% of players, who scored 23.0% of the league's goals that season.

They look pretty equivalent.

Top 10%: Stamkos 2.4% of 39%
Selanne 2.5% of 37%

Still appear equivalent but a pattern is emerging. Selanne has scored a higher percentage of fewer goals.

Top 20%: Stamkos has 1.5% of 63%
Selanne has 1.6% of 59%

Does this make sense?

In the Selanne season 21 players scored at least 30 goals, 75 scored at least 20. 763 players appeared in 38,363 games scoring 5,624 goals. His productivity is comparable to 63 goals amongst the top 10% in Stamkos' season.

In the Stamkos season 31 players scored at least 30 goals, 102 scored at least 20. 894 players appeared in 44,268 games scoring 6,542 goals. His productivity is equivalent to 49 goals amongst the top 10% in Selanne's season.

Selanne was more productive than Stamkos among the top 10% of scorers. It will differ somewhat within 5%, 20% and other subsets of top performers.
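
The top-10% arithmetic above can be reproduced with a short Python sketch. The league goal totals (5,624 and 6,542) and goal counts (52 and 60) are the ones quoted in this post; the group shares are read as 37% and 39% of league goals, which is an interpretation of the figures given. The function names are hypothetical helpers, not part of any published method.

```python
def share_of_group(player_goals, league_goals, group_share):
    """Fraction of a peer group's goals scored by one player."""
    return player_goals / (league_goals * group_share)

def equivalent_goals(share, league_goals, group_share):
    """Goals that would give the same share of the peer group in another season."""
    return share * league_goals * group_share

# Top 10% of scorers: 37% of 5,624 goals (1997/98), 39% of 6,542 goals (2011/12).
selanne_share = share_of_group(52, 5624, 0.37)
stamkos_share = share_of_group(60, 6542, 0.39)

selanne_in_2012 = equivalent_goals(selanne_share, 6542, 0.39)
stamkos_in_1998 = equivalent_goals(stamkos_share, 5624, 0.37)

print(f"Selanne share of top 10%: {selanne_share:.4f}")
print(f"Stamkos share of top 10%: {stamkos_share:.4f}")
print(f"Selanne's 52 in 2011/12 terms: {selanne_in_2012:.1f}")
print(f"Stamkos's 60 in 1997/98 terms: {stamkos_in_1998:.1f}")
```

With these inputs the conversions land a little under 64 goals and just under 49 goals, in line with the 63 and 49 quoted above.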

What does adjusted scoring say about these two seasons?

I haven't looked at goalies at all, let alone schedules and opponents.

Whatever else this debate is about we can clearly do much better than adjusted stats with respect to analyzing raw data. Adjusted stats are a construct with only the slimmest connection to reality. That connection is just that raw data is the independent variable. Any old wild formula will bear some resemblance to the input. Adjusted data has some careful thought put into it but clearly it is off. Work needs to be done.

Caveat: I tend to make errors. Hopefully I haven't done that here. Feel free to check my calculations.

10-27-2012, 08:08 AM
#89
Dalton
Registered User
Join Date: Aug 2009
Location: Ho Chi Minh City
Country: Vietnam
Posts: 2,096
Quote:
Originally Posted by Iain Fyffe
No it doesn't. At the zero end it follows very closely, because the adjustment essentially applies a coefficient to the raw data (which is, of course, not a normalization), and the lower the raw value, the less effect the coefficient has.

The adjusted curve actually fits the worst at the upper extreme, because at higher raw values the coefficient will have a greater effect.

That is, 1 times 1.1 is 1.1, a change of 0.1 (which will show as no change in adjusted scoring, which displays results in integers), while 50 times 1.1 is 55, a change of 5.0.


And if it happened to be a season that generally decreased raw values in the adjustment (such as the early 80s), you'd see the adjusted data take on larger values, then smaller ones. If same-smaller-larger-smaller is a bell curve, then what is larger-smaller? If the former is the result of the adjustment applying a bell curve (it's not), what is the latter a result of?


You have, of course, left off the final instance where the adjusted data is higher than the raw. Meaning that even from this bizarre perspective, it's lower, then higher, then lower, then higher. How is that a bell curve?


If it's so fricken obvious man, you should be able to give us a simple function to demonstrate that it's true. Something, anything other than an assertion.

If you respond to nothing else, please finally answer this question: how does adjusted scoring normalize scoring numbers? You have still not shown this to be true.


No, it doesn't. The adjustment doesn't care about distribution. It applies a coefficient to a player's raw totals to get an adjusted one. The coefficient varies a small amount from player to player, but it is not calculated to move players closer to the mean, which is what normalization means.

There are times that normalization is used in hockey analysis (regressing single-season scoring percentages to the mean, for example). But adjusted stats are not one of those times.


The distribution of raw stats is irrelevant to adjusted stats. So this is a moot point.

If adjusted scoring actually normalized the data, then players with low goal totals would be adjusted upward, and players with high goal totals would be adjusted downward, both toward the mean. This does not happen.

In general terms, in high-scoring-environment seasons, all players have their totals adjusted downward (not toward the mean, just down regardless of where the mean is), and in low-scoring-environment seasons, all players have their totals adjusted upward (not toward the mean, just up regardless of where the mean is).

This is not normalization.
I have answered these questions.

This is normalization:

"Adjusted Statistics

In order to account for different schedule lengths, roster sizes, and scoring environments, some statistics have been adjusted. All statistics have been adjusted to an 82-game schedule with a maximum roster size of 18 skaters and league averages of 6 goals per game and 1.67 assists per goal."

A bell curve. Everyone's stats are adjusted to suit a 60% average so to speak. You apply this to a power curve and get a blip in the middle of the data where players' results are increased at a greater rate than those on either side of the median.

The blip is clearly visible.


Last edited by Dalton: 10-27-2012 at 08:19 AM.
10-27-2012, 09:28 AM
#90
Doctor No
Mod Supervisor
Retired?
Join Date: Sep 2005
Posts: 24,295
Quote:
Originally Posted by Dalton
That's absurd. I present an argument, and you present a single example that you purport refutes my argument. I show your data is in error and you think I have something to prove?
Without speaking to the specifics here...if you present a claim that "X is true", then all it takes is a single counterexample where "X is not true" to prove that it's flawed. That's how logic works.

10-27-2012, 10:39 AM
#91
Dalton
Registered User
Join Date: Aug 2009
Location: Ho Chi Minh City
Country: Vietnam
Posts: 2,096
I've compared Gretzky's 92 to Howe's 49.

Howe scored 23% of the 22% that the top 5% scored in his season.
Gretzky achieved 6.6% of the 21% that the top 5% scored in his season.

Howe scored 14% of the 35% that his top 10% peers achieved.
Gretzky scored 3.8% of the 36% that his top 10% peers achieved.

92 goals is the gold standard of a single season but clearly Howe was much more dominant against his peers. It has been pointed out that using percentages would see odd numbers as the number of players decreased. But that is the nature of outliers. This just shows how (likely) impossible it is to state that player x would score y goals in season z.

Howe's production compared to his peers was equivalent to scoring 540 goals within the top 5% in Gretzky's season. While Gretzky's production compared to the top 5% of peers is worth 14 goals in Howe's season.

Of course the top 5% is only about 7-8 players in Howe's era and 31 in Gretzky's. But adding more players so both are compared to about 1.5 players per team doesn't do much, considering Howe scored 4.9% of the goals of a group representing 70% of his peers. Gretzky scored 1.3%.

One should also notice how the percentage of goals scored by each grouping is pretty consistent across the eras. 70% of the players scored 99% of the goals in each era.

In Howe's season nobody scored at a pace of half a goal a game. In Gretzky's season I stopped counting around 30, about 5%. The same as 7 players in Howe's era.

This dominance cannot be quantified and converted to reasonable looking values in other seasons. 65 goals is only realistic if his peers (every NHL player ever?) all scored less than 45 I'd guess. Number 65 does not in any way capture just what Howe did that year.

Compared to the number 2 scorer, Howe scored just over 50% more goals while Gretzky scored slightly less than that. An argument for making the sample size smaller maybe, but Howe still outperforms compared to peers.

I can certainly see why people would want to normalize the data to bring these outlying numbers into a more reasonable range but that approach has been discredited.

The magnitude of Howe's performance just serves to prove that even more. A single outlier can have a huge impact on averaging. Every player in his season must see their goals increase thanks to him. Haven't even looked at goalies or the impact of scheduling.

10-27-2012, 10:50 AM
#92
barneyg
HFBoards Sponsor
Join Date: Apr 2007
Posts: 2,282
Quote:
Originally Posted by Dalton
I have answered these questions.

This is normalization:

"Adjusted Statistics

In order to account for different schedule lengths, roster sizes, and scoring environments, some statistics have been adjusted. All statistics have been adjusted to an 82-game schedule with a maximum roster size of 18 skaters and league averages of 6 goals per game and 1.67 assists per goal."

A bell curve. Everyone's stats are adjusted to suit a 60% average so to speak. You apply this to a power curve and get a blip in the middle of the data where players' results are increased at a greater rate than those on either side of the median.

The blip is clearly visible.
Why the hell do you think the bolded creates a bell curve? Please explain. You keep arguing two points at once. One of those is completely wrong, the other is likely right:

a) the adjustment leads to a bell curve (completely wrong)
b) the adjustment shouldn't be uniform (likely right).

You should drop a) because it's just not true. If anything, the ONLY thing we can be mathematically sure of is that the adjustment retains the shape of the distribution (relevant). Iain's adjusted graph is not any closer to a bell curve than the original data is.

Again, the only reason why your "mathematical eye" was able to con you into thinking the adjustment creates a bell curve (other than perhaps your mind looking for a bell curve) is the arbitrary bins used to create the frequency graph. If the adjusted data graph used [4*F,7*F], [8*F,11*F] and so on instead of [4,7], [8,11], you would clearly see that the shape of the adjusted graph is exactly the same as the original.

10-27-2012, 11:35 AM
#93
Dalton
Registered User
Join Date: Aug 2009
Location: Ho Chi Minh City
Country: Vietnam
Posts: 2,096
Quote:
Originally Posted by Taco MacArthur
Without speaking to the specifics here...if you present a claim that "X is true", then all it takes is a single counterexample where "X is not true" to prove that it's flawed. That's how logic works.
You mean 'all (or every) x is true' can be disproven with a single example.

In any event this isn't about logic, it's about evaluating and predicting productivity.

Math and Logic are distinct studies. Just as reasoning, rhetoric and argument are not the same as Logic. Logicians attempted to codify math and reasoning and came to the conclusion that it can't be done.

Logic can't really be applied here at all since there are no well formed formulae or system such as can be found in Linguistics or Math. We're just a bunch of guys spitballin'.

Besides that, the averaging is being done to all the seasons as a set of seasons. The effect on the players is just a consequence of that. A single season does not address this at all and so it cannot be that single counter example. That's reasoning not logic. Imperfect or otherwise.


Last edited by Dalton: 10-27-2012 at 11:48 AM.
10-27-2012, 12:36 PM
#94
Dalton
Registered User
Join Date: Aug 2009
Location: Ho Chi Minh City
Country: Vietnam
Posts: 2,096
Quote:
Originally Posted by barneyg
Why the hell do you think the bolded creates a bell curve? Please explain. You keep arguing two points at once. One of those is completely wrong, the other is likely right:

a) the adjustment leads to a bell curve (completely wrong)
b) the adjustment shouldn't be uniform (likely right).

You should drop a) because it's just not true. If anything, the ONLY thing we can be mathematically sure of is that the adjustment retains the shape of the distribution (relevant). Iain's adjusted graph is not any closer to a bell curve than the original data is.

Again, the only reason why your "mathematical eye" was able to con you into thinking the adjustment creates a bell curve (other than perhaps your mind looking for a bell curve) is the arbitrary bins used to create the frequency graph. If the adjusted data graph used [4*F,7*F], [8*F,11*F] and so on instead of [4,7], [8,11], you would clearly see that the shape of the adjusted graph is exactly the same as the original.
If the graph is not accurate for whatever reason, then perhaps that should have been the initial response to my complaint about it. I guess it's useless in this debate.

You take every NHL season ever and adjust all the stats to 82gp and 6gpg (skaters). What is that? What have you done to the seasons?

I alter every result in a set of test scores so everyone gets 60%. Someone asks to audit the tests. Now I have to go in and alter the mark for each question. What happens to the 0's and 100%'s achieved for individual questions? If the original result was 40% on a particular test, do I just add 50% to each individual question?

I can show you that graph.
---------------------------- 60%
____________________ 40%

It doesn't matter how I shape it, the two graphs would be parallel in some geometry. But now I'm giving value to 0's. I'm creating value where none existed before.

If I use some multiplier so as not to alter the 0's and perhaps the 100's, then I've completely changed the relationship between the grades on individual questions. My graphed results for a single test would look like two uniquely formed pieces of string with the ends tied together. Perhaps even a circle. There may be some visible correlation since the new curve comes from the values of the first curve, but the shape would clearly be distorted.

Both these methods fail to maintain either the integrity or relationship of the grades received for each question on each individual test.

Now suppose that to prevent cheating I have more than one test. They have different questions and even different numbers of questions. Some of the questions are not changed however except perhaps the wording.

I need different formulae if I want to keep the 0's. In that case people who got identical results on questions that every test had might now have different results.

If I don't care about 0's then many would have different results on questions that they originally had identical results.

Making all the tests have the same final result only leads to errors on the individual test questions.

Call it whatever you want but making all the seasons exactly equal with respect to gp, gpg, players per team can only lead to errors.

The only reason to even do this is a misplaced notion of bell curving. The belief that there exists an average. The refusal to accept extreme outliers as a possible real outcome. The 'bell curve' is a way of thinking that permeates evaluation and prediction of productivity.

You need to read that study or read it differently. Ignore the data and read what the authors' purpose was and how the results support it.

If you are averaging and making all the seasons have the same values, if you are forcing the outliers to conform to average then you are bell curving whether you see the physical thing or not.

Dalton is offline   Reply With Quote
Old
10-27-2012, 02:06 PM
  #95
barneyg
HFBoards Sponsor
 
Join Date: Apr 2007
Posts: 2,282
vCash: 500
I'd love to respond and continue this debate but I don't understand the test score analogy at all.

Forcing all seasons to have a constant number of GPG does not alter the distribution across players in a given season to make it closer to a bell curve (which I thought was your point all along since your beef was with the frequency graph for 1952-53). It doesn't make outliers systematically closer to the average either. Of course average numbers for all seasons will then look similar, that's the point.

barneyg is offline   Reply With Quote
Old
10-27-2012, 02:08 PM
  #96
Morgoth Bauglir
Master Of The Fates
 
Morgoth Bauglir's Avatar
 
Join Date: Aug 2012
Location: Angband via Utumno
Posts: 3,258
vCash: 500
Quote:
Originally Posted by Taco MacArthur View Post
Without speaking to the specifics here...if you present a claim that "X is true", then all it takes is a single counterexample where "X is not true" to prove that it's flawed. That's how logic works.
In logic this is a truism: if a sweeping generalized statement is made, a single specific example contradicting it is all that's needed to prove the general statement invalid.

Morgoth Bauglir is offline   Reply With Quote
Old
10-27-2012, 03:13 PM
  #97
Czech Your Math
Registered User
 
Czech Your Math's Avatar
 
Join Date: Jan 2006
Location: bohemia
Country: Czech_ Republic
Posts: 3,699
vCash: 500
Quote:
Originally Posted by Dalton View Post
You take every NHL season ever and adjust all the stats to 82gp and 6gpg (skaters). What is that? What have you done to the seasons?
Adjusting to 82 games/season gives each season equal value, regardless of actual schedule length. Assuming each game has equal value, adjusting to a constant, fixed number of games gives each season equal value.

Adjusting to 6 gpg is a way of expressing each goal/point's value in proportion to the league scoring context, using a fixed (although arbitrary) reference. I.e., if the league avg. gpg increases by 50%, a player's output would need to increase by 50% to be of equal value in the new scoring environment.
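
To make the two adjustments above concrete, here is a minimal sketch assuming a simple proportional form. The function name `adjust_goals` and the sample season figures are hypothetical, and this is not the actual HR formula, which per the quoted description also accounts for roster size and assists per goal.

```python
def adjust_goals(goals, season_games, league_gpg,
                 target_games=82, target_gpg=6.0):
    """Scale a raw goal total to a fixed schedule length and scoring level.

    Assumed simplification: two independent proportional factors.
    """
    schedule_factor = target_games / season_games  # equal value per season
    scoring_factor = target_gpg / league_gpg       # value relative to league scoring
    return goals * schedule_factor * scoring_factor


# Hypothetical seasons: league scoring is 50% higher in the second one,
# so a player needs 50% more goals to hold the same adjusted value.
a = adjust_goals(40, season_games=80, league_gpg=6.0)
b = adjust_goals(60, season_games=80, league_gpg=9.0)
print(round(a, 1), round(b, 1))
```

Under these assumptions both players land on the same adjusted total, which is the "equal value" idea: 60 goals at 9 gpg buys what 40 goals buys at 6 gpg.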

-----------------

I've read some of your other recent posts and the numbers don't even make sense to me. I'm not even sure what you're trying to say, to be honest. I think you should better explain what the numbers mean (in as simple terms as possible), if you expect much meaningful feedback.

One thing that seems to be ignored is that the talent pool has changed dramatically over time. Howe was not facing the same top end competition as Gretzky, and Gretzky in his prime wasn't facing the same top end competition as has been present for most of the past two decades.

Using top X% of players is reliant on size of the league... as league size increases, the pool of top X% players increases proportionately. Using top Y players is reliant on the composition of the talent pool... as the talent pool increases, the quality of the top Y players increases.

It's important to remember that simple adjustment of data (for schedule, league scoring avg., etc.) is justified based on equivalent value. Once the data is properly adjusted, 40 adjusted goals has the same value in any season. This must be understood before moving on to alternative questions, such as how difficult/impressive is 40 adjusted goals in one year compared to another, because that is a much different question. The latter relies on factors such as the quality of players in the league, the concentration/dilution of talent in the league, the disparity of talent between teams, how different types of players are utilized in different eras, etc. If one does not or will not understand the basis of the former, it will only make it that much more difficult to understand how to approach and resolve the latter IMO.

Czech Your Math is offline   Reply With Quote
Old
10-27-2012, 03:26 PM
  #98
Iain Fyffe
Hockey fact-checker
 
Iain Fyffe's Avatar
 
Join Date: Feb 2009
Location: Fredericton, NB
Country: Canada
Posts: 2,795
vCash: 500
Quote:
Originally Posted by Dalton View Post
I can't prove anything to someone who has made up their mind. Read the study. NHL goal scoring data and penalties were used to prove the author's thesis. They proved it. Prove them wrong. You are just ignoring everything I say anyway.
The irony is just dripping off of this paragraph.

The authors' thesis was that NHL production takes on a power-law curve, which is absolutely true, and is demonstrated in the graph I posted.

But you have moved your thesis beyond this, claiming that adjusted scoring assumes a normal distribution, and is therefore flawed. My graph shows that actual raw scoring and adjusted scoring both take the shape of a power-law curve.

Therefore, your claims that adjusted scoring is flawed because it assumes a normal distribution are defeated. You have failed to demonstrate that adjusted scoring assumes a bell curve. If you can't demonstrate that (which you can't, because it doesn't) then your assertion is invalid, and you'd be well-served to stop making it.

Quote:
Originally Posted by Dalton View Post
Your bumpy curve does not conform to the power curve of the raw data.
What's wrong with it being bumpy? The raw data is bumpy. These are not fitted lines.

Quote:
Originally Posted by Dalton View Post
It is not a simple translation of the raw data.
Yes it bloody well is. Every player's goal-scoring total is multiplied by a coefficient within a very narrow range of coefficients. That's all there is to it, nothing more.

Prove me wrong.

Quote:
Originally Posted by Dalton View Post
It is a different curve. The end points don't even conform.
Why would they conform? The adjusted stats are, on the whole, slightly higher than the raw stats due to the relatively low scoring environment that season. That's the whole point of the adjustment.

Quote:
Originally Posted by Dalton View Post
Stop arguing with me and try out the methods the author suggested. I'm just a messenger.
I'm not arguing with their point, which is that NHL production follows a power-law curve. This is demonstrably true and uncontroversial.

I'm arguing with your point, which is that adjusted scoring is flawed because it does not follow a power-law curve. It does follow such a curve, and therefore your objection is invalid.

Quote:
Originally Posted by Dalton View Post
I have answered these questions.

This is normalization:

"Adjusted Statistics

In order to account for different schedule lengths, roster sizes, and scoring environments, some statistics have been adjusted. All statistics have been adjusted to an 82-game schedule with a maximum roster size of 18 skaters and league averages of 6 goals per game and 1.67 assists per goal."

A bell curve. Everyone's stats are adjusted to suit a 60% average so to speak.
No, no, no. You really don't get what normalization means. If the data were normalized, the big bottom tail (the very low scorers) would be pulled away from zero toward the mean. It isn't. Even if you're looking at all players from all time, all you'll see is the curve shifting a bit to the left or to the right. The bottom tail will stay right where it is, maintaining the shape of a power-law curve.
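
A rough way to check the point about the bottom tail: multiply a skewed set of goal totals by a single coefficient and see whether the shape moves. The totals and the 1.1 coefficient below are made up for illustration, and skewness is only one assumed stand-in for "shape of the curve".

```python
def skewness(values):
    """Population skewness: third central moment over the cubed std. deviation."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    third_moment = sum((v - mean) ** 3 for v in values) / n
    return third_moment / variance ** 1.5


# Hypothetical goal totals: many low scorers, very few high scorers.
raw = [0, 0, 0, 1, 1, 2, 3, 5, 8, 12, 20, 35]
factor = 1.1  # hypothetical adjustment coefficient for a low-scoring season
adjusted = [g * factor for g in raw]

raw_skew = skewness(raw)
adjusted_skew = skewness(adjusted)
zeros_raw = raw.count(0)
zeros_adjusted = adjusted.count(0)  # 0 * factor is still 0

print(round(raw_skew, 4), round(adjusted_skew, 4))
print(zeros_raw, zeros_adjusted)
```

The skewness comes out the same before and after (up to floating-point noise), and the players on zero stay on zero. Scaling stretches the horizontal axis; it does not reshape the distribution.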

Quote:
Originally Posted by Dalton View Post
You apply this to a power curve and get a blip in the middle of the data where players' results are increased at a greater rate than those on either side of the median.
The blip exists in the raw data, and that's the only reason it exists in the adjusted data. Why aren't you jumping on the raw data for being flawed, because you see a bell curve in it when it should be a power-law curve (which it is)?

Quote:
Originally Posted by barneyg View Post
Why the hell do you think the bolded creates a bell curve? Please explain. You keep arguing two points at once. One of those is completely wrong, the other is likely right:

a) the adjustment leads to a bell curve (completely wrong)
b) the adjustment shouldn't be uniform (likely right).
Precisely. The first is flatly wrong, the second is likely right, and indeed HR's adjusted scoring doesn't quite use a uniform adjustment. The individual player's numbers are removed from the equation when doing the adjustment, meaning (for example) that a high-scoring player is adjusted downward less than other players are in a high-scoring-environment season. This seems to be what Dalton is arguing for in general, so his objections to adjusted scoring are baffling.
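
A sketch of how removing a player's own numbers makes the adjustment non-uniform. The exact HR calculation isn't given in this thread, so the leave-one-out form below is an assumption, and the function name and league totals are hypothetical.

```python
def leave_one_out_factor(player_goals, league_goals, league_games,
                         target_gpg=6.0):
    """Scoring factor with the player's own goals removed from the league total.

    Assumed form, for illustration only.
    """
    league_gpg_without_player = (league_goals - player_goals) / league_games
    return target_gpg / league_gpg_without_player


# Hypothetical high-scoring season: 6300 goals in 840 games (7.5 gpg).
league_goals, league_games = 6300, 840

uniform = 6.0 / (league_goals / league_games)                   # same for everyone
star = leave_one_out_factor(85, league_goals, league_games)    # high scorer
depth = leave_one_out_factor(10, league_goals, league_games)   # low scorer

print(round(uniform, 4), round(depth, 4), round(star, 4))
```

All three factors are below 1 because the season is high-scoring, but the high scorer's factor is the closest to 1, so he is adjusted downward less than the low scorer, and both less than under a uniform factor.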

Iain Fyffe is offline   Reply With Quote
Old
10-28-2012, 04:58 AM
  #99
pdd
Registered User
 
Join Date: Feb 2010
Posts: 5,576
vCash: 500
Quote:
Originally Posted by habsfanatics View Post
I never really liked adjustment stats, for the reason Dalton suggests and many others. I believe the outliers definitely throw the use of means into a grey area.

If the league gets tougher to score in, does it hurt Wayne/Mario in the same way it hurts a Craig Simpson or others? I don't think so. My problem isn't the stats themselves, it's the mischaracterization and misrepresentation I've seen on these boards.

Iain says it doesn't happen, but it most certainly does. I've debated in the "How many points would Gretzky score today" threads and posters have told me he would score precisely 158 points or whatever the number was, because there was a formula for that, LOL. Not only that, but certain posters use these numbers in every thread as their only source of reasoning. I won't mention his name, but one poster in particular uses them without even understanding them at all.

For an adjustment that is supposed to be used for putting things in context, it often lacks context itself.

My opinion.
One thing that is definitely left out (due to no tangible accounting method) of adjusted scoring is actual offensive talent level vs. defensive/goaltending talent level. This is not the same as "is scoring higher or lower?" as we all know. And there are always outliers; Gretzky and Lemieux in 1988-89 caused their teams (and teammates) to put up significantly higher offensive numbers than they would have otherwise. If you simply remove them from the league and replace them with fourth-line centers, the amount of scoring removed would be significant (on the order of 450-500 points) and the adjusted stats for players from that year who did not benefit from those two would look much, much better.

Adjusted stats are a good starting point. They're not a good ending point. Unless you think the best players from the pre-O6 era were easily the greatest scorers ever.

pdd is offline   Reply With Quote
Old
10-29-2012, 04:28 AM
  #100
Czech Your Math
Registered User
 
Czech Your Math's Avatar
 
Join Date: Jan 2006
Location: bohemia
Country: Czech_ Republic
Posts: 3,699
vCash: 500
Quote:
Originally Posted by eva unit zero View Post
One thing that is definitely left out (due to no tangible accounting method) of adjusted scoring is actual offensive talent level vs. defensive/goaltending talent level. This is not the same as "is scoring higher or lower?" as we all know.
That's true, but adjusted scoring is still a massive improvement over raw data. What is your proposed solution?

One can measure how the scoring of the top tier, or of various tiers of players, changes compared to adjusted numbers, or how the scoring of the various tiers changes in relation to each other. One can also try to approximate the talent level (overall or by position) in the NHL at various times, either the total pool or the average talent per team (the latter because it may affect various tiers in different ways).

I've tried to do a little of both, as have others. There seem to be some eras that are given a bit of a raw deal by raw numbers (mid-late 60s until expansion, 80s, probably the last decade or so) and some are treated more favorably (from 60s expansion to WHA merger, before the early-mid 60s... primarily due to smaller talent pool before 60s & diluted/uneven talent in 70s).

Quote:
Originally Posted by eva unit zero View Post
And there are always outliers; Gretzky and Lemieux in 1988-89 caused their teams (and teammates) to put up significantly higher offensive numbers than they would have otherwise. If you simply remove them from the league and replace them with fourth-line centers, the amount of scoring removed would be significant (on the order of 450-500 points) and the adjusted stats for players from that year who did not benefit from those two would look much, much better.
Gretzky & Lemieux are not causing 450-500 extra goals to be scored in the NHL. First, they only scored 367 points in '89, the most they ever combined for in one season. Second, their teammates when those goals were scored would still generate a significant amount of offense on their own (even with a 4th line center). Third, they wouldn't be directly replaced with a 4th line center. Rather, players would in many cases move up a line and simultaneously get increased ice time (including more PP time), allowing for even more lost production to be replaced.

Sure, there is some effect from outliers, but it isn't the predominant effect causing league scoring to fluctuate. In a 21 team league playing 80 games, it would take 168 fewer goals for league total gpg to see a .20 gpg decrease. That's 46% of the goals on which Gretzky and/or Lemieux scored points, which is probably higher than the actual decrease would be (I haven't looked at the numbers, but I doubt when Gretzky/Lemieux left their teams that those teams lost that much offense). If there were 125 fewer goals (34% fewer), that's a decrease of .15 gpg. I'd guess the real number is in that range, which would mean the two best offensive players at/near their peaks only change the league gpg by ~.15-.20 gpg. Meanwhile, league gpg went from 7-8 gpg in the 80s to 5-6.5 over nearly the last two decades.
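
The arithmetic above can be checked with a few lines. This assumes every game involves two teams, so a 21-team, 80-game schedule is 840 league games; the variable names are just for illustration.

```python
teams = 21
games_per_team = 80
league_games = teams * games_per_team // 2  # each game counts for two teams

combined_points = 367  # Gretzky + Lemieux combined points in '89, as stated above


def gpg_change(fewer_goals):
    """Drop in league goals per game if this many goals disappear."""
    return fewer_goals / league_games


for fewer in (168, 125):
    share = 100 * fewer / combined_points
    print(fewer, round(gpg_change(fewer), 3), round(share))
```

So 168 fewer goals is a .20 gpg drop and 125 fewer goals is roughly a .15 gpg drop, both small next to a league-wide swing from 7-8 gpg down to 5-6.5.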

Quote:
Originally Posted by eva unit zero View Post
Adjusted stats are a good starting point. They're not a good ending point. Unless you think the best players from the pre-O6 era were easily the greatest scorers ever.
"Simple" adjusted stats are like a meeting place, which is most of the way from the starting point (raw data) to the destination (perfectly equivalent/comparable data).

The reason pre-O6 (or early O6) era players' simple adjusted data is favorable for them is that the quality of the league was significantly less than it was in the latter part of the O6, when the Canadian hockey-age population was substantially larger.


Last edited by Czech Your Math: 10-29-2012 at 04:45 AM.
Czech Your Math is offline   Reply With Quote