10-31-2012, 03:46 PM
  #130
Rhiessan71
Just a Fool
 
Join Date: Feb 2003
Location: Guelph, Ont
Country: Canada
Posts: 9,952
Quote:
Originally Posted by Czech Your Math
It would seem difficult to assess the value of adjusted stats without first trying to determine what flaws exist and their magnitude (and the reasons for those flaws). This would seem to pertain to whatever other system is being compared to adjusted stats to determine relative value as well. To me, it's all inter-related, but I am confident that in the vast majority of cases adjusted stats are much more accurate than raw stats. Since simple adjusted stats tell us the comparative value of each goal (but possibly not exactly how difficult it was to attain such value, etc.), that makes it much more useful than raw data IMO.



It would be difficult to find any system of attributing value to or comparing players that did not have some significant flaw(s). However, the mere fact that some are discussing potential flaws does not prove that such flaw(s) exist. Also, the mathematical basis of simple adjusted stats is very sound, because it tells us the approximate value of the goals/points based on the league scoring context. Again, that's different than telling us how difficult it was to attain that level of value, given the many changing conditions and factors in the league, but it's a rather important piece of information, much more than raw data usually is.



I (and others, such as Overpass) use % or percentile type tiers that are more fixed in proportion to the number of players in the league. I don't know what purpose using tiers based on arbitrary levels of production serves. E.g., Overpass has done some studies of scoring changes between tiers, and uses 1st liner, 2nd liner, etc. to group forwards into tiers, which sort of corresponds to the 25% tiers I used in my example. I've looked at the 1st N, 2nd N, etc. # of players (where N = # of teams). I've also looked at fixed numbers of players (e.g. #1-6, 7-12, etc.), which is the basis of comparing players to their peers (e.g. 2nd place or avg. of top 10). However, as previously stated, this method has its own (much larger IMO) pitfalls, since A) it has no basis of fixed value in proportion to the league scoring context, B) it is using a very small sample for comparison purposes, and C) it usually ignores the vast changes in the talent towards the top end of the spectrum of players.
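A rough sketch of the "1st N, 2nd N" grouping described above, where N is the number of teams. The function names and the sample point totals are made up for illustration; they are not from Overpass's studies or anyone's real data.

```python
def tiers_by_team_count(points, n_teams):
    """Rank scorers from highest to lowest and cut them into groups of n_teams."""
    ranked = sorted(points, reverse=True)
    return [ranked[i:i + n_teams] for i in range(0, len(ranked), n_teams)]


def tier_averages(points, n_teams):
    """Average point total within each tier (1st N, 2nd N, ...)."""
    return [sum(tier) / len(tier) for tier in tiers_by_team_count(points, n_teams)]


# Hypothetical season: six scorers in a three-team league.
sample_points = [60, 100, 80, 50, 90, 70]
sample_tiers = tiers_by_team_count(sample_points, 3)
sample_avgs = tier_averages(sample_points, 3)
print(sample_tiers)
print(sample_avgs)
```

Because the tier size follows the number of teams, the groups stay in proportion to the league as it expands, which is the point of using them instead of fixed production cutoffs.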



The group of players in the NHL is already at the very far left of the spectrum of hockey players as a whole, but I understand your point. However, just because we are most often examining players at the very far left of the NHL spectrum does not automatically mean that adjusted data yields flawed results for those players. Yes, it's very possible that it can, but it's still a vast improvement on raw data, with a foundation in actual value based on quantified measurements. Any further adjustment should be thoroughly justified based on quantified and reasoned evidence as to the reasons and magnitude of distortions that occur from using the simple adjusted process.



Maybe adjusted stats are not as valuable as some claim them to be, due to potential inaccuracies when trying to equate the difficulty of attaining certain levels of adjusted production in different seasons. However, it is again important to remember that they are based on a foundation of value in proportion to the league scoring environment. I would also point out that whatever flaws or distortions simple adjusted data may have, they are likely no greater, and probably smaller, than those of most other systems: raw data, comparing players' rankings amongst their peers or to a very small subset of their peers, using the results of awards/AS voting, quotes from writers/managers/coaches/players/fans, etc.



First, studying potential causes of distortion in isolation may make each potential source appear to have a larger effect than it actually does. Such distortions may often negate each other to a large degree. Second, no matter the size of the alleged "flaw", without knowing the reason for such a flaw, further adjustment may only cause further distortion. I previously gave the example of the influx of overseas players containing a disproportionately high share of scoring forwards/d-men. When measuring scoring, this is like adding a bunch of high quality students to the classroom. It makes it look like it's suddenly substantially easier to get an A, when actually the student population became much higher quality on average (and particularly at the top). If one used a curve to further "adjust" those students' grades downward, it would unfairly penalize their achievements for the sake of making the distribution of grades look more "normal."



I wouldn't say adjusted stats should be taken at absolute face value and are the final answer, but they are still a heck of a lot better than most alternatives (raw data, peer rankings, award voting, etc.). I've actually studied and presented the results of a lot of relevant topics: scoring of a fixed group of high quality players over 60+ seasons... scoring of various % tiers over time... estimating the effective NHL talent pool over time. I also have read studies of others on various relevant topics. Actually, I was one of the first people that I know of to create and use adjusted stats (along with others like HockeyOutsider), long before HR.com existed. So to imply or state that I don't understand or know how to use adjusted stats is going a bit off the deep end, don't ya think?

It's hard for me to explain why adjusted stats are useful even when comparing players across the same range of seasons. Basically, as league scoring goes down, it becomes much more difficult to separate from the pack in raw point (not %) terms. So, if one player is 50% above avg. and then becomes 20% above avg., and the other player is 20% above avg. and becomes 50% above avg., changes in the league scoring context will distort that in raw point terms:

Year 1
--------
league avg. 50
player A 75 (50% above)
player B 60 (20% above)

Year 2
---------
league avg. 100
player A 120 (20% above)
player B 150 (50% above)

Each player was once 20% above and once 50% above league avg., yet their totals are: Player A 195, Player B 210. Because Player B was better at a time when the league avg. was much higher, he appears to be significantly better than Player A based on a sum of raw point totals over the same seasons, when that wasn't the case.
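The arithmetic in the two-season example can be checked with a short sketch. The numbers come straight from the example; the variable and function names are only illustrative, and summing each season's ratio to league average is a simplified stand-in for an adjusted stat, not any site's actual formula.

```python
# League average and each player's points, per season, from the example above.
seasons = [
    {"league_avg": 50, "A": 75, "B": 60},     # Year 1
    {"league_avg": 100, "A": 120, "B": 150},  # Year 2
]


def raw_total(player):
    """Plain sum of points across the seasons."""
    return sum(season[player] for season in seasons)


def relative_total(player):
    """Sum of each season's points divided by that season's league average."""
    return sum(season[player] / season["league_avg"] for season in seasons)


raw_a, raw_b = raw_total("A"), raw_total("B")
rel_a, rel_b = relative_total("A"), relative_total("B")
print("raw totals:", raw_a, raw_b)
print("relative totals:", round(rel_a, 2), round(rel_b, 2))
```

The raw totals come out different while the relative totals match, which is the distortion the example is meant to show.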
Honestly dude...most of this post is a lot of bla bla bla where you don't even answer the points I made half the time.
Most of it is you skirting around admitting there is a big flaw (and there is a big flaw, which you will see if you do the exercise I mentioned earlier) while at the same time saying we need to examine possible solutions.
And the question isn't whether or not further adjustments could make it worse.

Ok, now all that completely aside.
The single biggest thing I keep taking from your posts is that you keep saying that adjusted stats are a replacement for raw stats, that one should use one or the other but not both.
THEY ARE NOT AND YOU SHOULD NOT EXCLUDE EITHER OF THEM!

This whole thing is about what value to assign Adjusted Stats. They are not a replacement for anything! They are not an alternative! They are just another tool to be used to find a reasonable answer.
It sure as hell isn't about excluding anything.
