HFBoards

Using Regression to Adjust "Adjusted Points" for Top Tier Players '68-12

11-20-2012, 04:23 AM
  #26
Czech Your Math
Registered User
Join Date: Jan 2006
Location: bohemia
Country: Czech_ Republic
Posts: 3,490
vCash: 50
I added a variable (Xc), along the lines of what barney suggested, to capture the "concentration effect" of the top ~1-3 teams in GF. Xc is defined as the % of players in the top 1N who were on the top 0.1N teams in GF.

More recently, I added variables which measure expansion (% of new teams in past 1 or 2 seasons) and the effect of non-Canadians on the top 1N scoring average.

1968-2012
=========
R^2 = .691
SEy = 2.48 (avg. Y = 89.9)


Coeff: value, t-score (values in parentheses are negative)
B0 = 81.4, 65
Bn = (0.39), 17
Bh = (6.45), 7
Bi = 7.56, 9
Bg = (0.15), 1.4
Bp = 2.47, 20
Bf = 420, 24
Ba = (118), 10
Bt = 1.93, 4
Bc = (9.66), 8
Be = 40.3, 19

Y: avg. simple adjusted points (gms, GPG, A/G) of top 1N players (N=number of teams)
B0: Y-intercept (constant)
Xn: Number of teams
Xh: Fraction of new teams vs. previous season
Xi: Fraction of new teams vs. two seasons previous
Xg: League GPG
Xp: PP opportunities/game
Xf: Standard deviation of teams' GF, divided by avg. team GF
Xa: Standard deviation of teams' GA, divided by avg. team GA
Xt: Excess above avg. GF of top 0.2N teams in GF, divided by std dev of team GF
Xc: Ratio of players in top 1N which were on teams in the top 0.1N in GF
Xe: Fractional increase in avg. of top 1N due to non-Canadian players
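
As a rough sketch of how the coefficients and variables above would combine, assuming the model is an ordinary linear regression of the form Y = B0 + Bn*Xn + Bh*Xh + ... + Be*Xe (the post does not spell out the functional form or the exact scaling of each X). The function name and the all-zero input season below are hypothetical and only show the call shape:

```python
# Coefficients as listed in the post; values shown in parentheses are taken as negative.
B0 = 81.4
COEFFS = {
    "n": -0.39,   # number of teams
    "h": -6.45,   # fraction of new teams vs. previous season
    "i": 7.56,    # fraction of new teams vs. two seasons previous
    "g": -0.15,   # league GPG
    "p": 2.47,    # PP opportunities per game
    "f": 420.0,   # stdev of team GF / avg. team GF
    "a": -118.0,  # stdev of team GA / avg. team GA
    "t": 1.93,    # excess GF of top 0.2N teams, in stdevs
    "c": -9.66,   # share of top 1N players on top 0.1N teams in GF
    "e": 40.3,    # fractional increase due to non-Canadian players
}


def predict_1n(x):
    """Return B0 + sum(B_k * X_k) for one season's inputs, keyed like COEFFS."""
    missing = set(COEFFS) - set(x)
    if missing:
        raise KeyError(f"missing inputs: {sorted(missing)}")
    return B0 + sum(COEFFS[k] * x[k] for k in COEFFS)


# Hypothetical season with every X set to zero (not data from the study):
# the prediction then collapses to the intercept.
zero_season = {k: 0.0 for k in COEFFS}
print(predict_1n(zero_season))
```

Real inputs would have to be on the same scale the regression was fitted with, which the post does not state.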

One important factor that may still be missing is the presence/absence of some of the very top Canadian players (i.e., Gretzky and/or Lemieux). It will probably take a lot of trial and error to determine how to best define the proper variable to capture this causality.


Last edited by Czech Your Math: 02-24-2013 at 11:10 AM.
01-03-2013, 08:43 AM
  #27
Czech Your Math
Here are the predicted Y values and the difference between actual & predicted Y values, using the model & coefficients in the previous post.

Act1N = actual average of simple adjusted points of top 1N players in scoring (N= # of teams in league)
Pred1N = predicted value for Act1N based on variables in model
Diff = (Act1N) - (Pred1N); so a positive value means the actual value was higher than predicted
%Diff = Diff expressed as a % of Act1N

Year Act1N Pred1N Diff %Diff
1968 90.7 91.5 (0.8) -0.9%
1969 99.7 99.4 0.3 0.3%
1970 91.0 91.7 (0.7) -0.8%
1971 95.0 96.9 (1.9) -2.0%
1972 98.8 95.3 3.4 3.5%
1973 92.5 90.7 1.8 1.9%
1974 90.5 91.3 (0.8) -0.9%
1975 92.1 92.3 (0.2) -0.2%
1976 92.6 92.0 0.6 0.6%
1977 87.7 90.5 (2.8) -3.2%
1978 88.2 89.2 (1.0) -1.2%
1979 90.0 88.4 1.6 1.8%
1980 89.3 85.3 4.0 4.5%
1981 86.4 87.8 (1.4) -1.6%
1982 87.7 87.2 0.5 0.6%
1983 84.8 87.6 (2.8) -3.3%
1984 86.3 88.8 (2.5) -2.9%
1985 88.4 87.1 1.2 1.4%
1986 88.1 88.6 (0.5) -0.6%
1987 82.6 85.6 (3.0) -3.6%
1988 89.3 91.3 (2.0) -2.3%
1989 90.6 88.9 1.6 1.8%
1990 88.6 85.6 3.1 3.5%
1991 91.0 89.5 1.5 1.7%
1992 88.2 90.6 (2.4) -2.7%
1993 94.5 93.7 0.8 0.8%
1994 89.2 89.0 0.1 0.2%
1995 91.6 92.1 (0.5) -0.6%
1996 99.9 94.3 5.7 5.7%
1997 91.2 88.1 3.2 3.5%
1998 89.4 90.9 (1.5) -1.7%
1999 93.5 93.4 0.2 0.2%
2000 84.6 88.0 (3.4) -4.0%
2001 93.0 93.7 (0.7) -0.8%
2002 85.2 87.9 (2.8) -3.2%
2003 92.1 91.0 1.2 1.3%
2004 86.1 88.8 (2.7) -3.1%
2006 88.7 91.6 (2.9) -3.2%
2007 92.0 88.0 3.9 4.3%
2008 90.8 87.1 3.7 4.1%
2009 86.6 87.5 (0.9) -1.1%
2010 88.3 88.0 0.3 0.4%
2011 83.5 84.2 (0.7) -0.8%
2012 84.9 84.7 0.2 0.3%
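
For anyone who wants to recheck the Diff and %Diff columns, here is a minimal sketch using the definitions given above the table. The function name is my own, and only three rows from the table are used as a spot check:

```python
def diff_and_pct(actual, predicted):
    """Return (Diff, %Diff), each rounded to one decimal as in the table."""
    diff = actual - predicted
    pct = 100.0 * diff / actual  # %Diff is taken relative to Act1N
    return round(diff, 1), round(pct, 1)


# (Act1N, Pred1N) pairs copied from the table.
rows = {1968: (90.7, 91.5), 1969: (99.7, 99.4), 1980: (89.3, 85.3)}
for year, (act, pred) in rows.items():
    print(year, *diff_and_pct(act, pred))
```

A few rows in the table (1972 and 1996, for example) differ from this by 0.1 in the Diff column, presumably because the posted figures were computed from unrounded values.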


Last edited by Czech Your Math: 02-23-2013 at 05:54 PM.
01-03-2013, 09:47 AM
  #28
Czech Your Math
The variations are relatively small, but what likely caused the larger variations between the model and actual measurements? Sometimes extreme values for one or more of the variables aren't fully captured by the model, or it could be another factor that is difficult to quantify at all. Let's briefly examine the largest differences:

1972 (+3.5%): The GAG line of Ratelle-Hadfield-Gilbert finished 3-4-5 behind Espo & Orr. Xt (which captures offensive powerhouse teams) has its second highest value during the 44 seasons since O6 expansion, despite Xf (stdev of team GF, which is the denominator for Xt) being at its third highest level in the study. Basically, it's difficult to fully capture just how top-heavy the league was that year.

1977 (-4.2%): It appears there was a real lack of depth in the top 1N. There's Lafleur and Dionne at the top, but no longer Espo & Orr & Co. and not yet Trottier & Bossy. Some indications of the weak depth are Shutt, MacLeish and Tim Young 3-4-5, and the top 1N containing Ratelle & Espo in their mid-30s and d-men Robinson & Potvin.

1980 (+4.0%): While the WHA teams were only 4/21 of the new NHL, 4 of the top 11 point producers were from the former WHA (including of course Gretzky, who tied for the lead).

1983 (-3.1%): Tough one to explain. PPO/game were at the lowest level for the period '81-'09, so that may not have been fully captured in the variable.

1987 (-3.2%): Previously elite players like Dionne, Trottier, Bossy and Stastny were no longer near the top, while Lemieux and Yzerman were yet to hit their peaks. It was also a time when parity hit its heights, as Xf (stdev of team GF) was the lowest and Xa (stdev of team GA) was the second lowest value in the 44 seasons since O6 expansion.

1990 (+3.2%): The only thing that strikes me is that the old guard (Gretzky, Lemieux, Yzerman, Messier) were still strong, while the new guard (Hull & Oates, Turgeon, Lafontaine, Sakic) emerged.

1992 (-3.6%): This is one of the toughest variations to explain. Basically, Gretzky finally passed his peak, Lemieux missed 20% of the season (but that's typical) and the other stars didn't really step up and have career years (as they would in '93). While the American players had really become a force, the non-N.A. players were not really a factor in the top 1N (Fedorov snuck in at #22 and Mogilny was just outside at #24). They and the American players (along with the fact that only one team had been added since the WHA merger) were providing more depth outside the top 1N, which probably caused league GPG to be higher than it otherwise would have been (which lowers the adjusted numbers for the top players).

1996 (+5.7%) and 1997 (+3.7%): None of the values of the variables stands out, but what does is the number of superstars who were entering or still in their prime during these years. Just look at the top 10 in total points in '96+'97: Lemieux, Jagr, Selanne, Francis, Kariya, Forsberg, Gretzky, LeClair, Lindros and Sakic. Francis was playing with Jagr at ES (and with Lemieux at ES in '97 and on the PP both years) and LeClair was playing with Lindros. Rounding out the top 1N was a mix of players from the US (Weight, Tkachuk, Hull? and Modano), overseas (Mogilny, Palffy, Sundin, Fedorov, Nedved) and the usual Canadians (Messier, Turgeon, Yzerman, Damphousse, Oates, Shanahan and Fleury). Just missing the cut were players such as Recchi, Bondra, Gilmour, Kamensky, Brind'Amour, Amonte and Roenick. So it's no surprise that the top 1N outperformed expectations in these years.

2000 (-4.1%) & 2002 (-3.4%): Power plays were the lowest and third lowest, respectively, for the period '86-'09. Lemieux and Gretzky were gone, Messier and Hull were no longer factors. Injuries started to take their toll on Lindros, Forsberg, Mogilny, Palffy, etc. There wasn't yet a strong crop of younger players to take the places of all these retired, aging and injured stars.

2006 (-3.3%): The only extreme value among our variables is the historically high level of power plays. This may have exaggerated expectations, at least in part for the following reasons (which had various effects on their own as well): This was a very dynamic season, as it followed a lockout season and there was a dramatic change in rules enforcement. Many players either retired during the lockout, did so during the season, were with new teams, or were rusty from playing little or no hockey during the lost season (and what hockey they did play was with different players in a different environment). When conditions change so drastically overnight, it's neither surprising nor concerning that there would be a variation between predicted and actual performances.

2007 (+4.6%) & 2008 (+4.1%): Power plays were at a more moderate level, especially by '08, yet play was probably still more open due to the crackdown in '06. While there were new stars (Ovechkin & Malkin) at the top, there were also many players in their prime (Datsyuk, Thornton, Lecavalier, Spezza, Zetterberg, Kovalchuk, Gaborik) and some older 30+ players having very good seasons (Iginla, Alfredsson, Kovalev, St. Louis).

I believe the variations are generally surprisingly small, given the randomness inherent in such data and the many effects that are very difficult or impossible to properly quantify. It seems that most of the relatively larger variations between predicted and actual have reasonable, logical explanations. I'm satisfied with the results of this study at this point and believe it provides solid support that adjusted points are very practical for comparing offense across seasons in the post-expansion era.


Last edited by Czech Your Math: 02-23-2013 at 11:36 AM.
02-07-2013, 07:18 AM
  #29
Czech Your Math
These are index numbers for each of the 44 seasons, with a mean of 1.00. The "predicted 1N" has been used, except Xe (which measures the influence of non-Canadians) has been omitted. This gives us a predicted value which reflects the conditions in that season, free from the influence of non-Canadians bringing up the average for the very top tier of players.

Year Pred1N* Index
1968 91.2 1.04
1969 99.8 1.14
1970 91.6 1.05
1971 95.8 1.10
1972 95.3 1.09
1973 91.2 1.04
1974 91.0 1.04
1975 92.8 1.06
1976 92.0 1.05
1977 91.3 1.05
1978 89.0 1.02
1979 88.5 1.01
1980 85.5 0.98
1981 85.2 0.98
1982 86.4 0.99
1983 85.7 0.98
1984 87.3 1.00
1985 86.3 0.99
1986 86.2 0.99
1987 84.6 0.97
1988 88.9 1.02
1989 88.1 1.01
1990 85.5 0.98
1991 88.8 1.02
1992 89.0 1.02
1993 90.2 1.03
1994 87.3 1.00
1995 88.2 1.01
1996 88.3 1.01
1997 82.4 0.94
1998 84.9 0.97
1999 84.7 0.97
2000 82.7 0.95
2001 86.2 0.99
2002 83.9 0.96
2003 84.5 0.97
2004 82.1 0.94
2006 86.6 0.99
2007 84.1 0.96
2008 82.2 0.94
2009 82.9 0.95
2010 81.7 0.94
2011 80.2 0.92
2012 81.4 0.93

'68-'77: avg. 1.07, median 1.05, range 1.04-1.14
'78-'96: avg. 1.00, median 1.00, range 0.97-1.03
'97-'12: avg. 0.95, median 0.95, range 0.92-0.99
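
The index column appears to be each season's Pred1N* divided by the 44-season mean of Pred1N* (about 87.3). The sketch below works on that assumption, using the values from the table; the variable and function names are my own. The era averages here are computed from the Pred1N* values rather than from the rounded index numbers, which may not be exactly how the posted figures were derived:

```python
# Pred1N* values copied from the table above (no 2005 season).
PRED_1N_STAR = {
    1968: 91.2, 1969: 99.8, 1970: 91.6, 1971: 95.8, 1972: 95.3,
    1973: 91.2, 1974: 91.0, 1975: 92.8, 1976: 92.0, 1977: 91.3,
    1978: 89.0, 1979: 88.5, 1980: 85.5, 1981: 85.2, 1982: 86.4,
    1983: 85.7, 1984: 87.3, 1985: 86.3, 1986: 86.2, 1987: 84.6,
    1988: 88.9, 1989: 88.1, 1990: 85.5, 1991: 88.8, 1992: 89.0,
    1993: 90.2, 1994: 87.3, 1995: 88.2, 1996: 88.3, 1997: 82.4,
    1998: 84.9, 1999: 84.7, 2000: 82.7, 2001: 86.2, 2002: 83.9,
    2003: 84.5, 2004: 82.1, 2006: 86.6, 2007: 84.1, 2008: 82.2,
    2009: 82.9, 2010: 81.7, 2011: 80.2, 2012: 81.4,
}

# Mean over all 44 seasons, so the index averages to 1.00 by construction.
mean_pred = sum(PRED_1N_STAR.values()) / len(PRED_1N_STAR)

# Season index, rounded to two decimals like the posted column.
index = {yr: round(v / mean_pred, 2) for yr, v in PRED_1N_STAR.items()}


def era_average(first, last):
    """Average index over seasons first..last inclusive, from the Pred1N* values."""
    vals = [v for yr, v in PRED_1N_STAR.items() if first <= yr <= last]
    return round(sum(vals) / len(vals) / mean_pred, 2)


print(index[1969], index[1984], index[2011])
print(era_average(1968, 1977), era_average(1978, 1996), era_average(1997, 2012))
```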


Last edited by Czech Your Math: 02-23-2013 at 11:50 AM.
02-23-2013, 11:59 AM
  #30
Czech Your Math
I've added a key variable, which is a vast improvement in measuring the effect of non-Canadian players. The variable (Xe) is the increase in the scoring avg. of the top 1N due to the presence of non-Canadian players. Players such as Mikita, Hodge, Nolan, Thomas, and Heatley were considered Canadian.

This had some important effects on the model and study in general:

1. It increased the R^2 from .61 to .69. I figured the limits of a model such as this would be to explain close to 70% of the dependent variable, and that has now been accomplished.

2. It reduced Xg (league GPG), which was previously an important and significant variable in this model, to possibly insignificant. I ran the new model with and without Xg, and it didn't really seem to matter. Effects which were previously captured by Xg have now been captured by more accurate and refined variables, such as the newest variable Xe.

3. It allows us to back out this variable when calculating the index numbers, so that the presence of a stronger player pool does not increase the index number (i.e. having more top-quality players doesn't make it look like it was easier for top players to score adjusted points).

02-24-2013, 10:46 AM
  #31
unknown33
Registered User
Join Date: Dec 2009
Location: Europe
Country: Marshall Islands
Posts: 3,024
vCash: 500
Sorry for my lack of understanding, but could you explain in a few words what the results of this study tell us?

02-24-2013, 12:03 PM
  #32
Czech Your Math
Quote:
Originally Posted by unknown33
Sorry for my lack of understanding, but could you explain in a few words what the results of this study tell us?
There are numerous things that the results can tell us. I consider the following to be among the most important:

A) Since expansion in '68, adjusted points have become progressively more difficult to score. As shown in the post with index numbers, in the first decade after expansion it was rather easy to score adjusted points. From the time shortly before the WHA merger until the mid-90s, it was more difficult (but typically about average for the entire post-expansion period to date). Since the mid-90s (often referred to as the "dead puck era") it's been more difficult still to score adjusted points.

B) The reasons for the increasing difficulty in scoring adjusted points appear to have been identified and quantified to a large degree. For instance, let's compare the three main eras as identified by the index numbers: the first decade after expansion ('68-'77), the typical non-expansion period surrounding the '80s ('78-'93), and the last two decades after the fall of the Berlin Wall ('94-'12).

The post-expansion period had an average predicted 1N (which is avg. adjusted scoring of top N players, where N is number of teams) of over 93. "The '80s" period had an average predicted 1N of over 88, a decrease of almost 5 points. The main reasons for the decrease were as follows:

- Parity increased significantly (variables Xf & Xa), which caused a drop of over 4 points.

- Expansion slowed significantly (variables Xh & Xi), which caused a drop of almost 1 point.

- The number of teams was larger (generally more difficult for larger group of players to maintain same average), which caused a drop of over 2 points.

- Power play opportunities increased substantially, which caused an increase of almost 2 points.

- The increased presence of non-Canadian players in the top 1N (variable Xe) caused an increase of over 1 point.

Those factors sum to a total decrease of almost 5 points (it may appear more like 4 points due to rounding errors). It was mainly expansion-related factors (new teams and lack of parity) which made it so much easier to score in the post-expansion period.
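
One way to read the breakdown above: in a linear model, the change in the average prediction between two eras splits exactly into one term per variable, equal to that variable's coefficient times the change in its era average. The sketch below illustrates that arithmetic. The function name is my own, only two of the ten variables are shown, and the era averages are hypothetical placeholders, not the figures behind the numbers quoted above:

```python
def era_contributions(coeffs, era_a_means, era_b_means):
    """Per-variable contribution to the change in predicted Y from era A to era B."""
    return {k: coeffs[k] * (era_b_means[k] - era_a_means[k]) for k in coeffs}


# Coefficients from the model (Bn for number of teams, Bp for PP opportunities/game).
coeffs = {"n": -0.39, "p": 2.47}

# HYPOTHETICAL era averages, chosen only to show the mechanics.
era_a = {"n": 15.0, "p": 3.0}
era_b = {"n": 21.0, "p": 4.0}

parts = era_contributions(coeffs, era_a, era_b)
total = sum(parts.values())
for name, value in parts.items():
    print(name, round(value, 2))
print("total", round(total, 2))
```

With all ten variables and the real era averages, the per-variable terms would sum to the difference between the two eras' average predicted 1N.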

Now let's compare "the '80s" period to the "dead puck era". The predicted 1N actually increased from over 88 to over 89, almost a 1 point increase. Let's again look at the various factors:

- The larger number of teams caused a drop of 3 points.

- The increased presence of non-Canadian players in the top 1N caused an increase of 4 points.

- Other factors were rather minor, causing offsetting changes of about 1/2 point or less.

In this case the better talent pool, including substantially more non-Canadian stars, obscured the fact that it became increasingly hard to score adjusted points.

C) Once the index numbers are more firmly established (I have to give more thought over time to whether/which other factors, besides presence of non-Canadians, should be factored out), then they can be used to calculate "adjusted adjusted" numbers, which we should have more confidence in using when comparing across seasons.


Last edited by Czech Your Math: 02-24-2013 at 12:09 PM.
02-24-2013, 12:31 PM
  #33
unknown33
So basically the goal is to improve the adjusted points method.
Very interesting, thanks.
