HFBoards

By The Numbers Hockey Analytics... the Final Frontier. Explore strange new worlds, to seek out new algorithms, to boldly go where no one has gone before.

Heads-Up: HR Games GP Data Is Flawed

07-30-2012, 01:07 AM
  #1
Canadiens1958
Registered User
 
Join Date: Nov 2007
Posts: 12,027
vCash: 500
Heads-Up: HR Games GP Data Is Flawed

The alarm went off after a comment by Theokritos and I have started checking other instances.

Simply put, a 76-game regular season NHL schedule with 16 skaters dressed per game should produce 1,216 skater games played: 76 x 16 = 1,216.
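
As a rough illustration of that check, here is a minimal Python sketch. The function name and the sample figures are hypothetical and not taken from any team page, and it assumes a fixed number of skaters is dressed and used in every game.

```python
def skater_gp_gap(scheduled_games, skaters_per_game, player_gp):
    """Expected skater games minus credited skater games.

    A positive result is a shortage, a negative result an overage.
    Assumes the same number of skaters appears in every game.
    """
    expected = scheduled_games * skaters_per_game
    credited = sum(player_gp.values())
    return expected - credited


# The expected total quoted above: 76 games x 16 skaters.
expected_total = 76 * 16

# Hypothetical mini-example: 4 games, 3 skaters per game, one skater
# credited with only 3 GP, so the roster comes up one game short.
sample_gp = {"Skater A": 4, "Skater B": 4, "Skater C": 3}
sample_gap = skater_gp_gap(4, 3, sample_gp)
```

Running the same function over the skater GP column of a real team page would show how far that team sits from 1,216.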

In the Estimated Time on Ice Thread, looking at the 1968-69 Bruins season reveals a 35 game shortage in games played.

http://www.hockey-reference.com/teams/BOS/1969.html


Checking the 1968-69 Canadiens data reveals another discrepancy from 1216.

http://www.hockey-reference.com/teams/MTL/1969.html

Govern your studies accordingly.

07-30-2012, 02:32 AM
  #2
ssh
Registered User
 
Join Date: May 2008
Posts: 94
vCash: 500
Thanks for the heads up. Those aren't the only cases though. It's not until the 80's when teams regularly have full rosters credited for each game. In the 70's most teams come up short. Even in the last couple of decades some teams are missing a game or two.

Some of the missing games are probably caused by players being dressed but sitting on the bench the whole game. Then of course there's the possibility of bad or missing official data. How much of a problem that is is very difficult to estimate since (nearly?) all websites and books around use the same data.

07-30-2012, 07:01 AM
  #3
Canadiens1958
Registered User
 
Join Date: Nov 2007
Posts: 12,027
vCash: 500
Overage

Quote:
Originally Posted by ssh View Post
Thanks for the heads up. Those aren't the only cases though. It's not until the 80's when teams regularly have full rosters credited for each game. In the 70's most teams come up short. Even in the last couple of decades some teams are missing a game or two.

Some of the missing games are probably caused by players being dressed but sitting on the bench the whole game. Then of course there's the possibility of bad or missing official data. How much of a problem that is is very difficult to estimate since (nearly?) all websites and books around use the same data.
That does not explain instances where overages happen. It also assumes that there are no compensating mistakes that create an illusion of accuracy.

Publishing the same data or a close approximation is one thing. Using the data and drawing conclusions as if it were official NHL data is a different topic altogether.

07-30-2012, 07:24 AM
  #4
ssh
Registered User
 
Join Date: May 2008
Posts: 94
vCash: 500
Quote:
Originally Posted by Canadiens1958 View Post
That does not explain instances where overages happen. It also assumes that there are no compensating mistakes that create an illusion of accuracy.

Publishing the same data or a close approximation is one thing. Using the data and drawing conclusions as if it were official NHL data is a different topic altogether.
Sadly there's not much one can do without having to do unreasonable amounts of legwork. The NHL has been very poor at publishing official data, at least online.
Also, the quality of the official data is unknown. IIRC there have been threads here regarding clear errors in official numbers. Not to mention, of course, all the subjective errors made by game officials, such as wrong goal and assist credit, shot counts, ice time etc.

07-30-2012, 08:28 AM
  #5
overpass
Registered User
 
Join Date: Jun 2007
Posts: 3,639
vCash: 500
The worst error I've seen is the failure to credit Sprague Cleghorn for the three games he played for Ottawa at the beginning of the 1920-21 season. The whole Cleghorn saga was one of the biggest stories of the season - how could the stats miss it? Unless the NHL decided that Cleghorn never should have played for Ottawa in those games and erased them from the record (he had been transferred by the league to Hamilton, but refused to report.)

Newspaper reports all agree that Doug Young scored the winning goal in Game 3 of the 1934 Cup Finals on a long shot, but Young's official stats for those playoffs show 0 G, 0 A.

07-30-2012, 08:59 AM
  #6
Canadiens1958
Registered User
 
Join Date: Nov 2007
Posts: 12,027
vCash: 500
True

Quote:
Originally Posted by overpass View Post
The worst error I've seen is the failure to credit Sprague Cleghorn for the three games he played for Ottawa at the beginning of the 1920-21 season. The whole Cleghorn saga was one of the biggest stories of the season - how could the stats miss it? Unless the NHL decided that Cleghorn never should have played for Ottawa in those games and erased them from the record (he had been transferred by the league to Hamilton, but refused to report.)

Newspaper reports all agree that Doug Young scored the winning goal in Game 3 of the 1934 Cup Finals on a long shot, but Young's official stats for those playoffs show 0 G, 0 A.
True. I will not get into attributing motives or conspiracy theories, but what you posted is just the tip of the iceberg.

The basic issue is getting everyone in the chain on the same page when it comes to doing things properly in a standardized format.

First and second assists are very vulnerable to this, since initially they were reported based on the referee's verbal call to the scorer in the penalty box, then passed on upstairs, then to the local papers and wire services, followed by newspaper box scores. So you have a chain with at least five opportunities for interchanging the order.


07-31-2012, 01:52 PM
  #7
Iain Fyffe
Hockey fact-checker
 
Join Date: Feb 2009
Location: Fredericton, NB
Country: Canada
Posts: 3,078
vCash: 500
Quote:
Originally Posted by Canadiens1958 View Post
That does not explain instances where overages happen. It also assumes that there are no compensating mistakes that create an illusion of accuracy.
Are there any cases of overages? The two examples you provided are shortages, which are understandable. Either a team doesn't dress the maximum, or doesn't play players who are dressed. We only need to explain overages if there are any overages to explain.

Quote:
Originally Posted by Canadiens1958 View Post
Publishing the same data or a close approximation is one thing. Using the data and drawing conclusions as if it were official NHL data is a different topic altogether.
Does this disagree with official NHL data? IIRC hockey-reference was first built out of the Total Hockey data set, and of course Total Hockey is "The Official Encyclopedia of the National Hockey League (tm)".

We are of course constrained by the information we have. Even the official stats are not 100% reliable (see Rick Tocchet having two assists added to his record years later). The only illusion of accuracy that exists is the one that one lets oneself believe.

07-31-2012, 02:00 PM
  #8
Iain Fyffe
Hockey fact-checker
 
Join Date: Feb 2009
Location: Fredericton, NB
Country: Canada
Posts: 3,078
vCash: 500
Quote:
Originally Posted by overpass View Post
The worst error I've seen is the failure to credit Sprague Cleghorn for the three games he played for Ottawa at the beginning of the 1920-21 season. The whole Cleghorn saga was one of the biggest stories of the season - how could the stats miss it? Unless the NHL decided that Cleghorn never should have played for Ottawa in those games and erased them from the record (he had been transferred by the league to Hamilton, but refused to report.)
That's apparently not an NHL error. I'm looking at the aforementioned Total Hockey (first edition), and Cleghorn has those games in his record. So you're presumably looking at an input error.

Quote:
Originally Posted by overpass View Post
Newspaper reports all agree that Doug Young scored the winning goal in Game 3 of the 1934 Cup Finals on a long shot, but Young's official stats for those playoffs show 0 G, 0 A.
Whereas this is presumably a difference between what the reporters saw and what was credited on the official scoresheet. All of Detroit's 18 playoff goals are accounted for, so someone else must have been credited with that goal. That's a separate concern from unreliable transcription of data.

07-31-2012, 03:58 PM
  #9
Canadiens1958
Registered User
 
Join Date: Nov 2007
Posts: 12,027
vCash: 500
Overages

Quote:
Originally Posted by Iain Fyffe View Post
Are there any cases of overages? The two examples you provided are shortages, which are understandable. Either a team doesn't dress the maximum, or doesn't play players who are dressed. We only need to explain overages if there are any overages to explain.


Does this disagree with official NHL data? IIRC hockey-reference was first built out of the Total Hockey data set, and of course Total Hockey is "The Official Encyclopedia of the National Hockey League (tm)".

We are of course constrained by the information we have. Even the official stats are not 100% reliable (see Rick Tocchet having two assists added to his record years later). The only illusion of accuracy that exists is the one that one lets oneself believe.
Overages - see the goalie stats - GP:

http://www.hockey-reference.com/teams/STL/1969.html

76 scheduled games. Total GP surpasses 76.

07-31-2012, 04:02 PM
  #10
Trebek
Mod Supervisor
 
Join Date: Sep 2005
Posts: 2,841
vCash: 500
Quote:
Originally Posted by Canadiens1958 View Post
Overages - see the goalie stats - GP:

http://www.hockey-reference.com/teams/STL/1969.html

76 scheduled games. Total GP surpasses 76.
You surely realize that, on occasion, more than one goaltender plays in a game? It's pretty common.

It wasn't as common back then, but surely the Robbie Irons story is well-known?

07-31-2012, 05:49 PM
  #11
seventieslord
Moderator
 
Join Date: Mar 2006
Location: Regina, SK
Country: Canada
Posts: 26,026
vCash: 500
Quote:
Originally Posted by Canadiens1958 View Post
Overages - see the goalie stats - GP:

http://www.hockey-reference.com/teams/STL/1969.html

76 scheduled games. Total GP surpasses 76.
The minutes do add up, which is infinitely more important. Wouldn't you agree?
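
To illustrate with made-up numbers, here is a minimal Python sketch of why goalie GP can exceed the schedule while the minutes still reconcile. The function name and the two goalie lines are hypothetical, and the sketch assumes 60-minute games with no overtime and ignores any time the net was empty.

```python
def goalie_totals(scheduled_games, goalies, minutes_per_game=60):
    """Compare goalie GP and minutes against the schedule.

    gp_surplus counts appearances beyond the number of scheduled games
    (shared games); minutes_gap is scheduled minutes minus credited minutes.
    """
    gp_total = sum(g["gp"] for g in goalies)
    min_total = sum(g["min"] for g in goalies)
    return {
        "gp_surplus": gp_total - scheduled_games,
        "minutes_gap": scheduled_games * minutes_per_game - min_total,
    }


# Hypothetical goalie lines, not copied from any team page.
sample_goalies = [
    {"name": "Goalie A", "gp": 40, "min": 2340},
    {"name": "Goalie B", "gp": 39, "min": 2220},
]
result = goalie_totals(76, sample_goalies)
```

In this made-up case the two goalies are credited with 79 appearances in 76 games, yet their minutes sum to exactly 76 x 60 = 4,560.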

07-31-2012, 06:32 PM
  #12
Canadiens1958
Registered User
 
Join Date: Nov 2007
Posts: 12,027
vCash: 500
Suspended Playoff Games

Suspended playoff games are another example. 1988 and 1951:

1988 featured a suspended game 4 in the finals.

http://www.hockey-reference.com/teams/BOS/1988.html

http://www.hockey-reference.com/team...988_games.html

Note that the players get credit for the extra game, as some Bruins top out at 23 games but the team gets credit for 22. The series is portrayed as a 4-0 Edmonton sweep even though there was a suspended tie.

1951 featured a suspended game 2 - curfew in the semi-finals:

http://www.hockey-reference.com/teams/BOS/1951.html

http://www.hockey-reference.com/team...951_games.html

Note that the Bruin players get credit for 6 games where applicable but the team gets credit for 5 games. The series is portrayed as a 4-2 Toronto victory, when in fact it was 4-1 with a suspended tie set aside. The Bruin goalies show a record of 1 W and 4 L.

In one instance a suspended tie is set aside completely. In another instance a suspended tie is attributed as a W for one team and a L for another. However the respective goalies do not get credit for a win or a tie.

I used the Bruins to illustrate the overages and incongruities, but in both instances the respective Oiler and Leaf stats mirror the situation.
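
A minimal Python sketch of the comparison described above, flagging players credited with more games than their team. The function and player names are hypothetical; only the 23-versus-22 figures come from the 1988 example.

```python
def players_over_team_games(team_games, player_gp):
    """Return the players whose credited GP exceeds the team's game total."""
    return {name: gp for name, gp in player_gp.items() if gp > team_games}


# Placeholder names; 22 team games against a player maximum of 23.
sample_playoff_gp = {"Player A": 23, "Player B": 22, "Player C": 10}
flagged = players_over_team_games(22, sample_playoff_gp)
```

Any non-empty result points to a game that counts for the players but not for the team, such as a suspended game.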


07-31-2012, 07:17 PM
  #13
seventieslord
Moderator
 
Join Date: Mar 2006
Location: Regina, SK
Country: Canada
Posts: 26,026
vCash: 500
Quote:
Originally Posted by Canadiens1958 View Post
Suspended playoff games are another example. 1988 and 1951:

1988 featured a suspended game 4 in the finals.

http://www.hockey-reference.com/teams/BOS/1988.html

http://www.hockey-reference.com/team...988_games.html

Note that the players get credit for the extra game, as some Bruins top out at 23 games but the team gets credit for 22. The series is portrayed as a 4-0 Edmonton sweep even though there was a suspended tie.

1951 featured a suspended game 2 - curfew in the semi-finals:

http://www.hockey-reference.com/teams/BOS/1951.html

http://www.hockey-reference.com/team...951_games.html

Note that the Bruin players get credit for 6 games where applicable but the team gets credit for 5 games. The series is portrayed as a 4-2 Toronto victory, when in fact it was 4-1 with a suspended tie set aside. The Bruin goalies show a record of 1 W and 4 L.

In one instance a suspended tie is set aside completely. In another instance a suspended tie is attributed as a W for one team and a L for another. However the respective goalies do not get credit for a win or a tie.

I used the Bruins to illustrate the overages and incongruities, but in both instances the respective Oiler and Leaf stats mirror the situation.
This is news to me, but I don't think it's really a problem. At some point it had to have been decided that the players get credit for playing in an incomplete game, but the teams don't see the games on their record, because the games weren't finished. That is fair. No real impact to the teams, who had to replay the game, whereas you could see players wanting to be sure the points they collected in these incomplete games were recorded.

07-31-2012, 07:20 PM
  #14
Trebek
Mod Supervisor
 
Join Date: Sep 2005
Posts: 2,841
vCash: 500
Quote:
Originally Posted by Canadiens1958 View Post
another example.
When your first example has already been debunked, you aren't allowed to dodge the question and then refer to "another example".

Even in this example, you haven't demonstrated yet that it matters.

07-31-2012, 07:49 PM
  #15
Canadiens1958
Registered User
 
Join Date: Nov 2007
Posts: 12,027
vCash: 500
Either / Or

Quote:
Originally Posted by seventieslord View Post
The minutes do add up, which is infinitely more important. Wouldn't you agree?
If a choice has to be made, then the minutes balancing is the better alternative. The point is that until Ron Andrews moved the NHL in the minutes direction when recording GAA and reflecting goalies' participation in games, the data was presented in terms of games.

The basic issue is moving forward and getting the best statistical description possible. Yes, the minutes balance, reflecting split games, and that advances the understanding of split games under the two-goalie system and the replacement phenomenon under the previous one-goalie system. However, the presentation raises additional questions, namely which goalie started more games, and performance as a starter vs. performance as a second goalie under the two-goalie system. This additional data could provide a deeper understanding of coaches' decisions, such as Keenan's quick hook and its benefits.

07-31-2012, 08:04 PM
  #16
Canadiens1958
Registered User
 
Join Date: Nov 2007
Posts: 12,027
vCash: 500
Ron Andrews

Quote:
Originally Posted by Taco MacArthur View Post
When your first example has already been debunked, you aren't allowed to dodge the question and then refer to "another example".

Even in this example, you haven't demonstrated yet that it matters.
The two main objections Ron Andrews ran into when he introduced the various changes and methodology to NHL statistics were sharing internal knowledge which the press and fans were anxious to have and the importance of the new methodology. The newspapers wanted it both ways. They wanted the data but were unwilling to dedicate the extra space to report the knowledge as part of the stats package.

Most evident was the reluctance of newspapers to publish proper box scores including shots on goal, assists as awarded - 1st and 2nd. Games played were viewed as insignificant until the forties. Today minutes and seconds matter.

The purpose of this thread is to perpetuate the spirit and methodology that Ron Andrews brought to NHL statistics.

07-31-2012, 08:12 PM
  #17
Canadiens1958
Registered User
 
Join Date: Nov 2007
Posts: 12,027
vCash: 500
Shortages

Quote:
Originally Posted by Iain Fyffe View Post
Are there any cases of overages? The two examples you provided are shortages, which are understandable. Either a team doesn't dress the maximum, or doesn't play players who are dressed. We only need to explain overages if there are any overages to explain.


Does this disagree with official NHL data? IIRC hockey-reference was first built out of the Total Hockey data set, and of course Total Hockey is "The Official Encyclopedia of the National Hockey League (tm)".

We are of course constrained by the information we have. Even the official stats are not 100% reliable (see Rick Tocchet having two assists added to his record years later). The only illusion of accuracy that exists is the one that one lets oneself believe.
Actually, shortages could provide very important data. Baseball has data about the number of games a player spends on a team's game roster. Similar data about NHL players would allow for a better understanding of a player's role and value to a team, and of coaching decisions and philosophies about how game roster players are used. For example, knowing that a goon was dressed for multiple games but stepped on the ice for only one provides a better picture of his actual hockey value or ability to do anything else.

07-31-2012, 08:51 PM
  #18
Iain Fyffe
Hockey fact-checker
 
Join Date: Feb 2009
Location: Fredericton, NB
Country: Canada
Posts: 3,078
vCash: 500
Quote:
Originally Posted by Canadiens1958 View Post
Overages - see the goalie stats - GP:

http://www.hockey-reference.com/teams/STL/1969.html

76 scheduled games. Total GP surpasses 76.
Wow. You realize that all NHL teams have "overages" now in the goaltending category, yes? Goaltenders share games. This is not an overage in the sense you meant it in your original comment. By this logic they also have an overage at, for example, centre because their centres add up to more than 76 GP.

Quote:
Originally Posted by Canadiens1958 View Post
Suspended playoff games are another example. 1988 and 1951:

1988 featured a suspended game 4 in the finals.
This is not a discrepancy, but an official scoring choice (not to mention an extreme outlier). Again it's not an overage in the sense you meant it. Remember that you started this thread with the claim that the data is flawed. Your purported examples suggest nothing of the sort.

Quote:
Originally Posted by Canadiens1958 View Post
Actually shortages could provide very important data.
Indeed they can. Which in turn suggests that the GP data is not flawed, as you claimed and titled this thread, but is in fact useful.

So which is it: the shortages provide important data, or the shortages are themselves indicative of flawed data?

07-31-2012, 09:04 PM
  #19
Canadiens1958
Registered User
 
Join Date: Nov 2007
Posts: 12,027
vCash: 500
Incomplete

Quote:
Originally Posted by Iain Fyffe View Post
Wow. You realize that all NHL teams have "overages" now in the goaltending category, yes? Goaltenders share games. This is not an overage in the sense you meant it in your original comment. By this logic they also have an overage at, for example, centre because their centres add up to more than 76 GP.


This is not a discrepancy, but an official scoring choice (not to mention an extreme outlier). Again it's not an overage in the sense you meant it. Remember that you started this thread with the claim that the data is flawed. Your purported examples suggest nothing of the sort.


Indeed they can. Which in turn suggests that the GP data is not flawed, as you claimed and titled this thread, but is in fact useful.

So which is it: the shortages provide important data, or the shortages are themselves indicative of flawed data?
Flawed incorporates elements such as a lack of completeness which create an opportunity for improvement.

Today the centers are viewed in the context of actual minutes/seconds played, which is the consequence of Ron Andrews listing goalies in terms of minutes/seconds. The shortages have the flaw of lacking completeness since they are not balanced with game roster data.

07-31-2012, 09:20 PM
  #20
Trebek
Mod Supervisor
 
Join Date: Sep 2005
Posts: 2,841
vCash: 500
You keep quoting posts like you're intending to respond to them, and then completely changing the subject.

Are you here to monologue, or to dialogue?

07-31-2012, 09:22 PM
  #21
Iain Fyffe
Hockey fact-checker
 
Join Date: Feb 2009
Location: Fredericton, NB
Country: Canada
Posts: 3,078
vCash: 500
Quote:
Originally Posted by Canadiens1958 View Post
The shortages have the flaw of lacking completeness since they are not balanced with game roster data.
No, that's not incompleteness because the stat in question is games played, not games dressed. I agree that a games dressed number could be useful (which European stats for goaltenders usually include), however not that useful, because it's fairly rare for a player to dress but not see a second of ice time. Somewhat useful, but not nearly enough to suggest that an entire data set is flawed because of its lack.

You're now complaining that the stat doesn't represent something it was never meant to represent. You're all over the ice in this thread. You still haven't shown why "HR Games GP Data Is Flawed" in any meaningful way.

08-01-2012, 07:05 PM
  #22
BM67
Registered User
 
Join Date: Mar 2002
Location: In "The System"
Country: Canada
Posts: 4,595
vCash: 500
Quote:
Originally Posted by Iain Fyffe View Post
That's apparently not an NHL error. I'm looking at the aforementioned Total Hockey (first edition), and Cleghorn has those games in his record. So you're presumably looking at an input error.
It is missing in the 2nd edition though, and it's also missing from the HHoF site and other official NHL sources.

I pointed out an error in Leo Reise Sr's record and it was corrected, but I've gotten no response when I reported the Cleghorn error.

08-01-2012, 09:04 PM
  #23
Iain Fyffe
Hockey fact-checker
 
Join Date: Feb 2009
Location: Fredericton, NB
Country: Canada
Posts: 3,078
vCash: 500
Quote:
Originally Posted by BM67 View Post
It is missing in the 2nd edition though
I wonder if we should blame Ernie or James for that one? Since it's a change from the previous edition, that sounds like an unintentional deletion.

Quote:
Originally Posted by BM67 View Post
and it's also missing from the HHoF site and other official NHL sources.
The HHoF stats are full of holes. Use them at your peril.

If you want good numbers, you really ought to join SIHR to get access to the database. An error such as this doesn't last long there.
