HFBoards

Canadiens1958 07-30-2012 01:07 AM

Heads-Up: HR Games GP Data Is Flawed
 
The alarm went off after a comment by Theokritos and I have started checking other instances.

Simply put, a 76-game regular-season NHL schedule with a roster of 16 skaters produces 1216 games played: 76 x 16 = 1216.
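
The arithmetic above can be sketched as a quick check. This is only an illustration: the function names are hypothetical, the 16-skater roster is the assumption stated in the post, and the sample games-played list is made up rather than taken from hockey-reference.com.

```python
def expected_skater_games(schedule_games: int, skaters_dressed: int) -> int:
    """Skater games played if every dressed skater is credited in every game."""
    return schedule_games * skaters_dressed


def gp_discrepancy(player_gp: list[int], schedule_games: int, skaters_dressed: int) -> int:
    """Recorded total minus expected total: negative is a shortage, positive an overage."""
    return sum(player_gp) - expected_skater_games(schedule_games, skaters_dressed)


# Made-up GP column for illustration only (not real team data).
sample_gp = [76] * 15 + [41]

print(expected_skater_games(76, 16))      # 1216
print(gp_discrepancy(sample_gp, 76, 16))  # -35
```

A negative result is a shortage of the kind described in this thread; a positive result would be an overage.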

In the Estimated Time on Ice Thread, looking at the 1968-69 Bruins season reveals a 35 game shortage in games played.

http://www.hockey-reference.com/teams/BOS/1969.html


Checking the 1968-69 Canadiens data reveals another discrepancy from 1216.

http://www.hockey-reference.com/teams/MTL/1969.html

Govern your studies accordingly.

ssh 07-30-2012 02:32 AM

Thanks for the heads up. Those aren't the only cases though. It's not until the 80's when teams regularly have full rosters credited for each game. In the 70's most teams come up short. Even in the last couple of decades some teams are missing a game or two.

Some of the missing games are probably caused by players being dressed but sitting on the bench the whole game. Then of course there's the possibility of bad or missing official data. How much of a problem that is is very difficult to estimate since (nearly?) all websites and books around use the same data.

Canadiens1958 07-30-2012 07:01 AM

Overage
 
Quote:

Originally Posted by ssh (Post 53106753)
Thanks for the heads up. Those aren't the only cases though. It's not until the 80's when teams regularly have full rosters credited for each game. In the 70's most teams come up short. Even in the last couple of decades some teams are missing a game or two.

Some of the missing games are probably caused by players being dressed but sitting on the bench the whole game. Then of course there's the possibility of bad or missing official data. How much of a problem that is is very difficult to estimate since (nearly?) all websites and books around use the same data.

Does not explain instances where overages happen. Also assuming that there are no compensating mistakes that create an illusion of accuracy.

Publishing the same data or a close proximity is one thing. Using the data and drawing conclusions as if it were official NHL data is a different topic altogether.

ssh 07-30-2012 07:24 AM

Quote:

Originally Posted by Canadiens1958 (Post 53108233)
Does not explain instances where overages happen. Also assuming that there are no compensating mistakes that create an illusion of accuracy.

Publishing the same data or a close proximity is one thing. Using the data and drawing conclusions as if it were official NHL data is a different topic altogether.

Sadly there's not much one can do without having to do unreasonable amounts of legwork. NHL has been very poor at publishing official data, at least online.
Also, the quality of the official data is unknown. IIRC there have been threads here regarding clear errors in official numbers. Not to mention, of course, all the subjective errors done by game officials, such as wrong goal and assist credit, shot counts, ice time etc.

overpass 07-30-2012 08:28 AM

The worst error I've seen is the failure to credit Sprague Cleghorn for the three games he played for Ottawa at the beginning of the 1920-21 season. The whole Cleghorn saga was one of the biggest stories of the season - how could the stats miss it? Unless the NHL decided that Cleghorn never should have played for Ottawa in those games and erased them from the record (he had been transferred by the league to Hamilton, but refused to report.)

Newspaper reports all agree that Doug Young scored the winning goal in Game 3 of the 1934 Cup Finals on a long shot, but Young's official stats for those playoffs show 0 G, 0 A.

Canadiens1958 07-30-2012 08:59 AM

True
 
Quote:

Originally Posted by overpass (Post 53109185)
The worst error I've seen is the failure to credit Sprague Cleghorn for the three games he played for Ottawa at the beginning of the 1920-21 season. The whole Cleghorn saga was one of the biggest stories of the season - how could the stats miss it? Unless the NHL decided that Cleghorn never should have played for Ottawa in those games and erased them from the record (he had been transferred by the league to Hamilton, but refused to report.)

Newspaper reports all agree that Doug Young scored the winning goal in Game 3 of the 1934 Cup Finals on a long shot, but Young's official stats for those playoffs show 0 G, 0 A.

True. Will not get into the attributing motive or conspiracy theory games but what you posted is just the tip of the iceberg.

Basic issue is getting everyone in the chain on the same page when it comes to doing things properly in a standardized format.

First and second assists are very vulnerable to this since initially they were reported based on the referee's verbal call to the scorer in the penalty box, then passed on upstairs, to the local papers and wire services, followed by newspaper box scores. So you have a chain with at least five opportunities for interchanging the order.

Iain Fyffe 07-31-2012 01:52 PM

Quote:

Originally Posted by Canadiens1958 (Post 53108233)
Does not explain instances where overages happen. Also assuming that there are no compensating mistakes that create an illusion of accuracy.

Are there any cases of overages? The two examples you provided are shortages, which are understandable. Either a team doesn't dress the maximum, or doesn't play players who are dressed. We only need to explain overages if there are any overages to explain.

Quote:

Originally Posted by Canadiens1958 (Post 53108233)
Publishing the same data or a close proximity is one thing. Using the data and drawing conclusions as if it were official NHL data is a different topic altogether.

Does this disagree with official NHL data? IIRC hockey-reference was first built out of the Total Hockey data set, and of course Total Hockey is "The Official Encyclopedia of the National Hockey League (tm)".

We are of course constrained by the information we have. Even the official stats are not 100% reliable (see Rick Tocchet having two assists added to his record years later). The only illusion of accuracy that exists is the one that one lets oneself believe.

Iain Fyffe 07-31-2012 02:00 PM

Quote:

Originally Posted by overpass (Post 53109185)
The worst error I've seen is the failure to credit Sprague Cleghorn for the three games he played for Ottawa at the beginning of the 1920-21 season. The whole Cleghorn saga was one of the biggest stories of the season - how could the stats miss it? Unless the NHL decided that Cleghorn never should have played for Ottawa in those games and erased them from the record (he had been transferred by the league to Hamilton, but refused to report.)

That's apparently not an NHL error. I'm looking at the aforementioned Total Hockey (first edition), and Cleghorn has those games in his record. So you're presumably looking at an input error.

Quote:

Originally Posted by overpass (Post 53109185)
Newspaper reports all agree that Doug Young scored the winning goal in Game 3 of the 1934 Cup Finals on a long shot, but Young's official stats for those playoffs show 0 G, 0 A.

Whereas this is presumably a difference between what the reporters saw and what was credited on the official scoresheet. All of Detroit's 18 playoff goals are accounted for, so someone else must have been credited with that goal. That's a separate concern from unreliable transcription of data.

Canadiens1958 07-31-2012 03:58 PM

Overages
 
Quote:

Originally Posted by Iain Fyffe (Post 53153919)
Are there any cases of overages? The two examples you provided are shortages, which are understandable. Either a team doesn't dress the maximum, or doesn't play players who are dressed. We only need to explain overages if there are any overages to explain.


Does this disagree with official NHL data? IIRC hockey-reference was first built out of the Total Hockey data set, and of course Total Hockey is "The Official Encyclopedia of the National Hockey League (tm)".

We are of course constrained by the information we have. Even the official stats are not 100% reliable (see Rick Tocchet having two assists added to his record years later). The only illusion of accuracy that exists is the one that one lets oneself believe.

Overages - see the goalie stats - GP:

http://www.hockey-reference.com/teams/STL/1969.html

76 scheduled games. Total GP surpasses 76.

Chalupa Batman 07-31-2012 04:02 PM

Quote:

Originally Posted by Canadiens1958 (Post 53159111)
Overages - see the goalie stats - GP:

http://www.hockey-reference.com/teams/STL/1969.html

76 scheduled games. Total GP surpasses 76.

You surely realize that, on occasion, more than one goaltender plays in a game? It's pretty common.

It wasn't as common back then, but surely the Robbie Irons story is well-known?

seventieslord 07-31-2012 05:49 PM

Quote:

Originally Posted by Canadiens1958 (Post 53159111)
Overages - see the goalie stats - GP:

http://www.hockey-reference.com/teams/STL/1969.html

76 scheduled games. Total GP surpasses 76.

The minutes do add up, which is infinitely more important. Wouldn't you agree?
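
A minimal sketch of that point, assuming 60-minute games and ignoring empty-net time: when goalies share games, their games played can sum to more than the schedule while their minutes still balance. The `GoalieLine` type, the helper name, and every figure below are hypothetical and are not the actual 1968-69 St. Louis numbers.

```python
from typing import NamedTuple


class GoalieLine(NamedTuple):
    name: str
    games_played: int
    minutes: int


def goalie_totals(lines: list[GoalieLine]) -> tuple[int, int]:
    """Return (total games played, total minutes) across a team's goalies."""
    return (sum(g.games_played for g in lines), sum(g.minutes for g in lines))


# Hypothetical three-goalie season; NOT real team data.
lines = [
    GoalieLine("Goalie A", 40, 2300),
    GoalieLine("Goalie B", 38, 2200),
    GoalieLine("Goalie C", 2, 60),
]

SCHEDULE_GAMES = 76
MINUTES_PER_GAME = 60  # assumption: regulation time only, no empty-net deductions

total_gp, total_minutes = goalie_totals(lines)
print(total_gp > SCHEDULE_GAMES)                           # GP sum exceeds the schedule
print(total_minutes == SCHEDULE_GAMES * MINUTES_PER_GAME)  # minutes still balance
```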

Canadiens1958 07-31-2012 06:32 PM

Suspended Playoff Games
 
Suspended playoff games are another example. 1988 and 1951:

1988 featured a suspended game 4 in the finals.

http://www.hockey-reference.com/teams/BOS/1988.html

http://www.hockey-reference.com/team...988_games.html

Note that the players get credit for the extra game, as some Bruins top out at 23 games but the team gets credit for 22. The series is portrayed as a 4-0 Edmonton sweep even though there was a suspended tie.

1951 featured a suspended game 2 - curfew in the semi-finals:

http://www.hockey-reference.com/teams/BOS/1951.html

http://www.hockey-reference.com/team...951_games.html

Note that the Bruin players get credit for 6 games where applicable but the team gets credit for 5 games. The series is portrayed as a 4-2 Toronto victory, when in fact it was 4-1 with a suspended tie set aside. The Bruin goalies show a record of 1 W and 4 L.

In one instance a suspended tie is set aside completely. In another instance a suspended tie is attributed as a W for one team and a L for another. However the respective goalies do not get credit for a win or a tie.

Used the Bruins to illustrate the overages and incongruencies but in both instances the respective Oiler and Leaf stats mirror the situation.

seventieslord 07-31-2012 07:17 PM

Quote:

Originally Posted by Canadiens1958 (Post 53163937)
Suspended playoff games are another example. 1988 and 1951:

1988 featured a suspended game 4 in the finals.

http://www.hockey-reference.com/teams/BOS/1988.html

http://www.hockey-reference.com/team...988_games.html

Note that the players get credit for the extra game, as some Bruins top out at 23 games but the team gets credit for 22. The series is portrayed as a 4-0 Edmonton sweep even though there was a suspended tie.

1951 featured a suspended game 2 - curfew in the semi-finals:

http://www.hockey-reference.com/teams/BOS/1951.html

http://www.hockey-reference.com/team...951_games.html

Note that the Bruin players get credit for 6 games where applicable but the team gets credit for 5 games. The series is portrayed as a 4-2 Toronto victory, when in fact it was 4-1 with a suspended tie set aside. The Bruin goalies show a record of 1 W and 4 L.

In one instance a suspended tie is set aside completely. In another instance a suspended tie is attributed as a W for one team and a L for another. However the respective goalies do not get credit for a win or a tie.

Used the Bruins to illustrate the overages and incongruencies but in both instances the respective Oiler and Leaf stats mirror the situation.

This is news to me, but I don't think it's really a problem. At some point it had to have been decided that the players get credit for playing in an incomplete game, but the teams don't see the games on their record, because the games weren't finished. That is fair. No real impact to the teams, who had to replay the game, whereas you could see players wanting to be sure the points they collected in these incomplete games were recorded.

Chalupa Batman 07-31-2012 07:20 PM

Quote:

Originally Posted by Canadiens1958 (Post 53163937)
another example.

When your first example has already been debunked, you aren't allowed to dodge the question and then refer to "another example".

Even in this example, you haven't demonstrated yet that it matters.

Canadiens1958 07-31-2012 07:49 PM

Either / Or
 
Quote:

Originally Posted by seventieslord (Post 53162671)
The minutes do add up, which is infinitely more important. Wouldn't you agree?

If a choice has to be made then the minutes balancing is the better alternative. Point is that until Ron Andrews moved the NHL in the minutes direction when recording GAA and reflecting goalies' participation in games, the data was presented in terms of games.

The basic issue is moving forward and getting the best statistical description possible. Yes, the minutes balance, reflecting split games, and it advances the understanding of split games under the two-goalie system and the replacement phenomenon under the previous one-goalie system. However, the presentation raises additional questions, namely which goalie started more games, and performance as a starter vs. performance as a second goalie under the two-goalie system. This additional data could provide a deeper understanding of coaches' decisions, such as looking at Keenan's quick hook and its benefits.

Canadiens1958 07-31-2012 08:04 PM

Ron Andrews
 
Quote:

Originally Posted by Taco MacArthur (Post 53165255)
When your first example has already been debunked, you aren't allowed to dodge the question and then refer to "another example".

Even in this example, you haven't demonstrated yet that it matters.

The two main objections Ron Andrews ran into when he introduced the various changes and methodology to NHL statistics were sharing internal knowledge which the press and fans were anxious to have and the importance of the new methodology. The newspapers wanted it both ways. They wanted the data but were unwilling to dedicate the extra space to report the knowledge as part of the stats package.

Most evident was the reluctance of newspapers to publish proper box scores including shots on goal, assists as awarded - 1st and 2nd. Games played were viewed as insignificant until the forties. Today minutes and seconds matter.

The purpose of this thread is to perpetuate the spirit and methodology that Ron Andrews brought to NHL statistics.

Canadiens1958 07-31-2012 08:12 PM

Shortages
 
Quote:

Originally Posted by Iain Fyffe (Post 53153919)
Are there any cases of overages? The two examples you provided are shortages, which are understandable. Either a team doesn't dress the maximum, or doesn't play players who are dressed. We only need to explain overages if there are any overages to explain.


Does this disagree with official NHL data? IIRC hockey-reference was first built out of the Total Hockey data set, and of course Total Hockey is "The Official Encyclopedia of the National Hockey League (tm)".

We are of course constrained by the information we have. Even the official stats are not 100% reliable (see Rick Tocchet having two assists added to his record years later). The only illusion of accuracy that exists is the one that one lets oneself believe.

Actually shortages could provide very important data. Baseball has data about the number of games a player spends on a team's game roster. Similar data about NHL players would allow for a better understanding of a player's role and value to a team, and of coaching decisions and philosophies about how game roster players are used. For example, knowing that a goon was dressed for multiple games but stepped on the ice for only one provides a better picture of his actual hockey value or ability to do anything else.

Iain Fyffe 07-31-2012 08:51 PM

Quote:

Originally Posted by Canadiens1958 (Post 53159111)
Overages - see the goalie stats - GP:

http://www.hockey-reference.com/teams/STL/1969.html

76 scheduled games. Total GP surpasses 76.

Wow. You realize that all NHL teams have "overages" now in the goaltending category, yes? Goaltenders share games. This is not an overage in the sense you meant it in your original comment. By this logic they also have an overage at, for example, centre because their centres add up to more than 76 GP.

Quote:

Originally Posted by Canadiens1958 (Post 53163937)
Suspended playoff games are another example. 1988 and 1951:

1988 featured a suspended game 4 in the finals.

This is not a discrepancy, but an official scoring choice (not to mention an extreme outlier). Again it's not an overage in the sense you meant it. Remember that you started this thread with the claim that the data is flawed. Your purported examples suggest nothing of the sort.

Quote:

Originally Posted by Canadiens1958 (Post 53166681)
Actually shortages could provide very important data.

Indeed they can. Which in turn suggests that the GP data is not flawed, as you claimed and titled this thread, but is in fact useful.

So which is it: the shortages provide important data, or the shortages are themselves indicative of flawed data?

Canadiens1958 07-31-2012 09:04 PM

Incomplete
 
Quote:

Originally Posted by Iain Fyffe (Post 53167787)
Wow. You realize that all NHL teams have "overages" now in the goaltending category, yes? Goaltenders share games. This is not an overage in the sense you meant it in your original comment. By this logic they also have an overage at, for example, centre because their centres add up to more than 76 GP.


This is not a discrepancy, but an official scoring choice (not to mention an extreme outlier). Again it's not an overage in the sense you meant it. Remember that you started this thread with the claim that the data is flawed. Your purported examples suggest nothing of the sort.


Indeed they can. Which in turn suggests that the GP data is not flawed, as you claimed and titled this thread, but is in fact useful.

So which is it: the shortages provide important data, or the shortages are themselves indicative of flawed data?

Flawed incorporates elements such as a lack of completeness which create an opportunity for improvement.

Today the centers are viewed in the context of actual minutes/seconds played, which is the consequence of Ron Andrews listing goalies in terms of minutes/seconds. The shortages have the flaw of lacking completeness since they are not balanced with game roster data.

Chalupa Batman 07-31-2012 09:20 PM

You keep quoting posts like you're intending to respond to them, and then completely changing the subject.

Are you here to monologue, or to dialogue?

Iain Fyffe 07-31-2012 09:22 PM

Quote:

Originally Posted by Canadiens1958 (Post 53168183)
The shortages have the flaw of lacking completeness since they are not balanced with game roster data.

No, that's not incompleteness because the stat in question is games played, not games dressed. I agree that a games dressed number could be useful (which European stats for goaltenders usually include), however not that useful, because it's fairly rare for a player to dress but not see a second of ice time. Somewhat useful, but not nearly enough to suggest that an entire data set is flawed because of its lack.

You're now complaining that the stat doesn't represent something it was never meant to represent. You're all over the ice in this thread. You still haven't shown why "HR Games GP Data Is Flawed" in any meaningful way.

BM67 08-01-2012 07:05 PM

Quote:

Originally Posted by Iain Fyffe (Post 53154297)
That's apparently not an NHL error. I'm looking at the aforementioned Total Hockey (first edition), and Cleghorn has those games in his record. So you're presumably looking at an input error.

It is missing in the 2nd edition though, and it's also missing from the HHoF site and other official NHL sources.

I pointed out an error in Leo Reise Sr's record and it was corrected, but I've gotten no response when I reported the Cleghorn error.

Iain Fyffe 08-01-2012 09:04 PM

Quote:

Originally Posted by BM67 (Post 53201037)
It is missing in the 2nd edition though

I wonder if we should blame Ernie or James for that one? Since it's a change from the previous edition, that sounds like an unintentional deletion.

Quote:

Originally Posted by BM67 (Post 53201037)
and it's also missing from the HHoF site and other official NHL sources.

The HHoF stats are full of holes. Use them at your peril.

If you want good numbers, you really ought to join SIHR to get access to the database. An error such as this doesn't last long there.


