Confirmed with Link: Desharnais signed to a contract extension (4 years @ $3.5M/yr)
03-18-2013, 04:43 AM
Join Date: Jun 2008
Originally Posted by
you have a basis for one game.
and then click on Play-by-play on the right.
From this you can extract the info:
Trick is to extract the headers and keep the raw data (report scraping).
Then put it in a database.
Then regenerate the play-by-play (to validate the process).
Then use the database to analyze the data.
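A minimal sketch of the scrape, store, and regenerate steps above, in Python with the standard sqlite3 module. The pipe-delimited report layout, the sample rows, and the function names are all made up for illustration; the real NHL play-by-play report is laid out differently and would need its own parser.

```python
import sqlite3

# Hypothetical raw play-by-play text; the real NHL report layout differs.
RAW_REPORT = """\
#|Per|Time|Event|Description
1|1|0:00|FAC|MTL won Neu. Zone
2|1|0:42|SHOT|MTL #51 Wrist, Off. Zone
3|1|1:15|HIT|TOR #3 hit MTL #51
"""

def parse_report(text):
    """Split the report into a header list and a list of raw row tuples."""
    lines = text.strip().splitlines()
    headers = lines[0].split("|")
    rows = [tuple(line.split("|")) for line in lines[1:]]
    return headers, rows

def store(conn, rows):
    """Keep every field as raw text so nothing is lost on the way in."""
    conn.execute(
        "CREATE TABLE events "
        "(num TEXT, period TEXT, clock TEXT, event_type TEXT, description TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)

def regenerate(conn, headers):
    """Rebuild the report text from the database rows."""
    out = ["|".join(headers)]
    query = (
        "SELECT num, period, clock, event_type, description "
        "FROM events ORDER BY CAST(num AS INTEGER)"
    )
    for row in conn.execute(query):
        out.append("|".join(row))
    return "\n".join(out) + "\n"

conn = sqlite3.connect(":memory:")
headers, rows = parse_report(RAW_REPORT)
store(conn, rows)

# Validation: the regenerated report should match the scraped one exactly.
round_trip_ok = regenerate(conn, headers) == RAW_REPORT
```

If round_trip_ok is false for some game, the parser dropped or altered something, which is the point of the validation step.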
Best would be to get raw data from NHL (in table format).
All their game reports come from a database that is almost real-time (probably materialized views).
It is not trivial but fairly simple: a few tables in a database (maybe 20) should do the job (teams, seasons, games, players, players-date, event-type, events, events-players, etc.). The NHL model seems to handle only two players per event at the moment, so events-players may not be needed.
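A rough sketch of what a handful of those tables could look like, again in Python with sqlite3. Only the table names come from the list above; every column name and key is an assumption for illustration, the players-date table is left out, and this is not the NHL's actual model.

```python
import sqlite3

# Hypothetical schema: table names follow the post, columns are assumed.
SCHEMA = """
CREATE TABLE teams (team_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE seasons (season_id INTEGER PRIMARY KEY, label TEXT NOT NULL);
CREATE TABLE players (player_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE games (
    game_id INTEGER PRIMARY KEY,
    season_id INTEGER NOT NULL REFERENCES seasons(season_id),
    home_team_id INTEGER NOT NULL REFERENCES teams(team_id),
    away_team_id INTEGER NOT NULL REFERENCES teams(team_id),
    played_on TEXT NOT NULL
);
CREATE TABLE event_types (
    event_type_id INTEGER PRIMARY KEY,
    code TEXT NOT NULL UNIQUE
);
CREATE TABLE events (
    event_id INTEGER PRIMARY KEY,
    game_id INTEGER NOT NULL REFERENCES games(game_id),
    event_type_id INTEGER NOT NULL REFERENCES event_types(event_type_id),
    period INTEGER NOT NULL,
    seconds_elapsed INTEGER NOT NULL
);
-- One row per player involved in an event, so an event is not
-- limited to two players.
CREATE TABLE event_players (
    event_id INTEGER NOT NULL REFERENCES events(event_id),
    player_id INTEGER NOT NULL REFERENCES players(player_id),
    role TEXT NOT NULL,
    PRIMARY KEY (event_id, player_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that were created, as a quick sanity check.
table_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

The separate event_players table is the design choice that matters here: if the source data really carries only two players per event, two columns on events would do instead.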
It represents significant work (probably hundreds of hours, not thousands).
Thing is, each additional report style you produce adds more time.
Validating the model adds more work as well.
EDIT: BTW, I am not saying how it is done but how I would do it...
Thanks so much. I'm not a techie, so I'll just trust your estimate of perhaps thousands of hours. A heady undertaking for an individual.
It looks as if the players for each event are listed, but when there are significant time gaps between events, it's not so easy to determine who was on the ice and who wasn't. There seems to be enough inherent error due to this that, at the very least, the stats derived from this need to be interpreted in a directional rather than in any absolute manner. To what extent I wouldn't even hazard a guess at this point.
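To put a rough shape on that uncertainty, here is a small Python sketch with made-up events and player labels. For each gap between consecutive events it credits the gap as "likely" ice time to players listed at both ends, and as "possible" ice time to players listed at either end. The assumption that a player listed at both ends stayed on for the whole gap is mine and will not always hold.

```python
from collections import defaultdict

# Hypothetical events: (seconds elapsed, players listed on the ice).
EVENTS = [
    (0, {"A", "B"}),
    (30, {"A", "B"}),
    (90, {"A", "C"}),
    (100, {"C", "D"}),
]

def ice_time_bounds(events):
    """Return {player: (likely_seconds, possible_seconds)} from event gaps."""
    likely = defaultdict(int)
    possible = defaultdict(int)
    for (t0, on0), (t1, on1) in zip(events, events[1:]):
        gap = t1 - t0
        # Listed at both ends of the gap: assume on the ice throughout.
        for player in on0 & on1:
            likely[player] += gap
        # Listed at either end: could have been on for up to the whole gap.
        for player in on0 | on1:
            possible[player] += gap
    return {player: (likely[player], possible[player]) for player in possible}

bounds = ice_time_bounds(EVENTS)
```

With these sample events, player B comes out at 30 likely seconds but up to 90 possible, all because of the one 60-second gap. The wider the spread between the two numbers, the more a stat built on them should be read as directional only.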
Last edited by Cyclones Rock: 03-18-2013 at