HFBoards


Database of all plays

06-09-2013, 05:50 PM
  #1
notdave

Hi all,

I've been working on compiling a database of all plays for all games, using the NHL.com html Play By Play reports.

My plan is to get this all into an online database that is very user friendly and easily filterable, so anyone can extract, analyze and calculate whatever they want.

I exported one game's worth of the work-in-progress database to Excel and have attached it to this post. Could you guys take a look at this and suggest additional fields that should be added or changes that should be made?

I'm already working on cleaning up the player (PL1 to PL12) fields, so don't worry about those for now.

Thanks,
Dave
Attached Files
File Type: xlsx Game 989.xlsx‎ (68.0 KB, 51 views)

06-09-2013, 09:25 PM
  #2
Cunneen
Looks good so far, keep up the good work.

06-09-2013, 11:30 PM
  #3
SaskRinkRat
Looks really good. Incidentally, I've been working on basically the exact same project over the past few weeks.

One thing I noticed - and perhaps you have a more sophisticated reformatting process than mine which would make this point irrelevant - is that sometimes the "player" columns get shifted, so that the first 6 don't always belong to one team and the next 6 to the other team. I was using a fairly rudimentary scraping technique with Excel, so if yours is more sophisticated, this might not be an issue.

The only other column I'd like to see you add, off the top of my head, is the shot distance column.

Also, I know you said you're still working on the player columns, so you might have this covered, but it might be wise to keep the formatting of the players in those columns the same as the players in the previous columns. So, for example, if you use [Team] [Number] [Last Name] format in the hit column, you might also want to use that same format in the PL1 column.

Great work though.

06-10-2013, 12:58 AM
  #4
ponder
In the fields describing the shooter, scorer, etc., it might be nice to transform the data before uploading it to a database so that every player has a unique ID. Using data in the format in this spreadsheet will mean that one player will have different IDs at different times if he changes teams, changes his number, etc. If you give each player a unique ID in these fields, it will be much easier to look up any of these stats (penalties drawn, hits, etc.) for every player over any amount of time.
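
A minimal sketch of that transformation in Python, assuming the raw fields look like `[Team] [Number] [Last Name]` (the format mentioned earlier in the thread) and that a roster table has been built by hand. The team codes, names and IDs below are hypothetical placeholders, not real data.

```python
from typing import NamedTuple


class RosterEntry(NamedTuple):
    player_id: int   # stable ID, never reused (hypothetical values below)
    team: str
    number: int
    last_name: str


# Hypothetical roster: the same player (ID 1) appears twice because he
# changed teams and sweater numbers.
ROSTER = [
    RosterEntry(1, "AAA", 9, "SMITH"),
    RosterEntry(1, "BBB", 19, "SMITH"),
    RosterEntry(2, "AAA", 19, "JONES"),
]

# Lookup keyed on what the play-by-play text actually contains.
LOOKUP = {(r.team, r.number, r.last_name): r.player_id for r in ROSTER}


def player_id(raw: str) -> int:
    """Turn '[Team] [Number] [Last Name]' into a stable player ID."""
    team, number, last_name = raw.split(maxsplit=2)
    return LOOKUP[(team, int(number), last_name.upper())]
```

Here `player_id("AAA 9 SMITH")` and `player_id("BBB 19 SMITH")` resolve to the same ID. A real lookup would probably also need a season or date in the key, since sweater numbers get reused on the same team over time.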

06-10-2013, 03:53 AM
  #5
HolyShot*
One thing that you could do is include hyperlinks to the big stuff like goals, assists, hits, saves, blocks, etc. that link to videos. That would be bad ass.

06-10-2013, 02:21 PM
  #6
notdave
Thanks for the input, everyone.

Sask: Added Shot Distance per your suggestion, along with shot type.

Ponder: I'm working on a master list of players, and will be sure to add a unique ID. I wonder if I could just iterate through the NHL.com player pages (e.g. http://www.nhl.com/ice/player.htm?id=8470121) easily and use their identifiers.
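
A small sketch of the identifier part, using only the Python standard library. It pulls the `id` query parameter out of a player-page URL of the shape shown above; it does not fetch any pages, and the function name is just an illustration.

```python
from urllib.parse import urlparse, parse_qs


def nhl_player_id(url: str) -> int:
    """Pull the numeric `id` query parameter out of a player-page URL."""
    query = parse_qs(urlparse(url).query)
    return int(query["id"][0])
```

Called on the example URL above, this returns the integer after `id=`, which could then serve as the key in a master player list.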

Hitman: That's a good idea. I'll look into it for sure. The NHL.com boxscores link to the goals using JavaScript, which isn't my area of expertise, but I'll put it on the list.

06-11-2013, 01:07 PM
  #7
66871
Looks pretty good. However, a few comments.

I'm unclear about the shot data that is coming out. It looks like there are shots taken, and then a subset of those is shots on goal (so you have miss types including out of play or wide right or, I think, blocked).

Secondly, maybe think about two forms of output. One would be a straight table and the other would be a normalized database with player IDs, team IDs, game IDs and a player_game relationship table which also indicates what team the player was on for that particular game, his position and what number he wore (trades). Such a setup would lend itself better to querying than repeating the team and sweater number info in every field.
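
One way the normalized form could look, sketched with Python's built-in `sqlite3`. Only the `player_game` name comes from the suggestion above; every other table and column name, and all the rows, are assumptions made up for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE teams   (team_id   INTEGER PRIMARY KEY, abbrev    TEXT NOT NULL);
CREATE TABLE players (player_id INTEGER PRIMARY KEY, name      TEXT NOT NULL);
CREATE TABLE games   (game_id   INTEGER PRIMARY KEY, game_date TEXT NOT NULL);

-- One row per player per game: team, position and sweater number can
-- change between games (trades), so they live here, not on the player.
CREATE TABLE player_game (
    player_id INTEGER NOT NULL REFERENCES players(player_id),
    game_id   INTEGER NOT NULL REFERENCES games(game_id),
    team_id   INTEGER NOT NULL REFERENCES teams(team_id),
    position  TEXT,
    sweater   INTEGER,
    PRIMARY KEY (player_id, game_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Hypothetical rows: player 1 is traded from team 1 to team 2 between games.
conn.executemany("INSERT INTO teams VALUES (?, ?)", [(1, "AAA"), (2, "BBB")])
conn.execute("INSERT INTO players VALUES (1, 'SMITH')")
conn.executemany("INSERT INTO games VALUES (?, ?)",
                 [(989, "2013-01-01"), (1100, "2013-03-01")])
conn.executemany("INSERT INTO player_game VALUES (?, ?, ?, ?, ?)",
                 [(1, 989, 1, "C", 9), (1, 1100, 2, "C", 19)])

# Which team and number did player 1 have in each game?
rows = conn.execute("""
    SELECT g.game_id, t.abbrev, pg.sweater
    FROM player_game pg
    JOIN games g ON g.game_id = pg.game_id
    JOIN teams t ON t.team_id = pg.team_id
    WHERE pg.player_id = 1
    ORDER BY g.game_id
""").fetchall()
```

With this layout the event table only needs to store `player_id` and `game_id`; team and sweater number are recovered by joining through `player_game`.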

Since you have a category for strength, why not pack some more info into the data you are storing? Rather than EV or blank, maybe characterize the play with a number: 3.3 for three on three, 4.3 if the home team has four men on the ice and the away team has three, etc. So your values would be 3.3, 4.4, 5.5, 3.4, 4.3, 3.5, 4.5, 5.3, 5.4. So if I was interested in all the five-on-three face-off data, I would only need to filter for 3.5 and 5.3. Also, it gives a better indication of 4-on-4 and 3-on-3 situations.
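
A rough sketch of that encoding in Python. The event dictionaries and the `FAC`/`SHOT` event codes are assumptions about how the rows might look, and the code is stored as text rather than as a floating-point number so that filtering is an exact string match.

```python
def strength_code(home_skaters: int, away_skaters: int) -> str:
    """Encode on-ice strength as 'H.A', home team first (e.g. '5.3')."""
    return f"{home_skaters}.{away_skaters}"


# Hypothetical events with skater counts already worked out per row.
events = [
    {"event": "FAC", "home": 5, "away": 5},
    {"event": "FAC", "home": 5, "away": 3},
    {"event": "SHOT", "home": 3, "away": 5},
    {"event": "FAC", "home": 3, "away": 5},
]

for e in events:
    e["strength"] = strength_code(e["home"], e["away"])

# All five-on-three face-offs, regardless of which team has the advantage.
five_on_three_faceoffs = [
    e for e in events
    if e["event"] == "FAC" and e["strength"] in {"5.3", "3.5"}
]
```

The same field would also cover an extra attacker (for example `6.5`) if the skater counts include that case.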

I notice that the DESCR field would generally serve as Stoppage Type field except that sometimes there is mention of a TV time-out or a team calling timeout.

If it was me, I would take those timeouts and simply add them in as the next row. In reality they are a separate event. It's just that they are lumped in because the clock time is the same.
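
Something like the following could do that split. It is only a sketch: it assumes the stoppage reasons in DESCR are comma-separated (for example `ICING,TV TIMEOUT`) and that the other column names are `PERIOD`, `TIME` and `EVENT`, none of which is confirmed by the spreadsheet.

```python
TIMEOUT_MARKERS = ("TIMEOUT", "TIME-OUT")  # assumed spellings


def is_timeout(part: str) -> bool:
    return any(marker in part.upper() for marker in TIMEOUT_MARKERS)


def split_stoppage(row: dict) -> list[dict]:
    """Split one stoppage row into the stoppage plus one row per timeout."""
    parts = [p.strip() for p in row["DESCR"].split(",") if p.strip()]
    reasons = [p for p in parts if not is_timeout(p)]
    timeouts = [p for p in parts if is_timeout(p)]

    out = []
    if reasons:
        # Keep the original row, minus the timeout text.
        out.append({**row, "DESCR": ",".join(reasons)})
    for t in timeouts:
        # Same period and clock time, but its own event row.
        out.append({**row, "EVENT": "TIMEOUT", "DESCR": t})
    return out
```

A row with `DESCR` equal to `ICING,TV TIMEOUT` would come back as two rows with the same period and time: the icing stoppage, then the TV timeout.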

Are you going to try to parse out if a team has its goalie pulled?

Finally, perhaps a field which lists the penalty that is called (holding, boarding etc.).

06-20-2013, 08:09 AM
  #8
number72
Is it possible to break out shots into multiple fields?
Shot distance
Shot type

That is, if I want to look at shot distances, I have no easy way to get that information from your table without editing it. As a standalone column it would be more useful.
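
For illustration, the split could be done with a couple of regular expressions in Python. The description format (`..., Wrist, Off. Zone, 35 ft.`) and the list of shot types are assumptions about what the play-by-play text contains, and the sample string used in the note below is made up.

```python
import re

# Assumed shot-type vocabulary; extend as needed.
SHOT_TYPES = ("Wrist", "Slap", "Snap", "Backhand", "Tip-In",
              "Deflected", "Wrap-around")

TYPE_RE = re.compile("|".join(re.escape(t) for t in SHOT_TYPES))
DISTANCE_RE = re.compile(r"(\d+)\s*ft\.?")


def split_shot(descr: str) -> dict:
    """Extract shot type and distance (in feet) from a shot description."""
    shot_type = TYPE_RE.search(descr)
    distance = DISTANCE_RE.search(descr)
    return {
        "shot_type": shot_type.group(0) if shot_type else None,
        "shot_distance_ft": int(distance.group(1)) if distance else None,
    }
```

Applied to `"AAA ONGOAL - #9 SMITH, Wrist, Off. Zone, 35 ft."`, this yields a shot type of `Wrist` and a distance of 35, which can then go into their own columns. Fields that are missing from the description come back as `None`.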

06-20-2013, 04:51 PM
  #9
supahdupah
Not to derail, but I can create any format/view you want based on the way I import the data. I store all the games as streams of events. Mine also combines the JSON and HTML reports (includes event location data) and links the NHL.com player ID. I have a DB of 17,000 players, as well as every draft. I am almost done; I am just cleaning up the handling of shootouts.

If anyone thinks this is interesting I will keep working on it. I am also doing all the shift reports.


06-20-2013, 04:59 PM
  #10
hockeyjack89
This sounds pretty cool!! Keep working on it and improving it!

06-20-2013, 07:14 PM
  #11
DL44
300 (at least) 'plays' a game x 1,230 games/yr is roughly 369,000 plays a season... that would be a pretty insane amount of data and info!

It's how the information can be used once it's all in there that's intriguing!

e.g.
- Player X was involved in Y plays the most... in the 2nd period.
- Player W's average shot distance was Z.
- Or W led the league in backhand goals...
- Or most goals from within 5 feet...
- Or player G led the league in hits in the 1st period.
- Players are injured the most at the X point of the game.
- Most offsides occur at X point of the game. Hell, just offsides/gm alone.
- Most giveaways in the O-zone... or most giveaways in the D-zone?
- Which teams put the puck out of the rink the most.

endless!
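
A few of the questions above, sketched in plain Python over a handful of made-up event rows. The field names (`period`, `event`, `player`, `distance_ft`) and the event codes are assumptions, not the actual columns of the spreadsheet.

```python
from collections import Counter
from statistics import mean

# Hypothetical event rows; a real season would have hundreds of thousands.
events = [
    {"period": 1, "event": "HIT", "player": "G"},
    {"period": 1, "event": "HIT", "player": "G"},
    {"period": 2, "event": "HIT", "player": "W"},
    {"period": 1, "event": "SHOT", "player": "W", "distance_ft": 40},
    {"period": 3, "event": "SHOT", "player": "W", "distance_ft": 20},
    {"period": 2, "event": "GOAL", "player": "W", "distance_ft": 4},
]

# Player W's average shot distance.
avg_shot_distance_w = mean(
    e["distance_ft"] for e in events
    if e["event"] == "SHOT" and e["player"] == "W"
)

# Who led in hits in the 1st period?
first_period_hits = Counter(
    e["player"] for e in events
    if e["event"] == "HIT" and e["period"] == 1
)

# Most goals from within 5 feet.
close_goals = Counter(
    e["player"] for e in events
    if e["event"] == "GOAL" and e["distance_ft"] <= 5
)
```

`first_period_hits.most_common(1)` gives the leader for that split; the same pattern of filter, then count or average, covers most of the list.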

