I posted it on my blog, shawnthehockeykid.wordpress.com (it's new, so it's the only post there right now). Please give me your feedback.

In this post I attempt to open a debate on the state of hockey statistics. My theories stem from the simple ideas and concepts of hockey.

Goals are the number one offensive statistic in hockey; they are how you win games. However, one can't forget how important the process leading up to a goal is. Yes, assists theoretically show this, but every play in hockey is motivated by getting one step closer to scoring a goal; every play has the purpose of putting the puck in the back of the net. Since every play can't be measured by a statistic, I experimented with zone time as a proxy. A good defensive forward, for example, should have higher offensive- and neutral-zone time, because he should be skilled at forechecking and backchecking. However, you run into the usual problem: team skill. So I try to make proportions.

player's average offensive/neutral/defensive zone time per game
---------------------------------------------------------------
team's average offensive/neutral/defensive zone time per game * k

k = player's average time on ice / team's average time on ice

*offensive/neutral/defensive are three different stats

The hope here is that defensive forwards and good puck-moving defensemen will stand out in these statistics. It should be used mainly for finding good depth players.

Where the play starts also affects a player's zone time. Faceoffs taken in the defensive zone should count as a positive adjustment, and faceoffs taken in the offensive zone as a negative one. To account for this, we add a variable b to the numerator of the offensive-zone and defensive-zone proportional time-on-ice stats.

b = number of zone starts (offensive/defensive) * ±0.10 (+0.10 for defensive-zone starts, −0.10 for offensive-zone starts)
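For concreteness, the proportion above, with the zone-start adjustment, could be computed as in the following sketch. This is purely illustrative; every function and parameter name here is hypothetical, and the ±0.10 weight is the value proposed in the post.

```python
# Illustrative sketch of the proposed zone-time proportion.
# All names are hypothetical; times are per-game averages in minutes.
def zone_time_score(player_zone_time, team_zone_time,
                    player_toi, team_toi,
                    zone_starts=0, start_weight=0.0):
    # k scales the team's zone time down to the player's share of ice time.
    k = player_toi / team_toi
    # b is the zone-start adjustment: +0.10 per defensive-zone start,
    # -0.10 per offensive-zone start (the caller picks the sign).
    b = zone_starts * start_weight
    return (player_zone_time + b) / (team_zone_time * k)

# A player with 8 offensive-zone minutes while playing a quarter of his
# team's minutes, on a team averaging 30 offensive-zone minutes per game:
score = zone_time_score(8.0, 30.0, 15.0, 60.0)
```

A score above 1.0 would mean the player drives more zone time than his share of ice time alone would predict.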

Please give me feedback and thank you for reading.

Does anybody know where I could find data on what the score was at the time of penalty calls? I would like to see what trends exist for tied, leading by x, and trailing by y. It would be great if I didn't have to sift through the sheets on NHL.com.

As the title implies, I am new to sabermetrics/advanced stats in hockey. This forum has been very helpful so far in the learning process; however, I am having a difficult time with these stats. I understand what these metrics intend to measure, but I am having a hard time truly grasping them/arriving at them.

For example, let's say I have a particular play-by-play summary published by the NHL (you know, with all the Corsi events and who was on the ice for each team when they occurred). Can someone please walk me through how I would get the QualComp and CorsiRelQoC for a particular player for that game?

(originally from www.playfor60minutes.wordpress.com)

by Fangda Li

Unfortunately, this series will be going on semi-hiatus after this instalment until after the February 2015 Sloan Sports Analytics Conference.

Is all on-ice time equal? Despite the fluidity of the game of hockey, one can still observe significant substructure in terms of event density (per unit time) when considering the massive sample size of every regular season game from 2008-2014. We observe that the "flow" of the game changes quite significantly within each period.

Fig. 1 shows the raw home and away Corsi event density, per 30-second bin. The large spikes at the beginning and end of each period correspond to the time it takes to advance the puck from neutral zone face-off, and the nonchalant "throw-puck-at-net" tactic when time expires.

In the even strength data, the Corsi density strongly decreases as a function of time elapsed in the period. In the power play data, the opposite trend is observed, but this is probably because of the greater density of penalties as the period progresses (see "physicality" below).

Fig. 2 shows the Corsi density normalized to the average number of occurrences per 30-second bin. We observe that there is up to a 30% swing in the density from the beginning to the end of the first period, with a similar increase in power play Corsi density. Is this solely because of the existence of more penalties?
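A normalized density like the one in Fig. 2 could be computed from event timestamps roughly as follows. This is a sketch under assumed conventions (a 20-minute period, 30-second bins), not the author's actual pipeline.

```python
# Bin event timestamps (seconds into the period) into 30-second bins and
# normalize so that 1.0 means "average density for this period".
def corsi_density(event_times_sec, period_len=1200, bin_size=30):
    n_bins = period_len // bin_size
    counts = [0] * n_bins
    for t in event_times_sec:
        # clamp end-of-period events into the last bin
        b = min(int(t // bin_size), n_bins - 1)
        counts[b] += 1
    mean = sum(counts) / n_bins
    if mean == 0:
        raise ValueError("no events to normalize")
    return [c / mean for c in counts]
```

With a full season of events per bin, a value of 1.3 in the first bin would correspond to the ~30% swing described above.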

Fig. 3 shows penalty and hit density. The most remarkable feature is the steady decline in hit density as the game wears on (is this fatigue?). Furthermore, penalties are more than 20% rarer in the 3rd period than in the 2nd period. Is this referee bias?

Fig. 4 shows the relative goal density. This is not particularly illuminating until we consider Fig. 5:

The preliminary conclusion of Fig. 5 is that even strength Corsi events in the middle of the second period are up to 15% more likely to be goals than average. Of course, the statistical significance of this is questionable, and more analysis - particularly with regard to score effects - is required. I think consideration of the above data in the Fourier-domain would be particularly fruitful.

Another piece of the puzzle may be the face-off win densities.

I’ve diverted myself from my NHL Play-by-Play Data Mining series to write this short existential essay on "Analytics" in the NHL and sports in general.

by Fangda Li

(originally from www.playfor60minutes.wordpress.com)

I think front offices, bettors, and fans would all agree that the ultimate purpose of "analytics" is the ability to predict events in the future more accurately and precisely than what was previously possible.

The NHL - and indeed professional sports in general - is currently mired in an overabundance of "statistics" with a shortage of "analytics". Upcoming player/puck tracking data will only aggravate this situation, in which the progress of theory greatly lags behind the progress of observation. It is easy to understand why this has become the case. Improving the breadth and quality of one's data is almost strictly an engineering and economic exercise, whereas the development of theory requires deep insights into a complicated system of interacting humans, which is very hard in the general case.

In the face of attempting to understand such a spectacularly complex system, I advocate adopting a physicist's plan of attack:

1. Make some simplifying assumptions about our system that make the following calculations easier, no matter how crude they may initially be.

2. Pursue the line of reasoning to which our assumptions lead. Draw the inevitable conclusions.

3. See how poorly/well these conclusions describe our available data. See how many free parameters our model needed.

4. Think about where our model failed, think about the limitations of our model. Go back to Step 1 with a better/more detailed/more elegant model.

Note that the primary role of "advanced statistics" is only in Step 3 - model validation.

Let’s think about how the first iteration of this process would look in the case of NHL hockey:

1. Model goal-scoring in hockey as a Poisson process; thus the goal difference at the end of a game follows a Skellam distribution.

2. This model cannot distinguish between the contributions of different players, so all individual (standard and "advanced") statistics for all players are drawn from an identical distribution.

3. A very poor fit to the data, but only one parameter is needed for each team, describing the team's overall "talent level". Nonetheless, such a model could reasonably predict the overall standings at the end of an 82-game season.

4. Our second iteration should probably consider the fact that players are intrinsically different from each other, and the subtle details of the distribution of goal scoring in games (à la A.C. Thomas 2007).
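Step 1 of this first iteration can be made concrete. Below is a minimal sketch (not the author's code) of the Skellam distribution for the goal difference, obtained by summing over joint Poisson outcomes; the 2.7 goals-per-game rate is an assumed, roughly league-average figure.

```python
import math

def skellam_pmf(k, mu_home, mu_away):
    """P(goal difference = k) when home and away goals are
    independent Poisson(mu_home) and Poisson(mu_away)."""
    total = 0.0
    for away in range(0, 60):      # truncate the infinite sum
        home = away + k
        if home < 0:
            continue
        total += (math.exp(-mu_home) * mu_home**home / math.factorial(home)
                  * math.exp(-mu_away) * mu_away**away / math.factorial(away))
    return total

# e.g. probability of a regulation tie with both teams averaging 2.7 goals
p_tie = skellam_pmf(0, 2.7, 2.7)
```

Fitting one talent parameter per team (Step 3) would then amount to choosing each team's scoring rate to best match observed results.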

As an example, our third iteration could consider the second-order effects of inter-player interactions; the fourth iteration could consider the effects of shot clustering and zone possession times. Of course, the specific topics, and the order in which one chooses to investigate them, are largely a matter of taste.

This process can continue without bound, and with each iteration will come a more accurate/precise description and prediction of hockey games at an ever-increasing level of granularity.

The guiding question at each iteration must be: "How much better can you describe and predict the observed data after incorporating the process/effect you are investigating?"

In other words, we will be at last doing science.

New member here, read the FAQ and Intro sticky but still need a little help.

I know what Corsi is and how to calculate it. My question is regarding the data needed to arrive at/calculate it. NHL.com doesn't appear to keep records of shots, missed shots, and blocked shots, both for and against, while a particular player is on the ice.

How do the big analytics sites eventually arrive at the calculated Corsi? Are they actually watching and recording each shot and noting who is on the ice? Or am I missing something, and are they actually deriving it from more fundamental data that perhaps NHL.com provides?

Please help

I am looking at a number of different methods to compare goalies to each other beyond the traditional goalie stats (GAA, SV%, etc.). However, I am finding it difficult to adjust stats for the quality of the goaltender's teammates or team. Does anyone know of a method that can be used to accomplish this?

I am looking to find team payrolls for every season since the 2004-05 lockout. CapGeek only has 2009-10 to present, and Wikipedia has up to 2007-08, so I am looking for somewhere I can find the 2008-09 payrolls, or all years if possible. Any help would be much appreciated.
