By The Numbers: Hockey Analytics... the Final Frontier. Explore strange new worlds, to seek out new algorithms, to boldly go where no one has gone before.

Modified Save Percentage - Goals Against and Scoring Chances

I've done some rudimentary analysis on the scoring chance information that some (about half) of the teams have available through bloggers. I say rudimentary because there are a lot of blogs that only track even strength scoring chances (ESSC), many that don't track them at all, and the ones that do track them differ fairly significantly in their interpretation of a scoring chance. In any case, I believe there is enough information out there to judge the viability of my method, even if we cannot draw any hard and fast conclusions from it. After the tables, I'll explain the methods I used to try to minimize the impact of these data inconsistencies.

Well anyway, the tables!

SCFA/SCAA: Scoring Chance For/Against Average
SCSP: Scoring Chance Save Percentage

Below are the problems I noticed with the data and how I approached minimizing them.

1. The team blogger only shows even strength scoring chance data (LAK, SJS)

Many teams had both even strength and overall scoring chance data, so I calculated power play minutes per game (PPM/G) for every team in the league, then averaged the ratio of the difference between SC and ESSC to PPM/G, weighted by the number of data points I had for each team. Basically, the more recorded games I had for a team, the more its value influenced the overall average. The idea is that this gives me a value for roughly how many scoring chances to expect per, for example, 2 minutes of power play time. Then it became a simple process of adding the expected PP and PK scoring chances (the PK side was the exact same process with PKM/G) to SCF and SCA respectively. This correction value is highlighted in light orange in the spreadsheet. Any chances given up while on the power play or generated while shorthanded are unfortunately ignored entirely, because they can vary so wildly from team to team and I did not want to apply any kind of average to every team.

2. I could not find scoring chance data for a particular game

I would record chances for any game against a team for whom I could find data. If two teams that are both missing data play each other, well, I just pretend those games don't exist for statistical purposes and place a huge mental asterisk on any averages for those teams.

3. Two teams have conflicting data for the same game

This actually wasn't a huge problem. Most of the time, the difference was only one or two chances, and often the numbers would match up entirely. However, there are some blogs that are conservative with giving out scoring chances (Toronto, San Jose, Los Angeles, for example) and some that are liberal (Carolina). I also noticed a few cases where the blogger in question was conservative with scoring chances against, but very liberal with scoring chances for (the Flames guy was atrocious about this). In any case where I get conflicting data, I just take whichever value is higher and use it. The logic is that while there may be some debate on what constitutes a scoring chance, there's a pretty strong consensus on the stuff that is not a scoring chance. Since I'm mainly focused on eliminating those perimeter shots that are stopped 99% of the time, I choose to include any chance that's arguable because odds are it's probably still a legitimate scoring chance.
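A minimal sketch of fixes 1 and 3 above, assuming per-team averages are already in hand. The class and function names are made up for illustration, and the spreadsheet itself may compute things differently; this is just one reading of the description.

```python
from dataclasses import dataclass


@dataclass
class TeamSample:
    """Per-game averages for a team that has BOTH ES and all-situation data."""
    games: int          # number of recorded games (used as the weight)
    es_chances: float   # even strength scoring chances for, per game (ESSC)
    all_chances: float  # all-situation scoring chances for, per game (SC)
    pp_minutes: float   # power play minutes per game (PPM/G)


def chances_per_pp_minute(samples):
    """Games-weighted average of (SC - ESSC) / PPM/G across teams."""
    total_games = sum(s.games for s in samples)
    weighted_sum = sum(
        s.games * (s.all_chances - s.es_chances) / s.pp_minutes
        for s in samples
    )
    return weighted_sum / total_games


def estimate_all_situation(es_chances, pp_minutes, rate):
    """Fill in SC for an ES-only team: ESSC plus expected power play chances."""
    return es_chances + rate * pp_minutes


def reconcile(count_a, count_b):
    """Two blogs disagree on the same game: keep the higher count."""
    return max(count_a, count_b)


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the spreadsheet.
    samples = [
        TeamSample(games=10, es_chances=12.0, all_chances=15.0, pp_minutes=6.0),
        TeamSample(games=30, es_chances=10.0, all_chances=12.0, pp_minutes=5.0),
    ]
    rate = chances_per_pp_minute(samples)
    print(estimate_all_situation(es_chances=11.0, pp_minutes=4.0, rate=rate))
    print(reconcile(14, 16))
```

The same `chances_per_pp_minute` logic would apply on the penalty kill side with PKM/G and chances against swapped in.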

Last edited by hairylikebear: 03-11-2013 at 03:55 PM.

Very interesting - thanks for putting it together.

I went down this road a few years back, and where I got tripped up was here: the term "shot on goal" is well-defined in the National Hockey League, and yet we see examples where one scorer consistently over (or under)counts.

The term "scoring chance" is not well-defined, so I'd expect the variance from rink to rink to be much larger. How do we best account for this?

Almost everyone that tracks this stuff has a pretty consistent definition, but not everyone uses the same one. They all seem to have the same rules with regard to the "home plate" though some use a straight line between the dots, some use a rounded line, some are very lenient about borderline cases and some are very strict. Some include screened point shots on goal, some include shots from outside home plate after X amount of puck movement (where X is completely subjective), some automatically include any shot on goal generated from an odd man rush, etc. All of this is simply a function of the lack of any central authority enforcing the standards for what a scoring chance should be. If the problem is that it is not well defined, we just need to define it.

The variance from rink to rink can be corrected by adding more people to record the data, which is obviously way easier said than done, but even having one person from each team working on it improves the data tremendously. With enough data, we can begin to see outliers that highlight discrepancies in recording styles. If one team seems to always record x% fewer scoring chances than the team they are playing, we can correct their numbers, though individual game data will still be somewhat dubious. However, it should be emphasized that averaging the numbers recorded for the same game will not improve accuracy.
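As a rough sketch of that kind of correction: take the games where two recorders counted the same chances, estimate how far one recorder runs below (or above) the others, and scale that recorder's totals. Treating the other recorders as the baseline is an assumption made here for illustration, and the function names are hypothetical.

```python
def recorder_bias(paired_counts):
    """Ratio of one recorder's totals to the other recorders' totals.

    paired_counts: list of (own_count, other_count) tuples, one per game
    that both recorders tracked. A result of 0.9 means this recorder
    logs about 10% fewer chances than the people they are compared to.
    """
    own_total = sum(own for own, _ in paired_counts)
    other_total = sum(other for _, other in paired_counts)
    return own_total / other_total


def corrected_count(raw_count, bias):
    """Scale a recorder's season total back toward the baseline."""
    return raw_count / bias


if __name__ == "__main__":
    # Hypothetical paired games: (this blog's count, opposing blog's count).
    pairs = [(18, 20), (27, 30), (9, 10)]
    bias = recorder_bias(pairs)
    print(bias)
    print(corrected_count(45, bias))
```

This only adjusts aggregate totals; as noted above, any single game's number stays dubious.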

The single biggest problem with this is that not all scoring chances are shots on goal so it is difficult to assign a save % on chances where the goaltender did not have to make a save.

Not necessarily. A goalie simply being in position to force a shooter to shoot wide is noteworthy. The only times a scoring chance does not also register a shot on goal are situations within the scoring chance criteria in which the shooter (not named Patrik Stefan) would score every time if the goalie was not there, such as wide open shots from the slot.

While I agree with your point, his point (also valid) is that in the situation described, there isn't a save to be made, and so how do you calculate a "save percentage" for that situation?

The answer is probably 1 - (goals allowed / scoring chances), without regard for whether or not a save was actually made.
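Written out, with GA as goals allowed and SC as scoring chances against (the numbers in the second line are made up purely to show the arithmetic):

```latex
\mathrm{SCSP} = 1 - \frac{\mathrm{GA}}{\mathrm{SC}}
\qquad\text{e.g.}\qquad
1 - \frac{5}{40} = 0.875
```

Note that nothing in this requires a chance to have been a shot on goal; a chance that misses the net simply counts in SC without adding to GA.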

Or if a dman makes a save on behalf of the goalie (no shot), the shooter hits the post (no shot), or any number of other instances where there is no shot recorded.

Scoring chance (SC) sv% is just 1 - GA/SC. Whether it's a shot or not is irrelevant.

Quote:

Originally Posted by Fish on The Sand

Or if a dman makes a save on behalf of the goalie (no shot), the shooter hits the post (no shot) or any other number of instances where there is no shot recorded.

Blocked shots don't count as scoring chances. Hitting the post is the same as missing the net.

There is a big difference between a Crosby scoring chance and a Colton Orr scoring chance. They are not equal, so as long as there's just a defined "scoring chance" no matter the player, the advanced stat, like all hockey advanced stats, will be badly flawed. There are so many variables in hockey, WAY more than baseball, and you have to consider everything or it's just misleading.

Well, that's true of any of the traditional statistics as well. There are tap-in goals and there are solo deke-through-5-people, snipe-top-corner goals, but they still count the same. Then there's an assist that generates a tap-in vs. an outlet pass to a guy who does all the work, and both count the same. Even with the current sv% metric, there are Jeff Woywitka point shots and there are Crosby shots from the slot that both count the same.

Scoring chances are still susceptible to the same flaws. They actually exist as a response to your criticism with regard to shots and saves. The beauty of scoring chances is that, if we wanted to, we could consider the caliber of the shooter in borderline cases.

The issue here is sample size. Save percentage itself is heavily dependent on luck over the course of a season, and it becomes even more important if you reduce the sample (both in the type of shots, and number of games). It's interesting to look at, but pretty meaningless as it stands.
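To put a rough number on the sample size point, here is a small sketch that treats every shot (or scoring chance) as an independent coin flip with a fixed save probability. That independence is a simplifying assumption, not something the data guarantees, and the function name is just for illustration.

```python
import math


def save_pct_standard_error(save_pct, attempts):
    """Binomial standard error of a save percentage over `attempts` tries."""
    return math.sqrt(save_pct * (1.0 - save_pct) / attempts)


if __name__ == "__main__":
    # Same hypothetical goalie, full shot sample vs. a quarter of it.
    print(save_pct_standard_error(0.900, 1000))
    print(save_pct_standard_error(0.900, 250))
```

Cutting the sample to a quarter doubles the standard error, which is why filtering down to scoring chances over a partial set of games makes the luck component larger, not smaller.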

Would like to see what the numbers look like for Lehner and Bishop, too.

Anderson has been out of this world.

By his methodology,

Bishop .8359
Lehner .8787

The problem as I see it is that the SCA is an average for the whole team, not just the starter. So if one goalie gets sheltered games, if the team buckles down on chances allowed for its backup, or if some games skew the averages for any other reason, the numbers won't be entirely accurate for the goalie-specific stats.

Quote:

The issue here is sample size. Save percentage itself is heavily dependent on luck over the course of a season, and it becomes even more important if you reduce the sample (both in the type of shots, and number of games). It's interesting to look at, but pretty meaningless as it stands.

As far as SV% is concerned, all goals against will be counted, "lucky" or otherwise. Shots from the scoring chance pentagon will be included, regardless of luck. The only situation in which there will be a difference involving luck is when a "lucky" shot from outside of the scoring chance area goes in (i.e. a softy, which counts as a GA but not a scoring chance) or when the goalie makes a "lucky" save on a shot that comes from outside the SC area. The idea behind the pentagon is that any shot from outside the pentagon is relatively harmless, and therefore no save from there can be a "lucky" one. Unless there is a deflection, but those count as scoring chances.

It does reduce the sample size a bit, but at least the dependent variable is kept the same so it minimizes that effect as much as possible.

Quote:

Originally Posted by Micklebot

By his methodology,

Bishop .8359
Lehner .8787

The problem as I see it is that the SCA is an average for the whole team, not just the starter. So if one goalie gets sheltered games, if the team buckles down on chances allowed for its backup, or if some games skew the averages for any other reason, the numbers won't be entirely accurate for the goalie-specific stats.

That's not true. I did it that way because I'm missing data and also because I'm lazy. If I had scoring chance data for every game (or even a timestamp for every scoring chance), it would then be possible to determine who was on the ice for them, including the goalie. After that the math is pretty simple.

Not trying to criticize, I think the data is pretty cool. But I'm not sure where I was off base. From what I could tell, you used the average scoring chances per game, and adjusted each starter's GAA to yield the SCA Sv%. If that's the case, I think what I said was accurate.

Most of the teams' scoring chance data I've seen is game by game, so you could track down the starter for each game to refine it, but it's probably more effort than it's worth. TBH, I think data for backups would be skewed simply because of the smaller sample size.
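A quick sketch of the difference between the two approaches, assuming game-by-game records of (starter, goals against, scoring chances against) and assuming the starter played the whole game. The function names and the numbers are hypothetical.

```python
from collections import defaultdict


def team_average_scsp(goalie_gaa, team_scaa):
    """Shortcut: the goalie's own GAA against the TEAM's average chances against."""
    return 1.0 - goalie_gaa / team_scaa


def per_goalie_scsp(games):
    """Refined: only count chances from games the goalie actually started.

    games: iterable of (starter, goals_against, chances_against).
    """
    goals = defaultdict(int)
    chances = defaultdict(int)
    for starter, goals_against, chances_against in games:
        goals[starter] += goals_against
        chances[starter] += chances_against
    return {g: 1.0 - goals[g] / chances[g] for g in chances if chances[g] > 0}


if __name__ == "__main__":
    # Goalie "B" is a backup who faced a quiet night.
    games = [("A", 3, 20), ("A", 2, 20), ("B", 1, 10)]
    team_scaa = sum(sca for _, _, sca in games) / len(games)
    print(per_goalie_scsp(games))
    print(team_average_scsp(goalie_gaa=1.0, team_scaa=team_scaa))
```

In this made-up example the backup's number comes out higher under the team-average shortcut than under the per-game version, because he is credited with chances he never faced.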

Anyhow, I enjoyed your work and, completely off topic, found it funny that the table's auto-sort does not handle negative numbers well.

You're right to criticize, completely. That's exactly how I did it and I agree it is a pretty major flaw as it stands. However, it's just a demonstration of the concept.
In a perfect world we would have people from all thirty teams measuring scoring chances under very strict and thorough guidelines with timestamps. If we had that quality of data, I believe we could have the first "advanced" goalie stat on our hands.

Closest I can think of is shot trackers and counting only shots from "home plate", but it lacks shots that go wide, are blocked, or hit the post. Also, they tend to lack coordinates for some shots.

This site tracks shots and what player is on the ice. Not sure where he mines the data from, but it's out there somewhere: http://somekindofninja.com/nhl/

GameCenter could be the solution to this. Get a group of fans together to go through every game from this season and record the scoring chances and the goaltender in net for said chance.

It may improve the original stats by allowing you to a) strictly define 'scoring chance,' b) avoid missing data sets, and c) devise a way to further eliminate homerism, either through participant selection criteria or some other mechanism.

I saw an article which provided an outline for the "scoring chance" area. It was basically the slot; the area on both sides of the net, between the faceoff dots back to the top of the circles, including the whole top of the circles. I'll see if I can dig it up.

So, that could be used along with GameCenter to provide a more concrete definition of "scoring chance."

Yeah, that's the "home plate" definition, which is probably the most widely used. The one thing that becomes a point of concern is when a shot from outside the home plate scores. You get a goal without a scoring chance. Or when you have a back door play where the goalie plays the shooter but a pass is made to someone just outside of the home plate with a wide open cage.

Nothing will be perfect, but I think some degree of judgement needs to come into play when determining a scoring chance.
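For the purely geometric part of the definition, a sketch of a "home plate" test might look like the following. The coordinates are assumptions based on standard rink markings (6 ft net, end zone faceoff dots 20 ft out from the goal line and 22 ft either side of center, 15 ft circles), and this is the straight-line variant only; it deliberately ignores the judgement calls like deflections and back door plays.

```python
# Coordinates in feet: x is lateral distance from the center of the net,
# y is distance out from the goal line. Vertices run counterclockwise.
HOME_PLATE = [
    (-3.0, 0.0),    # left goal post
    (3.0, 0.0),     # right goal post
    (22.0, 20.0),   # right faceoff dot
    (22.0, 35.0),   # top of right circle
    (-22.0, 35.0),  # top of left circle
    (-22.0, 20.0),  # left faceoff dot
]


def in_home_plate(x, y, polygon=HOME_PLATE):
    """True if (x, y) is inside or on the edge of the convex home plate area."""
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # For a counterclockwise convex polygon, an interior point is never
        # to the right of any edge, so the cross product is never negative.
        cross = (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1)
        if cross < 0:
            return False
    return True


if __name__ == "__main__":
    print(in_home_plate(0.0, 15.0))   # slot
    print(in_home_plate(0.0, 50.0))   # point shot
    print(in_home_plate(20.0, 5.0))   # sharp angle near the goal line
```

A location check like this could serve as the strict baseline, with a separate flag for the cases where a recorder overrides it.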