League average in the playoffs is not the kind of context that we're looking for in a rational analysis, as there is precious little common ground from goalie to goalie. Achieving a state of ceteris paribus in such an analysis is extremely, extremely difficult, if not impossible.

This is where your mistake lies. A mistaken hypothesis which is known to be hugely flawed is worse than no hypothesis at all. I strongly dislike scientific approaches which refuse to discard their hypotheses when they become unworkable. Although the goal may be noble, if the results are producing obvious gibberish (Bill Ranford as #1 playoff goalie of all-time or Tony Esposito as better than Ken Dryden) then we need to discard them and improve the method.

You are attempting to smuggle game results (the old method which you implicitly reject) into the results of your SV% analysis. As maneuvers go, it is weak, and as analysis goes, it is patently false. If SV% correlated so well with game results, then Hall and Fuhr should have nearly identical postseason records, as their "vs. average" postseason SV% results are quite similar. And yet they could hardly be further apart.

This is science, not a ****ing marriage. If the method obviously doesn't work, you throw it out and work towards a better one.

When you say "they could hardly be further apart" I can't even discern which way you lean. On one hand, Hall is a consensus top-6 goalie and Fuhr is likely outside of the top-20. On the other hand, Fuhr won 4 cups as a starter and is known for being 'clutch' while Hall's numbers get worse and he won fewer cups than he likely was "supposed" to. And you won last ATD with Fuhr. Clarify your position.

With that said, are you saying that the method "obviously doesn't work" because it doesn't show the results that YOU think it should? If that is the case, why do any numerical analyses, then?

We know Hall is regarded as a better goalie. We know Fuhr had more playoff success. We want to know their individual roles in that outcome. Their save percentages in the playoffs are one piece of the puzzle; a bigger piece than you make it out to be.

When you say "they could hardly be further apart" I can't even discern which way you lean. On one hand, Hall is a consensus top-6 goalie and Fuhr is likely outside of the top-20. On the other hand, Fuhr won 4 cups as a starter and is known for being 'clutch' while Hall's numbers get worse and he won fewer cups than he likely was "supposed" to. And you won last ATD with Fuhr. Clarify your position.

With that said, are you saying that the method "obviously doesn't work" because it doesn't show the results that YOU think it should? If that is the case, why do any numerical analyses, then?

We know Hall is regarded as a better goalie. We know Fuhr had more playoff success. We want to know their individual roles in that outcome. Their save percentages in the playoffs are one piece of the puzzle; a bigger piece than you make it out to be.

I'd just like to jump in and note that Hall didn't win fewer cups than he was "supposed" to. The Chicago Blackhawks may have won fewer than they were "supposed" to, but that's not pinned on Hall.

Numerical analyses can be misleading and incorrect. The biggest example here would be Hall in 1963: both SV% and GAA suggest Hall played poorly, but the game accounts say he didn't, and it seems he was one of only two consistently good performers on the Hawks in that series. I believe Sturm has trouble with the method because those stats won't paint the complete picture, as shown.

Save percentage is a piece of the puzzle; but in some instances I am unsure of how big a piece.

Last edited by Leafs Forever: 12-06-2009 at 02:12 PM.

When you say "they could hardly be further apart" I can't even discern which way you lean. On one hand, Hall is a consensus top-6 goalie and Fuhr is likely outside of the top-20. On the other hand, Fuhr won 4 cups as a starter and is known for being 'clutch' while Hall's numbers get worse and he won fewer cups than he likely was "supposed" to. And you won last ATD with Fuhr. Clarify your position.

First off, ATD performance is irrelevant. When we start projecting fantasy performance into further fantasy performance, we've gone right off the deep end. It doesn't matter one bit that Grant Fuhr backstopped the last two winners.

As far as the "they could hardly be further apart" comment goes, I was referring to the correlation between vs. average playoff SV% and game results. Hall and Fuhr are extremely similar in the SV% numbers, and yet their game results (and the Cups that come with them) are widely different. If these factors correlated as you stated, then Fuhr and Hall should be close in terms of playoff games won/lost, but they are not, in spite of the similarity in their career vs. league average SV% numbers.

Quote:

With that said, are you saying that the method "obviously doesn't work" because it doesn't show the results that YOU think it should? If that is the case, why do any numerical analyses, then?

In the hard sciences, theories are easily testable. If I produce a physics model that says light has mass, I can test this model and discard it if it proves to be false. How do I test the model? I observe if what the model predicts actually happens. But what we are doing is not so easily testable. It is historical analysis - looking backwards attempting to make sense out of what happened. The only way of testing whether the numerical analysis is worth a damn is to compare the numbers to the observation: namely, the game accounts, such as they are (objectivity is another question).

Without the game accounts, the numbers are useless, as we have no way of knowing if they have any connection to observed reality. In scientific analysis, observation trumps analytic method when the two conflict. Without the game accounts, we have no way of testing if any numerical analysis of goaltending performance makes sense, and even with them, our ability to test such models is quite limited.

To make matters worse, stopping the puck is a statistically insignificant event. Stopping the puck is the expected outcome - it occurs on roughly 90% of shots. This is a critical problem of analyzing goaltender performances within small sample sizes. For forwards, scoring a goal is a matter of large statistical significance because of the rarity of goalscoring. Let's say, in a very rough and ready style, that the average scoringline forward scores 1 goal for every 60 minutes of gameplay (and we'll assume 20 minutes of ATOI - so one goal every 3 games). Over the course of a 7 game series, then, we expect approximately 2.33 goals from the average scoringline forward. Deviations from this number become significant very quickly for reasons which should be obvious. An extra two goals basically doubles the output from average.

Compare this to an average goalie. Let's say that an average goalie plays the full game, faces 30 shots per game, and has a 90% save percentage - so he makes 27 saves per game. Over the course of a 7 game series, then, he's making about 189 saves. If he makes 191 saves (+2 above average), it is of far less statistical significance than the extra two goals scored by the forward. In order to establish statistical significance for save percentage, then, we need a lot more data than we do for the forwards and their goals.
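The rough model in the last two paragraphs can be written out as a quick sketch. The inputs (7 games, 1 goal per 60 minutes, 20 minutes of ice time, 30 shots per game, a .900 save percentage) come straight from the post; the variable and function names are made up for illustration, and the sketch only restates the arithmetic, it does not test significance.

```python
GAMES = 7

# Forward: 1 goal per 60 minutes of play, 20 minutes of ice time per game.
forward_goals = GAMES * (20 / 60) * 1.0

# Goalie: 30 shots per game at a .900 save percentage.
shots_faced = GAMES * 30
expected_saves = shots_faced * 0.900
expected_goals_against = shots_faced - expected_saves


def relative_change(baseline, extra=2):
    """Size of `extra` events as a fraction of the expected baseline."""
    return extra / baseline


print(f"forward goals expected:  {forward_goals:.2f}")
print(f"goalie saves expected:   {expected_saves:.0f}")
print(f"goals against expected:  {expected_goals_against:.0f}")
print(f"+2 goals vs forward baseline: {relative_change(forward_goals):.1%}")
print(f"+2 saves vs save baseline:    {relative_change(expected_saves):.1%}")
print(f"-2 goals vs GA baseline:      {relative_change(expected_goals_against):.1%}")
```

The last line is included because the same two events can also be measured against goals allowed rather than saves made, which gives a different percentage.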

But you say, why don't we measure instead when the goaltender fails? Isn't that statistically significant? Sure it is: we call that stat Goals Against Average. Of course, I shouldn't have to explain the problems with separating goaltender from team performance by this metric.

Are we beginning to see the problem here? Goalscoring is roughly 10 times more statistically significant than goal-stopping. Analyzing performance in pressure situations always involves smaller sample sizes. A single "big goal" (in playoff overtime, for example) carries considerable statistical significance. Score three of them in one playoffs and it's safe to say that you performed well in pressure situations that playoff season. But what about the "big save"? Does making three overtime saves compare statistically to scoring three goals? Do I really have to ask?

I'm not saying it's impossible to create useful analytic models for retrospective analysis; if it were, I wouldn't have a job. I am saying that it is very difficult and that one must be careful in doing so and skeptical at every step of the way. By digging into and trying to make sense of these playoff goalie stats, you boys are trying to build a Formula 1. Just because you have built a wheel does not mean you are ready to start racing it.

First off, ATD performance is irrelevant. When we start projecting fantasy performance into further fantasy performance, we've gone right off the deep end. It doesn't matter one bit that Grant Fuhr backstopped the last two winners.

All I meant by that is that you picked Fuhr, and therefore appear to like him, so I wasn't sure who was so far apart from whom.

Quote:

As far as the "they could hardly be further apart" comment goes, I was referring to the correlation between vs. average playoff SV% and game results. Hall and Fuhr are extremely similar in the SV% numbers, and yet their game results (and the Cups that come with them) are widely different. If these factors correlated as you stated, then Fuhr and Hall should be close in terms of playoff games won/lost, but they are not, in spite of the similarity in their career vs. league average SV% numbers.

Right, which highlights the importance of determining each goalie's individual contribution to those results with something that goes deeper than wins and losses.

Quote:

In the hard sciences, theories are easily testable. If I produce a physics model that says light has mass, I can test this model and discard it if it proves to be false. How do I test the model? I observe if what the model predicts actually happens. But what we are doing is not so easily testable. It is historical analysis - looking backwards attempting to make sense out of what happened. The only way of testing whether the numerical analysis is worth a damn is to compare the numbers to the observation: namely, the game accounts, such as they are (objectivity is another question).

Without the game accounts, the numbers are useless, as we have no way of knowing if they have any connection to observed reality. In scientific analysis, observation trumps analytic method when the two conflict. Without the game accounts, we have no way of testing if any numerical analysis of goaltending performance makes sense, and even with them, our ability to test such models is quite limited.

Isn't the general opinion of the two goalies that Fuhr performed well with a very, very strong team in front of him and won more cups, while Hall performed adequately but not enough to overcome his team's lack of depth and the dynasties that reigned at the time? I wouldn't say that the game results and their sv% results disagree all that much.

Quote:

But you say, why don't we measure instead when the goaltender fails? Isn't that statistically significant? Sure it is: we call that stat Goals Against Average. Of course, I shouldn't have to explain the problems with separating goaltender from team performance by this metric.

No kidding.

There's another stat that isn't official and that would be the "error rate", which is just 1.00 minus save percentage. The difference between 91 and 93 may not be so much, but the difference between 7 and 9 is huge. Correct me if I'm wrong, but judging the sv% by how many "points" off the average it is, is just a simplified and slightly less exact method than comparing actual error rates. (If sv%'s had varied more widely over time, I'd see the need to do it that way.)
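To make the "error rate" idea concrete, here is a minimal sketch. The helper names are hypothetical, and the two save percentages are just the 91 and 93 from the paragraph above.

```python
def error_rate(sv_pct):
    """Share of shots that become goals: 1.00 minus save percentage."""
    return 1.0 - sv_pct


def goals_allowed_ratio(sv_a, sv_b):
    """How many goals goalie A allows for every goal goalie B allows, on equal shots."""
    return error_rate(sv_a) / error_rate(sv_b)


# Save percentages differ by 2 points, but goals allowed differ by 9 vs 7 per 100 shots.
ratio = goals_allowed_ratio(0.910, 0.930)
print(f"points of sv% difference: {0.930 - 0.910:.3f}")
print(f"goals allowed ratio:      {ratio:.3f}")
```

Seen this way, the .910 goalie allows nine goals for every seven the .930 goalie allows on the same number of shots.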

Quote:

I'm not saying it's impossible to create useful analytic models for retrospective analysis; if it were, I wouldn't have a job. I am saying that it is very difficult and that one must be careful in doing so and skeptical at every step of the way. By digging into and trying to make sense of these playoff goalie stats, you boys are trying to build a Formula 1. Just because you have built a wheel does not mean you are ready to start racing it.

At some point you pegged me as someone who thinks he has all the answers just by doing some basic arithmetic with old sv% numbers. I am not sure where you got that from. Before you even began posting in this thread, I told LF his game accounts were better.

But you say, why don't we measure instead when the goaltender fails? Isn't that statistically significant? Sure it is: we call that stat Goals Against Average. Of course, I shouldn't have to explain the problems with separating goaltender from team performance by this metric.

Sturm, let's measure when the goaltender fails and call it Shooting Percentage Against Average, or 1-SV%. Isn't that statistically significant, and at least more separated from team performance than Goals Against Average? I don't understand why you say that it's statistically insignificant when a goalie makes an extra two saves, but statistically significant when a skater scores an extra two goals, and base that on the fact that the goalie saved a lot of other shots. The two events have an identical marginal impact.

If you want to say that shooters are more responsible for individual goals for than goalies are responsible for individual goals against, you could make that argument. But the argument you made makes no sense to me at all - it seems to impute significance to the fact that we measure a goalie's performance in terms of (saves/shots) rather than (goals against/shots).

There's another stat that isn't official and that would be the "error rate", which is just 1.00 minus save percentage. The difference between 91 and 93 may not be so much, but the difference between 7 and 9 is huge. Correct me if I'm wrong, but judging the sv% by how many "points" off the average it is, is just a simplified and slightly less exact method than comparing actual error rates. (If sv%'s had varied more widely over time, I'd see the need to do it that way.)

You're still missing the point. Yes, of course the difference between a .910 and a .930 SV% is quite large. Of course, you are deliberately using an extreme example: .930 is a superhuman save percentage. And yes, you are wrong. Allow me to elaborate.

If you want to follow my model above, the difference of two "errors" that we are talking about over a 7 game series in which goalies are facing 30 shots each with a SV% of .900 is the difference between 21 and 23 goals - less than 10%. Compare that to the difference between 2.33 and 4 goals for an individual skater - a difference of 70%. The scoring of an individual skater is just one small statistical piece of team offense, but the errors of the goalie are a reflection of the entire team's defensive play. The goalie is on the ice for the whole game and every goal gets counted against him. The skater is on the ice for about 1/3 the time and only gets credit for goals that he, himself, knocks in. He is one of many. The goalie is one of one. An example:

A playoff game goes into overtime. One of the teams must score before they can go home. Therefore, each of the two goalies has a 50% chance of making an "error" in an equal setting. If we assume that each scoringline forward plays at a rate that would equal 20 minutes of TOI in a 60 minute game, then he is on the ice for 1/3 of the overtime. He is also one man among three on his line, and playing for one of two teams. 3 * 3 * 2 = 18. So we're up to a roughly 1 in 18 chance that he scores the OT winning goal, and that is before we factor in the possibility of defensemen scoring. We'll say 1 in 20. It takes an entire order of magnitude fewer instances of the goalscoring event to reach statistical significance than it does the goal-allowing event.
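The overtime arithmetic above is easy to check with exact fractions. This is only a sketch of the stated assumptions (a 50/50 game, a forward on the ice for 1/3 of the time, three forwards sharing a line's goals equally, defensemen ignored); the variable names are mine.

```python
from fractions import Fraction

team_wins = Fraction(1, 2)        # either team is equally likely to score first
on_ice_share = Fraction(1, 3)     # 20 of every 60 minutes
share_of_line = Fraction(1, 3)    # one of three forwards on the line

goalie_allows_winner = 1 - team_wins
forward_scores_winner = team_wins * on_ice_share * share_of_line

print(f"goalie allows the OT winner:  {goalie_allows_winner}")
print(f"forward scores the OT winner: {forward_scores_winner}")
print(f"ratio: {goalie_allows_winner / forward_scores_winner}")
```

The ratio before any allowance for defensemen scoring is 9 to 1, which is where the rough "order of magnitude" comes from.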

We're back to the 1/10 statistical significance ratio comparing the significance of an individual forward scoring a goal vs. a goalie allowing a goal. The average goalie going into OT has a 50/50 chance of letting a goal by him - the same chance that his team has of winning - while the average scoringline forward has a 1 in 20 chance of ending the game with a goal. Statistically, the forward's individual performance is roughly 10 times easier to separate from the performance of his team. This is not a perfect metric: team play, linemates, etc. also apply to the statistical performance of forwards, but that argument only underscores the need for greater questioning of the sample sizes of forwards who didn't get a lot of time in the playoffs, especially those who played on bad playoff teams (Teemu Selanne and Andy Bathgate, come on down); it does nothing to diminish the need for large sample sizes when evaluating goaltending statistics. The correct statistical significance comparison between goalscoring and goal-allowing would be between the goalie (goals allowed) and his entire team (goals scored).

Attempting a rational analysis of playoff goaltending statistics is a noble (if Sisyphean) cause, but the inherent statistical problems with such analysis must be acknowledged and I have yet to see such an acknowledgement from any of the people pushing this form of analysis. I am making such a fuss about this because I know most ATD GMs do not understand statistics implicitly and because I'd like to see this process lead us towards truth rather than away from it. ****, I feel like Eisenhower talking about the "military industrial complex".

Quote:

At some point you pegged me as someone who thinks he has all the answers just by doing some basic arithmetic with old sv% numbers. I am not sure where you got that from.

You are one of a growing group (though you may be the ringleader) which seems to be infatuated with new and "innovative" uses of goaltending statistics. In the case of this thread, LF quoted the numerical results of an analysis (which I think was yours) without any kind of context behind them. I called his use of those numbers "brutal", and it absolutely was. I don't blame you for AIDS, North Korean nukes or Sarah Palin, but I think you should be more cautious with how you handle clutch performance among goalies. It is a statistical Pandora's box.

Quote:

Before you even began posting in this thread, I told LF his game accounts were better.

Good. The point of my posting all this is to keep it that way. By the way, I drafted Grant Fuhr twice in a row because I thought he was a good value. There is no emotional connection, whatsoever, to Fuhr on my part. In fact, I was an Islanders fan during Fuhr's heyday.

If you want to follow my model above, the difference of two "errors" that we are talking about over a 7 game series in which goalies are facing 30 shots each with a SV% of .900 is the difference between 21 and 23 goals - less than 10%. Compare that to the difference between 2.33 and 4 goals for an individual skater - a difference of 70%.

So how come, in this particular example, it is not just accepted that a 70% difference for a skater is equivalent to a 10% difference for a goalie? Aside from that, the point that overpass brought up, which echoes mine, is a good one. Why are we focused on 23 saves versus 21 instead of 9 goals against versus 7?

I didn't mean to use a "superhuman" example. In the same way, there is a big difference between 91% and 89%.

Quote:

The correct statistical significance comparison between goalscoring and goal-allowing would be between the goalie (goals allowed) and his entire team (goals scored).

Good series, Kimberley. I'm going to be honest and say it stinks to lose when your opponent makes one post and you make a full argument (not that I fault you for not making arguments; I understand the outside world can get in the way and is more important), but you put together a great team, Mr. Bugg, one I am not ashamed of losing to.

Last edited by Leafs Forever: 12-07-2009 at 06:15 PM.

Good series, Kimberley. I'm going to be honest and say it stinks to lose when your opponent makes one post and you make a full argument (not that I fault you for not making arguments; I understand the outside world can get in the way and is more important), but you put together a great team, Mr. Bugg, one I am not ashamed of losing to.

I absolutely feel bad about it and I'm willing to come back to the series when I have more time to fully argue why I felt my team was superior, but you're right: the real world is killing me right now. You know the Roar of the Rings going on at Rexall? I am there from the moment the first rock goes to the last, all the way into next Sunday.

Having said that, you have to remember that the argument portion isn't going to change the minds of most people. A lot of GMs won't even read the whole thing. While I admire the work you guys put in, that's human nature. One semi-lengthy post arguing my team is in my mind sufficient- for both myself and the other GM. I would never feel insulted or slighted if that's only what they had time for, because chances are in any given week that's about what I can spare.

So how come, in this particular example, it is not just accepted that a 70% difference for a skater is equivalent to a 10% difference for a goalie? Aside from that, the point that overpass brought up, which echoes mine, is a good one. Why are we focused on 23 saves versus 21 instead of 9 goals against versus 7?

You are confusing yourself here. The 23 vs. 21 is talking about goals allowed over the course of a series, as is the 2 vs. 4 goals scored for an individual forward. 9 vs. 7 is a much lower rate of goals allowed than what happens in reality. You are mixing up your sample sets. 9 vs. 7 is the difference in goals allowed over 100 shots against, not an entire series. If the rate of goals allowed was actually so low, they would have more statistical significance and be easier to analyze over small sample sizes.

Assume that a goal scored and a goal prevented are of equal value. This is actually untrue because you cannot win a playoff game on goals prevented, alone, but in the regular season shootout era it is a theoretical difference in all games other than 1-0 playoff contests. Over the course of a 7 game series, a goalie who prevents 2 extra goals has the same value as a forward who scores 2 extra goals. Going with our earlier model, we see the problem clearly:

- if 21 goals is the expected outcome for an average goalie, giving up only 19 is the result from a truly exceptional performance (+2 goals differential from a single player in a series is big). This breaks down to roughly a .910 SV% over the series assuming 210 shots faced - an improvement of about one percentage point (.0095) over the expected average of .900, or 9.5% fewer goals when comparing the raw numbers (21 vs. 19). The problem here is that the statistical likelihood of a variation of +/- 2 in a set where the expected mean (over 7 iterations, or games) is 21 is quite large - roughly ten times as likely as variation in a set where the expected mean is 2. Team goals occur frequently, therefore luck plays a large role in short term outcomes when discussing these numbers. Goalscoring by individual forwards occurs much less frequently, and the luck factor can be more easily isolated over small sample sizes.

Call the above The Luck Problem, which is different than The Team Play Problem, which is also much more difficult to isolate for goalies than it is for individual forwards.
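One way to put rough numbers on The Luck Problem is to treat goal counts over a series as Poisson-distributed. That is an assumption of this sketch, not something stated in the post, and the helper names are made up. It asks how often pure chance produces the deviations discussed above: an average goalie (21 goals expected) allowing 19 or fewer, and an average forward (about 2.33 goals expected) scoring 4 or more.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events when lam are expected (Poisson model)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def poisson_cdf(k, lam):
    """Probability of k or fewer events when lam are expected."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


GOALIE_MEAN = 21.0       # goals against expected over 7 games
FORWARD_MEAN = 7 / 3     # goals expected from one forward over 7 games

# Chance an average goalie allows 19 or fewer by luck alone.
p_goalie_lucky = poisson_cdf(19, GOALIE_MEAN)

# Chance an average forward scores 4 or more by luck alone.
p_forward_lucky = 1.0 - poisson_cdf(3, FORWARD_MEAN)

print(f"average goalie allows <= 19: {p_goalie_lucky:.3f}")
print(f"average forward scores >= 4: {p_forward_lucky:.3f}")
```

Under this model the goalie's two-goal swing is the more common fluke of the two. How large the gap comes out depends on the Poisson assumption and on which deviations are compared, so the output is best read against the rough "ten times" figure rather than as a replacement for it.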

I have already explained this issue enough. If you don't understand the sample size problems by now I'm not sure how much better I can explain it in layman's terms. We are not going to get into alternative vs. null hypotheses and type errors here, so if you really want to understand the numbers behind it, that's up to you. To point you in the right direction here, I am saying that the analysis of goaltending playoff performances I've seen over the past weeks is absolutely rife with Type I errors which stem from a misunderstanding of the statistical power of the data and a conflation of the data with that of another data set (forward scoring) which has a much higher statistical power.

That's it: I won't explain it any deeper here.

Quote:

In other words - wins?

Goals Against Average, actually. I know, I know...it's a depressing answer, but GAA is hardly different than SV%, and at this point may actually be better because at least when dealing with GAA people have a clear understanding of how imperfect the data is when applied to the goalie, alone. SV% is a better isolated goalie stat than GAA, though by how much is highly debatable - it suffers from the same small sample statistical significance and global "team play" problems that GAA suffers from - they're just better hidden.

You are confusing yourself here. The 23 vs. 21 is talking about goals allowed over the course of a series, as is the 2 vs. 4 goals scored for an individual forward. 9 vs. 7 is a much lower rate of goals allowed than what happens in reality. You are mixing up your sample sets. 9 vs. 7 is the difference in goals allowed over 100 shots against, not an entire series. If the rate of goals allowed was actually so low, they would have more statistical significance and be easier to analyze over small sample sizes.

Assume that a goal scored and a goal prevented are of equal value. This is actually untrue because you cannot win a playoff game on goals prevented, alone, but in the regular season shootout era it is a theoretical difference in all games other than 1-0 playoff contests. Over the course of a 7 game series, a goalie who prevents 2 extra goals has the same value as a forward who scores 2 extra goals. Going with our earlier model, we see the problem clearly:

- if 21 goals is the expected outcome for an average goalie, giving up only 19 is the result from a truly exceptional performance (+2 goals differential from a single player in a series is big). This breaks down to roughly a .910 SV% over the series assuming 210 shots faced - an improvement of about one percentage point (.0095) over the expected average of .900, or 9.5% fewer goals when comparing the raw numbers (21 vs. 19). The problem here is that the statistical likelihood of a variation of +/- 2 in a set where the expected mean (over 7 iterations, or games) is 21 is quite large - roughly ten times as likely as variation in a set where the expected mean is 2. Team goals occur frequently, therefore luck plays a large role in short term outcomes when discussing these numbers. Goalscoring by individual forwards occurs much less frequently, and the luck factor can be more easily isolated over small sample sizes.

Call the above The Luck Problem, which is different than The Team Play Problem, which is also much more difficult to isolate for goalies than it is for individual forwards.

I have already explained this issue enough. If you don't understand the sample size problems by now I'm not sure how much better I can explain it in layman's terms. We are not going to get into alternative vs. null hypotheses and type errors here, so if you really want to understand the numbers behind it, that's up to you. To point you in the right direction here, I am saying that the analysis of goaltending playoff performances I've seen over the past weeks is absolutely rife with Type I errors which stem from a misunderstanding of the statistical power of the data and a conflation of the data with that of another data set (forward scoring) which has a much higher statistical power.

That's it: I won't explain it any deeper here.

That's fine - you've got me thinking anyway.

My point was not to microanalyze but to look at their career as a whole. If sample sizes of playoffs that can be as short as 300 minutes bother you, would a career sample help to smooth those out? It's not hard to take a "vs. average" number and "weigh" it each playoff based on minutes played to come up with a career weighted vs. average number. I could have easily done that but I thought the breakdown of how each goalie got to that final number would be considered important.
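For what it's worth, the career weighting described above could look something like this. It is a sketch under my own assumptions: each playoff is a pair of minutes played and sv% minus league average for that year, the function name is hypothetical, and the sample numbers are invented purely to show the mechanics, not taken from any goalie's record.

```python
def career_vs_average(playoffs):
    """Minutes-weighted mean of per-playoff (sv% minus league average) values.

    `playoffs` is a list of (minutes_played, sv_pct_vs_average) pairs.
    """
    total_minutes = sum(minutes for minutes, _ in playoffs)
    if total_minutes == 0:
        raise ValueError("no minutes played")
    weighted_sum = sum(minutes * diff for minutes, diff in playoffs)
    return weighted_sum / total_minutes


# Invented example: a short hot playoff and a long, slightly below-average one.
sample = [(300, 0.010), (900, -0.002)]
print(f"career vs. average: {career_vs_average(sample):+.4f}")
```

Weighting by minutes keeps a 300-minute playoff from counting as much as a long run; weighting by shots faced would be another reasonable choice if shot totals are available.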

Quote:

Goals Against Average, actually. I know, I know...it's a depressing answer, but GAA is hardly different than SV%, and at this point may actually be better because at least when dealing with GAA people have a clear understanding of how imperfect the data is when applied to the goalie, alone. SV% is a better isolated goalie stat than GAA, though by how much is highly debatable - it suffers from the same small sample statistical significance and global "team play" problems that GAA suffers from - they're just better hidden.

I think the two are further apart than you make them out to be.

GAA = (1-sv%) * SA/60min

GAA is a goalie's error rate (sv% looked at in reverse), which is the best individual raw goalie stat, yet still team dependent, times shots against, which a goalie has minimal control over (almost none, I'd say).
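The decomposition above can be sketched in a few lines. The function name and the shot totals are hypothetical; the point is only that two goalies with the same sv% end up with different GAAs when their teams allow different shot volumes.

```python
def goals_against_average(sv_pct, shots_against_per_60):
    """GAA = (1 - sv%) * shots against per 60 minutes."""
    return (1.0 - sv_pct) * shots_against_per_60


# Same .900 save percentage behind a tight defense and behind a loose one.
tight = goals_against_average(0.900, 25)
loose = goals_against_average(0.900, 35)
print(f"GAA facing 25 shots per 60: {tight:.2f}")
print(f"GAA facing 35 shots per 60: {loose:.2f}")
```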