By The Numbers: Hockey Analytics... the Final Frontier. Explore strange new worlds, to seek out new algorithms, to boldly go where no one has gone before.

An offshoot from the new Patrick Roy thread in the HoH forum...

Is anyone interested in doing a rigorous estimate of the quality of each Stanley Cup champion's postseason competition? Some ideas:

I'd say that a fair metric would be (some estimate of) a team's likelihood of winning the Cup prior to the start of the playoffs (given that you know their opponents in advance). So let's say that the 2012 Kings had the Canucks, Blues, Coyotes, and Devils on their docket, and their a priori chances of winning each round were 30%, 45%, 50% and 60% (I note that this is the crux of the analysis). Then, we'd say that the 2012 Kings had a 4.05% chance of doing what they ultimately did (winning the 2012 Stanley Cup).
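To make the arithmetic explicit, here is a minimal sketch that multiplies the made-up round-by-round probabilities from the Kings example. Treating the four rounds as independent is an assumption of the sketch, and the variable names are just for illustration.

```python
from math import prod

# Made-up a priori series win probabilities from the example above
# (vs. Canucks, Blues, Coyotes, Devils). These are not real estimates.
round_probs = [0.30, 0.45, 0.50, 0.60]

# Assumption: the rounds are independent, so the chance of winning
# all four series is the product of the per-round chances.
cup_prob = prod(round_probs)

print(f"{cup_prob:.2%}")  # prints 4.05%
```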

That seems like a very tough set of competition (of course, I made up the probabilities). How does it compare to others'?

Flaw #1 - I realize that the metric that I propose would undervalue the competition of a truly great team. (Here's proof: holding the competition steady, if we double the ability of the Cup-winning team, then the a priori probability of winning the Cup would go up substantially, even though the competition didn't change.)

I'll propose a modification to my proposed metric: prior to the start of the playoffs, what is the probability that an average NHL team from the season in question could win the Cup while facing the playoff schedule that the Cup-winning team played?
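One rough way to sketch the "average team" version of the metric: turn each opponent's strength into a per-game win probability, turn that into a series win probability, and multiply across rounds. Everything here is an assumption for illustration: the log5 formula is swapped in as the game-level model (nothing above specifies one), the opponent strengths are hypothetical numbers, and home ice and matchup effects are ignored.

```python
from math import comb

def game_win_prob(team, opp):
    """Log5 estimate of a single-game win probability.

    `team` and `opp` are strengths in (0, 1), e.g. regular-season win
    fractions. Log5 is an illustrative choice, not a method from this thread.
    """
    return team * (1 - opp) / (team * (1 - opp) + opp * (1 - team))

def series_win_prob(p, wins_needed=4):
    """Chance of winning a series that ends at `wins_needed` wins,
    assuming a constant per-game win probability p (no home-ice effect)."""
    return sum(
        comb(wins_needed - 1 + k, k) * p**wins_needed * (1 - p)**k
        for k in range(wins_needed)  # k = number of losses before clinching
    )

def path_prob(team, opponents, wins_needed=4):
    """Chance that `team` beats every opponent on the path, in order."""
    result = 1.0
    for opp in opponents:
        result *= series_win_prob(game_win_prob(team, opp), wins_needed)
    return result

# Hypothetical opponent strengths for one champion's four rounds (not real data).
opponents = [0.62, 0.60, 0.55, 0.58]
average_team = 0.50  # an "average" team by definition wins half its games

print(path_prob(average_team, opponents))
```

A lower number means a tougher path. Swapping `average_team` for an "average playoff team" strength would give the other version of the metric, and `wins_needed=3` would cover best-of-five rounds.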

I'm not sure whether it would be fairer to use an "average" team or an average playoff team.

I'd also like to see various measures of how different teams/players fared in the playoffs over longer periods vs. how they would have been expected to fare based on Pythagorean win% and/or GF/GA ratios.
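For anyone who hasn't run into it, Pythagorean win% estimates a team's expected winning percentage from goals for and goals against. A minimal sketch follows. The exponent of 2 is the classic default and an assumption here (a hockey-specific fit may differ), and the goal totals in the example are hypothetical.

```python
def pythagorean_win_pct(goals_for, goals_against, exponent=2.0):
    """Expected win fraction from goals scored and allowed.

    exponent=2.0 is the classic default; treat it as an assumption,
    since the best-fitting value for hockey may be different.
    """
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    return gf / (gf + ga)

# Hypothetical team: 280 goals for, 230 goals against.
expected = pythagorean_win_pct(280, 230)
print(round(expected, 3))
```

Comparing this expectation against actual playoff results over several seasons is one way to see which teams over- or under-performed.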

I've tinkered with the latter and would like to revisit it whenever I have more time to look at it more closely.

I can see good arguments both ways - ultimately I settled upon an average team, because that automatically self-centers (whereas the average playoff team could be of varying quality relative to the league).

I imagine that the answers would be quite similar if they were run both ways.

Tonnes of possible pitfalls. Have to account for home ice advantage, and have to account for the fact that some teams play better/worse against others, often defying "predictions" of game outcome based on relative rankings in league standings, etc. There is no "average team" in the playoffs; just one team that either matches up well against the other, or not, and the impact of playing more/less games at home (which I think is a pretty well established "advantage", despite not being able to guarantee a win). Head-to-head results in the regular season would be important, imo, but then again lots of teams change their look by the time the trade deadline rolls around.

In the end, though, I can't imagine that any rigorous study accounting for many variables is going to come any closer to successfully painting the picture than a simple comparison of the average seed of the teams still alive in each round. Keeping with the Roy theme, for example, in 1986 the Habs faced the #8 seed to get to a pool of 9.4 average seeded "opposition", then faced the #11 seed to get to a pool of 10.7, then faced a 14th seed to get to the Final against the #6 seed. That's a lot easier than, say, last year's Kings facing the #1 seed to get to a pool of 7.3 average, then the #3 seed to get to a pool of 7.0, then facing #8 for the Cup.

Agreed (for the most part). But let's not allow "perfect" to be the enemy of "decent".

I'm not sure an accurate measure would only be based on the last regular season.

Best example I can think of: in 1995, the 8th-seeded Rangers beat the 1st-seeded Nordiques. It should have been considered a major upset, but a lot of people were not surprised. The Rangers were "the defending Cup champions who underachieved during the regular season" vs. "a young team that is there for the first time."

Do you think it's worth bringing in the previous playoffs, or is that too much complication for a minor variable?

I guess it depends upon how complicated the prediction model would be (and the prediction model could be judged by how well it predicts the series - of course, that's another subforum topic entirely).

Or stated differently - the prediction model would have uses of its own, but would also be able to help answer other questions (such as this one).

There were indications from the regular season that the Rangers were better than their record, and the Nordiques worse.

For example, the Rangers had a shot differential of about +250 during the Regular Season, whereas the Nordiques were even.

Not to say that the Rangers were necessarily the better team - Quebec (Colorado) certainly did better the following year.

Just that a holistic analysis (even if confined to the 1994-95 Regular Season) suggests that the teams were probably pretty even at the time.

I'm sure it wouldn't be too hard to find plenty of examples of teams that finished behind one of their playoff rivals during the regular season, yet dominated (or even "unexpectedly" split) the head-to-head matchups (and vice versa), and ended up winning/losing the series in an "upset". For example, in '85/86 Toronto finished 29 points behind the Blackhawks and faced them in the first round.

Now, if we looked at it just as an 86 point team vs a 57 point team, it would look like a major hurdle. Fact is, though, that Toronto dominated the season series between the two clubs (6W-2L). So it might look like Toronto had a difficult task ahead of them in round 1 of '86, but they were probably more comfortable heading into Chicago than they may have been Minnesota, for example (who Toronto couldn't beat in 8 encounters that year). Lo and behold, they won that series.

This would be insanely hard to do, because weighting each factor would be almost impossible imo. How do we weigh precedent (how often 8 beats 1), underlying numbers, percentages for that year, goaltending, etc.? I would maybe start with Gabe Desjardins' 75-5-20 (or whatever it is) split of Fenwick/possession, goaltending and luck.

Regression analysis models solve this exact problem (how much weight to give different factors) every day.

I'm sure it wouldn't be too hard to find plenty of examples of teams that finished behind one of their playoff rivals during the regular season, yet dominated (or even "unexpectedly" split) the head-to-head matchups (and vice versa), and ended up winning/losing the series in an "upset".

That's not evidence, because it can be explained by the fact that the playoffs are a short-series format, where any team has a chance to beat any other team. A certain number of upsets are expected every year, simply because of the short series format.

You'd have to demonstrate, rather than assume, that a team won because of the matchup, rather than normal variance. Same applies to the regular-season matchups; normally the number of games involved is so small that you can't draw reliable conclusions from them.

I'm not very educated on statistics and mathematics, but I should clarify that I more or less meant to say that there would be such a huge array of factors that would need to be considered, imo ("almost impossible" was a bad choice of words).

Agreed - and that's the hard part.

On the other hand, no one's expecting a model that's 100% predictive, and so I imagine a process such as this:

We develop a list of candidate factors

Develop a model based on those factors (removing variables, adding variables, combining variables...)

See how the model does

See which types of teams consistently do better (worse) than the model expects
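The loop above could look something like the following sketch. It is only an illustration under stated assumptions: the data is synthetic (invented for the example, not real series), there is a single hypothetical candidate factor (regular-season goal-differential edge per game), the model is a plain logistic regression fit by gradient descent, and the Brier score is one possible way to "see how the model does".

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(series win) = sigmoid(a + b*x) by gradient descent on log-loss."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(a + b * x) - y for x, y in zip(xs, ys)]
        a -= lr * sum(errors) / n
        b -= lr * sum(e * x for e, x in zip(errors, xs)) / n
    return a, b

def brier_score(probs, ys):
    """Mean squared error of predicted probabilities (lower is better;
    always predicting 50% scores 0.25)."""
    return sum((p - y) ** 2 for p, y in zip(probs, ys)) / len(ys)

# SYNTHETIC data, made up for illustration only.
# x = hypothetical goal-differential edge per game, y = 1 if the series was won.
xs = [-0.8, -0.5, -0.3, -0.1, 0.0, 0.2, 0.4, 0.6, 0.9, 1.1]
ys = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]

a, b = fit_logistic(xs, ys)                     # develop the model
probs = [sigmoid(a + b * x) for x in xs]
score = brier_score(probs, ys)                  # see how the model does
residuals = [y - p for p, y in zip(probs, ys)]  # who beats/misses expectations

print(b, score)
```

In a real version, the residuals would be grouped by team type (defending champions, young teams, and so on) to look for consistent over- or under-performance, and the score would be computed on series the model had not been fit on.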

I'm sure it wouldn't be too hard to find plenty of examples of teams that finished behind one of their playoff rivals during the regular season, yet dominated (or even "unexpectedly" split) the head-to-head matchups (and vice versa), and ended up winning/losing the series in an "upset".

Sure.

Of course, proving that it's not just random variation is much harder.

The relevant question is: Does it happen more often than one would predict, based on chance alone?

Toronto and Chicago played 11 games against each other in 1985-86.

The signal to noise ratio is pretty weak over 11 games.
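To put a rough number on how noisy a short head-to-head sample is, here is a small sketch asking how often a team that is exactly even with its opponent would still go 6-2 or better over 8 games (the season-series record mentioned earlier in the thread). Treating each game as an independent coin flip with no ties is a simplifying assumption.

```python
from math import comb

def prob_at_least(wins, games, p=0.5):
    """Binomial chance of winning at least `wins` of `games`
    when each game is won independently with probability p."""
    return sum(
        comb(games, k) * p**k * (1 - p)**(games - k)
        for k in range(wins, games + 1)
    )

# An evenly matched team going 6-2 or better over 8 games.
print(prob_at_least(6, 8))  # 0.14453125, i.e. 37/256
```

So under these assumptions an evenly matched team posts a record that lopsided roughly one time in seven, which is why a single season series is weak evidence of a real matchup advantage.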

So are you suggesting that the results of head-to-head matchups in the regular season are irrelevant when trying to "determine" the relative difficulty of any given team's path to the Cup? I think there are plenty of 6th seeds (for example) out there who definitely looked forward to facing some teams rather than others based on their confidence (I'd assume) gained from having beaten them during the regular season. I'm really curious now as to the percentage of teams who lost the season series but went on to win the playoff round in which they met again. I also wonder how much the home vs away factor figures in there.

I dunno, it seems like something that could at least supplement something that is just as uselessly basic on its own, such as "quality of competition" based simply on the "fact" that playing an 8th seed should be easier than playing a 7th seed, which is easier than playing a 6th seed... etc.

So are you suggesting that the results of head-to-head matchups in the regular season are irrelevant when trying to "determine" the relative difficulty of any given team's path to the Cup?

I think that what he's saying is that, although there are certainly cases where one team had an advantage on an opponent based upon matchups (above and beyond what's reflected in overall team ability), you would be hard-pressed to pick out which ones were genuine advantages, and which ones were the result of small sample sizes.

If that's not what he's saying, then I'll take credit for it.

Oh, I totally understand about the sample size thing. I'm also not sure how I'd build it into a cumulative "quality of competition" stat, anyhow. It just seems to me that 1st seed>2nd seed>3rd seed... isn't necessarily going to prove to be any less random-seeming or predictive as the sample size gets bigger. Maybe at the extremes (1st seed vs 8th, specifically), I have no idea.

I think that it'd probably be a component of a regression equation (although likely not a large one) - my guess is that the information about your seed is largely redundant to your goal differentials, point totals, and other metrics (for instance, if you outscored your opponents by 160 goals over the regular season, you wouldn't gain much predictive power by also knowing that you're a #1 seed).