04-29-2012, 06:08 PM
Originally Posted by seventieslord
my mistake, you made it sound like you had done work using the actual TOI file that we've been referring to. It sounds like you're referring more to overpass' method of estimating PP/PK usage.
Yes, I did. Are you referring to an Excel sheet named NHL68-06TOI.xls?
I haven't yet put that data into my database.
It's color coded. I suppose a white background means factual data. Blue seems to mark recent seasons in which players changed teams during the season, and where an estimation has been made. I find that estimation unreliable, as I found it to be way off for Ozolinsh, for example. Green seems to be completely estimated.
But is there really a way to know how wrong the estimated (green) data are, when we don't have any factual data to compare with?
Okay, one could apply the same estimation algorithm to the recent (white) seasons, but for that I need to know the formula. (Sorry, but I'm a bit weak at keeping track of things. And I suppose this is a case where an Excel-skilled person could do this faster than me.)

Originally Posted by seventieslord
What I was saying is, in the TOI file where GF/GA in PP and PK situations are used to determine TOI, there is probably an adjustment for top unit players that accounts for the fact that they score and get scored on more often. This would "smooth out" the effect you're seeing in the extreme Niedermayer/Stevens example.
That is an assumption, based on a generalization. As I see it, I don't know for sure.
If I were to assume, I would basically agree with you that it probably would be smoothed out, even though my guess is that prime Niedermayer, during that particular season, would still have better PK stats than a 39 or 40 year old Stevens.
If I get the time, I might study this case further (for example, to see what points-per-minute pace the opponents who were on the ice during goals had).

I haven't yet put the data in the Excel sheet into my database (because it didn't seem very reliable), but when/if I do, I will likely compare

Originally Posted by seventieslord
I wouldn't necessarily call that an error. It does, however, underscore the importance of looking at the actual numbers and not getting caught up in rankings. You're right that there can easily be differences from actual to estimated results; I think that this would only happen in cases where they were very close to begin with. And I don't think it would be very often that estimates would change this. i.e. if you're close in actual numbers you'll be close in estimated numbers, and I don't think it should really concern anyone if one player is 30 seconds ahead in actual numbers and 30 seconds behind when estimated; this is not a huge deal. People should be getting away from the whole "see, he was the #2 defensemen because he played 30 seconds more than the #3 guy" mindset and more towards the "these two guys played about the same minutes, you could say they were the co #2/3" mindset.
I basically agree.

Originally Posted by seventieslord
When I asked about errors I wanted to know about differences in the calculated times. When you say that it might swap who the #2 and #3 defensemen are, it might only take a 10 second swing to make that swap, or it might take a 3 minute swing, so it really says nothing about the quantity of the error.

So let me rephrase the question - how often are the estimated results more than 10% away from the actual results?
I have searched for more than an hour without finding the necessary data or code, so I'm afraid I'll have to make you wait for an answer.
I have, however, posted about it here on the board. I got some replies saying that the differences overall looked small, which I don't agree with. The average error could have been, say, 6 % (though I may be misremembering).
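
For what it's worth, here is a minimal sketch of how the "more than 10% away" question could be answered once actual and estimated minutes sit side by side. The player labels and minute values are invented for illustration; they are not taken from the TOI file, and the function names are hypothetical.

```python
def relative_errors(actual, estimated):
    """Return |estimated - actual| / actual for each key present in both dicts."""
    return {
        key: abs(estimated[key] - actual[key]) / actual[key]
        for key in actual
        if key in estimated and actual[key] > 0
    }


def share_beyond(errors, threshold=0.10):
    """Fraction of entries whose relative error exceeds the threshold."""
    if not errors:
        return 0.0
    return sum(1 for e in errors.values() if e > threshold) / len(errors)


# Hypothetical PP minutes per game (illustrative values only).
actual = {"Player A": 4.0, "Player B": 3.0, "Player C": 2.0, "Player D": 1.0}
estimated = {"Player A": 4.2, "Player B": 2.6, "Player C": 2.1, "Player D": 1.25}

errors = relative_errors(actual, estimated)
mean_error = sum(errors.values()) / len(errors)

print(f"mean relative error: {mean_error:.1%}")
print(f"share off by more than 10%: {share_beyond(errors):.0%}")
```

Reporting both numbers matters: a mean error around 6 % can hide a sizeable share of individual players who are off by much more.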

Originally Posted by seventieslord
True. Keep in mind that how much situational icetime a team had in a season is a very easy thing to estimate. You know how many PPs they had for and against, and we know the average length of a PP over time. As long as a team didn't have a massively disproportionate propensity to score PP goals very early in the PP, or allow them really late, then those numbers are pretty solid indeed.
Here is something I spent many hours on earlier this year.
Basically you're right. But average power play time actually does vary, both between seasons and between teams.
At least two things seem to affect its length:
1. Power play percentage. The better a team's power play percentage, the shorter its power plays tended to last on average.
2. Total number of penalties. A power play ends early when a) the team on the power play scores, b) the period ends, or c) the team on the power play takes a penalty. If I remember right, this is less important to account for than power play percentage (1), but it still seems to affect things.
I spent an awful lot of time trying to integrate these two parameters in the estimation formula, but it wasn't easy, and I got more and more dizzy. (Maybe this is a case for CzechYourMath.)
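
One simple way to see why factor 1 pushes in that direction is sketched here. This is a toy model, not the estimation formula discussed in this thread: it assumes that goals arrive at a constant rate during a two-minute minor, and that a goal is the only thing that ends a power play early (factors b and c are ignored). The function name and the 2.0-minute default are assumptions made for illustration.

```python
import math


def expected_pp_minutes(pp_pct, penalty_minutes=2.0):
    """Expected power play length under a constant-scoring-rate toy model.

    pp_pct is the chance of scoring on one opportunity (0 < pp_pct < 1).
    Assumes an exponential time-to-goal and that a goal ends the power play.
    """
    # Scoring rate (goals per minute) that yields pp_pct over a full penalty.
    rate = -math.log(1.0 - pp_pct) / penalty_minutes
    # E[min(T, penalty_minutes)] for T ~ Exponential(rate) equals pp_pct / rate.
    return pp_pct / rate


for pct in (0.14, 0.16, 0.18, 0.20):
    print(f"PP% {pct:.0%}: about {expected_pp_minutes(pct):.3f} min per power play")
```

In this toy model a 20 % power play averages shorter power plays than a 14 % one, which is the same direction as the effect described in (1). It says nothing about the size of the effect from penalties or period endings.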

Below are seasonal data, showing league averages.
1997-98  380.154  450.885   0.0000   57.346   10.000   0.1508   0.0263   0.1245   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.8724
1998-99  359.111  440.222   0.0000   56.778    8.148   0.1581   0.0227   0.1354   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.8706
1999-00  330.821  397.964   0.0000   53.429    7.714   0.1615   0.0233   0.1382   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.8661
2000-01  376.067  449.400   0.0000   62.567    8.900   0.1664   0.0237   0.1427   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.8608
2001-02  338.467  414.133   0.0000   53.367    7.333   0.1577   0.0217   0.1360   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.8709
2002-03  362.533  454.600  617.668   59.567    7.667   0.1643   0.0211   0.1432   0.1929   0.0248   0.1681  10.3694  11.9011   1.7038   0.8679
2003-04  347.567  428.700  587.632   57.233    8.133   0.1647   0.0234   0.1413   0.1948   0.0277   0.1671  10.2673  11.9681   1.6907   0.8657
2005-06  479.667  606.633  789.325   84.833   10.600   0.1769   0.0221   0.1548   0.2150   0.0269   0.1881   9.3044  10.6330   1.6456   0.8598
2006-07  397.833  514.633  662.948   69.967    8.933   0.1759   0.0225   0.1534   0.2111   0.0270   0.1841   9.4752  10.8621   1.6664   0.8633
2007-08  351.367  468.933  570.239   62.367    7.967   0.1775   0.0227   0.1548   0.2187   0.0279   0.1908   9.1433  10.4823   1.6229   0.8669
2008-09  340.933  485.967  549.594   64.600    7.833   0.1895   0.0230   0.1665   0.2351   0.0285   0.2066   8.5076   9.6816   1.6120   0.8665
2009-10  304.533  437.767  495.165   55.467    6.367   0.1821   0.0209   0.1612   0.2240   0.0257   0.1983   8.9273  10.0848   1.6260   0.8729
2010-11  290.533  417.000  477.996   52.367    6.867   0.1802   0.0236   0.1566   0.2191   0.0287   0.1904   9.1279  10.5054   1.6452   0.8742
(GD is goal difference, or in this case "net" stats: SHGA-SHGF and PPGF-PPGA.)
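
The table's header row did not survive, so the column meanings used here are inferred from the arithmetic and should be treated as assumptions: after the season, the first column is read as PP opportunities per team, the third as PP minutes per team, the fourth as PP goals for, and the fifth as shorthanded goals against. Under that reading, several of the later columns can be recomputed from the earlier ones, as this sketch does for 2002-03.

```python
# 2002-03 league averages per team, copied from the table.
# The field names are guesses; the original header row is missing.
row = {
    "pp_opportunities": 362.533,
    "pp_minutes": 617.668,
    "pp_goals_for": 59.567,
    "sh_goals_against": 7.667,
}

net_goals = row["pp_goals_for"] - row["sh_goals_against"]

derived = {
    "goals_for_per_opportunity": row["pp_goals_for"] / row["pp_opportunities"],
    "goals_against_per_opportunity": row["sh_goals_against"] / row["pp_opportunities"],
    "net_per_opportunity": net_goals / row["pp_opportunities"],
    "goals_for_per_2_min": 2 * row["pp_goals_for"] / row["pp_minutes"],
    "minutes_per_goal_for": row["pp_minutes"] / row["pp_goals_for"],
    "minutes_per_net_goal": row["pp_minutes"] / net_goals,
    "minutes_per_opportunity": row["pp_minutes"] / row["pp_opportunities"],
}

for name, value in derived.items():
    print(f"{name:32s}{value:8.4f}")
```

The recomputed values agree with the corresponding 2002-03 entries (0.1643, 0.0211, 0.1432, 0.1929, 10.3694, 11.9011, 1.7038) to within rounding. The second and last columns are not reproduced by this reading, so their meaning stays open.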

It's not what you specifically asked for, but might perhaps be of interest anyway.
One can see what appear to be correlating things in the table, but I eventually got very dizzy in my attempts at creating formulas to e.g. estimate average PP time based on the different data.

On a team level, normalized to 1.0, average PP lengths ranged from .9249 to 1.0640. Half of the teams were between .9844 and 1.0154 (i.e. half of the teams were within about 1.5 % of the average). I don't remember whether I normalized against all seasons combined or each season individually. 8 seasons studied, from 2002-03 to 2010-11 (240 team-seasons).
Like I said above, there is a very strong correlation between a team's power play percentage and its average power play time. The 14 teams with the lowest average PP time were all above average in PP percentage. About 20 of the 21 teams with the highest average PP time were below average in PP percentage.
If I'm not too confused, the estimated power play lengths should be less than 1.5 % off in half of the cases, and more than 1.5 % off (but here never more than about 7.5 %) in the other half.
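
A minimal sketch of the normalization described above: divide each team's average PP length by the league mean, then look at how far teams sit from 1.0. The five team values are invented for illustration and are not from the data set studied.

```python
import statistics

# Hypothetical average PP lengths in minutes for five teams (made-up values).
team_pp_minutes = {
    "Team A": 1.60,
    "Team B": 1.66,
    "Team C": 1.70,
    "Team D": 1.72,
    "Team E": 1.82,
}

league_mean = statistics.mean(list(team_pp_minutes.values()))

# Normalize so that 1.0 is the league average.
normalized = {team: minutes / league_mean for team, minutes in team_pp_minutes.items()}

# Distance from 1.0: how far off the league average is as a stand-in for each team.
deviations = [abs(value - 1.0) for value in normalized.values()]
median_deviation = statistics.median(deviations)

print(f"range: {min(normalized.values()):.4f} to {max(normalized.values()):.4f}")
print(f"half of the teams are within {median_deviation:.1%} of the average")
```

The median deviation plays the role of the 1.5 % figure: half of the teams are closer to the average than that, half are further away.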
I don't know how to apply this to older seasons. I suspect that for some older seasons the estimation formula would give considerably less accurate estimates than for the more recent seasons. But I don't know which seasons, or to what extent.

I can't recall right now whether I studied this at the player level too. On a team level, the more effective (or ineffective) you were on the PP, the more the estimated PP time will differ from the factual one. (That is a generalization, because we have no idea about the specific cases.) The same thinking might, or might not, be applicable at the player level.

Sorry for not being able to immediately answer your main questions.

By the way, this is a stats heavy post, and probably a bit unappealing to the general reader here. Dealing with things like this can often be a bit boring (and very time consuming) to me too. But somehow I think this history forum is a good place for this anyway, as this is a very stats oriented section. We use many statistical "components", and the better we can make each component, the better the components relying on them will be.

Last edited by plusandminus: 04-29-2012 at 06:22 PM.