04-30-2012, 09:54 AM
  #69
seventieslord
Moderator
 
Join Date: Mar 2006
Location: Regina, SK
Country: Canada
Posts: 25,861
Quote:
Originally Posted by plusandminus
Yes, I did. Are you referring to an Excel sheet named NHL68-06TOI.xls?
Yes.

Quote:
I haven't yet put that data into my database.
It's color-coded. I suppose a white background means factual data. Blue seems to mark recent seasons in which players changed teams during the season and an estimate has been made. I find that estimate unreliable; for example, I found it to be way off for Ozolinsh. Green seems to be completely estimated.
But is there really a way to know how wrong the estimated (green) data are, when we don't have any factual data to compare with?
Yes. I thought that's what we were talking about.

Take the formula used to estimate ice times, apply it to seasons where the results are known, and compare the estimates to the actual results.

This was already done, though. It's where the "96% correlation" figure comes from.

Quote:
I have, however, posted about it here on the board. I got some replies saying that the differences overall looked small, which I don't agree with. The average error might have been, say, 6% (or I may be remembering wrong).
6% I would be very comfortable with, considering these are estimates.


Quote:
Here is something I spent many hours on earlier this year.
Basically you're right. But average power-play time actually does change, both between seasons and between teams.
At least two things seem to affect the length of a power play:
1. Power-play percentage. The better a team's power-play percentage, the shorter its power plays tended to last on average.
2. Total number of penalties. Besides the penalty simply expiring, a power play ends when a) the team on the power play scores, b) the period ends, or c) the team on the power play takes a penalty. If I remember right, this is less important to account for than power-play percentage (point 1), but it does seem to affect things.
I spent an awful lot of time trying to integrate these two parameters into the estimation formula, but it wasn't easy, and I got more and more dizzy. (Maybe this is a case for CzechYourMath.)
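A deliberately simple model (an assumption for illustration, not the method discussed in this thread) shows why point 1 should hold. Suppose each power play is a two-minute minor, the team on the power play scores at a constant rate, and the first goal ends it. If p is the fraction of power plays that produce a goal, the expected length in minutes is

L = 2p / (-ln(1 - p))

which shrinks as p grows. The model ignores the other endings in point 2, as well as 5-on-3s and majors. At a PP% around 18% it gives about 1.81 minutes, longer than the league averages per opportunity in the table below (roughly 1.61 to 1.70 minutes), presumably partly because of those other endings. `expected_pp_length` is a hypothetical name.

```python
import math

def expected_pp_length(pp_pct, minutes=2.0):
    """Expected power-play length in minutes under a toy model.

    Goals are assumed to arrive at a constant rate (per minute) while the
    power play lasts, and the first goal ends it. Then
        pp_pct = 1 - exp(-rate * minutes)
    and the expected length E[min(T, minutes)] equals pp_pct / rate.
    """
    if not 0.0 < pp_pct < 1.0:
        raise ValueError("pp_pct must be strictly between 0 and 1")
    rate = -math.log(1.0 - pp_pct) / minutes
    return pp_pct / rate

for pct in (0.14, 0.18, 0.22):
    print(f"PP% {pct:.0%}: expected length {expected_pp_length(pct):.3f} min")
```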

Below are seasonal data, showing league averages.
Season     PPOpp  PPShots  PPTimeMin    PPGF    PPGA  PPGF/Opp  PPGA/Opp  PPGD/Opp  PPGF/2min  PPGA/2min  PPGD/2min  PPTime/GF  PPTime/GD  PPOppLength  SHSave%
1997-98  380.154  450.885        n/a  57.346  10.000    0.1508    0.0263    0.1245        n/a        n/a        n/a        n/a        n/a          n/a   0.8724
1998-99  359.111  440.222        n/a  56.778   8.148    0.1581    0.0227    0.1354        n/a        n/a        n/a        n/a        n/a          n/a   0.8706
1999-00  330.821  397.964        n/a  53.429   7.714    0.1615    0.0233    0.1382        n/a        n/a        n/a        n/a        n/a          n/a   0.8661
2000-01  376.067  449.400        n/a  62.567   8.900    0.1664    0.0237    0.1427        n/a        n/a        n/a        n/a        n/a          n/a   0.8608
2001-02  338.467  414.133        n/a  53.367   7.333    0.1577    0.0217    0.1360        n/a        n/a        n/a        n/a        n/a          n/a   0.8709
2002-03  362.533  454.600    617.668  59.567   7.667    0.1643    0.0211    0.1432     0.1929     0.0248     0.1681    10.3694    11.9011       1.7038   0.8679
2003-04  347.567  428.700    587.632  57.233   8.133    0.1647    0.0234    0.1413     0.1948     0.0277     0.1671    10.2673    11.9681       1.6907   0.8657
2005-06  479.667  606.633    789.325  84.833  10.600    0.1769    0.0221    0.1548     0.2150     0.0269     0.1881     9.3044    10.6330       1.6456   0.8598
2006-07  397.833  514.633    662.948  69.967   8.933    0.1759    0.0225    0.1534     0.2111     0.0270     0.1841     9.4752    10.8621       1.6664   0.8633
2007-08  351.367  468.933    570.239  62.367   7.967    0.1775    0.0227    0.1548     0.2187     0.0279     0.1908     9.1433    10.4823       1.6229   0.8669
2008-09  340.933  485.967    549.594  64.600   7.833    0.1895    0.0230    0.1665     0.2351     0.0285     0.2066     8.5076     9.6816       1.6120   0.8665
2009-10  304.533  437.767    495.165  55.467   6.367    0.1821    0.0209    0.1612     0.2240     0.0257     0.1983     8.9273    10.0848       1.6260   0.8729
2010-11  290.533  417.000    477.996  52.367   6.867    0.1802    0.0236    0.1566     0.2191     0.0287     0.1904     9.1279    10.5054       1.6452   0.8742
n/a = not available in this data (the PP time columns start in 2002-03). Times are in minutes: PPTimeMin is total PP time; PPTime/GF and PPTime/GD are PP minutes per goal scored and per net goal; PPOppLength is PP minutes per opportunity; the /Opp and /2min columns are goals per PP opportunity and per two minutes of PP time.
(GD is goal difference, or in this case "net" stats: SHGA-SHGF and PPGF-PPGA.)

It's not what you specifically asked for, but it might be of interest anyway.
One can see what appear to be correlations in the table, but I eventually got very dizzy in my attempts to create formulas, e.g. to estimate average PP time from the other data.

On a team level, normalized to 1.0, average PP lengths ranged from 0.9249 to 1.0640. Half of the teams were between 0.9844 and 1.0154 (i.e. within about 1.5% of the average). I don't remember whether I normalized to all seasons combined or to each season individually. Eight seasons were studied, 2002-03 to 2010-11 (there was no 2004-05 season because of the lockout), for a total of 240 team-seasons.
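The normalization described above can be sketched as follows, as a minimal illustration with made-up team data (the real per-team figures aren't given here). It normalizes each season separately, which is only one of the two options mentioned; it uses the simple mean of team averages as the league average (weighting by PP opportunities would also be reasonable); and it reports the range plus the middle half of the normalized values (the interquartile range). The helper names are hypothetical.

```python
import statistics

def normalized_pp_lengths(team_lengths):
    """Divide each team's average PP length by that season's league average.

    team_lengths maps season -> {team: average minutes per power play}.
    Each season is normalized on its own; the league average is the simple
    (unweighted) mean of the team averages.
    """
    out = {}
    for season, teams in team_lengths.items():
        league_avg = statistics.fmean(teams.values())
        for team, length in teams.items():
            out[(season, team)] = length / league_avg
    return out

def summarize(ratios):
    """Return (min, max, lower quartile, upper quartile) of normalized values."""
    values = sorted(ratios)
    q1, _, q3 = statistics.quantiles(values, n=4)
    return values[0], values[-1], q1, q3

# Hypothetical data for illustration only (two seasons, four teams each).
sample = {
    "2009-10": {"A": 1.58, "B": 1.66, "C": 1.63, "D": 1.69},
    "2010-11": {"A": 1.60, "B": 1.67, "C": 1.64, "D": 1.70},
}
lo, hi, q1, q3 = summarize(normalized_pp_lengths(sample).values())
print(f"range {lo:.4f} to {hi:.4f}, middle half {q1:.4f} to {q3:.4f}")
```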
As I said above, there is a very strong correlation between a team's power-play percentage and its average time per power play: the better the percentage, the shorter the power plays. The 14 teams with the lowest average PP time were all above average in PP percentage, and about 20 of the 21 teams with the highest average PP time were below average.
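The ranking check in the previous paragraph (sort teams by average PP length, then see how many at each end had an above-average PP%) can be sketched like this. The team data are hypothetical, `above_average_count` is a made-up helper, and "average" PP% is taken as the simple mean over the listed teams, which is an assumption. For the longest power plays, k minus the returned count gives the number at or below average.

```python
def above_average_count(teams, k, shortest=True):
    """Among the k teams with the shortest (or longest) average PP length,
    count how many had an above-average PP percentage.

    teams: list of (name, avg_pp_length, pp_pct) tuples. "Average" PP% is the
    simple mean over the list, an assumption made for this sketch.
    """
    avg_pct = sum(pct for _, _, pct in teams) / len(teams)
    ranked = sorted(teams, key=lambda t: t[1], reverse=not shortest)
    return sum(1 for _, _, pct in ranked[:k] if pct > avg_pct)

# Hypothetical teams: (name, minutes per power play, PP goals per opportunity).
teams = [
    ("A", 1.55, 0.22), ("B", 1.60, 0.20), ("C", 1.64, 0.18),
    ("D", 1.66, 0.17), ("E", 1.70, 0.15), ("F", 1.74, 0.14),
]
print(above_average_count(teams, 2), "of the 2 shortest are above-average PP%")
print(above_average_count(teams, 2, shortest=False), "of the 2 longest are above-average PP%")
```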
If I'm not too confused, then if the estimates give every team the league-average power-play length, the estimated lengths should be less than 1.5% off in half of the cases, and more than 1.5% (but no more than about 7.5%) off in the other half.
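The percentages in the previous paragraph follow from the normalized range, under the assumption that the estimate gives every team the league-average length (a ratio of 1.0). A small sketch of that arithmetic, using the ratios quoted above; `flat_estimate_error` is a hypothetical helper. The result depends on the denominator: measured against the league average, the 0.9249 team is about 7.5% off; measured against that team's actual length, it is about 8.1% off.

```python
def flat_estimate_error(ratio, relative_to="average"):
    """Error from assuming a team's PP length equals the league average.

    ratio is the team's actual length divided by the league average.
    relative_to="average" measures the miss as a share of the league average;
    relative_to="actual" measures it as a share of the team's actual length.
    """
    miss = abs(1.0 - ratio)
    if relative_to == "average":
        return miss
    if relative_to == "actual":
        return miss / ratio
    raise ValueError("relative_to must be 'average' or 'actual'")

# Normalized lengths quoted above: extremes and the middle-half bounds.
for ratio in (0.9249, 0.9844, 1.0154, 1.0640):
    print(f"{ratio:.4f}: {flat_estimate_error(ratio):.2%} of average, "
          f"{flat_estimate_error(ratio, 'actual'):.2%} of actual")
```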
I don't know how to apply this to older seasons. I suspect the estimation formula would be considerably less accurate for some older seasons than for the more recent ones, but I don't know which seasons, or to what extent.

I don't remember whether I studied this at the player level too. At the team level, the more effective (or ineffective) a team was on the PP, the more its estimated PP time will differ from the actual figure. (That is a generalization; we know nothing about the specific cases.) The same reasoning might or might not apply at the player level.
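One possible way to fold PP efficiency into a team-level estimate, sketched under loudly stated assumptions: it reuses the toy constant-scoring-rate model (the PP team scores at a constant rate, a goal ends a two-minute power play), which is not fitted to any data, and scales the league-average length by the ratio of the model's lengths for the team's PP% and the league's PP%. The league figures come from the 2010-11 row of the table; the team is hypothetical, and so are the function names. Whether the real team-level spread is larger or smaller than this model implies would have to be checked against seasons with known PP time.

```python
import math

def toy_pp_length(pp_pct, minutes=2.0):
    """Expected PP length (minutes) if the PP team scores at a constant rate
    and the first goal ends the power play. A toy model, not a fitted one."""
    if not 0.0 < pp_pct < 1.0:
        raise ValueError("pp_pct must be strictly between 0 and 1")
    rate = -math.log(1.0 - pp_pct) / minutes  # goals per minute
    return pp_pct / rate

def estimated_team_pp_time(pp_opps, league_length, team_pct, league_pct):
    """Estimate a team's total PP minutes.

    The flat estimate is pp_opps * league_length. This version scales the
    league length by the ratio of toy-model lengths for the team's PP% and
    the league's PP%, so efficient power plays are estimated as shorter.
    """
    factor = toy_pp_length(team_pct) / toy_pp_length(league_pct)
    return pp_opps * league_length * factor

# League figures from the 2010-11 row of the table; the team is hypothetical.
league_length, league_pct = 1.6452, 0.1802
opps, team_pct = 290, 0.24
flat = opps * league_length
adjusted = estimated_team_pp_time(opps, league_length, team_pct, league_pct)
print(f"flat: {flat:.1f} min, adjusted: {adjusted:.1f} min")
```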

Sorry for not being able to immediately answer your main questions.


By the way, this is a stats-heavy post, and probably a bit unappealing to the general reader here. Dealing with things like this can often be a bit boring (and very time-consuming) for me too. But I still think this history forum is a good place for it, as this is a very stats-oriented section. We use many statistical "components", and the better we can make each component, the better the components that rely on them will be.
That is interesting stuff. I figured that if any factor other than just randomness affected a team's average time per PP, it would be its PP efficiency.
