Introducing a new stat: Location Adjusted Expected Goals Percentage
09-07-2013, 03:45 PM
Originally Posted by
Awesome work. As others have said, something like this has been a long time coming.
I was wondering how significant the differences between players really are (this has kind of already been brought up). You've got Boyle's lower bound at 0.493526, then Gallagher 0.000644 below him, and at the bottom of that list is Subban, who is 0.0524 below Boyle.
While the highest lower bound number could imply the player with the most "reliably" high "true" number, it doesn't change that Subban's "true" number could just as easily be the highest given how much their intervals overlap. It's essentially randomness in between.
I have very little Bayesian training, but matnor may have been on to something with that? The goal may simply be how confidently you can suggest a player is above average (your prior).
You may have acknowledged this and I missed it (though I could be seriously misstating some things as I'm tired and haven't brushed up on my econometrics in some time haha, I almost went on a semi-related confidence intervals rant, but realized I was fudging way too much in my response haha).
It may simply be the case that you'll never have a large enough sample to improve your accuracy. Perhaps look at players' multi-season samples? Baseball defensive stats have similar issues (check out UZR if you haven't already), and a common rule of thumb is that you need 3 seasons of data before you can infer anything.
Another thought is that you could try to identify players in "batches": which group of players can you confidently declare "elite" (say, the whole group you've posted so far), "good", "average", etc.? Perhaps I'm mislabeling your intentions here, as I doubt anyone would now be definitively saying Boyle > Gallagher > Clarkson... It just becomes a question of "usefulness" for evaluating an individual player when it's all essentially random.
I'm curious what kind of numbers you have for the worst in the league?
Anyway, again, great work.
Lower bounds of confidence intervals are just an easy way to sort the players that allowed me to give more credit to players that are trusted with more ice time (repeated performance on a larger sample size). I completely agree that you can't say that Boyle is definitively > Gallagher, but can you do that with any other statistic? Besides, you could easily go back to simply using GF%, or even GF differential, which is essentially the same as Corsi.
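To illustrate why sorting by the lower bound rewards larger samples, here is a minimal sketch with made-up numbers. It assumes a plain normal-approximation interval for a proportion, which is not necessarily how the intervals for this stat are computed, and the player labels and counts are hypothetical. Two players with the same 55% share end up ranked differently, even though their intervals overlap:

```python
import math

def wald_interval(successes, trials, z=1.96):
    """Normal-approximation interval for a proportion (an assumed method)."""
    p = successes / trials
    half_width = z * math.sqrt(p * (1 - p) / trials)
    return p - half_width, p + half_width

def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical players: identical 55% share, different sample sizes.
players = {
    "small_sample": (110, 200),
    "large_sample": (550, 1000),
}

intervals = {name: wald_interval(s, n) for name, (s, n) in players.items()}

# Sort by the lower bound, highest first.
ranked = sorted(intervals, key=lambda name: intervals[name][0], reverse=True)

for name in ranked:
    low, high = intervals[name]
    print(f"{name}: [{low:.4f}, {high:.4f}]")

print("overlap:", intervals_overlap(intervals["small_sample"],
                                    intervals["large_sample"]))
```

The larger sample gets the higher lower bound, so it sorts first, but the overlap check is a reminder that the ordering alone doesn't show one player is definitively better than the other.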
Find More Posts by Wesleyy