HFBoards Did some statistical programming


# Did some statistical programming

07-14-2005, 02:38 AM
#16
King'sPawn
Enjoy the chaos

Join Date: Jul 2003
Posts: 10,596
vCash: 500
Seems like there's a flaw. Isn't the percentage of picking based upon the number of balls that are removed?

For example, let's say your team is Detroit. You have 1 ball out of 48. That's a 2.08% chance of getting the #1 pick. Now let's say Toronto gets the #1 pick. That means Detroit would have a 2.13% (1/47) chance of getting the second overall pick. However, if Buffalo gets the 1st overall pick, Detroit would have a 2.22% (1/45) chance.

Now let's say Toronto and Philadelphia get #1 and #2. That would mean the Red Wings have a 2.17% (1/46) chance of getting the third overall pick. However, if Buffalo and New York get the top 2 picks, the Red Wings would have a 2.38% (1/42) chance of getting third overall. If all the three-ball teams get picked first, that would give the Red Wings a 2.78% (1/36) chance of picking 5th, as opposed to a 2.27% (1/44) chance of picking 5th if four one-ball teams go first.

It's good to illustrate a point, but it's not exact.

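Each of the conditional figures in the post above is just the team's balls divided by the balls still in the hopper. A minimal Python sketch, assuming the 48-ball layout discussed in this thread (4 three-ball, 10 two-ball and 16 one-ball teams); `next_pick_chance` is a hypothetical helper name:

```python
TOTAL_BALLS = 48  # 4 three-ball + 10 two-ball + 16 one-ball teams

def next_pick_chance(my_balls, removed):
    """Chance that a team holding `my_balls` balls wins the next draw once
    `removed` balls belonging to other, already-picked teams are out."""
    return my_balls / (TOTAL_BALLS - removed)

# Detroit (one ball) in each scenario: (label, balls already removed)
scenarios = [
    ("#1, nothing removed", 0),
    ("#2 after a one-ball team", 1),
    ("#2 after a three-ball team", 3),
    ("#3 after two one-ball teams", 2),
    ("#3 after two three-ball teams", 6),
    ("#5 after four one-ball teams", 4),
    ("#5 after all four three-ball teams", 12),
]
for label, removed in scenarios:
    print(f"{label}: {next_pick_chance(1, removed):.2%}")
```

This prints 2.08%, 2.13%, 2.22%, 2.17%, 2.38%, 2.27% and 2.78% for the seven scenarios.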
07-14-2005, 06:27 AM
#17
Potted Plant
Registered User

Join Date: May 2003
Location: Tuscaloosa, AL
Posts: 858
vCash: 500
I took that into account in my program. The x array represents all of the balls in the hopper (ball 49 is just a placeholder). They're initialized to different values to represent different "kinds" of draft possibilities.

Teams with 3 draft balls are initialized to have their x values set to 5, 3, and 4. Five means that when that ball is picked, you have to register that one and the next two as having been picked. Three means that you register that one, the previous one, and the next one as having been picked. Four means you register that one and the previous two. (5 then represents the "first" ball, 3 represents the "second", and 4 represents the "third".) I realize the notation is confusing, and if I truly meant for it to be shared with others who would modify it for their own purposes, I would have gone back and rewritten it to be easier to follow.

To be complete, teams with two balls have their x values initialized to 1 and 2. One means you register that ball and the next one. Two means you register that ball and the previous one. One-ball teams have theirs set to 0, which means to just look at that ball. I register that a team has been picked by changing all of its x values to 6.

The variable "var" is the ball that is picked. After picking a ball, I go through and check whether it, or one of its companions, has been picked before. If it has, I just pick a new one.

Last edited by Potted Plant: 07-14-2005 at 06:44 AM.

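To make the x-array encoding concrete, here is a hypothetical Python reconstruction. It is not the original VB program: it uses 0-based indices (so there is no placeholder ball 49), assumes the three-ball teams come first, then the two-ball teams, then the one-ball teams, and the function names are illustrative only.

```python
# Hypothetical reconstruction of the x-array encoding (0-indexed).
# 5/3/4 = first/second/third ball of a three-ball team,
# 1/2   = first/second ball of a two-ball team,
# 0     = the only ball of a one-ball team,
# 6     = ball whose team has already been picked.
PICKED = 6
COMPANION_OFFSETS = {        # where the team's balls sit, relative to the drawn ball
    5: (0, 1, 2),
    3: (-1, 0, 1),
    4: (-2, -1, 0),
    1: (0, 1),
    2: (-1, 0),
    0: (0,),
}

def build_hopper(three=4, two=10, one=16):
    """Lay teams out in consecutive slots: three-ball, then two-ball, then one-ball."""
    return [5, 3, 4] * three + [1, 2] * two + [0] * one

def team_balls(x, ball):
    """Indices of every ball that belongs to the same team as `ball`."""
    return [ball + d for d in COMPANION_OFFSETS[x[ball]]]

def register_pick(x, ball):
    """Mark the drawn ball's whole team as picked.
    Returns False (a discarded draw) if that team was already picked."""
    if x[ball] == PICKED:
        return False
    for b in team_balls(x, ball):
        x[b] = PICKED
    return True

x = build_hopper()
print(len(x), team_balls(x, 4), team_balls(x, 13))  # 48 [3, 4, 5] [12, 13]
```

Because a team's balls are always marked together, checking only the drawn ball is enough to detect a repeat.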
07-14-2005, 06:43 AM
#18
Potted Plant
Registered User

Join Date: May 2003
Location: Tuscaloosa, AL
Posts: 858
vCash: 500
Oh, I think I understand your question now. You're pointing out that your chances of getting a certain pick depend on exactly who is picked in front of you. Well, taking that into account was actually the primary goal of my simulation. That's why I had the program simulate the draft 200,000 times. That way, you average out all the variation that comes from the differences you talk about. My program does not break it down by those variables; it averages them out.

If you want to figure out what happens at pick #2 assuming pick #1 goes to a certain team, it's pretty easy. There are 48 balls. If a 3-ball team gets #1, there are then 45 balls. The remaining 3-ball teams have a 3/45 chance of getting pick #2, the 2-ball teams have a 2/45 chance, and the 1-ball teams have a 1/45 chance. If it's a two-ball team or a one-ball team that gets #1, just add 1 or 2, respectively, to the denominators above, and you get the answer.

The program starts from time zero (now) and predicts your team's individual chances of getting whatever pick. Once the first ball is picked, my statistics won't help you anymore.

07-14-2005, 09:32 AM
#19
MaV
Registered User

Join Date: Jun 2002
Posts: 482
vCash: 500
Those numbers aren't really accurate yet, though, are they? I mean, shouldn't the three-ball holders simply have a 3/48 chance at the first pick? That's 6.250%, and your number is slightly less. The same goes for a three-ball holder at the second pick: 9/48*3/45 + 20/48*3/46 + 16/48*3/47 is 6.095%, and again your simulation gives slightly smaller chances. So maybe it would need to be run even more times to get the numbers right. It's a very complicated system anyway.

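MaV's second-pick figure is the law of total probability: condition on which kind of team won pick #1, then use the smaller hopper. A short Python check with exact fractions, assuming the 4/10/16 team layout used throughout this thread:

```python
from fractions import Fraction as F

# Layout: 4 three-ball, 10 two-ball and 16 one-ball teams (48 balls).
# Chance that one particular three-ball team wins pick #1:
p_first = F(3, 48)

# Pick #2: condition on who won pick #1. The *other* three-ball teams hold 9 balls,
# the two-ball teams 20 and the one-ball teams 16, leaving 45, 46 or 47 balls.
weights = {45: F(9, 48), 46: F(20, 48), 47: F(16, 48)}
p_second = sum(w * F(3, left) for left, w in weights.items())

# The weights already exclude the team's own 3 balls, i.e. they sum to 1 - p_first.
assert sum(weights.values()) == 1 - p_first

print(f"{float(p_first):.3%}")   # 6.250%
print(f"{float(p_second):.3%}")  # 6.095%
```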
07-14-2005, 09:44 AM
#20
MojoJojo
Registered User

Join Date: Jan 2003
Location: Philadelphia
Posts: 9,354
vCash: 500
I'm not sure you did this right, though I am a C programmer and can only generally follow what you did. Basically, the problem is that you need to account for the number of balls removed from the pool with each draft pick. In a multi-ball system this is difficult, because the number changes depending on who got what pick before the pick you are calculating. For the second pick, you need to add up the probabilities that the first pick was taken by a one-, two- or three-ball team. For the third pick, you need to add up the probabilities that between 2 and 6 balls were removed in the first two picks, etc.

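The bookkeeping described above can be done exactly by recursion: condition on which kind of team wins each earlier draw, remove its balls, and repeat. A sketch in Python, assuming the thread's 4/10/16 layout; `pick_prob` and `pick_distribution` are hypothetical names, and this is an alternative exact method, not a reconstruction of the posted VB program.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def pick_prob(m, a, b, c, k):
    """Exact probability that a tracked team holding m balls wins the k-th
    draw from now, when a three-ball, b two-ball and c one-ball *other*
    teams are still in the hopper. Winners' balls leave the hopper."""
    total = m + 3 * a + 2 * b + c
    if k == 1:
        return Fraction(m, total)
    p = Fraction(0)
    # Sum over which kind of team wins the next draw (3, 2 or 1 balls removed).
    if a:
        p += Fraction(3 * a, total) * pick_prob(m, a - 1, b, c, k - 1)
    if b:
        p += Fraction(2 * b, total) * pick_prob(m, a, b - 1, c, k - 1)
    if c:
        p += Fraction(c, total) * pick_prob(m, a, b, c - 1, k - 1)
    return p

def pick_distribution(m, three=4, two=10, one=16):
    """[P(pick 1), ..., P(pick n)] for one team holding m balls (m = 1, 2 or 3)."""
    others = [three, two, one]
    others[3 - m] -= 1          # remove the tracked team from its own group
    n = three + two + one
    return [pick_prob(m, *others, k) for k in range(1, n + 1)]

three_ball = pick_distribution(3)
print(float(three_ball[0]), float(three_ball[1]))  # pick #1, pick #2
print(float(sum(three_ball[:10])))                  # chance of a top-10 pick
```

Memoizing on the counts of remaining teams keeps the state space small (a few thousand states per team type), so the exact answer is cheap here; the cumulative sums also give figures such as the chance of a top-10 pick.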
07-14-2005, 11:09 AM
#21
King'sPawn
Enjoy the chaos

Join Date: Jul 2003
Posts: 10,596
vCash: 500
Thanks for clarifying, HRR. I misunderstood the premise of your program. So basically you ran the program x number of times, and you averaged out the results? Meaning that in one simulated draft, all of the one-ball teams were probably drafted before anyone else... but in another, all of the one-ball teams were drafted last? Your program averaged it out like that?

If that's what you did, that makes perfect sense. You were aiming more for a general likelihood of each spot as opposed to giving exact numbers.

07-14-2005, 11:14 AM
#22
NYRangers
Registered User

Join Date: Aug 2004
Posts: 2,853
vCash: 500
The 4 "bad" teams have a 25% chance of getting the top pick. The 10 "mediocre" teams have a 41.7% chance of getting the top pick. The 16 "good" teams have a 33.3% chance of getting the top pick. Crosby is probably going to a somewhat good team.

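Each group's share of the top pick is simply its balls over 48. A quick check in Python, assuming the "bad" teams hold three balls each, the "mediocre" teams two and the "good" teams one:

```python
from fractions import Fraction

# (number of teams, balls per team) for each group
groups = {"bad": (4, 3), "mediocre": (10, 2), "good": (16, 1)}
total_balls = sum(teams * balls for teams, balls in groups.values())  # 48

shares = {name: Fraction(teams * balls, total_balls)
          for name, (teams, balls) in groups.items()}
for name, share in shares.items():
    print(f"{name}: {share} = {float(share):.1%}")
```

This prints 1/4 = 25.0%, 5/12 = 41.7% and 1/3 = 33.3%.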
07-14-2005, 11:46 AM
#23
Potted Plant
Registered User

Join Date: May 2003
Location: Tuscaloosa, AL
Posts: 858
vCash: 500
Quote:
 Originally Posted by MaV Those numbers aren't really accurate yet, though, are they? I mean, shouldn't the three-ball holders simply have a 3/48 chance at the first pick? That's 6.250%, and your number is slightly less. The same goes for a three-ball holder at the second pick: 9/48*3/45 + 20/48*3/46 + 16/48*3/47 is 6.095%, and again your simulation gives slightly smaller chances. So maybe it would need to be run even more times to get the numbers right. It's a very complicated system anyway.
No, it's not perfect. The problem is that I only had the program run the draft 200,000 times. It's probably accurate to within 1-2%. It's not designed to get rock-solid numbers; it's designed to converge to the right answer, and it would take an infinite number of runs to get it perfect. I tried it with 1,000,000 drafts, but the computer I was on wouldn't run it. I'd get overflow errors.

To get better numbers, you can just open up an Excel spreadsheet, go to the VB editor and copy my program into it. Change the 200,000 number to 10 million and see if your machine will run it. It will output the absolute number of times a certain 3-ball team got each pick (I followed balls 1, 2, and 3 for this purpose), the number of times a certain 2-ball team got each pick (I followed balls 13 and 14 for this), and the number of times a certain 1-ball team got each pick (ball 30, I think). Just divide each result by 10,000,000 and multiply by 100 to get percentages. The results will be more accurate than what I got.
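The 1-2% estimate can be sanity-checked with the usual binomial standard error, sqrt(p(1 - p)/n), assuming each simulated draft is an independent trial. A minimal Python sketch using a three-ball team's exact first-pick chance of 3/48:

```python
import math

def standard_error(p, n):
    """Standard error of a frequency estimate of probability p from n
    independent simulated drafts (binomial approximation)."""
    return math.sqrt(p * (1 - p) / n)

p = 3 / 48  # a three-ball team's exact chance at pick #1
for n in (200_000, 10_000_000):
    se = standard_error(p, n)
    print(f"{n:>12,} runs: +/- {se:.4%} absolute, {se / p:.2%} relative")
```

At 200,000 runs one standard error is roughly 0.9% of the estimated value; at 10 million runs it is roughly 0.1%. The error shrinks only with the square root of the number of runs, so 50 times more drafts buys about 7 times more precision.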

07-14-2005, 11:52 AM
#24
Potted Plant
Registered User

Join Date: May 2003
Location: Tuscaloosa, AL
Posts: 858
vCash: 500
Quote:
 Originally Posted by MojoJojo I'm not sure you did this right, though I am a C programmer and can only generally follow what you did. Basically the problem is that you need to account for the number of balls removed from the pool with each draft pick. In a multiball system this is difficult, because the number changes depending on who got what pick before the pick you are calculating, ie, for the second pick you need to add up the probability that first pick was taken by a one, two and three ball team. The third pick you need to add up the probablities that between 2 and 6 balls were removed in the first two picks, etc.
It is accounted for. I assigned all balls to their teams. When one team's ball was picked, I registered all of that team's other balls as picked as well. The mechanics of the program simulate it like this:

Starting with pick #1:
1. Pick a ball.
2. Check that the ball hasn't been picked already.
3. If it hasn't, register that ball as picked.
4. Register each of the team's other balls as picked as well.
5. Put the ball back in the bin.
6. Check whether this is one of the three teams I'm following.
7. If so, record where that team is picking in the order.
8. Go to the next pick and start again.

The same ball could easily be picked twice, but repeat draws are just discarded. The pick is also discarded and drawn again if another of the same team's balls comes up.
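The steps above amount to rejection sampling: draw with replacement and throw away any ball whose team already has a pick. A minimal Python sketch of that idea, assuming the thread's 4/10/16 layout; it uses a ball-to-team lookup table instead of the x-array codes, and all names are hypothetical.

```python
import random
from collections import Counter

# 4 three-ball, 10 two-ball and 16 one-ball teams (48 balls)
BALLS_PER_TEAM = [3] * 4 + [2] * 10 + [1] * 16
# OWNER[i] = index of the team that holds ball i
OWNER = [team for team, n in enumerate(BALLS_PER_TEAM) for _ in range(n)]

def simulate_draft(rng):
    """One full lottery: draw balls with replacement, discarding any ball whose
    team already has a pick. Returns team indices in pick order."""
    picked = set()
    order = []
    while len(order) < len(BALLS_PER_TEAM):
        ball = rng.randrange(len(OWNER))   # steps 1 and 5: draw, ball goes back in
        team = OWNER[ball]
        if team in picked:                 # step 2: already picked, so redraw
            continue
        picked.add(team)                   # steps 3-4: register the whole team
        order.append(team)
    return order

def pick_counts(runs, tracked=(0, 4, 14), seed=2005):
    """Count how often each tracked team lands at each pick (1-based).
    Team 0 holds three balls, team 4 two balls and team 14 one ball."""
    rng = random.Random(seed)
    counts = {t: Counter() for t in tracked}
    for _ in range(runs):
        for pick, team in enumerate(simulate_draft(rng), start=1):
            if team in counts:
                counts[team][pick] += 1
    return counts

counts = pick_counts(20_000)
print(counts[0][1] / 20_000)  # should approach 3/48 as runs grow
```

Conditional on a draw being kept, it is uniform over the balls still in play, so this gives the same distribution as removing balls from the hopper. The cost is wasted draws late in the lottery, which makes `pick_counts(200_000)` slow in pure Python.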

It would be very hard to figure out with mathematical certainty what the chances are. It's easier this way, running a mock draft a couple hundred thousand times and just counting up the results.

07-14-2005, 12:39 PM
#25
MojoJojo
Registered User

Join Date: Jan 2003
Posts: 9,354
vCash: 500
Quote:
 Originally Posted by MaV The same goes for a three-ball holder at the second pick: 9/48*3/45 + 20/48*3/46 + 16/48*3/47 is 6.095%, and again your simulation gives slightly smaller chances.
You also need to multiply that total by the odds that the team did not get the first pick, which is 1 minus the odds they got it.

07-14-2005, 12:47 PM
#26
MojoJojo
Registered User

Join Date: Jan 2003
Posts: 9,354
vCash: 500
Quote:
 Originally Posted by HighlyRegardedRookie It would be very hard to figure out with mathematical certainty what the chances are. It's easier this way, running a mock draft a couple hundred thousand times and just counting up the results.
OK, I see how it works now. That's probably the easiest way, even if it's not an explicit solution.

07-14-2005, 12:56 PM
#27
ceber
Registered User

Join Date: Apr 2003
Location: Wyoming, MN
Posts: 3,500
vCash: 500
http://hfboards.com/showpost.php?p=3...&postcount=196
I think those were the results of a billion-run simulation. Should be pretty close to theoretical, from what I understand.

07-14-2005, 01:03 PM
#28
MojoJojo
Registered User

Join Date: Jan 2003
Location: Philadelphia
Posts: 9,354
vCash: 500
One thing that's interesting is that all teams have roughly the same odds of getting the 17th pick.

07-14-2005, 01:41 PM
#29
MaV
Registered User

Join Date: Jun 2002
Posts: 482
vCash: 500
Quote:
 Originally Posted by MojoJojo You also need to multiply that total by the odds that the team did not get the first pick, which is 1 minus the odds they got it.
No, no, there's no need for that. I mean, 9/48 is the chance that some other three-ball holder gets the 1st pick, 20/48 that a two-ball holder does, and 16/48 that a one-ball holder does. In all of those cases the team in question could not have gotten the pick, so that is already accounted for.

But anyway, from the math back to the original question: the Rangers have only a ~54% chance of getting a top-10 pick. So it's not guaranteed.

07-14-2005, 03:34 PM
#30
Patman
Registered User

Join Date: Feb 2004
Posts: 330
vCash: 500
Quote:
 Originally Posted by ceber http://hfboards.com/showpost.php?p=3...&postcount=196 I think those were the results of a billion-run simulation. Should be pretty close to theoretical, from what I understand.
Yeah, and if news sources want to use those numbers (or want to see the full simulation), I have no problem with it. I am fully confident that those numbers are accurate to the digits listed, but after that it's sketchy. Standard error calculations put the maximum standard error at 0.00158%... a band of 3 standard deviations in either direction is about 0.009% wide, as a ballpark for the deviation in each team's estimate when done team by team... Realize that my calculations are further averaged, since teams with the same weights are binned together, so they are even more accurate than the 0.009% figure... but not too much more accurate.
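The 0.00158% figure is consistent with the worst-case binomial standard error for a billion runs: p(1 - p) peaks at p = 0.5, giving sqrt(0.25 / 10^9). A minimal Python check, assuming independent runs and ignoring the binning of same-weight teams:

```python
import math

N = 1_000_000_000  # runs in the billion-run simulation

# p(1 - p) is largest at p = 0.5, so this is the worst-case standard error
max_se = math.sqrt(0.5 * 0.5 / N)
print(f"max SE = {100 * max_se:.5f}%")           # 0.00158%
print(f"3-sigma band = +/-{300 * max_se:.4f}%")
```

Three standard errors on each side come to about +/-0.0047%, a band roughly 0.0095% wide.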

If the media does want to use this, I'd just ask them to contact me at patrick.joyce@huskymail.uconn.edu. I admit this board gets rather close to public domain, but I'd love to see my name in print or on TV :p
