I finally wrapped my head around the Birthday Paradox today. You can go ahead and look that up on Wikipedia, but it will make your brain melt. I see all of those wacky mathy symbols and start wondering what's for dinner. I'll to explain in a way that any old moron can understand, because that's basically all I can figure out.
The basic question is "how many people do you have to get together before the odds are better than 50/50 that two of them share a birthday?" A common initial reaction is to say that it would be
183 (this would be more than one half of 365). If you think about that for a minute, you'll realize that
My first thought was that it was a problem of graphing all possible connections between the n number of people in the room. To simplify, I'll illustrate with four:
Using that logic, there are 4 people with 6 possible connections. So whatever number of people would make over
183 connections would be the correct number. The formula for that is recursive: edges(n) = (n-1) + edges(n-1). I'll save you the math, the answer is
Thinking that over again, I realized that for every person in the room who doesn't match anyone else, a birthdate is eliminated. We no longer have to take that date into consideration when checking anyone after that. So the odds of a match increase with every person who doesn't match. This is unlike, say, flipping a coin where the odds of a result are independent of the earlier results. It's more like picking the Ace of Spades from a deck of cards. With every card that isn't the Ace of Spades, there are fewer cards to choose from can be it.
The birthday paradox works the same way. For the first guy (Arthur), there is a 1/365 chance that he matches with the next person he talks to (Brad). Then there's a 1/365 chance of Arthur being a match with the guy after that (Carl), etc. So the odds of him matching someone else are (1/365)^n (where n is the total number of people in the room).
But when Arthur has spoken to everyone in the room, we know that his birthday (let's say Jan 1) is not shared with anyone in the room. Therefore we don't even think of Jan 1 as being an option for the next guy. So Brad will have a 1/364 chance of matching with Carl, a 1/364 chance of matching with Dave, a 1/364 chance of matching with Eric, etc. That is, his total odds of matching with someone are (1/364)^(n-1). (The n-1 is because we already know he doesn't match with Arthur. We don't want to count that combination twice.)
The odds that Arthur doesn't match anyone AND Brad doesn't match anyone are calculated by multiplying the two sets of odds together: (1/365)^n * (1/364)^(n-1). This continues for each person in the room. Subtract from n and multiply by the odds of the previous guys. The formula eventually works out to:
(1/(365-0))^n * (1/(365-1))^(n-1) * (1/(365-2))^(n-2) ... (1/365-(step_number-1))^1
If that looks like the unreadable stepchild of Lisp and Perl, I also wrote a nice succinct Ruby scipt to sum it up:
def birthday_paradox(people, days)
odds = 1.0
people.downto(1) do | i |
((days.to_f - 1) / days.to_f) ** (i - 1)
odds *= odds_of_nonmatch_this_round
days -= 1
return 1 - odds
Keep running that until you get a number over 0.5.
*SPOILER* 23. Now you are enlightened! Hopefully. Otherwise, it's probably because I explained it badly. You can have your money back on this post.
Edit (08/28/09 11:30am): I fixed a couple of errors. Notes to self: Don't do long strings of addition in your head. You will get them wrong. Also, in a math post about statistics, don't round numbers to the nearest ten. Half of 365 is not 180.