A Derivation of Wallis's Formula for Pi

For \(\pi\) day plus one, here is a derivation of Wallis's formula for \(\pi\) based on probability distributions. Start with the binomial distribution for a fair coin toss. The probability of getting \(k\) heads in \(n\) tosses is

\[P(k)=\binom{n}{k}\frac{1}{2^n}\]
Assume an even number of tosses, \(n=2m\). Then the distribution has a maximum at \(k=m\) of

\[P(m)=\binom{2m}{m}\frac{1}{2^{2m}}\]
For large \(n\), the binomial distribution can be closely approximated by a normal probability density function given by

\[f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-(x-\mu)^2/2\sigma^2}\]
where \(\mu=n/2=m\) and \(\sigma^2=n/4=m/2\). The maximum at \(x=\mu\) is

\[f(\mu)=\frac{1}{\sqrt{2\pi\sigma^2}}=\frac{1}{\sqrt{\pi m}}\]
Comparing this with the maximum for the binomial distribution and taking the limit as \(m\to\infty\), you get

\[\lim_{m\to\infty}\binom{2m}{m}\frac{1}{2^{2m}}\sqrt{\pi m}=1\]
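As a quick numerical check (a short Python sketch of my own, not part of the original derivation), the product \(\binom{2m}{m}2^{-2m}\sqrt{\pi m}\) does indeed approach \(1\):

```python
import math

# Ratio of the binomial maximum to the normal maximum:
# binom(2m, m) * 2^(-2m) * sqrt(pi * m) should approach 1 as m grows
for m in (10, 100, 1000):
    ratio = math.comb(2 * m, m) / 4**m * math.sqrt(math.pi * m)
    print(m, ratio)
```

The ratio approaches \(1\) from below, with an error of roughly \(1/(8m)\).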
Now we just need the following double factorial identities

\[(2m)! = (2m)!!\,(2m-1)!!\]

\[(2m)!! = 2^m\, m!\]
Then we can write

\[\binom{2m}{m}\frac{1}{2^{2m}}=\frac{(2m)!}{m!\,m!\,2^{2m}}=\frac{(2m-1)!!}{(2m)!!}\]
and \(\pi\) can be expressed as

\[\pi=\lim_{m\to\infty}\frac{1}{m}\left(\frac{(2m)!!}{(2m-1)!!}\right)^2\]
or in product form as

\[\pi=\lim_{m\to\infty}\frac{1}{m}\prod_{k=1}^{m}\frac{(2k)^2}{(2k-1)^2}\]
This is almost Wallis's formula. To get there, note the telescoping product

\[\prod_{k=1}^{m}\frac{2k+1}{2k-1}=2m+1\]
Substitute this into the above formula, take the limit and you get Wallis's formula

\[\frac{\pi}{2}=\prod_{k=1}^{\infty}\frac{(2k)^2}{(2k-1)(2k+1)}=\frac{2}{1}\cdot\frac{2}{3}\cdot\frac{4}{3}\cdot\frac{4}{5}\cdot\frac{6}{5}\cdot\frac{6}{7}\cdots\]
Is that cool or what?
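To watch the convergence, here is a short Python sketch (my own illustration; `wallis` is a hypothetical helper name) that evaluates partial Wallis products:

```python
import math

def wallis(m):
    """Partial Wallis product: 2 * prod_{k=1..m} (2k)^2 / ((2k-1)(2k+1))."""
    prod = 1.0
    for k in range(1, m + 1):
        prod *= (2 * k) ** 2 / ((2 * k - 1) * (2 * k + 1))
    return 2 * prod

for m in (10, 1000, 100000):
    print(m, wallis(m), math.pi)
```

The partial products approach \(\pi\) slowly, with an error on the order of \(1/m\).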

Probabilities in Powerball

The powerball jackpot is now over a billion dollars. The idea of winning that kind of money is too crazy for anyone to ignore, so let's do a little analysis of the game.

To play the game, you fill out a ticket with 5 numbers ranging from 1 to 69, no duplicates allowed, and one number from 1 to 26. If you're lazy you can let the computer randomly pick the numbers for you. In any case, each ticket will cost you 2 dollars and give you one chance to win one of nine prizes that range from 4 dollars up to the grand prize.

How many possible tickets are there? You can choose 5 numbers in \(69\cdot 68\cdot 67\cdot 66\cdot 65 = 69!/64! = 1,348,621,560\) different ways but the order of the numbers doesn't matter so this has to be divided by \(5!=120\), giving

\[\frac{69!}{5! 64!} = \binom{69}{5} = 11,238,513\]

for the number of ways to choose 5 different numbers from 1 to 69 where the order doesn't matter. Note that I used binomial coefficient notation

\[\binom{n}{k}=\frac{n!}{k! (n-k)!}\]

which counts the number of ways \(k\) objects can be chosen from \(n\) when the order doesn't matter.

To complete the ticket you still have to choose one number from 1 to 26, which can obviously be done in 26 different ways. The total number of possible tickets is then

\[26\binom{69}{5} = 292,201,338\]

That's over 292 million tickets and only one of them will get you the grand prize. These are very long odds but hey, it's only 2 dollars for a nonzero chance at becoming a billionaire. Who can resist?
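Both counts are easy to verify with Python's standard library (a throwaway check of mine, not from the original post):

```python
import math

white = math.comb(69, 5)   # ways to choose the 5 white numbers, order ignored
total = 26 * white         # times the 26 possible powerball numbers
print(white)   # 11238513
print(total)   # 292201338
```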

What about the other prizes? The table below shows all the prize amounts in dollars and the number of matches needed to win. The 5 numbers are randomly chosen by drawing balls from a drum containing 69 white balls. The single number, called the powerball, is randomly chosen from a drum containing 26 red balls. In the table, the \(wb\) column is the number of white ball matches and the \(pb\) column indicates the powerball match.

Prize        \(wb\)  \(pb\)
Grand Prize    5       1
1,000,000      5       0
50,000         4       1
100            4       0
100            3       1
7              3       0
7              2       1
4              1       1
4              0       1

Matching \(k\) white balls means matching \(k\) of the 5 drawn balls and matching \(5-k\) of the other 64 balls. The number of ways this can happen is

\[\binom{5}{k}\binom{64}{5-k}\]
To get the probability, divide this by the number of ways 5 white balls can be drawn from 69.

\[P(wb=k)=\frac{\binom{5}{k}\binom{64}{5-k}}{\binom{69}{5}}\]
If you're familiar with discrete probability distributions then you may recognize this as a hypergeometric distribution. In general, a hypergeometric distribution applies when there is a population of size \(N\) that contains a sub-population of size \(K\). The elements of the sub-population are called successes and we want to know the probability of getting \(k\) successes when sampling \(n\) elements from the population without replacement. That probability is given by

\[P(k)=\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}\]
For powerball, set \(N=69\), \(K=5\), and \(n=5\) to get the above equation for \(P(wb=k)\). The mean of a hypergeometric distribution is given by

\[\overline{k}=\frac{nK}{N}\]

which for powerball is \(\overline{k}=\frac{25}{69}\approx 0.3623\). If you play a lot of powerball, this is the average number of white balls you will match.

We still need the probability of matching the powerball. There is only one way to match and there are 26 powerballs, so the probability of matching is \(p=1/26\). The probability of not matching is \(1-p=25/26\). Therefore

\[P(pb=1)=\frac{1}{26} \qquad P(pb=0)=\frac{25}{26}\]
Since the white ball and powerball draws are independent, the prize probabilities can be calculated from the formula

\[P(wb=k,\,pb=j)=P(wb=k)\,P(pb=j)\]
The probabilities are shown in the following table.

\(wb\)  \(pb\)  Probability
  5      1      3.4222978130237e-9
  5      0      8.5557445325593e-8
  4      1      1.0951353001676e-6
  4      0      2.7378382504190e-5
  3      1      6.8993523910558e-5
  3      0      0.0017248380977639
  2      1      0.0014258661608182
  1      1      0.0108722294762387
  0      1      0.0260933507429730
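The whole table can be reproduced in a few lines of Python (a sketch of mine; `p_white` is an illustrative name):

```python
import math

def p_white(k):
    """Hypergeometric probability of matching k of the 5 drawn white balls."""
    return math.comb(5, k) * math.comb(64, 5 - k) / math.comb(69, 5)

p_pb = 1 / 26  # probability of matching the powerball

# (wb, pb) pairs from the prize table
for wb, pb in [(5, 1), (5, 0), (4, 1), (4, 0), (3, 1),
               (3, 0), (2, 1), (1, 1), (0, 1)]:
    p = p_white(wb) * (p_pb if pb else 1 - p_pb)
    print(wb, pb, p)
```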

All these probabilities are very low but there is a way to play that will guarantee you win at least something. If you plan to buy at least 26 tickets then be sure to play all 26 of the powerball numbers. This will guarantee a win of at least 4 dollars so you essentially get 26 tickets for the price of 24.

The most likely result is that you match no white balls and no powerball. The probability of this happening is \(0.6523337685743246\). If you play a lot of powerball then over \(65\) percent of the time you will match nothing. Note also that matching 2 white balls and no powerball or 1 white ball and no powerball gets you nothing. The probabilities for these matches are \(0.03564665402045489\) and \(0.2718057369059686\) respectively.

If powerball had a constant grand prize, represented by the variable \(gp\), and you played it repeatedly, how much could you expect to win on average per game? This is called the expected return. To calculate it, multiply each prize value by its probability, take the sum, and subtract the 2 dollar ticket price, which you pay whether you win or not. The result is

\[E=\frac{gp}{292201338}-1.6801313\]
For a grand prize less than \(490,936,628\) the expected return is negative. If the grand prize is larger than \(491\) million dollars then the expected return becomes positive. Of course the grand prize fluctuates in value and you can't play often enough for this expectation calculation to make any sense because most of the probabilities are so small. Still I use it as a rule of thumb for when to play. If the grand prize gets above \(491\) million, I buy a ticket, just one, otherwise I leave it alone.
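The breakeven figure can be checked with a short Python sketch (my own; the variable names are illustrative):

```python
import math

def p_white(k):
    """Hypergeometric probability of matching k of the 5 drawn white balls."""
    return math.comb(5, k) * math.comb(64, 5 - k) / math.comb(69, 5)

# (prize value, white matches, powerball matched) for the eight fixed prizes
prizes = [(1_000_000, 5, 0), (50_000, 4, 1), (100, 4, 0), (100, 3, 1),
          (7, 3, 0), (7, 2, 1), (4, 1, 1), (4, 0, 1)]

# expected winnings per ticket from the fixed (non-jackpot) prizes
fixed = sum(v * p_white(wb) * (1 / 26 if pb else 25 / 26)
            for v, wb, pb in prizes)

total = 26 * math.comb(69, 5)   # 292,201,338 possible tickets

# E = gp/total + fixed - 2 equals zero at the breakeven grand prize
breakeven = (2 - fixed) * total
print(breakeven)   # 490936628, up to floating point rounding
```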

Tattoos and the Principle of Maximum Entropy

The following is a simple example of the principle of maximum entropy. It is an excerpt from our new book: Information Theory: A Concise Introduction. It shows how to find a probability distribution that is consistent with the known facts. The correct distribution maximizes the entropy and introduces no additional assumptions.

Sparky is a tattoo artist who offers three different tattoos priced at 8, 12, and 16 dollars. At the end of the week he knows how many tattoos he did in total and how much money he made but he forgot to keep track of how many of the three different tattoos were sold. He asks Spike, his mathematician friend, to help him figure it out. Taking the total amount Sparky made, \(A\), and dividing by the number of tattoos, \(N\), gives Spike the average cost of a tattoo, \(a=A/N\). Letting \(p_1\), \(p_2\), and \(p_3\) be the probabilities of the 8, 12, and 16 dollar tattoos respectively, he can then set up the following equation for the average cost of a tattoo.

\[8p_1 + 12p_2 + 16p_3 = a\]

He gets another equation from the fact that the probabilities must sum to \(1\).

\[p_1 + p_2 + p_3 = 1\]

Now Spike has two equations with three unknowns. There are many possible solutions. How can he find the correct one? He decides to use the distribution that maximizes the entropy which is given by

\[H = -p_1\log(p_1)-p_2\log(p_2)-p_3\log(p_3)\]

Using the two previous equations he can write \(p_2\) and \(p_3\) as follows

\[p_2 = 4 - \frac{a}{4} - 2 p_{1}\] \[p_3 = \frac{a}{4} - 3 + p_{1}\]

Substituting these into the entropy equation gives him an expression for the entropy in terms of the probability \(p_1\). Next he finds the maximum of \(H(p_1)\) by taking the derivative with respect to \(p_1\), setting the result equal to zero and solving for \(p_1\). Checking to make sure he has a maximum and not a minimum, he gets the following expression for the value of \(p_1\) that maximizes the entropy.

\[p_1(a) = \frac{52-3a-\sqrt{64-3(a-12)^2}}{24}\]

The value of \(a\) must be in the range \([8,16]\). At \(a=8\) all the tattoos must have been the 8 dollar tattoo and at \(a=16\) they must all have been the 16 dollar tattoo. Checking, he gets \(p_1(8)=1\) and \(p_1(16)=0\), which is correct. A plot of the three probabilities as a function of \(a\) is shown in the figure below.

The constraints \(p_1(8)=1\) and \(p_1(16)=0\) could have been inferred without using the maximum entropy principle, so a reasonable guess is the linear interpolation \(p_{1}(a)=(16-a)/8\). For \(a=12\), this would give the probabilities \(p_{1}(12)=1/2\), \(p_{2}(12)=0\), and \(p_{3}(12)=1/2\), with entropy \(H=1\). Maximum entropy instead gives \(p_{1}(12)=p_{2}(12)=p_{3}(12)=1/3\), with entropy \(H=\log(3)=1.5849625\ldots\). The lower entropy of the linear guess reflects the extra assumption it sneaks in, namely that no 12 dollar tattoos were sold.
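Spike's solution is easy to check numerically. The following Python sketch (my own, with illustrative function names) evaluates the maximum entropy probabilities and their entropy:

```python
import math

def p1(a):
    """Maximum entropy value of p_1 as a function of the average price a."""
    return (52 - 3 * a - math.sqrt(64 - 3 * (a - 12) ** 2)) / 24

def probs(a):
    q1 = p1(a)
    q2 = 4 - a / 4 - 2 * q1   # from the two constraint equations
    q3 = a / 4 - 3 + q1
    return q1, q2, q3

def entropy(ps):
    """Shannon entropy in bits, skipping zero probabilities."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

print(probs(8))            # (1.0, 0.0, 0.0)
print(probs(12))           # approximately (1/3, 1/3, 1/3)
print(entropy(probs(12)))  # log2(3) = 1.5849625...
```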

Calculating Logarithms With Continued Fractions

This is a note on how to calculate logarithms in terms of continued fractions. The number \(x=\log_b{a}\) is the log of \(a\) to the base \(b\). It solves the equation \(a=b^x\). To find \(x\), we start by finding the integer \(n_0 \ge 0\) such that

\[b^{n_0} < a < b^{n_0+1}\]

Given \(n_0\), we can write \(a\) as \(a=b^{n_0+1/x_1}\) for some yet to be determined number \(1/x_1 < 1\). Our logarithm is then \(x=n_0+1/x_1\) and the problem is to find \(x_1\). We get an equation for \(x_1\) by defining \(b_1=a/b^{n_0}=b^{1/x_1}\) or \(b=b_1^{x_1}\). This shows that \(x_1\) is the log of \(b\) to the base \(b_1\). As above, we next find the integer \(n_1 \ge 0\) such that

\[b_1^{n_1} < b < b_1^{n_1+1}\]

We can then write \(b\) as \(b=b_1^{n_1+1/x_2}\) for a yet to be determined number \(1/x_2 < 1\). So we have \(x_1=n_1+1/x_2\) and the same procedure repeats to find \(x_2\), \(x_3\), and so on. These terms are related as follows

\[x=n_0+1/x_1\] \[x_1=n_1+1/x_2\] \[x_2=n_2+1/x_3\] \[\cdots\]

In continued fraction form \(x\) is then

\[x = n_0 + \cfrac{1}{n_1 + \cfrac{1}{n_2 + \cfrac{1}{n_3 + \cdots}}}\]

What we have described so far is basically Shanks' algorithm for calculating logarithms. For more information on this algorithm see the following references.

"A Logarithm Algorithm", Daniel Shanks, Mathematical Tables and Other Aids to Computation, Vol. 8, No. 46 (April 1954), pp. 60-64

"On Shanks' Algorithm for Computing the Continued Fraction of \(\log_b N\)", Terence Jackson and Keith Matthews, Journal of Integer Sequences, Vol. 5, Issue 2 (2002)

One way to improve the algorithm is to use the following approximation for \(x_i\)

\[x_i \approx \frac{(b_{i-1}-1)(b_i+1)}{(b_{i-1}+1)(b_i-1)}\]
This is an excellent approximation for \(x_i=\log_{b_i}{b_{i-1}}\) in the range \(1\le b_{i-1}\le b_i\). At the endpoints of the range the approximation gives the correct answers \(x_i=0\) and \(x_i=1\). The figure below shows \(x=\log_2{b}\) and the approximation \(x=3\frac{b-1}{b+1}\) for \(1\le b\le 2\). The approximation is excellent near the endpoints of the range and is worse near the center.

It is easy to see that the \(b_i\) terms in the above algorithm approach \(1\) for increasing \(i\), so the approximation for \(x_i\) will also improve. It is always possible to get \(n_i\) by taking the integer part of \(x_i\), but we can also get \(n_{i+1}\) and more terms by doing a continued fraction expansion of \(x_i\). For example we have

\[x_i = n_i + \cfrac{1}{x_{i+1}} = n_i + \cfrac{1}{n_{i+1} + \cfrac{1}{x_{i+2}}}\]
With \(b_{i-1}\approx 1\) you can get \(n_{i+2}\), \(n_{i+3}\) and more terms from \(x_i\). If the goal is to calculate the logarithm and not the continued fraction then the approximation for \(x_i\) can be used as the last term in the continued fraction for an excellent and fast approximation of the logarithm.
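The procedure is short enough to sketch in Python. This is my own illustration of the algorithm described above (`log_cf` and `cf_value` are hypothetical names, not from Shanks' paper):

```python
def log_cf(a, b, depth=8):
    """Continued fraction terms n0, n1, ... of log_b(a), for a, b > 1."""
    ns = []
    for _ in range(depth):
        # find the integer n with b^n <= a < b^(n+1)
        n = 0
        while b ** (n + 1) <= a:
            n += 1
        ns.append(n)
        r = a / b ** n        # this is the next base b_i
        if r == 1.0:          # a was an exact power of b
            break
        a, b = b, r           # next stage solves b_old = r^x
    return ns

def cf_value(ns):
    """Evaluate n0 + 1/(n1 + 1/(n2 + ...)) from the term list."""
    x = float(ns[-1])
    for n in reversed(ns[:-1]):
        x = n + 1 / x
    return x

print(log_cf(8, 2))             # [3], since 8 = 2^3
print(log_cf(10, 2))            # first terms of log2(10) = 3.3219...
print(cf_value(log_cf(10, 2)))  # close to math.log2(10)
```

Floating point error accumulates in the later terms, which is where the rational approximation for \(x_i\) is useful as a final correction.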

© 2010-2016 Stefan Hollos and Richard Hollos