If you really want to be good at solving probability problems then you need at least a basic understanding of combinatorics. This is especially true for discrete probability problems where an insightful solution often involves some kind of combinatorial analysis. The good news is that solving combinatorics problems can be great fun and a good way to sharpen your general problem solving skills. This post discusses a way to categorize a number of combinatorics problems. The categorization leads to a little puzzle that as far as I know has no explanation.

A large number of combinatorial problems are equivalent to counting the ways balls can be put into boxes. The problem comes in four flavors.

Putting numbered balls into numbered boxes.

Putting identical balls into numbered boxes.

Putting numbered balls into identical boxes.

Putting identical balls into identical boxes.

Numbered balls or boxes is just a way of saying that they are distinguishable. We could also call them uniquely colored or marked in some way to make them distinguishable. Identical means there is no way to tell one from the other.

To simplify things let the number of balls and boxes both equal \(n\) and assume there is no restriction as to how many balls can go in a box. The formulas for putting \(n\) balls into \(n\) boxes, corresponding to the above four flavors, are then:

\(n^n\)

\(C(n)=\binom{2n-1}{n-1}\)

\(B(n)=\sum_{k=1}^nS(n,k)\)

\(p(n)\)

It is interesting that as you go down the list things become harder to calculate. The first two formulas are easy enough to calculate. The third formula calculates the number \(B(n)\) which is called a Bell number. It is equal to the number of ways you can partition a set of size \(n\). This is equivalent to the number of ways you can put \(n\) numbered balls into \(n\) identical boxes. The terms \(S(n,k)\) in the sum are called Stirling numbers of the second kind. They count the number of ways to partition a set of size \(n\) into \(k\) nonempty subsets.

The last equation is simply given as the number \(p(n)\) which is called an integer partition number because it is equal to the number of ways you can write \(n\) as a sum of positive integers. For example \(n=4\) can be written in 5 ways:

- 4
- 3 + 1
- 2 + 2
- 2 + 1 + 1
- 1 + 1 + 1 + 1

Clearly these are also the ways you can put 4 identical balls into 4 identical boxes. In the first partition all 4 balls are in the same box. In the second partition 3 balls are in one box and 1 ball is in another box, and so on. I did not give a formula for \(p(n)\) because there is no general formula. There are asymptotic formulas for \(p(n)\) but the fact is that \(p(n)\) is intrinsically harder to calculate than the other numbers.

Below is a table of the last three functions for \(n\) from 1 to 10. I left out \(n^n\) since it gets very large very quickly.

n | p(n) | B(n) | C(n) | ||||||
---|---|---|---|---|---|---|---|---|---|

1 | 1 | 1 | 1 | ||||||

2 | 2 | 2 | 3 | ||||||

3 | 3 | 5 | 10 | ||||||

4 | 5 | 15 | 35 | ||||||

5 | 7 | 52 | 126 | ||||||

6 | 11 | 203 | 462 | ||||||

7 | 15 | 877 | 1716 | ||||||

8 | 22 | 4140 | 6435 | ||||||

9 | 30 | 21147 | 24310 | ||||||

10 | 42 | 115975 | 92378 |

There is a small puzzle in these numbers. You can see that for \(n< 10\) we have \(B(n)< C(n)\) and for \(n\ge 10\) we have \(B(n) >C(n)\) (the condition holds beyond what is shown in the table). The question is why does this switch occur at \(n=10\). Is there a combinatorial explanation? It seems there should be a kind of symmetry or relationship between putting identical balls into numbered boxes and putting numbered balls into identical boxes. There must be a way to construct a map between these two kinds of distributions that would explain the numbers.

One way to relate the distributions is through code words. A \(C(n)\) distribution can be described by a string of occupancy numbers. These are numbers that count how many balls are in each box. For example with \(n=2\) you can have 2 balls in the first box, or 2 in the second, or 1 in each. The code words for these distributions are 20, 02, and 11. A \(B(n)\) distribution can be encoded using code words that indicate which elements are in the same subset of the partition. For example with \(n=3\) and the set \(S=(a,b,c)\) the code words are: 000, 001, 010, 011, 012, corresponding to the partitions: \((a,b,c)\), \((a,b)(c)\), \((a,c)(b)\), \((a)(b,c)\), \((a)(b)(c)\). You can map a \(C(n)\) code word to a \(B(n)\) code word by replacing the first number in the \(C(n)\) word, and all its occurrences, with 0, then take the next of the original numbers and replace it and all its occurrences with 1 and so on until all the numbers have been replaced. The result will be a \(B(n)\) code word. Doing this for \(n=3\) gives the following:

C(3) | B(3) |
---|---|

111 | 000 |

003 | 001 |

030 | 010 |

300 | 011 |

210 | 012 |

120 | 012 |

201 | 012 |

021 | 012 |

102 | 012 |

012 | 012 |

There are 10 \(C(3)\) code words with 4 of them mapping onto 4 unique \(B(3)\) code words and 6 of them mapping onto 1 \(B(3)\) code word. All 5 of the \(B(3)\) code words are accounted for by one or more of the 10 \(C(3)\) code words. With \(n=4\) there are 5 \(B(4)\) words with unique \(C(4)\) words, 3 of them with 2 \(C(4)\) words each, and 6 of them with 4 \(C(4)\) words each. The \(C(4)\) code words account for 14 of the 15 \(B(4)\) code words. The one unaccounted \(B(4)\) code word is 0123 which would require a \(C(4)\) code word with 4 unique occupancy numbers which is impossible. The four smallest unique occupancy numbers are 0 1 2 3 and they sum to 6.

One way to solve the puzzle then is to explain how the above mapping works for all \(n\). How many \(C(n)\) code words map to a given \(B(n)\) code word? How many \(B(n)\) code words have no corresponding \(C(n)\) code word? The answers may be interesting.

If you're new to combinatorics here is a short introduction.

© 2010-2012 Stefan Hollos and Richard Hollos

blog comments powered by Disqus