# Abrazolica

Counting runs in sequences

Using $k$ symbols, how many sequences of length $n$ have $r$ runs? To answer this question, start by letting $*$ represent any symbol. Then a sequence of length $n$ is just a string such as $*****\cdots *$. The runs can now be marked off by placing a bar $|$ where one run ends and another begins. To mark off $r$ runs we need $r-1$ bars which can be placed in $n-1$ positions. For example if $n=10$ and $r=3$ then one way to mark off 3 runs is

$**|*****|***$

The three runs have lengths 2, 5, and 3. In general, the number of ways to place the $r-1$ bars in $n-1$ positions is

$\binom{n-1}{r-1}$

Once the runs are marked off the symbol values can be assigned to each run. There are $k$ choices for the first run and $k-1$ choices for each subsequent run so the number of ways to assign symbols is

$k(k-1)^{r-1}$

The number of sequences of length $n$ on $k$ symbols that have $r$ runs is then the product of these last two expressions

$k(k-1)^{r-1}\binom{n-1}{r-1}$

For a given $n$ and $k$, the generating function for these numbers is

$g(z)=kz(1+(k-1)z)^{n-1}$

The coefficient of $z^r$ in the expansion of $g(z)$ is the expression given above for the number of sequences of length $n$ on $k$ symbols that have $r$ runs. If all symbols have probability $1/k$ then $g(z)$ can be turned into a probability generating function by dividing the entire expression by $k^n$. It can then be used to get the following expressions for the average and variance of the number of runs.

$\mu(n,k)=\frac{(k-1)n+1}{k}$

$\sigma^2=\frac{(k-1)(n-1)}{k^2}$

If the symbol probabilities are not equal you can still calculate average and variance but the process is more difficult. For the binary case, $k=2$, see our book Coin Tossing: The Hydrogen Atom of Probability.

© 2010-2020 Stefan Hollos and Richard Hollos 