Abrazolica


Home     Archive     Tags     About        RSS
Counting runs in sequences

Using \(k\) symbols, how many sequences of length \(n\) have \(r\) runs? To answer this question, start by letting \(*\) represent any symbol. Then a sequence of length \(n\) is just a string such as \(*****\cdots *\). The runs can now be marked off by placing a bar \(|\) where one run ends and another begins. To mark off \(r\) runs we need \(r-1\) bars which can be placed in \(n-1\) positions. For example if \(n=10\) and \(r=3\) then one way to mark off 3 runs is

\[**|*****|***\]

The three runs have lengths 2, 5, and 3. In general, the number of ways to place the \(r-1\) bars in \(n-1\) positions is

\[\binom{n-1}{r-1}\]

Once the runs are marked off the symbol values can be assigned to each run. There are \(k\) choices for the first run and \(k-1\) choices for each subsequent run so the number of ways to assign symbols is

\[k(k-1)^{r-1}\]

The number of sequences of length \(n\) on \(k\) symbols that have \(r\) runs is then the product of these last two expressions

\[k(k-1)^{r-1}\binom{n-1}{r-1}\]

For a given \(n\) and \(k\), the generating function for these numbers is

\[g(z)=kz(1+(k-1)z)^{n-1}\]

The coefficient of \(z^r\) in the expansion of \(g(z)\) is the expression given above for the number of sequences of length \(n\) on \(k\) symbols that have \(r\) runs. If all symbols have probability \(1/k\) then \(g(z)\) can be turned into a probability generating function by dividing the entire expression by \(k^n\). It can then be used to get the following expressions for the average and variance of the number of runs.

\[\mu(n,k)=\frac{(k-1)n+1}{k}\]

\[\sigma^2=\frac{(k-1)(n-1)}{k^2}\]

If the symbol probabilities are not equal you can still calculate average and variance but the process is more difficult. For the binary case, \(k=2\), see our book Coin Tossing: The Hydrogen Atom of Probability.


© 2010-2020 Stefan Hollos and Richard Hollos

submit to reddit   

blog comments powered by Disqus