Home     Archive     Tags     About        RSS
Counting runs in sequences

Using \(k\) symbols, how many sequences of length \(n\) have \(r\) runs? To answer this question, start by letting \(*\) represent any symbol. Then a sequence of length \(n\) is just a string such as \(*****\cdots *\). The runs can now be marked off by placing a bar \(|\) where one run ends and another begins. To mark off \(r\) runs we need \(r-1\) bars which can be placed in \(n-1\) positions. For example if \(n=10\) and \(r=3\) then one way to mark off 3 runs is


The three runs have lengths 2, 5, and 3. In general, the number of ways to place the \(r-1\) bars in \(n-1\) positions is


Once the runs are marked off the symbol values can be assigned to each run. There are \(k\) choices for the first run and \(k-1\) choices for each subsequent run so the number of ways to assign symbols is


The number of sequences of length \(n\) on \(k\) symbols that have \(r\) runs is then the product of these last two expressions


For a given \(n\) and \(k\), the generating function for these numbers is


The coefficient of \(z^r\) in the expansion of \(g(z)\) is the expression given above for the number of sequences of length \(n\) on \(k\) symbols that have \(r\) runs. If all symbols have probability \(1/k\) then \(g(z)\) can be turned into a probability generating function by dividing the entire expression by \(k^n\). It can then be used to get the following expressions for the average and variance of the number of runs.



If the symbol probabilities are not equal you can still calculate average and variance but the process is more difficult. For the binary case, \(k=2\), see our book Coin Tossing: The Hydrogen Atom of Probability.

© 2010-2020 Stefan Hollos and Richard Hollos

submit to reddit   

blog comments powered by Disqus