Abrazolica
Home     Archive     Tags     About        RSS
Using \(k\) symbols, how many sequences of length \(n\) have \(r\) runs? To answer this question, start by letting \(*\) represent any symbol. Then a sequence of length \(n\) is just a string such as \(*****\cdots *\). The runs can now be marked off by placing a bar \(|\) where one run ends and another begins. To mark off \(r\) runs we need \(r-1\) bars which can be placed in \(n-1\) positions. For example if \(n=10\) and \(r=3\) then one way to mark off 3 runs is
\[**|*****|***\]
The three runs have lengths 2, 5, and 3. In general, the number of ways to place the \(r-1\) bars in \(n-1\) positions is
\[\binom{n-1}{r-1}\]
Once the runs are marked off the symbol values can be assigned to each run. There are \(k\) choices for the first run and \(k-1\) choices for each subsequent run so the number of ways to assign symbols is
\[k(k-1)^{r-1}\]
The number of sequences of length \(n\) on \(k\) symbols that have \(r\) runs is then the product of these last two expressions
\[k(k-1)^{r-1}\binom{n-1}{r-1}\]
For a given \(n\) and \(k\), the generating function for these numbers is
\[g(z)=kz(1+(k-1)z)^{n-1}\]
The coefficient of \(z^r\) in the expansion of \(g(z)\) is the expression given above for the number of sequences of length \(n\) on \(k\) symbols that have \(r\) runs. If all symbols have probability \(1/k\) then \(g(z)\) can be turned into a probability generating function by dividing the entire expression by \(k^n\). It can then be used to get the following expressions for the average and variance of the number of runs.
\[\mu(n,k)=\frac{(k-1)n+1}{k}\]
\[\sigma^2=\frac{(k-1)(n-1)}{k^2}\]
If the symbol probabilities are not equal you can still calculate average and variance but the process is more difficult. For the binary case, \(k=2\), see our book Coin Tossing: The Hydrogen Atom of Probability.
© 2010-2020 Stefan Hollos and Richard Hollos
Tweet   
blog comments powered by Disqus