Modeling Markets Using Regular Expressions

What follows is a very simple model for the ebb and flow of irrational exuberance that leads to market cycles. The model is a conceptual toy, something to play with. No proof or claim is made that the model has any use in modeling a real market. Maybe it's a starting point for further research, who knows?

Assume that every day the market closes either above or below its previous day's close. Label a close above as \(A\) and a close below as \(B\). The market's history over a period of \(N\) days can then be represented as a string of \(N\) \(A\)'s and \(B\)'s. Finding the probability of such a string is where regular expressions will come in.

The market is modeled as going through a series of \(m\) stages where \(m\) is some small integer. To be specific, let's say \(m=4\). The results can easily be generalized once the general pattern is clear.

In the first stage confidence is low and the market starts with a downward move so the first symbol is a \(B\). At this stage the probability of an upward move is \(p=1/4=0.25\) and the probability of a further downward move is \(1-p=3/4=0.75\). The market may manage one or two up days in this first stage but the probability is low.

Another downward move takes things to the second stage. Now confidence picks up a little, whether by talk of intervention or bargains to be had, and the probability of an upward move rises to \(p=1/2=0.5\). At this point the market is just as likely to go up as down. A few up days are followed by a down day as some jittery investors, not believing in an upward trend, cash out.

This takes things to the third stage where the probability of an up move goes to an exuberant \(p=3/4=0.75\). The market is now significantly more likely to go up than down and a string of several up days is not out of the question. At some point the spell is broken with a down move that takes the market to the fourth and final stage.

The last stage is essentially the same as the first. Investors are asking themselves: What was I thinking? Things are not rosy enough to justify the big upward move and a mood of pessimism sets in.
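The stage description above can be turned into a short simulation. This is only a sketch of one way to read the model: the function name `simulate_cycle` and the fixed seed are my own choices for illustration, and the three probabilities are the ones given for stages one to three.

```python
import random

# Up-move probabilities for stages 1, 2 and 3, as given in the text.
UP_PROBS = (0.25, 0.5, 0.75)

def simulate_cycle(rng, up_probs=UP_PROBS):
    """Return one cycle as a string of 'A' (up day) and 'B' (down day)."""
    days = ["B"]                    # the opening down move, probability 1
    for p in up_probs:
        while rng.random() < p:     # the stage continues while the market rises
            days.append("A")
        days.append("B")            # a down day ends the stage
    return "".join(days)

rng = random.Random(1)              # fixed seed, only so runs are repeatable
cycles = [simulate_cycle(rng) for _ in range(5)]
for c in cycles:
    print(len(c), c)
```

Every string it prints has exactly four \(B\)'s, and the runs of \(A\)'s tend to get longer toward the end of the cycle.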

Now what kind of string of \(A\)'s and \(B\)'s would such a model produce? The string would start with a \(B\), followed by zero or more \(A\)'s, followed by a \(B\), and so on to a fourth and final \(B\). In other words, it is a series of stretches of up days, tending to get longer from stage to stage, punctuated by single down days. The regular expression that matches such a string is \(BA^*BA^*BA^*B\). The reason for constructing a regular expression is that it makes it very easy to find a probability generating function for these strings.

The simplest probability generating function is for the probability that a complete cycle of 4 stages lasts exactly \(n\) days. Call this generating function \(G(z)\). To get \(G(z)\) replace both the \(A\)'s and \(B\)'s in the regular expression by the variable \(z\) and write the Kleene star expression \(z^*\) as the geometric series:

\[z^* = 1 + z + z^2 + z^3 + \cdots = \frac{1}{1-z}\]

The only thing left is to multiply the \(z\)'s by their correct probabilities. The first \(z\), corresponding to the first \(B\), has probability \(1\) so it is just left alone. The second \(z\), corresponding to the first \(A^*\) term, has probability \(1/4\) since this is the probability for a first stage up day. The third \(z\) corresponds to the second \(B\), which has probability \(3/4\). Continuing on with the rest of the terms gives the following expression for \(G(z)\):

\[G(z) = z \cdot \frac{1}{1-\frac{z}{4}} \cdot \frac{3z}{4} \cdot \frac{1}{1-\frac{z}{2}} \cdot \frac{z}{2} \cdot \frac{1}{1-\frac{3z}{4}} \cdot \frac{z}{4}\]

This simplifies to:

\[G(z) = \frac{3z^4}{(4-z)(2-z)(4-3z)}\]

The generating function contains all the information about this simple model. If \(G(z)\) is expanded as a power series then the coefficient of \(z^n\) in the expansion will be the probability that a complete cycle lasts exactly \(n\) days. You can get an equation for the probability by first doing a partial fraction expansion and then expanding each of the simple fractions separately. To find the average length of a cycle, take the first derivative of \(G(z)\) and set \(z=1\). This gives an average length of \(25/3\), or about 8.33 days, for this model.
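Both routes to the probabilities can be checked numerically. The sketch below assumes \(G(z)=3z^4/((4-z)(2-z)(4-3z))\), which is the form implied by the stage probabilities 1/4, 1/2 and 3/4. It expands the series with exact fractions, compares the result with the closed form that I get from a partial fraction expansion, and computes the mean from \(G'(1)\). The function names are made up for illustration.

```python
from fractions import Fraction

UP = (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4))   # stage up-probabilities

def cycle_length_pmf(n_max, up=UP):
    """Coefficients of z^0 .. z^n_max in G(z); needs n_max >= len(up) + 1."""
    coeffs = [Fraction(0)] * (n_max + 1)
    lead = Fraction(1)
    for p in up:
        lead *= 1 - p
    coeffs[len(up) + 1] = lead             # (3/32) z^4, the four B's
    for p in up:                           # multiply by 1 / (1 - p z)
        for n in range(1, n_max + 1):
            coeffs[n] += p * coeffs[n - 1]
    return coeffs

def closed_form(n):
    """P(cycle lasts n days), from partial fractions (derived for m = 4 only)."""
    if n < 4:
        return Fraction(0)
    k = n - 4
    return Fraction(3, 32) * (Fraction(1, 2) * Fraction(1, 4) ** k
                              - 4 * Fraction(1, 2) ** k
                              + Fraction(9, 2) * Fraction(3, 4) ** k)

pmf = cycle_length_pmf(60)
print(pmf[4], pmf[5])                      # 3/32 9/64
print(all(pmf[n] == closed_form(n) for n in range(61)))   # True

# Since G(1) = 1, G'(1) equals the derivative of log G at z = 1,
# which works out to 4 + sum of p / (1 - p) over the three stages.
mean = 4 + sum(p / (1 - p) for p in UP)
print(mean)                                # 25/3
```

Using `Fraction` keeps everything exact, so the comparison between the series and the closed form is an equality test rather than a floating point approximation.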

To get a probability generating function for only the number of up days, divide \(G(z)\) by \(z^4\), which removes the four \(B\)'s from the count. The constant \((3/4)(1/2)(1/4)=3/32\), which is the probability for the four \(B\)'s, has to stay in place so that the coefficients still sum to 1. To get a probability generating function for a given number of up days in the first, second and third stages, replace the \(z\)'s in the three \(A^*\) terms with different variables. For example use \(z_1\) for the first term, \(z_2\) for the second, and \(z_3\) for the third. Don't forget to multiply by the corresponding probabilities.
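The per-stage version is easy to check by hand because the three-variable generating function factors into one geometric series per stage. The sketch below reads the coefficient of \(z_1^{k_1}z_2^{k_2}z_3^{k_3}\) off directly and then sums over the stages to get the distribution of the total number of up days. It assumes the same stage probabilities as before, keeps the probability 3/32 of the four \(B\)'s in the result, and uses hypothetical function names.

```python
from fractions import Fraction

UP = (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4))   # stage up-probabilities

def joint_up_days(k1, k2, k3, up=UP):
    """P(k1, k2, k3 up days in stages 1, 2, 3 of one cycle)."""
    prob = Fraction(1)
    for p, k in zip(up, (k1, k2, k3)):
        prob *= p ** k * (1 - p)       # k up days, then the B that ends the stage
    return prob

def total_up_days(k, up=UP):
    """P(exactly k up days in the whole cycle), summed over the stage splits."""
    return sum(joint_up_days(k1, k2, k - k1 - k2, up)
               for k1 in range(k + 1)
               for k2 in range(k + 1 - k1))

print(joint_up_days(0, 0, 0))          # 3/32
print(joint_up_days(2, 0, 1))          # 9/2048
print(total_up_days(1))                # 9/64
```

The value of `total_up_days(k)` is the coefficient of \(z^{k+4}\) in \(G(z)\), since a cycle with \(k\) up days lasts \(k+4\) days.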

It is easy to extend this simple model to more than four stages, and the probabilities I used can easily be changed to something else. The model can also be extended to more complex patterns. If you can write a regular expression that matches the patterns, and matches each string in only one way, then finding a probability generating function is straightforward, if not always easy.
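As a sketch of the generalization, suppose a cycle has \(m\) down days and the \(m-1\) stages between them have up-probabilities \(p_1,\ldots,p_{m-1}\). Following the same recipe as for \(m=4\), the generating function would be \(z^m\prod(1-p_i)/\prod(1-p_iz)\), and its derivative at \(z=1\) gives a mean length of \(m+\sum p_i/(1-p_i)\). The helper names and the five-stage probabilities below are my own assumptions, chosen only for illustration.

```python
from fractions import Fraction

def cycle_regex(m):
    """Regular expression for one cycle with m down days."""
    return "B" + "A*B" * (m - 1)

def mean_cycle_length(up_probs):
    """Mean cycle length G'(1) for the given stage up-probabilities (each < 1)."""
    return len(up_probs) + 1 + sum(p / (1 - p) for p in up_probs)

four = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
print(cycle_regex(4), mean_cycle_length(four))       # BA*BA*BA*B 25/3

# A hypothetical five-stage variant with evenly spaced probabilities.
five = [Fraction(k, 5) for k in range(1, 5)]
print(cycle_regex(5), mean_cycle_length(five))       # BA*BA*BA*BA*B 137/12
```

The last stage dominates the mean: with an up-probability of 4/5 it contributes 4 expected up days on its own.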

It should be obvious that this can be applied to any process that generates a binary string. The same general idea can also be used to model a process that generates more than two symbols. Applications in DNA sequencing where you have four symbols come to mind.

For general information about generating functions see Herbert Wilf's generatingfunctionology. For specific information about generating functions related to binary strings see The Coin Toss: The Hydrogen Atom of Probability. For general information about regular expressions see Mastering Regular Expressions. There are many books on theoretical computer science that go into depth about regular expressions and regular languages.

© 2010-2012 Stefan Hollos and Richard Hollos
