Markov Chain

Prepared by Bülent Özel, Istanbul Bilgi University

Table of Contents


Introduction

A Markov chain, named in honor of Russian mathematician Andrei Markov, is a stochastic process. Here, we consider the general notion of a discrete time or discrete space Markov chain,  excluding the discussion of continous time Markov chains.  This discrete time stochastic process Xn,  n = 0,1,2,... takes  on a finite or countable number of possible values. This set of possible values will be denoted by the set of nonnegative integers 0,1,2,3,... . If Xn = i, then the process, the Morkov process, is said to be in state i at time n.

Probability Triple of Markov Chain

In general, Markov Chain represents a random motion of some particle moving around the state space S with following descriptive

triple:

  1. A finite state space S.

  2. An initial distribution gf, iS. The non-negative values sums up to 1:
    gf.

  1.       Transition probabilities gf, gf, where,

          p_{ij}=P(X_{n+1}=j\mid X_n=i) and

         gf for each

Mathematically it is a sequence of random variables taking values in S.

eqnfor any gf and .

Markov Chain Examples

Ex.1: (Simple Random Walk)

This Markov chain begins at the point a with a probability 1, and at each step either increases by 1, with probability p, or decreases by 1, with probability 1-p.

Ex. 2:

Let S=gf consist of just three elements, and define the transition probabilities in the following matrix form:

df

This Markov chain jumps around on the three points {1,2,3} in a random and interesting way.


Ex.3: (Random Walk on Z / gf)

Let gf and define the transition probabilities by

gf

We can consider the d elements of S as arranged in a circle, then our particle, at each step, either stays where it is, or moves one step counter-clockwise, each with probability 1/3.


Ex.4: (Ehrenfest’s Urn)

Consider two urns, Urn1 and Urn 2. Suppose there are d balls divided between these urns. At each step choose one ball uniformly at random among d balls and switch it to the opposite urn. Let be the number of balls in Urn1 at time n. Thus,

gf with gf

and gf for gf, where gf if gf.

This Markov chain moves randomly among the possible numbers {0,1,2…d}. The interesting thing about this chain is that when d is large and when the Markov chain runs for a long time, there would most likely be approximately d/2 balls in Urn 1. Next, we will consider various properties of Markov chains that would describe such behaviors.

N-step Transition Probability of Markov Chains

The n-step transition probability gf of the Markov chain is defined as the conditional probability, given that the chain is currently in state i, that it will be in state j after n additional transitions. For a single transition it will be as follows:




Generalizing this for the transition matrix, we would obtain:



Following example will clarify the n-step transition probability of Markov Chains.

Ex. 1: A Very Simple Weather Model

The probabilities of weather conditions, given the weather on the preceding day, can be represented by a transition matrix:

P = \begin{bmatrix}
        0.9 & 0.5 \\
        0.1 & 0.5
    \end{bmatrix}

The matrix P represents the weather model in which a sunny day is 90% likely to be followed by another sunny day, and a rainy day is 50% likely to be followed by another rainy day. The columns can be labelled "sunny" and "rainy" respectively, and the rows can be labelled in the same order.

(P)i j is the probability that, if a given day is of type i, it will be followed by a day of type j.

Notice that the columns of P sum to 1: this is because P is a stochastic matrix.

Predicting the Weather

The weather on day 0 is known to be sunny. This is represented by a vector in which the "sunny" entry is 100%, and the "rainy" entry is 0%:

\mathbf{x}^{(0)} = \begin{bmatrix}
        1 \\
        0
    \end{bmatrix}

The weather on day 1 can be predicted by:

\mathbf{x}^{(1)} = P \mathbf{x}^{(0)} = \begin{bmatrix}
        0.9 & 0.5 \\
        0.1 & 0.5
    \end{bmatrix}
    \begin{bmatrix}
        1 \\
        0
    \end{bmatrix}
    = \begin{bmatrix}
        0.9 \\
        0.1
    \end{bmatrix}

Thus, there is an 90% chance that day 1 will also be sunny.

The weather on day 2 can be predicted in the same way:

\mathbf{x}^{(2)} = P \mathbf{x}^{(1)} = P^2 \mathbf{x}^{(0)}
    = \begin{bmatrix}
        0.9 & 0.5 \\
        0.1 & 0.5
    \end{bmatrix}^2
    \begin{bmatrix}
        1 \\
        0
    \end{bmatrix}
    = \begin{bmatrix}
        0.86 \\
        0.14
    \end{bmatrix}

General rules for day n are:

\mathbf{x}^{(n)} = P \mathbf{x}^{(n-1)}

\mathbf{x}^{(n)} = P^n \mathbf{x}^{(0)}

Steady State of the Weather

In this example, predictions for the weather on more distant days are increasingly inaccurate and tend towards a steady state vector. This vector represents the probabilities of sunny and rainy weather on all days, and is independent of the initial weather.

The steady state vector is defined as:

\mathbf{q} = \lim_{n \to \infty} \mathbf{x}^{(n)}

but only converges if P is a regular transition matrix, that is, there is at least one Pn with all non-zero entries.

Since the q is independent from initial conditions, it must be unchanged when transformed by P. This makes it an eigenvector with eigenvalue 1, and means it can be derived from P. For the weather example:

\begin{matrix}
        P & = & \begin{bmatrix}
            0.9 & 0.5 \\
            0.1 & 0.5
        \end{bmatrix}
        \\
        P \mathbf{q} & = & \mathbf{q}
        & \mbox{(} \mathbf{q} \mbox{ is unchanged by } P \mbox{.)}
        \\
        & = & I \mathbf{q}
        \\
        (I - P) \mathbf{q} & = & \mathbf{0} \\
        & = & \left( \begin{bmatrix}
            1 & 0 \\
            0 & 1
        \end{bmatrix}
        - \begin{bmatrix}
            0.9 & 0.5 \\
            0.1 & 0.5
        \end{bmatrix} \right) \mathbf{q}
        \\
        & = & \begin{bmatrix}
            0.1 & -0.5 \\
            -0.1 & 0.5
        \end{bmatrix} \mathbf{q}
    \end{matrix}

\begin{bmatrix}
        0.1 & -0.5 \\
        -0.1 & 0.5
    \end{bmatrix}
    \begin{bmatrix}
        q_1 \\
        q_2
    \end{bmatrix} = \begin{bmatrix}
        0 \\
        0
    \end{bmatrix}

\begin{bmatrix}
        0.1 & -0.5 \\
        -0.1 & 0.5
    \end{bmatrix}
    \approx \begin{bmatrix}
        1 & -5 \\
        0 & 0
    \end{bmatrix}

q1 - 5q2 = 0

Set s = q2, so 5 s = q1. As the total probability of whether conditions sums up to 1, we want s + 5 s = 1: therefore s = 0.167. Our steady state vector is:

\begin{bmatrix}
        q_1 \\
        q_2
    \end{bmatrix}
    = \begin{bmatrix}
        5 s \\
        s
    \end{bmatrix}
    = \begin{bmatrix}
        0.833 \\
        0.167
    \end{bmatrix}
In conclusion, in the long term, 83% of days are sunny.

Ex. 2: Another Weather Forecast

Suppose that whether it rains today depends on previous weather conditions only from the last two days. Specifically, suppose that if it has rained for the past  two days, then it will rain tommorrow with probability 0.7; if it rained today but not yesterday, then it will rain tommorrow with probability 0.5; if it rained yesterday but not today, then it will rain tommorrow with probability 0.4; if it has not rained past two days, then it will rain tommorrow with probability 0.2.

Let transform this into Markov chains letting the states determined as follows:


Today

Yesterday

S0

1

1

S1

1

0

S2

0

1

S3

0

0

For instance, S0 is the State 0, which descibes that it rained both today and yesterday.

The preceding matrix would then represent a four-state Markov chain whose transition probability matrix is as follows:


gf

Suppose that it rained both in Monday and Tuesday, what is the probability that it will rain in Thursday? In such situation we would apply a 2-step Markov chain based on the following n-step Markov chain property as we have seen in the simple weather model.

gf

Then in order to derive raining probabilities for the day after tommorrow, we would obtain following transition matrix:

gf


Becasuse the chain is in state 0 on Tuesday, and because it will rain on Thursday if the chain is in either state 0 or state 1 on that day,  the desired probability is rain

Classification of States

For the details and proofs, please refere to the bibliography.

Accessible: State j is said to be accessible from state i if gf for some gf
This implies that state j is accessible from state i.  In other words, starting in state i it is possible that the process will ever be in state j.
On the other hand, if j is not accessible from i, then == 0. Note that, as , any state is accessible by itself.


Communication: If j is accessible from i and if i is accessible from j then i,j communicate:


Same classes: if two states, i and j, communicate , they are at the same class.  Two classes are either identical or disjoint. If there is more than one class in Markov chain, then it means that the state-space is divided into a number of separate classes.

Irreducible: Markov chain is said to be irreducible, if there is only one class, that is, all states communicate with each other.

Ex: Consider the Markov chain consisting of the three states 0,1,2 and having transition probability matrix

.

We can show that the transition is possible and conclude that this chain is irreducible. That is, one way of getting from state 0 to state 2 is to go from state 0 to state 1 with probability 1/2 and then go from state 1 to stae 2 with prpbability 1/4.


Ex: Consider a Markov chain consisting of the four states 0,1,2,3 and having transition probability matrix

.

There are three classes {0,1}, {2}, {3} for this Markov chain. Note that while state 0 and 1 are accessible from state 2, the reverse is not true. As state 3 is an absorbing state, no other state is accessible from it.


Recurrent and Transient: For any state let denote the probability that, starting in state i, the process will ever re-enter that state. Then state i is said to be recurrent if and transient if 1.

Scientific Applications

Markov chains are used to model various processes in queueing theory and statistics, and can also be used as a signal model in entropy coding techniques such as arithmetic coding. Markov chains also have many biological applications, particularly population processes, which are useful in modelling processes that are, at least, analogous to biological populations. Furthemore, the concept of Markov chains has been used in bioinformatics as well.


Bibliography



Mini AppendLinx:

Stochastic Process

  A stochastic process is a random function. In practical applications, the domain over which the function is defined is either:

Andrey Markov

Andrei Andreevich Markov (June 2, 1856-July 20, 1922) was a Russian mathematician.

He studied at Petersburg University in 1874 under the instruction of Chebyshev and later in 1886 became a member of the St. Petersburg Academy of Science. He is best known for his work on Markov chains.

Markov Process

A process with the Markov property is usually called a Markov process, and may be described as Markovian.
The most famous Markov processes are Markov chains, but many other processes, including Brownian motion, are Markovian.

Eigenvalue

In linear algebra, a scalar λ is called an eigenvalue of a matrix A if there exists a nonzero vector x such that Axx. The vector x is called an eigenvector.

 

Eigenvector

  In linear algebra, the eigenvectors, from the German, where eigen means "inherent, characteristic",  of a Matrix are non-zero vectors which, when operated on by the Matrix as an operator, result in a scalar multiple of themselves. The scalar is then called the eigenvalue associated with the eigenvector.