 [ad_1]

## A Beginner’s Guide with R code

Many of you might not have heard of stochastic processes before and be wondering how they might be relevant to you. Firstly, statisticians might find stochastic processes a nice way of modeling probabilistic events. Additionally, those interesting in reinforcement learning may find that this information solidifies their understanding of RL concepts such as Markov Chains. Lastly, this article is short and easy-to-follow, so if you’re curious about stochastic processes themselves, then this is a good introduction.

Any readers of the following tutorial should know matrix operations, such as multiplication, as well as a basic understanding of linear algebra. For the sake of brevity, many linear algebra concepts are not explained in this tutorial. This article has R code for those who are interested, but it is not necessary to follow the code as you read the article.

## Basics and Definition

The first question many of you may have is the exact nature of a stochastic process. The word ‘stochastic’ literally means ‘random’, though stochastic processes are not necessarily random: they can be entirely deterministic, in fact. Simply put, a stochastic process is any mathematical process that can be modeled with a family of random variables. A coin toss is a great example because of its simplicity. We start with a coin head-ups and then flip it exactly once. The probability of the coin landing on heads is .5, and tails is .5. If the coins lands on heads, the ‘state’ of the world is unchanged, while if the coin lands on tails, the ‘state’ has changed. This can be modeled with the matrix below:

Or, a more visual approach:

The rows represent states, with probabilities of transitioning to other states in are in the columns. If you’re in state 0 (heads) and you flip the coin, your chances of ending up in state tails (state 1) can be seen in the (0, 1) entry in the matrix above (.5).

As another example, suppose you have 2 printers in a library. When both printers are working, there is a 30 percent chance that one breaks every day, and a 0 percent chance that they both break. If one printer is broken, then there is a 20 percent chance that a repairman comes by on that day and fixes it, and a 40 percent chance that the second printer breaks. When both are broken, then there is a 100 percent chance that a repairman comes by and fixes both printers. The matrix representing this process is shown below:

Notice that (0, 0) = .7 since (0, 1) = .3 and (0, 2) = 0. Each row must add up to 1 because they’re populated by discrete probability distributions. The same holds true for entry (1, 1). Notice that the final row deterministically goes from state 2 to state 0. To make sure you understand, answer the following questions:

1. What does the entry at (0, 1) represent?

2. What would the new matrix look like if the repairman fixed both printers with P = .7, but sometimes only has time to fix one printer when both are broken with P = .1?

## The Markov Property

Now that we have seen two simple examples, let’s talk about the hallmark property of many random processes: the Markov property. A system or variable is said to be Markov when knowing the current state of the process is just as good as knowing the entire history. In the previous example, if we define the random variable X to be the current number of printers functioning at the nth step of the process (with lowercase x denoting the actual value at some step), then knowing X is sufficient to give a distribution for Xn+1. Simply put:

Intuitively, knowing that both printers are functioning is enough to know the probabilities of all future states. It does not matter how many printers were broken in the previous state or the state before that. A coin toss is another example: the odds of a flipped coin landing on heads is entirely independent of the result of any previous coin tosses. Markov processes are said to be ‘memoryless’ because the history of their states is irrelevant. Only the current state matters (though, not necessarily. In the coin toss example, you don’t even need to know the current state as P(heads) = .5 and P(tails) = .5 in all states). In terms of a reinforcement learning example, where the idea is to specify an optimal move with some probability distribution, many board games can be considered Markov. Chess is one such game, as selecting an optimal move is unaffected by the exact move order preceding the current state. Though there are a few exceptions, such as the en passant rule and whether either side has castled, these can be encoded into the current state, which would make the chain Markov.

## Predicting the Future 🔮

Now that we’ve covered the Markov property, let’s return to the printer example. Notice that it is trivial to calculate P(Xn = 1 | Xn-1 = 0) ( the probability that the chain goes to 1 given that it is currently in state 0) as it is the (0, 1) entry in the matrix. But what if we wanted to calculate 2-step-ahead chance? To do that we would need to sum all possible two-step transitions that start at 0 and lead to 1:

Read More …

[ad_2]