Statistical modeling with “Pomegranate” — fast and intuitive | by Tirthajyoti Sarkar | Dec, 2020
We can easily model a simple Markov chain with Pomegranate and calculate the probability of any given sequence.
We will use the quintessential rainy-cloudy-sunny example for this one. The transition probability table gives the probability of tomorrow’s weather conditioned on today’s.
We also know that, on average, there are 20% rainy days, 50% sunny days, and 30% cloudy days. We encode both the discrete distribution and the transition matrix in a MarkovChain class object.
Now, we can calculate the probability of any given sequence using this object. For example, if you look at the table above, you can convince yourself that a sequence like “Cloudy-Rainy-Cloudy” has a high likelihood, whereas a sequence like “Rainy-Sunny-Rainy-Sunny” is unlikely to show up. We can confirm this with precise probability calculations (we take the logarithm to handle small probability values).
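Since the actual code and transition values are not reproduced here, below is a minimal pure-Python sketch of the same calculation. The transition numbers are illustrative assumptions; only the 20/50/30 average frequencies come from the text. Pomegranate’s MarkovChain object computes the same quantity for you.

```python
import math

# Hypothetical transition table, P(tomorrow | today). These numbers are
# illustrative assumptions, not the article's actual values.
transition = {
    'Rainy':  {'Rainy': 0.40, 'Cloudy': 0.50, 'Sunny': 0.10},
    'Cloudy': {'Rainy': 0.40, 'Cloudy': 0.30, 'Sunny': 0.30},
    'Sunny':  {'Rainy': 0.05, 'Cloudy': 0.25, 'Sunny': 0.70},
}
# Average frequencies stated in the article: 20% rainy, 50% sunny, 30% cloudy.
initial = {'Rainy': 0.2, 'Sunny': 0.5, 'Cloudy': 0.3}

def log_probability(seq):
    """Log-probability of a sequence under a first-order Markov chain."""
    logp = math.log(initial[seq[0]])
    for prev, curr in zip(seq, seq[1:]):
        logp += math.log(transition[prev][curr])
    return logp

likely = log_probability(['Cloudy', 'Rainy', 'Cloudy'])
unlikely = log_probability(['Rainy', 'Sunny', 'Rainy', 'Sunny'])
print(likely, unlikely)  # the first log-probability is much higher
```

With these assumed numbers, the “Cloudy-Rainy-Cloudy” sequence comes out orders of magnitude more probable than “Rainy-Sunny-Rainy-Sunny”, confirming the intuition from the table.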
Fitting data to the MarkovChain class
We write a small function to generate a random sequence of rainy-cloudy-sunny days and feed that to the MarkovChain class.
First, we feed it data for a 14-day observation: “Rainy-Sunny-Rainy-Sunny-Rainy-Sunny-Rainy-Rainy-Sunny-Sunny-Sunny-Rainy-Sunny-Cloudy”. The transition probability table is estimated for us.
If we generate a random sequence spanning 10 years, i.e. 3,650 days, we get the following table. Because our random generator picks each day’s weather uniformly and independently, the estimated transition probabilities approach their limiting values of ~0.333 each.
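Under the hood, this fitting step is maximum-likelihood estimation: count the observed transitions from each state and normalize. Here is a small sketch of that idea, applied to the 14-day sequence above and to an assumed uniform random generator for the 10-year case:

```python
import random
from collections import Counter, defaultdict

STATES = ['Rainy', 'Cloudy', 'Sunny']

def fit_transition_table(seq):
    """Maximum-likelihood transition probabilities from transition counts."""
    counts = defaultdict(Counter)
    for prev, curr in zip(seq, seq[1:]):
        counts[prev][curr] += 1
    table = {}
    for s, row in counts.items():
        total = sum(row.values())
        table[s] = {t: row[t] / total for t in STATES}
    return table

# The 14-day observation from the article. Note 'Cloudy' never precedes
# another day here, so it gets no estimated row.
days = ('Rainy Sunny Rainy Sunny Rainy Sunny Rainy Rainy Sunny '
        'Sunny Sunny Rainy Sunny Cloudy').split()
print(fit_transition_table(days))  # e.g. P(Sunny | Rainy) = 5/6

# A uniformly random 10-year (3,650-day) sequence: every estimated
# transition probability approaches the limiting value of 1/3.
random.seed(42)
long_seq = random.choices(STATES, k=3650)
table = fit_transition_table(long_seq)
```

The short sequence gives a lopsided table driven by the few observed transitions, while the long uniform sequence pulls every entry toward 1/3, just as described above.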
There are a lot of cool things you can do with the HMM class in Pomegranate. Here, we just show a small example of detecting the high-density occurrence of a sub-sequence within a long string using HMM predictions.
Let’s say we are recording the names of four characters in a Harry Potter novel as they appear one after another in a chapter, and we are interested in detecting portions where Harry and Dumbledore appear together. However, because they may be conversing and may mention Ron’s or Hagrid’s names in these portions, the sub-sequence is not clean, i.e. it does not consist exclusively of Harry’s and Dumbledore’s names.
The following code initializes a uniform probability distribution, a skewed probability distribution, two named states, and the HMM model built from these states.
Then, we add the state transition probabilities and ‘bake’ the model to finalize its internal structure. Note the high self-loop transition probabilities, i.e. each state tends to remain where it is with high likelihood.
Now, we have an observed sequence, and we feed it to the HMM model’s predict method. The transition and emission probabilities are used to predict a sequence of 1’s and 0’s, in which an island of 0’s marks the portion rich in joint appearances of ‘Harry-Dumbledore’.
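The exact distributions and transition values are not reproduced here, so the sketch below reimplements the idea in plain Python with assumed numbers: a 0.4/0.4/0.1/0.1 distribution skewed toward Harry and Dumbledore (state 0), a uniform background distribution (state 1), and 0.8 self-loop transitions. Viterbi decoding, which is what predict performs by default, then recovers the most likely state path:

```python
import math

NAMES = ['Harry', 'Dumbledore', 'Ron', 'Hagrid']

# Illustrative parameters (assumptions, not the article's exact values):
# state 0 is skewed toward Harry/Dumbledore, state 1 is uniform.
emission = [
    {'Harry': 0.4, 'Dumbledore': 0.4, 'Ron': 0.1, 'Hagrid': 0.1},  # state 0
    {name: 0.25 for name in NAMES},                                # state 1
]
start = [0.5, 0.5]
trans = [[0.8, 0.2],   # high self-loops: states tend to persist
         [0.2, 0.8]]

def viterbi(seq):
    """Most likely state path (list of 0/1) for an observed name sequence."""
    V = [[math.log(start[s]) + math.log(emission[s][seq[0]]) for s in (0, 1)]]
    back = []
    for obs in seq[1:]:
        scores, ptrs = [], []
        for s in (0, 1):
            best = max((0, 1), key=lambda p: V[-1][p] + math.log(trans[p][s]))
            scores.append(V[-1][best] + math.log(trans[best][s])
                          + math.log(emission[s][obs]))
            ptrs.append(best)
        V.append(scores)
        back.append(ptrs)
    state = max((0, 1), key=lambda s: V[-1][s])
    path = [state]
    for ptrs in reversed(back):
        state = ptrs[state]
        path.append(state)
    return path[::-1]

# Background names, then a Harry/Dumbledore-rich stretch (with one stray
# 'Ron'), then background again.
seq = (['Ron', 'Hagrid', 'Ron', 'Hagrid', 'Ron'] +
       ['Harry', 'Dumbledore', 'Harry', 'Harry', 'Dumbledore',
        'Ron', 'Harry', 'Dumbledore', 'Harry', 'Dumbledore'] +
       ['Hagrid', 'Ron', 'Hagrid', 'Ron', 'Hagrid'])
print(viterbi(seq))  # an island of 0's marks the Harry-Dumbledore stretch
```

Note that the single ‘Ron’ inside the rich stretch does not break the island: with 0.8 self-loops, switching states twice costs more likelihood than one mismatched emission, which is exactly why the decoded sub-sequence tolerates a noisy mention or two.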
In this article, we introduced a fast and intuitive statistical modeling library called Pomegranate and showed some interesting usage examples. Many more tutorials can be found in the Pomegranate project’s documentation, which also inspired the examples in this article.
The library offers utility classes from various statistical domains — general distributions, Markov chains, Gaussian Mixture Models, Bayesian networks — with a uniform API that can be instantiated quickly with observed data and then used for parameter estimation, probability calculations, and predictive modeling.