ŷhat | Self-Organising Maps: In Depth
About David: David Asboth is a Data Scientist with a software program improvement background. He’s had many various job titles over time, with a typical theme: he solves human issues with computer systems and knowledge. This put up initially appeared on his weblog, davidasboth.com
In Part 1, I launched the idea of Self-Organising Maps (SOMs). Now in Part 2 I wish to step by the method of coaching and utilizing a SOM – each the instinct and the Python code. At the tip I’ll additionally current a few actual life use circumstances, not simply the toy instance we’ll use for implementation.
The very first thing we’d like is an issue to resolve!
I’ll use the color map because the walkthrough instance as a result of it lends itself very properly to visualisation.
Our knowledge shall be a group of random colors, so first we’ll artificially create a dataset of 100. Each color is a 3D vector representing R, G and B values:
import numpy as np raw_data = np.random.randint(0, 255, (3, 100))
That’s merely 100 rows of 3D vectors all between the values of Zero and 255.
Just to be clear, right here’s what we’re making an attempt to do. We wish to take our 3D color vectors and map them onto a 2D floor in such a method that related colors will find yourself in the identical space of the 2D floor.
Before coaching a SOM we have to resolve on a couple of parameters.
First of all, its dimensionality. In principle, a SOM could be any variety of dimensions, however for visualization functions it’s usually 2D and that’s what I’ll be utilizing too.
We additionally have to resolve the variety of neurons within the 2D grid. This is a type of selections in machine learning which may as properly be black magic, so we in all probability have to attempt a couple of sizes to get one which feels proper.
Remember, that is unsupervised studying, which means no matter reply the algorithm comes up with must be evaluated considerably subjectively. It’s typical in an unsupervised drawback (e.g. k-means clustering) to do a number of runs and see what works.
I’ll go together with a 5 by 5 grid. I assume one rule of thumb needs to be to make use of fewer neurons than you’ve knowledge factors, in any other case they won’t overlap. As we’ll see we really need them to overlap, as a result of having a number of 3D vectors mapping to the identical level in 2D is how we discover similarities between our knowledge factors.
One necessary side of the SOM is that every of the 2D factors on the grid really signify a multi-dimensional weight vector. Each level on the SOM has a weight vector related to it that’s the similar variety of dimensions as our enter knowledge, on this case Three to match the three dimensions of our colors. We’ll see why that is necessary after we undergo the implementation.
Training the SOM is an iterative course of – it would get higher at its job with each iteration, so we’d like a cutoff level. Our drawback is kind of small so 2,000 iterations ought to suffice however in greater issues it’s fairly potential to wish over 10,000.
We additionally want a studying price. The studying price decides by how a lot we apply modifications to our SOM at every iteration.
If it’s too excessive, we are going to hold making drastic modifications to the SOM and would possibly by no means choose an answer.
If it’s too low, we’ll by no means get something executed as we are going to solely make very small modifications.
In apply it’s best to begin with a bigger studying price and cut back it slowly over time. This is in order that the SOM can begin by making large modifications however then settle into an answer after some time.
For the remainder of this put up I’ll use 3D to confer with the dimensionality of the enter knowledge (which in actuality may very well be any variety of dimensions) and 2D because the dimensionality of the SOM (which we resolve and is also any quantity).
To setup the SOM we have to begin with the next:
- Decide on and initialise the SOM parameters (as above)
- Setup the grid by making a 5×5 array of random 3D weight vectors
network_dimensions = np.array([5, 5]) n_iterations = 2000 init_learning_rate = 0.01 # set up measurement variables primarily based on knowledge m = raw_data.form n = raw_data.form # weight matrix (i.e. the SOM) must be one m-dimensional vector for every neuron within the SOM web = np.random.random((network_dimensions, network_dimensions, m)) # preliminary neighbourhood radius init_radius = max(network_dimensions, network_dimensions) / 2 # radius decay parameter time_constant = n_iterations / np.log(init_radius)
Those final two parameters relate to the 2D neighbourhood of every neuron within the SOM throughout coaching. We’ll return to these within the studying part. Like the educational price, the preliminary 2D radius will embody many of the SOM and can step by step lower because the variety of iterations will increase.
Another element to debate at this level is whether or not or not we normalise our dataset.
First of all, SOMs prepare quicker (and “better”) if all our values are between Zero and 1. This is usually true with machine learning issues, and it’s to keep away from certainly one of our dimensions “dominating” the others within the studying course of. For instance, if certainly one of our variable was wage (within the 1000’s) and one other was top (in metres, so hardly ever over 2.0) then wage will get a better significance just because it has a lot greater values. Normalising to the unit interval will take away this impact.
In our case all Three dimensions confer with a worth between Zero and 255 so we will normalise your complete dataset directly. However, if our variables have been on completely different scales we must do that column by column.
I don’t need this code to be solely tailor-made to the color dataset so I’ll depart the normalisation choices tied to some Booleans which might be straightforward to alter.
normalise_data = True # if True, assume all knowledge is on widespread scale # if False, normalise to [0 1] vary alongside every column normalise_by_column = False # we wish to make a copy of the uncooked knowledge for later knowledge = raw_data # verify if knowledge must be normalised if normalise_data: if normalise_by_column: # normalise alongside every column col_maxes = raw_data.max(axis=0) knowledge = raw_data / col_maxes[np.newaxis, :] else: # normalise total dataset knowledge = raw_data / knowledge.max()
Now we’re prepared to begin the educational course of.
In broad phrases the educational course of shall be as follows. We’ll fill within the implementation particulars as we go alongside.
For a single iteration:
- Find the neuron within the SOM whose related 3D vector is closest to our chosen 3D color vector. At every step, that is known as the Best Matching Unit (BMU)
- Move the BMU’s 3D weight vector nearer to the enter vector in 3D house
- Identify the 2D neighbours of the BMU and likewise transfer their 3D weight vectors nearer to the enter vector, though by a smaller quantity
- Update the educational price (cut back it at every iteration)
And that’s it. By doing the penultimate step, shifting the BMU’s neighbours, we’ll obtain the specified impact that colors which might be shut in 3D house shall be mapped to related areas in 2D house.
Let’s step by this in additional element, with code.
1. Select a Random Input Vector
This is simple:
# choose a coaching instance at random t = knowledge[:, np.random.randint(0, n)].reshape(np.array([m, 1]))
2. Find the Best Matching Unit
# discover its Best Matching Unit bmu, bmu_idx = find_bmu(t, web, m)
For that to work we’d like a operate to seek out the BMU. It have to iterate by every neuron within the SOM, measure its Euclidean distance to our enter vector and return the one which’s closest. Note the implementation trick of not really measuring Euclidean distance, however the squared Euclidean distance, thereby avoiding an costly sq. root computation.
def find_bmu(t, web, m): """ Find one of the best matching unit for a given vector, t, within the SOM Returns: a (bmu, bmu_idx) tuple the place bmu is the high-dimensional BMU and bmu_idx is the index of this vector within the SOM """ bmu_idx = np.array([0, 0]) # set the preliminary minimal distance to an enormous quantity min_dist = np.iinfo(np.int).max # calculate the high-dimensional distance between every neuron and the enter for x in vary(web.form): for y in vary(web.form): w = web[x, y, :].reshape(m, 1) # do not trouble with precise Euclidean distance, to keep away from costly sqrt operation sq_dist = np.sum((w - t) ** 2) if sq_dist < min_dist: min_dist = sq_dist bmu_idx = np.array([x, y]) # get vector equivalent to bmu_idx bmu = web[bmu_idx, bmu_idx, :].reshape(m, 1) # return the (bmu, bmu_idx) tuple return (bmu, bmu_idx)
3. Update the SOM Learning Parameters
As described above, we wish to decay the educational price over time to let the SOM “settle” on an answer.
What we additionally decay is the neighbourhood radius, which defines how far we seek for 2D neighbours when updating vectors within the SOM. We wish to step by step cut back this over time, like the educational price. We’ll see this in a bit extra element in step 4.
# decay the SOM parameters r = decay_radius(init_radius, i, time_constant) l = decay_learning_rate(init_learning_rate, i, n_iterations)
The features to decay the radius and studying price use exponential decay:
Where $lambda$ is the time fixed (which controls the decay) and $sigma$ is the worth at numerous instances $t$.
def decay_radius(initial_radius, i, time_constant): return initial_radius * np.exp(-i / time_constant) def decay_learning_rate(initial_learning_rate, i, n_iterations): return initial_learning_rate * np.exp(-i / n_iterations)
3. Move the BMU and its Neighbours in 3D Space
Now that we have now the BMU and the proper studying parameters, we’ll replace the SOM in order that this BMU is now nearer in 3D house to the color that mapped to it. We will even establish the neurons which might be near the BMU in 2D house and replace their 3D vectors to maneuver “inwards” in direction of the BMU.
The formulation to replace the BMU’s 3D vector is:
That is to say, the brand new weight vector would be the present vector plus the distinction between the enter vector $V$ and the burden vector, multiplied by a studying price $L$ at time $t$.
We are actually simply shifting the burden vector nearer to the enter vector.
We additionally establish all of the neurons within the SOM which might be nearer in 2D house than our present radius, and likewise transfer them nearer to the enter vector.
The distinction is that the burden replace shall be proportional to their 2D distance from the BMU.
One very last thing to notice: this proportion of 2D distance isn’t uniform, it’s Gaussian. So think about a bell form centred across the BMU – that’s how we resolve how a lot to tug the neighbouring neurons in by.
Concretely, that is the equation we’ll use to calculate the affect $i$:
the place $d$ is the 2D distance and $sigma$ is the present radius of our neighbourhood.
Putting that each one collectively:
def calculate_influence(distance, radius): return np.exp(-distance / (2* (radius**2)) # now we all know the BMU, replace its weight vector to maneuver nearer to enter # and transfer its neighbours in 2-D house nearer # by an element proportional to their 2-D distance from the BMU for x in vary(web.form): for y in vary(web.form): w = web[x, y, :].reshape(m, 1) # get the 2-D distance (once more, not the precise Euclidean distance) w_dist = np.sum((np.array([x, y]) - bmu_idx) ** 2) # if the gap is inside the present neighbourhood radius if w_dist <= r**2: # calculate the diploma of affect (primarily based on the 2-D distance) affect = calculate_influence(w_dist, r) # now replace the neuron's weight utilizing the formulation: # new w = previous w + (studying price * affect * delta) # the place delta = enter vector (t) - previous w new_w = w + (l * affect * (t - w)) # commit the brand new weight web[x, y, :] = new_w.reshape(1, 3)
Repeating the educational steps 1-Four for two,000 iterations needs to be sufficient. We can all the time run it for extra iterations afterwards.
Handily, the 3D weight vectors within the SOM can be interpreted as colors, since they’re simply 3D vectors similar to the inputs.
To that finish, we will visualise them and give you our closing color map:
None of these colors essentially needed to be in our dataset. By shifting the 3D weight vectors to extra carefully match our enter vectors, we’ve created a 2D color house which clearly exhibits the connection between colors. More blue colors will map to the left a part of the SOM, whereas reddish colors will map to the underside, and so forth.
Finding a 2D color house is an effective visible option to get used to the thought of a SOM. However, there are clearly sensible functions of this algorithm.
A dataset favoured by the machine learning group is Sir Ronald Fisher’s dataset of measurements of irises. There are 4 enter dimensions: petal width, petal size, sepal width and sepal size and we might use a SOM to seek out related flowers.
Applying the iris knowledge to a SOM after which retrospectively colouring every level with their true class (to see how good the SOM was at separating the irises into their distinct classes) we get one thing like this:
This is a 10 by 10 SOM and every of the small factors is without doubt one of the irises from the dataset (with added jitter to see a number of factors on a single SOM neuron). I added the colors after coaching, and you’ll fairly clearly see the three distinct areas the SOM has divided itself into.
There are a couple of SOM neurons the place each the inexperienced and the blue factors get assigned to, and this represents the overlap between the versicolor and virginica varieties.
Another software I touched on in Part 1 is making an attempt to establish handwritten characters.
In this case, the inputs are high-dimensional – every enter dimension represents the grayscale worth of 1 pixel on a 28 by 28 picture. That makes the inputs 784-dimensional (every dimension is a worth between Zero and 255).
Mapping them to a 20 by 20 SOM, and once more retrospectively colouring them primarily based on their true class (a quantity from Zero to 9) yields this:
In this case the true courses are labelled in response to the colors within the backside left.
What you’ll be able to see is that the SOM has efficiently divided the 2D house into areas. Despite some overlap, most often related digits get mapped to the identical area.
For instance, the yellow area is the place the 6s have been mapped, and there may be little overlap with different classes. Whereas within the backside left, the place the inexperienced and brown factors overlap, is the place the SOM was “confused” between 4s and 9s. A visible inspection of a few of these handwritten characters exhibits that certainly most of the 4s and 9s are simply confused.
I hope this was a helpful walkthrough on the instinct behind a SOM, and a easy Python implementation. There is a Jupyter pocket book for the color map instance.
Mat Buckland’s wonderful rationalization and code walkthrough of SOMs was instrumental in serving to me study. My code is kind of a Python port of his C++ implementation. Reading his posts ought to fill in any gaps I could haven’t lined.