ŷhat | Self-Organising Maps: An Introduction
About David: David Asboth is a Data Scientist with a software development background. He’s had many different job titles over the years, with a common theme: he solves human problems with computers and data. This post originally appeared on his blog, davidasboth.com
When you learn about machine learning techniques, you usually get a selection of the usual suspects. Something like: Support Vector Machines, decision trees/random forests, and logistic regression for classification, linear regression for regression, k-means for clustering and perhaps PCA for dimensionality reduction.
In fact, KDNuggets has a very good post about the 10 machine learning algorithms you should know.
If you want to learn about machine learning techniques, you should start there. The point is, when it comes to these algorithms the internet has you covered.
In this post I want to talk about a less prevalent algorithm, but one that I like and that can be useful for different purposes.
It’s called a Self-Organising Map (SOM).
SOMs are a type of artificial neural network. Some of the concepts date back further, but SOMs were proposed by a Finnish professor named Teuvo Kohonen and became widespread in the 1980s. Unsurprisingly SOMs are also referred to as Kohonen maps.
Artificial Neural Networks
Artificial neural networks (ANNs) were designed initially to be a computational representation of what’s believed to happen in the brain. The way signals are passed along an ANN is based on how signals pass between neurons in the brain.
ANNs are constructed as a series of layers of connected nodes. The first layer consists of your inputs, the last layer consists of your outputs, and there are any number of so-called hidden layers in between.
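To make the layered structure concrete, here is a minimal sketch in plain Python of a signal passing from an input layer, through one hidden layer, to an output layer. The sigmoid activation, the layer sizes and every weight value are assumptions chosen purely for illustration; they are not taken from any particular network.

```python
import math


def sigmoid(x):
    """Squash a value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """Compute one layer: each node takes a weighted sum of all its inputs."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass the inputs through every layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


# Hypothetical network: 3 inputs -> 2 hidden nodes -> 1 output node.
hidden_layer = ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1])
output_layer = ([[1.0, -1.0]], [0.0])
result = forward([1.0, 0.5, -1.0], [hidden_layer, output_layer])
```

Each inner list of weights belongs to one node, so adding another hidden layer is just a matter of appending another `(weights, biases)` pair to the list passed to `forward`.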
The broad idea of an ANN is that you give it a dataset and a set of desired outputs, and it learns to map the inputs to the outputs. A classic example is teaching an ANN to recognise handwritten characters by giving it pixel values as inputs and the correct digit (say a number from 0-9) as the output.
During the training phase it learns the associations between pixel values and the digits. Then, you can give it a new set of inputs, digits it hasn’t seen before, and it will be able to recognise them.
Here is such a system recognising characters in real time. It was built by Yann LeCun in the 1990s.
The way most ANNs “learn” a particular problem is by error-correcting. That is, during the training phase they adapt and improve based on the errors they make, and incrementally get better at solving the problem.
This is a supervised machine learning problem because you are telling the algorithm the desired answer for each set of inputs it’s trained on, so it knows if it makes errors.
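The error-correcting idea can be sketched with the simplest possible "network": a single linear neuron trained with a basic delta-rule update. This is a deliberately tiny stand-in for how real ANNs are trained; the function name, learning rate, epoch count and toy dataset are all assumptions made for the example.

```python
def train_neuron(samples, learning_rate=0.1, epochs=200):
    """Fit a single linear neuron by repeatedly correcting its errors."""
    weight, bias = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            prediction = weight * x + bias
            error = target - prediction  # how wrong was the neuron?
            # Nudge the parameters in the direction that shrinks the error.
            weight += learning_rate * error * x
            bias += learning_rate * error
    return weight, bias


# Labelled examples (input, desired output) that follow target = 2 * x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
weight, bias = train_neuron(samples)
```

Because every sample comes with its desired output, the neuron can measure its own mistake after each prediction. That supply of "right answers" is exactly what makes the problem supervised.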
The SOM as an ANN
There are three main ways in which a Self-Organising Map is different from a “standard” ANN:
- A SOM is not a series of layers, but typically a 2D grid of neurons
- They don’t learn by error-correcting, they implement something called competitive learning
- They deal with unsupervised machine learning problems
Competitive learning in the case of a SOM refers to the fact that when an input is “presented” to the network, only one of the neurons in the grid will be activated. In a way the neurons on the grid “compete” for each input.
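The "competition" can be sketched as a search for a single winning neuron. The sketch below assumes each neuron on the grid holds a weight vector with the same number of dimensions as the input, and that the winner is the neuron with the smallest Euclidean distance to the input. That is the usual convention for SOMs, but it is an assumption here, as are the function name and the grid size.

```python
import random


def find_winner(grid, input_vector):
    """Return the (row, col) of the neuron whose weights are closest to the input."""
    best_position, best_distance = None, float("inf")
    for row_index, row in enumerate(grid):
        for col_index, weights in enumerate(row):
            # Squared Euclidean distance is enough for picking the minimum.
            distance = sum((w - v) ** 2 for w, v in zip(weights, input_vector))
            if distance < best_distance:
                best_position, best_distance = (row_index, col_index), distance
    return best_position


# Hypothetical 4x4 grid of neurons with random 3-dimensional weights.
random.seed(0)
grid = [[[random.random() for _ in range(3)] for _ in range(4)] for _ in range(4)]

# Plant one neuron that matches the input exactly, so it must win.
grid[2][1] = [0.9, 0.1, 0.1]
winner = find_winner(grid, [0.9, 0.1, 0.1])
```

Only one grid position is returned per input, which is the sense in which a single neuron is "activated".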
The unsupervised aspect of a SOM refers to the idea that you present your inputs to it without associating them with an output. Instead, a SOM is used to find structure in your data.
What is a SOM used for?
This last point about unsupervised learning brings me to an important question, because abstract concepts like neural networks are great to talk about but I’m a practical kind of guy.
In that spirit then, what is a SOM used for?
A classic example of what clustering algorithms are used for is finding similar customers in your customer base. SOMs can also do this. In fact, a SOM is meant to be a 2D representation of your multi-dimensional dataset. In this 2D representation, each of your original inputs, e.g. each of your customers, maps to one of the nodes on the 2D grid. Most importantly, similar (high-dimensional) inputs will map to the same 2D node, or at least the same region in 2D space. This is how the SOM finds and groups similar inputs together.
Related to finding structure is the fact that by finding this structure a SOM finds a lower-dimensional representation of your dataset while preserving the similarity between your records.
That is, data points that are “nearby” in high-dimensional space will also be nearby in the SOM.
By creating a (typically) 2D representation of your dataset you can also more easily visualise it, which you can’t do if your data has more than 3 dimensions.
To summarise, I’ll quote an answer I gave on StackOverflow to a question about SOMs:
The idea behind a SOM is that you’re mapping high-dimensional vectors onto a smaller dimensional (typically 2D) space. You can think of it as clustering, like in K-means, with the added difference that vectors that are close in the high-dimensional space also end up being mapped to nodes that are close in 2D space.
SOMs therefore are said to “preserve the topology” of the original data, because the distances in 2D space reflect those in the high-dimensional space. K-means also clusters similar data points together, but its final “representation” is hard to visualise because it’s not in a convenient 2D format.
A typical example is with colours, where each of the data points is a 3D vector that represents an R,G,B colour. When mapped to a 2D SOM you can see regions of similar colours start to develop, which is the topology of the colour space.
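For the curious, the colour example can be roughly sketched in plain Python. This is not the author's implementation: the Gaussian neighbourhood function, the linear decay of the learning rate and radius, the grid size and the six training colours are all assumptions picked to keep the sketch short.

```python
import math
import random


def closest_node(grid, vector):
    """Grid position whose weight vector is nearest to the input vector."""
    positions = ((r, c) for r in range(len(grid)) for c in range(len(grid[0])))
    return min(
        positions,
        key=lambda pos: sum(
            (w - v) ** 2 for w, v in zip(grid[pos[0]][pos[1]], vector)
        ),
    )


def train_som(data, rows=5, cols=5, iterations=2000, seed=0):
    """Train a small SOM; the decay schedules below are illustrative choices."""
    rng = random.Random(seed)
    dim = len(data[0])
    grid = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
            for _ in range(rows)]
    for step in range(iterations):
        progress = step / iterations
        learning_rate = 0.5 * (1.0 - progress)
        radius = max(rows, cols) / 2.0 * (1.0 - progress) + 0.5
        vector = rng.choice(data)
        win_r, win_c = closest_node(grid, vector)
        for r in range(rows):
            for c in range(cols):
                grid_distance_sq = (r - win_r) ** 2 + (c - win_c) ** 2
                # Neurons near the winner on the grid move more than distant ones.
                influence = math.exp(-grid_distance_sq / (2.0 * radius ** 2))
                weights = grid[r][c]
                for i in range(dim):
                    weights[i] += learning_rate * influence * (vector[i] - weights[i])
    return grid


# Six hypothetical training colours as (R, G, B) values between 0 and 1.
colours = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0],
           [1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]]
som = train_som(colours)
red_node = closest_node(som, [1.0, 0.0, 0.0])
```

The key difference from plain clustering is the inner loop: the winner's grid neighbours are pulled towards the input as well, which is what encourages similar colours to settle in neighbouring regions of the map.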
I hope that sounds interesting, because in Part 2 of this post (coming on Thursday) I’ll discuss some concrete examples and walk through a Python implementation of Self-Organising Maps.
The example we’ll be working with is using a 3D dataset of colours (where the three dimensions are R, G and B) and producing a 2D SOM where we visualise the “topology” of the 3D colour space.
Something like this:
Stay tuned & see you in a couple of days!