Error-Correcting Output Codes (ECOC) for Machine Learning


Machine learning algorithms, like logistic regression and support vector machines, are designed for two-class (binary) classification problems.

As such, these algorithms must either be modified for multi-class (more than two) classification problems or not used at all. The Error-Correcting Output Codes method is a technique that allows a multi-class classification problem to be reframed as multiple binary classification problems, allowing native binary classification models to be used directly.

Unlike the one-vs-rest and one-vs-one methods, which offer a similar solution by dividing a multi-class classification problem into a fixed number of binary classification problems, the error-correcting output codes technique allows each class to be encoded as an arbitrary number of binary classification problems. When an overdetermined representation is used, the extra models can act as “error-correction” predictions that can result in better predictive performance.

In this tutorial, you will discover how to use error-correcting output codes for classification.

After completing this tutorial, you will know:

  • Error-correcting output codes is a technique for using binary classification models on multi-class classification prediction tasks.
  • How to fit, evaluate, and use error-correcting output codes classification models to make predictions.
  • How to tune and evaluate different values for the number-of-bits-per-class hyperparameter used by error-correcting output codes.

Let's get started.

Error-Correcting Output Codes (ECOC) for Machine Learning
Photo by Fred Hsu, some rights reserved.

Tutorial Overview

This tutorial is divided into three parts; they are:

  1. Error-Correcting Output Codes
  2. Evaluate and Use ECOC Classifiers
  3. Tune Number of Bits Per Class

Error-Correcting Output Codes

Classification tasks are those where a label is predicted for a given input.

Binary classification tasks are those classification problems where the target contains two values, whereas multi-class classification problems are those that have more than two target class labels.

Many machine learning models have been developed for binary classification, although they may require modification to work with multi-class classification problems. For example, logistic regression and support vector machines were specifically designed for binary classification.

Several machine learning algorithms, such as SVM, were originally designed to solve only binary classification tasks.

— Page 133, Pattern Classification Using Ensemble Methods, 2010.

Rather than limiting the choice of algorithms or adapting the algorithms for multi-class problems, an alternative approach is to reframe the multi-class classification problem as multiple binary classification problems. Two common methods that can be used to achieve this are the one-vs-rest (OvR) and one-vs-one (OvO) techniques.

  • OvR: splits a multi-class problem into one binary problem per class.
  • OvO: splits a multi-class problem into one binary problem per pair of classes.

Once split into subtasks, a binary classification model can be fit on each task, and the model with the largest response can be taken as the prediction.

Both the OvR and OvO may be thought of as a type of ensemble learning model, given that multiple separate models are fit for a predictive modeling task and used in concert to make a prediction. In both cases, the prediction of the “ensemble members” is a simple winner-take-all approach.

… convert the multiclass task into an ensemble of binary classification tasks, whose results are then combined.

— Page 134, Pattern Classification Using Ensemble Methods, 2010.

For more on one-vs-rest and one-vs-one models, see the tutorial listed in the Further Reading section.

A related approach is to prepare a binary encoding (e.g. a bitstring) to represent each class in the problem. Each bit in the string can be predicted by a separate binary classification problem. Encodings of arbitrary length can be chosen for a given multi-class classification problem.

To be clear, each model receives the entire input pattern and only predicts one position in the output string. During training, each model is trained to produce the correct 0 or 1 output for its binary classification task. A prediction can then be made for new examples by using each model to predict one bit of the binary string for the input, then comparing the resulting binary string to each class's known encoding. The class encoding with the smallest distance to the prediction is then chosen as the output.
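For intuition, a minimal sketch of the decoding step is shown below, using a hypothetical 6-bit codebook for three classes; the codewords and the predicted string are made up purely for illustration.

```python
# minimal sketch of ECOC decoding: map a predicted bit string to the class
# whose codeword has the smallest Hamming distance (codebook is hypothetical)
codebook = {
    'red':   [0, 0, 1, 1, 0, 1],
    'green': [1, 0, 0, 0, 1, 1],
    'blue':  [1, 1, 1, 0, 0, 0],
}
# one bit predicted by each of the six binary classifiers
predicted = [1, 0, 1, 0, 1, 1]
# choose the class with the minimum number of disagreeing bits
label = min(codebook, key=lambda c: sum(p != b for p, b in zip(predicted, codebook[c])))
print(label)  # 'green' (Hamming distance 1)
```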

A codeword of length l is ascribed to each class. Commonly, the size of the codewords has more bits than needed in order to uniquely represent each class.

— Page 138, Pattern Classification Using Ensemble Methods, 2010.

It is an interesting approach that allows the class representation to be more elaborate than is required (perhaps overdetermined) as compared to a one-hot encoding, and it introduces redundancy into the representation and modeling of the problem. This is intentional, as the additional bits in the representation act like error-correcting codes to fix, correct, or improve the prediction.

… the idea is that the redundant “error-correcting” bits allow for some inaccuracies, and can improve performance.

— Page 606, The Elements of Statistical Learning, 2016.

This gives the technique its name: error-correcting output codes, or ECOC for short.

Error-Correcting Output Codes (ECOC) is a simple yet powerful approach to deal with a multi-class problem based on the combination of binary classifiers.

— Page 90, Ensemble Methods, 2012.

Care must be taken to ensure that each encoded class has a very different binary string encoding. A range of different encoding schemes has been explored, as well as specific methods for constructing the encodings to ensure they are sufficiently far apart in the encoding space. Interestingly, random encodings have been found to work perhaps just as well.

… analyzed the ECOC approach, and showed that random code assignment worked as well as the optimally constructed error-correcting codes

— Page 606, The Elements of Statistical Learning, 2016.

For a detailed review of the various encoding schemes and techniques for mapping predicted strings to encoded classes, I recommend Chapter 6, “Error Correcting Output Codes,” of the book “Pattern Classification Using Ensemble Methods.”

Evaluate and Use ECOC Classifiers

The scikit-learn library provides an implementation of ECOC via the OutputCodeClassifier class.

The class takes as an argument the model to use to fit each binary classifier, and any machine learning model can be used. In this case, we will use a logistic regression model, intended for binary classification.

The class also provides the “code_size” argument, which specifies the size of the encoding for the classes as a multiple of the number of classes, e.g. the number of bits used to encode each class label.

For example, if we wanted an encoding with bit strings of length 6 bits and we had three classes, then we can specify the code size as 2:

  • encoding_length = code_size * num_classes
  • encoding_length = 2 * 3
  • encoding_length = 6

The example below demonstrates how to define an instance of the OutputCodeClassifier with 2 bits per class, using a LogisticRegression model for each bit in the encoding.
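A minimal sketch of that definition might look like the following; the random_state value is an illustrative assumption used for reproducibility.

```python
# define the ecoc model with logistic regression as the binary classifier
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier
# code_size=2 gives 2 bits per class, e.g. 6 bits for three classes
model = OutputCodeClassifier(LogisticRegression(), code_size=2, random_state=1)
```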


Although there are many sophisticated ways to construct the encoding for each class, the OutputCodeClassifier class selects a random bit string encoding for each class, at least at the time of writing.

We can explore the use of the OutputCodeClassifier on a synthetic multi-class classification problem.

We can use the make_classification() function to define a multi-class classification problem with 1,000 examples, 20 input features, and three classes.

The example below demonstrates how to create the dataset and summarize the number of rows, columns, and classes in the dataset.
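A sketch of this step might look like the following; the n_informative, n_redundant, and random_state settings are illustrative assumptions.

```python
# create and summarize a synthetic multi-class classification dataset
from collections import Counter
from sklearn.datasets import make_classification
# define a dataset with 1,000 rows, 20 input features, and three classes
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, n_classes=3, random_state=7)
# summarize the shape of the dataset
print(X.shape, y.shape)
# summarize the number of examples in each class
print(Counter(y))
```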


Running the example creates the dataset and reports the number of rows and columns, confirming the dataset was created as expected.

The number of examples in each class is then reported, showing a nearly equal number of cases for each of the three configured classes.


Next, we can evaluate an error-correcting output codes model on the dataset.

We will use logistic regression with 2 bits per class, as we defined above. The model will then be evaluated using repeated stratified k-fold cross-validation with three repeats and 10 folds. We will summarize the performance of the model using the mean and standard deviation of classification accuracy across all repeats and folds.
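A sketch of the evaluation procedure, assuming the model and dataset defined in the snippets above, might look like:

```python
# evaluate the ecoc model using repeated stratified k-fold cross-validation
from numpy import mean, std
from sklearn.model_selection import cross_val_score, RepeatedStratifiedKFold
# 10 folds, 3 repeats
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
n_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# report the mean and standard deviation of accuracy
print('Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))
```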


Tying this together, the complete example is listed below.
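A self-contained sketch of the complete example, under the same assumptions as the snippets above, might look like:

```python
# evaluate error-correcting output codes for multi-class classification
from numpy import mean, std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, RepeatedStratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, n_classes=3, random_state=7)
# define the ecoc model with 2 bits per class
model = OutputCodeClassifier(LogisticRegression(), code_size=2, random_state=1)
# define the evaluation procedure
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate the model and report performance
n_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
print('Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))
```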


Running the example defines the model and evaluates it on our synthetic multi-class classification dataset using the defined test procedure.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that the model achieved a mean classification accuracy of about 76.6 percent.


We may choose to use this as our final model.

This requires that we fit the model on all available data and use it to make predictions on new data.

The example below provides a full example of how to fit and use an error-correcting output codes model as a final model.
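A sketch of this final-model workflow is shown below; because the specific input row is not given here, the first training row is reused purely for illustration.

```python
# fit the ecoc model on all available data and make a single prediction
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, n_classes=3, random_state=7)
# define the model
model = OutputCodeClassifier(LogisticRegression(), code_size=2, random_state=1)
# fit the model on the whole dataset
model.fit(X, y)
# make a prediction for one row of data (the first training row, for illustration)
row = X[0].reshape(1, -1)
yhat = model.predict(row)
print('Predicted Class: %d' % yhat[0])
```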


Running the example fits the ECOC model on the entire dataset and uses the model to predict the class label for a single row of data.

In this case, we can see that the model predicted the class label 0.


Now that we are familiar with how to fit and use the ECOC model, let's take a closer look at how to configure it.

Tune Number of Bits Per Class

The key hyperparameter for the ECOC model is the encoding of class labels.

This includes properties such as:

  • The choice of representation (bits, real numbers, etc.)
  • The encoding of each class label (random, etc.)
  • The length of the representation (number of bits, etc.)
  • How predictions are mapped to classes (distance, etc.)

The OutputCodeClassifier scikit-learn implementation does not currently provide much control over these elements.

The element it does give control over is the number of bits used to encode each class label.

In this section, we will perform a manual grid search across different numbers of bits per class label and compare the results. This provides a template that you can adapt and use on your own project.

First, we can define a function to create and return the dataset.
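A sketch of that helper, using the same dataset settings assumed earlier:

```python
# helper to create the synthetic multi-class dataset
from sklearn.datasets import make_classification

def get_dataset():
    X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, n_classes=3, random_state=7)
    return X, y
```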


We can then define a function that will create a collection of models to evaluate.

Each model will be an instance of the OutputCodeClassifier using a LogisticRegression for each binary classification problem. We will configure the code_size of each model to be different, with values ranging from 1 to 20.
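A sketch of that helper might look like the following, keying each model by its bits-per-class value:

```python
# helper to create a dict of ecoc models with different bits per class
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier

def get_models():
    models = dict()
    for n in range(1, 21):
        models[str(n)] = OutputCodeClassifier(LogisticRegression(), code_size=n, random_state=1)
    return models
```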


We can evaluate each model using repeated stratified k-fold cross-validation, as we did in the previous section, to give a sample of classification accuracy scores.
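A sketch of the evaluation helper, matching the procedure used in the previous section:

```python
# helper to evaluate a model with repeated stratified k-fold cross-validation
from sklearn.model_selection import cross_val_score, RepeatedStratifiedKFold

def evaluate_model(model, X, y):
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    return cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
```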


We can report the mean and standard deviation of the scores for each configuration, and plot the distributions as box and whisker plots side by side to visually compare the results.
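A sketch of the reporting and plotting loop, assuming the get_dataset(), get_models(), and evaluate_model() helpers sketched above:

```python
# summarize scores per configuration and compare with box and whisker plots
from numpy import mean, std
from matplotlib import pyplot
X, y = get_dataset()
models = get_models()
results, names = list(), list()
for name, model in models.items():
    scores = evaluate_model(model, X, y)
    results.append(scores)
    names.append(name)
    print('>%s %.3f (%.3f)' % (name, mean(scores), std(scores)))
# plot the distributions of accuracy scores side by side
pyplot.boxplot(results, labels=names, showmeans=True)
pyplot.show()
```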


Tying this all together, the complete example of evaluating ECOC classification with a grid of the number of bits per class is listed below.
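A self-contained sketch of the complete grid search, under the same assumptions as the snippets above:

```python
# compare the number of bits per class for ecoc multi-class classification
from numpy import mean, std
from matplotlib import pyplot
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, RepeatedStratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier

# get the dataset
def get_dataset():
    X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, n_classes=3, random_state=7)
    return X, y

# get a dict of models to evaluate, keyed by bits per class
def get_models():
    models = dict()
    for n in range(1, 21):
        models[str(n)] = OutputCodeClassifier(LogisticRegression(), code_size=n, random_state=1)
    return models

# evaluate a given model using cross-validation
def evaluate_model(model, X, y):
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    return cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)

# define the dataset and the models
X, y = get_dataset()
models = get_models()
# evaluate each model and store the results
results, names = list(), list()
for name, model in models.items():
    scores = evaluate_model(model, X, y)
    results.append(scores)
    names.append(name)
    print('>%s %.3f (%.3f)' % (name, mean(scores), std(scores)))
# plot model performance for comparison
pyplot.boxplot(results, labels=names, showmeans=True)
pyplot.show()
```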


Running the example first evaluates each model configuration and reports the mean and standard deviation of the accuracy scores.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that perhaps 5 or 6 bits per class results in the best performance, with reported mean accuracy scores of about 78.2 percent and 78.0 percent respectively. We also see good results for 9, 13, 17, and 20 bits per class, with perhaps 17 bits per class giving the best result of about 78.5 percent.


A figure is created showing the box and whisker plots of the accuracy scores for each model configuration.

We can see that, aside from a value of 1, the number of bits per class delivers similar results in terms of spread, with mean accuracy scores that cluster around 77 percent. This suggests that the approach is reasonably stable across configurations.

Box and Whisker Plots of Bits Per Class vs. Distribution of Classification Accuracy for ECOC

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Related Tutorials

  • One-vs-Rest and One-vs-One for Multi-Class Classification

Papers

  • Solving Multiclass Learning Problems via Error-Correcting Output Codes, 1995.

Books

  • Pattern Classification Using Ensemble Methods, 2010.
  • Ensemble Methods, 2012.
  • The Elements of Statistical Learning, 2016.

APIs

  • sklearn.multiclass.OutputCodeClassifier API.
  • sklearn.datasets.make_classification API.
  • sklearn.linear_model.LogisticRegression API.

Summary

In this tutorial, you discovered how to use error-correcting output codes for classification.

Specifically, you learned:

  • Error-correcting output codes is a technique for using binary classification models on multi-class classification prediction tasks.
  • How to fit, evaluate, and use error-correcting output codes classification models to make predictions.
  • How to tune and evaluate different values for the number-of-bits-per-class hyperparameter used by error-correcting output codes.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
