Using MATLAB’s Deep Learning Toolbox | Part 2 | by Scott Campit | Jan, 2021


Training a Deep Feedforward Neural Network using Breast Cancer Imaging data



In the first part of this three-article series, we used MATLAB’s Deep Learning Toolbox (DLT) to train a shallow neural network classifier on breast cancer malignancy data.

Here’s the link to that article if you want to review Part 1:

In this article, we’ll train a deep feedforward neural network on the same Breast Cancer Wisconsin (Original) Data Set built into MATLAB.


Why use MATLAB and the Deep Learning Toolbox?

MathWorks isn’t paying me to review their toolbox (call me, MathWorks). I do think more beginners should use MATLAB to get started in deep learning, but not because the end goal is an AI application deployed at large scale.

Instead, this article is meant to demonstrate some of the practical considerations necessary to train a neural network without getting too bogged down in each component’s details.

My hope is that this lets you hit the ground running and learn as you go in your AI projects.


If you want to know precisely why I think MATLAB is the programming language to learn when first getting into data science from a non-programming background, you can read this article I wrote:

First, I’ll briefly describe the dataset, which was obtained from 699 biopsies. Each feature in the cancerInputs variable describes a cell attribute (such as adhesion, size, or shape).

The cancerTargets variable is encoded as 0s and 1s, describing whether the cell was benign (0) or malignant (1). We need to ensure that the output is stored as a categorical data type.

Unlike the shallow neural network functions, the deep learning functions we use here do not automatically split the data into training, validation, and test sets.

I wrote a function that imitates scikit-learn’s train_test_split() function, and called it… well, trainTestSplit(). The function is outlined below:
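The original listing is not reproduced here, so what follows is only a minimal sketch of what such a helper could look like. The signature, the samples-in-rows layout (borrowed from scikit-learn), and the testFraction argument are assumptions for illustration, not necessarily the function used for the results in this article.

```matlab
% Example call on the built-in data. cancerInputs is 9x699 and
% cancerTargets is 2x699, so both are transposed to put samples in rows.
load cancer_dataset
rng(0);   % fixed seed so the split is repeatable
[XTrain, XTest, yTrain, yTest] = trainTestSplit(cancerInputs', cancerTargets', 0.3);

% In a script, local functions go at the end of the file (R2016b or later).
% Alternatively, save the function on its own as trainTestSplit.m.
function [XTrain, XTest, yTrain, yTest] = trainTestSplit(X, y, testFraction)
% Hypothetical helper: randomly hold out a fraction of the rows for testing.
%   X is samples-by-features and y is samples-by-outputs.
    numSamples = size(X, 1);
    shuffledIdx = randperm(numSamples);            % random order of row indices
    numTest = round(testFraction * numSamples);    % size of the held-out set
    testIdx = shuffledIdx(1:numTest);
    trainIdx = shuffledIdx(numTest+1:end);

    XTrain = X(trainIdx, :);
    XTest = X(testIdx, :);
    yTrain = y(trainIdx, :);
    yTest = y(testIdx, :);
end
```

Because the split works on rows, the outputs need to be transposed back before they go into the network.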

We’ll first split the training and the test data using this trainTestSplit() function.

Note that the features should span the rows and the samples should span the columns. Additionally, the target labels need to be converted to a categorical data type. Finally, we will only use one of the two target rows for our binary classification task.
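A minimal sketch of that preparation step is shown here. It assumes that the second row of cancerTargets flags the malignant samples; the class names 'Benign' and 'Malignant' are labels chosen for illustration.

```matlab
load cancer_dataset                  % provides cancerInputs and cancerTargets

X = cancerInputs;                    % already features-by-samples
malignant = cancerTargets(2, :);     % assumption: row 2 is 1 for malignant samples

% Convert the 0/1 row vector into a categorical row vector of class labels.
Y = categorical(malignant, [0 1], {'Benign', 'Malignant'});

size(X)          % features in rows, samples in columns
categories(Y)    % the two class names
```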

We will train a 5-layer deep feedforward neural network with the following architecture:

A schematic representation of the 5-layer neural network architecture we’ll be training. Note that the output layer is a single vector containing boolean values representing the two cancer states we’re predicting (Benign/Malignant). The figure was designed by Scott Campit.

Compared to our shallow neural network classifier, we need to specify 4 components in this network architecture, which are described below:

  1. Input data layer: The function used to pipe in the input dataset depends on the input data type (2D numeric array, tensor, etc.). For our dataset, we use the sequenceInputLayer() function, which takes the number of features as its argument.
  2. Hidden layer: To design a fully connected feedforward neural network, we call the fullyConnectedLayer() function, which takes the number of output nodes as its first argument. The WeightsInitializer and BiasInitializer name-value arguments specify how to initialize the weight and bias parameters. In our code, we initialize both with small random numbers drawn from a normal distribution for symmetry breaking.
  3. Activation function: Each hidden layer feeds into an activation function. For the output we use the softmaxLayer() function, which converts the final layer’s outputs into class probabilities; it handles binary classification and also generalizes to multi-class classification problems.
  4. Classification layer: The classificationLayer() computes the cross-entropy loss during training and maps the probabilities to their predicted labels.

We’ll store these components into the variable layers:
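The exact layer widths are not stated in the text, so the sketch here uses placeholder widths (16, 8, and 4 nodes). The reluLayer between the fully connected layers is also an assumption, since the hidden activation is not named above. The 'narrow-normal' initializer draws small values from a normal distribution, matching the description of the hidden layers.

```matlab
numFeatures = 9;    % rows of cancerInputs
numClasses = 2;     % benign and malignant

% Shorthand for a fully connected layer with small random normal initial values.
fc = @(numNodes) fullyConnectedLayer(numNodes, ...
    'WeightsInitializer', 'narrow-normal', ...
    'BiasInitializer', 'narrow-normal');

layers = [
    sequenceInputLayer(numFeatures)
    fc(16)                 % placeholder width
    reluLayer              % assumed hidden activation
    fc(8)                  % placeholder width
    reluLayer
    fc(4)                  % placeholder width
    reluLayer
    fc(numClasses)         % one output per class
    softmaxLayer           % class probabilities
    classificationLayer];  % cross-entropy loss and label assignment
```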

For a description of the different layer objects available in MATLAB, you can check out the documentation.

From here, we should stop and see if we have the right architecture, and if there are any potential warnings or issues that will cause the training phase to fail.

We can use the analyzeNetwork() function, which takes in the layers variable we just set and visualizes the network architecture.

This function outputs a table showing a summary of the layers we just specified and the number of parameters in the network.

A summary table from the analyzeNetwork() function showing the layer type, the number of nodes in each layer, and the number of parameters the model is learning.
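As a sanity check on the parameter counts in that table, the number of learnable parameters in a stack of fully connected layers can be worked out by hand. The layer widths used here are placeholders for illustration, not the widths of the network trained in this article.

```matlab
% Input size, three hidden widths, and output size (placeholder values).
layerSizes = [9 16 8 4 2];

% Each fully connected layer learns (inputs x outputs) weights
% plus one bias per output node.
numWeights = layerSizes(1:end-1) .* layerSizes(2:end);
numBiases = layerSizes(2:end);

paramsPerLayer = numWeights + numBiases   % [160 136 36 10]
totalParams = sum(paramsPerLayer)         % 342

% These numbers can be compared against the table from analyzeNetwork(layers)
% for a layers array built with the same widths.
```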

Next, we have to specify the model hyperparameters. The trainingOptions() function creates an options object that passes the hyperparameters to the training function.

The specific arguments we will use are shown below.
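The original values are not reproduced here, so treat the following as a sketch. The solver ('adam') and the learning rate are assumptions; the 200-epoch limit is chosen to match the training plot discussed later.

```matlab
options = trainingOptions('adam', ...      % assumed solver
    'MaxEpochs', 200, ...                  % number of passes over the training data
    'InitialLearnRate', 1e-3, ...          % assumed learning rate
    'Plots', 'training-progress', ...      % live accuracy and loss curves
    'Verbose', false);                     % suppress the command-window log
```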

Additional hyperparameter options for model training can be found in the trainingOptions() documentation.

Finally, let’s train the neural network using the trainNetwork() function, which can also train other deep neural network architectures.

For more information on other neural networks you can train using this function, you can read the documentation.

This function also outputs a graph that shows the accuracy and cost curves in real time as the model trains. This graph is shown below.

The training record output by the trainNetwork() function. This plot shows the accuracy (top) and cost (bottom) curves as a function of epochs. This graphical user interface pops up during model training.

The main takeaway from this plot is that the cost converges to a minimum value at 200 epochs. If we increase the number of epochs, there is evidence that the model begins overfitting.

To make predictions using the deep neural network model, we can use the built-in classify() function, which returns the predicted labels for the validation set.
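Putting the training and prediction steps together, an end-to-end sketch might look like the following. The 70/30 split, the layer widths, the ReLU activations, and the 'adam' solver are all assumptions for illustration, so the accuracy it prints will not necessarily match the numbers reported in this article.

```matlab
load cancer_dataset
rng(0);   % fixed seed so the split is repeatable

% Features-by-samples inputs and a categorical row vector of labels.
% Assumption: row 2 of cancerTargets flags the malignant samples.
X = cancerInputs;
Y = categorical(cancerTargets(2, :), [0 1], {'Benign', 'Malignant'});

% Random 70/30 hold-out split over the columns (samples).
idx = randperm(size(X, 2));
numTest = round(0.3 * numel(idx));
testIdx = idx(1:numTest);
trainIdx = idx(numTest+1:end);

% Placeholder architecture and hyperparameters.
fc = @(n) fullyConnectedLayer(n, ...
    'WeightsInitializer', 'narrow-normal', 'BiasInitializer', 'narrow-normal');
layers = [sequenceInputLayer(size(X, 1)); fc(16); reluLayer; fc(8); reluLayer; ...
    fc(4); reluLayer; fc(2); softmaxLayer; classificationLayer];
options = trainingOptions('adam', 'MaxEpochs', 200, 'Verbose', false);

% Train on the training columns, then predict labels for the held-out columns.
net = trainNetwork(X(:, trainIdx), Y(trainIdx), layers, options);
YPred = classify(net, X(:, testIdx));

% Fraction of held-out samples whose predicted label matches the true label.
YTest = Y(testIdx);
accuracy = mean(YPred(:) == YTest(:))
```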

In Part 1, we trained a shallow neural network and evaluated its performance against the validation set.

Let’s see if our deeper neural network performs better than the shallow version by plotting the confusion matrices side-by-side.

The code to generate a confusion matrix in MATLAB is shown below:
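One way to do this is with confusionchart(), sketched here on a handful of made-up labels so the snippet runs on its own. In practice, YTrue would be the held-out labels and YPred the output of classify(); the toy values are not results from this article.

```matlab
% Toy labels for illustration only.
classNames = {'Benign', 'Malignant'};
YTrue = categorical({'Benign', 'Benign', 'Malignant', 'Malignant', 'Benign', 'Malignant'}, classNames);
YPred = categorical({'Benign', 'Malignant', 'Malignant', 'Malignant', 'Benign', 'Benign'}, classNames);

figure
cm = confusionchart(YTrue, YPred);
cm.Title = 'Deep feedforward network (toy labels)';
cm.RowSummary = 'row-normalized';          % per-class recall
cm.ColumnSummary = 'column-normalized';    % per-class precision

% Overall accuracy: fraction of labels that match.
accuracy = mean(YPred == YTrue)
```

Repeating the same call in a second figure with the shallow network’s predictions gives the side-by-side comparison.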

Benchmarking the shallow neural network (Accuracy: 96.7%) against the 5-layer neural network (Accuracy: 97.3%).

The 5-layer neural network slightly increases the accuracy of our breast cancer classification task, from 96.7% to 97.3%.

This is consistent with a common trend in deep neural networks: increasing the depth and the number of nodes often increases accuracy, although a gain this small should be interpreted cautiously.


There are methods to perform Bayesian hyperparameter tuning on deep neural networks in MATLAB. However, the process is more involved and will be discussed in more depth in a future article.

We added deep feedforward neural networks to our repertoire of deep learning tools and found that they slightly improved our ability to classify whether a patient’s tumor was benign or malignant.

However, you may have noticed that we have been using numerical data to classify tumor imaging data. What if we wanted to use the raw images themselves?

In the next article, we’ll train a convolutional neural network on images to predict tumor pathogenicity.
