Learn How Neural Networks Learn. It’s not so different from humans… | by Anna Shi | Nov, 2020


What was the problem? Well, I trusted my sister more than my parents to give me the right answer. I assigned more weight to my sister’s explanation than to my parents’ answer. Next time, I’ll know to listen more to my parents and less to my sister. This process of adjusting weights is called backpropagation.

When a neural network is first set up, the weights and biases in each layer are randomized (we can think of a bias as a weight). The prediction is wildly inaccurate as a result. Backpropagation updates the weights and biases of each of the nodes until the model can make consistently accurate predictions.

Here were the steps in the context of my story:

  1. My parents and my sister both gave me answers. Keeping them both in mind, I went with the answer “window.”
  2. I compared my answer to my teacher’s correct answer.
  3. Window was completely wrong. The answer was 2.
  4. I looked back to my sources to figure out where I went wrong. My parents are clearly smarter than my sister.
  5. I decided to start trusting my parents more than my sister from now on.
  6. I used my new methodology to figure out the answer to 2+2.

Here’s the rundown in machine learning terms:

  1. Perform a feedforward operation.
  2. Compare the model’s output with the desired output.
  3. Calculate the error with the error function.
  4. Run the feedforward operation backwards.
  5. Update the weights.
  6. Rinse and repeat.
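The six steps can be sketched as a training loop for the smallest possible network, a single sigmoid node. This is a minimal sketch under assumptions not taken from the article: the error function is binary cross-entropy, weights are updated after every training point, the toy data is the logical OR of two inputs, and the names `train` and `sigmoid` are hypothetical.

```python
import math
import random

# Hypothetical helper names for illustration; not from any library.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=1000, lr=0.5, seed=0):
    """Train a single sigmoid node on (inputs, label) pairs."""
    rng = random.Random(seed)
    n = len(data[0][0])
    # Random starting weights and bias, so early predictions are poor
    weights = [rng.uniform(-1, 1) for _ in range(n)]
    bias = rng.uniform(-1, 1)
    for _ in range(epochs):                          # 6. Rinse and repeat
        for x, y in data:
            # 1. Feedforward: weighted sum plus bias, then sigmoid
            y_hat = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            # 2-3. Compare with the target; for cross-entropy error,
            #      dE/dz simplifies to (y_hat - y)
            delta = y_hat - y
            # 4. Backward pass: dE/dw_i = delta * x_i and dE/db = delta
            grads = [delta * xi for xi in x]
            # 5. Update: step each weight against its gradient
            weights = [w - lr * g for w, g in zip(weights, grads)]
            bias -= lr * delta
    return weights, bias

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]  # logical OR
weights, bias = train(data)
for x, y in data:
    y_hat = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    print(x, y, round(y_hat, 3))
```

Per-point updates keep the loop short; averaging gradients over a batch of points is an equally common choice.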

The feedforward operation is the process of generating an output from the given inputs in a neural network.

I won’t go in-depth on this, but I’ve made a quick sequence to follow.

Receive inputs: If this is the first hidden layer, the inputs will come directly from the data. Otherwise, the inputs will be the outputs generated from the previous layer.

Calculate prediction: The prediction depends on the weight for each input and the bias. The bias can be considered its own weight for an input of 1. The prediction is the weighted sum of the inputs plus the bias:

$$z = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b$$

Apply sigmoid function: This turns the prediction into a number between 0 and 1, using $\sigma(x) = \frac{1}{1 + e^{-x}}$.

Generate output: This output will be sent to the nodes in the next layer. This sequence will repeat in every node of every layer until we reach our final prediction.
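The four-step sequence for a single node, repeated over every node of every layer, can be sketched in plain Python. The layer representation (a list of `(weights, bias)` pairs per layer) and the function names are hypothetical choices for illustration.

```python
import math

# Hypothetical helper names used only for illustration.
def sigmoid(z):
    # Squash any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def node_output(inputs, weights, bias):
    # Calculate prediction: weighted sum of the inputs plus the bias
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Apply sigmoid function and generate the node's output
    return sigmoid(z)

def feedforward(x, layers):
    """layers: a list of layers, each a list of (weights, bias) pairs, one per node."""
    activations = x  # the first hidden layer receives the data directly
    for layer in layers:
        # every node in this layer receives all outputs of the previous layer
        activations = [node_output(activations, w, b) for w, b in layer]
    return activations  # outputs of the last layer = final prediction

# 2 inputs -> 2 hidden nodes -> 1 output node (weights chosen arbitrarily)
layers = [
    [([0.5, -0.5], 0.1), ([1.0, 1.0], -1.0)],
    [([2.0, -1.0], 0.0)],
]
print(feedforward([1.0, 0.0], layers))
```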

Note: the bias can also be represented as the weight of a unit input, that is, a constant input of 1. That means we’ll add an input of 1 to the input layer and to every hidden layer. We’ll use this in future notation.
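A small check that the two views of the bias agree: appending a constant input of 1 and treating the bias as that input's weight gives the same weighted sum as adding the bias separately. The function names and numbers are hypothetical.

```python
# Hypothetical helper names for illustration.
def weighted_sum(inputs, weights, bias):
    # Explicit bias: w1*x1 + ... + wn*xn + b
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def weighted_sum_unit_input(inputs, weights_with_bias):
    # Bias as a weight: append a constant input of 1 and let the last weight be b
    augmented = list(inputs) + [1.0]
    return sum(w * x for w, x in zip(weights_with_bias, augmented))

x = [0.5, -2.0]
w = [0.3, 0.8]
b = 0.1
explicit = weighted_sum(x, w, b)
as_weight = weighted_sum_unit_input(x, w + [b])
print(explicit, as_weight)
```

Folding the bias into the weights means every layer can be handled with one kind of operation, which is why the later notation uses it.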

We can also write this in matrix form. The output for the first node in the hidden layer will be calculated as follows:

Note that the weights in that equation are only those from the input layer, so they should have a superscript of (1). If we imagine this in the grand scheme of the feedforward operation, then we need to consider the different collections of weights in different layers. So it might be more accurate to say that the output for that first node would be this:
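As a sketch of that notation, assume n inputs $x_1, \dots, x_n$ plus the constant input $x_{n+1} = 1$, and let $W^{(1)}_{ij}$ be the weight from input $i$ to hidden node $j$ (this indexing convention is an assumption). The first hidden node's output $h_1$ is then:

```latex
h_1 = \sigma\!\left(\sum_{i=1}^{n+1} W^{(1)}_{i1}\, x_i\right)
    = \sigma\!\left(
        \begin{bmatrix} x_1 & x_2 & \cdots & x_n & 1 \end{bmatrix}
        \begin{bmatrix} W^{(1)}_{11} \\ W^{(1)}_{21} \\ \vdots \\ W^{(1)}_{n+1,1} \end{bmatrix}
      \right)
```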

The final prediction will be calculated as follows:
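Under the same assumed notation, suppose the hidden layer has $m$ nodes $h_1, \dots, h_m$ plus the constant unit $h_{m+1} = 1$, and feeds a single output node through the weights $W^{(2)}_{j1}$. The final prediction would be:

```latex
\hat{y} = \sigma\!\left(\sum_{j=1}^{m+1} W^{(2)}_{j1}\, h_j\right)
        = \sigma\!\left(\sum_{j=1}^{m} W^{(2)}_{j1}\,
            \sigma\!\left(\sum_{i=1}^{n+1} W^{(1)}_{ij}\, x_i\right)
            + W^{(2)}_{m+1,1}\right)
```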

If it helps, read it from back to front. First, we calculate the output for each of the nodes in the hidden layer. Then, we calculate the final prediction for our model.
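The back-to-front reading translates directly into code. This sketch stores each layer's weights as a list of rows, `W[i][j]` being the weight from input i to node j, with the last row holding the weights on the unit input (the biases). That layout and the names `layer_forward` and `predict` are hypothetical choices.

```python
import math

# Hypothetical helper names and weight layout, for illustration only.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, W):
    """W[i][j] is the weight from input i to node j; the last row is the bias row."""
    augmented = list(inputs) + [1.0]  # constant unit input for the bias
    n_nodes = len(W[0])
    return [sigmoid(sum(augmented[i] * W[i][j] for i in range(len(augmented))))
            for j in range(n_nodes)]

def predict(x, W1, W2):
    hidden = layer_forward(x, W1)         # first: outputs of the hidden layer
    return layer_forward(hidden, W2)[0]   # then: the final prediction

W1 = [[0.2, -0.5],   # weights from x_1 to hidden nodes 1 and 2
      [0.7, 0.1],    # weights from x_2
      [0.0, 0.3]]    # weights from the unit input (biases)
W2 = [[1.5], [-2.0], [0.4]]  # hidden nodes 1, 2 and unit input -> output
print(predict([1.0, 0.5], W1, W2))
```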

Now that we have our final prediction, we can compare it with the desired output, that is, the correct answer. At first it will almost certainly be wrong, but the comparison tells us how we want the model’s boundary to move.

  • If the point is misclassified, the boundary should move closer to it.
  • If the point is classified correctly, the boundary should move farther away.

On the left, the red point is classified incorrectly as a green point, so the boundary should move closer to minimize the error. On the right, the red point is classified correctly, so the boundary should move farther away.

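A quick numeric check of these two rules, assuming a single sigmoid node trained with cross-entropy error (so the derivative with respect to the weighted sum is ŷ - y). It takes one gradient step on one point at a time and tracks the point's score w·x + b as a stand-in for which side of the boundary the point lies on: for the misclassified point the score moves toward zero (the boundary approaches it), and for the correctly classified point the score grows (the boundary retreats). The points, weights, learning rate, and names are made up for illustration.

```python
import math

# Hypothetical helpers and values for illustration.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(w, b, x):
    # Signed score w.x + b: positive means the model predicts label 1
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def step(w, b, x, y, lr=0.1):
    # One gradient step on a single point; with cross-entropy, dE/dz = y_hat - y
    err = sigmoid(score(w, b, x)) - y
    new_w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return new_w, b - lr * err

w, b = [1.0, 1.0], 0.0
points = {
    "misclassified": ([-1.0, -0.5], 1),  # label 1, but the score is negative
    "correct": ([2.0, 1.0], 1),          # label 1, and the score is positive
}
results = {}
for name, (x, y) in points.items():
    new_w, new_b = step(w, b, x, y)
    results[name] = (score(w, b, x), score(new_w, new_b, x))
    print(name, "score before %+.3f, after %+.3f" % results[name])
```

The misclassified point pulls much harder, because its error signal ŷ - y is large, while the correct point only nudges the boundary slightly.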

The next few steps go hand in hand, so there might be some overlap in the explanation.

We have the final prediction from our output layer. We know in which direction we want to push the boundary. We just need to know how much the boundary should move.

We’ll start with the case of a single output, calculating its error with the error function. Remember the equation for the error function?
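Assuming the error function meant here is binary cross-entropy, a common choice for sigmoid outputs, its form over $m$ training points with labels $y_i \in \{0, 1\}$ and predictions $\hat{y}_i$ is:

```latex
E = -\frac{1}{m} \sum_{i=1}^{m}
    \left[\, y_i \ln(\hat{y}_i) + (1 - y_i) \ln(1 - \hat{y}_i) \,\right]
```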

The negative of the gradient of this error function tells us which way, and by how much, the boundary has to move, either closer to or farther from the point. The gradient is made up of the partial derivatives of the error function with respect to every weight.
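For a single sigmoid node with cross-entropy error on one point, each partial derivative has the closed form ∂E/∂w_i = (ŷ - y)·x_i, with the bias treated as a weight on a constant input of 1. This sketch computes that gradient and checks it against a finite-difference estimate; the weights, the point, and the helper names are hypothetical.

```python
import math

# Hypothetical helpers for one sigmoid node; the last weight is the bias,
# applied to a constant input of 1.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, x):
    aug = list(x) + [1.0]
    return sigmoid(sum(w * xi for w, xi in zip(weights, aug)))

def error(weights, x, y):
    # Binary cross-entropy for a single point
    y_hat = predict(weights, x)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def gradient(weights, x, y):
    # Closed form: dE/dw_i = (y_hat - y) * x_i
    y_hat = predict(weights, x)
    return [(y_hat - y) * xi for xi in list(x) + [1.0]]

def numerical_gradient(weights, x, y, eps=1e-6):
    # Central finite differences, nudging one weight at a time
    grads = []
    for i in range(len(weights)):
        up, down = list(weights), list(weights)
        up[i] += eps
        down[i] -= eps
        grads.append((error(up, x, y) - error(down, x, y)) / (2 * eps))
    return grads

weights, x, y = [0.2, -0.4, 0.1], [1.0, 2.0], 1
print(gradient(weights, x, y))
print(numerical_gradient(weights, x, y))
```

A finite-difference check like this is a cheap way to confirm a hand-derived gradient before trusting it in training.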

From there, we can work out how to update the weights of the hidden layer’s nodes. Remember that each node’s output is the sigmoid function of the weighted sum of all the outputs from the previous layer.

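To update each weight, we add the product of the learning rate α and the negative of the gradient to the current weight value. Equivalently, we subtract α times the gradient, where E is the error function:

$$w_i' = w_i - \alpha \frac{\partial E}{\partial w_i}$$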
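The update rule w_i ← w_i - α·∂E/∂w_i fits in a few lines. To keep the sketch self-contained, the loop minimizes a made-up one-weight error E(w) = (w - 3)², whose gradient is 2(w - 3), instead of a network's error; `update_weights` and `minimize` are hypothetical names.

```python
# Hypothetical helper names; plain Python, no libraries needed.
def update_weights(weights, grads, alpha):
    # w_i <- w_i - alpha * dE/dw_i (i.e. add alpha times the negative gradient)
    return [w - alpha * g for w, g in zip(weights, grads)]

def minimize(weights, alpha=0.1, steps=100):
    # Toy error E(w) = (w - 3)^2 with gradient 2 * (w - 3)
    for _ in range(steps):
        grads = [2 * (weights[0] - 3.0)]
        weights = update_weights(weights, grads, alpha)
    return weights

print(update_weights([1.0, 2.0], [0.5, -1.0], alpha=0.1))
print(minimize([0.0]))
```

Each step here shrinks the distance to the minimum at w = 3 by a factor of 1 - 2α, so a learning rate that is too large (α > 1 for this toy error) would overshoot and diverge.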

Well, that’s how the process would work if we only had one output value to consider.

What actually happens, though, is that each output in the final layer tells each node in the hidden layer how to change its weights to reduce that output’s error. Each node in the hidden layer takes all of these suggestions into account; by the chain rule, they are added together. The result is the negative gradient of the error function for each weight in each layer. The equation is written like this:
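A sketch of how one hidden node combines the suggestions from several output nodes. By the chain rule, the contributions are summed over the outputs (averaging, when it is used, is typically over training examples). The toy network (one hidden node feeding three sigmoid outputs with no output biases), all values, and all names are hypothetical; the total error is the cross-entropy summed over the outputs.

```python
import math

# Hypothetical toy network: one hidden node feeding several sigmoid outputs.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w_in, v_out, x):
    # Hidden node: sigmoid of the weighted inputs (last weight = bias on unit input)
    aug = list(x) + [1.0]
    h = sigmoid(sum(w * xi for w, xi in zip(w_in, aug)))
    # Each output k: sigmoid(v_k * h); output biases omitted to keep it short
    outputs = [sigmoid(v * h) for v in v_out]
    return h, outputs

def total_error(w_in, v_out, x, targets):
    # Cross-entropy summed over all output nodes
    _, outs = forward(w_in, v_out, x)
    return -sum(t * math.log(o) + (1 - t) * math.log(1 - o)
                for o, t in zip(outs, targets))

def hidden_gradient(w_in, v_out, x, targets):
    h, outs = forward(w_in, v_out, x)
    # Each output's "suggestion": its error signal (o_k - t_k) times its weight v_k
    suggestions = [(o - t) * v for o, t, v in zip(outs, targets, v_out)]
    # Combine the suggestions by summing, then scale by the sigmoid derivative
    delta_h = sum(suggestions) * h * (1 - h)
    aug = list(x) + [1.0]
    return [delta_h * xi for xi in aug]

w_in = [0.3, -0.2, 0.1]
v_out = [0.5, -1.0, 0.8]
x, targets = [1.0, 2.0], [1, 0, 1]
print(hidden_gradient(w_in, v_out, x, targets))
```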

This process will repeat for each layer, updating each of the weights. It’s like a feedforward operation, but backwards. We’ll repeat this until we’ve adjusted all the weights from the hidden layers to the input layer. Here’s the equation:
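Putting the layers together, here is a sketch of one backward pass for a network with one hidden layer and one output. It stores each layer's weights as rows, `W[i][j]` being the weight from input i to node j, with the last row for the constant unit input, and it assumes sigmoid activations and cross-entropy error; `forward`, `error`, and `backprop` are hypothetical names. The error signal starts at the output, is passed back through `W2` (skipping its bias row, since nothing feeds the unit input), and is then scaled by the sigmoid derivative h(1 - h) to produce the first layer's gradients.

```python
import math

# Hypothetical helpers; W[i][j] is the weight from input i to node j,
# and the last row of each matrix holds the weights on the unit input (biases).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, W2):
    xa = list(x) + [1.0]
    h = [sigmoid(sum(xa[i] * W1[i][j] for i in range(len(xa))))
         for j in range(len(W1[0]))]
    ha = h + [1.0]
    y_hat = sigmoid(sum(ha[j] * W2[j][0] for j in range(len(ha))))
    return xa, h, ha, y_hat

def error(x, y, W1, W2):
    # Cross-entropy for a single training point
    y_hat = forward(x, W1, W2)[3]
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def backprop(x, y, W1, W2):
    xa, h, ha, y_hat = forward(x, W1, W2)
    # Output layer: sigmoid + cross-entropy gives dE/dz_out = y_hat - y
    delta_out = y_hat - y
    grad_W2 = [[delta_out * ha[j]] for j in range(len(ha))]
    # Hidden layer: send the error back through W2 (bias row excluded),
    # then scale by the sigmoid derivative h_j * (1 - h_j)
    delta_h = [delta_out * W2[j][0] * h[j] * (1 - h[j]) for j in range(len(h))]
    grad_W1 = [[delta_h[j] * xa[i] for j in range(len(h))]
               for i in range(len(xa))]
    return grad_W1, grad_W2

W1 = [[0.1, -0.3], [0.4, 0.2], [0.0, 0.1]]   # 2 inputs + unit input -> 2 hidden nodes
W2 = [[0.5], [-0.6], [0.2]]                   # 2 hidden nodes + unit input -> 1 output
grad_W1, grad_W2 = backprop([1.0, 0.5], 1, W1, W2)
print(grad_W1, grad_W2)
```

Subtracting α times these gradients from `W1` and `W2` completes one update; repeating that over the data is the “rinse and repeat” step.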

And now we do it again! We’ve updated the weights, but not by a lot. We’ll have to keep running the operation, checking our prediction, and updating our weights until we’ve minimized the error of our function. Then, we’re done. Woohoo!

Maybe you’re wondering what the “rinse” is. Well, there are two interpretations:

  1. Cry tears of joy for having finally gotten through this.
  2. Cry tears of disappointment for still being confused.

If you didn’t get it all the first time, don’t worry! Reread the steps and check out the videos and articles I’ve linked below. You’ve got this 🙂

My take on “Machine Learning.” Yes, I am aware that computers can draw better than me.

Intro to Deep Learning with PyTorch | Udacity Free Courses

What is backpropagation really doing? | Deep learning, chapter 3

The Maths behind Back Propagation

