## Neural Networks Demystified [Part 2: Forward Propagation]

Neural Networks Demystified
@stephencwelch

Supporting Code:
https://github.com/stephencwelch/Neural-Networks-Demystified

In this short series, we will build and train a complete Artificial Neural Network in Python. New videos every other Friday.

Part 1: Data + Architecture
Part 2: Forward Propagation
Part 4: Backpropagation
Part 6: Training
Part 7: Overfitting, Testing, and Regularization

### Comment List

• Welch Labs
November 26, 2020

Fantastic video, really explains the concepts well!

• Welch Labs
November 26, 2020

for myself 0:59

• Welch Labs
November 26, 2020

Thanks for the educational video. Did you not include bias terms for some reason?

• Welch Labs
November 26, 2020

Back prop updates weights, does forward prop also update weights?

• Welch Labs
November 26, 2020

Amazing videos, thank you for putting them up.
One question, however, is why don't you include biases in the calculations?

• Welch Labs
November 26, 2020

Hey, this is a great explanation, thanks a lot.

• Welch Labs
November 26, 2020

The music is so annoying as background for study material.

• Welch Labs
November 26, 2020

On a technicality, the size of W1 should be 3×2, and it should be transposed before X is multiplied by it. We tend to write down the already-transposed version of the weight matrix, which makes it confusing for beginners.

• Welch Labs
November 26, 2020

Warning: cs231n uses the transposed versions of W and x and computes Wx in the forward pass; here we compute XW.

Representing all input examples in one big input matrix X:
Usually x is n×1, where n is the number of input nodes; this column vector represents one example in the data set, and we would have to transpose it into a row vector before multiplying it by the weight matrix.
The input matrix doesn't have to be n×1 (or 1×n after transposing): it can be n×d, where n is the number of examples and d is the number of input nodes/input dimensions we have (hours of sleep, hours of study, so d = 2).
Each row of the input matrix is one input example (instead of each column being one example, as in the n×1 case).
W11 is the weight from input node 1 to hidden node 1, W12 is the weight from input node 1 to hidden node 2, and so on.
For an input example (a row of the input matrix), see 1:19 for how it is multiplied.

When the input matrix is n×d, with each row a multidimensional input example, each row of the weight matrix holds the weights from one input node to all the nodes in the hidden layer, so W is d×h, where h is the number of hidden nodes and d is the dimension of the input.

The resulting matrix is n×h, where h is the number of nodes in the hidden layer. Each row of the result is the hidden layer's output for one input example.

The formula here is XW (not Wx as in regression); which one applies depends on how the input matrix is laid out. Here each row of the input matrix is a data example, i.e. the examples are already transposed, and that's why it's XW rather than Wx.

NumPy can take a scalar or a matrix and apply a function to it elementwise, so we don't have to loop through the matrix explicitly like we would in other languages.

W^(i), where i is the layer index: W^(1) is the weight matrix from the input nodes to the first hidden layer.
For this network, W^(1) is always d×h.
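As a sanity check on the shape bookkeeping above, here is a minimal NumPy sketch; the n = 3 example rows are illustrative, while d = 2 inputs and h = 3 hidden units match the network in the video:

```python
import numpy as np

def sigmoid(z):
    # NumPy applies this elementwise; no explicit loop needed
    return 1 / (1 + np.exp(-z))

n, d, h = 3, 2, 3            # examples, input dims, hidden units
X = np.array([[3., 5.],      # each row: (hours of sleep, hours of study)
              [5., 1.],
              [10., 2.]])
W1 = np.random.randn(d, h)   # row i: weights from input node i to every hidden node

Z2 = X @ W1                  # (n x d) @ (d x h) -> (n x h)
A2 = sigmoid(Z2)             # hidden-layer activity, one row per example
assert Z2.shape == (n, h) and A2.shape == (n, h)
```

Each row of `A2` is the hidden layer's output for one example, exactly as described above.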

• Welch Labs
November 26, 2020

if you want to do the plot from his github, use this snippet of code (sigmoid is defined in the video's class; a standalone version is included here so it runs on its own):

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

testInput = np.arange(-6, 6, 0.01)
plt.plot(testInput, sigmoid(testInput), linewidth=2)
plt.grid(True)
plt.show()

• Welch Labs
November 26, 2020

How do you know you are supposed to choose 3 hidden layers?

• Welch Labs
November 26, 2020

Damn you made that so so so simple. I love the way you teach.

• Welch Labs
November 26, 2020

This video is bloody fantastic! You're making the world a better place, my guy.

• Welch Labs
November 26, 2020

Very helpful, but that music really sucks.

• Welch Labs
November 26, 2020

For the exact same code you typed in, I faced a gazillion errors.
Do explain how the sigmoid function can take more than one value at once when you type: self.a2 = self.sigmoid(self.z2)
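On the question above: `self.z2` is a single matrix argument, and NumPy applies `sigmoid` to every element of it at once, so the function still takes one argument. A small illustrative sketch (the variable names mirror the video's but are standalone here):

```python
import numpy as np

def sigmoid(z):
    # A NumPy ufunc expression: works on scalars and arrays alike
    return 1 / (1 + np.exp(-z))

z2 = np.array([[0.0, 2.0],
               [-2.0, 0.5]])
a2 = sigmoid(z2)      # one matrix in, one matrix out, elementwise
print(a2[0, 0])       # sigmoid(0) = 0.5
```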

• Welch Labs
November 26, 2020

OMG the music is so distracting

• Welch Labs
November 26, 2020

Supporting Code is broken :'( :'( :'(

• Welch Labs
November 26, 2020

Thank you very much, Stephen. I am sure you have already helped so many people. Good job.

• Welch Labs
November 26, 2020

The bed music shouldn't be as loud as the voice.

• Welch Labs
November 26, 2020

After half the video I didn't understand anything… the programming, the functions.

• Welch Labs
November 26, 2020

Really clear and illustrative. It deserves one like. Thanks for the video.

• Welch Labs
November 26, 2020

Why do the weights have to be randomized? Why not initialize parameters to 0 like during SGD?

• Welch Labs
November 26, 2020

Perfect thanks a lot

• Welch Labs
November 26, 2020

Great video! I wish the background music was not as loud. Everything else is great! Thanks!

• Welch Labs
November 26, 2020

"Neural Networks Demystified" Honestly, I'm just more confused.

• Welch Labs
November 26, 2020

4:26 What's the input data for X? And what function that y represents? Am I missing something or did you already write the code for that?

• Welch Labs
November 26, 2020

Finds a good ANN explanation video. Music makes it even harder to follow.

• Welch Labs
November 26, 2020

Great videos, very intuitive explanation of how Neural Networks function.

If I could provide some criticism: there is a very specific notation in Machine Learning for labeling weights which I noticed was not used here. Was this intentional, so as not to cause confusion? Also, most authors generally write the first hidden layer's computation as Z^1, meaning Z^1 = XW^1, not Z^2 as you have stated here.

Great work regardless!