## Neural Networks Demystified [Part 2: Forward Propagation]


Neural Networks Demystified

@stephencwelch

Supporting Code:

https://github.com/stephencwelch/Neural-Networks-Demystified

In this short series, we will build and train a complete Artificial Neural Network in Python. New videos every other Friday.

Part 1: Data + Architecture

Part 2: Forward Propagation

Part 3: Gradient Descent

Part 4: Backpropagation

Part 5: Numerical Gradient Checking

Part 6: Training

Part 7: Overfitting, Testing, and Regularization


Fantastic video, really explains the concepts well!

for myself 0:59

Thanks for the educational video. Did you not include bias terms for some reason?

Back prop updates weights, does forward prop also update weights?

Amazing videos, thank you for putting them up.

One question, however, is why don't you include biases in the calculations?

Hey, this is a great explanation, thanks a lot.

The music is so annoying as study-material background.

On a technicality, the size of W1 should be 3×2, and it should be transposed before X is multiplied by it. We tend to write down the already-transposed weight matrix, which is confusing for beginners.

Warning: cs231n uses the transposed versions of W and x and computes Wx for the forward pass; here we do XW.
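
To see that the two layouts compute the same thing, here's a minimal numpy sketch; the 2-input, 3-hidden-unit shapes follow the video, while the variable names are just for illustration:

```python
import numpy as np

np.random.seed(0)
X = np.random.rand(4, 2)    # 4 examples as rows, 2 inputs (hours of sleep, hours of study)
W1 = np.random.rand(2, 3)   # this series' layout: (inputs x hidden units)

# This series: one product handles every example at once.
Z = X @ W1                  # shape (4, 3)

# cs231n-style layout: W is (hidden x inputs), each example is a column vector.
x_col = X[0].reshape(2, 1)  # first example as a column
z_col = W1.T @ x_col        # shape (3, 1)

print(np.allclose(Z[0], z_col.ravel()))  # True: same numbers, different layout
```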

Representing all input examples in one big input matrix X:

(Usually x is n×1, where n is the number of input nodes; this whole n×1 column vector represents one example in our data set, and we would have to transpose it into a row vector before multiplying it by the weights matrix.)

The input matrix doesn't have to be n×1 (or 1×n after transposing); it can be n×d, where n is now the number of examples and d is the number of input dimensions we have (hours of sleep, hours of study, so d = 2).

Each row in the input matrix is one input example (instead of each column being one example, as in the n×1 case).

W11 is the weight from input node 1 to node 1 in the hidden layer, W12 is the weight from input node 1 to node 2 in the hidden layer, etc.

For an input example (a row in our input matrix), see 1:19 for how it's multiplied.

When our input matrix is n×d, with each row a multidimensional input example, each row of the weights matrix holds the weights from one input node to all the nodes in the hidden layer; W becomes d×h, where h is the number of nodes in the hidden layer and d is the dimension of our input.

The resulting matrix is n×h, where h is the number of nodes in the hidden layer. Each row of the result is the hidden layer's output for one input example.

The formula here is XW (not Wx as in regression); this depends on how our input matrix is laid out. In this case each row of the input matrix is a data example, i.e. the examples are already transposed, and that's why it's XW rather than Wx.
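
A quick numpy sketch of the shapes just described (sizes are illustrative except d = 2 and h = 3, which match the video's network):

```python
import numpy as np

n, d, h = 5, 2, 3           # n examples, d input dimensions, h hidden nodes
X = np.random.rand(n, d)    # each row is one example
W1 = np.random.rand(d, h)   # row i: weights from input node i to every hidden node

Z2 = X @ W1                 # XW, not Wx, because examples sit in rows
print(Z2.shape)             # (5, 3), i.e. (n, h): one hidden-layer row per example

# Row k of the product is exactly example k pushed through the weights:
print(np.allclose(Z2[0], X[0] @ W1))  # True
```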

Numpy can take in a scalar or a matrix and apply a function to it: the function is applied to each element of the matrix (we don't have to explicitly loop through the matrix like we would in other languages).
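
For example, with a sigmoid like the one used later in the series (a sketch, assuming the usual 1/(1 + e^-z) definition):

```python
import numpy as np

def sigmoid(z):
    # np.exp is applied elementwise, so one definition covers
    # scalars, vectors, and matrices with no explicit loops.
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))                       # 0.5 (scalar in, scalar out)
print(sigmoid(np.array([[-1.0, 0.0],
                        [ 1.0, 2.0]]))) # 2x2 matrix in, 2x2 matrix out
```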

W^(i), where i is the layer index. W^(1) is the weight matrix from the input nodes to the first hidden layer.

W is always d×h.
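
Putting the pieces together, here's a minimal sketch of the whole forward pass for the video's 2-3-1 network; the input values echo the series' sleep/study data, and the random weights stand in for whatever initialization is used:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X = np.array([[3.0, 5.0],
              [5.0, 1.0],
              [10.0, 2.0]])     # (hours of sleep, hours of study) per row
W1 = np.random.randn(2, 3)      # W^(1): input layer -> hidden layer, d x h
W2 = np.random.randn(3, 1)      # W^(2): hidden layer -> output layer

z2 = X @ W1                     # hidden-layer activity, one row per example
a2 = sigmoid(z2)                # hidden-layer activation
z3 = a2 @ W2                    # output-layer activity
yHat = sigmoid(z3)              # predicted score, shape (3, 1)
print(yHat)
```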

If you want to reproduce the plot from his GitHub, use this snippet of code (sigmoid isn't defined in the snippet as posted, so its definition is added here):

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

testInput = np.arange(-6, 6, 0.01)
plt.plot(testInput, sigmoid(testInput), linewidth=2)
plt.grid(1)
plt.show()
```

How do you know you are supposed to choose 3 hidden units?

Damn, you made that so, so, so simple. I love the way you teach.

This video is bloody fantastic! You're making the world a better place, my guy.

Very helpful, but that music really sucks.

For the exact same code you typed in, I faced a gazillion errors.

Do explain how the sigmoid function can take more than one argument when you type: self.a2 = self.sigmoid(self.z2)

OMG the music is so distracting

Supporting Code is broken :'( :'( :'(

Thank you very much, Stephen. I am sure you have already helped so many people. Good job.

The background music shouldn't be as loud as the voice.

After half of the video I didn't understand anything… the programming, the functions.

Really clear and illustrative. It deserves a like. Thanks for the video.

Why do the weights have to be randomized? Why not initialize parameters to 0 like during SGD?

Why is no intercept term added?

Perfect, thanks a lot.

Great video! I wish the background music was not as loud. Everything else is great! Thanks!

"Neural Networks Demystified" Honestly, I'm just more confused.

4:26 What's the input data for X? And what function does y represent? Am I missing something, or did you already write the code for that?

Finds a good ANN explanation video. Music makes it even harder to follow.

Great videos, very intuitive explanation of how Neural Networks function.

If I could provide some criticism: there's a very specific notation in Machine Learning for labeling weights which I noticed was not used here. Was this intentional, so as not to cause confusion? Also, most authors generally write the first hidden layer's computation as Z^1, meaning Z^1 = XW^1, not Z^2 as you have it here.

Great work regardless!