Neural Networks Demystified [Part 2: Forward Propagation]





Neural Networks Demystified
@stephencwelch

Supporting Code:
https://github.com/stephencwelch/Neural-Networks-Demystified

In this short series, we will build and train a complete Artificial Neural Network in Python. New videos every other Friday.

Part 1: Data + Architecture
Part 2: Forward Propagation
Part 3: Gradient Descent
Part 4: Backpropagation
Part 5: Numerical Gradient Checking
Part 6: Training
Part 7: Overfitting, Testing, and Regularization

Source



Comment List

  • Welch Labs
    November 26, 2020

    Fantastic video, really explains the concepts well!

  • Welch Labs
    November 26, 2020

    for myself 0:59

  • Welch Labs
    November 26, 2020

    Thanks for the educational video. Did you not include bias terms for some reason?

  • Welch Labs
    November 26, 2020

    Back prop updates the weights; does forward prop also update the weights?

  • Welch Labs
    November 26, 2020

    Amazing videos, thank you for putting them up.
    One question, however: why don't you include biases in the calculations?

  • Welch Labs
    November 26, 2020

    Hey, this is a great explanation, thanks a lot.

  • Welch Labs
    November 26, 2020

    The music is so annoying as background for study material.

  • Welch Labs
    November 26, 2020

    On a technicality, the size of W1 should be 3×2, and it should be transposed before X is multiplied by it. We tend to write the already-transposed version of the weight matrix, which makes it confusing for beginners (see the sketch below).
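
    A minimal NumPy sketch of the two conventions being compared; the shapes follow the video's 2-input, 3-hidden-unit network, but the variable names and example values are illustrative rather than the video's exact code:

import numpy as np

# batch convention used in the video: one example per row of X
X = np.array([[3.0, 5.0], [5.0, 1.0], [10.0, 2.0]])   # shape (3, 2): 3 examples, 2 inputs
W1 = np.random.randn(2, 3)                             # shape (2, 3): 2 inputs -> 3 hidden units
Z2 = X @ W1                                            # shape (3, 3): one row of hidden activity per example

# column-vector convention (one example at a time): store the weights as (3, 2) and
# multiply weights-first, which is the same as transposing the (2, 3) matrix
x = X[0]                                               # shape (2,): a single example
z = W1.T @ x                                           # shape (3,): identical numbers to Z2[0]
assert np.allclose(z, Z2[0])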

  • Welch Labs
    November 26, 2020

    Warning: cs231n uses the transposed versions of W and x and computes Wx for the forward pass; here we do XW.

    Representing all input examples in one big input matrix X:
    Usually a single example x is a column vector (d×1, where d is the number of input nodes), and we would have to transpose it into a row vector before multiplying it by the weight matrix.
    The input matrix doesn't have to hold one example at a time; it can be n×d, where n is the number of examples and d is the number of input dimensions (hours of sleep, hours of study), so d = 2 here.
    Each row of the input matrix is one example (instead of each column being one example, as in the column-vector case).
    W11 is the weight from input node 1 to hidden node 1, W12 is the weight from input node 1 to hidden node 2, and so on.
    For how an input example (a row of the input matrix) gets multiplied, see 1:19.

    When the input matrix is n×d, with one multidimensional example per row, each row of the weight matrix holds the weights from one input node to all the nodes in the hidden layer, so W is d×h, where h is the number of hidden nodes and d is the dimension of the input.

    The resulting matrix is n×h, where h is the number of hidden nodes. Each row of the result is the hidden layer's output for one input example.

    The formula here is XW (not Wx as in regression); which one applies depends on how the input matrix is laid out. In this case each row of the input matrix is a data example, i.e. the examples are already transposed, and that's why it's XW rather than Wx.

    NumPy can take a scalar or a matrix and apply a function to it element-wise, so we don't have to loop over the matrix explicitly as we would in other languages.

    W^(i) uses i as the layer index: W^(1) is the weight matrix from the input nodes to the first hidden layer.
    W is always d×h (see the shape-checking sketch after this comment).
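
    A minimal shape-checking sketch of the bookkeeping above, with n = 3 examples, d = 2 inputs, and h = 3 hidden units as in the video; the variable names are illustrative:

import numpy as np

n, d, h = 3, 2, 3                     # examples, input dimensions, hidden units

X = np.random.rand(n, d)              # one example per row (n x d)
W1 = np.random.randn(d, h)            # row i: weights from input node i to every hidden node

Z2 = X @ W1                           # (n x d) @ (d x h) -> (n x h)
A2 = 1 / (1 + np.exp(-Z2))            # sigmoid applied element-wise by NumPy, no explicit loop

assert Z2.shape == (n, h)
assert A2.shape == (n, h)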

  • Welch Labs
    November 26, 2020

    If you want to reproduce the plot from his GitHub, use this snippet of code (sigmoid is defined here so it runs on its own):

import numpy as np
import matplotlib.pyplot as plt

# the sigmoid activation from the video, applied element-wise by NumPy
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

testInput = np.arange(-6, 6, 0.01)

plt.plot(testInput, sigmoid(testInput), linewidth=2)
plt.grid(True)
plt.show()

  • Welch Labs
    November 26, 2020

    How do you know you are supposed to choose 3 hidden units?

  • Welch Labs
    November 26, 2020

    Damn you made that so so so simple. I love the way you teach.

  • Welch Labs
    November 26, 2020

    This video is bloody fantastic! You're making the world a better place, my guy.

  • Welch Labs
    November 26, 2020

    Very helpful, but that music really sucks.

  • Welch Labs
    November 26, 2020

    For the exact same code you typed in, I faced a gazillion errors.
    Could you explain how the sigmoid function can take more than one argument when you write: self.a2 = self.sigmoid(self.z2)?
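
    A minimal sketch of why that call works: Python passes self implicitly when a method is called on an instance, and NumPy applies the exponential element-wise to the whole matrix. The class and attribute names here are illustrative, not the video's exact code:

import numpy as np

class TinyNetwork(object):
    # illustrative fragment, not the full class from the video
    def __init__(self):
        self.z2 = np.random.randn(3, 3)        # stand-in for the hidden-layer activity

    def sigmoid(self, z):
        # defined with two parameters, but `self` is supplied automatically by Python,
        # so only the matrix z is passed explicitly at the call site
        return 1 / (1 + np.exp(-z))            # np.exp works element-wise on arrays

    def forward(self):
        self.a2 = self.sigmoid(self.z2)        # one explicit argument: the whole matrix z2
        return self.a2

net = TinyNetwork()
print(net.forward().shape)                     # (3, 3)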

  • Welch Labs
    November 26, 2020

    OMG the music is so distracting

  • Welch Labs
    November 26, 2020

    Supporting Code is broken :'( :'( :'(

  • Welch Labs
    November 26, 2020

    Thank you very much, Stephen. I am sure you have already helped so many people. Good job.

  • Welch Labs
    November 26, 2020

    The background music shouldn't be as loud as the voice.

  • Welch Labs
    November 26, 2020

    After half of the video I didn't understand anything… the programming, the functions.

  • Welch Labs
    November 26, 2020

    Really clear and illustrative. It deserves a like. Thanks for the video.

  • Welch Labs
    November 26, 2020

    Why do the weights have to be randomized? Why not initialize the parameters to 0, as with SGD?
    And why is no intercept term added? (A small sketch on the first point follows below.)
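
    On the first question, a hedged sketch of the usual symmetry-breaking argument: if every weight starts at the same value (such as 0), every hidden unit computes the same activity and would receive the same update, so the units can never become different; random initialization breaks that tie. Variable names and values below are illustrative:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X = np.array([[3.0, 5.0], [5.0, 1.0], [10.0, 2.0]])

# zero initialization: every hidden unit sees the same weights,
# so every column of the hidden activity is identical (all 0.5 here)
W1_zero = np.zeros((2, 3))
print(sigmoid(X @ W1_zero))

# random initialization: hidden units start out different,
# so they can specialize into different features during training
W1_rand = np.random.randn(2, 3)
print(sigmoid(X @ W1_rand))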

  • Welch Labs
    November 26, 2020

    Perfect, thanks a lot.

  • Welch Labs
    November 26, 2020

    Great video! I wish the background music was not as loud. Everything else is great! Thanks!

  • Welch Labs
    November 26, 2020

    "Neural Networks Demystified" Honestly, I'm just more confused.

  • Welch Labs
    November 26, 2020

    4:26 What's the input data for X? And what function does y represent? Am I missing something, or did you already write the code for that?

  • Welch Labs
    November 26, 2020

    Finds a good ANN explanation video. Music makes it even harder to follow.

  • Welch Labs
    November 26, 2020

    Great videos, very intuitive explanation of how Neural Networks function.

    If I could provide some criticism: there's a very specific notation in Machine Learning for labeling weights which I noticed was not used here. Was this intentional, so as not to cause confusion? Also, most authors write the first hidden layer's computation as Z^1, meaning Z^1 = XW^1, not Z^2 as you have stated here (see the notation sketch below).

    Great work regardless!
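
    For reference, a minimal sketch of the forward pass in the video's indexing, where the hidden layer's activity is z^(2) = XW^(1); some texts call the same quantity z^(1). The 2-3-1 shapes follow the video, but the code itself is illustrative:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X = np.array([[3.0, 5.0], [5.0, 1.0], [10.0, 2.0]])   # (hours of sleep, hours of study)
W1 = np.random.randn(2, 3)     # input layer  -> hidden layer, the video's W^(1)
W2 = np.random.randn(3, 1)     # hidden layer -> output layer, the video's W^(2)

z2 = X @ W1                    # the video's z^(2); called z^(1) in some other texts
a2 = sigmoid(z2)
z3 = a2 @ W2
yHat = sigmoid(z3)
print(yHat.shape)              # (3, 1): one prediction per example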
