Polynomial Regression β€” Gradient Descent from Scratch | by Mark Garvey | Jan, 2021


We can start by choosing coefficients for a second-degree polynomial equation (𝑎𝑥²+𝑏𝑥+𝑐) that will generate the data we will try to model:
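For instance, the ground-truth coefficients could be stored like this (the values 2, −5 and 4 match the curve plotted further down; the variable names are assumptions):

```python
# Ground-truth coefficients for y = a*x^2 + b*x + c
# (matching the curve 2x^2 - 5x + 4 shown later in the article)
a = 2
b = -5
c = 4
```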

These will be the coefficients for our base/ground-truth model, which we hope to get our predictive model as close as possible to. Next, we need an evaluation function for a second-degree polynomial which, given a set of coefficients and an input 𝑥, returns the corresponding 𝑦:

We can see it in action when x=3:
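A sketch of such an evaluation function, together with the x=3 check (the function name is an assumption):

```python
def evaluate_polynomial(a, b, c, x):
    """Return y = a*x^2 + b*x + c for the given coefficients and input x."""
    return a * x ** 2 + b * x + c

# With the ground-truth coefficients 2, -5, 4 and x = 3:
print(evaluate_polynomial(2, -5, 4, 3))  # 2*9 - 5*3 + 4 = 7
```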


Creating the data and base model

Define some x data (inputs) from which we hope to predict the y values (outputs):
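One way to sketch this with NumPy (the input span of −10 to 10 and the 100-point count are assumptions):

```python
import numpy as np

def evaluate_polynomial(a, b, c, x):
    """Return y = a*x^2 + b*x + c for the given coefficients and input x."""
    return a * x ** 2 + b * x + c

X = np.linspace(-10, 10, 100)         # input values (range is an assumption)
y = evaluate_polynomial(2, -5, 4, X)  # ground-truth outputs

# Plotting the curve, as in the figure below:
# import matplotlib.pyplot as plt
# plt.plot(X, y); plt.show()
```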

Curve for polynomial 2x² − 5x + 4 — Author

This is good, but we could make things more realistic by adding noise, or ‘jitter’, to the values so they resemble real-world data:

Test it out:

This should output a value in the range 3–11.
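The jitter helper and its test might look like the following sketch, assuming uniform noise in [−j, j]; with a base value of 7 and j = 4, the result lands in the stated 3–11 range:

```python
import random

def jitter(value, j):
    """Add uniform random noise in the range [-j, j] to value."""
    return value + random.uniform(-j, j)

print(jitter(7, 4))  # some value between 3 and 11
```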

This updated function takes the inputs for the second-order polynomial along with a jitter value j, and adds noise to give us a more realistic output than a perfect curve:
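A possible sketch of the updated generator; here the noise is applied to the computed outputs (whether the article jitters the inputs or the outputs is an assumption, as are the names and input range):

```python
import numpy as np

def evaluate_polynomial_with_jitter(a, b, c, x, j):
    """Evaluate y = a*x^2 + b*x + c, then add uniform noise in [-j, j]."""
    y = a * x ** 2 + b * x + c
    noise = np.random.uniform(-j, j, size=np.shape(y))
    return y + noise

X = np.linspace(-10, 10, 100)
y = evaluate_polynomial_with_jitter(2, -5, 4, X, j=5)
```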

Original data with jitter β€” Author

When we build our predictive model and optimize it with gradient descent, we will hopefully get as close to these values as possible.

The first pass at modelling involves generating and storing random coefficients for a second-degree polynomial (𝑦=𝑎𝑥²+𝑏𝑥+𝑐). This will be our initial model, which will most likely be inaccurate; we will aim to improve it until it fits the data well enough.
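One sketch of generating and storing random integer coefficients (the 0–10 range and function name are assumptions; the article's run happened to produce (7, 6, 3)):

```python
import random

def random_coefficients(low=0, high=10):
    """Return random (a, b, c) for an initial y = a*x^2 + b*x + c model."""
    return tuple(random.randint(low, high) for _ in range(3))

coeffs = random_coefficients()
print(coeffs)
```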

(7, 6, 3)

Inspect this model’s accuracy by calculating the predicted output values from your input values:
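Using the article's random coefficients (7, 6, 3), the predictions could be computed like this (the input range is an assumption):

```python
import numpy as np

a_rand, b_rand, c_rand = 7, 6, 3   # random coefficients from the article's run
X = np.linspace(-10, 10, 100)
y_pred = a_rand * X ** 2 + b_rand * X + c_rand
```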

Original data vs first random prediction β€” Author

It is evident from the above plot that this new model with random coefficients does not fit our data all that well. To get a quantifiable measure of how incorrect it is, we calculate the Mean Squared Error loss for the model. This is the mean value of the sum of the squared differences between the actual and predicted outputs:
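A minimal MSE helper, shown here computing the loss of the random model against the noise-free ground truth (names and input range are assumptions):

```python
import numpy as np

def mse(y_actual, y_pred):
    """Mean of the squared differences between actual and predicted outputs."""
    return np.mean((y_actual - y_pred) ** 2)

X = np.linspace(-10, 10, 100)
y_actual = 2 * X ** 2 - 5 * X + 4    # ground-truth model
y_pred = 7 * X ** 2 + 6 * X + 3      # random initial model
print(mse(y_actual, y_pred))
```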


Quite a large number. Let’s now see if we can improve on this fairly high loss metric by optimizing the model with gradient descent.

Gradient Descent and Loss Reduction

We wish to improve our model, so we want to alter its coefficients a, b and c to decrease the error. To do that, we need to know how each coefficient affects the error, which we find by calculating the partial derivative of the loss function with respect to each individual coefficient.

In this case, we are using MSE as our loss function β€” this is the function we wish to calculate partial derivatives for:
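Restating the standard MSE definition over n data points, where 𝑦ᵢ are the actual outputs and 𝑦̂ᵢ the predictions:

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2
```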

With output predictions for our model as:
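That is, for our second-degree model:

```latex
\hat{y}_i = a x_i^2 + b x_i + c
```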

Loss can therefore be reformulated as:
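Substituting the prediction into the MSE gives:

```latex
L(a, b, c) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \left(a x_i^2 + b x_i + c\right)\right)^2
```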

In this specific case, our partial derivatives for that loss function are the following:
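Applying the chain rule to that loss yields one gradient per coefficient:

```latex
\frac{\partial L}{\partial a} = -\frac{2}{n}\sum_{i=1}^{n} x_i^2 \left(y_i - \hat{y}_i\right), \qquad
\frac{\partial L}{\partial b} = -\frac{2}{n}\sum_{i=1}^{n} x_i \left(y_i - \hat{y}_i\right), \qquad
\frac{\partial L}{\partial c} = -\frac{2}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)
```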

  • If you calculate the value of each derivative, you obtain the gradient for each coefficient.
  • These values give you the slope of the loss function with respect to each specific coefficient.
  • They indicate whether you should increase or decrease it to reduce the loss, and roughly by how much it is safe to do so.

Given coefficients π‘Ž, 𝑏 and 𝑐, calculated gradients π‘”π‘Ž, 𝑔𝑏 and 𝑔𝑐 and a learning rate π‘™π‘Ÿ, typically one would update the coefficients so that their new, updated values are defined as below:
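The standard gradient-descent update, stepping each coefficient against its gradient:

```latex
a \leftarrow a - lr \cdot g_a, \qquad
b \leftarrow b - lr \cdot g_b, \qquad
c \leftarrow c - lr \cdot g_c
```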

Once you have applied that new model to the data, your loss should have decreased.
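Putting the gradients and update rule together, a single-step sketch (the learning rate and data setup are assumptions, mirroring the article's ground truth 2x² − 5x + 4 and random start (7, 6, 3)):

```python
import numpy as np

def gradient_descent_step(a, b, c, X, y, lr):
    """Update (a, b, c) once, stepping against the MSE gradients."""
    y_pred = a * X ** 2 + b * X + c
    error = y - y_pred
    n = len(X)
    g_a = -2 / n * np.sum(X ** 2 * error)   # dL/da
    g_b = -2 / n * np.sum(X * error)        # dL/db
    g_c = -2 / n * np.sum(error)            # dL/dc
    return a - lr * g_a, b - lr * g_b, c - lr * g_c

X = np.linspace(-10, 10, 100)
y = 2 * X ** 2 - 5 * X + 4        # noise-free ground truth
a, b, c = 7.0, 6.0, 3.0           # random initial model
lr = 1e-4                         # small learning rate (an assumption)

loss_before = np.mean((y - (a * X ** 2 + b * X + c)) ** 2)
a, b, c = gradient_descent_step(a, b, c, X, y, lr)
loss_after = np.mean((y - (a * X ** 2 + b * X + c)) ** 2)
print(loss_before, loss_after)
```

Repeating that step in a loop drives the coefficients toward the ground-truth values.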
