 ## How to Explain Key Machine Learning Algorithms at an Interview

By Terence Shin, Data Scientist | MSc Analytics & MBA student. Created by katemangostar.

### Linear Regression

Linear regression involves finding a ‘line of best fit’ that represents a dataset using the least squares method. The least squares method involves finding a linear equation that minimizes the sum of squared residuals. A residual is equal to the actual value minus the predicted value.

For example, the red line is a better line of best fit than the green line because it is closer to the points, and thus, the residuals are smaller. Image created by Author.
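As a minimal sketch of the least squares idea, the snippet below fits a line to a small made-up dataset and compares its sum of squared residuals with that of an arbitrary alternative line. The function names and the data are illustrative assumptions, not part of the original article.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """A residual is the actual value minus the predicted value."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
best_fit_error = sum_squared_residuals(xs, ys, slope, intercept)
other_line_error = sum_squared_residuals(xs, ys, 1.5, 1.0)  # an arbitrary worse line

print(slope, intercept, best_fit_error, other_line_error)
```

Any other line, such as the arbitrary one above, gives a larger sum of squared residuals than the fitted line.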

### Ridge Regression

Ridge regression, also known as L2 regularization, is a regression technique that introduces a small amount of bias to reduce overfitting. It does this by minimizing the sum of squared residuals plus a penalty, where the penalty is equal to lambda times the slope squared. Lambda refers to the severity of the penalty. Image created by Author.

Without a penalty, the line of best fit has a steeper slope, which means that it is more sensitive to small changes in X. By introducing a penalty, the line of best fit becomes less sensitive to small changes in X. This is the idea behind ridge regression.
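To make the effect of the penalty concrete, here is a small sketch for the single-feature case, assuming the intercept is not penalized. Under that assumption the ridge slope has a closed form, and increasing `lam` (a hypothetical name for lambda) shrinks the slope toward zero. The data is made up.

```python
def fit_ridge_line(xs, ys, lam):
    """Minimize sum of squared residuals + lam * slope**2 for a single feature.

    Assumption: the intercept is left unpenalized, so only the slope is shrunk.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + lam)  # the penalty inflates the denominator
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

unpenalized_slope, _ = fit_ridge_line(xs, ys, lam=0.0)  # ordinary least squares
ridge_slope, _ = fit_ridge_line(xs, ys, lam=10.0)       # flatter, less sensitive to X

print(unpenalized_slope, ridge_slope)
```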

### Lasso Regression

Lasso regression, also known as L1 regularization, is similar to ridge regression. The only difference is that the penalty is calculated with the absolute value of the slope instead.

### Logistic Regression

Logistic regression is a classification technique that also finds a ‘line of best fit.’ However, unlike linear regression, where the line of best fit is found using least squares, logistic regression finds the line (logistic curve) of best fit using maximum likelihood. This is done because the y value can only be one or zero. Check out StatQuest’s video to see how the maximum likelihood is calculated. Image created by Author.
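One way to picture maximum likelihood is as a search for the curve parameters that make the observed zeros and ones most probable. The sketch below uses plain gradient ascent on the log-likelihood for a single feature. The learning rate, step count, function names, and data are all illustrative assumptions, and real libraries typically use more sophisticated optimizers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(xs, ys, w, b):
    """Log of the probability of the observed labels under the logistic curve."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def fit_logistic(xs, ys, learning_rate=0.01, steps=5000):
    """Gradient ascent on the log-likelihood (a simple, not the fastest, optimizer)."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        errors = [y - sigmoid(w * x + b) for x, y in zip(xs, ys)]
        w += learning_rate * sum(e * x for e, x in zip(errors, xs))
        b += learning_rate * sum(errors)
    return w, b


# Hypothetical, deliberately overlapping data so the maximum is finite.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 1, 0, 1, 1]

w, b = fit_logistic(xs, ys)
print(w, b, log_likelihood(xs, ys, w, b))
```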

### K-Nearest Neighbours

K-Nearest Neighbours is a classification technique where a new sample is classified by looking at the nearest classified points, hence ‘K-nearest.’ In the example below, if k = 1, then an unclassified point would be classified as a blue point. Image created by Author.

If the value of k is too low, then it can be subject to outliers. However, if it is too high, then it may overlook classes with only a few samples.
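The sensitivity to k can be sketched in a few lines. The points, labels, and function name below are made up for illustration and do not correspond to the article's figure. One stray ‘red’ point sits near the blue cluster, so k = 1 follows the outlier while k = 3 does not.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical data: a blue cluster, a red cluster, and one red outlier at (2.2, 2.2).
points = [(1, 1), (1, 2), (2, 1), (5, 5), (6, 5), (5, 6), (2.2, 2.2)]
labels = ["blue", "blue", "blue", "red", "red", "red", "red"]

query = (2.5, 2.5)
prediction_k1 = knn_predict(points, labels, query, k=1)  # decided by the outlier alone
prediction_k3 = knn_predict(points, labels, query, k=3)  # outvoted by two blue points

print(prediction_k1, prediction_k3)
```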

### Naive Bayes

The Naive Bayes classifier is a classification technique inspired by Bayes’ Theorem, which states the following equation:

$$P(y \mid X) = \frac{P(X \mid y)\,P(y)}{P(X)}$$

Because of the naive assumption (hence the name) that variables are independent given the class, we can rewrite P(X|y) as follows:

$$P(X \mid y) = P(x_1 \mid y)\,P(x_2 \mid y)\cdots P(x_n \mid y)$$

Also, since we are solving for y, P(X) is a constant, which means that we can remove it from the equation and introduce a proportionality:

$$P(y \mid X) \propto P(y)\prod_{i=1}^{n} P(x_i \mid y)$$

Thus, the probability of each value of y is calculated as the product of the conditional probabilities of each xn given y, weighted by the prior P(y).
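As a rough sketch of that calculation for categorical features, the code below counts frequencies in a tiny made-up dataset and scores each class as the prior times the product of the per-feature conditional probabilities. The function names and data are assumptions, and no smoothing is applied, so an unseen feature value drives a class score to zero.

```python
from collections import Counter


def train_naive_bayes(rows, labels):
    """Count class frequencies and (feature index, value, class) frequencies."""
    class_counts = Counter(labels)
    feature_counts = Counter()
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, value, label)] += 1
    return class_counts, feature_counts


def class_score(row, label, class_counts, feature_counts):
    """Unnormalized posterior: P(y) times the product of P(x_i | y)."""
    score = class_counts[label] / sum(class_counts.values())  # prior P(y)
    for i, value in enumerate(row):
        score *= feature_counts[(i, value, label)] / class_counts[label]
    return score


def predict(row, class_counts, feature_counts):
    return max(
        class_counts,
        key=lambda label: class_score(row, label, class_counts, feature_counts),
    )


# Hypothetical data: (outlook, temperature) -> whether a game was played.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("sunny", "cool"), ("rainy", "hot")]
labels = ["yes", "yes", "no", "no", "yes", "no"]

class_counts, feature_counts = train_naive_bayes(rows, labels)
query = ("sunny", "mild")
yes_score = class_score(query, "yes", class_counts, feature_counts)
no_score = class_score(query, "no", class_counts, feature_counts)
prediction = predict(query, class_counts, feature_counts)

print(yes_score, no_score, prediction)
```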

### Support Vector Machines

Support Vector Machines are a classification technique that finds an optimal boundary, called the hyperplane, which is used to separate different classes. The hyperplane is found by maximizing the margin between the classes. Image created by Author.
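The snippet below does not train an SVM. It only illustrates what ‘margin’ means by measuring the distance from a candidate hyperplane to the closest point and comparing two candidates. The points, the two hyperplanes, and the function name are hypothetical. An actual SVM solver would search over all hyperplanes for the one with the largest margin.

```python
import math


def geometric_margin(points, labels, w, b):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    Labels are +1 or -1; a positive result means every point is on its correct side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        label * (sum(wi * xi for wi, xi in zip(w, point)) + b) / norm
        for point, label in zip(points, labels)
    )


# Hypothetical two-class data.
points = [(1, 1), (2, 1), (4, 4), (5, 4)]
labels = [-1, -1, 1, 1]

# Two hypothetical separating hyperplanes (both classify every point correctly).
margin_a = geometric_margin(points, labels, w=(1.0, 1.0), b=-5.5)
margin_b = geometric_margin(points, labels, w=(1.0, 0.0), b=-2.5)

print(margin_a, margin_b)
```

Of these two candidates, the first leaves more room between the classes, so it is the one a margin-maximizing method would prefer.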

### Decision Trees

A decision tree is essentially a series of conditional statements that determine what path a sample takes until it reaches the bottom. They are intuitive and easy to build but tend not to be accurate.

### Random Forest

Random Forest is an ensemble technique, meaning that it combines several models into one to improve its predictive power. Specifically, it builds thousands of smaller decision trees using bootstrapped datasets and random subsets of variables (also known as bagging). With thousands of smaller decision trees, random forests use a ‘majority wins’ model to determine the value of the target variable. For example, if we created only one decision tree, the third one, it would predict 0. But if we relied on the mode of all four decision trees, then the predicted value would be 1. This is the power of random forests.
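The two ingredients described above, resampling and voting, can be sketched without building any actual trees. In the snippet below the four tree predictions are hard-coded to mirror the example in the text. The helper names, the seed, and the toy rows are assumptions for illustration.

```python
import random
from statistics import mode


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, so some rows repeat and some are left out."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(feature_names, n_features, rng):
    """Pick a random subset of variables for one tree to consider."""
    return rng.sample(feature_names, n_features)


rng = random.Random(0)  # seeded only so the sketch is repeatable
rows = [(25, 1, 0), (40, 0, 1), (33, 1, 1), (51, 0, 0), (29, 1, 1)]  # hypothetical
feature_names = ["age", "smoker", "exercises"]

sample = bootstrap_sample(rows, rng)
features = random_feature_subset(feature_names, 2, rng)

# Stand-ins for the outputs of four trained trees; the third one disagrees.
tree_predictions = [1, 1, 0, 1]
third_tree_only = tree_predictions[2]
forest_prediction = mode(tree_predictions)  # 'majority wins'

print(sample, features, third_tree_only, forest_prediction)
```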

### AdaBoost

AdaBoost is a boosting algorithm that is similar to Random Forests but has a few significant differences:

1. Rather than a forest of trees, AdaBoost typically makes a forest of stumps (a stump is a tree with only one node and two leaves).
2. Each stump’s decision is not weighted equally in the final decision. Stumps with less total error (high accuracy) have a greater say.
3. The order in which the stumps are created is important, as each subsequent stump emphasizes the importance of the samples that were incorrectly classified by the previous stump.
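Points 2 and 3 can be sketched numerically using the standard AdaBoost formulas, which the article does not spell out: a stump's say is half the log-odds of it being right, and misclassified samples have their weights scaled up before renormalizing. The function names, the `eps` guard, and the eight-sample scenario are illustrative assumptions.

```python
import math


def amount_of_say(total_error, eps=1e-10):
    """Weight of a stump in the final vote; lower error gives a larger say."""
    total_error = min(max(total_error, eps), 1 - eps)  # guard against division by zero
    return 0.5 * math.log((1 - total_error) / total_error)


def update_sample_weights(weights, was_misclassified, say):
    """Increase weights of misclassified samples, decrease the rest, then normalize."""
    scaled = [
        w * math.exp(say if missed else -say)
        for w, missed in zip(weights, was_misclassified)
    ]
    total = sum(scaled)
    return [w / total for w in scaled]


# Hypothetical round: 8 equally weighted samples, the first one misclassified.
weights = [1 / 8] * 8
was_misclassified = [True] + [False] * 7

total_error = sum(w for w, missed in zip(weights, was_misclassified) if missed)
say = amount_of_say(total_error)
new_weights = update_sample_weights(weights, was_misclassified, say)

print(say, new_weights)
```

After the update, the single misclassified sample carries as much weight as the seven correct ones combined, which is how the next stump is pushed to focus on it.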

### Gradient Boost

Gradient Boost is similar to AdaBoost in the sense that it builds multiple trees where each tree is built off of the previous tree. Unlike AdaBoost, which builds stumps, Gradient Boost builds trees with usually 8 to 32 leaves.

More importantly, Gradient Boost differs from AdaBoost in the way that the decision trees are built. Gradient Boost starts with an initial prediction, usually the average. Then, a decision tree is built based on the residuals of the samples. A new prediction is made by taking the initial prediction plus a learning rate times the outcome of the residual tree, and the process is repeated.
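The loop described above can be sketched for a single feature. To keep it short, this sketch fits two-leaf stumps to the residuals rather than the 8 to 32 leaf trees mentioned in the text. The function names, the data, and the default settings are assumptions.

```python
def fit_residual_stump(xs, residuals):
    """Find the threshold whose two leaf means best fit the residuals (squared error)."""
    best = None
    unique_xs = sorted(set(xs))
    for low, high in zip(unique_xs, unique_xs[1:]):
        threshold = (low + high) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def gradient_boost(xs, ys, n_trees=20, learning_rate=0.1):
    initial = sum(ys) / len(ys)  # initial prediction: the average
    predictions = [initial] * len(ys)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_residual_stump(xs, residuals)
        # New prediction = previous prediction + learning rate * residual tree output.
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return initial, predictions


def mean_squared_error(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.2]

initial, predictions = gradient_boost(xs, ys)
initial_mse = mean_squared_error(ys, [initial] * len(ys))
final_mse = mean_squared_error(ys, predictions)

print(initial_mse, final_mse)
```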

### XGBoost

XGBoost is essentially the same thing as Gradient Boost, but the main difference is how the residual trees are built. With XGBoost, the residual trees are built by calculating similarity scores between leaves and the preceding nodes to determine which variables are used as the roots and the nodes.
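For the regression case with squared error, the similarity score of a node is commonly written as the squared sum of its residuals divided by the number of residuals plus a regularization term, and a split is judged by its gain: the children's scores minus the parent's. The sketch below applies that to four made-up residuals. The function names and data are assumptions, and `lam` stands in for XGBoost's lambda regularization parameter.

```python
def similarity_score(residuals, lam=0.0):
    """(sum of residuals)^2 / (number of residuals + lambda)."""
    return sum(residuals) ** 2 / (len(residuals) + lam)


def split_gain(left, right, lam=0.0):
    """How much better the two leaves group similar residuals than the parent node."""
    return (similarity_score(left, lam)
            + similarity_score(right, lam)
            - similarity_score(left + right, lam))


def best_split(residuals, lam=0.0):
    """Try every split position; residuals are assumed ordered by the feature value."""
    candidates = [
        (split_gain(residuals[:i], residuals[i:], lam), i)
        for i in range(1, len(residuals))
    ]
    return max(candidates)


# Hypothetical residuals, already sorted by the feature being split on.
residuals = [-10.5, 6.5, 7.5, -7.5]

gain, split_index = best_split(residuals, lam=0.0)
print(gain, split_index)
```

The split with the highest gain becomes the node, and the same comparison is repeated within each resulting leaf.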

Original. Reposted with permission.
