Introduction to Naive Bayes Classifier | by Priyanka Meena | Nov, 2020 | Medium
Naive Bayes is a term used collectively for classification algorithms that are based on Bayes' Theorem. For the uninitiated, classification algorithms are algorithms used to categorize a new observation into predefined classes. For example, let's assume that you are working as a data analyst with a major bank in London and you need to predict, based on historical data, whether a customer will default on a bank loan or not.
You must be wondering: there is "Bayes" in the name because the algorithm is based on Bayes' Theorem, but why "Naive"? Is it because the algorithm is naive, or dumb? No!! The algorithm is not dumb; in fact, it sometimes works better than some very sophisticated algorithms. The algorithm is "naive" because it works on the general assumption that the presence of a particular feature in a class is independent of, or completely unrelated to, the presence of any other feature in the same class. For example, a customer can default on a bank loan if he/she has a low credit score, low applicant income, etc. Each of these features independently contributes to the probability that the candidate will default; that is, the presence of one feature is assumed to be unrelated to another.
Don't be disheartened if some of the terms sound alien to you. The goal of this series of articles is to explain machine learning algorithms in the simplest possible manner, so that by the end of the series you will be able to build your own machine learning models with great ease. So let's proceed with this article on the Naive Bayes Classifier.
Bayes' Theorem!! What exactly is Bayes' Theorem?
Bayes' Theorem is a very popular mathematical formula used to determine the conditional probability of an event, based on prior knowledge of conditions that might be related to the event.
Wait! What exactly is conditional probability?
It is the likelihood of an outcome occurring, given that another event has already occurred. For example, two cards are drawn without replacement from a deck of 52 cards. What is the probability that the second card is an ace, given that the first card drawn was also an ace?
So, P(drawing the first ace) = total no. of aces / total no. of cards = 4/52
P(drawing the second ace | first was an ace) = 3/51 (this is because after the first draw we are left with only three aces in the deck, and the total number of cards also reduces to 51). So, that is what conditional probability is all about: the second event depends on the occurrence of the first one.
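The card example can be checked in a couple of lines of Python (the deck counts come from the text; the joint probability of drawing two aces is an extra sanity check, not part of the original):

```python
# Probability the first card drawn is an ace: 4 aces out of 52 cards.
p_first_ace = 4 / 52
# Probability the second card is an ace GIVEN the first was: 3 aces left of 51 cards.
p_second_given_first = 3 / 51
# Joint probability of drawing two aces in a row (multiplication rule).
p_both = p_first_ace * p_second_given_first
print(round(p_first_ace, 4), round(p_second_given_first, 4), round(p_both, 4))
# → 0.0769 0.0588 0.0045
```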
Coming back to Bayes' Theorem. It is given mathematically by the following formula:
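In case the formula image does not carry over, the usual statement of Bayes' Theorem is:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```

where P(A|B) is the posterior probability, P(B|A) the likelihood, P(A) the prior, and P(B) the evidence.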
Now that we have a decent understanding of what Bayes' Theorem is, let's go ahead and understand its use in classification problems.
NAIVE BAYES INTUITION
Problem: Try to predict whether a candidate with a credit score of 180 will default on a loan or not. Consider the following frequency table for calculating the likelihood of default.
We can follow these steps to calculate the probabilities. Since the candidate has a credit score of 180, let's predict the label for the (100–200) bucket.
Step 1: Calculate the prior probability of the class, P(Yes), and the evidence probability of the credit-score bucket, P(100–200)
P(Yes) = 11/29 = 0.379
P(100–200) = 10/29 = 0.345
Step 2: Find the likelihood, i.e. the probability of the attribute value given the class
P(100–200 | Yes) = 6/11 = 0.545
Step 3: Calculate the posterior probability using Bayes' Theorem
P(Yes|100–200) = P(100–200|Yes) * P(Yes)/P(100–200)
= 0.545 * 0.379 / 0.345 ≈ 0.599 (exactly 6/10 = 0.6 if you keep the fractions, since the 11s and 29s cancel)
Step 4: Make prediction
Since the probability of the candidate defaulting is greater than 50% (the assumed decision threshold; it can differ based on the use case), we can say that the candidate will default.
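The steps above can be reproduced in a few lines of Python, with the counts read off the article's frequency table:

```python
p_yes = 11 / 29              # prior: 11 of the 29 customers defaulted
p_bucket = 10 / 29           # evidence: 10 of 29 fall in the 100-200 bucket
p_bucket_given_yes = 6 / 11  # likelihood: 6 of the 11 defaulters are in that bucket

# Posterior via Bayes' Theorem; the 29s and 11s cancel, leaving exactly 6/10.
p_yes_given_bucket = p_bucket_given_yes * p_yes / p_bucket
print(round(p_yes_given_bucket, 3))  # → 0.6
```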
END TO END NAIVE BAYES CLASSIFIER
Having learned how a Naive Bayes classifier works, let's try to build a classification model based on it using sklearn. Sklearn, or scikit-learn, is an open source machine learning library written in Python.
For the purpose of this article, we will be using the social_network_ads dataset. In this problem, we will try to predict whether a user has purchased a product by clicking on the advertisements shown to him/her on social media, based on age and estimated salary. So let's get started.
Step 1: Import basic libraries
Step 2: Import Data
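A sketch of the import-and-load step. The original post's exact file and column names are not shown, so they are assumed below; since the CSV is not bundled here, the runnable part builds a small synthetic stand-in with the same shape (two features and a 0/1 label) for the later steps to use:

```python
import numpy as np
import pandas as pd

# The original likely reads the CSV along these lines (file/column names assumed):
#   dataset = pd.read_csv("Social_Network_Ads.csv")
#   X = dataset[["Age", "EstimatedSalary"]].values
#   y = dataset["Purchased"].values

# Synthetic stand-in: 400 users with an age, an estimated salary,
# and a toy "purchased" label so the later steps have data to run on.
rng = np.random.default_rng(42)
age = rng.integers(18, 60, size=400)
salary = rng.integers(15_000, 150_000, size=400)
purchased = ((age > 40) & (salary > 60_000)).astype(int)

dataset = pd.DataFrame({"Age": age, "EstimatedSalary": salary, "Purchased": purchased})
X = dataset[["Age", "EstimatedSalary"]].values
y = dataset["Purchased"].values
print(X.shape, y.shape)  # → (400, 2) (400,)
```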
Step 3: Data Preprocessing
In the data preprocessing step, we first split the dataset into training (80%) and testing (20%) sets. Next, we do some basic feature scaling using StandardScaler, which transforms the data in such a way that each feature has a mean of zero and a standard deviation of one.
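A sketch of the split-and-scale step, again on synthetic stand-in data (the 80/20 split follows the text; `random_state=0` is an assumption):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for (age, estimated salary) and the purchase label.
rng = np.random.default_rng(42)
X = np.column_stack([rng.integers(18, 60, 400), rng.integers(15_000, 150_000, 400)])
y = (X[:, 0] > 40).astype(int)

# 80% training / 20% testing, as in the article.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit the scaler on the training set only, then reuse its statistics on the
# test set, so no information from the test set leaks into the transform.
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
print(X_train.mean(axis=0).round(6), X_train.std(axis=0).round(6))
```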
Step 4: Model Training
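Training is a two-liner with scikit-learn. `GaussianNB` is the natural variant for continuous features such as age and salary (which variant the original post used is an assumption); the data below is the same synthetic stand-in as in the earlier steps:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same synthetic stand-in and preprocessing as the earlier steps.
rng = np.random.default_rng(42)
X = np.column_stack([rng.integers(18, 60, 400), rng.integers(15_000, 150_000, 400)])
y = (X[:, 0] > 40).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
sc = StandardScaler()
X_train, X_test = sc.fit_transform(X_train), sc.transform(X_test)

# Fit a Gaussian Naive Bayes classifier: it estimates a per-class mean and
# variance for each feature and applies Bayes' Theorem at prediction time.
classifier = GaussianNB()
classifier.fit(X_train, y_train)
print(classifier.class_prior_)  # priors P(class) learned from the training set
```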
Step 5: Model Testing and Evaluation
A confusion matrix gives us an idea of how good our model is. It describes the performance of a classification model on a set of test data for which the true values are known. Each row in a confusion matrix represents an actual class, while each column represents a predicted class.
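An evaluation sketch on the same synthetic stand-in: predict on the held-out set, then build the confusion matrix (rows = actual class, columns = predicted class) and an accuracy score:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, accuracy_score

# Same synthetic stand-in and pipeline as the earlier steps.
rng = np.random.default_rng(42)
X = np.column_stack([rng.integers(18, 60, 400), rng.integers(15_000, 150_000, 400)])
y = (X[:, 0] > 40).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
sc = StandardScaler()
X_train, X_test = sc.fit_transform(X_train), sc.transform(X_test)
classifier = GaussianNB().fit(X_train, y_train)

y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)  # row = actual class, column = predicted class
print(cm)
print("accuracy:", accuracy_score(y_test, y_pred))
```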
Step 6: Visualization
This step is not necessary. It has been included just to give an idea of how the data points were classified by the model.
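A sketch of the decision-region plot (the original's exact plotting code is not shown; this version predicts over a mesh grid covering the two scaled features, shades each region by the predicted class, and saves the figure to a file instead of displaying it):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same synthetic stand-in and pipeline as the earlier steps.
rng = np.random.default_rng(42)
X = np.column_stack([rng.integers(18, 60, 400), rng.integers(15_000, 150_000, 400)])
y = (X[:, 0] > 40).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
sc = StandardScaler()
X_train, X_test = sc.fit_transform(X_train), sc.transform(X_test)
classifier = GaussianNB().fit(X_train, y_train)

# Predict over a grid covering the (scaled) feature space, then shade regions.
x1, x2 = np.meshgrid(
    np.arange(X_train[:, 0].min() - 1, X_train[:, 0].max() + 1, 0.02),
    np.arange(X_train[:, 1].min() - 1, X_train[:, 1].max() + 1, 0.02),
)
Z = classifier.predict(np.column_stack([x1.ravel(), x2.ravel()])).reshape(x1.shape)

plt.contourf(x1, x2, Z, alpha=0.3)
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, edgecolors="k", s=20)
plt.xlabel("Age (scaled)")
plt.ylabel("EstimatedSalary (scaled)")
plt.title("Naive Bayes decision regions (synthetic data)")
plt.savefig("nb_decision_regions.png")
```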
In this post, we have learned the mathematics behind the Naive Bayes classifier and built a model using sklearn. In the coming articles, we will be learning about other classification algorithms as part of this series, such as logistic regression, decision trees, etc. So stay tuned.