## Machine Learning Tutorial Python – 8: Logistic Regression (Binary Classification)

Logistic regression is used for classification problems in machine learning. This tutorial shows how to use sklearn's LogisticRegression class to solve a binary classification problem: predicting whether a customer will buy life insurance. At the end there is an interesting exercise for you to solve.

Broadly, there are two types of supervised machine learning problems: (1) regression, where the predicted value is continuous, and (2) classification, where the predicted value is categorical. Despite its name, logistic regression is mainly used for classification.
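To make the workflow concrete, here is a minimal sketch of fitting sklearn's LogisticRegression on an age feature. The ages and labels below are made up for illustration (they are not the tutorial's CSV), and the `random_state` and `test_size` values are arbitrary assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Made-up illustration data, NOT the tutorial's dataset:
# age of each person and whether they bought insurance (1) or not (0).
ages = np.array([18, 21, 24, 27, 30, 33, 36, 40, 45, 48, 51, 54, 57, 60, 63, 66])
bought = np.array([0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1])

# sklearn expects a 2D feature matrix of shape (n_samples, n_features).
X = ages.reshape(-1, 1)

# Hold out a quarter of the rows for testing; stratify keeps both classes in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, bought, test_size=0.25, random_state=0, stratify=bought
)

model = LogisticRegression()
model.fit(X_train, y_train)

# Accuracy on the held-out rows, and class predictions for two new ages.
accuracy = model.score(X_test, y_test)
young_pred, old_pred = model.predict(np.array([[20], [65]]))
print(accuracy, young_pred, old_pred)
```

With so few rows the test score depends heavily on which rows land in the test split, so treat the accuracy here as illustrative only.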

#MachineLearning #PythonMachineLearning #MachineLearningTutorial #Python #PythonTutorial #PythonTraining #MachineLearningCource #LogisticRegression

Code: https://github.com/codebasics/py/blob/master/ML/7_logistic_reg/7_logistic_regression.ipynb

Exercise: Open the notebook above on GitHub and go to the end.

Exercise solution: https://github.com/codebasics/py/blob/master/ML/7_logistic_reg/Exercise/7_logistic_regression_exercise.ipynb

Topics covered in this video:

0:01 – Theory (difference between linear regression and classification)

1:18 – What is logistic regression?

1:26 – Classification types (Binary vs multiclass classification)

1:53 – Explanation of logistic regression using the example of whether a person will buy insurance based on their age

5:38 – Sigmoid (logistic) function

8:18 – Coding (using the example of whether a person will buy insurance based on their age)

14:36 – sklearn predict_proba() function

15:49 – Exercise (predict employee retention based on salary, distance to work, promotion, department, etc.)
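The sigmoid and predict_proba() topics above are closely related, and a short sketch can show how. The data is again made up for illustration, and `sigmoid` is a helper defined here rather than an sklearn function. For a binary model, the probability of class 1 is the sigmoid applied to the linear expression `coef * age + intercept`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Made-up illustration data, NOT the tutorial's dataset.
ages = np.array([[19], [23], [26], [30], [34], [39], [43], [47], [51], [56], [60], [64]])
bought = np.array([0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1])

model = LogisticRegression().fit(ages, bought)

query = np.array([[25], [45], [65]])

# predict_proba returns one row per sample: [P(class 0), P(class 1)].
proba = model.predict_proba(query)

# The same class-1 probability computed by hand from the fitted line.
z = query @ model.coef_.T + model.intercept_
manual = sigmoid(z).ravel()

print(proba[:, 1])
print(manual)
```

predict() simply reports the class whose probability is larger, which for two classes means class 1 whenever the class-1 probability exceeds 0.5.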

Next Video:

Machine Learning Tutorial Python – 8 Logistic Regression (Multiclass Classification): https://www.youtube.com/watch?v=J5bXOOmkopc&list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw&index=9

Popular Playlists:

Data Science Full Course: https://www.youtube.com/playlist?list=PLeo1K3hjS3us_ELKYSj_Fth2tIEkdKXvV

Data Science Project: https://www.youtube.com/watch?v=rdfbcdP75KI&list=PLeo1K3hjS3uu7clOTtwsp94PcHbzqpAdg

Machine learning tutorials: https://www.youtube.com/watch?v=gmvvaobm7eQ&list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw

Pandas: https://www.youtube.com/watch?v=CmorAWRsCAw&list=PLeo1K3hjS3uuASpe-1LjfG5f14Bnozjwy

matplotlib: https://www.youtube.com/watch?v=qqwf4Vuj8oM&list=PLeo1K3hjS3uu4Lr8_kro2AqaO6CFYgKOl

Python: https://www.youtube.com/watch?v=eykoKxsYtow&list=PLeo1K3hjS3uv5U-Lmlnucd7gqF-3ehIh0&index=1

Jupyter Notebook: https://www.youtube.com/watch?v=q_BzsPxwLOE&list=PLeo1K3hjS3uuZPwzACannnFSn9qHn8to8

To download the CSV files and code for all tutorials: go to https://github.com/codebasics/py, click the green button to clone or download the entire repository, and then go to the relevant folder to access a specific file.

Website: http://codebasicshub.com/

Facebook: https://www.facebook.com/codebasicshub

Twitter: https://twitter.com/codebasicshub

Step by step roadmap to learn data science in 6 months: https://www.youtube.com/watch?v=H4YcqULY1-Q

How to learn coding for beginners | Learn coding for free: https://www.youtube.com/watch?v=CptrlyD0LJ8

Syntax error in Importing Model Selection.

Amazing lectures.

I wrote a blog post explaining the sigmoid function; interested people can view it here:

https://delightfuldata.blogspot.com/2020/10/sigmoid-function-simplified.html

Thank you. If you have many features in your binary classification problem, and the classes overlap when you visualize the data using PCA, is it advisable to use logistic regression in this case?

This video had good information, it was really helpful. I am still a learner, new to this field. I understand how to write the code and the basics of the confusion matrix for binary classification. But some terminology is confusing. Can you please explain what exactly base rate, test incidence, conditional incidence, and classification incidence are? That would be appreciated.

Sir, I copied your code and ran it in Jupyter Notebook. The model gives a score of 0.666. Why? I didn't change anything.

Can someone help me find where to get the dataset used in this video?

What is the ideal way of handling multiple features/variables?

How do you deal with a multidimensional array in a logistic regression model?

Thank you…

For other friends here, I also have a video tutorial on how to code SVR, SVM, LR, RF, GB, and XGB in Python, in case it is useful.

https://youtu.be/m8ZUxSyPwxc

please do the logistic regression from scratch like you did for linear regression …

Thanks for explaining it so well. Could you please confirm whether we can use a decision tree classifier instead of logistic regression for the HR analytics dataset?

Sir, what if the model predicts wrong? Are a loss function and gradient descent also used here, just like in a neural network?

Thanks, sir .. your explanation is really clear and so easy to understand 👍🏼

Sir, but how can we do this without the built-in function?

Sir, but thinking of this as a real-world problem, would a person buy insurance at the age of 80?

Can we use a KNN classifier in this case?

nice explanation ever sir .

Not able to see your GitHub code. Could you please re-upload it?

Why sir why. Why your teaching method is too good.

Sir, why is logistic regression called a linear model? We are putting the linear regression formula into a sigmoid, whose output is a curve, so why is it called a linear model?

I was going through the solution of the exercise you gave.

I didn't get the df.left and df.retain operations.

Would you please explain?

Sir,

In some cases you are using df.age and in others df[['age']] (as a 2D array). How do I know which one to use? What is the significance of this?

Thank you.

great work sir , it explains a lot

awesome explanation….really

When you use df.groupby('left').mean() to check what to drop and what to keep, is it worth doing a t-test here as a more rigorous method for keeping variables?

Freaking Awesome wish you were my mentor Sir !!!

Thank you, sir, I have learned a lot from your vids :). I was really perplexed by logistic regression, and I am glad YouTube recommended this to me 🙂

Sir, I got an accuracy of 0.7677777777777778.

Got an accuracy of 0.792 with multiple test_size settings.

I got 82.23 after binning one variable

I appreciate your work. It is very helpful. I have one comment about running logistic regression. If you run the same model with the StatsModels package, you will get different coefficients, because sklearn.linear_model.LogisticRegression by default applies an L2 penalty with lambda = 1 (C = 1.0). To get approximately the same results, we can use LogisticRegression(solver='newton-cg', C=1e09). This is addressed here: https://ryxcommar.com/2019/08/30/scikit-learns-defaults-are-wrong/. I am a little disappointed in the sklearn package for both linear regression and logistic regression. It lacks model diagnostics for linear regression, and this default regularization setup doesn't make sense. I think most users won't check the regularization when they use it.
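As a rough illustration of the regularization point raised in the comment above, the sketch below fits the same made-up data twice: once with sklearn's defaults and once with a very large `C`, which makes the penalty negligible. The data, the standardization step, and the variable names are assumptions for this example. It does not run StatsModels; it only shows that the default penalty shrinks the coefficient.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up illustration data, NOT the tutorial's dataset. One pair of labels
# overlaps (39 -> 1, 43 -> 0), so the nearly unpenalized fit stays finite.
ages = np.array([19, 23, 26, 30, 34, 39, 43, 47, 51, 56, 60, 64], dtype=float)
bought = np.array([0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1])

# Standardize the feature so the effect of the penalty is easy to see.
X = ((ages - ages.mean()) / ages.std()).reshape(-1, 1)

# Default settings: L2 penalty with C=1.0 (C is the inverse regularization strength).
default_model = LogisticRegression().fit(X, bought)

# Very large C: the penalty becomes negligible, approximating a plain maximum-likelihood fit.
weak_model = LogisticRegression(solver="newton-cg", C=1e9).fit(X, bought)

w_default = default_model.coef_[0, 0]
w_weak = weak_model.coef_[0, 0]
print(w_default, w_weak)
```

The default fit should report the smaller coefficient, which is why a package that does not penalize by default can give different numbers on the same data.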

Sir, why didn't you drop one of the salary columns after encoding it with dummy variables in the solution?

I got an accuracy of 78.833. I learned a lot of things from this exercise, thank you bro.

who else did not understand the assignment apart from me?

Awesome explanation. I like this practical math and algorithmic explanation.

Couldn't solve the 2nd and 3rd questions. Basically, I couldn't plot the bar chart, as it kept raising an error saying that 'NoneType' data had been encountered. Also, I must mention that I used label encoding to convert the salaries into 1, 2, 3 instead of low, medium, high. Please help, someone.

When the correlation between each pair of features is < 0.15, does it mean there is no multicollinearity among the independent features?

Is there any way to find which features are important and which are less important (and can be ignored) while modelling?

On my first attempt, I considered 'left' as the dependent variable and everything else, including salary and department, as independent variables, and got an accuracy score of 77%. Thanks for the wonderful video.

Hello sir, I took the Titanic data from Kaggle and trained on (X_train, y_train); the score on (X_train, y_train) is 78%. My question: when I pass my X_test values, the predictions should match y_test, right? But sometimes I get 0 when the answer is 1, so it is not predicting properly.

Hi Sir,

Great tutorials. I am trying to access the Excel sheet containing the age and insurance data. Please provide the link to download the sheet.

KM

0.7757142857142857 is my model score.

Sir, can you explain column transfer?

I am getting an error while uploading the data file. Please help.