Machine Learning Tutorial Python – 8: Logistic Regression (Binary Classification)





Logistic regression is used for classification problems in machine learning. This tutorial shows how to use sklearn's LogisticRegression class to solve a binary classification problem: predicting whether a customer will buy life insurance. At the end there is an exercise for you to solve.
Supervised machine learning problems usually fall into two types: (1) regression, where the predicted value is continuous, and (2) classification, where the predicted value is categorical. Despite its name, logistic regression is mainly used for classification.
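
As a rough illustration of the workflow described above, the sketch below fits sklearn's LogisticRegression on a single feature (age) to predict a yes/no outcome. The ages and labels are invented purely for illustration and are not the tutorial's dataset; the variable names, test_size and random_state are likewise assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Invented toy data (NOT the tutorial's CSV):
# customer age and whether they bought insurance (1) or not (0).
ages = np.array([18, 19, 21, 22, 23, 25, 26, 27, 28, 29, 40, 45,
                 46, 47, 49, 50, 52, 54, 55, 56, 58, 60, 61, 62])
bought = np.array([0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0,
                   1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

# sklearn expects a 2D feature matrix of shape (n_samples, n_features).
X = ages.reshape(-1, 1)

# Hold out part of the data to check the model on unseen examples.
X_train, X_test, y_train, y_test = train_test_split(
    X, bought, test_size=0.25, random_state=0
)

model = LogisticRegression()
model.fit(X_train, y_train)

# Fraction of correct predictions on the held-out data.
accuracy = model.score(X_test, y_test)

# Each row holds [P(class 0), P(class 1)] for a 20-year-old and a 70-year-old.
proba = model.predict_proba(np.array([[20], [70]]))

print("test accuracy:", accuracy)
print("probabilities:", proba)
```

The score depends on how the rows happen to be split between training and test sets, so a different random_state or test_size can give a different accuracy on a dataset this small.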

#MachineLearning #PythonMachineLearning #MachineLearningTutorial #Python #PythonTutorial #PythonTraining #MachineLearningCource #LogisticRegression

Code: https://github.com/codebasics/py/blob/master/ML/7_logistic_reg/7_logistic_regression.ipynb
Exercise: Open the notebook above on GitHub and go to the end.
Exercise solution: https://github.com/codebasics/py/blob/master/ML/7_logistic_reg/Exercise/7_logistic_regression_exercise.ipynb

Topics covered in this video:
0:01 – Theory (difference between linear regression and classification)
1:18 – What is logistic regression?
1:26 – Classification types (binary vs. multiclass classification)
1:53 – Explanation of logistic regression using the example of whether a person will buy insurance based on their age
5:38 – Sigmoid (logistic) function
8:18 – Coding (the example predicts whether a person will buy insurance based on their age)
14:36 – sklearn predict_proba() function
15:49 – Exercise (predict employee retention based on salary, distance to work, promotion, department, etc.)
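
The sigmoid step in the topics above can be sketched in a few lines of plain Python: a linear score m * age + b is passed through the sigmoid, which squashes it into a value between 0 and 1 that can be read as a probability. The coefficients m and b below are hypothetical values picked only for illustration, not fitted values from the video, and buy_probability is an assumed helper name.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function: maps any real number into the range 0 to 1."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical slope and intercept, chosen only for illustration.
m, b = 0.1, -4.0


def buy_probability(age: float) -> float:
    """Linear score m * age + b passed through the sigmoid."""
    return sigmoid(m * age + b)


for age in (20, 40, 60):
    print(age, round(buy_probability(age), 3))
```

With these made-up coefficients the linear score is zero at age 40, so the probability there is 0.5; a classifier using the usual 0.5 threshold would predict "will buy" above that age and "will not buy" below it.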

Next Video:
Machine Learning Tutorial Python – 8 Logistic Regression (Multiclass Classification): https://www.youtube.com/watch?v=J5bXOOmkopc&list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw&index=9

Popular Playlists:
Data Science Full Course: https://www.youtube.com/playlist?list=PLeo1K3hjS3us_ELKYSj_Fth2tIEkdKXvV

Data Science Project: https://www.youtube.com/watch?v=rdfbcdP75KI&list=PLeo1K3hjS3uu7clOTtwsp94PcHbzqpAdg

Machine learning tutorials: https://www.youtube.com/watch?v=gmvvaobm7eQ&list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw

Pandas: https://www.youtube.com/watch?v=CmorAWRsCAw&list=PLeo1K3hjS3uuASpe-1LjfG5f14Bnozjwy

matplotlib: https://www.youtube.com/watch?v=qqwf4Vuj8oM&list=PLeo1K3hjS3uu4Lr8_kro2AqaO6CFYgKOl

Python: https://www.youtube.com/watch?v=eykoKxsYtow&list=PLeo1K3hjS3uv5U-Lmlnucd7gqF-3ehIh0&index=1

Jupyter Notebook: https://www.youtube.com/watch?v=q_BzsPxwLOE&list=PLeo1K3hjS3uuZPwzACannnFSn9qHn8to8

To download the CSV files and code for all tutorials: go to https://github.com/codebasics/py, click the green button to clone or download the entire repository, and then open the relevant folder to find the specific file.

Website: http://codebasicshub.com/
Facebook: https://www.facebook.com/codebasicshub
Twitter: https://twitter.com/codebasicshub


Comment List

  • codebasics
    November 15, 2020

    Step by step roadmap to learn data science in 6 months: https://www.youtube.com/watch?v=H4YcqULY1-Q
    How to learn coding for beginners | Learn coding for free: https://www.youtube.com/watch?v=CptrlyD0LJ8

  • codebasics
    November 15, 2020

    Syntax error in Importing Model Selection.

  • codebasics
    November 15, 2020

    Amazing lectures.

  • codebasics
    November 15, 2020

    I wrote a blog post explaining the sigmoid function; anyone interested can read it here:
    https://delightfuldata.blogspot.com/2020/10/sigmoid-function-simplified.html

  • codebasics
    November 15, 2020

    Thank you. If a binary classification problem has many features and the classes overlap when the data is visualized with PCA, is it advisable to use logistic regression?

  • codebasics
    November 15, 2020

    This video had good information and was really helpful. I am still a learner and new to this field. I understand the basics of the confusion matrix for binary classification, but some of the terminology is confusing. Can you please explain what exactly base rate, test incidence, conditional incidence, and classification incidence are? That would be appreciated.

  • codebasics
    November 15, 2020

    Sir, I copied your code and ran it in a Jupyter notebook. The model gives a score of 0.666. Why? I didn't change anything.

  • codebasics
    November 15, 2020

    Can someone tell me where to get the dataset used in this video?

  • codebasics
    November 15, 2020

    What is the ideal way of handling multiple features/variables?

  • codebasics
    November 15, 2020

    How do you deal with a multidimensional array in a logistic regression model?

  • codebasics
    November 15, 2020

    Thank you…
    For other friends here, I also have a video tutorial on how to code SVR, SVM, LR, RF, GB, and XGB in Python, in case it is useful.
    https://youtu.be/m8ZUxSyPwxc

  • codebasics
    November 15, 2020

    Please do logistic regression from scratch, like you did for linear regression.

  • codebasics
    November 15, 2020

    Thanks for explaining it so well. Could you please confirm whether we can use a decision tree classifier instead of logistic regression for the HR analytics dataset?

  • codebasics
    November 15, 2020

    Sir, what if the model predicts wrong? Are a loss function and gradient descent also used here, just like in a neural network?

  • codebasics
    November 15, 2020

    Thanks, sir .. your explanation is really clear and so easy to understand 👍🏼

  • codebasics
    November 15, 2020

    Sir, but how can we do it without the inbuilt function?

  • codebasics
    November 15, 2020

    Sir, but thinking of this as a real-world problem, would a person buy insurance at the age of 80?

  • codebasics
    November 15, 2020

    Can we use a KNN classifier in this case?

  • codebasics
    November 15, 2020

    nice explanation ever sir .

  • codebasics
    November 15, 2020

    I am not able to see your GitHub code. Could you please re-upload it?

  • codebasics
    November 15, 2020

    Why sir why. Why your teaching method is too good.

  • codebasics
    November 15, 2020

    Why is logistic regression called a linear model, sir? We put the linear regression formula into the sigmoid, whose output is a curve, so why is it called a linear model?

  • codebasics
    November 15, 2020

    I was going through the solution of the exercise you gave.
    I didn't get the df.left and df.retain operations.
    Would you please explain?

  • codebasics
    November 15, 2020

    Sir,
    In some cases you use df.age and in others df[['age']] (as a 2D array). How do I know which one to use? What is the significance of this?

  • codebasics
    November 15, 2020

    Thank you.

  • codebasics
    November 15, 2020

    great work sir , it explains a lot

  • codebasics
    November 15, 2020

    awesome explanation….really

  • codebasics
    November 15, 2020

    When you use df.groupby('left').mean() to decide what to drop and what to keep, is it worth doing a t-test here as a more rigorous method for selecting variables?

  • codebasics
    November 15, 2020

    Freaking Awesome wish you were my mentor Sir !!!

  • codebasics
    November 15, 2020

    Thank You Sir, I have learned a lot from your vids :). I was really perplexed by Logistic Regression and I am glad
    Youtube recommended this to me 🙂

  • codebasics
    November 15, 2020

    Sir, I got an accuracy of 0.7677777777777778.

  • codebasics
    November 15, 2020

    Got an accuracy of 0.792 with multiple test_size settings.

  • codebasics
    November 15, 2020

    I got 82.23 after binning one variable

  • codebasics
    November 15, 2020

    I appreciate your work. It is very helpful. I have one comment about running logistic regression. If you run the same model with the StatsModels package, you will get different coefficients, because the default setup of sklearn.linear_model.LogisticRegression applies an L2 penalty with lambda = 1. To get the same results, we can use LogisticRegression(solver='newton-cg', C=1e09). This is addressed here: https://ryxcommar.com/2019/08/30/scikit-learns-defaults-are-wrong/. I am a little disappointed in the sklearn package for both linear regression and logistic regression. It lacks model diagnostics for linear regression, and this default regularization setup doesn't make sense. I think most users won't check the regularization when they use it.

  • codebasics
    November 15, 2020

    Sir, why didn't you drop one of the salary columns after encoding it with dummy variables in the solution?

  • codebasics
    November 15, 2020

    I got an accuracy of 78.833. I learned a lot from this exercise. Thank you, bro.

  • codebasics
    November 15, 2020

    who else did not understand the assignment apart from me?

  • codebasics
    November 15, 2020

    Awesome explanation. I like this practical math and algorithmic explanation.

  • codebasics
    November 15, 2020

    I couldn't solve the 2nd and 3rd questions. Basically, I couldn't plot the bar chart because it kept raising an error saying that 'NoneType' data had been encountered. I should also mention that I used label encoding to convert the salaries into 1, 2, 3 instead of low, medium, high. Please help, someone.

  • codebasics
    November 15, 2020

    When I see that the correlation between features is < 0.15, does that mean there is no multicollinearity among the independent features?

    Is there any way to find which features are important and which are less important (and can be ignored) while modelling?

  • codebasics
    November 15, 2020

    On my first attempt, I treated 'left' as the dependent variable and everything else, including salary and department, as independent variables, and got an accuracy score of 77%. Thanks for the wonderful video.

  • codebasics
    November 15, 2020

    Hello sir, I took the Titanic data from Kaggle and trained on (X_train, y_train), getting a score of 78%. My question is this: when I pass my X_test values, the predictions should match my y_test, right? But sometimes I get 0 where the answer is 1, so it is not predicting properly.

  • codebasics
    November 15, 2020

    Hi Sir,

    Great tutorials. I am trying to access the Excel sheet containing the Age and Insurance Product data. Please provide the link to download the sheet.

    KM

  • codebasics
    November 15, 2020

    My model score comes out to 0.7757142857142857.

  • codebasics
    November 15, 2020

    sir,can you explain column transfer

  • codebasics
    November 15, 2020

    I am getting an error while uploading the data file. Please help.
