Machine Learning – Over- & Undersampling – Python / Scikit / Scikit-Imblearn




In this video I explain how to use over- and undersampling in machine learning with Python, scikit and scikit-imblearn.
The video shows what over- and undersampling are and how to apply them correctly, even when cross-validating. So let’s go!
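
A minimal sketch of the cross-validation point, assuming only scikit-learn and NumPy are installed. It uses plain random oversampling (duplicating minority rows) as a stand-in for SMOTE. The helper names `oversample_minority` and `cross_validate_with_oversampling` and the synthetic dataset are illustrative and not taken from the video. The idea is that resampling happens inside each fold, on the training part only, so the held-out part keeps its original class balance.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import StratifiedKFold
from sklearn.utils import resample


def oversample_minority(X, y, random_state=0):
    """Add duplicated rows to smaller classes until all classes match the largest one."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [], []
    for cls, count in zip(classes, counts):
        X_cls = X[y == cls]
        if count < target:
            # Draw the missing rows with replacement from the existing ones.
            extra = resample(X_cls, replace=True, n_samples=target - count,
                             random_state=random_state)
            X_cls = np.vstack([X_cls, extra])
        X_parts.append(X_cls)
        y_parts.append(np.full(len(X_cls), cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


def cross_validate_with_oversampling(X, y, n_splits=5, random_state=0):
    """Score a model per fold; only the training part of each fold is resampled."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        X_train, y_train = oversample_minority(X[train_idx], y[train_idx], random_state)
        model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        # The test part is scored untouched, with its original imbalance.
        scores.append(recall_score(y[test_idx], model.predict(X[test_idx])))
    return scores


# Synthetic imbalanced data: roughly 90% class 0 and 10% class 1.
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
scores = cross_validate_with_oversampling(X, y)
print("recall per fold:", scores)
```

With scikit-imblearn, placing the sampler as a step in an `imblearn.pipeline.Pipeline` and passing that pipeline to `cross_val_score` has the same effect, because the sampler is applied only while fitting.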

_About the channel_____________________
TL;DR
Awesome Data science without much math!

Hello, I’m Jo, the “Coding Maniac”!
On my channel I will show you how to make awesome things with data science. I will also present short videos covering the fundamentals of machine learning and data science, such as feature tuning, over/undersampling, overfitting, … with Python.

All videos will be simple to follow and I’ll try to reduce the complicated mathematical stuff to a minimum because I believe that you don’t need to know how a CPU works to be able to operate a PC…

GitHub: https://github.com/coding-maniacs

_Equipment _____________________

Camera: http://amzn.to/2hkVs5X
Camera lens: http://amzn.to/2fCEU9z
Audio-Recorder: http://amzn.to/2jNu2KJ
Microphone: http://amzn.to/2hloKBG
Light: http://amzn.to/2w8J92N

_More videos _____________________

More videos in German: https://youtu.be/rtyJyzqeByU, https://youtu.be/1A3JVSQZ4N0
Subscribe “Coding Maniac”: https://www.youtube.com/channel/UCG0TtnkdbMvN5OYQcgNFY1w
More videos on “Coding Maniac”: https://www.youtube.com/channel/UCG0TtnkdbMvN5OYQcgNFY1w

_Social Media_____________________

►Facebook: https://www.facebook.com/codingmaniac/

_____________________

Comment List

  • Johannes Frey
    December 18, 2020

    I am running the code in a Jupyter notebook and getting this error. Have you seen it before? 'SMOTE' object has no attribute '_validate_data'

  • Johannes Frey
    December 18, 2020

    Can we all help this man to become more popular ? Please share the video guys, it's very very useful clear and awesome.

  • Johannes Frey
    December 18, 2020

    These videos are great! Please make more!!

  • Johannes Frey
    December 18, 2020

    Thank you for the great tutorial. SMOTE should be done only on training data to avoid over-fitting. @13:30 The testing data should remain untouched. My question is, how do i get the confusion matrix for each run? I am trying to find other metrics (MCC, G-Score, and FPR) and i need access to each confusion matrix for each run while using SMOTE.

  • Johannes Frey
    December 18, 2020

    Hi, thank you. I want to use oversampling on an imbalanced image dataset. How can I use this for images?

  • Johannes Frey
    December 18, 2020

    great work bro! but how is an imbalanced dataset handled in Deep Learning?

  • Johannes Frey
    December 18, 2020

    Thanks for the wonderful explanation. Could you please share the code shown on screen on GitHub?

  • Johannes Frey
    December 18, 2020

    Also, when running GridSearchCV/RandomizedSearchCV with an imblearn pipeline, the resampling step is applied only when fitting on each training split, so the validation split is left untouched. See this post for details https://stackoverflow.com/questions/48370150/how-to-implement-smote-in-cross-validation-and-gridsearchcv?rq=1

  • Johannes Frey
    December 18, 2020

    Thanks. Very clear and concise. That last part about modifying the pipeline for cross-validating really helped me out. 🙂

  • Johannes Frey
    December 18, 2020

    I couldn't find the code in the description or Github?

  • Johannes Frey
    December 18, 2020

    nice video, thanks!

  • Johannes Frey
    December 18, 2020

    Thanks for the nice video. The provided github link has no source code. Could you please update that one. Thanks.

  • Johannes Frey
    December 18, 2020

    Great video, your github link is wrong though

  • Johannes Frey
    December 18, 2020

    gr8 video.. with clear explanation, you made the topic very easily understandable. Thank you..!!

  • Johannes Frey
    December 18, 2020

    great video man

  • Johannes Frey
    December 18, 2020

    great tutorial, thank you . I just wanted to know whether you can use these oversampling and under-sampling techniques for linear regression as well

  • Johannes Frey
    December 18, 2020

    Can someone please share the link to the code?

  • Johannes Frey
    December 18, 2020

    Great video !! Thank You so much !! Now i understand ….over- & Under sampling.

  • Johannes Frey
    December 18, 2020

    Great video. Thank you very much!

  • Johannes Frey
    December 18, 2020

    Hi! Thank you soooo much for your video! But now I am doing a document classification task and I have 3 categories to classify. In my training data, I have about 20,000 data for the category 0, 20,000 data for the category 1, and about 100,000 data for the category 3. Is my training data imbalanced? Do I need to run the algorithm in this video?

  • Johannes Frey
    December 18, 2020

    What about stratified random sampling? Wouldn't that be better because what if your kfolds are very unbalanced?

  • Johannes Frey
    December 18, 2020

    Awesome video! Now I actually get it! What I didn't understand at the beginning was why it was overfitting. Aren't we using the training data? Didn't we just split it from the testing data? But just like you said, we need to remember how the cross-validation score works. If cross-validation trains our model five times and tests it five times by splitting our training data (let's say 80% training and 20% testing), and we perform SMOTE before cross-validating (fitting the "whole" training set), then there will be what we call "data leakage" during cross-validation. I just want to make sure this is the right direction for thinking about this issue. Again, thanks for sharing this and keep up the good work.

  • Johannes Frey
    December 18, 2020

    handsome guy.

  • Johannes Frey
    December 18, 2020

    Your explanation helped me.
    Do you also have an explanation of feature extraction from documents using Doc2Vec?

  • Johannes Frey
    December 18, 2020

    where is the code link?

  • Johannes Frey
    December 18, 2020

    Very good video (production, content, delivery, etc..), thanks for putting this together! I'll be watching more from your channel.

  • Johannes Frey
    December 18, 2020

    Can SMOTE be applied to images?

  • Johannes Frey
    December 18, 2020

    niceeee~ tks for sharing !

  • Johannes Frey
    December 18, 2020

    Thanks for the tip on cross validation while undersampling. It helped my model boost its recall.
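
For the question above about getting a confusion matrix for every cross-validation run, here is a rough sketch, assuming scikit-learn and NumPy only. It runs the folds by hand with `StratifiedKFold`, which also keeps the class ratio similar in every fold, and stores one confusion matrix plus MCC and false positive rate per fold. Random oversampling by duplication is used here as a stand-in for SMOTE; the dataset, the decision tree model and the `fold_results` structure are illustrative choices, not the code from the video.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

# Synthetic binary data: roughly 85% class 0 and 15% class 1.
X, y = make_classification(n_samples=600, n_features=8,
                           weights=[0.85, 0.15], random_state=1)

fold_results = []
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
for train_idx, test_idx in skf.split(X, y):
    X_tr, y_tr = X[train_idx], y[train_idx]

    # Oversample class 1 in the training part only, up to the size of class 0.
    n_extra = int((y_tr == 0).sum() - (y_tr == 1).sum())
    extra = resample(X_tr[y_tr == 1], replace=True, n_samples=n_extra, random_state=1)
    X_bal = np.vstack([X_tr, extra])
    y_bal = np.concatenate([y_tr, np.ones(n_extra, dtype=y_tr.dtype)])

    model = DecisionTreeClassifier(max_depth=4, random_state=1).fit(X_bal, y_bal)
    y_pred = model.predict(X[test_idx])

    # One confusion matrix per fold, computed on the untouched test part.
    cm = confusion_matrix(y[test_idx], y_pred, labels=[0, 1])
    tn, fp, fn, tp = cm.ravel()
    fold_results.append({
        "confusion_matrix": cm,
        "mcc": matthews_corrcoef(y[test_idx], y_pred),
        "fpr": fp / (fp + tn),
    })

for i, result in enumerate(fold_results):
    print(f"fold {i}: MCC={result['mcc']:.3f} FPR={result['fpr']:.3f}")
    print(result["confusion_matrix"])
```

Any other metric that is defined on a confusion matrix can be derived from the stored `tn`, `fp`, `fn` and `tp` values in the same loop.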
