Cross Validation in Scikit Learn
This is the big one. We go over cross validation and other techniques for splitting your data. VERY IMPORTANT. We cover cross-validated scoring and prediction, and then the scikit-learn cross validation iterators: K-fold, stratified K-fold, grouped data, and time series split.

Associated GitHub Notebook:
https://github.com/knathanieltucker/bit-of-data-science-and-scikit-learn/blob/master/notebooks/CrossValidation.ipynb

Associated scikit-learn Links:
http://scikit-learn.org/stable/modules/cross_validation.html
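
A minimal sketch of the pieces covered here, using the iris data and a linear SVC purely as placeholders (any estimator and dataset would do):

    from sklearn import datasets, svm
    from sklearn.model_selection import (
        cross_val_score, cross_val_predict,
        KFold, StratifiedKFold, GroupKFold, TimeSeriesSplit)

    X, y = datasets.load_iris(return_X_y=True)
    clf = svm.SVC(kernel='linear', C=1)

    # Cross-validated scoring and prediction
    scores = cross_val_score(clf, X, y, cv=5)    # one score per fold
    preds = cross_val_predict(clf, X, y, cv=5)   # out-of-fold predictions

    # The iterators discussed: each yields (train indices, test indices)
    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    skf = StratifiedKFold(n_splits=5)            # preserves class proportions
    gkf = GroupKFold(n_splits=3)                 # keeps whole groups together
    tscv = TimeSeriesSplit(n_splits=5)           # training always precedes testing

    for train_idx, test_idx in kf.split(X):
        X_train, X_test = X[train_idx], X[test_idx]
        y_train, y_test = y[train_idx], y[test_idx]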


Comment List

  • Data Talks
    December 4, 2020

    Really nice content. Thanks a lot!

  • Data Talks
    December 4, 2020

    If you pass the cross_val_predict result as y_pred to classification_report(y_pred, y), it outputs 3 classes: 0, 1, 2. Why does it output 3 classes instead of 2, since the iris dataset is binary classification?
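
On the question above: iris is actually a three-class dataset (setosa, versicolor, virginica), so three labels are expected; note also that classification_report takes (y_true, y_pred) in that order. A small sketch, with a linear SVC as a stand-in classifier:

    from sklearn.datasets import load_iris
    from sklearn.metrics import classification_report
    from sklearn.model_selection import cross_val_predict
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)   # y contains classes 0, 1 and 2
    y_pred = cross_val_predict(SVC(kernel='linear'), X, y, cv=5)

    # Three rows appear because iris has three species
    print(classification_report(y, y_pred))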

  • Data Talks
    December 4, 2020

    Nice video. Can you please share the code?

  • Data Talks
    December 4, 2020

    Thank you!

  • Data Talks
    December 4, 2020

    Hi,

    A question for you! We run CV, which tells us the performance of the model, but this by itself doesn't build a model for us. We eventually still need to call .fit() to train our model on the training set. What's the point of doing CV in this case?
    Moreover, does it matter which kind of CV we use, since the model isn't actually built by the CV?
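
On the question above, one common pattern: CV is used only to estimate how well a given model will generalize, and the final model is then fit once on all of the training data. A rough sketch (iris and a linear SVC are placeholders):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    clf = SVC(kernel='linear', C=1)

    # 1) CV only estimates generalization performance; it builds no final model
    scores = cross_val_score(clf, X, y, cv=5)
    print(scores.mean(), scores.std())

    # 2) Once the estimate looks acceptable, fit one model on all the training data
    clf.fit(X, y)

The choice of CV scheme still matters because it determines how trustworthy that estimate is (for example, stratified splits for imbalanced classes, grouped splits for grouped data).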

  • Data Talks
    December 4, 2020

    Here's my question.
    When we train our model using StratifiedKFold we actually get k models in return, and we can calculate the accuracy of each of them. But how do we get one final model instead of these k models?
    I've read that we take the average of these models, but how do you take the average of a model?

    To put it more simply, how can we use StratifiedKFold to make a final model?
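
On the question above: the k fold-specific models are normally thrown away; only their scores are averaged, to estimate how well this kind of model generalizes. The single final model is then refit on the full training set. A sketch (iris and a linear SVC as placeholders):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import StratifiedKFold
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    skf = StratifiedKFold(n_splits=5)

    # The k per-fold models only exist to produce k scores...
    fold_scores = []
    for train_idx, test_idx in skf.split(X, y):
        model = SVC(kernel='linear').fit(X[train_idx], y[train_idx])
        fold_scores.append(model.score(X[test_idx], y[test_idx]))
    print(np.mean(fold_scores))   # the averaged performance estimate

    # ...the one final model is refit on all of the training data
    final_model = SVC(kernel='linear').fit(X, y)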

  • Data Talks
    December 4, 2020

    I am not gay but I have to say that you are one attractive personality <3 Thanks for the video

  • Data Talks
    December 4, 2020

    Thanks for your video, it helps me a lot. By the way, can you zoom in on your code? It is not easy to read on an 11-inch laptop. Thanks.

  • Data Talks
    December 4, 2020

    If you do model selection or hyperparameter tuning, the CV estimate is no longer unbiased for the selected model. Should we hold out a separate test set to evaluate the best model on, to get a really unbiased performance estimate?
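
On the question above: a common remedy is indeed to keep a hold-out test set that the tuning never sees. A sketch, with an arbitrary parameter grid as a placeholder:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # Hold out a test set that hyperparameter tuning never touches
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    # CV-based tuning happens only on the training portion
    search = GridSearchCV(SVC(), {'C': [0.1, 1, 10]}, cv=5)
    search.fit(X_train, y_train)

    # The untouched test set gives a less biased estimate for the selected model
    print(search.score(X_test, y_test))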

  • Data Talks
    December 4, 2020

    Good video… but why make the video in the kitchen? :D:D

  • Data Talks
    December 4, 2020

    I just had a question: what's the difference between using clf.predict and clf.score?
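
On predict vs. score: predict returns the predicted labels, while score returns a single metric (mean accuracy for classifiers). A small sketch, again with iris and a linear SVC as placeholders:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = SVC(kernel='linear').fit(X_train, y_train)

    y_pred = clf.predict(X_test)       # array of predicted class labels
    acc = clf.score(X_test, y_test)    # single number: mean accuracy on the test set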

  • Data Talks
    December 4, 2020

    Thank you! I needed this for my thesis!!

  • Data Talks
    December 4, 2020

    Lovely presentation, would love to see more 😉

  • Data Talks
    December 4, 2020

    Great video as always! What approach or method do you use to select a model that best represents and fits your data? Do you try a bunch of models, look at their accuracy or some other metric, and decide which one to use based on the minimal empirical risk? Using a bag of classifiers and selecting the best seems the way to go based on VC theory and the corresponding inequalities, but I want to get your opinion on this! Thanks so much!!!
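
One simple version of the approach asked about above: score each candidate model with the same CV scheme and compare the averaged results (the two candidates here are arbitrary placeholders):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # Compare candidates on the same CV scheme, then pick by mean score
    for name, model in [('svc', SVC(kernel='linear')),
                        ('logreg', LogisticRegression(max_iter=1000))]:
        scores = cross_val_score(model, X, y, cv=5)
        print(name, scores.mean(), scores.std())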

  • Data Talks
    December 4, 2020

    In k-fold, when using the function kf.split(X), how do we separate the data from the target (the x's and y's)?
    I mean, the function splits the X array, but where do we define our y's?
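
On the kf.split question above: split only needs X to produce index arrays, and those same indices are then used to slice both X and y. A sketch:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import KFold

    X, y = load_iris(return_X_y=True)
    kf = KFold(n_splits=5, shuffle=True, random_state=0)

    # split() yields index arrays; the same indices slice both X and y
    for train_idx, test_idx in kf.split(X):
        X_train, y_train = X[train_idx], y[train_idx]
        X_test, y_test = X[test_idx], y[test_idx]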
