Machine Learning Tutorial Python 12 – K Fold Cross Validation





Many times we face a dilemma about which machine learning model to use for a given problem. K-fold cross validation lets us evaluate the performance of a model by creating K folds of the given dataset, which is more reliable than a traditional train_test_split. In this tutorial we cover the basics of cross validation and k-fold, and we also look at the cross_val_score function of the sklearn library, which provides a convenient way to run cross validation on a model.
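As a quick sketch of the idea (illustrative only; the model choice and cv=5 here are assumptions, not the notebook's exact settings), cross validation on the hand-written digits dataset used in the video looks like this:

```python
# Sketch of k-fold cross validation with scikit-learn's cross_val_score.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

digits = load_digits()

# cv=5 creates 5 folds; each fold serves once as the test set while the
# other 4 folds are used for training, so we get 5 accuracy scores.
scores = cross_val_score(LogisticRegression(max_iter=5000),
                         digits.data, digits.target, cv=5)
print(scores)         # one accuracy score per fold
print(scores.mean())  # average performance across the folds
```

Averaging the per-fold scores gives a more stable estimate of model performance than a single random train/test split.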

#MachineLearning #PythonMachineLearning #MachineLearningTutorial #Python #PythonTutorial #PythonTraining #MachineLearningCourse #MachineLearningModel #sklearn

Code: https://github.com/codebasics/py/blob/master/ML/12_KFold_Cross_Validation/12_k_fold.ipynb

Exercise: The exercise description is available towards the end of the above notebook

Exercise solution: https://github.com/codebasics/py/blob/master/ML/12_KFold_Cross_Validation/Exercise/exercise_kfold_validation.ipynb

Topics covered in this video:
0:21 Cross Validation
1:02 Ways to train your model (use all available data for training and test on the same dataset)
2:08 Ways to train your model (split the available dataset into training and test sets)
3:26 Ways to train your model (k-fold cross validation)
4:26 Coding starts (use the hand-written digits dataset for k-fold cross validation)
8:23 sklearn.model_selection KFold
9:10 KFold.split method
12:21 StratifiedKFold
19:45 cross_val_score
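For reference alongside the KFold.split and StratifiedKFold timestamps above, here is a minimal sketch on toy data (this is illustrative, not the video's notebook code):

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features
y = np.array([0] * 5 + [1] * 5)   # two classes, grouped together

# Plain KFold slices the data in order and ignores the labels,
# so early test folds here contain only class 0.
kf = KFold(n_splits=5)
for train_idx, test_idx in kf.split(X):
    print("KFold test labels:", y[test_idx])

# StratifiedKFold preserves the class ratio of y in every fold,
# which matters for classification with uneven classes.
skf = StratifiedKFold(n_splits=5)
for train_idx, test_idx in skf.split(X, y):
    print("Stratified test labels:", y[test_idx])
```

The split methods yield index arrays, which you then use to slice X and y into per-fold training and test sets.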

Next Video:
Machine Learning Tutorial Python – 13: K Means Clustering: https://www.youtube.com/watch?v=EItlUEPCIzM&list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw&index=14

Popular Playlists:
Data Science Full Course: https://www.youtube.com/playlist?list=PLeo1K3hjS3us_ELKYSj_Fth2tIEkdKXvV

Data Science Project: https://www.youtube.com/watch?v=rdfbcdP75KI&list=PLeo1K3hjS3uu7clOTtwsp94PcHbzqpAdg

Machine learning tutorials: https://www.youtube.com/watch?v=gmvvaobm7eQ&list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw

Pandas: https://www.youtube.com/watch?v=CmorAWRsCAw&list=PLeo1K3hjS3uuASpe-1LjfG5f14Bnozjwy

matplotlib: https://www.youtube.com/watch?v=qqwf4Vuj8oM&list=PLeo1K3hjS3uu4Lr8_kro2AqaO6CFYgKOl

Python: https://www.youtube.com/watch?v=eykoKxsYtow&list=PLeo1K3hjS3uv5U-Lmlnucd7gqF-3ehIh0&index=1

Jupyter Notebook: https://www.youtube.com/watch?v=q_BzsPxwLOE&list=PLeo1K3hjS3uuZPwzACannnFSn9qHn8to8

To download the CSV files and code for all tutorials: go to https://github.com/codebasics/py, click the green button to clone or download the entire repository, and then go to the relevant folder to access that specific file.

Website: http://codebasicshub.com/
Facebook: https://www.facebook.com/codebasicshub
Twitter: https://twitter.com/codebasicshub




Comment List

  • codebasics
    January 12, 2021

    Amazing explanation. Loved it <3

  • codebasics
    January 12, 2021

    I watched several videos on CV, but your video is well explained. Thank you very much, sir; keep uploading videos.

  • codebasics
    January 12, 2021

    Thank you so much sir.

  • codebasics
    January 12, 2021

    Thank you

  • codebasics
    January 12, 2021

    I have a question: what value of k does the cross_val_score function select? Is there a way to know/change it?

  • codebasics
    January 12, 2021

    Kindly make a separate video for StratifiedKFold; I didn't understand it.

  • codebasics
    January 12, 2021

    Hi sir, can you please tell me which technique is used for resampling data when there are more than 2 classes?

  • codebasics
    January 12, 2021

    Thanks for creating rather authentic content on this topic compared to others. It is much clearer!

  • codebasics
    January 12, 2021

    I got the highest score with the SVM classifier; is that right?

  • codebasics
    January 12, 2021

    If we set 150-fold cross validation, then how many sets are there for training and testing?

  • codebasics
    January 12, 2021

    cool cool cool

  • codebasics
    January 12, 2021

    I have one question: why is cross_val_score returning 3 scores here? We haven't passed any number of folds, and I can't find any default number of folds.

  • codebasics
    January 12, 2021

    Wonderful explanation. Great tutorial series.

  • codebasics
    January 12, 2021

    Couldn't ask for a better teacher to teach machine learning. Truly exceptional!!!! Thank you so much for all your efforts.

  • codebasics
    January 12, 2021

    I really appreciate it 👍

  • codebasics
    January 12, 2021

    Hi, is cross validation also useful for regression problems?

  • codebasics
    January 12, 2021

    Probably the best machine learning tutorials out there… Very good job
    Thanks!

  • codebasics
    January 12, 2021

    Greatly explained man. Thank you

  • codebasics
    January 12, 2021

    Informative content. Thanks a lot!!

  • codebasics
    January 12, 2021

    How does cross_val_score() decide on the number of folds to be taken? Why did it run 3 times?

  • codebasics
    January 12, 2021

    I have a question: when you use cross validation and create k folds, you also get k different instances of the model. For generalization, which instance do you use?

    Thanks for the video, was great!

  • codebasics
    January 12, 2021

    Sir, really a very good explanation… finally I understood it very well.

  • codebasics
    January 12, 2021

    my SVM score for digits came out to be 0.9814, the best among all three

  • codebasics
    January 12, 2021

    following your tutorials is the best way to learn Machine learning techniques. Please upload a video explanation on KNN as well.

  • codebasics
    January 12, 2021

    SVM is the best, with an accuracy of 97.33%.

  • codebasics
    January 12, 2021

    If I were rich I would have sent you a token of appreciation… Thank you for the content.

  • codebasics
    January 12, 2021

    How does model.score() work?

  • codebasics
    January 12, 2021

    I got an accuracy of 97% with SVM and you got 40%; how?

  • codebasics
    January 12, 2021

    In my case logistic regression won with two hundred percent in it!

  • codebasics
    January 12, 2021

    I created a function to find the score. It splits the data into an 80:20 train/test ratio using train_test_split, and it takes a list of machine learning algorithms as the first argument, followed by x_train, y_train, x_test and y_test. On the first run it gives me SVC with the best accuracy, with no additional configuration. Please help me find out why it is giving SVC as the best algorithm.
    The code goes here:
    +++++++++++++++++++++++++++++++++++++++++++++++++
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    digits = load_digits()
    inputs = digits.data
    target = digits.target

    # 80:20 train/test split
    x_train, x_test, y_train, y_test = train_test_split(inputs, target, test_size=0.2)

    modelList = [LogisticRegression(), DecisionTreeClassifier(),
                 RandomForestClassifier(), SVC()]

    def getListMLScore(modelList, x_train, y_train, x_test, y_test):
        scores = []
        for model in modelList:
            model.fit(x_train, y_train)
            score = model.score(x_test, y_test)
            scores.append(f"{model}:{score}")
        return scores

    scores = getListMLScore(modelList, x_train, y_train, x_test, y_test)
    print(scores)
    ++++++++++++Code ends Here+++++++++++++++++++++++
    Output: ['LogisticRegression():0.975', 'DecisionTreeClassifier():0.8583333333333333', 'RandomForestClassifier():0.9861111111111112', 'SVC():0.9944444444444445']

    Please help me find out why this happens. Is it because of the testing data or something else?

    Thanks in advance to the helping minds who see this comment!
