Machine Learning Tutorial Python 12 – K Fold Cross Validation
Many times we face a dilemma about which machine learning model to use for a given problem. K-fold cross validation lets us evaluate a model's performance by splitting the dataset into K folds, then using each fold once as the test set while training on the remaining K-1 folds. This gives a more reliable estimate than a single train_test_split, whose score depends on which samples happen to land in the test set. In this tutorial we will cover the basics of cross validation and KFold. We will also look at the cross_val_score function of the sklearn library, which provides a convenient way to run cross validation on a model.
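To see why a single train_test_split can be misleading, here is a minimal sketch (assuming scikit-learn and its bundled digits dataset; the DecisionTreeClassifier, the 70:30 split and the five seeds are illustrative choices, not the video's code). It scores one model on several random splits and then runs 5-fold cross_val_score on the same model.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)

# One model evaluated on several different random 70:30 splits
split_scores = []
for seed in range(5):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    split_scores.append(model.score(X_test, y_test))
print("train_test_split scores:", [round(s, 3) for s in split_scores])

# The same model under 5-fold cross validation: every sample is tested exactly once
cv_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print("cross_val_score per fold:", cv_scores.round(3))
print("cross_val_score mean:", round(cv_scores.mean(), 3))
```

The spread of the split scores is the variability that averaging over K folds is meant to smooth out.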
Code: https://github.com/codebasics/py/blob/master/ML/12_KFold_Cross_Validation/12_k_fold.ipynb
Exercise: The exercise description is available towards the end of the notebook above.
Exercise solution: https://github.com/codebasics/py/blob/master/ML/12_KFold_Cross_Validation/Exercise/exercise_kfold_validation.ipynb
Topics covered in this video:
0:21 Cross Validation
1:02 Ways to train your model (use all available data for training and test on the same dataset)
2:08 Ways to train your model (split the available dataset into training and test sets)
3:26 Ways to train your model (k fold cross validation)
4:26 Coding (start) (use the handwritten digits dataset for k fold cross validation)
8:23 sklearn.model_selection KFold
9:10 KFold.split method
12:21 StratifiedKFold
19:45 cross_val_score
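The KFold.split step listed in the outline can be sketched as a manual loop. This is a minimal sketch, not the notebook's code: it assumes SVC as the model, 3 folds, and shuffle=True with a fixed random_state so the folds do not simply follow the dataset's original row order and are reproducible.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import KFold
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# 3 folds; shuffle with a fixed seed so the folds are reproducible
kf = KFold(n_splits=3, shuffle=True, random_state=42)

fold_scores = []
for train_index, test_index in kf.split(X):
    # Each iteration yields index arrays: train on k-1 folds, test on the held-out fold
    model = SVC()
    model.fit(X[train_index], y[train_index])
    fold_scores.append(model.score(X[test_index], y[test_index]))

print(fold_scores, np.mean(fold_scores))
```

cross_val_score performs essentially this loop for you and returns the per-fold scores as an array.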
Next Video:
Machine Learning Tutorial Python – 13: K Means Clustering: https://www.youtube.com/watch?v=EItlUEPCIzM&list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw&index=14
Popular Playlists:
Data Science Full Course: https://www.youtube.com/playlist?list=PLeo1K3hjS3us_ELKYSj_Fth2tIEkdKXvV
Data Science Project: https://www.youtube.com/watch?v=rdfbcdP75KI&list=PLeo1K3hjS3uu7clOTtwsp94PcHbzqpAdg
Machine learning tutorials: https://www.youtube.com/watch?v=gmvvaobm7eQ&list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw
Pandas: https://www.youtube.com/watch?v=CmorAWRsCAw&list=PLeo1K3hjS3uuASpe-1LjfG5f14Bnozjwy
matplotlib: https://www.youtube.com/watch?v=qqwf4Vuj8oM&list=PLeo1K3hjS3uu4Lr8_kro2AqaO6CFYgKOl
Python: https://www.youtube.com/watch?v=eykoKxsYtow&list=PLeo1K3hjS3uv5U-Lmlnucd7gqF-3ehIh0&index=1
Jupyter Notebook: https://www.youtube.com/watch?v=q_BzsPxwLOE&list=PLeo1K3hjS3uuZPwzACannnFSn9qHn8to8
To download the CSV files and code for all tutorials, go to https://github.com/codebasics/py, click the green button to clone or download the entire repository, and then go to the relevant folder to access a specific file.
Website: http://codebasicshub.com/
Facebook: https://www.facebook.com/codebasicshub
Twitter: https://twitter.com/codebasicshub
Amazing explanation. Loved it <3
I watched several videos on CV, but yours explains it best. Thank you very much, sir; keep uploading videos.
Thank you so much sir.
Thank you
I have a question: what value of k does the cross_val_score function select? Is there a way to know or change it?
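On the question about k: cross_val_score takes a cv parameter. When it is omitted, scikit-learn 0.22 and later use 5 folds (earlier versions used 3), and for a classifier an integer cv means StratifiedKFold is used under the hood. A minimal sketch, assuming the digits dataset and a DecisionTreeClassifier chosen only for illustration:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

# cv omitted: scikit-learn 0.22+ uses 5 folds (earlier versions used 3)
default_scores = cross_val_score(model, X, y)

# cv=10: explicitly request 10 folds
ten_fold_scores = cross_val_score(model, X, y, cv=10)

# One score per fold, so the array length tells you k
print(len(default_scores), len(ten_fold_scores))
```

Checking len() of the returned array is the simplest way to see which k was used.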
Kindly make a separate video on StratifiedKFold; I didn't understand it.
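The idea behind StratifiedKFold is that every test fold keeps roughly the same class proportions as the whole dataset, while plain KFold just cuts the rows into consecutive chunks. A minimal sketch with hypothetical toy labels (6 samples of class 0 followed by 3 of class 1, no shuffling) makes the difference visible:

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Toy labels (hypothetical): 6 samples of class 0 followed by 3 of class 1
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1])
X = np.zeros((len(y), 1))  # features are irrelevant to how the folds are built


def fold_class_counts(splitter):
    """Return [count of class 0, count of class 1] for each test fold."""
    return [np.bincount(y[test], minlength=2).tolist() for _, test in splitter.split(X, y)]


plain = fold_class_counts(KFold(n_splits=3))
stratified = fold_class_counts(StratifiedKFold(n_splits=3))

print("KFold:          ", plain)       # [[3, 0], [3, 0], [0, 3]]
print("StratifiedKFold:", stratified)  # [[2, 1], [2, 1], [2, 1]]
```

With plain KFold two test folds contain no class 1 samples at all, so a model would be scored on only one class there; StratifiedKFold keeps the 2:1 ratio in every fold.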
Hi sir, can you please tell me which technique is used for resampling data when there are more than 2 classes?
Thanks for creating such authentic content on this topic compared to others. It is much clearer!
I got the highest score with the SVM classifier. Is that right?
If we use 150-fold cross validation, how many samples go into training and how many into testing?
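For plain KFold with k folds on n samples, each test fold gets either n // k or n // k + 1 samples, and the training set is everything else. The sketch below assumes the question means cv=150; the sample counts are illustrative (150 samples, and 1797, the size of the digits dataset).

```python
import numpy as np
from sklearn.model_selection import KFold


def fold_sizes(n_samples, n_splits):
    """Return (train_size, test_size) for every fold of an unshuffled KFold."""
    X = np.zeros((n_samples, 1))
    return [(len(train), len(test)) for train, test in KFold(n_splits=n_splits).split(X)]


# 150 samples with 150 folds: each test fold holds exactly one sample (leave-one-out)
print(set(fold_sizes(150, 150)))  # {(149, 1)}

# 1797 samples with 150 folds: 1797 // 150 = 11 remainder 147,
# so 147 folds test on 12 samples and the last 3 test on 11
print(sorted(set(fold_sizes(1797, 150))))  # [(1785, 12), (1786, 11)]
```

Note that the model is trained 150 times in this setup, which gets expensive on larger datasets.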
cool cool cool
I have one question: why does this cross_val_score function return 3 scores? We haven't passed any number of folds, and I can't find any default number of folds.
Wonderful explanation. Great tutorial series.
Couldn't ask for a better teacher to teach machine learning. Truly exceptional!!!! Thank you so much for all your efforts.
I really appreciate it👍
Hi, is cross validation also useful for regression problems?
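Cross validation works for regression as well. A minimal sketch, assuming scikit-learn's bundled diabetes dataset and LinearRegression purely for illustration: a plain (non-stratified) KFold is passed explicitly, the default score for a regressor is R^2, and the built-in "neg_mean_squared_error" scorer is negated back to a positive MSE.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_diabetes(return_X_y=True)

# Plain KFold, shuffled with a fixed seed for reproducible folds
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# Default scoring for regressors is R^2 (the estimator's own score method)
r2_scores = cross_val_score(LinearRegression(), X, y, cv=kf)

# scikit-learn scorers follow "higher is better", so MSE is reported negated
mse_scores = -cross_val_score(
    LinearRegression(), X, y, cv=kf, scoring="neg_mean_squared_error"
)

print("mean R^2:", r2_scores.mean())
print("mean MSE:", mse_scores.mean())
```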
Probably the best machine learning tutorials out there… Very good job
Thanks!
Great explanation, man. Thank you
Informative content. Thanks a lot!!
How does cross_val_score() decide the number of folds? Why did it run 3 times?
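Three scores match the default of older scikit-learn versions (3 folds before 0.22, 5 since). Besides an integer, the cv parameter also accepts a splitter object, which then controls both the number of folds and whether rows are shuffled. A minimal sketch, assuming the digits dataset, SVC, and 4 shuffled stratified folds chosen only for illustration:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# The splitter decides the number of folds (n_splits) and whether rows are shuffled
splitter = StratifiedKFold(n_splits=4, shuffle=True, random_state=1)
scores = cross_val_score(SVC(), X, y, cv=splitter)

print(len(scores), scores.mean())  # one score per fold
```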
I have a question: when you use cross validation and create k folds, you also get k different instances of the model. Which instance do you use for the final, generalized model?
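One common answer: the k models fitted during cross validation exist only to estimate how well the approach generalizes; the model you actually keep is usually fitted again on all available data. The sketch below assumes SVC and 5 folds for illustration and uses cross_validate with return_estimator=True to expose the k fitted instances.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# return_estimator=True keeps the k models fitted during cross validation
cv_result = cross_validate(SVC(), X, y, cv=5, return_estimator=True)
print(len(cv_result["estimator"]), cv_result["test_score"].mean())

# Those k models only serve to estimate performance; the model you keep is
# typically refit from scratch on all available data
final_model = SVC().fit(X, y)
```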
Thanks for the video, was great!
Sir, really a very good explanation… I finally understood it very well.
My SVM score for digits came out to be 0.9814, the best among all three.
Following your tutorials is the best way to learn machine learning techniques. Please upload a video explanation of KNN as well.
SVM is the best, with an accuracy of 97.33%.
If I were rich, I would have sent you a token of appreciation… Thank you for the content.
How does model.score() work?
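For scikit-learn classifiers, score(X, y) predicts labels for X and returns the mean accuracy (for regressors it returns R^2 instead). A minimal sketch, assuming SVC on the digits dataset with an 80:20 split for illustration, checks it against accuracy_score:

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = SVC().fit(X_train, y_train)

# For classifiers, score() predicts on X_test and returns the fraction of correct labels
builtin = model.score(X_test, y_test)
manual = accuracy_score(y_test, model.predict(X_test))
print(builtin, manual)  # the two values are identical
```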
I got an accuracy of 97% with SVM, and you got 40. How?
In my case logistic regression won with 2 hundred percent in it!
I created a function to find the score. It splits the data 80:20 into training and test sets using the train_test_split method. It takes a list of machine learning algorithms as the first argument, followed by x_train, y_train, x_test and y_test. On the very first run it gives SVC the best accuracy, with no additional configuration. Please help me find out why SVC comes out as the best algorithm.
The code goes here:
+++++++++++++++++++++++++++++++++++++++++++++++++
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

digits = load_digits()
inputs = digits.data
target = digits.target

# 80:20 split; no random_state, so the split (and the scores) change on every run
x_train, x_test, y_train, y_test = train_test_split(inputs, target, test_size=0.2)

modelList = [LogisticRegression(), DecisionTreeClassifier(), RandomForestClassifier(), SVC()]

def getListMLScore(modelList, x_train, y_train, x_test, y_test):
    """Fit each model on the training set and return its accuracy on the test set."""
    scores = []
    for model in modelList:
        model.fit(x_train, y_train)
        score = model.score(x_test, y_test)
        scores.append(f"{model}:{score}")
    return scores

# Bug fix: the original passed an undefined name `ml` instead of modelList
scores = getListMLScore(modelList, x_train, y_train, x_test, y_test)
print(scores)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Output:['LogisticRegression():0.975', 'DecisionTreeClassifier():0.8583333333333333', 'RandomForestClassifier():0.9861111111111112', 'SVC():0.9944444444444445']
++++++++++++Code ends Here++++++++++++++++++++++++++++++++++++++++
Please help me find out why this happens. Is it because of the test data or something else?
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Thanks in advance for those helping minds who see this comment!
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
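One likely factor is the single split itself: train_test_split without random_state draws a new 360-image test set on every run, so one run's ranking may not hold on the next. A minimal sketch of a fairer comparison, assuming the same four models (max_iter=5000 for LogisticRegression and the random_state values are illustrative choices to reduce convergence warnings and make results repeatable), reports the mean and standard deviation of 5-fold cross validation scores over identical folds:

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)

models = [
    LogisticRegression(max_iter=5000),
    DecisionTreeClassifier(random_state=0),
    RandomForestClassifier(random_state=0),
    SVC(),
]

# The same 5 shuffled folds are reused for every model, so the comparison is fair
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

summary = {}
for model in models:
    scores = cross_val_score(model, X, y, cv=cv)
    summary[type(model).__name__] = (scores.mean(), scores.std())

# Print models from best to worst mean accuracy
for name, (mean, std) in sorted(summary.items(), key=lambda item: -item[1][0]):
    print(f"{name}: {mean:.3f} +/- {std:.3f}")
```

If the gap between two models is smaller than their fold-to-fold standard deviation, a single split is not enough evidence to call one of them better.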