Cross Validation in Scikit Learn
This is the big one. We go over cross-validation and other techniques to split your data. VERY IMPORTANT. We talk about cross-validated scoring and prediction, and then we cover the scikit-learn cross-validation iterators: K-fold, stratified K-fold, grouped data, and time series split.
Associated Github Commit:
https://github.com/knathanieltucker/bit-of-data-science-and-scikit-learn/blob/master/notebooks/CrossValidation.ipynb
Associated Scikit Links:
http://scikit-learn.org/stable/modules/cross_validation.html
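
For a quick feel for what the notebook covers, here is a minimal sketch (not the notebook's exact code, and the estimator choice is just illustrative) of cross-validated scoring plus the splitters mentioned above:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    cross_val_score, KFold, StratifiedKFold, GroupKFold, TimeSeriesSplit)

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# Cross-validated scoring: one accuracy score per fold
print(cross_val_score(clf, X, y, cv=5))

# The iterators covered in the video, each yielding (train_idx, test_idx) pairs
kf = KFold(n_splits=5, shuffle=True, random_state=0)
skf = StratifiedKFold(n_splits=5)      # preserves class proportions in each fold
tscv = TimeSeriesSplit(n_splits=5)     # training folds always precede the test fold
# GroupKFold keeps all samples sharing a group label in the same fold;
# it needs a `groups` array when calling split(X, y, groups)
# gkf = GroupKFold(n_splits=3)
```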
Really nice content. Thanks a lot!
If you pass the cross_val_predict result as y_pred to classification_report(y_pred, y), it outputs 3 classes: 0, 1, 2. Why does it output 3 classes instead of 2, since the iris dataset is binary classification?
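
For what it's worth, iris is actually a three-class dataset, which is why the report shows rows for 0, 1, and 2. A quick sketch (assuming the standard load_iris data used in the notebook):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_predict

X, y = load_iris(return_X_y=True)
print(np.unique(y))  # [0 1 2] -- iris has three classes, not two

y_pred = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=5)
# classification_report expects (y_true, y_pred) and prints one row per class
print(classification_report(y, y_pred))
```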
Nice video! Can you please share the code?
Thank you!
Hi,
A question for you! We run CV, which tells us the performance of the model. This by itself doesn't build any model for us; we eventually still need to call .fit() to train our model on the training set. What's the point of doing CV in this case?
Moreover, does it matter which kind of CV we use, since the model isn't actually built by the CV?
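
One way to read this (a sketch, not the video's exact code): cross-validation only estimates how a modelling choice will perform on unseen data; the model you actually deploy still comes from one final fit on the full training set.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# CV estimates how this modelling choice is likely to perform on unseen data...
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean(), scores.std())

# ...while the model you actually use is fit once on the full training set
clf.fit(X, y)
```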
Here's my question.
When we train our model using StratifiedKFold, we actually get "k" models in return, and we can calculate the accuracy of each of them. But how do we get one final model instead of these "k" models?
I've read that we take the average of these models, but how do you take the average of a model?
To put it more simply, how can we use StratifiedKFold to build a final model?
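
If it helps, the k per-fold models are usually discarded once their scores are averaged; it's the scores, not the models, that get averaged. A sketch of that pattern (illustrative only):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

X, y = load_iris(return_X_y=True)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores = []
for train_idx, test_idx in skf.split(X, y):
    # Fit a throwaway model on this fold's training portion, score on its test portion
    fold_model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores.append(fold_model.score(X[test_idx], y[test_idx]))

# Average the scores to estimate performance...
print(np.mean(scores))

# ...then build the single final model by refitting on all the training data
final_model = LogisticRegression(max_iter=1000).fit(X, y)
```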
I am not gay but I have to say that you are one attractive personality <3 Thanks for the video
Thanks for your video, it helps me a lot. By the way, can you zoom in your code page, it is not easy using a 11 inches laptop to read the code. Thanks.
If you do model selection or hyperparameter tuning, the CV estimate isn't unbiased for the selected model. Should we hold out a separate test set to evaluate the best model on, to get a truly unbiased performance estimate?
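
A common pattern for exactly this (a sketch, assuming you want a single unbiased number): hold out a test set, do all tuning with CV on the training portion only, and touch the test set exactly once at the end.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Model selection / hyperparameter tuning happens only on the training split
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)

# The held-out test set is used once, for the final unbiased estimate
print(search.score(X_test, y_test))
```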
Good video… but why make the video in the kitchen? :D:D
I just had a question: what's the difference between using clf.predict and clf.score?
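
In case it helps, a quick sketch of the difference (assuming a fitted classifier clf):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# predict returns one predicted label per row of X
print(clf.predict(X[:5]))

# score returns a single number: for classifiers, mean accuracy on (X, y)
print(clf.score(X, y))
```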
Thank you! I needed this for thesis!!
Lovely presented, would love to see more 😉
Great video as always! What approach or method do you use to select a model that best represents and fits your data? Do you try a bunch of models, look at their accuracy or another metric, and decide which one to use based on the minimal empirical risk? Using a bag of classifiers and selecting the best seems the way to go based on VC theory and the corresponding inequalities, but I want to get your opinion on this! Thanks so much!!!
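
One simple way to do the "try a bunch of models" comparison in code (a sketch, not necessarily what the video recommends, and the candidate models are arbitrary):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Compare candidate models on the same data with the same CV scheme and metric
candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=0),
    "knn": KNeighborsClassifier(),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(name, scores.mean())
```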
In K-fold, when using the function kf.split(X), how do we separate the data from the target (X's, y's)?
I mean, the function splits the X array, but where do we define our y's?
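
A sketch of the usual pattern (assuming X and y are NumPy arrays): split() only yields index arrays, and you use the same indices to slice both X and y yourself.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# split(X) yields integer index arrays; apply them to X and y alike
for train_idx, test_idx in kf.split(X):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
```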