Machine Learning Tutorial Python – 7: Training and Testing Data





The sklearn.model_selection.train_test_split function is used in machine learning projects to split an available dataset into a training set and a test set, so that you can train and test on separate data. Testing a model on data it did not see during training gives a more realistic estimate of its accuracy.
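
As a minimal sketch of the call described above (the mileage, age, and price values are made up for illustration and are not the tutorial's CSV), splitting ten rows 80/20 might look like this:

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 cars, features are [mileage, age in years], target is price.
X = [[69000, 6], [35000, 3], [57000, 5], [22500, 2], [46000, 4],
     [59000, 5], [52000, 5], [72000, 6], [91000, 8], [67000, 6]]
y = [18000, 34000, 26100, 40000, 31500, 26750, 32000, 19300, 12000, 22000]

# Hold out 20% of the rows for testing; the remaining 80% is used for training.
# The function returns four pieces: train/test features and train/test targets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

print(len(X_train), len(X_test))  # 8 2
```

Because no random_state is passed here, which rows land in each set changes from run to run; only the sizes stay fixed.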

#MachineLearning #PythonMachineLearning #MachineLearningTutorial #Python #PythonTutorial #PythonTraining #MachineLearningCource #MachineLearningMethod #DataTraining

Code: https://github.com/codebasics/py/blob/master/ML/6_train_test_split/train_test_split.ipynb

Topics that are covered in this Video:
0:01 – Theory behind why we need to split a given dataset into training and test sets using sklearn's train_test_split function.
0:54 – Coding (here we use a car price prediction problem to demonstrate train_test_split)
2:14 – Use train_test_split from sklearn
3:54 – Use of the random_state parameter
4:49 – Use of the fit() method to train your model
5:35 – score() method (to evaluate the model on the test set; for a regression model this returns R² rather than classification accuracy)
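
The random_state, fit() and score() steps listed above can be sketched together as follows. The data here is synthetic and its coefficients are arbitrary assumptions, not the video's car price dataset; LinearRegression is assumed as the model because the problem is price prediction.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (made up): price falls linearly with mileage and age, plus noise.
rng = np.random.default_rng(0)
mileage = rng.uniform(20000, 90000, size=50)
age = rng.uniform(1, 8, size=50)
X = np.column_stack([mileage, age])
y = 45000 - 0.3 * mileage - 1000 * age + rng.normal(0, 500, size=50)

# random_state fixes the shuffle, so the same rows are selected on every run.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=10
)

model = LinearRegression()
model.fit(X_train, y_train)        # learn from the training rows only
r2 = model.score(X_test, y_test)   # R^2 on rows the model has not seen
print(round(r2, 3))
```

Changing or omitting random_state selects different rows, so the reported score can differ between runs on the same dataset.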

Next Video:
Machine Learning Tutorial Python – 8: Logistic Regression (Binary Classification): https://www.youtube.com/watch?v=zM4VZR0px8E&list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw&index=8

Very Simple Explanation Of Neural Network: https://www.youtube.com/watch?v=ER2It2mIagI

Popular Playlists:
Data Science Full Course: https://www.youtube.com/playlist?list=PLeo1K3hjS3us_ELKYSj_Fth2tIEkdKXvV

Data Science Project: https://www.youtube.com/watch?v=rdfbcdP75KI&list=PLeo1K3hjS3uu7clOTtwsp94PcHbzqpAdg

Machine learning tutorials: https://www.youtube.com/watch?v=gmvvaobm7eQ&list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw

Pandas: https://www.youtube.com/watch?v=CmorAWRsCAw&list=PLeo1K3hjS3uuASpe-1LjfG5f14Bnozjwy

matplotlib: https://www.youtube.com/watch?v=qqwf4Vuj8oM&list=PLeo1K3hjS3uu4Lr8_kro2AqaO6CFYgKOl

Python: https://www.youtube.com/watch?v=eykoKxsYtow&list=PLeo1K3hjS3uv5U-Lmlnucd7gqF-3ehIh0&index=1

Jupyter Notebook: https://www.youtube.com/watch?v=q_BzsPxwLOE&list=PLeo1K3hjS3uuZPwzACannnFSn9qHn8to8

To download the CSV files and code for all tutorials: go to https://github.com/codebasics/py, click the green button to clone or download the entire repository, then open the relevant folder to find the specific file.

Website: http://codebasicshub.com/
Facebook: https://www.facebook.com/codebasicshub
Twitter: https://twitter.com/codebasicshub


Comment List

  • codebasics
    December 5, 2020

    Step by step roadmap to learn data science in 6 months: https://www.youtube.com/watch?v=H4YcqULY1-Q

  • codebasics
    December 5, 2020

    Thanks a ton…

  • codebasics
    December 5, 2020

    Is the train and test model only used for checking the accuracy? Can the train and split model predict a new value?

  • codebasics
    December 5, 2020

    "from sklearn.model_selection import train_test_split
    x_train , x_test , y_train ,y_test = train_test_split(x,y,test_size=0.2)"
    I am getting error in this step

  • codebasics
    December 5, 2020

    x_train is not defined why

  • codebasics
    December 5, 2020

    Thank you so much for explaining all this… I wanted to know whether the score results change… I got 94% for the same dataset.

  • codebasics
    December 5, 2020

    How to split own image dataset by xtrain and xtest

  • codebasics
    December 5, 2020

    Thank you very much. Your explanation was very clear and straightforward.

  • codebasics
    December 5, 2020

    while running getting an error –>model_lr.fit(X_train,y_train)
    ValueError: could not convert string to float: 'Medium'

  • codebasics
    December 5, 2020

    Hi, I have an issue when I try to split my data: "ValueError: Found input variables with inconsistent numbers of samples: [6, 3]". Do you have any idea how to solve this problem? Thank you.

  • codebasics
    December 5, 2020

    Greetings from Brazil. You've got powerful content here. Thanks a lot.

  • codebasics
    December 5, 2020

    TypeError Traceback (most recent call last)

    <ipython-input-27-3c183c7f1d3b> in <module>()

    ----> 1 y = df['Mortality Rate(%)']

    TypeError: list indices must be integers or slices, not str
    Sir, the above error appears as soon as I write the code x = df[['Milege….., y = df[['Sell price….
    Till the scatter plot everything worked perfectly

  • codebasics
    December 5, 2020

    Thank you so much

  • codebasics
    December 5, 2020

    Will there be any scenario where I use the same data but get an accuracy of 44%? That is what I am getting even though I am using the same data.

  • codebasics
    December 5, 2020

    Excellent explanation! Thanks a lot!! The methodology is quite straightforward. However, in my experience (short so far), we need to determine which classifier is best to use based on investigating the data, mainly with data visualization tools, and then prepare the data before processing it.

  • codebasics
    December 5, 2020

    Lovely video, thank you. Do you do private sessions or queries?

  • codebasics
    December 5, 2020

    What happens if we don't specify test_size, but instead of it, we use random_state=1 ? I don't know what will be behind the scenes.

  • codebasics
    December 5, 2020

    I have a question. When do you test the model on the y_test variable. I don't think you have used it.

  • codebasics
    December 5, 2020

    my train_test_split does not let me import a test_size without an error message

  • codebasics
    December 5, 2020

    Brilliant videos my friend, keep it up!

  • codebasics
    December 5, 2020

    cannot import name '__check_build' from 'sklearn' What is this error? How to rectify?

  • codebasics
    December 5, 2020

    Thanks so much mate, yours is the first video I've found which hasn't used the iris dataset for this. The iris dataset from Sklearn confuses the hell out of me with its different commands.

  • codebasics
    December 5, 2020

    May I have ur mail-id sir

  • codebasics
    December 5, 2020

    How to select first 80 rows for training and 20 rows for testing instead of randomly?

  • codebasics
    December 5, 2020

    I am going through all the videos one by one.. they are awesome. I want to install the tech stack you are using. can you point me to the video please where I can follow along and just install the stack and get ready for these ?

  • codebasics
    December 5, 2020

    u rock

  • codebasics
    December 5, 2020

    Exactly what I was looking for, thank you so much.

  • codebasics
    December 5, 2020

    Why are 4 parameters returned, and why are you continuously pronouncing test as 'taste'?

  • codebasics
    December 5, 2020

    Got an error while testing out the algorithm: "ValueError: Input contains NaN, infinity or a value too large for dtype('float64')."

  • codebasics
    December 5, 2020

    Very well explained thank u

  • codebasics
    December 5, 2020

    Thank you so much. I love you.

  • codebasics
    December 5, 2020

    I have a question,

    Is it possible to divide the data set without it being random, that is, taking the first 70 observations for the train and the remaining 30 for the exam?

  • codebasics
    December 5, 2020

    Thanks for the explanation. I think the test set shouldn't be touched, so I'm not sure what you did in the end was correct.

  • codebasics
    December 5, 2020

    How does the predict function work in the model after we train the x_test on it?

  • codebasics
    December 5, 2020

    Hello! Can you please tell me how to split multiple dependent columns with the train_test_split function?

  • codebasics
    December 5, 2020

    How do randomstate and kfold interact? Does it just mean that the same sets are being used when executed aka then the same average?

  • codebasics
    December 5, 2020

    clf.predict(X_test) shows y_test values in same orders or in random order? I mean, how to compare test predict price with data frame values?

  • codebasics
    December 5, 2020

    Just to get things clear, if I do predict(X_test), I'm getting the label of that X_test, which is basically y_test?

  • codebasics
    December 5, 2020

    Superb Sir, It really helps me a lot, I took a course from udemy data science by Bootcamp, but It didn't explain like you.

  • codebasics
    December 5, 2020

    Thank you so much that was so good.

  • codebasics
    December 5, 2020

    Sir, In most of the cases it is showing 'could not convert string to float'.

  • codebasics
    December 5, 2020

    You made boring interesting. Thanks….

  • codebasics
    December 5, 2020

    Thanks, it really helped me complete my assignment.

  • codebasics
    December 5, 2020

    Hello sir, I'm regularly watching your videos. These are very helpful. You are doing a great job. Could you upload more projects of data science? Thanks a lot.
