Data science in Python: pandas, seaborn, scikit-learn




In this video, we’ll cover the data science pipeline from data ingestion (with pandas) to data visualization (with seaborn) to machine learning (with scikit-learn). We’ll learn how to train and interpret a linear regression model, and then compare three possible evaluation metrics for regression problems. Finally, we’ll apply the train/test split procedure to decide which features to include in our model.
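
The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the data is synthetic (the column names TV, Radio, Newspaper, and Sales and all numeric values are assumptions made for the example), the `evaluate` helper is hypothetical, and the three metrics shown (MAE, MSE, RMSE) are assumed to be the ones compared.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an advertising dataset (hypothetical values).
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
# By construction, Sales depends on TV and Radio only; Newspaper is noise.
data["Sales"] = (
    3.0 + 0.05 * data["TV"] + 0.2 * data["Radio"] + rng.normal(0, 1.0, n)
)


def evaluate(feature_cols):
    """Fit on a training split; return (model, MAE, MSE, RMSE) on the test split."""
    X = data[feature_cols]
    y = data["Sales"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
    model = LinearRegression().fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mae = mean_absolute_error(y_test, y_pred)
    mse = mean_squared_error(y_test, y_pred)
    rmse = float(np.sqrt(mse))  # RMSE is in the same units as Sales
    return model, mae, mse, rmse


# Compare two candidate feature sets on the same train/test split.
model_all, mae_all, mse_all, rmse_all = evaluate(["TV", "Radio", "Newspaper"])
model_two, mae_two, mse_two, rmse_two = evaluate(["TV", "Radio"])

# Each coefficient is the expected change in Sales per unit change in that
# feature, holding the other features fixed.
print(dict(zip(["TV", "Radio", "Newspaper"], model_all.coef_)))
print(f"All features: MAE={mae_all:.3f} MSE={mse_all:.3f} RMSE={rmse_all:.3f}")
print(f"TV + Radio:   MAE={mae_two:.3f} MSE={mse_two:.3f} RMSE={rmse_two:.3f}")
```

If dropping a feature leaves the test-set RMSE about the same or lower, that is a reason to leave it out of the model. For the visualization step, a call such as `sns.pairplot(data, x_vars=["TV", "Radio", "Newspaper"], y_vars="Sales", kind="reg")` would plot each feature against the response.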

Download the notebook: https://github.com/justmarkham/scikit-learn-videos
pandas installation instructions: http://pandas.pydata.org/pandas-docs/stable/install.html
seaborn installation instructions: http://seaborn.pydata.org/installing.html
Longer linear regression notebook: https://github.com/justmarkham/DAT5/blob/master/notebooks/09_linear_regression.ipynb
Chapter 3 of Introduction to Statistical Learning: http://www-bcf.usc.edu/~gareth/ISL/
Videos related to Chapter 3: https://www.dataschool.io/15-hours-of-expert-machine-learning-videos/
Quick reference guide to linear regression: https://www.dataschool.io/applying-and-interpreting-linear-regression/
Introduction to linear regression: http://people.duke.edu/~rnau/regintro.htm
pandas Q&A video series: https://www.dataschool.io/easier-data-analysis-with-pandas/
pandas 3-part tutorial: http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures/
pandas read_csv documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
pandas read_table documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_table.html
seaborn tutorial: http://seaborn.pydata.org/tutorial.html
seaborn example gallery: http://seaborn.pydata.org/examples/index.html

WANT TO GET BETTER AT MACHINE LEARNING? HERE ARE YOUR NEXT STEPS:

1) WATCH my scikit-learn video series:
https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A

2) SUBSCRIBE for more videos:
https://www.youtube.com/dataschool?sub_confirmation=1

3) JOIN “Data School Insiders” to access bonus content:
https://www.patreon.com/dataschool

4) ENROLL in my Machine Learning course:
https://www.dataschool.io/learn/

5) LET’S CONNECT!
– Newsletter: https://www.dataschool.io/subscribe/
– Twitter: https://twitter.com/justmarkham
– Facebook: https://www.facebook.com/DataScienceSchool/
– LinkedIn: https://www.linkedin.com/in/justmarkham/


Comment List

  • Data School
    November 16, 2020

    Note: This video was recorded using Python 2.7 and scikit-learn 0.16. Recently, I updated the code to use Python 3.6 and scikit-learn 0.19.1. You can download the updated code here: https://github.com/justmarkham/scikit-learn-videos

  • Data School
    November 16, 2020

    When I use seaborn's pairplot on the data, it doesn't show data for the first column, i.e. 'TV'.

  • Data School
    November 16, 2020

    Very helpful

  • Data School
    November 16, 2020

    Dude, you're the best professor ever! Thanks a lot

  • Data School
    November 16, 2020

    Do you have a tutorial that covers sklearn.datasets?

  • Data School
    November 16, 2020

    a nice video

  • Data School
    November 16, 2020

    Hi, the file URL isn't valid. Can you please share it?

  • Data School
    November 16, 2020

    If I start from hundreds of features, is there a way to automatically test combinations?

  • Data School
    November 16, 2020

    Hi Kevin, I'm new to both Python and machine learning. Your tutorials are great learning materials. I understand this is a 5-year-old presentation, and I'm wondering if you would still answer a question I have related to this tutorial. Specifically, when I was trying to get the pairplots you demonstrated, I got the following error: KeyError: "['Sales'] not in index" and I got three blank boxes. What was wrong? Many thanks for your help. FYI, I also tried Googling for answers and haven't been able to find any that work.

  • Data School
    November 16, 2020

    dude you're one of the best

  • Data School
    November 16, 2020

    Impressive teacher!

  • Data School
    November 16, 2020

    thank you! very clear and helpful

  • Data School
    November 16, 2020

    I am getting a parser error for reading the csv file from the website. (3:00)

  • Data School
    November 16, 2020

    Really appreciate that you also explain the algorithms and how to find the coefficient governing the equations. Thank you so much!

  • Data School
    November 16, 2020

    Parse error while reading csv
    Is the url still valid?

  • Data School
    November 16, 2020

    I don't see that http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv is accessible now. Is there any alternative for that link? Thanks.

  • Data School
    November 16, 2020

    Thank you so much for putting together this amazing series. I have a quick question though: we get an RMSE of ~1.4, and you say that it's good "given that the Sales range from 5-25". Could you elaborate a bit on this please? Thanks once again!
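
One way to put that RMSE in context is to compare it with the spread of the response. The sketch below uses only the figures quoted in the comment (an RMSE of about 1.4 and Sales between 5 and 25); the helper name is hypothetical, and whether the resulting fraction counts as "good" depends on the application.

```python
def rmse_as_fraction_of_range(rmse, y_min, y_max):
    """Express an RMSE as a fraction of the observed range of the response."""
    if y_max <= y_min:
        raise ValueError("y_max must be greater than y_min")
    return rmse / (y_max - y_min)


# Figures taken from the comment above: RMSE ~1.4, Sales from 5 to 25.
fraction = rmse_as_fraction_of_range(1.4, 5, 25)
print(f"Typical error is about {fraction:.0%} of the Sales range")
```

Because RMSE is in the same units as the response, a typical error of 1.4 is small relative to a response that spans 20 units.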

  • Data School
    November 16, 2020

    I am answering your question 5 years later, but I would love to see more video tutorials from you about scikit-learn (e.g. Neural network models (supervised)) or scikit-multilearn if you want!! 🙂 Thanks a lot Kevin!

  • Data School
    November 16, 2020

    I have changed the code as follows, but the problem still exists:

    model = LogisticRegression(solver='lbfgs', multi_class='auto')

    /opt/anaconda3/lib/python3.7/site-packages/sklearn/linear_model/logistic.py:947: ConvergenceWarning: lbfgs failed to converge. Increase the number of iterations.
    "of iterations.", ConvergenceWarning)

  • Data School
    November 16, 2020

    Thanks a lot. The Shift+Tab shortcut is no longer available; what else could I do?

  • Data School
    November 16, 2020

    I hate the limited functionality scikit-learn provides – if you do linear regression you most certainly want to (and should) look at confidence intervals – why aren’t they implemented!?

  • Data School
    November 16, 2020

    thanks alot !!

  • Data School
    November 16, 2020

    Great video. With that being said, I'm still having some trouble understanding the 0.046 result for TV. I guess I'm having trouble seeing how useful the result is. What would be an actionable insight based on the result? So if a company spent, say, 5k each on TV, newspaper, and radio ads, can they expect a 233 increase in sales numbers?

  • Data School
    November 16, 2020

    To be candid, this is the best video I've ever watched on scikit-learn. Thumbs up!!!

  • Data School
    November 16, 2020

    Kinda complete one, putting together all at-once! The best, I have watched until now!

  • Data School
    November 16, 2020

    Thank you very much
    Your teaching methodology is awesome, making things crystal clear.

  • Data School
    November 16, 2020

    Can I use a heatmap to see a relation?

  • Data School
    November 16, 2020

    Your video tutorial is outstanding! You can simplify complex concepts in an elegant manner. And unlike other instructors you don't show-off on how smart you are. That's why we know that you're really a smart guy 🙂

  • Data School
    November 16, 2020

    Hi, Kevin! Thank you for your videos about pandas and scikit-learn. They help me learn data science very well. But at this point I have an issue accessing the csv file from this link: http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv. I think the URL has expired and there is no csv file there. Could you please re-upload the file at bit.ly or something? Thank you, and hopefully you read this comment.

  • Data School
    November 16, 2020

    Thanks a lot for this great material you've put together. Very very helpful!

  • Data School
    November 16, 2020

    Thanks for the wonderful video. I have one question. With train/test split, we can predict on the test dataset using the predict function. How do we predict on test data using cross_val_score / cross_val_predict? Both X and y are needed there, but the test data has only X and no y.

  • Data School
    November 16, 2020

    the best tutorial on watsapp

  • Data School
    November 16, 2020

    Your teaching methodology is the best. Your step-by-step teaching method is very helpful for me to understand. You are the best.

  • Data School
    November 16, 2020

    Wow, one of the best YT tutorials about this topic, thank you!

  • Data School
    November 16, 2020

    …also, can you tell us more about 'random_state=1' parameter you used to split the data into test and train. Thanks a lot!
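
A small sketch of what `random_state` does in scikit-learn's `train_test_split`: it seeds the shuffling, so passing the same integer reproduces the same split on every run. The toy arrays below are assumptions made for the example; the value 1 matches the one mentioned in the comment, but any integer works.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each (illustrative values only).
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Two calls with the same seed produce the same shuffle and the same split.
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(X, y, random_state=1)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(X, y, random_state=1)

same_seed_matches = np.array_equal(y_test_a, y_test_b)
print("Same split with the same seed:", same_seed_matches)
print("Test-set targets:", y_test_a)
```

Without `random_state`, each call shuffles differently, so evaluation metrics can change from run to run. Fixing the seed makes results reproducible, which is useful when comparing models or feature sets on the same split.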

  • Data School
    November 16, 2020

    Thank you for the great tutorials, Kevin! I have a problem importing the data: I tried the URL mentioned in the video as well as the file cited in your GitHub. Could you please help with that? Many thanks, Sarnai

  • Data School
    November 16, 2020

    Sir, your videos are very good and help me a lot. Thanks a lot for such wonderful lectures.

  • Data School
    November 16, 2020

    Please add more videos to the series. It is really helpful and amazing to watch your videos. You are a great teacher.
