Data science in Python: pandas, seaborn, scikit-learn
In this video, we’ll cover the data science pipeline from data ingestion (with pandas) to data visualization (with seaborn) to machine learning (with scikit-learn). We’ll learn how to train and interpret a linear regression model, and then compare three possible evaluation metrics for regression problems. Finally, we’ll apply the train/test split procedure to decide which features to include in our model.
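The description does not name the three evaluation metrics. A common trio for regression, assumed here, is mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE). The sketch below computes them by hand on made-up numbers so the arithmetic is visible; scikit-learn offers `mean_absolute_error` and `mean_squared_error` in `sklearn.metrics` for the same purpose.

```python
import math

# Illustrative values only (not taken from the Advertising data).
y_true = [100, 50, 30, 20]
y_pred = [90, 50, 50, 30]

# Per-observation errors: true value minus predicted value.
errors = [t - p for t, p in zip(y_true, y_pred)]

mae = sum(abs(e) for e in errors) / len(errors)   # mean of absolute errors
mse = sum(e ** 2 for e in errors) / len(errors)   # mean of squared errors
rmse = math.sqrt(mse)                             # back in the units of y

print(f"MAE={mae:.2f}  MSE={mse:.2f}  RMSE={rmse:.2f}")
# prints: MAE=10.00  MSE=150.00  RMSE=12.25
```

MSE and RMSE penalize the single large error (20) more heavily than MAE does, and RMSE is often preferred for reporting because it is expressed in the same units as the response.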
Download the notebook: https://github.com/justmarkham/scikit-learn-videos
pandas installation instructions: http://pandas.pydata.org/pandas-docs/stable/install.html
seaborn installation instructions: http://seaborn.pydata.org/installing.html
Longer linear regression notebook: https://github.com/justmarkham/DAT5/blob/master/notebooks/09_linear_regression.ipynb
Chapter 3 of Introduction to Statistical Learning: http://www-bcf.usc.edu/~gareth/ISL/
Videos related to Chapter 3: https://www.dataschool.io/15-hours-of-expert-machine-learning-videos/
Quick reference guide to linear regression: https://www.dataschool.io/applying-and-interpreting-linear-regression/
Introduction to linear regression: http://people.duke.edu/~rnau/regintro.htm
pandas Q&A video series: https://www.dataschool.io/easier-data-analysis-with-pandas/
pandas 3-part tutorial: http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures/
pandas read_csv documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
pandas read_table documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_table.html
seaborn tutorial: http://seaborn.pydata.org/tutorial.html
seaborn example gallery: http://seaborn.pydata.org/examples/index.html
WANT TO GET BETTER AT MACHINE LEARNING? HERE ARE YOUR NEXT STEPS:
1) WATCH my scikit-learn video series:
https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A
2) SUBSCRIBE for more videos:
https://www.youtube.com/dataschool?sub_confirmation=1
3) JOIN “Data School Insiders” to access bonus content:
https://www.patreon.com/dataschool
4) ENROLL in my Machine Learning course:
https://www.dataschool.io/learn/
5) LET’S CONNECT!
– Newsletter: https://www.dataschool.io/subscribe/
– Twitter: https://twitter.com/justmarkham
– Facebook: https://www.facebook.com/DataScienceSchool/
– LinkedIn: https://www.linkedin.com/in/justmarkham/
Note: This video was recorded using Python 2.7 and scikit-learn 0.16. Recently, I updated the code to use Python 3.6 and scikit-learn 0.19.1. You can download the updated code here: https://github.com/justmarkham/scikit-learn-videos
When I use seaborn to pairplot the data, it doesn't show data for the first column, i.e. 'TV'.
Very helpful
Dude, you're the best professor ever! Thanks a lot
Do you have a tutorial that covers sklearn.datasets?
a nice video
Hi, the file URL isn't valid. Can you please share it?
If I start from hundreds of features, is there a way to automatically test combinations?
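One way to approach the question above, sketched under assumptions: loop over feature subsets with `itertools.combinations` and score each with cross-validation. The data here is synthetic and the feature names (`x0`, `x1`, `x2`) are hypothetical. This is a brute-force illustration, not the method used in the video.

```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data (hypothetical): y depends on x0 and x1; x2 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=200)
feature_names = ["x0", "x1", "x2"]

results = []
for k in range(1, len(feature_names) + 1):
    for cols in combinations(range(len(feature_names)), k):
        # 5-fold cross-validated MSE for this subset (scores are negated MSE).
        scores = cross_val_score(
            LinearRegression(),
            X[:, list(cols)],
            y,
            cv=5,
            scoring="neg_mean_squared_error",
        )
        rmse = np.sqrt(-scores.mean())
        results.append((rmse, [feature_names[i] for i in cols]))

# Lowest cross-validated RMSE wins.
best_rmse, best_features = min(results, key=lambda r: r[0])
print(best_features, round(best_rmse, 3))
```

With n features there are 2^n - 1 non-empty subsets, so an exhaustive search like this is only practical for a handful of features. For hundreds, greedy or model-based tools such as `RFE` and `SequentialFeatureSelector` in `sklearn.feature_selection`, or a regularized model such as `Lasso`, are the usual alternatives.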
Hi Kevin, I'm new to both Python and machine learning. Your tutorials are great learning materials. I understand this is a 5-year-old presentation, and I'm wondering if you would still answer a question I have related to this tutorial. Specifically, when I was trying to get the pairplots you demonstrated, I got the following error: KeyError: "['Sales'] not in index" and I got three blank boxes. What was wrong? Many thanks for your help. FYI, I also tried to find answers by Googling and haven't been able to find any that work.
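A `KeyError` of this form means the DataFrame has no column labeled exactly `'Sales'`. One possible cause, offered as a guess rather than a diagnosis, is that the downloaded file uses different capitalization (for example `sales`) than the code expects. The sketch below uses a made-up, in-memory stand-in for the file to show how to inspect the column names and rename them.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded file: made-up rows, lowercase names.
csv_text = """,TV,radio,newspaper,sales
1,100.0,20.0,30.0,15.0
2,50.0,10.0,40.0,9.0
"""
data = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Inspect the actual column names before plotting or selecting.
print(list(data.columns))

# Rename to the capitalized names so code written against them works.
data = data.rename(
    columns={"radio": "Radio", "newspaper": "Newspaper", "sales": "Sales"}
)

# These selections would raise KeyError if any name were still missing.
X = data[["TV", "Radio", "Newspaper"]]
y = data["Sales"]
```

Once the names match, a call such as `sns.pairplot(data, x_vars=['TV', 'Radio', 'Newspaper'], y_vars='Sales')` should find every column it is asked for.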
data = pd.read_csv('http://faculty.marshall.usc.edu/gareth-james/ISL/Advertising.csv')
dude you're one of the best
Impressive teacher!
thank you! very clear and helpful
I am getting a parser error for reading the csv file from the website. (3:00)
Really appreciate that you also explain the algorithms and how to find the coefficient governing the equations. Thank you so much!
Parse error while reading csv
Is the url still valid?
I don't see that http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv is accessible now. Any alternative for that link? Thanks.
Thank you so much for putting together this amazing series. I have a quick question though: we get an RMSE of ~1.4, and you say that it's good "given that the Sales range from 5-25". Could you elaborate a bit on this, please? Thanks once again!
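One way to read the "given the range" remark, offered as an interpretation: RMSE is in the same units as Sales, so it can be compared directly with how widely Sales varies. Using only the two figures quoted in the question (RMSE of about 1.4, Sales from 5 to 25), the typical error is a small fraction of the spread.

```python
rmse = 1.4                         # figure quoted in the question
sales_min, sales_max = 5.0, 25.0   # range quoted in the question

# Typical prediction error as a share of the spread in the response.
relative_error = rmse / (sales_max - sales_min)

print(f"{relative_error:.0%}")
# prints: 7%
```

The same RMSE would look poor if Sales only ranged from, say, 5 to 7, which is why the metric needs the scale of the response for context. A stricter check would compare against a baseline such as always predicting the mean, which requires the data itself.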
I am answering your question 5 years later, but I would love to see more video tutorials from you about scikit-learn (e.g. neural network models (supervised)) or scikit-multilearn if you want!! 🙂 Thanks a lot, Kevin!
I have changed the code as follows, but the problem still exists:
model = LogisticRegression(solver='lbfgs', multi_class='auto')
/opt/anaconda3/lib/python3.7/site-packages/sklearn/linear_model/logistic.py:947: ConvergenceWarning: lbfgs failed to converge. Increase the number of iterations.
"of iterations.", ConvergenceWarning)
Thanks a lot. The shift+Tab shortcut is no longer available; what else could I do?
I hate the limited functionality scikit-learn provides – if you do linear regression you most certainly want to (and should) look at confidence intervals – why aren’t they implemented!?
Thanks a lot!!
Great video. With that being said, I'm still having some trouble understanding the 0.046 result for TV. I guess I'm having trouble seeing how useful the result is. What would be an actionable insight based on the result? So if a company spent, say, 5k each on TV, newspaper, and radio ads, can they expect a 233 increase in sales numbers?
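A worked reading of the TV coefficient, under a stated assumption: ad spend is measured in thousands of dollars and Sales in thousands of units, as in the ISL Advertising data. Under that assumption, the coefficient says how much predicted Sales changes per extra thousand dollars of TV spend, with the other spending held fixed. It describes an association in the data, not a guaranteed causal effect.

```python
tv_coef = 0.046          # coefficient quoted in the question
extra_tv_spend = 5.0     # additional TV spend, in thousands of dollars (assumed units)

# Predicted change in Sales (thousands of units), converted to units.
extra_sales_units = tv_coef * extra_tv_spend * 1000

print(f"{extra_sales_units:.0f} additional units")
# prints: 230 additional units
```

This covers the TV term only; extra radio and newspaper spend would each add their own coefficient times spend, and those coefficients are not quoted here. The 233 in the question is what the same arithmetic gives with a coefficient near 0.0466, so rounding may explain the small difference, though that is a guess.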
To be candid, this is the best video I've ever watched on scikit-learn. Thumbs up!!!
Currently the url for the dataset is : http://faculty.marshall.usc.edu/gareth-james/ISL/Advertising.csv
Kinda complete one, putting together all at-once! The best, I have watched until now!
Thank you very much
Your teaching methodology is awesome making things crystal clear.
Can I use the heatmap to see a relation?
Your video tutorial is outstanding! You can simplify complex concepts in an elegant manner. And unlike other instructors you don't show-off on how smart you are. That's why we know that you're really a smart guy 🙂
Hi, Kevin! Thank you for your videos about pandas and scikit-learn. They have helped me learn data science very well. But at this point I have an issue accessing the csv file from this link: http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv. I think the URL has already expired and there is no csv file there. Could you please re-upload the file at bit.ly or something? Thank you, and hopefully you read this comment.
Thanks a lot for this great material you've put together. Very very helpful!
Thanks for the wonderful video. I have one question: we can predict values for a test dataset by fitting on the training data and calling predict. How do I predict test data using cross_val_score / cross_val_predict? Both X and y are needed there, but my test data has only X, not y.
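A common pattern for this situation, sketched on synthetic data with hypothetical variable names: cross-validation estimates how well a model generalizes and therefore needs labels, so it is run on the labeled data only. Predictions for unlabeled rows then come from a model refit on all of the labeled data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))      # labeled features (synthetic)
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)
X_new = rng.normal(size=(5, 2))    # unlabeled rows: no y available

model = LinearRegression()

# Step 1: estimate out-of-sample error using only the labeled data.
scores = cross_val_score(model, X, y, cv=10, scoring="neg_mean_squared_error")
cv_rmse = np.sqrt(-scores).mean()

# Step 2: refit on all labeled data, then predict the unlabeled rows.
model.fit(X, y)
y_new_pred = model.predict(X_new)

print(round(cv_rmse, 3), y_new_pred.shape)
```

`cross_val_predict` likewise returns predictions only for the labeled rows it was given, one per row from the fold in which that row was held out, so it is not a tool for scoring new, unlabeled data.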
the best tutorial on watsapp
Your teaching methodology is the best. Your step-by-step teaching method is very helpful for me to understand. You are the best.
Wow, one of the best YT tutorials about this topic, thank you!
…also, can you tell us more about 'random_state=1' parameter you used to split the data into test and train. Thanks a lot!
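On `random_state`: `train_test_split` shuffles the rows before splitting, and `random_state` seeds that shuffle. Passing the same integer makes the split reproducible from run to run. A minimal sketch on toy data (the lists below are made up for illustration):

```python
from sklearn.model_selection import train_test_split

# Toy data: 20 rows with one feature each, and matching labels.
X = [[i] for i in range(20)]
y = list(range(20))

# Each call returns [X_train, X_test, y_train, y_test].
split_a = train_test_split(X, y, random_state=1)
split_b = train_test_split(X, y, random_state=1)

print(split_a == split_b)                 # same seed, same split
print(len(split_a[0]), len(split_a[1]))   # default test_size is 0.25
# prints: True
# prints: 15 5
```

The particular value (1 here) carries no meaning; any fixed integer gives a repeatable split, while leaving it unset gives a different split on each run.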
Thank you for the great tutorials, Kevin! I got a problem importing data: I tried the URL mentioned in the video as well as the file cited in your GitHub. Could you please help with that? Many thanks, Sarnai
Sir, your videos are very good and help me a lot. Thanks a lot for such wonderful lectures.
Please add more videos to the series. It is really helpful and amazing to watch your videos. You are a great teacher.
URL for the data set is : ' http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv '