Data Science Best Practices with pandas (PyCon 2019)
The pandas library is a powerful tool for multiple phases of the data science workflow, including data cleaning, visualization, and exploratory data analysis. However, the size and complexity of the library make it challenging to discover the best way to accomplish any given task.
In this tutorial, you’ll use pandas to answer questions about a real-world dataset. Through each exercise, you’ll learn important data science skills as well as “best practices” for using pandas. By the end of the tutorial, you’ll be more fluent at using pandas to correctly and efficiently answer your own data science questions.
05:14 1. Introduction to the TED Talks dataset
10:45 2. Which talks provoke the most online discussion?
18:58 3. Visualize the distribution of comments
34:20 4. Plot the number of talks that took place each year
50:30 5. What were the “best” events in TED history to attend?
1:01:28 6. Unpack the ratings data
1:13:36 7. Count the total number of ratings received by each talk
1:22:55 8. Which occupations deliver the funniest TED talks on average?
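
To give a flavor of the exercises listed above, here is a minimal sketch of three of them (sections 2, 4, and 7) on a tiny made-up DataFrame. The column names (comments, views, film_date, ratings) and every value are assumptions chosen for illustration; they are not taken from the real dataset, and the approach shown is not necessarily the one used in the tutorial.

```python
import ast

import pandas as pd

# Toy stand-in for the TED Talks dataset. Column names and values are
# illustrative assumptions, not the real data.
ted = pd.DataFrame({
    "name": ["Talk A", "Talk B", "Talk C"],
    "comments": [100, 50, 20],
    "views": [1000, 2000, 100],
    "film_date": [1140825600, 1140825600, 1340150400],  # Unix seconds
    "ratings": [
        "[{'name': 'Funny', 'count': 10}, {'name': 'Inspiring', 'count': 30}]",
        "[{'name': 'Funny', 'count': 5}, {'name': 'Inspiring', 'count': 5}]",
        "[{'name': 'Funny', 'count': 0}, {'name': 'Inspiring', 'count': 2}]",
    ],
})

# Section 2: raw comment counts favor widely viewed talks, so normalize
# by views before ranking.
ted["comments_per_view"] = ted["comments"] / ted["views"]
most_discussed = (
    ted.sort_values("comments_per_view", ascending=False)["name"].tolist()
)

# Section 4: convert the Unix timestamp to a datetime, then count talks
# per year. sort_index() puts the years in chronological order.
ted["film_datetime"] = pd.to_datetime(ted["film_date"], unit="s")
talks_per_year = ted["film_datetime"].dt.year.value_counts().sort_index()

# Sections 6 and 7: the ratings column holds the string representation of
# a list of dicts. ast.literal_eval parses it safely into Python objects,
# after which the counts can be summed per talk.
ted["ratings_list"] = ted["ratings"].apply(ast.literal_eval)
ted["num_ratings"] = ted["ratings_list"].apply(
    lambda ratings: sum(r["count"] for r in ratings)
)

print(most_discussed)
print(talks_per_year)
print(ted[["name", "num_ratings"]])
```

Calling talks_per_year.plot() would draw the counts as a line chart, provided matplotlib is installed.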
DOWNLOAD the dataset and Jupyter notebook:
WATCH my introductory series, Data Analysis with pandas:
JOIN the “Data School Insiders” community:
– Email Newsletter: https://www.dataschool.io/subscribe/
– LinkedIn: https://www.linkedin.com/in/justmarkham/
– Twitter: https://twitter.com/justmarkham
– Facebook: https://www.facebook.com/DataScienceSchool/
– YouTube: https://www.youtube.com/dataschool?sub_confirmation=1