Data Cleaning Tutorial (2020) | Cleaning Data With Python and Pandas
This data cleaning tutorial introduces you to cleaning data with Python’s Pandas library.
Error handling in Python: https://docs.python.org/3/tutorial/errors.html
Matt Brems material on missing values: https://github.com/matthewbrems/ODSC-missing-data-may-18/blob/master/Analysis%20with%20Missing%20Data.pdf
It’s the start of a new project and you’re excited to apply some machine learning models.
You take a look at the data and quickly realize it’s an absolute mess.
According to IBM Data Analytics, you can expect to spend up to 80% of your time on a project cleaning data.
There are many different types of messy data, but today we’re going to focus on one of the most common: missing values.
First, we’ll take a look at the standard missing value types that Pandas recognizes out of the box.
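As a quick sketch of what "out of the box" means here: when Pandas reads a file, an empty field or the text "NA" is treated as missing by default. The tiny CSV below, with its `street` and `num_rooms` columns, is made up for illustration and is not the dataset used in the tutorial.

```python
import io

import pandas as pd

# Hypothetical data: the second row has an empty field, the third has "NA".
csv_text = "street,num_rooms\nMain,3\nBroad,\nOak,NA\n"
df = pd.read_csv(io.StringIO(csv_text))

# isnull() returns True wherever Pandas recognized a missing value.
missing_mask = df["num_rooms"].isnull()
print(missing_mask)
```

Both the empty field and the "NA" entry end up flagged as missing, with no extra configuration.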
Next we’ll take a look at some non-standard types. These are inputs that Pandas won’t automatically recognize as missing values.
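Here is a minimal sketch of the non-standard case, assuming the people who entered the data used markers like "--" or a lowercase "na". Those two markers and the sample columns are assumptions for the example. One way to handle them is the `na_values` parameter of `pd.read_csv`, which lets you list extra strings to treat as missing.

```python
import io

import pandas as pd

# Hypothetical data with two hand-typed missing markers: "--" and "na".
csv_text = "street,num_rooms\nMain,3\nBroad,--\nOak,na\nElm,2\n"

# Default read: the markers are kept as ordinary strings, so nothing is flagged.
raw = pd.read_csv(io.StringIO(csv_text))
raw_missing_count = int(raw["num_rooms"].isnull().sum())

# Tell Pandas which extra strings should count as missing.
missing_markers = ["--", "na"]
cleaned = pd.read_csv(io.StringIO(csv_text), na_values=missing_markers)
cleaned_mask = cleaned["num_rooms"].isnull()

print(raw_missing_count)
print(cleaned_mask)
```

The list of markers depends entirely on your data, so it is worth scanning the unique values in a column before deciding what to put in it.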
After that we’ll take a look at unexpected types. Let’s say you have a column of names that contains a 12. Technically, that’s a missing value, because a number isn’t a valid name.
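One possible way to catch an unexpected type like this, sketched under a few assumptions: the column is called `owner`, the stray value arrives as the text "12", and anything that parses as a whole number should count as missing. The helper `looks_like_integer` is a hypothetical name, and it relies on the try/except error handling covered in the Python docs linked above.

```python
import numpy as np
import pandas as pd

# Hypothetical column of names with one number-like entry.
df = pd.DataFrame({"owner": ["Alice", "Bob", "12", "Dana"]})


def looks_like_integer(value):
    """Return True if the value can be parsed as a whole number."""
    try:
        int(value)
        return True
    except (TypeError, ValueError):
        # Real names such as "Alice" land here and are left alone.
        return False


# Flag the number-like entries, then overwrite them with NaN.
number_like = df["owner"].map(looks_like_integer)
df.loc[number_like, "owner"] = np.nan

print(df["owner"].isnull())
```

This sketch only catches whole numbers. A value like "12.5" would slip through, so you would need a different check if decimals can show up in your data.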
After we’ve finished detecting missing values, we’ll learn how to summarize them and do some simple replacements.
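To give a rough idea of that last step, here is a small sketch of summarizing and replacing. The data frame is invented for the example, and filling with the median is just one simple replacement strategy, not necessarily the right one for your data.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing entries in both columns.
df = pd.DataFrame(
    {
        "num_rooms": [3.0, np.nan, 2.0, np.nan],
        "owner": ["Alice", None, "Cara", "Dana"],
    }
)

# Summarize: missing values per column, then the overall total.
missing_per_column = df.isnull().sum()
total_missing = int(missing_per_column.sum())
print(missing_per_column)
print(total_missing)

# Simple replacement: fill the numeric gaps with the column median.
median_rooms = df["num_rooms"].median()
df["num_rooms"] = df["num_rooms"].fillna(median_rooms)
print(df)
```

The median is computed from the values that are present, so the two gaps in `num_rooms` are filled with 2.5 while the `owner` column is left untouched.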