Data Cleaning Tutorial (2020) | Cleaning Data With Python and Pandas





This data cleaning tutorial introduces you to Python’s Pandas library.


✅ Subscribe for even more Data Science tutorials!
https://bit.ly/2J2O5N8

✅ Check out our website for the best Data Science tips in 2020:
https://www.dataoptimal.com

✅ Follow us on Twitter!
https://twitter.com/DataOptimal

**Video Resources**
Full article:
https://www.dataoptimal.com/data-cleaning-with-python-2018/

Dataset: https://github.com/dataoptimal/videos/tree/master/cleaning%20messy%20data%20with%20pandas

Pandas indexing documentation (selection by label): http://pandas.pydata.org/pandas-docs/version/0.21/indexing.html#indexing-label

Error handling in Python: https://docs.python.org/3/tutorial/errors.html

Matt Brems’ material on missing values: https://github.com/matthewbrems/ODSC-missing-data-may-18/blob/master/Analysis%20with%20Missing%20Data.pdf

It’s the start of a new project and you’re excited to apply some machine learning models.

You take a look at the data and quickly realize it’s an absolute mess.

According to IBM Data Analytics, you can expect to spend up to 80% of your time on a project cleaning data.

There are many types of messy data, but today we’re going to focus on one of the most common: missing values.

First, we’ll take a look at the standard missing value types that Pandas recognizes out of the box, such as empty cells and “NA”.
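As a minimal sketch of the standard case, the snippet below reads a small inline CSV and asks Pandas which cells it treats as missing. The inline data and the column names ST_NUM and ST_NAME are made up for illustration and are not the tutorial's actual dataset.

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for a real file.
# It contains two empty cells and one literal "NA".
csv_text = """ST_NUM,ST_NAME
104,PUTNAM
,LEXINGTON
NA,BERKELEY
203,
"""

df = pd.read_csv(io.StringIO(csv_text))

# isnull() returns a DataFrame of booleans: True where Pandas sees a missing value.
standard_missing = df.isnull()
print(standard_missing)
```

Both the empty cells and the "NA" string are flagged without any extra arguments, because they are part of the default set of markers that `read_csv` converts to `NaN`.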

Next, we’ll take a look at some non-standard types. These are inputs that Pandas won’t automatically recognize as missing values.
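One way to handle non-standard markers is to list them in the `na_values` argument of `read_csv`. The sketch below assumes the placeholders are "--" and a lowercase "na"; those markers, the inline data, and the column names are illustrative assumptions, not taken from the tutorial's dataset.

```python
import io

import pandas as pd

# Hypothetical inline CSV where missing entries were typed as "--" or "na".
csv_text = """NUM_BEDROOMS,OWN_OCCUPIED
3,Y
--,N
na,Y
2,--
"""

# Read with defaults: "--" and "na" are kept as ordinary strings.
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.isnull().sum())

# Read again, telling Pandas which extra strings should count as missing.
cleaned = pd.read_csv(io.StringIO(csv_text), na_values=["--", "na"])
print(cleaned.isnull().sum())
```

The strings passed through `na_values` are added to the default markers rather than replacing them, so the standard cases are still detected.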

After that, we’ll take a look at unexpected types. Say you have a column of names that contains the number 12. A number can’t be a valid name, so technically that’s a missing value.
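A rough sketch of detecting unexpected types: loop over the column, try to convert each entry to an integer, and mark the entries that convert successfully as missing. The NAME column and its values are hypothetical, and this is only one possible approach, not necessarily the one used in the video.

```python
import numpy as np
import pandas as pd

# Hypothetical column of names polluted with numeric entries.
df = pd.DataFrame({"NAME": ["Alice", "12", "Bob", 7]})

# Collect the row labels whose value can be read as an integer.
numeric_rows = []
for idx, value in zip(df.index, df["NAME"].tolist()):
    try:
        int(value)
    except (ValueError, TypeError):
        continue  # Not a number, so this looks like a real name.
    numeric_rows.append(idx)

# A number is not a valid name, so treat those entries as missing.
df.loc[numeric_rows, "NAME"] = np.nan
print(df)
```

The `try`/`except` block is what keeps the loop from stopping at the first real name, since `int("Alice")` raises a `ValueError`.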

After we’ve finished detecting missing values, we’ll learn how to summarize them and do simple replacements.
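Summarizing and replacing can be sketched as follows. The column names, the values, and the fill value 125 are assumptions chosen for illustration; filling with the median is one simple strategy among many.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values already detected as NaN.
df = pd.DataFrame({
    "ST_NUM": [104.0, np.nan, 203.0, np.nan],
    "NUM_BATH": [1.0, 1.5, np.nan, 2.0],
})

# Summaries: missing count per column, total count, and a quick yes/no check.
missing_per_column = df.isnull().sum()
total_missing = int(missing_per_column.sum())
any_missing = bool(df.isnull().values.any())
print(missing_per_column)
print(total_missing, any_missing)

# Simple replacements: a fixed value for one column, the median for another.
df["ST_NUM"] = df["ST_NUM"].fillna(125)
median_bath = df["NUM_BATH"].median()
df["NUM_BATH"] = df["NUM_BATH"].fillna(median_bath)
print(df)
```

Whether a fixed value or a statistic such as the median is appropriate depends on the column, so treat these replacements as a starting point.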


Comment List

  • DataOptimal
    November 30, 2020

    great video, thanks!

    I am looking for data cleaning approaches for predictive maintenance, e.g. for wind turbines, rotary machinery, etc.

    I am wondering if you can share resources, techniques, title of books, etc. that you know are useful

    or, if you know people who might know about this.
    my twitter is @davaninavid

  • DataOptimal
    November 30, 2020

    Hey, this is highly helpful, especially the 'for loop' to replace errors

  • DataOptimal
    November 30, 2020

    Thanks this was helpful, keep it up 👏🏽

  • DataOptimal
    November 30, 2020

    Try cleaning data after Elon's boy grows up…

  • DataOptimal
    November 30, 2020

    I’m a UK-based 🇬🇧 Business Analyst and have recently noticed data analytics tasks and responsibilities being included in Business Analyst job descriptions. It’s quite a lot to deal with

  • DataOptimal
    November 30, 2020

    Hey guys, check out my selfmade csv data cleaner!
    @t

  • DataOptimal
    November 30, 2020

    This is exactly what I wanted for the past 10 hours lmao thanksss

  • DataOptimal
    November 30, 2020

    I'm looking for a tool that can do more. I don't want to look at huge columns with "true" and "false" and wonder what to do. I wouldn't mind a tool that tries to fill in the blanks, e.g. asks whether the 2nd "Lexington" also has the 197 in ST_NUM (probably not useful in this example, but it's at least a hint). Then there are probably patterns in data quality that a tool could approach systematically.

  • DataOptimal
    November 30, 2020

    great video. please answer me. What do I need to know to build a deep learning library? tell me the courses and books

  • DataOptimal
    November 30, 2020

    👍

  • DataOptimal
    November 30, 2020

    Thanks a lot, it was very helpful. But my question is: How do we recognize different types of NON-standard and UN-expected missing values when the dataframe has, for instance, 1,000,000 rows?

  • DataOptimal
    November 30, 2020

    Nice and easily understandable code.

  • DataOptimal
    November 30, 2020

    great video but do go a bit slow while explaining the code. keep it up👍

  • DataOptimal
    November 30, 2020

    Keep on doing bro, good job😀

  • DataOptimal
    November 30, 2020

    Thanks! Very helpful.
