Python Pandas Tutorial (Part 9): Cleaning Data – Casting Datatypes and Handling Missing Values





In this video, we will be learning how to clean our data and cast datatypes.

This video is sponsored by Brilliant. Go to https://brilliant.org/cms to sign up for free. Be one of the first 200 people to sign up with this link and get 20% off your premium subscription.

In this Python Programming video, we will be learning how to clean our data. We will be learning how to remove missing values, fill missing values, cast datatypes, and more. This is an essential skill in Pandas because we will frequently need to modify our data to fit our needs. Let’s get started…
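As a rough preview of the three operations named above, here is a minimal sketch. The DataFrame and its column names are made up for illustration and are not the survey data used in the video.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the last row is entirely missing.
df = pd.DataFrame({
    "first": ["Corey", "Jane", np.nan],
    "age": ["33", "55", np.nan],
})

# Remove missing values: drop rows in which every value is missing.
cleaned = df.dropna(axis="index", how="all")

# Fill missing values: supply a per-column replacement instead of dropping.
filled = df.fillna({"first": "Unknown", "age": "0"})

# Cast datatypes: the ages are strings, so convert them before doing math.
ages = cleaned["age"].astype(float)
mean_age = ages.mean()
print(mean_age)
```

Whether to drop or fill depends on the analysis; dropping is simpler, while filling keeps the row count intact.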

The code for this video can be found at:
http://bit.ly/Pandas-09

StackOverflow Survey Download Page – http://bit.ly/SO-Survey-Download

✅ Support My Channel Through Patreon:
https://www.patreon.com/coreyms

✅ Become a Channel Member:
https://www.youtube.com/channel/UCCezIgC97PvUuR4_gbFUs5g/join

✅ One-Time Contribution Through PayPal:
https://goo.gl/649HFY

✅ Cryptocurrency Donations:
Bitcoin Wallet – 3MPH8oY2EAgbLVy7RBMinwcBntggi7qeG3
Ethereum Wallet – 0x151649418616068fB46C3598083817101d3bCD33
Litecoin Wallet – MPvEBY5fxGkmPQgocfJbxP6EmTo5UUXMot

✅ Corey’s Public Amazon Wishlist
http://a.co/inIyro1

✅ Equipment I Use and Books I Recommend:
https://www.amazon.com/shop/coreyschafer

▶️ You Can Find Me On:
My Website – http://coreyms.com/
My Second Channel – https://www.youtube.com/c/coreymschafer
Facebook – https://www.facebook.com/CoreyMSchafer
Twitter – https://twitter.com/CoreyMSchafer
Instagram – https://www.instagram.com/coreymschafer/

#Python #Pandas


Comment List

  • Corey Schafer
    November 18, 2020

    Hey everyone. Hope you all had a great weekend! I will be traveling to Vancouver this week to visit a Quantum Computing company and learn more about the work they're doing, so I'm not sure when the next Pandas video will be ready for release. I will be working on it while I'm there, but I likely won't have it recorded and released until midway through next week. Let me know if anyone has any questions they would like me to ask them about Quantum Computing!

  • Corey Schafer
    November 18, 2020

    great job bro

  • Corey Schafer
    November 18, 2020

    good job 🙂

  • Corey Schafer
    November 18, 2020

    Dear Corey,

    I have a dataframe with heading code which has both numeric as well as alphanumeric codes.

    While doing a merge it merges with the int codes but doesn't do it with the str codes (when converted to excel, I found some trailing spaces in the alphanum codes)

    How to resolve this so that merge works on both the numeric as well as alphanumeric codes?
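One possible approach to the merge question above, assuming the mismatch comes from mixed types and stray whitespace in the key column: cast both key columns to strings and strip them before merging. The frames, the `code` column, and the `normalize` helper are hypothetical.

```python
import pandas as pd

# Hypothetical frames: keys differ in type (101 vs "101") and in trailing spaces.
left = pd.DataFrame({"code": [101, "A12 ", "B7"], "qty": [1, 2, 3]})
right = pd.DataFrame({"code": ["101", "A12", "B7 "], "label": ["x", "y", "z"]})


def normalize(keys):
    """Cast every key to a string and strip surrounding whitespace."""
    return keys.astype(str).str.strip()


left["code"] = normalize(left["code"])
right["code"] = normalize(right["code"])

# With both key columns in the same form, every code finds its match.
merged = left.merge(right, on="code", how="inner")
print(merged)
```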

  • Corey Schafer
    November 18, 2020

    Hi Corey,
    Thanks for the awesome series. While I have not yet finished the series, I would like to know, how we can deal with duplicates.
    If you have a column let's say with duplicate apps and the apps have reviews, size, installations and you want to let's say get a mean for the reviews, take the first size and sum of the installations and merge the rest of the columns for those apps as they were, like Ratings. How would one do that?
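A sketch of one way to collapse duplicates as described above, using `groupby` with named aggregation so each column gets its own rule. The data and column names are invented to mirror the question.

```python
import pandas as pd

# Hypothetical app data with a duplicated "Maps" entry.
apps = pd.DataFrame({
    "App": ["Maps", "Maps", "Notes"],
    "Reviews": [10, 20, 5],
    "Size": ["20M", "25M", "3M"],
    "Installs": [100, 300, 50],
    "Rating": [4.0, 4.0, 3.5],
})

# One output row per app: mean of reviews, first size, sum of installs,
# and the first value for columns that should pass through unchanged.
summary = apps.groupby("App", as_index=False).agg(
    Reviews=("Reviews", "mean"),
    Size=("Size", "first"),
    Installs=("Installs", "sum"),
    Rating=("Rating", "first"),
)
print(summary)
```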

  • Corey Schafer
    November 18, 2020

    Thank you

  • Corey Schafer
    November 18, 2020

    Your content is awesome!
    How do I replace NaN values with other values only in a particular column?
    Please help.
    Thank you
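For the single-column case asked about above, one option is to call `fillna` on just that column and assign the result back. The column names and the fill value of 0 are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value in each column.
df = pd.DataFrame({
    "age": [33.0, np.nan],
    "email": [np.nan, "a@example.com"],
})

# Fill only the "age" column; "email" keeps its NaN.
df["age"] = df["age"].fillna(0)
print(df)
```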

  • Corey Schafer
    November 18, 2020

    The series is very helpful to me. Thank you sir.

  • Corey Schafer
    November 18, 2020

    Thanks again for such a helpful video.

  • Corey Schafer
    November 18, 2020

    Hey, truly glad for all of your series. If possible, please do make a course video on PySpark.

  • Corey Schafer
    November 18, 2020

    Thank you Corey for this. My parents urged me to join your community. They are saying you are doing a wonderful job. Thank you Corey for enabling us

  • Corey Schafer
    November 18, 2020

    This also works if we want to drop a column when index 0 or 1 has a NaN:
    df.dropna(axis='columns', how='any', subset=[0, 1])
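A small sketch of the call above on made-up data. With `axis='columns'`, `subset` names row labels, so only rows 0 and 1 are inspected when deciding which columns to drop.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "b" has NaN in row 0, "c" has NaN in row 1,
# and "a" has NaN only in row 2, which is outside the subset.
df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan],
    "b": [np.nan, 2.0, 3.0],
    "c": [1.0, np.nan, 3.0],
})

kept = df.dropna(axis="columns", how="any", subset=[0, 1])
print(list(kept.columns))
```

Swapping `how='any'` for `how='all'` would drop a column only when both inspected rows are NaN.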

  • Corey Schafer
    November 18, 2020

    If teaching is an art, you’d be a Corey Schafer… Oh wait…

  • Corey Schafer
    November 18, 2020

    Why is NaN able to be converted to a float?
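On the question above: NaN is itself a floating-point value (defined by the IEEE 754 standard), so a column containing it can be cast to float but not to int. A quick check in plain Python and NumPy:

```python
import math

import numpy as np

# np.nan is an ordinary Python float.
nan_is_float = isinstance(np.nan, float)

# The string "nan" parses to the same special float value.
parsed = float("nan")

# Integers have no NaN representation, so this cast raises ValueError.
try:
    int(parsed)
    int_cast_failed = False
except ValueError:
    int_cast_failed = True

print(nan_is_float, math.isnan(parsed), int_cast_failed)
```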

  • Corey Schafer
    November 18, 2020

    This is pandas made easy

  • Corey Schafer
    November 18, 2020

    Your videos have an epic like/dislike ratio

  • Corey Schafer
    November 18, 2020

    At 27:28 of the video, for a one-liner: df['YearsCode'].replace(['Less than 1 year', 'More than 50 years'], [0, 51], inplace=True). Correct me if I'm wrong, I'm new to Python. But great video again Corey! Hats off!
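A sketch of the list-to-list `replace` idea from the comment above, on made-up values. It assigns the result back instead of using `inplace=True` on a selected column, since in-place changes to a selected column may not reach the parent DataFrame in newer pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the survey column; values arrive as strings.
df = pd.DataFrame({
    "YearsCode": pd.Series(
        ["Less than 1 year", "10", "More than 50 years", np.nan],
        dtype=object,
    )
})

# Replace both text labels in one call, then cast the column to float.
df["YearsCode"] = (
    df["YearsCode"]
    .replace(["Less than 1 year", "More than 50 years"], [0, 51])
    .astype(float)
)
print(df["YearsCode"].tolist())
```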

  • Corey Schafer
    November 18, 2020

    @Corey Schafer – Please do a video series on PySpark.

  • Corey Schafer
    November 18, 2020

    Hi Corey, and thank you for these great videos 🙂
    I got a question about that data types part. In the last video, we were able to calculate the median of salaries (after grouping the data). I checked that and saw the 'ConvertedComp' column's type is float, but a column like 'YearsCode' is an object. Is it because of those two strings in this column? Or is it something else that determines the type of these numerical columns?
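One way to test the guess in the question above: a column that holds any non-numeric strings cannot be stored as a numeric dtype, and `pd.to_numeric` with `errors='coerce'` can reveal which entries are responsible. The sample values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing numeric strings, a text label, and a missing value.
years = pd.Series(["5", "Less than 1 year", "12", np.nan], dtype=object)

# Entries that cannot be parsed as numbers become NaN.
numeric = pd.to_numeric(years, errors="coerce")

# Values that were present originally but failed to parse are the culprits.
offenders = years[numeric.isna() & years.notna()].tolist()
print(offenders)
```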

  • Corey Schafer
    November 18, 2020

    At 11:36, we can use df.replace(['NA', 'Missing'], np.nan, inplace=True) instead.
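A minimal sketch of the suggestion above on made-up data: passing a list to `replace` converts several custom missing-value markers to NaN in a single call across the whole DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical frame that uses the strings "NA" and "Missing" as placeholders.
df = pd.DataFrame({
    "first": ["Corey", "NA"],
    "email": ["Missing", "corey@example.com"],
})

# Convert both placeholder strings to real missing values.
df.replace(["NA", "Missing"], np.nan, inplace=True)

missing_count = int(df.isna().sum().sum())
print(missing_count)
```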

  • Corey Schafer
    November 18, 2020

    Does Brilliant give certificates after completion of a course?
