How do I handle missing values in pandas?




Most datasets contain “missing values”, meaning that the data is incomplete. Deciding how to handle missing values can be challenging! In this video, I’ll cover all of the basics: how missing values are represented in pandas, how to locate them, and options for how to drop them or fill them in.
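
As a minimal sketch of those basics, the snippet below locates, drops, and fills missing values. The small DataFrame and its column names are made up for illustration and are not the dataset used in the video.

```python
import numpy as np
import pandas as pd

# Hypothetical data; np.nan marks a missing value
df = pd.DataFrame({
    "city": ["Ithaca", np.nan, "Abilene"],
    "color": ["red", "blue", np.nan],
    "state": ["NY", "NJ", "KS"],
})

# Locate: isnull() returns a same-shaped DataFrame of booleans,
# and summing it counts the missing values in each column
missing_per_column = df.isnull().sum()

# Show only the rows where "city" is missing
rows_missing_city = df[df["city"].isnull()]

# Drop: how="any" removes rows with at least one missing value,
# how="all" removes only rows where every value is missing
dropped_any = df.dropna(how="any")
dropped_all = df.dropna(how="all")

# Fill: replace missing values in one column with a placeholder
filled = df.copy()
filled["color"] = filled["color"].fillna("unknown")

print(missing_per_column)
```

Whether to drop or fill depends on how much data is missing and what the column means, so treat the placeholder above as one option rather than a rule.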

SUBSCRIBE to learn data science with Python:
https://www.youtube.com/dataschool?sub_confirmation=1

JOIN the “Data School Insiders” community and receive exclusive rewards:
https://www.patreon.com/dataschool

== RESOURCES ==
GitHub repository for the series: https://github.com/justmarkham/pandas-videos
“read_csv” documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
“isnull” documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.isnull.html
“notnull” documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.notnull.html
“dropna” documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html
“value_counts” documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html
“fillna” documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html
Working with missing data: http://pandas.pydata.org/pandas-docs/stable/missing_data.html

== LET’S CONNECT! ==
Newsletter: https://www.dataschool.io/subscribe/
Twitter: https://twitter.com/justmarkham
Facebook: https://www.facebook.com/DataScienceSchool/
LinkedIn: https://www.linkedin.com/in/justmarkham/

Comment List

  • Data School
    November 25, 2020

    In pandas version 0.21 (released October 2017), they added 'isna' and 'notna' as aliases for 'isnull' and 'notnull'. Learn more in my latest video, "5 new changes in pandas you need to know about": https://www.youtube.com/watch?v=te5JrSCW-LY&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=33
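
A quick way to check the alias behavior described above, assuming pandas 0.21 or later and a made-up Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])  # hypothetical data

# isna/notna should return the same boolean results as isnull/notnull
same_missing = s.isna().equals(s.isnull())
same_present = s.notna().equals(s.notnull())

print(same_missing, same_present)
```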

  • Data School
    November 25, 2020

    Great videos covering the basics. I enjoy how the additional values within the functions are covered, i.e. axis, etc.

  • Data School
    November 25, 2020

    Truly amazing videos. Can you do a series on Matplotlib and Seaborn?

  • Data School
    November 25, 2020

    What about displaying the rows where columns 'A' and 'B' both have missing values?
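
One possible approach to this question, sketched with a hypothetical DataFrame whose columns are named 'A' and 'B' as in the comment, is to combine the boolean masks from `isnull`:

```python
import numpy as np
import pandas as pd

# Hypothetical data with columns named as in the question
df = pd.DataFrame({
    "A": [1.0, np.nan, np.nan, 4.0],
    "B": [np.nan, np.nan, 2.0, 5.0],
})

# Rows where BOTH 'A' and 'B' are missing
both_missing = df[df["A"].isnull() & df["B"].isnull()]

# Rows where EITHER 'A' or 'B' is missing
either_missing = df[df[["A", "B"]].isnull().any(axis=1)]

print(both_missing)
print(either_missing)
```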

  • Data School
    November 25, 2020

    Great video btw. Just a quick question: I am trying to build a benchmark. Would it be okay to standardize the data before creating it?

  • Data School
    November 25, 2020

    Why you take various??

  • Data School
    November 25, 2020

    You explained how to handle NaN values, but what should I do if there are other placeholder values such as "Not Provided"?
    How do I ignore them?
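
A sketch of two ways this could be handled, assuming the placeholder is exactly the string "Not Provided" and using made-up CSV text: tell `read_csv` to treat it as missing via `na_values`, or replace it after loading.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content standing in for a real file
csv_text = "name,phone\nAnn,555-0100\nBen,Not Provided\nCy,555-0199\n"

# Option 1: mark the placeholder as missing while reading
df_read = pd.read_csv(io.StringIO(csv_text), na_values=["Not Provided"])

# Option 2: read normally, then convert the placeholder to NaN
df_replaced = pd.read_csv(io.StringIO(csv_text))
df_replaced = df_replaced.replace("Not Provided", np.nan)

print(df_read["phone"].isnull().sum())
```

Once the placeholder is a real missing value, `isnull`, `dropna`, and `fillna` treat it like any other NaN.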

  • Data School
    November 25, 2020

    I never leave this place unsatisfied or without answers, total treasure.

  • Data School
    November 25, 2020

    I just would like to thank you man.

  • Data School
    November 25, 2020

    Lots of thanks from NEPAL✌✌✌

  • Data School
    November 25, 2020

    Thanks a lot for your video!

  • Data School
    November 25, 2020

    great explanation thanks

  • Data School
    November 25, 2020

    Superb video! Thanks a lot, it helps a lot!

  • Data School
    November 25, 2020

    This is an excellent explanation! I have a question regarding fillna. In my DataFrame, I want to replace every 'nan' with an empty string. Please see my situation below and help me.

    import re
    import pandas as pd

    excel_file_path = 'theData.xlsx'
    df = pd.read_excel(excel_file_path)
    df = df.astype(str)
    for column in df.columns:
        df[column] = df[column].str.replace(r'[^._!a-zA-Z0-9\s-]', '', flags=re.I, regex=True)
        df[column].fillna(value='', inplace=True)

    print(df)

    Can you please tell me why fillna is not working here?
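
One likely cause, offered as a guess from the snippet alone: `astype(str)` runs before `fillna`, and in many pandas versions that conversion turns each NaN into the literal string 'nan', which leaves nothing for `fillna` to fill. A sketch of one possible fix, using a made-up column in place of the Excel file, fills first and converts afterwards:

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the contents of the Excel file
df = pd.DataFrame({"name": ["Al#ice", np.nan, "Bob"]})

# Fill missing values BEFORE converting to strings
cleaned = df.fillna("").astype(str)

# Then strip the unwanted characters column by column
for column in cleaned.columns:
    cleaned[column] = cleaned[column].str.replace(
        r"[^._!a-zA-Z0-9\s-]", "", regex=True
    )

print(cleaned)
```

Assigning the result back to the column, rather than calling `fillna(..., inplace=True)` on `df[column]`, also avoids relying on in-place modification of a selected column.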

  • Data School
    November 25, 2020

    Great explanation. This was a huge help. Thanks so much!

  • Data School
    November 25, 2020

    Fantastic

  • Data School
    November 25, 2020

    I am really loving your videos. Explored your channel just 2 days back!! Earlier I had no idea about pandas but after watching your video, I feel that I will be able to work on my assignment. Great Work! Thank you!

  • Data School
    November 25, 2020

    Sir pls make a series on NUMPY pls pls. Earnest request

  • Data School
    November 25, 2020

    Do you handle missing data before splitting the data set (training set and test set) ?
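
A common recommendation, which does not come from the video description, is to split first and compute any fill value from the training rows only, so that nothing from the test set influences the imputation. A minimal pandas-only sketch, with a made-up column and a simple positional split as assumptions:

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing ages
df = pd.DataFrame({"age": [20.0, np.nan, 40.0, 30.0, np.nan, 60.0]})

# Simple positional split: first four rows for training, last two for testing
train = df.iloc[:4].copy()
test = df.iloc[4:].copy()

# Learn the fill value from the training rows only
train_median = train["age"].median()

# Apply the same training-derived value to both sets
train["age"] = train["age"].fillna(train_median)
test["age"] = test["age"].fillna(train_median)

print(train_median)
```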

  • Data School
    November 25, 2020

    Hi Teacher,

    I just came across your YouTube video, and in my opinion you are simply the best. I have a question for you, and I would be most grateful for a response.

    How do I handle rows with errors in a DataFrame using pandas? I have a blood pressure DataFrame in which some rows contain the word "ERROR". How do I take care of these errors? Here is the error message that keeps coming back:
    ValueError: invalid literal for int() with base 10: 'ERROR'.

    Pls I need your help, and here is my email ad obinnac562@gmail.com, in case you may want to email me the code, thank you.
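
One way this might be approached, assuming the readings are stored as strings in a single column (the column name `systolic` is made up here): `pd.to_numeric` with `errors="coerce"` turns unparseable entries such as "ERROR" into NaN, which can then be dropped or filled like any other missing value.

```python
import pandas as pd

# Hypothetical blood pressure readings with one bad entry
df = pd.DataFrame({"systolic": ["120", "ERROR", "135"]})

# Convert to numbers; anything that cannot be parsed becomes NaN
df["systolic"] = pd.to_numeric(df["systolic"], errors="coerce")

# Drop the rows that failed to parse, then convert safely to int
clean = df.dropna(subset=["systolic"]).copy()
clean["systolic"] = clean["systolic"].astype(int)

print(clean)
```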

  • Data School
    November 25, 2020

    Excellent video Data School, very helpful, your explanations are clear and objective. Thank you !

  • Data School
    November 25, 2020

    Amazing video, thanks!

  • Data School
    November 25, 2020

    I rarely leave YouTube comments, but thank you!! If it weren't for your video, I wouldn't understand how to do my assignment at all. You did a great job of explaining!

  • Data School
    November 25, 2020

    what happens if you have missing values while training the model, e.g. xgboost?

  • Data School
    November 25, 2020

    Love from india😀
