Combining multiple datasets – Data Analysis with Python and Pandas p.5





How to go about working with multiple datasets in Python and pandas for data analysis.

Text-based tutorial: https://pythonprogramming.net/combining-datasets-python3-pandas-data-analysis/

Channel membership: https://www.youtube.com/channel/UCfzlCWGWYyIQ0aLC5w48gBQ/join
Discord: https://discord.gg/sentdex
Support the content: https://pythonprogramming.net/support-donate/
Twitter: https://twitter.com/sentdex
Facebook: https://www.facebook.com/pythonprogramming.net/
Twitch: https://www.twitch.tv/sentdex
G+: https://plus.google.com/+sentdex


Comment List

  • sentdex
    December 29, 2020

    Thank you!

  • sentdex
    December 29, 2020

    @Sentdex, I've been a fan of your video series ever since I found them last year. Thanks for sharing your knowledge about Python, ML, AI, etc. At ~15:50 in this video, you start filtering your unemployment DF for February 2015. It appears that, rather than being arbitrary, you inadvertently thought that Feb-2015 is the latest month and year of unemployment rate data. If so, you're mistaken. If you further explore the unemployment data, you'll find that unemployment rates are available all the way through December 2016. Given the presidential election politics in the US (and in any country for that matter), the unemployment rates and resulting public sentiment become most relevant/influential just before such important elections. Therefore, a better example would have explored the correlation/covariance between the county-wise unemployment rates and the voting data as of October/November 2016. Just to be sure, I do understand that your intent was just to show how one could find some correlation between seemingly disparate data. Of course, the actual conclusions/interpretations from such analysis are subject to personal opinions. Admittedly, in the grand scheme of things, it's the data frames – not the specific time frames – that matter. Cheers!
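The comment suggests checking which months the unemployment data actually covers before filtering. Below is a minimal sketch of that check. It assumes the data has a numeric `Year` column and a `Month` column holding full month names. Those column names and the tiny frame are hypothetical stand-ins, not the real file.

```python
import pandas as pd

# Hypothetical stand-in for the unemployment data; real columns/values may differ.
unemp = pd.DataFrame({
    "Year": [2015, 2015, 2016, 2016],
    "Month": ["February", "March", "November", "December"],
    "State": ["Ohio"] * 4,
    "County": ["Adams County"] * 4,
    "Rate": [6.1, 5.9, 5.0, 5.2],  # illustrative numbers only
})

# Combine Year and the full month name into a real date so rows compare correctly.
unemp["Date"] = pd.to_datetime(
    unemp["Year"].astype(str) + "-" + unemp["Month"], format="%Y-%B"
)

# The most recent month actually present in the data.
latest = unemp["Date"].max()
print(latest.strftime("%B %Y"))  # December 2016

# Keep only the rows for that month (or pick e.g. November 2016 explicitly).
latest_rows = unemp[unemp["Date"] == latest]
```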

  • sentdex
    December 29, 2020

    IMPORT

  • sentdex
    December 29, 2020

    I love how he changes mugs from video to video

  • sentdex
    December 29, 2020

    Your tutorials are really fantastic!!
    And you have a wonderful mug collection.

  • sentdex
    December 29, 2020

    Cool coffee cup… I'm smiling watching this 'cause it's gonna save my day!

  • sentdex
    December 29, 2020

    9:09 UsageError: Can't use statement directly after '%%time'!
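In IPython/Jupyter, `%%time` is a cell magic: it has to sit alone on the first line of the cell, with the code to be timed on the following lines. Putting a statement on the same line as `%%time` is what raises this error. Outside a notebook there is no `%%time`. A rough plain-Python stand-in is sketched below. The helper `timed` is hypothetical, and it reports wall-clock time only, whereas `%%time` also reports CPU time.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label="block"):
    """Measure wall-clock time of a with-block (hypothetical helper, not IPython's %%time)."""
    result = {}
    start = time.perf_counter()
    try:
        yield result
    finally:
        # Store and print the elapsed time even if the block raised.
        result["elapsed"] = time.perf_counter() - start
        print(f"{label}: {result['elapsed']:.4f} s wall time")

with timed("sum of squares") as t:
    total = sum(i * i for i in range(100_000))
```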

  • sentdex
    December 29, 2020

    Wouldn't it have made sense to replace Rate with the average rate grouped by State, keep only the unique State values, and then map the new column? That would speed things up.
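The idea in this comment can be sketched with `groupby` plus `map`. The lookup runs once per unique state instead of once per county row. The frame, the column names and the `min_wage_by_state` dict below are hypothetical, with illustrative numbers.

```python
import pandas as pd

# Hypothetical county-level rows; the real unemployment data is far larger.
unemp_county = pd.DataFrame({
    "State": ["Ohio", "Ohio", "Texas", "Texas", "Texas"],
    "County": ["Adams", "Allen", "Bexar", "Dallas", "Harris"],
    "Rate": [6.0, 4.0, 3.0, 5.0, 4.0],
})

# Hypothetical per-state values to attach (illustrative numbers only).
min_wage_by_state = {"Ohio": 8.10, "Texas": 7.25}

# 1) Collapse to one row per state: the average Rate, indexed by State.
state_level = unemp_county.groupby("State")["Rate"].mean().to_frame("avg_rate")

# 2) Do the lookup once per unique state instead of once per county row.
state_level["min_wage"] = state_level.index.map(min_wage_by_state)

# 3) If county rows still need the value, broadcast it back with Series.map.
unemp_county["min_wage"] = unemp_county["State"].map(state_level["min_wage"])
```

Whether averaging is appropriate depends on the analysis. It discards county-level variation, and that variation matters if the rates are later correlated with county voting data. Step 3 keeps the per-county rows while still doing only one lookup per state.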

  • sentdex
    December 29, 2020

    I get a: "KeyError: 'County'", in the for loop.

    Does anyone know how to fix it?
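A `KeyError` here means the label `'County'` isn't among the labels the DataFrame is being indexed with. Without seeing the loop it's hard to say why. One thing worth checking is the actual column names, because stray whitespace or different capitalization in a CSV header is a common culprit. The sketch below uses a hypothetical frame and a hypothetical helper, `normalize_columns`.

```python
import pandas as pd

def normalize_columns(df):
    """Return a copy with surrounding whitespace stripped from column names."""
    out = df.copy()
    out.columns = out.columns.str.strip()
    return out

# Hypothetical frame whose CSV header had a trailing space: "County ".
df = pd.DataFrame({"State": ["Ohio"], "County ": ["Adams County"], "Rate": [6.0]})

print(list(df.columns))      # look at the real labels before indexing
df = normalize_columns(df)
print(df["County"].iloc[0])  # no KeyError once the label matches
```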

  • sentdex
    December 29, 2020

    If anyone has had trouble importing the second dataset recently, try passing an encoding when reading the file:
    import pandas as pd
    df = pd.read_csv("us-minimum-wage-by-state-from-1968-to-2017Minimum Wage Data.csv", encoding='unicode_escape')
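When you don't know a file's encoding, you can try several encodings in turn. The helper `read_csv_with_fallback` and its default encoding list are assumptions for illustration. The comment above uses `'unicode_escape'`, and that can be added to the tuple as well.

```python
import io
import pandas as pd

def read_csv_with_fallback(raw_bytes, encodings=("utf-8", "latin-1")):
    """Try each encoding in order; return (DataFrame, encoding_used).

    Hypothetical helper: raw_bytes is the file's contents, e.g. from
    open(path, "rb").read().
    """
    last_error = None
    for enc in encodings:
        try:
            # A fresh buffer per attempt, so a failed read doesn't leave it half-consumed.
            return pd.read_csv(io.BytesIO(raw_bytes), encoding=enc), enc
        except UnicodeDecodeError as err:
            last_error = err
    raise last_error
```

`latin-1` maps every possible byte to a character, so it never fails to decode. That is why it goes last. When it "succeeds" on a file that is really in some other encoding, a few characters may come out wrong.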

  • sentdex
    December 29, 2020

    You did a great job, Harrison. I'm having some trouble following along, though; I've gone through your basics a few times and learned them from other places too. Any tips?

  • sentdex
    December 29, 2020

    FileNotFoundError: File b'datasets/state_abbv.csv' does not exist

    I'm getting this error. Can anyone please help me out?
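A relative path like `datasets/state_abbv.csv` is resolved against the current working directory, not against the script's or notebook's location. So the file has to exist there, relative to wherever Python was started. The hypothetical helper below prints the absolute path that was actually looked up.

```python
from pathlib import Path

def explain_missing(path_str):
    """Return a short diagnostic for a (possibly relative) data path. Hypothetical helper."""
    resolved = Path(path_str).resolve()
    if resolved.exists():
        return f"found: {resolved}"
    return (f"missing: {resolved} (relative paths resolve against the "
            f"working directory {Path.cwd()})")

print(explain_missing("datasets/state_abbv.csv"))
```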

  • sentdex
    December 29, 2020

    I love your sense of humor. You really love what you're doing, and I really appreciate your efforts, mate! Thanks

  • sentdex
    December 29, 2020

    For anyone else wondering about the %%time magic at 8:04, here is the Python documentation for the related timeit module:
    https://docs.python.org/3.7/library/timeit.html?highlight=time#module-timeit
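`%%time` is an IPython/Jupyter magic that times a single run of a cell. For repeated measurements, IPython has `%timeit`/`%%timeit`, and plain Python has the `timeit` module linked above. Below is a small sketch that times two illustrative pandas expressions. The expressions are not taken from the video.

```python
import timeit

SETUP = "import pandas as pd; s = pd.Series(range(10_000))"
N = 20

# timeit.timeit runs SETUP once, then the statement N times,
# and returns the total elapsed seconds.
per_run_map = timeit.timeit("s.map(lambda x: x * 2)", setup=SETUP, number=N) / N
per_run_vec = timeit.timeit("s * 2", setup=SETUP, number=N) / N

print(f"Series.map: {per_run_map * 1e3:.3f} ms per run")
print(f"vectorized: {per_run_vec * 1e3:.3f} ms per run")
```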

  • sentdex
    December 29, 2020

    At https://youtu.be/QGeQqGd6LPc?t=1641 I have a problem: when I run this part of the code, pres16 turns into 0 and I lose the pres16 data. How can I fix that?

  • sentdex
    December 29, 2020

    What does inplace=True stand for?
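`inplace=True` tells a pandas method to modify the object it is called on instead of returning a modified copy. The method then returns `None`. The exact call from the video isn't shown in this thread, so this sketch uses `dropna` on a made-up frame.

```python
import pandas as pd

df = pd.DataFrame({"State": ["Ohio", None, "Texas"], "Rate": [6.0, 5.0, None]})

# Default (inplace=False): dropna returns a NEW DataFrame; df is unchanged.
cleaned = df.dropna()
print(len(df), len(cleaned))  # 3 1

# inplace=True: df itself is modified and the method returns None.
result = df.dropna(inplace=True)
print(result, len(df))        # None 1
```

A common slip is writing `df = df.dropna(inplace=True)`. That rebinds `df` to `None`.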

  • sentdex
    December 29, 2020

    the donut cup has always been one of my favorites

  • sentdex
    December 29, 2020

    IMPORT!!!

  • sentdex
    December 29, 2020

    At what point is that state_abbv DataFrame created and saved? I can't find it anywhere. Also, where is it saved? I'm neither able to create such a DataFrame from the given data nor able to find it anywhere.
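The thread doesn't say how that file was produced, so the following is not necessarily how the tutorial built it. One way to make an equivalent file yourself is to write a state-name-to-abbreviation table with pandas. The column names (`State`, `Postal Code`), the default path, the helper names and the five-state subset are all assumptions. A real table would list every state.

```python
import pandas as pd
from pathlib import Path

# Hypothetical subset; a complete table would cover all states (and possibly DC).
STATE_ABBV = {
    "Alabama": "AL",
    "Alaska": "AK",
    "Arizona": "AZ",
    "Arkansas": "AR",
    "California": "CA",
}

def save_state_abbv(path="datasets/state_abbv.csv"):
    """Write a State -> Postal Code table to CSV and return it. Hypothetical helper."""
    df = pd.DataFrame(list(STATE_ABBV.items()), columns=["State", "Postal Code"])
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)  # create datasets/ if needed
    df.to_csv(out, index=False)
    return df

def load_abbv_lookup(path="datasets/state_abbv.csv"):
    """Read the CSV back as an abbreviation -> state-name dict. Hypothetical helper."""
    df = pd.read_csv(path)
    return dict(zip(df["Postal Code"], df["State"]))
```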

  • sentdex
    December 29, 2020

    At 20:30, you don't get screwed if you forget the double brackets [[]]: even if you get a Series, you're okay, and I don't know why…
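Single brackets return a Series and double brackets return a one-column DataFrame. One plausible reason it still works is that many pandas operations accept either form. `DataFrame.join`, for example, accepts a named Series. The frames below are hypothetical, and the sketch doesn't claim this is the exact operation at 20:30.

```python
import pandas as pd

df = pd.DataFrame({"county": ["Adams", "Allen"], "pct": [0.6, 0.4]},
                  index=["OH-1", "OH-2"])

s = df["pct"]    # single brackets -> Series (named "pct")
f = df[["pct"]]  # double brackets -> one-column DataFrame
print(type(s).__name__, type(f).__name__)  # Series DataFrame

# join accepts a named Series or a DataFrame and gives the same result here.
other = pd.DataFrame({"rate": [5.0, 4.0]}, index=["OH-1", "OH-2"])
joined_from_series = other.join(s)
joined_from_frame = other.join(f)

# When a DataFrame is really required, Series.to_frame() converts one.
as_frame = s.to_frame()
```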

  • sentdex
    December 29, 2020

    9:55 "unemp_county['min_wage'] = list(map(……….)" HOW THE HELL DID YOU DO IT SO FAST?????? I have to wait minutes!!!!
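The thread doesn't explain why this commenter's run was slow. In general, `list(map(...))` calls a Python function once per row, and that can take a while on a table with many rows. The alternative sketched below reshapes the minimum-wage table to long form and does a single merge. It assumes `min_wage` is a wide table indexed by `Year` with one column per state. That layout, the column names and the values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical wide table: one row per Year, one column per State (illustrative values).
min_wage = pd.DataFrame(
    {"Ohio": [7.95, 8.10], "Texas": [7.25, 7.25]},
    index=pd.Index([2014, 2015], name="Year"),
)

# Hypothetical county rows that need a min_wage column.
unemp_county = pd.DataFrame({
    "Year": [2015, 2015, 2014, 2015],
    "State": ["Ohio", "Texas", "Texas", "Mississippi"],
    "County": ["Adams", "Bexar", "Dallas", "Hinds"],
})

# Reshape to long form: one row per (Year, State) pair.
long_wage = min_wage.reset_index().melt(
    id_vars="Year", var_name="State", value_name="min_wage"
)

# One vectorized left merge instead of a Python call per row.
# Rows with no matching (Year, State) get NaN.
unemp_county = unemp_county.merge(long_wage, on=["Year", "State"], how="left")
```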

  • sentdex
    December 29, 2020

    Fantastic tutorial. Thank you.

  • sentdex
    December 29, 2020

    Mississippi has a bunch of NaNs because the min_wage data doesn't have Mississippi as a column; that state was left out for some reason. Check min_wage.columns.unique() and you'll find that Mississippi isn't there.
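A set difference between the states in the county data and the columns of `min_wage` lists every state that will come back as NaN, not just Mississippi. The two tiny frames below are hypothetical stand-ins that follow the layout described in the comment, with states as columns of `min_wage`.

```python
import pandas as pd

# Hypothetical stand-ins: states are columns in min_wage, rows in unemp_county.
min_wage = pd.DataFrame({"Ohio": [8.10], "Texas": [7.25]}, index=[2015])
unemp_county = pd.DataFrame({"State": ["Ohio", "Texas", "Mississippi"]})

# States that appear in the county data but have no min_wage column.
missing = sorted(set(unemp_county["State"].unique()) - set(min_wage.columns))
print(missing)  # ['Mississippi']
```

If a state shows up here unexpectedly, compare the spellings too. Whitespace or capitalization differences also produce misses.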
