Python Pandas Tutorial (Part 8): Grouping and Aggregating – Analyzing and Exploring Your Data





In this video, we will be learning how to group and aggregate our data.

This video is sponsored by Brilliant. Go to https://brilliant.org/cms to sign up for free. Be one of the first 200 people to sign up with this link and get 20% off your premium subscription.

In this Python Programming video, we will be learning how to group and aggregate our data. This will allow us to explore our data in ways we have not yet done in this series. We will be able to answer questions such as: “What is the most popular social media site for each country?” We will be using the groupby method, and also some aggregate functions such as mean, median, value_counts, etc. Let’s get started…
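
As a rough sketch of the kind of question described above, here is a toy example. The rows are made up, and the column names `Country`, `SocialMedia`, and `ConvertedComp` are assumptions for illustration rather than something confirmed by this description:

```python
import pandas as pd

# Toy data standing in for the survey; rows and column names are assumptions.
df = pd.DataFrame({
    "Country": ["India", "India", "India", "United States", "United States"],
    "SocialMedia": ["WhatsApp", "WhatsApp", "YouTube", "Reddit", "Reddit"],
    "ConvertedComp": [10000, 20000, 30000, 90000, 110000],
})

# Aggregate a single column.
median_salary = df["ConvertedComp"].median()

# Group by country, then count each social media site within each group.
country_grp = df.groupby("Country")
site_counts = country_grp["SocialMedia"].value_counts()

# Most popular site per country: the most frequent value in each group.
top_site = country_grp["SocialMedia"].agg(lambda s: s.value_counts().idxmax())

# Several aggregates at once on a grouped column.
salary_stats = country_grp["ConvertedComp"].agg(["median", "mean"])

print(top_site)
print(salary_stats)
```

The grouped `value_counts` result has a two-level index (country, then site), so a single entry can be looked up with `site_counts.loc[("India", "WhatsApp")]`.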

Video Timestamps:
Aggregate Column – 2:00
Aggregate DataFrame – 3:55
Value Counts – 7:51
Grouping – 12:30
Multiple Aggregates on Group – 26:00
People Who Know Python By Country – 27:20
Practice Question – 34:20
Concat Series – 37:27

The code for this video can be found at:
http://bit.ly/Pandas-08

StackOverflow Survey Download Page – http://bit.ly/SO-Survey-Download

✅ Support My Channel Through Patreon:
https://www.patreon.com/coreyms

✅ Become a Channel Member:
https://www.youtube.com/channel/UCCezIgC97PvUuR4_gbFUs5g/join

✅ One-Time Contribution Through PayPal:
https://goo.gl/649HFY

✅ Cryptocurrency Donations:
Bitcoin Wallet – 3MPH8oY2EAgbLVy7RBMinwcBntggi7qeG3
Ethereum Wallet – 0x151649418616068fB46C3598083817101d3bCD33
Litecoin Wallet – MPvEBY5fxGkmPQgocfJbxP6EmTo5UUXMot

✅ Corey’s Public Amazon Wishlist
http://a.co/inIyro1

✅ Equipment I Use and Books I Recommend:
https://www.amazon.com/shop/coreyschafer

▶️ You Can Find Me On:
My Website – http://coreyms.com/
My Second Channel – https://www.youtube.com/c/coreymschafer
Facebook – https://www.facebook.com/CoreyMSchafer
Twitter – https://twitter.com/CoreyMSchafer
Instagram – https://www.instagram.com/coreymschafer/

#Python #Pandas


Comment List

  • Corey Schafer
    November 12, 2020

    I hope everyone had a great week! We've got a long video this week, but we go over a lot of important topics about how to analyze data in Pandas. We will learn how to answer very interesting questions such as "What is the most popular social media site by country?". I put timestamps together for this video so that you all can skip around if you need to go back and watch a specific section. Here are those timestamps:
    Aggregate Column – 2:00
    Aggregate DataFrame – 3:55
    Value Counts – 7:51
    Grouping – 12:30
    Multiple Aggregates on Group – 26:00
    People Who Know Python By Country – 27:20
    Practice Question – 34:20
    Concat Series – 37:27

    Have a great weekend everybody!

  • Corey Schafer
    November 12, 2020

    There are no ads interrupting in the middle. Your presentation is very good.
    Thanks, Corey.

  • Corey Schafer
    November 12, 2020

    My solution:
    knows_python = country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum())
    knows_general = country_grp['LanguageWorkedWith'].apply(len)

    percent_knows_Py = knows_python/knows_general * 100
    percent_knows_Py.sort_values(ascending=False)

  • Corey Schafer
    November 12, 2020

    47 minutes of a pure pandas tutorial from a god in python, man you're a hero🔥🔥

  • Corey Schafer
    November 12, 2020

    Love your video! Best one out there, thank you so much!

  • Corey Schafer
    November 12, 2020

    I solved the coding exercise by doing:
    country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum()/x.count())
    The difference with this solution compared to Corey's one is that Corey's takes the NaNs in when computing the ratio, whereas this one does not. Depends on the purpose I guess!
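
To make the difference between the two denominators concrete, here is a small sketch on made-up data (the rows are hypothetical; `na=False` is added here so missing answers are treated as "no match"). Dividing by the group length includes respondents who left the question blank, while dividing by `count()` only includes those who answered:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one respondent did not answer the language question.
df = pd.DataFrame({
    "Country": ["A", "A", "A", "A"],
    "LanguageWorkedWith": ["Python;SQL", "Java", "Python", np.nan],
})
grp = df.groupby("Country")["LanguageWorkedWith"]

# Number of respondents per country whose answer mentions Python.
knows_python = grp.apply(lambda x: x.str.contains("Python", na=False).sum())

all_rows = grp.apply(len)   # every respondent, missing answers included
answered = grp.count()      # only respondents with a non-missing answer

share_of_all = knows_python / all_rows
share_of_answered = knows_python / answered

print(share_of_all["A"], share_of_answered["A"])
```

Here two of four respondents know Python, but only three answered, so the two ratios come out as one half and two thirds.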

  • Corey Schafer
    November 12, 2020

    Great description.

  • Corey Schafer
    November 12, 2020

    I truly appreciate your hard work and knowledge, and your effort to make things easier for learners. Cheers, Corey!

  • Corey Schafer
    November 12, 2020

    Yes please do make a video on multiple indexes in a series

  • Corey Schafer
    November 12, 2020

    I love pandas, I even have an animal group

    I am a panda girl I’ll tell you more members of my group

    Fox girl

    Bunny girl

    Lion girl

    Cat girl

    Tiger boy

    Gorilla boy

    Dog boy

    Uni girl

    Dragon girl

    Golden griffin girl

    Think that’s all now

  • Corey Schafer
    November 12, 2020

    Where can we find the csv file? Thank you.

  • Corey Schafer
    November 12, 2020

    Hey Corey, Thanks a lot for these tutorials. It would be great if you can prepare some tutorials on Numpy, Matplotlib as well. Thanks in Advance!

  • Corey Schafer
    November 12, 2020

    Great videos but unfortunately, it's so hard to follow as the data source is not provided. It would be nice if you can provide the data source that you used so we can follow along.

  • Corey Schafer
    November 12, 2020

    So you can also put *100 after the .value_counts(normalize=True) to get more readable %s
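
A quick sketch of that tip on a made-up Series (the values are illustrative only):

```python
import pandas as pd

# Hypothetical answers to a yes/no question.
answers = pd.Series(["Yes", "Yes", "Yes", "No"])

# normalize=True returns fractions that sum to 1.
fractions = answers.value_counts(normalize=True)

# Multiplying by 100 turns the fractions into percentages.
percents = fractions * 100

print(percents)
```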

  • Corey Schafer
    November 12, 2020

    Where to get this data set ?

  • Corey Schafer
    November 12, 2020

    I realize that you don't know much on China, my Hero Corey. hahaha

  • Corey Schafer
    November 12, 2020

    Is it just me or is he drawing out his explanations a lot? Like he could explain this in half the time probably

  • Corey Schafer
    November 12, 2020

    Lots of love from India, sir. Thank you for the great work. Thank you for making easy Python even easier.

  • Corey Schafer
    November 12, 2020

    Thank you, Corey, for this wonderful tutorial! I have a question. I tried using the count() method instead of sum(), expecting the same result, but it gives a different one. I thought that wherever Series.str.contains('Python') returns True, count() would tally those cases, but it does not. What does it count, then?
    country_uses_python_sum = country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum())
    country_uses_python_count = country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').count())
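
A small sketch on made-up data may help with this question. `sum()` adds up the boolean values, so only the `True` entries contribute, while `count()` returns the number of non-missing entries whether they are `True` or `False`:

```python
import pandas as pd

# Hypothetical answers; none are missing in this toy example.
langs = pd.Series(["Python;SQL", "Java", "C++;Python", "Go"])
mask = langs.str.contains("Python")

n_true = mask.sum()         # True counts as 1, False as 0
n_entries = mask.count()    # every non-missing entry, True or False

print(n_true, n_entries)
```

So on a grouped column, `count()` reports how many respondents gave an answer, not how many of those answers mention Python.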

  • Corey Schafer
    November 12, 2020

    @Corey Schafer thanks man! You told me exactly what I needed to do, and just like that I was done with something I had been crying about for the past two days!!!

  • Corey Schafer
    November 12, 2020

    Yes Corey, having a future video on multi-index will be very helpful!

  • Corey Schafer
    November 12, 2020

    How do I find the nth highest salary for each group?
    Example DataFrame below:

    Employee={'EMPNO':(111,112,114,115,223,226,228,300,333,345,356,320),'Salary':(4000,6000,2000,8000,2000,1000,3000,500,700,300,200,700),'EMPCODE':('MGF','MGR','MGR','MGR','CLERK','CLERK','CLERK','PEON','PEON','PEON','PEON','PEON')}

    Employee

    emp_df=pd.DataFrame(Employee)

    emp_df
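
One possible approach, offered as a sketch rather than the only way to do it, is to rank salaries within each `EMPCODE` group and keep the rows whose rank equals n. The helper name `nth_highest` is hypothetical, and the data is copied from the comment as written, including the lone 'MGF' code:

```python
import pandas as pd

employee = {
    "EMPNO": (111, 112, 114, 115, 223, 226, 228, 300, 333, 345, 356, 320),
    "Salary": (4000, 6000, 2000, 8000, 2000, 1000, 3000, 500, 700, 300, 200, 700),
    "EMPCODE": ("MGF", "MGR", "MGR", "MGR", "CLERK", "CLERK", "CLERK",
                "PEON", "PEON", "PEON", "PEON", "PEON"),
}
emp_df = pd.DataFrame(employee)


def nth_highest(df, n):
    """Return the rows holding the nth highest distinct salary in each group."""
    # Dense ranking gives tied salaries the same rank and leaves no gaps,
    # so rank 2 is always the second distinct salary in the group.
    ranks = df.groupby("EMPCODE")["Salary"].rank(method="dense", ascending=False)
    return df[ranks == n]


second = nth_highest(emp_df, 2)
print(second)
```

Groups with fewer than n distinct salaries (here the single 'MGF' row when n is 2) simply return no rows.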

  • Corey Schafer
    November 12, 2020

    Hey Corey Schafer Thanks a lot for this amazing series which helped me to upgrade my skills which I was unaware of that.

  • Corey Schafer
    November 12, 2020

    Note: when applying groupby, passing the column name to the function didn't work for me.
    I had to pass the DataFrame column I wanted to group on to groupby() itself, like this:
    country_group = df.groupby(df['Country'])
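
For comparison, here is a sketch on made-up data (the rows and the `ConvertedComp` column are assumptions) showing that `groupby` accepts either the column name or the column itself. If the column-name form fails on your data, a misspelled or whitespace-padded column name is one possible cause worth checking:

```python
import pandas as pd

# Hypothetical rows for illustration.
df = pd.DataFrame({
    "Country": ["India", "India", "Algeria"],
    "ConvertedComp": [10000, 30000, 20000],
})

# Group by column name.
by_name = df.groupby("Country")["ConvertedComp"].median()

# Group by the column (a Series) itself.
by_series = df.groupby(df["Country"])["ConvertedComp"].median()

print(by_name.equals(by_series))
```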

  • Corey Schafer
    November 12, 2020

    Happy to see Algeria in Stack Overflow's country list in the survey_results dataset, LOL. BTW, thanks a lot, man. Your vids are very helpful.
