Group By – Pandas

“There should be one-- and preferably only one --obvious way to do it,” says the Zen of Python. I certainly wish that were the case with pandas. Reading the docs, it feels like there are a thousand ways to do each operation, and it is hard to tell whether they do exactly the same thing or which one you should use. That’s why I made An Opinionated Guide to pandas: to present one consistent (and a bit opinionated) way of doing data science with pandas and cut out all the confusion and cruft.

I’ll talk about which methods I use and why I use them, and, most importantly, tell you about the stuff that I’ve never touched in my years of data science practice. If this sounds helpful to you, then please watch and leave feedback in the comments.

This series is beginner-friendly but aimed most directly at intermediate users.

“Opinionated Guide – Group Operations” contents:
https://github.com/knathanieltucker/pandas-tutorial/blob/master/notebooks/Group%20Operations.ipynb

Helpful links:
pandas.DataFrame.groupby(): https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html

An Opinionated Guide to pandas — Intro and Environment Setup: https://youtu.be/S0RPvghGmlQ

An Opinionated Guide to pandas – Intro to Data Structures: Series: https://youtu.be/HKVsVksViXo

An Opinionated Guide to pandas – Intro to Data Structures: DataFrames: https://youtu.be/HKVsVksViXo

An Opinionated Guide to pandas – Intro to Data Structures P3: https://youtu.be/Z6RU_MFjevU

An Opinionated Guide to pandas — Indexing and Selecting: https://youtu.be/Pau9An-fQZk

Link to GitHub repo including environment setup for tutorials: https://github.com/knathanieltucker/pandas-tutorial

Link to GitHub Intro To Data Structures Jupyter Notebook: https://github.com/knathanieltucker/pandas-tutorial/blob/master/notebooks/Pandas%20Intro%20to%20Data%20Structures.ipynb

PEP 20 – The Zen of Python link: https://www.python.org/dev/peps/pep-0020/


Comment List

  • Data Talks
    December 14, 2020

    Dear students, there will be a hands-on online practical class on Python programming (sem-3 syllabus) from 5:00 pm onwards. Check your Python software updates and revise the previous parts for better understanding. YouTube live links are provided here: https://youtu.be/SkDlpTirxmM

    https://youtu.be/09SFWPrKkNA

    https://youtu.be/3aNKOx1_KuY

    https://youtu.be/mWUQrvTXN4o

    After each YouTube live session, you will get 10-15 minutes to practice on your respective laptops/desktops and clear your doubts with me. Also, submit scanned images of the printouts of all programs via email. Thank you, Victoria Mam

  • Data Talks
    December 14, 2020

    Thank you so much, the way you explain things is very clear. I'm going to look for the tutorial on working with multi-index columns.

  • Data Talks
    December 14, 2020

    Brilliant content dude. Thank you.

  • Data Talks
    December 14, 2020

    Lecture notes – Group By
    1. Groupby: specify the column(s) -> df_gb = df.groupby(['aa', 'bb'])
    2. Agg: df_gb.agg({dictionary}) -> the result has a weird structure (multi-index rows and columns); to fix this:
    2-1. Multi-index (rows): use df_agg.reset_index()
    2-2. Multi-index (columns):
    df_agg.columns = ['__'.join(col).strip() for col in df_agg.columns.values]
    3. filter, transform
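
The steps in these notes can be sketched roughly as follows. This is only an illustration, not the notebook's code: the toy data and the `val` column are assumptions (only `aa` and `bb` appear in the notes), and the names `df_flat`, `group_total`, and `big_groups` are made up for the example. The columns are flattened before calling `reset_index()`, so the group keys don't pick up a trailing `__`.

```python
import pandas as pd

# Hypothetical toy data; 'aa' and 'bb' come from the notes, 'val' is assumed.
df = pd.DataFrame({
    "aa": ["x", "x", "y", "y"],
    "bb": ["p", "p", "q", "r"],
    "val": [1, 2, 3, 4],
})

# 1. Group by one or more columns.
df_gb = df.groupby(["aa", "bb"])

# 2. Aggregate with a dict of column -> list of functions.
#    The result has a MultiIndex on the rows (aa, bb) and on the
#    columns (('val', 'sum'), ('val', 'mean')).
df_agg = df_gb.agg({"val": ["sum", "mean"]})

# 2-2. Flatten the column MultiIndex into single strings like 'val__sum'.
df_agg.columns = ["__".join(col).strip() for col in df_agg.columns.values]

# 2-1. Move the row MultiIndex (aa, bb) back into ordinary columns.
df_flat = df_agg.reset_index()
print(df_flat)

# 3a. transform: returns a result aligned to the original rows,
#     here the per-'aa' total of 'val' repeated on every row of the group.
df["group_total"] = df.groupby("aa")["val"].transform("sum")

# 3b. filter: keeps or drops whole groups based on a condition,
#     here only the 'aa' groups whose 'val' total exceeds 5.
big_groups = df.groupby("aa").filter(lambda g: g["val"].sum() > 5)
print(big_groups)
```

With this data, `df_flat` ends up with the plain columns `aa`, `bb`, `val__sum`, and `val__mean`, and `big_groups` keeps only the rows where `aa` is `y`.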

  • Data Talks
    December 14, 2020

    Great video. I am an experienced SQL/ETL engineer working on my (weak) pandas game. I ran into this exact multi-index column stuff yesterday and it made my head hurt for a few minutes… funny that I would stumble on this video today.

  • Data Talks
    December 14, 2020

    I use .apply() and I find it very slow.
