Python pandas — Chipotle Exercises





“There should be one-- and preferably only one --obvious way to do it,” says the Zen of Python. I certainly wish that were the case with pandas. Reading the docs, it feels like there are a thousand ways to do each operation, and it is hard to tell whether they do exactly the same thing or which one you should use. That’s why I made An Opinionated Guide to pandas: to present one consistent (and a bit opinionated) way of doing data science with pandas and cut out all the confusion and cruft.

In this video I work through the exercises cold, so this should be entertaining for you.

I’ll talk about which methods I use, why I use them and most importantly tell you the stuff that I’ve never touched in my years of data science practice. If this sounds helpful to you then please watch and provide feedback in your comments.

This series is beginner-friendly but aimed most directly at intermediate users.

“Getting and Knowing – Chipotle” contents:
Exercise

13:31. Step 14. This answer might be incorrect: summing item_price alone fails to account for quantity. Perhaps a better solution is: revenue = (chipo.item_price * chipo.quantity).sum()
13:44. Step 15. Quantity of orders. The question is ambiguous. Perhaps a better solution is: count_unique_orders = chipo.order_id.nunique()
13:55. Step 16. Average revenue per order. The two changes above alter the result, but the definition stays the same: avg revenue/order = revenue / count_unique_orders.
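
The three corrections above can be sketched together on a small made-up frame. The column names follow the exercise, but the rows are invented for illustration, and the sketch assumes (as the notes above do) that item_price is a per-unit price already converted to float.

```python
import pandas as pd

# Toy stand-in for the Chipotle data; the values are invented for illustration.
chipo = pd.DataFrame({
    "order_id": [1, 1, 2, 3, 3],
    "quantity": [1, 2, 1, 1, 3],
    "item_price": [2.0, 3.0, 10.0, 4.0, 1.0],
})

# Step 14: weight each price by its quantity before summing.
revenue = (chipo.item_price * chipo.quantity).sum()

# Step 15: an order spans several rows, so count distinct order ids.
count_unique_orders = chipo.order_id.nunique()

# Step 16: average revenue per order.
avg_revenue_per_order = revenue / count_unique_orders

print(revenue, count_unique_orders, avg_revenue_per_order)
```

Whether multiplying by quantity is right depends on whether item_price in the real data is a unit price or a line total, which is worth checking on a few rows before trusting either number.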

Helpful links:
An Opinionated Guide to pandas – Intro to Data Structures P1:
https://www.youtube.com/watch?v=HKVsVksViXo&feature=youtu.be
An Opinionated Guide to pandas – Intro to Data Structures P2:
https://www.youtube.com/watch?v=KB-19V-cSs4&feature=youtu.be
An Opinionated Guide to pandas – Intro to Data Structures P3:
https://www.youtube.com/watch?v=Z6RU_MFjevU&feature=youtu.be
Link to GitHub repo including environment setup for tutorials:
https://github.com/knathanieltucker/pandas-tutorial
Link to GitHub Intro To Data Structures Jupyter Notebook:
https://github.com/knathanieltucker/pandas-tutorial/blob/master/notebooks/Pandas%20Intro%20to%20Data%20Structures.ipynb
PEP 20 – The Zen of Python link:
https://www.python.org/dev/peps/pep-0020/


Comment List

  • Data Talks
    December 8, 2020

    Step 13b: lambda x: float(x[1:-1]). Why do we have "-1" here? I can't understand this :c

  • Data Talks
    December 8, 2020

    For me the easiest way to complete steps 9 and 10 is to use the value_counts function:
    chipo.item_name.value_counts().nlargest(1)
    chipo.choice_description.value_counts().nlargest(1)

    To keep only the top entry after value_counts, we can use head(1) or nlargest(1).
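
A minimal sketch of that suggestion, on invented rows rather than the real data. It also shows one caveat: value_counts counts rows, so it can disagree with a quantity-weighted answer when a single row carries a large quantity.

```python
import pandas as pd

# Invented rows for illustration; only the column names follow the exercise.
chipo = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Chips", "Chicken Bowl", "Canned Soda"],
    "quantity": [1, 1, 1, 5],
})

# value_counts sorts by count in descending order, so head(1) and
# nlargest(1) both return the most frequent item name.
row_counts = chipo.item_name.value_counts()
top_by_rows_head = row_counts.head(1)
top_by_rows_nlargest = row_counts.nlargest(1)

# Quantity-weighted alternative: sum quantity per item, then take the max.
quantity_per_item = chipo.groupby("item_name")["quantity"].sum()
top_by_quantity = quantity_per_item.idxmax()

print(top_by_rows_head.index[0], top_by_quantity)
```

Here the row count favours "Chicken Bowl" while the quantity sum favours "Canned Soda", so the right choice depends on how the question is read.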

  • Data Talks
    December 8, 2020

    Shouldn't we use groupby on order_id for question 15?

  • Data Talks
    December 8, 2020

    This may sound a bit naive but how is he opening the repository in his notebook????

  • Data Talks
    December 8, 2020

    Step 15: chipo['order_id'].max()
    Step 16: total_ordered = chipo.groupby('order_id').sum()

    total_ordered.mean()

  • Data Talks
    December 8, 2020

    Phew! Took me 1 hour to get it all done, learned a lot.
    Functions that are used:
    df.shape
    df.info()
    df.columns
    df.index
    df.groupby()
    df['a'].sort_values()
    df['a'].sum()
    df['a'] = df['a'].str[1:]
    df['a'] = pd.to_numeric(df['a'])
    df['a'].nunique()

  • Data Talks
    December 8, 2020

    The URL cannot be opened. What should I do?

  • Data Talks
    December 8, 2020

    1. chipo['item_price'] = chipo['item_price'].str.slice(1).astype(float)
    2. chipo['item_price'] = chipo['item_price'].apply(lambda x: float(x.strip('$')))
    3. chipo['item_price'] = chipo['item_price'].str.strip('$').astype(float)

    The first one runs faster but the second one may be more readable.
    The third one is sort of a combination of the first and second ones.
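
As a quick check that the three variants agree, here is a sketch on a few invented price strings. It assumes every value starts with a single "$" and has no other stray characters, and it does not measure speed.

```python
import pandas as pd

# Invented price strings for illustration, each starting with "$".
raw = pd.Series(["$2.39", "$3.39", "$16.98"], name="item_price")

# 1. Vectorised slice: drop the first character, then cast.
by_slice = raw.str.slice(1).astype(float)

# 2. Element-wise apply: strip "$" from each string and convert.
by_apply = raw.apply(lambda x: float(x.strip("$")))

# 3. Vectorised strip: remove "$" from both ends, then cast.
by_strip = raw.str.strip("$").astype(float)

# All three should produce the same float Series.
print(by_slice.equals(by_apply), by_apply.equals(by_strip))
```

The slice variant silently removes whatever the first character is, while the strip variants only remove "$", so the latter are a little safer if some values lack the symbol.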

  • Data Talks
    December 8, 2020

    The methods idxmax() and max() can be used to obtain the max values in steps 9 and 10:

    item_quants['quantity'].idxmax()
    item_quants['quantity'].max()

  • Data Talks
    December 8, 2020

    At step 3, chipo = pd.read_table(url) is possible, too.

  • Data Talks
    December 8, 2020

    i got the same answer with one line code at 6:52
    df['item_name'].value_counts().head(1)

  • Data Talks
    December 8, 2020

    Hello friends ;

    In the "item_name" column, there are some "-".

    See index 30 ( can be seen via chipo.loc[30] ) == Chips and Tomatillo-Green Chili Salsa

    see index 38 (can be seen via chipo.loc[38] ) == Chips and Tomatillo Green Chili Salsa

    Double-checking ==> chipo.loc[30].item_name==chipo.loc[38].item_name
    output is False
    So before the count, we need to remove "-".

    chipo.item_name=chipo.item_name.str.replace("-" , " ")

    these exercises are really helpful! Thanks again for " pandas learning by doing " videos.
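
The effect of that clean-up can be sketched with the two item names quoted in the comment. The Series here is built by hand for illustration, and regex=False is passed so the hyphen is treated as a literal character.

```python
import pandas as pd

# The two spellings quoted in the comment above, in a hand-built Series.
item_name = pd.Series([
    "Chips and Tomatillo-Green Chili Salsa",
    "Chips and Tomatillo Green Chili Salsa",
])

# Before cleaning, the two spellings count as different items.
distinct_before = item_name.nunique()

# Replace the literal hyphen with a space so both spellings match.
cleaned = item_name.str.replace("-", " ", regex=False)
distinct_after = cleaned.nunique()

print(distinct_before, distinct_after)
```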

  • Data Talks
    December 8, 2020

    We come here to learn, not to run a marathon, so no need to hurry, bro. Go slow, go correct. Very nice job, keep it up.

  • Data Talks
    December 8, 2020

    For question 9, one can boil it down to one command-
    chipo.mode()
    Although your method did make me wonder how I would have done it otherwise.

  • Data Talks
    December 8, 2020

    This is pretty sweet

  • Data Talks
    December 8, 2020

    Hi, wondering why no one else commented about Step 14: revenue is not just the sum of item_price but the sum of item_price * quantity, which is then $39237.02 🙂

  • Data Talks
    December 8, 2020

    Thank you!

  • Data Talks
    December 8, 2020

    Very helpful, nicely done!

  • Data Talks
    December 8, 2020

    Blind speed-runs of exercises like this are fantastic. Great content, very helpful!

  • Data Talks
    December 8, 2020

    Thanks for doing this, but I read one of the questions differently. There are multiple rows for each order, so I used chipo.groupby('order_id').count(), but I still need to add up the quantity it returns.

  • Data Talks
    December 8, 2020

    why are you so cute! really appreciate your updates especially the one about deep learning. Thank you so much! looking forward to your updates often

  • Data Talks
    December 8, 2020

    a little rusty haha

    Those Python techniques with slicing in the float part are amazing, a truly Pythonic way to solve things.
