Python pandas — Chipotle Exercises
“There should be one-- and preferably only one --obvious way to do it,” says the Zen of Python. I certainly wish that were the case with pandas. Reading the docs, it feels like there are a thousand ways to do each operation, and it is hard to tell whether they do the same thing or which one you should use. That's why I made An Opinionated Guide to pandas: to present one consistent (and somewhat opinionated) way of doing data science with pandas and cut out the confusion and cruft.
In this video I work through the examples cold, so it should be entertaining.
I'll talk about which methods I use, why I use them and, most importantly, which features I have never touched in my years of data science practice. If this sounds helpful, please watch and leave feedback in the comments.
This series is beginner-friendly but aimed most directly at intermediate users.
“Getting and Knowing – Chipotle” contents:
Exercise
13:31, Step 14: This answer might be incorrect. Summing item_price fails to account for quantity. Perhaps a better solution would be: revenue = (chipo.item_price * chipo.quantity).sum()
13:44, Step 15 (quantity of orders): This question is ambiguous. Perhaps a better solution would be: count_unique_orders = chipo.order_id.nunique()
13:55, Step 16 (average revenue per order): The two corrections above change the result, but the average revenue per order is still revenue / count_unique_orders.
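A minimal sketch of steps 14 to 16 on a tiny invented frame (the rows are made up for illustration, not taken from the real dataset). Which revenue formula is right depends on whether item_price is a unit price or already the line total; one way to check is to look at a row with quantity > 1 and see whether its price is a multiple of the single-item price.

```python
import pandas as pd

# Hypothetical sample rows in the same shape as the Chipotle data,
# with item_price already converted to float.
chipo = pd.DataFrame({
    "order_id": [1, 1, 2, 3, 3],
    "quantity": [1, 2, 1, 1, 3],
    "item_name": ["Chips", "Izze", "Chicken Bowl", "Chips", "Canned Soda"],
    "item_price": [2.39, 6.78, 8.75, 2.39, 3.27],
})

# Step 14, two readings of "revenue":
# if item_price is already price * quantity, a plain sum is enough;
revenue_if_line_total = chipo["item_price"].sum()
# if item_price is a per-unit price, weight it by quantity.
revenue_if_unit_price = (chipo["item_price"] * chipo["quantity"]).sum()

# Step 15: count distinct orders rather than rows.
count_unique_orders = chipo["order_id"].nunique()

# Step 16: average revenue per order, using whichever revenue reading fits the data.
avg_revenue_per_order = revenue_if_line_total / count_unique_orders

print(round(revenue_if_line_total, 2), round(revenue_if_unit_price, 2))
print(count_unique_orders, round(avg_revenue_per_order, 2))
```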
Helpful links:
An Opinionated Guide to pandas – Intro to Data Structures P1:
https://www.youtube.com/watch?v=HKVsVksViXo&feature=youtu.be
An Opinionated Guide to pandas – Intro to Data Structures P2:
https://www.youtube.com/watch?v=KB-19V-cSs4&feature=youtu.be
An Opinionated Guide to pandas – Intro to Data Structures P3:
https://www.youtube.com/watch?v=Z6RU_MFjevU&feature=youtu.be
Link to GitHub repo including environment setup for tutorials:
https://github.com/knathanieltucker/pandas-tutorial
Link to GitHub Intro To Data Structures Jupyter Notebook:
https://github.com/knathanieltucker/pandas-tutorial/blob/master/notebooks/Pandas%20Intro%20to%20Data%20Structures.ipynb
PEP 20 – The Zen of Python link:
https://www.python.org/dev/peps/pep-0020/
Step 13b: lambda x: float(x[1:-1]). Why is there a "-1" here? I can't understand this. :c
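x[1:-1] is ordinary Python slicing: start at index 1 (skipping the leading "$") and stop before the last character. That is only safe if every price string ends with one extra character; the raw Chipotle prices appear to carry a trailing space, but that is an assumption worth checking with repr(). A small sketch of what the slice does, plus a hypothetical strip-based helper (parse_price) that does not depend on the trailing space:

```python
# Hypothetical price strings: one with a trailing space, one without.
with_space = "$2.39 "
without_space = "$2.39"

# [1:-1] drops the first character ("$") and the last character.
print(repr(with_space[1:-1]))     # the trailing space is what gets removed
print(repr(without_space[1:-1]))  # here the last digit is lost instead


def parse_price(text: str) -> float:
    # Strip surrounding whitespace first, then the leading "$".
    return float(text.strip().lstrip("$"))


print(parse_price(with_space), parse_price(without_space))
```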
For me, the easiest way to complete steps 9 and 10 is the value_counts() method:
chipo.item_name.value_counts().nlargest(1)
chipo.choice_description.value_counts().nlargest(1)
After value_counts(), you can use head(1) or nlargest(1) to keep only the top entry.
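One caveat, sketched on invented rows: value_counts() counts rows, so it ranks items by how many order lines mention them, not by how many units were sold. If a step asks for the most-ordered item by quantity, grouping and summing quantity can give a different answer.

```python
import pandas as pd

# Hypothetical rows: Chips appears on more lines, but Canned Soda sells more units.
chipo = pd.DataFrame({
    "item_name": ["Chips", "Chips", "Chips", "Canned Soda"],
    "quantity": [1, 1, 1, 5],
})

# Most frequent item by number of rows.
by_rows = chipo["item_name"].value_counts().nlargest(1)

# Most ordered item by total quantity.
by_quantity = chipo.groupby("item_name")["quantity"].sum().nlargest(1)

print(by_rows)
print(by_quantity)
```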
Shouldn't we use groupby on order_id for question 15?
This may sound a bit naive, but how is he opening the repository in his notebook?
Step 15: chipo['order_id'].max()
Step 16: total_ordered = chipo.groupby('order_id')['item_price'].sum()
total_ordered.mean()
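order_id.max() equals the number of orders only if the IDs run from 1 with no gaps; nunique() does not rely on that. Selecting item_price before summing also keeps groupby from trying to add up text columns. A sketch with invented IDs that skip a number:

```python
import pandas as pd

# Hypothetical data: order 3 is missing, so the IDs are not contiguous.
chipo = pd.DataFrame({
    "order_id": [1, 1, 2, 4],
    "item_price": [2.39, 3.39, 8.75, 4.45],
})

max_id = chipo["order_id"].max()         # the largest ID, not a count
n_orders = chipo["order_id"].nunique()   # the number of distinct orders

# Per-order totals, then their mean.
per_order = chipo.groupby("order_id")["item_price"].sum()
avg_per_order = per_order.mean()

print(max_id, n_orders, round(avg_per_order, 2))
```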
Phew! It took me an hour to get it all done, and I learned a lot.
Functions that are used:
df.shape
df.info()
df.columns
df.index
df.groupby()
df['a'].sort_values()
df['a'].sum()
df['a'] = df['a'].str[1:]
df['a'] = pd.to_numeric(df['a'])
df['a'].nunique()
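A quick runnable tour of the functions listed above, on a made-up frame where column "a" holds price strings as in the exercise (the data is invented):

```python
import pandas as pd

df = pd.DataFrame({
    "order_id": [1, 1, 2],
    "a": ["$2.39", "$3.39", "$2.39"],
})

print(df.shape)          # (rows, columns)
df.info()                # dtypes and non-null counts; info is a method call
print(list(df.columns))
print(df.index)

# Drop the leading "$" and convert the strings to numbers.
df["a"] = df["a"].str[1:]
df["a"] = pd.to_numeric(df["a"])

print(df["a"].sort_values())
print(df["a"].sum())
print(df["a"].nunique())                  # number of distinct prices
print(df.groupby("order_id")["a"].sum())  # total per order
```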
The URL cannot be opened. What should I do?
1. chipo['item_price'] = chipo['item_price'].str.slice(1).astype(float)
2. chipo['item_price'] = chipo['item_price'].apply(lambda x: float(x.strip('$')))
3. chipo['item_price'] = chipo['item_price'].str.strip('$').astype(float)
The first one runs faster, but the second one may be more readable.
The third one is a combination of the first two.
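A sketch that checks the three variants agree on invented price strings and times them with timeit. Timings depend on the machine, data size and pandas version, so no numbers are claimed here; running it on the real column shows which is faster for you.

```python
import timeit

import pandas as pd

# Hypothetical price column, repeated so timing differences are measurable.
prices = pd.Series(["$2.39", "$3.39", "$16.98", "$8.75"] * 1000)


def via_slice(s):
    return s.str.slice(1).astype(float)


def via_apply(s):
    return s.apply(lambda x: float(x.strip("$")))


def via_strip(s):
    return s.str.strip("$").astype(float)


methods = (via_slice, via_apply, via_strip)
results = [f(prices) for f in methods]

# All three should produce the same float Series.
for r in results[1:]:
    pd.testing.assert_series_equal(results[0], r)

for f in methods:
    seconds = timeit.timeit(lambda: f(prices), number=20)
    print(f"{f.__name__}: {seconds:.4f}s for 20 runs")
```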
The idxmax() and max() methods can be used for steps 9 and 10: idxmax() returns the item with the largest quantity and max() returns that quantity:
item_quants['quantity'].idxmax()
item_quants['quantity'].max()
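A sketch of the idxmax()/max() pair, assuming item_quants is the per-item sum built in the video (the rows here are invented). If two items tie, idxmax() returns only the first one.

```python
import pandas as pd

# Hypothetical rows; item_quants holds the total quantity per item.
chipo = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Chips", "Chicken Bowl", "Izze"],
    "quantity": [2, 1, 1, 2],
})

item_quants = chipo.groupby("item_name").sum()

top_item = item_quants["quantity"].idxmax()   # label of the largest total (step 9)
top_count = item_quants["quantity"].max()     # the largest total itself (step 10)

print(top_item, top_count)
```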
At step 3, chipo = pd.read_table(url) is possible, too.
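pd.read_table uses a tab as its default separator, so it matches pd.read_csv(..., sep="\t"). The sketch below reads an in-memory sample instead of the real URL; if the URL cannot be opened, downloading the file and passing its local path should work the same way.

```python
import io

import pandas as pd

# A tiny tab-separated sample standing in for the real file; in the exercise
# you would pass the dataset's URL (or a local path) instead of this buffer.
tsv_text = (
    "order_id\tquantity\titem_name\titem_price\n"
    "1\t1\tChips and Fresh Tomato Salsa\t$2.39 \n"
    "1\t1\tIzze\t$3.39 \n"
)

chipo_a = pd.read_table(io.StringIO(tsv_text))          # tab is the default separator
chipo_b = pd.read_csv(io.StringIO(tsv_text), sep="\t")  # same result, separator spelled out

print(chipo_a.equals(chipo_b))
print(chipo_a.shape)
```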
I got the same answer with one line of code at 6:52:
df['item_name'].value_counts().head(1)
Hello friends,
In the item_name column, some names contain a hyphen ("-").
Index 30 (chipo.loc[30]): Chips and Tomatillo-Green Chili Salsa
Index 38 (chipo.loc[38]): Chips and Tomatillo Green Chili Salsa
Double-checking: chipo.loc[30].item_name == chipo.loc[38].item_name
returns False.
So before counting, we need to replace the hyphens:
chipo.item_name = chipo.item_name.str.replace("-", " ")
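A sketch of the effect on invented rows: the two spellings count as separate items until the hyphen is replaced. Replacing every hyphen also rewrites names that legitimately contain one, so comparing value_counts() before and after is a reasonable sanity check.

```python
import pandas as pd

chipo = pd.DataFrame({
    "item_name": [
        "Chips and Tomatillo-Green Chili Salsa",
        "Chips and Tomatillo Green Chili Salsa",
        "Chicken Bowl",
    ]
})

before = chipo["item_name"].nunique()   # the two spellings are distinct values

# regex=False treats "-" as a literal character.
chipo["item_name"] = chipo["item_name"].str.replace("-", " ", regex=False)

after = chipo["item_name"].nunique()    # now they collapse into one item
print(before, after)
```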
These exercises are really helpful! Thanks again for the "pandas learning by doing" videos.
We come here to learn, not to run a marathon, so no need to hurry, bro: go slow, go correct. Very nice job, keep it up.
For question 9, you can boil it down to one command:
chipo.mode()
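DataFrame.mode() computes the most frequent value for every column at once, counting rows rather than quantities, and pads shorter results with NaN; selecting one column keeps the output focused. If several values tie, mode() returns all of them. Sketched on invented rows:

```python
import pandas as pd

chipo = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Chips", "Chicken Bowl", "Izze", "Izze"],
    "quantity": [1, 1, 1, 1, 1],
})

# Mode of a single column; the tie between two items yields two entries.
item_modes = chipo["item_name"].mode()
print(item_modes.tolist())

# Mode of every column; quantity has one mode, so its second row is NaN.
all_modes = chipo.mode()
print(all_modes)
```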
Although your method did make me wonder how I would have done it otherwise.
This is pretty sweet
Hi, I'm wondering why no one else commented on Step 14: revenue is not just the sum of item_price but the sum of item_price * quantity, which comes to $39237.02 🙂
Thank you!
Very helpful, nicely done!
Blind speed-runs of exercises like this are fantastic. Great content, very helpful!
Thanks for doing this, but I read one of the questions differently. There are multiple rows for each order. I did this, but the returned quantity still needs to be added up: chipo.groupby('order_id').count()
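groupby(...).count() counts non-null values per column within each order, so it reports order lines, not items. Summing quantity instead adds up the units, which seems to be what the comment is after. Sketched on invented rows:

```python
import pandas as pd

chipo = pd.DataFrame({
    "order_id": [1, 1, 2],
    "quantity": [1, 3, 2],
    "item_name": ["Chips", "Izze", "Chicken Bowl"],
})

# count() gives the number of rows (order lines) per order, for every column.
lines_per_order = chipo.groupby("order_id").count()

# Summing quantity gives the number of items actually ordered per order.
items_per_order = chipo.groupby("order_id")["quantity"].sum()

print(lines_per_order["quantity"].tolist(), items_per_order.tolist())
```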
Why are you so cute! I really appreciate your updates, especially the one about deep learning. Thank you so much! Looking forward to more updates.
a little rusty haha
Those Python slicing techniques in the float conversion are amazing, a truly Pythonic way to solve things.