Graphing/visualization – Data Analysis with Python and Pandas p.2
Doing some basic visualizations with our Pandas dataframe in Python with Matplotlib.
Text-based tutorial: https://pythonprogramming.net/graph-visualization-python3-pandas-data-analysis/
Channel membership: https://www.youtube.com/channel/UCfzlCWGWYyIQ0aLC5w48gBQ/join
Discord: https://discord.gg/sentdex
Support the content: https://pythonprogramming.net/support-donate/
Twitter: https://twitter.com/sentdex
Facebook: https://www.facebook.com/pythonprogramming.net/
Twitch: https://www.twitch.tv/sentdex
G+: https://plus.google.com/+sentdex
How many cups do you have, dude?
Why did it go more quickly with organic?
For me, albany_df.plot() didn't work.
I changed it to:
import matplotlib.pyplot as plt
plt.show()
I am getting this error continuously:
MemoryError: Unable to allocate 84.5 MiB for an array with shape (11075584, 1) and data type float64
MAN I LOVE YOU, YOU ROXXX
I really don't understand "rolling". Can anyone explain?
Why Albany?
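A rough sketch of what rolling does, using made-up prices rather than the avocado data (the series name and the window of 3 are just assumptions for illustration; the video uses a window of 25):

```python
import pandas as pd

# Hypothetical toy prices, one per day (not the avocado data).
prices = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    index=pd.date_range("2020-01-01", periods=5, freq="D"),
)

# rolling(3) slides a 3-row window down the series; mean() averages each window.
# The first two rows do not have 3 values yet, so they come out as NaN.
ma3 = prices.rolling(3).mean()
print(ma3)
```

Those leading NaN rows are also why a moving-average column starts with a gap until it is cleaned up with dropna().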
At 18:14, is graph_df.join(region_df[f'{region}_price25ma']) a left join? Does that mean data on the right with dates that are not on the left gets dropped? That implies all regions have the same dates. Couldn't one region have more data (say 2000-2019) while another covers 1996-2017? For example, if California has no data on 9-15-2020 but Oregon does, the join will drop that Oregon row because it is a left join. Is pandas' default join a left join? Thanks.
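A small sketch that may help answer this, with two hypothetical frames (the names left and right and the values are invented). DataFrame.join defaults to how="left", so dates that exist only on the right side are dropped unless you ask for an outer join:

```python
import pandas as pd

# Two toy frames whose date indexes only partly overlap.
left = pd.DataFrame(
    {"a": [1, 2]}, index=pd.to_datetime(["2020-09-14", "2020-09-15"])
)
right = pd.DataFrame(
    {"b": [10, 20]}, index=pd.to_datetime(["2020-09-15", "2020-09-16"])
)

# Default: how="left". Keeps only the left index; missing matches become NaN.
default_join = left.join(right)

# how="outer" keeps every date that appears on either side.
outer_join = left.join(right, how="outer")

print(default_join)
print(outer_join)
```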
Another part, another mug. The panda got too much attention.
First of all, thanks for the tutorial. It is really perfect for beginners.
I have a question: when I follow the code, the graphs don't show unless I use matplotlib. Is there something I can do to show the graph without using matplotlib?
I get a KeyError in pandas when using the Albany region.
Please tell me how to fix it.
Can you provide your CSV file?
I just see New York on the graph; I did exactly what you did.
Inches? Oh, my goodness? Must have been a North American programmer who wrote pandas.
What does .dropna mean? NaN means "not a number." So, you want to drop NaNs, right? So, .dropnan? So, does the "na" in .dropna mean 'drop "not a"?'
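As far as I know, the "na" stands for "not available", meaning any missing value (NaN, None, NaT), not only "not a number". A minimal sketch with an invented column:

```python
import numpy as np
import pandas as pd

# Toy frame with one missing price (values are made up).
df = pd.DataFrame({"price": [1.2, np.nan, 1.5]})

# dropna() returns a new frame without the rows that contain missing values.
cleaned = df.dropna()
print(cleaned)
```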
Good tutorial. Easy to follow. Humorous. A little strange with the English, though. "I don't know why . . . RAM is exploding, but I know why. It's because of the date." I had to go back and listen to that sentence a few times to figure out what the instructor meant exactly.
I dropped the null values and I still get a graph with a gap, only this time it has smaller gaps on both ends rather than a single huge one at the beginning. Why is that?
I'm gettin' some pretty damn hinky dates out of Matplotlib.plot() here!
Somehow, the date for 2015-01 gets beautifully formatted, correctly angled and printed as 3984-01 (and so on up to 3987-05), a difference of 1969 years.
It also spits out this warning:
/home/…/python3.8/site-packages/pandas/plotting/_matplotlib/converter.py:256: MatplotlibDeprecationWarning:
The epoch2num function was deprecated in Matplotlib 3.3 and will be removed two minor releases later.
base = dates.epoch2num(dt.asi8 / 1.0e9)
So, it seems to have something to do with the way that Matplotlib handles Unix dates. That kinda makes some sense of the 1969 year difference, but I'm not sure how to fix it. On the other hand, I am still on Matplotlib v3.3, so a deprecated Matplotlib.epoch2num() shouldn't cause that sort of error yet.
Weird, huh?
I liked the first part, but this one here leaves me quite puzzled. If I understood correctly, you kind of dropped the "type" column which leaves you with just a subset of the original data. But what if you wanted to have both "organic" and "conventional" avocados in your data set?
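One possible way to keep both types, sketched on invented data (this is not what the video does): pivot the frame so each type gets its own column, with one row per date.

```python
import pandas as pd

# Toy data: every date appears once per type (values are made up).
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2020-01-01", "2020-01-01", "2020-01-08", "2020-01-08"]
    ),
    "type": ["organic", "conventional", "organic", "conventional"],
    "AveragePrice": [1.5, 1.1, 1.6, 1.2],
})

# One row per date, one column per type.
by_type = df.pivot(index="Date", columns="type", values="AveragePrice")
print(by_type)
```

Calling by_type.plot() would then draw one line per type.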
I really got lost at graph_df and the code that follows. Could someone explain it, please?
The rolling graph without sorting the index gives me a pretty nice curve on Google Colab. Maybe the dataset changed?
As you said, RAM is exploding. At 19:40, it gives an error like:
MemoryError: Unable to allocate 84.5 MiB for an array with shape (11075584, 1) and data type float64
Is there a way to create the albany_df selection from both region and type (i.e. something like albany_df = df[df['region'] == 'Albany' and df['type'] == 'organic'])?
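Yes, but with & instead of and, and with each condition in its own parentheses. A minimal sketch on an invented frame that reuses the column names from the video:

```python
import pandas as pd

# Toy frame with the same column names as the avocado CSV (values made up).
df = pd.DataFrame({
    "region": ["Albany", "Albany", "Boston"],
    "type": ["organic", "conventional", "organic"],
    "AveragePrice": [1.5, 1.1, 1.7],
})

# Python's "and" cannot combine two boolean Series; use & element-wise,
# and wrap each comparison in parentheses because & binds tighter than ==.
albany_organic = df[(df["region"] == "Albany") & (df["type"] == "organic")]
print(albany_organic)
```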
graph_df = region_df[[f"{region}_price25ma"]]: I understand the double brackets, but I don't understand f"{region}. What is this? A regular expression? Sentdex didn't explain it at all in the video. I tried looking up regular expressions, but I don't understand the purpose of f"{region}. Can someone who knows explain?
Wow, this tutorial is a brand new experience for me as I relearn Pandas. I never noticed that some simple code could make RAM explode, and I had my first RAM crash.
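It is not a regular expression. It is an f-string (formatted string literal, Python 3.6+): the f before the opening quote tells Python to replace {region} with the value of the variable region. A tiny sketch, with "Albany" as an assumed value:

```python
region = "Albany"

# The f prefix makes Python substitute the variable inside the braces.
column_name = f"{region}_price25ma"
print(column_name)

# Without the prefix, the braces stay in the string literally.
plain = "{region}_price25ma"
print(plain)
```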
File "<ipython-input-74-a9eaa4b5bef1>", line 11
graph_df = region_df[[f'{region}_price25ma"]]
^
SyntaxError: EOL while scanning string literal
Does anyone know what is happening?
I typed exactly what the video shows.
I'm struggling to follow along...
Why would someone dislike this tutorial? I replayed it at least twice, rewound a couple of times, and ended up learning. That's what matters, right? Thanks!
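The string in that line opens with a single quote and closes with a double quote, so Python never finds the end of it. A sketch of the same line with matching quotes, run against an invented region_df (the frame contents and the "other" column are assumptions):

```python
import pandas as pd

region = "Albany"

# Hypothetical stand-in for region_df from the video.
region_df = pd.DataFrame({f"{region}_price25ma": [1.0, 1.1], "other": [0, 1]})

# Opening and closing quotes must be the same character.
# The double brackets select a list of columns, so the result is a DataFrame.
graph_df = region_df[[f"{region}_price25ma"]]
print(graph_df)
```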
Guys, the reason the graph is "noisy" is that each date appears twice, once for type = organic and once for type = conventional. If you filter by type like this, you should get a smooth graph:
import pandas as pd
df = pd.read_csv("C:/Users/windows 7/Downloads/avocado-prices/avocado.csv")
df['Date'] = pd.to_datetime(df["Date"])
# .copy() avoids a SettingWithCopyWarning when modifying the slice in place
albany_df = df[df['region'] == 'Albany'].copy()
albany_df.set_index('Date', inplace=True)
# keep only the organic rows so each date appears once, then plot the price
albany_df[albany_df['type'] == 'organic']["AveragePrice"].plot()
Hello sir,
The first 15 minutes were awesome, but after that you started speaking an alien language. What should I do to understand it? I want to follow your tutorials, but I can't speak that language.
I get a gap in the chart even after dropna(). Why is that happening?
Why did we use if statements after the for loop?
I got values like "AxesSubplot(0.125,0.2;0.775x0.68)" under price25ma. What's wrong? @sentdex
Hello, @Sentdex. I am a beginner in Python and your videos are awesome. I am trying to run this code but have hit an error and can't figure out the problem.
graph_df.join (region_df['f{region}_price25ma'] seems to ask for a suffix. How can I deal with this? I have included the error below. Anyway, thank you for your videos.
def _join_compat(self, other, on=None, how='left', lsuffix='', rsuffix='',
Error: columns overlap but no suffix specified: Index([u'{region}_price25ma'], dtype='object')
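A guess at the cause, based only on the snippet above: the f sits inside the quotes, so the column name is the same literal text for every region and the second join collides with the first. A sketch that reproduces the error on invented frames (a and b are hypothetical stand-ins for graph_df and region_df):

```python
import pandas as pd

region = "Albany"

literal = 'f{region}_price25ma'    # f inside the quotes: a plain string
formatted = f'{region}_price25ma'  # f before the quotes: an f-string

# Two toy frames that both carry the literal column name.
a = pd.DataFrame({literal: [1.0]})
b = pd.DataFrame({literal: [2.0]})

try:
    a.join(b[literal])  # same column name on both sides
    overlap_error = None
except ValueError as exc:
    overlap_error = str(exc)

print(overlap_error)
print(formatted)
```

The u'...' in the error output also looks like Python 2, where f-strings are not available (they need Python 3.6+), though that is only a guess from the traceback.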
this is priceless for my project, love you dude <3
Yeah, you are right, RAM is really exploding. To check, I printed graph_df and it has more than 10 million rows.
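A toy sketch of why the row count can blow up, assuming the cause is duplicate dates in the index (each date appearing once per type, as another comment here points out). The frames and numbers are invented:

```python
import pandas as pd

# Each date appears twice on both sides, like organic + conventional rows.
dates = pd.to_datetime(
    ["2020-01-01", "2020-01-01", "2020-01-08", "2020-01-08"]
)
left = pd.DataFrame({"a": [1, 2, 3, 4]}, index=dates)
right = pd.DataFrame({"b": [10, 20, 30, 40]}, index=dates)

# Every left row matches every right row with the same date: 2 x 2 per date.
joined = left.join(right)
print(len(joined))

# With one row per date on each side, the join stays the same size.
left_unique = left[~left.index.duplicated()]
right_unique = right[~right.index.duplicated()]
joined_unique = left_unique.join(right_unique)
print(len(joined_unique))
```

Repeating such a join for every region multiplies the rows again each time, so filtering to a single type before building the frames keeps the size under control.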