Graphing/visualization – Data Analysis with Python and Pandas p.2




Doing some basic visualizations with our Pandas dataframe in Python with Matplotlib.

Text-based tutorial: https://pythonprogramming.net/graph-visualization-python3-pandas-data-analysis/

Channel membership: https://www.youtube.com/channel/UCfzlCWGWYyIQ0aLC5w48gBQ/join
Discord: https://discord.gg/sentdex
Support the content: https://pythonprogramming.net/support-donate/
Twitter: https://twitter.com/sentdex
Facebook: https://www.facebook.com/pythonprogramming.net/
Twitch: https://www.twitch.tv/sentdex
G+: https://plus.google.com/+sentdex

Comment List

  • sentdex
    November 25, 2020

    How many cups do you have dude ??

  • sentdex
    November 25, 2020

    Why did it go more quickly with organic?

  • sentdex
    November 25, 2020

    For me, albany_df.plot() didn't work.

    I changed it to:
    import matplotlib.pyplot as plt

    plt.show()

  • sentdex
    November 25, 2020

    I am getting this error continuously:
    MemoryError: Unable to allocate 84.5 MiB for an array with shape (11075584, 1) and data type float64

  • sentdex
    November 25, 2020

    MAN I LOVE YOU, YOU ROXXX

  • sentdex
    November 25, 2020

    I really don't understand “rolling”. Can anyone explain?
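
    For readers with the same question, here is a minimal sketch on made-up numbers (not the avocado data) of what a rolling mean does. Each output value is the average of the current row and the rows just before it. The first rows have no full window, so they come out as NaN.

```python
import pandas as pd

# Made-up prices, purely for illustration.
prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Window of 3: average of the current value and the two before it.
# The first two positions lack a full window and are NaN.
rolling_mean = prices.rolling(3).mean()
print(rolling_mean)
```

    A larger window, such as the 25 used in the video's price25ma column, smooths more but leaves more NaN rows at the start.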

  • sentdex
    November 25, 2020

    Why Albany?

  • sentdex
    November 25, 2020

    At 18:14, is graph_df.join(region_df[f'{region}_price25ma']) a left join? Does that mean rows on the right whose dates are not on the left get dropped? That seems to assume every region has the same dates. Couldn't one region have more data (say 2000-2019) while another covers 1996-2017? For example, if California has no data on 9-15-2020 but Oregon does, the join will drop that Oregon row because it is a left join. Is pandas' default join a left join? Thanks.
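
    DataFrame.join does default to how="left". A small sketch of the behavior asked about, using two hypothetical frames with made-up prices (the column names here are illustrative, not the video's):

```python
import pandas as pd

# Two hypothetical regions with partly different dates (made-up prices).
left = pd.DataFrame(
    {"California_price": [1.0, 1.1]},
    index=pd.to_datetime(["2020-09-01", "2020-09-08"]),
)
right = pd.DataFrame(
    {"Oregon_price": [2.0, 2.1, 2.2]},
    index=pd.to_datetime(["2020-09-01", "2020-09-08", "2020-09-15"]),
)

# Default how="left": only the left index is kept,
# so Oregon's 2020-09-15 row is dropped.
left_joined = left.join(right)

# how="outer" keeps every date from both sides and fills gaps with NaN.
outer_joined = left.join(right, how="outer")

print(len(left_joined), len(outer_joined))  # 2 3
```

    If regions may cover different date ranges, passing how="outer" is one way to avoid silently losing rows.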

  • sentdex
    November 25, 2020

    Another part, another mug 🙂 Panda got too much attention 😀

  • sentdex
    November 25, 2020

    First of all, thanks for the tutorial; it is really perfect for beginners.
    I have a question: when I follow the code, the graphs don't show without using Matplotlib. Is there something I can do to show the graph without using Matplotlib?

  • sentdex
    November 25, 2020

    I get a KeyError in pandas when using the Albany region.
    Please tell me how to fix it.

  • sentdex
    November 25, 2020

    Can you provide your CSV file??

  • sentdex
    November 25, 2020

    I just see New York on the graph; I did exactly what you did.

  • sentdex
    November 25, 2020

    Inches? Oh, my goodness? Must have been a North American programmer who wrote pandas.

  • sentdex
    November 25, 2020

    What does .dropna mean? NaN means "not a number." So, you want to drop NaNs, right? So, .dropnan? So, does the "na" in .dropna mean 'drop "not a"?'

  • sentdex
    November 25, 2020

    Good tutorial. Easy to follow. Humorous. A little strange with the English, though. "I don't know why . . . RAM is exploding, but I know why. It's because of the date." I had to go back and listen to that sentence a few times to figure out what the instructor meant exactly.

  • sentdex
    November 25, 2020

    I dropped the null values and I still get a graph with gaps, only this time it has smaller gaps on both ends rather than a single huge one at the beginning. Why is that?

  • sentdex
    November 25, 2020

    I'm gettin' some pretty damn hinky dates out of Matplotlib.plot() here!

    Somehow, the date for 2015-01 gets beautifully formatted, correctly angled and printed as 3984-01 (and so on up to 3987-05), a difference of 1969 years.

    It also spits out this warning:
    /home/…/python3.8/site-packages/pandas/plotting/_matplotlib/converter.py:256: MatplotlibDeprecationWarning:
    The epoch2num function was deprecated in Matplotlib 3.3 and will be removed two minor releases later.
    base = dates.epoch2num(dt.asi8 / 1.0e9)

    So, it seems to have something to do with the way that Matplotlib handles Unix dates. That kinda makes some sense of the 1969 year difference, but I'm not sure how to fix it. On the other hand, I am still on Matplotlib v3.3, so a deprecated Matplotlib.epoch2num() shouldn't cause that sort of error yet.

    Weird, huh?

  • sentdex
    November 25, 2020

    I liked the first part, but this one here leaves me quite puzzled. If I understood correctly, you kind of dropped the "type" column which leaves you with just a subset of the original data. But what if you wanted to have both "organic" and "conventional" avocados in your data set?

  • sentdex
    November 25, 2020

    I really got lost at graph_df and the code that follows. Could someone explain it, please?

  • sentdex
    November 25, 2020

    The rolling graph without sorting the indices gives me a pretty nice curve on Google Colab. Maybe the dataset changed?

  • sentdex
    November 25, 2020

    As you said, RAM is exploding))). At 19:40, it gives an error like:
    MemoryError: Unable to allocate 84.5 MiB for an array with shape (11075584, 1) and data type float64

  • sentdex
    November 25, 2020

    Is there a way to create the albany_df selection from both region and type, i.e. something like albany_df = (df[df['region'] == 'Albany' and df['type'] == 'organic'])?
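
    One common way to combine two conditions in pandas is element-wise & with each comparison in parentheses. The keyword `and` raises an error on Series. A minimal sketch on a tiny made-up frame that only borrows the column names from the avocado CSV:

```python
import pandas as pd

# Tiny made-up frame with the same column names as the avocado CSV.
df = pd.DataFrame({
    "region": ["Albany", "Albany", "Boston", "Albany"],
    "type": ["organic", "conventional", "organic", "organic"],
    "AveragePrice": [1.5, 1.1, 1.7, 1.6],
})

# Use & (element-wise and), not the keyword `and`, and wrap each
# comparison in parentheses because & binds tighter than ==.
mask = (df["region"] == "Albany") & (df["type"] == "organic")

# .copy() gives an independent frame that is safe to modify later.
albany_organic_df = df[mask].copy()

print(albany_organic_df)
```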

  • sentdex
    November 25, 2020

    In graph_df = region_df[[f"{region}_price25ma"]], I understand the double brackets, but I don't understand f"{region}. What is this? A regular expression? Sentdex didn't explain it at all in his video. I tried looking up regular expressions, but I still don't understand the purpose of f"{region}. Can someone who knows explain?
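
    The f prefix is not a regular expression. It marks a formatted string literal (f-string, Python 3.6+), which fills in the value of whatever expression sits inside the braces. A short sketch, with "Albany" chosen only as an example value:

```python
region = "Albany"  # example value; in the video this comes from a loop

# The f-string substitutes the value of `region` into the braces.
column_name = f"{region}_price25ma"
print(column_name)  # Albany_price25ma

# Equivalent spellings without an f-string:
same_by_concat = region + "_price25ma"
same_by_format = "{}_price25ma".format(region)
```

    So region_df[[f"{region}_price25ma"]] simply selects the column whose name is built from the current region.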

  • sentdex
    November 25, 2020

    Wow, this tutorial is a brand new experience for me in relearning Pandas. I never noticed that some simple code could make RAM explode, and I had my first RAM crash.

  • sentdex
    November 25, 2020

    File "<ipython-input-74-a9eaa4b5bef1>", line 11

    graph_df = region_df[[f'{region}_price25ma"]]

    ^

    SyntaxError: EOL while scanning string literal

    Does anyone know what is happening?
    I typed exactly what the video shows.

  • sentdex
    November 25, 2020

    I struggle to keep up...

  • sentdex
    November 25, 2020

    Why would someone dislike this tutorial? I replayed it at least twice, rewound a couple of times, and ended up learning. That's what matters, right? Thanks!

  • sentdex
    November 25, 2020

    Guys, the reason the graph is 'noisy' is that every date appears twice, once for type = organic and once for type = conventional. If you filter by type like this, you should get a smooth graph:

    import pandas as pd

    df = pd.read_csv("C:/Users/windows 7/Downloads/avocado-prices/avocado.csv")
    df['Date'] = pd.to_datetime(df["Date"])

    # .copy() so the in-place set_index below does not act on a slice of df
    albany_df = df[df['region']=='Albany'].copy()
    albany_df.set_index('Date', inplace = True)

    albany_df[albany_df['type']=='organic']["AveragePrice"].plot()
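
    The same point can be checked without the CSV. The sketch below uses a made-up stand-in for the Albany rows (the values are not from the real file) and only inspects whether the date index has duplicates before and after filtering by type:

```python
import pandas as pd

# Made-up stand-in for the Albany rows: every date appears twice,
# once per type (values are not from the real CSV).
albany_df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2018-01-07", "2018-01-07", "2018-01-14", "2018-01-14"]
    ),
    "type": ["conventional", "organic", "conventional", "organic"],
    "AveragePrice": [1.0, 1.6, 1.1, 1.7],
}).set_index("Date")

# Duplicate dates in the index are what make a line plot zigzag.
print(albany_df.index.is_unique)  # False

# After keeping one type, each date appears once.
organic = albany_df[albany_df["type"] == "organic"]
print(organic.index.is_unique)  # True
```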

  • sentdex
    November 25, 2020

    Hello sir,
    The first 15 minutes were awesome, but after that you started speaking an alien language. What should I do to understand it? I want to follow your tutorials; I just can't speak that language.

  • sentdex
    November 25, 2020

    I get a gap in the chart even after dropna(). Why is that happening?

  • sentdex
    November 25, 2020

    Why did we use if statements after the for loop?

  • sentdex
    November 25, 2020

    I got values like "AxesSubplot(0.125,0.2;0.775x0.68)" under price25ma. What's wrong? @sentdex

  • sentdex
    November 25, 2020

    Hello, @Sentdex. I am a beginner in Python and your videos are awesome. I am trying to run this code but have encountered an error and can't figure out the problem.

    graph_df.join (region_df['f{region}_price25ma'] seems to ask for a suffix. How can I deal with this? I have included the error as well. Anyway, thank you for your videos.
    def _join_compat(self, other, on=None, how='left', lsuffix='', rsuffix='',

    Error: columns overlap but no suffix specified: Index([u'{region}_price25ma'], dtype='object')

  • sentdex
    November 25, 2020

    this is priceless for my project, love you dude <3

  • sentdex
    November 25, 2020

    Yeah, you are right, RAM is really exploding. To check, I printed graph_df and there are more than 10 million rows.
