Solving real world data science tasks with Python Pandas!



In this video we use Python Pandas & Python Matplotlib to analyze and answer business questions about 12 months worth of sales data. The data contains hundreds of thousands of electronics store purchases broken down by month, product type, cost, purchase address, etc.

⭐ Kite is a free AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. I’ve been using Kite for 6 months and I love it! https://www.kite.com/get-kite/?utm_medium=referral&utm_source=youtube&utm_campaign=keithgalli&utm_content=description-only

Setup!
Github source code & data: https://github.com/KeithGalli/Pandas-Data-Science-Tasks
Installing Jupyter Notebook: https://jupyter.readthedocs.io/en/latest/install.html
Installing Pandas library: https://pandas.pydata.org/pandas-docs/stable/install.html

Check out the first video I did on Pandas:
https://youtu.be/vmEHCJofslg

Check out the videos I did on Matplotlib:
https://youtu.be/DAQNHzOcO5A
https://youtu.be/0P7QnIQDBJY

Detailed video description! (timeline can be found in comments)

We start by cleaning our data. Tasks during this section include:
– Dropping NaN values from a DataFrame
– Removing rows based on a condition
– Changing the type of columns (to_numeric, to_datetime, astype)
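
As a rough illustration of those three cleaning steps, here is a minimal sketch. The column names and sample rows are assumptions made up for this example (the real files may differ), and the repeated header row is included because a commenter points out that header lines can end up inside the merged data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample that mimics a merged sales file: one all-NaN row and
# one repeated header row are mixed into the real purchases.
df = pd.DataFrame({
    "Order Date": ["04/19/19 08:46", np.nan, "Order Date", "04/07/19 22:30"],
    "Quantity Ordered": ["2", np.nan, "Quantity Ordered", "1"],
    "Price Each": ["11.95", np.nan, "Price Each", "99.99"],
})

# Drop rows where every value is NaN.
df = df.dropna(how="all")

# Remove rows based on a condition: keep only rows that are not header text.
df = df[df["Order Date"] != "Order Date"].copy()

# Change column types, using each of the three methods named above.
df["Quantity Ordered"] = pd.to_numeric(df["Quantity Ordered"])
df["Price Each"] = df["Price Each"].astype("float")
df["Order Date"] = pd.to_datetime(df["Order Date"], format="%m/%d/%y %H:%M")
```

The `.copy()` after filtering is a defensive choice so that the later column assignments act on an independent DataFrame rather than a slice.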

Once we have cleaned up our data a bit, we move to the data exploration section. In this section we explore 5 high level business questions related to our data:
– What was the best month for sales? How much was earned that month?
– What city sold the most product?
– What time should we display advertisements to maximize the likelihood of customers buying product?
– What products are most often sold together?
– What product sold the most? Why do you think it sold the most?

To answer these questions we walk through many different pandas & matplotlib methods. They include:
– Concatenating multiple csvs together to create a new DataFrame (pd.concat)
– Adding columns
– Parsing cells as strings to make new columns (.str)
– Using the .apply() method
– Using groupby to perform aggregate analysis
– Plotting bar charts and line graphs to visualize our results
– Labeling our graphs

If you enjoy this video, make sure to leave it a like and subscribe to not miss any future similar tutorials :).

Check out the new “solving real world data science tasks” video I posted!
https://youtu.be/Ewgy-G9cmbg

———————————————

Follow me on social media!
Instagram | https://www.instagram.com/keithgalli/
Twitter | https://twitter.com/keithgalli

———————————————

Video Timeline!
0:00 – Intro
1:22 – Downloading the Data
2:57 – Getting started with the code (Jupyter Notebook)

Task #1: Merging 12 csvs into a single dataframe (3:35)
4:25 – Read single CSV file
5:44 – List all files in a directory
7:06 – Concatenating files
11:00 – Reading in Updated dataframe
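
A minimal sketch of Task #1, assuming one CSV per month in a single folder. The file names, folder, and columns below are placeholders created on the fly so the snippet runs anywhere; they are not the repository's actual files.

```python
import os
import tempfile

import pandas as pd

# Build a throwaway folder with two tiny "monthly" files (hypothetical data).
data_dir = tempfile.mkdtemp()
pd.DataFrame({"Order ID": [1, 2], "Product": ["A", "B"]}).to_csv(
    os.path.join(data_dir, "Sales_January.csv"), index=False)
pd.DataFrame({"Order ID": [3], "Product": ["C"]}).to_csv(
    os.path.join(data_dir, "Sales_February.csv"), index=False)

# List all files in the directory and keep only the CSVs.
files = sorted(f for f in os.listdir(data_dir) if f.endswith(".csv"))

# Read each file, then concatenate everything into one DataFrame.
frames = [pd.read_csv(os.path.join(data_dir, f)) for f in files]
all_data = pd.concat(frames, ignore_index=True)

# Save the merged result and read it back in as the updated DataFrame.
out_path = os.path.join(data_dir, "all_data.csv")
all_data.to_csv(out_path, index=False)
all_data = pd.read_csv(out_path)
```

Note that `pd.read_csv` treats the first line of each file as the header, so `pd.concat` itself does not add header rows to the data. Passing a full path to `to_csv` controls where the merged file is written; a bare file name lands in the notebook's working directory.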

Task #2: Add a Month column (12:48)
14:12 – Parse string in Pandas cell (.str)
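
One way to add the Month column with `.str` is sketched below. It assumes the order date is stored as text in a month/day/year layout with a two-digit month first; both the column name and the sample values are assumptions.

```python
import pandas as pd

# Hypothetical order dates stored as plain strings.
all_data = pd.DataFrame({"Order Date": ["04/19/19 08:46", "12/30/19 00:01"]})

# Slice the first two characters of every cell, then convert them to integers.
all_data["Month"] = all_data["Order Date"].str[0:2].astype("int32")
```

The conversion only works once NaN rows and stray header rows have been removed, which is why the cleaning steps sit around this task in the timeline.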

Cleaning our data!
17:31 – Drop NaN values from df
21:25 – Remove rows based on condition

Task #3: Add a sales column (24:58)
25:58 – Another way to convert a column to numeric (ints & floats)
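
A small sketch of Task #3, assuming sales per row is quantity times unit price. The column names and numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical rows where the numeric columns were read in as strings.
all_data = pd.DataFrame({
    "Quantity Ordered": ["2", "1"],
    "Price Each": ["11.95", "99.99"],
})

# pd.to_numeric picks ints or floats as appropriate for each column.
all_data["Quantity Ordered"] = pd.to_numeric(all_data["Quantity Ordered"])
all_data["Price Each"] = pd.to_numeric(all_data["Price Each"])

# New column: element-wise product of two existing columns.
all_data["Sales"] = all_data["Quantity Ordered"] * all_data["Price Each"]
```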

Question #1: What was the best month for sales? (29:20)
30:35 – Visualizing our results with bar chart in matplotlib
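
Question #1 can be approached roughly as follows: group by month, sum the sales, and draw a bar chart. The data is invented, and the `Agg` backend line is only there so the snippet runs without a display (in a Jupyter Notebook it can be dropped).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sales rows that already have Month and Sales columns.
all_data = pd.DataFrame({
    "Month": [1, 1, 2, 12, 12],
    "Sales": [10.0, 20.0, 5.0, 40.0, 60.0],
})

# Total sales per month, and the month with the highest total.
monthly_sales = all_data.groupby("Month")["Sales"].sum()
best_month = monthly_sales.idxmax()

# Bar chart with labeled axes.
fig, ax = plt.subplots()
ax.bar(list(monthly_sales.index), monthly_sales.values)
ax.set_xticks(list(monthly_sales.index))
ax.set_xlabel("Month number")
ax.set_ylabel("Sales in USD ($)")
```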

Question #2: What city sold the most product? (34:17)
35:32 – Add a city column
36:10 – Using the .apply() method (super useful!!)
40:35 – Why do we use the lambda x ?
40:57 – Dropping a column
46:45 – Answering the question (using groupby)
47:34 – Plotting our results
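
A hedged sketch of Question #2. It assumes addresses look like "street, city, state zip"; the helper names `get_city` and `get_state` and the sample addresses are illustrative, not necessarily what the video uses.

```python
import pandas as pd

# Hypothetical purchases with a comma-separated address string.
all_data = pd.DataFrame({
    "Purchase Address": [
        "917 1st St, Dallas, TX 75001",
        "682 Chestnut St, Boston, MA 02215",
        "12 Elm St, Dallas, TX 75001",
    ],
    "Sales": [23.90, 99.99, 10.00],
})

def get_city(address):
    # Second comma-separated piece is the city.
    return address.split(",")[1].strip()

def get_state(address):
    # Third piece looks like " TX 75001"; keep only the state code.
    return address.split(",")[2].strip().split(" ")[0]

# .apply() runs the lambda once per cell; x is that cell's address string.
all_data["City"] = all_data["Purchase Address"].apply(
    lambda x: f"{get_city(x)} ({get_state(x)})")

# Answer the question with groupby.
city_sales = all_data.groupby("City")["Sales"].sum()
cities = [city for city, group in all_data.groupby("City")]
```

Keeping the state next to the city name avoids merging different cities that share a name. Iterating over a groupby yields (key, sub-DataFrame) pairs in sorted key order, so `cities` lines up with the index of `city_sales` when labeling a plot.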

Question #3: What time should we display ads to maximize the likelihood of purchases? (52:13)
53:16 – Using to_datetime() method
56:01 – Creating hour & minute columns
58:17 – Matplotlib line graph to plot our results
1:00:15 – Interpreting our results
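
For Question #3, a minimal sketch of the datetime step looks like this. The date format string and sample timestamps are assumptions, and the number of orders per hour is used as a simple stand-in for purchase likelihood.

```python
import pandas as pd

# Hypothetical order timestamps stored as strings.
all_data = pd.DataFrame({
    "Order Date": ["04/19/19 08:46", "04/07/19 22:30",
                   "04/12/19 22:05", "04/30/19 09:27"],
})

# Convert the strings to real datetimes, then pull out hour and minute.
all_data["Order Date"] = pd.to_datetime(all_data["Order Date"],
                                        format="%m/%d/%y %H:%M")
all_data["Hour"] = all_data["Order Date"].dt.hour
all_data["Minute"] = all_data["Order Date"].dt.minute

# Count orders in each hour; .size() returns one number per group.
orders_per_hour = all_data.groupby("Hour").size()
busiest_hour = orders_per_hour.idxmax()
```

`orders_per_hour` can be passed straight to a matplotlib line plot, with its index as the x values.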

Question #4: What products are most often sold together? (1:02:17)
1:03:31 – Finding duplicate values in our DataFrame
1:05:43 – Use transform() method to join values from two rows into a single row
1:08:00 – Dropping rows with duplicate values
1:09:39 – Counting pairs of products (itertools, collections)
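
The steps for Question #4 could be sketched as below. The column name "Grouped" and the sample orders are assumptions. Sorting each order's products before building pairs follows a commenter's suggestion, so that (A, B) and (B, A) are counted as the same pair; it may not match the video exactly.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Hypothetical orders: an Order ID repeats when several products were bought.
all_data = pd.DataFrame({
    "Order ID": [1, 1, 2, 3, 3, 4],
    "Product": ["iPhone", "Lightning Charging Cable", "USB-C Charging Cable",
                "Lightning Charging Cable", "iPhone", "Wired Headphones"],
})

# Keep only rows whose Order ID appears more than once.
multi = all_data[all_data["Order ID"].duplicated(keep=False)].copy()

# Join all products of an order into one string, repeated on each of its rows.
multi["Grouped"] = multi.groupby("Order ID")["Product"].transform(
    lambda x: ",".join(x))

# Drop the duplicate rows so each order appears once.
orders = multi[["Order ID", "Grouped"]].drop_duplicates()

# Count every pair of products; sorting makes the pair order consistent.
pair_counts = Counter()
for grouped in orders["Grouped"]:
    products = sorted(grouped.split(","))
    pair_counts.update(combinations(products, 2))
```

`pair_counts.most_common(10)` then lists the most frequent pairs. Splitting on commas assumes product names contain no commas themselves.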

Question #5: What product sold the most? Why do you think it did? (1:14:04)
1:15:28 – Graphing data
1:18:41 – Overlaying a second Y-axis on existing chart
1:23:41 – Interpreting our results
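
A rough sketch of the second-Y-axis idea for Question #5: bars for quantity ordered on the left axis and a line for average price on the right axis, built with matplotlib's `twinx()`. Product names, quantities, and prices are invented, and the `Agg` backend line is only needed outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows: a cheap product ordered often, expensive ones rarely.
all_data = pd.DataFrame({
    "Product": ["AAA Batteries (4-pack)", "AAA Batteries (4-pack)",
                "iPhone", "Macbook Pro Laptop"],
    "Quantity Ordered": [4, 3, 1, 1],
    "Price Each": [2.99, 2.99, 700.0, 1700.0],
})

grouped = all_data.groupby("Product")
quantity = grouped["Quantity Ordered"].sum()
prices = grouped["Price Each"].mean()

# Numeric x positions keep the bars and the line aligned on both axes.
x = list(range(len(quantity)))

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second Y-axis sharing the same X-axis

ax1.bar(x, quantity.values, color="g")
ax2.plot(x, prices.values, "b-")

ax1.set_xticks(x)
ax1.set_xticklabels(quantity.index, rotation=90)
ax1.set_xlabel("Product")
ax1.set_ylabel("Quantity Ordered", color="g")
ax2.set_ylabel("Price ($)", color="b")
```

Plotting price on the same chart makes it easier to check whether the best sellers are also the cheapest items.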

———————
If you’re curious to learn how I make my tutorials, check out this video: https://youtu.be/LEO4igyXbLs

*I use affiliate links on the products that I recommend. I may earn a purchase commission or a referral bonus from the usage of these links.


Comment List

  • Keith Galli
    November 10, 2020

    Posted a new "Solving real world data science tasks" video! Check it out here: https://youtu.be/Ewgy-G9cmbg

  • Keith Galli
    November 10, 2020

    it was really great!!!

  • Keith Galli
    November 10, 2020

    Assalamualaikum brother…

    I am the CTO of Light Theory LLC

    I believe that TF Automation, among many innovative things, will save our planet, careers, families, for the long-term, IF we can unite, and work on projects that are needed TODAY!

    I have a project, only meant for a select few…

    in Tensor Flow

    for Pattern Recognition…

    We already have a buyer lined up.

    The project valuation is 120M usd.

    A Strong Dividend of that profit share goes directly to each Developer.

    This is a Project of a Life-time.

    in sha ALLAH, Our Small team, would be very excited to have you join fulltime or even as an Adjunct.

    Please email us

    info@LightTheory.tech

    Salam

    LT

  • Keith Galli
    November 10, 2020

    I'm about an hour in, following along, but my code is taking over 30s to load each time I check it. Is this normal?

  • Keith Galli
    November 10, 2020

    Hi Keith, I download the csv files from your GitHub , after successful creation of single data file, The dates are not in same order whereas in your video the Order Date is in specific format . Could you please tell me the solution for this

  • Keith Galli
    November 10, 2020

    Nice vid! I think there's a mistake in how you select the most popular pair though, and didn't see someone commenting on it (didn't read them all though).

    Your Counter will contain some duplicate pairs due to different ordering. Because the items don't have a predictable order within each purchase, itertools.combinations might make the pair (itemA, itemB) from one purchase but (itemB, itemA) from another. The Counter will consider those to be different, but we shouldn't. One solution is to sort each list before using combinations() on it, so that it only generates alphabetically ordered pairs. No duplicates that way!

    Anyway, I learned stuff. Thanks!

  • Keith Galli
    November 10, 2020

    You really seems a nice guy, I discovered you recently and I'm in love with your data science video. great job dude 🙂 You're look young ! Maybe it's tactless, but how old are you ?

  • Keith Galli
    November 10, 2020

    well organize presentation.Thanks Keith!

  • Keith Galli
    November 10, 2020

    Thank you, there are tons of brilliant programmers on youtube but only a few programmers who are good communicators and teachers.

  • Keith Galli
    November 10, 2020

    Loved this video – especially the real-world approach! Please keep creating more such content! Thank you so much!!

  • Keith Galli
    November 10, 2020

    <pandas.concat> does this concatenate files without column header by default?

  • Keith Galli
    November 10, 2020

    what's the software used to record this video?? if anyone knows.. please let me know

  • Keith Galli
    November 10, 2020

    Wow..awesome video! Nice way to learn pandas in action!

  • Keith Galli
    November 10, 2020

    Hey Keith I am beginning in Python and know only bits and pieces, how much Python is needed to jump into Pandas follow your assignments.

  • Keith Galli
    November 10, 2020

    Nice tutorial but it would be extremely helpful if you took a minute at the beginning to show the final output so people could make a better determination to invest their effort & time.

  • Keith Galli
    November 10, 2020

    Nice tutorials! Just finished your pandas videos! Thanks, man!

  • Keith Galli
    November 10, 2020

    Really useful video.. I subscribed for more.

  • Keith Galli
    November 10, 2020

    I am unable to get the count column while doing.
    >> all_months_data.groupby(['Hour']).count()

  • Keith Galli
    November 10, 2020

    at 23:57 i guess the or data is there because 12 files were concatenated and it also took their header that should be the reason 5 similar header lines

  • Keith Galli
    November 10, 2020

    Thanks Keith

  • Keith Galli
    November 10, 2020

    I don't know why, but I don't have the same result here 33:46

    The lowest month for me is november :
    m qt price sales
    11 3989 6.272135e+05 6.313690e+05

  • Keith Galli
    November 10, 2020

    loved it .. learned so much watching your video. can you please tell how to download the input csv files.

  • Keith Galli
    November 10, 2020

    Great video Keith!

  • Keith Galli
    November 10, 2020

    AMAZING video…even though I followed along I also was able to debug my own mistakes and customize the output from this. Thanks

  • Keith Galli
    November 10, 2020

    Another great video, thanks! Who s the girl standing against the wall on 2 photos on your desktop? Your girlfriend?))

  • Keith Galli
    November 10, 2020

    Eyvallah bro! Thanks from İzmir – Karsiyaka!

  • Keith Galli
    November 10, 2020

    Great video! I enjoyed to watching this 🙂

  • Keith Galli
    November 10, 2020

    The first time I let the ads on a youtube video, because I wanted to watch every second of it. Many thanks Keith, you' re just amazing !

  • Keith Galli
    November 10, 2020

    I can't like this video enough, pls do more data science/analysis related videos

  • Keith Galli
    November 10, 2020

    Loving this tutorial mate, good one.
    can you explain this function for me please?
    Cities = [city for city, df in all_data.groupby('City')]

    thanks

  • Keith Galli
    November 10, 2020

    Hi sir, ur so cute

  • Keith Galli
    November 10, 2020

    Hi – After merging the files – all_data file is being saved in Jupyter homepage and not in the desktop folder – am using the correct file location – can you please help

  • Keith Galli
    November 10, 2020

    Yep…Thank you. Nice job. Keep posting…

  • Keith Galli
    November 10, 2020

    the best part part was watching some one google the answer an seeing how they implement the solution instead of just acting like they know everything. man your tutorials are the best an down to earth
