Solving real world data science tasks with Python Pandas!
In this video we use Python Pandas & Python Matplotlib to analyze and answer business questions about 12 months worth of sales data. The data contains hundreds of thousands of electronics store purchases broken down by month, product type, cost, purchase address, etc.
⭐ Kite is a free AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. I’ve been using Kite for six months and I love it! https://www.kite.com/get-kite/?utm_medium=referral&utm_source=youtube&utm_campaign=keithgalli&utm_content=description-only
GitHub source code & data: https://github.com/KeithGalli/Pandas-Data-Science-Tasks
Installing Jupyter Notebook: https://jupyter.readthedocs.io/en/latest/install.html
Installing Pandas library: https://pandas.pydata.org/pandas-docs/stable/install.html
Check out the first video I did on Pandas:
Detailed video description! (timeline can be found in the comments)
We start by cleaning our data. Tasks during this section include:
– Dropping NaN values from a DataFrame
– Removing rows based on a condition
– Changing the type of columns (to_numeric, to_datetime, astype)
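As a rough illustration of those three cleaning steps, here is a minimal sketch. The column names and sample values are assumptions made up for this example, not taken from the actual dataset, and the exact code in the video may differ.

```python
import pandas as pd

# Hypothetical sample: column names and values are assumptions for illustration.
# It includes an all-NaN row and a stray repeated header row.
df = pd.DataFrame({
    "Order ID": ["1001", None, "Order ID", "1002"],
    "Quantity Ordered": ["2", None, "Quantity Ordered", "1"],
    "Price Each": ["11.95", None, "Price Each", "600.0"],
    "Order Date": ["04/19/19 08:46", None, "Order Date", "04/07/19 22:30"],
})

# 1) Drop rows where every value is NaN.
df = df.dropna(how="all")

# 2) Remove rows based on a condition (here: repeated header rows).
df = df[df["Order ID"] != "Order ID"].copy()

# 3) Change column types with to_numeric, astype and to_datetime.
df["Quantity Ordered"] = pd.to_numeric(df["Quantity Ordered"])
df["Price Each"] = df["Price Each"].astype("float64")
df["Order Date"] = pd.to_datetime(df["Order Date"], format="%m/%d/%y %H:%M")

print(df.dtypes)
```

The .copy() after filtering is just a way to avoid pandas warning about assigning to a slice of another DataFrame.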
Once we have cleaned up our data a bit, we move to the data exploration section. In this section we explore 5 high level business questions related to our data:
– What was the best month for sales? How much was earned that month?
– What city sold the most product?
– What time should we display advertisements to maximize the likelihood of customers buying product?
– What products are most often sold together?
– What product sold the most? Why do you think it sold the most?
To answer these questions we walk through many different pandas & matplotlib methods. They include:
– Concatenating multiple csvs together to create a new DataFrame (pd.concat)
– Adding columns
– Parsing cells as strings to make new columns (.str)
– Using the .apply() method
– Using groupby to perform aggregate analysis
– Plotting bar charts and line graphs to visualize our results
– Labeling our graphs
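Here is a small sketch of how the DataFrame methods in that list fit together (plotting is left out to keep it short). The two tiny frames stand in for the monthly CSV files, and every column name, address, and the get_city helper are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-ins for two monthly CSV files (names and values are assumptions).
april = pd.DataFrame({
    "Product": ["USB-C Charging Cable", "iPhone"],
    "Quantity Ordered": [2, 1],
    "Price Each": [11.95, 700.0],
    "Order Date": ["04/19/19 08:46", "04/07/19 22:30"],
    "Purchase Address": ["917 1st St, Dallas, TX 75001",
                         "682 Chestnut St, Boston, MA 02215"],
})
may = pd.DataFrame({
    "Product": ["Google Phone"],
    "Quantity Ordered": [1],
    "Price Each": [600.0],
    "Order Date": ["05/02/19 13:15"],
    "Purchase Address": ["12 Elm St, Dallas, TX 75001"],
})

# Concatenate the monthly frames into one DataFrame.
all_data = pd.concat([april, may], ignore_index=True)

# Parse part of a string cell with .str to make a new Month column.
all_data["Month"] = all_data["Order Date"].str[0:2].astype("int32")

# Add a column computed from two existing columns.
all_data["Sales"] = all_data["Quantity Ordered"] * all_data["Price Each"]

# Use .apply() with a small helper (hypothetical) to pull the city out of the address.
def get_city(address):
    return address.split(",")[1].strip()

all_data["City"] = all_data["Purchase Address"].apply(get_city)

# Use groupby to aggregate sales per month and per city.
monthly_sales = all_data.groupby("Month")["Sales"].sum()
city_sales = all_data.groupby("City")["Sales"].sum()

print(monthly_sales)
print(city_sales)
```

With real data you would build the list passed to pd.concat by reading each CSV file in the data directory with pd.read_csv.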
If you enjoy this video, make sure to leave it a like and subscribe to not miss any future similar tutorials :).
Check out the new “solving real world data science tasks” video I posted!
0:00 – Intro
1:22 – Downloading the Data
2:57 – Getting started with the code (Jupyter Notebook)
Task #1: Merging 12 csvs into a single dataframe (3:35)
4:25 – Read single CSV file
5:44 – List all files in a directory
7:06 – Concatenating files
11:00 – Reading in Updated dataframe
Task #2: Add a Month column (12:48)
14:12 – Parse string in Pandas cell (.str)
Cleaning our data!
17:31 – Drop NaN values from df
21:25 – Remove rows based on condition
Task #3: Add a sales column (24:58)
25:58 – Another way to convert a column to numeric (ints & floats)
Question #1: What was the best month for sales? (29:20)
30:35 – Visualizing our results with bar chart in matplotlib
Question #2: What city sold the most product? (34:17)
35:32 – Add a city column
36:10 – Using the .apply() method (super useful!!)
40:35 – Why do we use the lambda x ?
40:57 – Dropping a column
46:45 – Answering the question (using groupby)
47:34 – Plotting our results
Question #3: What time should we display ads to maximize the likelihood of purchases? (52:13)
53:16 – Using to_datetime() method
56:01 – Creating hour & minute columns
58:17 – Matplotlib line graph to plot our results
1:00:15 – Interpreting our results
Question #4: What products are most often sold together? (1:02:17)
1:03:31 – Finding duplicate values in our DataFrame
1:05:43 – Use transform() method to join values from two rows into a single row
1:08:00 – Dropping rows with duplicate values
1:09:39 – Counting pairs of products (itertools, collections)
Question #5: What product sold the most? Why do you think it did? (1:14:04)
1:15:28 – Graphing data
1:18:41 – Overlaying a second Y-axis on existing chart
1:23:41 – Interpreting our results
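For anyone who wants a quick feel for the Question #4 steps in the timeline (finding duplicates, transform(), dropping duplicate rows, counting pairs), here is a minimal sketch. The order IDs, product names, and column names are assumptions made up for this example, and the approach shown is one possible way to do it rather than a copy of the video's code.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Hypothetical orders: one row per product in an order (values are assumptions).
orders = pd.DataFrame({
    "Order ID": ["1", "1", "2", "3", "3", "3"],
    "Product": ["iPhone", "Lightning Charging Cable", "Google Phone",
                "iPhone", "Lightning Charging Cable", "Wired Headphones"],
})

# Keep only rows whose Order ID appears more than once (multi-product orders).
multi = orders[orders["Order ID"].duplicated(keep=False)].copy()

# transform() joins the products of each order and writes the joined string
# back onto every row of that order.
multi["Grouped"] = multi.groupby("Order ID")["Product"].transform(
    lambda products: ",".join(products)
)

# Each order now repeats the same Grouped value, so keep one row per order.
multi = multi[["Order ID", "Grouped"]].drop_duplicates()

# Count every pair of products that appears within the same order.
pair_counts = Counter()
for grouped in multi["Grouped"]:
    products = grouped.split(",")
    pair_counts.update(combinations(products, 2))

print(pair_counts.most_common(3))
```

Splitting on "," only works here because none of the sample product names contain a comma. Passing 3 instead of 2 to combinations would count triples instead of pairs.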
If you’re curious to learn how I make my tutorials, check out this video: https://youtu.be/LEO4igyXbLs
*I use affiliate links on the products that I recommend. I’ll earn a purchase commission or a referral bonus from the usage of these links.