Data Visualization Cheat Sheet with Seaborn and Matplotlib | by Chi Nguyen | Nov, 2020
Exploratory Data Analysis (EDA) is an indispensable step in data mining. To interpret aspects of a data set such as its distribution or underlying patterns, we need to visualize the data in different graphs or images. Fortunately, Python offers many libraries that make visualization more convenient than ever. Some of the most widely used today are Matplotlib, Seaborn, Plotly, and Bokeh.
Since my job concentrates on scrutinizing data from all angles, I have been exposed to many types of graphs. However, because there are so many functions and the code is not easy to remember, I sometimes forget the syntax and have to review or search for similar code on the Internet. This has wasted a lot of my time, hence my motivation for writing this article. Hopefully, it can be a small help to anyone who has the memory of a goldfish like me.
My dataset comes from Kaggle’s public datasets. It is a grocery dataset, and you can easily get the data from the link below:
This grocery data consists of 3 columns, which are:
- Member_number: customer ID number
- Date: date of purchase
- itemDescription: item name
Now, let’s have a look at the data frame and its information:
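As a minimal sketch of this step, the snippet below builds a tiny stand-in data frame with the same three columns and prints its first rows and summary. The sample rows, the `dd-mm-yyyy` date strings, and the file name in the comment are assumptions for illustration, not values taken from the real Kaggle file.

```python
import pandas as pd

# Hypothetical sample rows that mimic the three columns described above.
# With the real file you would load it instead, for example:
# df = pd.read_csv("groceries.csv")  # hypothetical file name
df = pd.DataFrame({
    "Member_number": [1001, 1001, 1002, 1003, 1002, 1001, 1003, 1004],
    "Date": ["05-01-2014", "17-01-2014", "03-02-2014", "21-07-2014",
             "09-01-2015", "12-02-2015", "12-02-2015", "30-07-2015"],
    "itemDescription": ["whole milk", "rolls/buns", "whole milk", "soda",
                        "whole milk", "yogurt", "soda", "whole milk"],
})

print(df.head())  # first rows of the data frame
df.info()         # column names, dtypes, and non-null counts
```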
There are some packages that we should import first.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
In this section, I will use a line graph to visualize the grocery store’s sales over the two years 2014 and 2015.
First, I will transform the data frame a bit to get the items counted by month and year.
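One possible way to do this transformation is sketched below. It assumes the `Date` column holds `dd-mm-yyyy` strings and uses a handful of made-up rows; the helper column names `year`, `month`, and `items_sold` are my own choices.

```python
import pandas as pd

# Hypothetical sample rows; the real data has the same three columns.
df = pd.DataFrame({
    "Member_number": [1001, 1001, 1002, 1003, 1002, 1001, 1003, 1004],
    "Date": ["05-01-2014", "17-01-2014", "03-02-2014", "21-07-2014",
             "09-01-2015", "12-02-2015", "12-02-2015", "30-07-2015"],
    "itemDescription": ["whole milk", "rolls/buns", "whole milk", "soda",
                        "whole milk", "yogurt", "soda", "whole milk"],
})

# Parse the date strings (assumed day-month-year) and extract year and month.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")
df["year"] = df["Date"].dt.year
df["month"] = df["Date"].dt.month

# Count the items bought in each (year, month) pair.
monthly = (
    df.groupby(["year", "month"])["itemDescription"]
      .count()
      .reset_index(name="items_sold")
)
print(monthly)
```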
After we have our data, let’s try to visualize it:
A bar chart is used to show how quantities change over time or to compare values across categories. Bar charts usually have two axes: one holds the categories being analyzed, and the other holds the measured values for those categories.
For this dataset, I will use a bar chart to visualize the 10 best-selling categories in 2014 and 2015. You can display it as either a horizontal or a vertical bar chart. Let’s see how it looks.
Horizontal Bar Chart
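Here is a minimal sketch of a horizontal bar chart of the best-selling items. The item list is a made-up sample, and the column names `item` and `count` are assumptions chosen for this example.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical purchases; in the real data this is the itemDescription column.
items = pd.Series(
    ["whole milk"] * 5 + ["rolls/buns"] * 4 + ["soda"] * 3 + ["yogurt"] * 2
)

# Keep the 10 most frequent items (only 4 exist in this tiny sample).
top = items.value_counts().head(10).reset_index()
top.columns = ["item", "count"]

fig, ax = plt.subplots(figsize=(8, 4))
# Numeric values on x and categories on y give horizontal bars.
sns.barplot(data=top, x="count", y="item", orient="h",
            color="steelblue", ax=ax)
ax.set_title("Best-selling items")
ax.set_xlabel("Items sold")
ax.set_ylabel("")
fig.tight_layout()
# plt.show()  # uncomment when running as a script
```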
If you prefer a vertical bar chart, try this:
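For a vertical version, a sketch under the same assumptions (made-up sample items, hypothetical column names) simply swaps the two axes and rotates the tick labels so they stay readable:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical purchases; in the real data this is the itemDescription column.
items = pd.Series(
    ["whole milk"] * 5 + ["rolls/buns"] * 4 + ["soda"] * 3 + ["yogurt"] * 2
)
top = items.value_counts().head(10).reset_index()
top.columns = ["item", "count"]

fig, ax = plt.subplots(figsize=(8, 4))
# Categories on x and numeric values on y give vertical bars.
sns.barplot(data=top, x="item", y="count", color="steelblue", ax=ax)
ax.set_title("Best-selling items")
ax.set_xlabel("")
ax.set_ylabel("Items sold")
ax.tick_params(axis="x", rotation=45)  # long item names overlap otherwise
fig.tight_layout()
# plt.show()  # uncomment when running as a script
```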
Bar Chart with Hue Value
If you want to compare each category’s sales by year, what would your visualization look like? You can add an element called the hue value to the graph.
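The `hue` parameter of `sns.barplot` splits each bar into one bar per group. The sketch below uses invented counts and assumed column names (`item`, `year`, `count`) to show the idea:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical yearly counts per item (placeholder numbers).
sales = pd.DataFrame({
    "item": ["whole milk", "whole milk", "soda", "soda", "yogurt", "yogurt"],
    "year": ["2014", "2015", "2014", "2015", "2014", "2015"],
    "count": [50, 45, 30, 33, 20, 24],
})

fig, ax = plt.subplots(figsize=(8, 4))
# hue="year" draws one bar per year next to each other for every item.
sns.barplot(data=sales, x="count", y="item", hue="year", ax=ax)
ax.set_title("Items sold by year")
ax.set_xlabel("Items sold")
ax.set_ylabel("")
fig.tight_layout()
# plt.show()  # uncomment when running as a script
```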
Now, can you see it more clearly?
Imagine that I want to discover how often customers buy whole milk, the best-selling category. I will use a histogram to obtain this information.
Looking at the visualization, we can see that customers rarely buy this item more than twice, and many customers stop buying it after their first purchase.
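A rough sketch of such a histogram follows. It counts whole milk purchases per customer in a made-up sample and plots them with Matplotlib’s `hist`; the original plotting call is not shown in the text, so this is one possible approach rather than the exact one used.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample rows with the article's column names.
df = pd.DataFrame({
    "Member_number": [1001, 1001, 1002, 1003, 1002, 1001, 1003, 1004],
    "itemDescription": ["whole milk", "rolls/buns", "whole milk", "soda",
                        "whole milk", "yogurt", "soda", "whole milk"],
})

# Number of whole milk purchases made by each customer.
milk = df[df["itemDescription"] == "whole milk"]
purchases_per_customer = milk.groupby("Member_number").size()

# One bin per integer purchase count: [1, 2), [2, 3], ...
max_purchases = int(purchases_per_customer.max())
bins = list(range(1, max_purchases + 2))

fig, ax = plt.subplots(figsize=(6, 4))
counts, edges, _ = ax.hist(purchases_per_customer, bins=bins,
                           align="left", rwidth=0.9)
ax.set_xticks(bins[:-1])
ax.set_xlabel("Whole milk purchases per customer")
ax.set_ylabel("Number of customers")
fig.tight_layout()
# plt.show()  # uncomment when running as a script
```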
Actually, pie charts are quite poor at communicating the data. However, it does not hurt to learn this visualization technique.
For this data, I want to compare the sales of the top 10 categories with the rest in both 2014 and 2015. Now, let’s transform our data to get this information visualized.
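The transformation could be sketched as below. It assumes the top categories are ranked over both years combined (the text does not say whether the ranking is per year), and it uses `TOP_N = 2` only because the made-up sample has so few items; with the real data this would be 10.

```python
import pandas as pd

# Hypothetical purchases with a precomputed year column.
df = pd.DataFrame({
    "year": [2014] * 6 + [2015] * 6,
    "itemDescription": [
        "whole milk", "whole milk", "whole milk", "soda", "soda", "yogurt",
        "whole milk", "whole milk", "soda", "yogurt", "rolls/buns", "pastry",
    ],
})

TOP_N = 2  # use 10 with the real data; 2 suits this tiny sample

# Rank items over both years together (an assumption) and keep the top ones.
top_items = df["itemDescription"].value_counts().head(TOP_N).index

# Label every purchase as belonging to the top group or to the rest.
df["group"] = df["itemDescription"].isin(top_items).map(
    {True: "Top items", False: "Other items"}
)

# Rows are years, columns are the two groups, values are purchase counts.
share = df.groupby(["year", "group"]).size().unstack(fill_value=0)
print(share)
```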
Our data is now ready. Let’s see the pies!
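Two pies side by side, one per year, can be drawn with Matplotlib’s `pie`. The counts below are invented placeholders and do not reproduce the real results; the column labels are assumptions as well.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical counts per year (placeholder numbers only).
share = pd.DataFrame(
    {"Top 10": [70, 65], "Others": [30, 35]},
    index=[2014, 2015],
)

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
for ax, (year, row) in zip(axes, share.iterrows()):
    # autopct prints each wedge's percentage with one decimal place.
    ax.pie(row.values, labels=row.index, autopct="%1.1f%%", startangle=90)
    ax.set_title(str(year))
fig.tight_layout()
# plt.show()  # uncomment when running as a script
```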
So, the top 10 categories accounted for a smaller share of purchases in 2015 than in 2014, by 5.5%.
Another way to review your data is a swarm plot. In a swarm plot, points are adjusted (along the categorical axis only) so that they do not overlap. This makes it a good complement to a box plot when you want to display all observations along with some representation of the underlying distribution.
As I want to see the number of items sold on each day of the week, I can use this type of chart to display the information. As usual, let’s first calculate the items sold and group them by category and day.
After we obtain the data, let’s see what the graph looks like.
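One way to do this calculation is sketched below, again on made-up rows and assuming `dd-mm-yyyy` date strings. The helper column names `day` and `items_sold` are my own.

```python
import pandas as pd

# Hypothetical sample rows; the real data has the same columns.
df = pd.DataFrame({
    "Date": ["05-01-2014", "17-01-2014", "03-02-2014", "21-07-2014",
             "09-01-2015", "12-02-2015", "12-02-2015", "30-07-2015"],
    "itemDescription": ["whole milk", "rolls/buns", "whole milk", "soda",
                        "whole milk", "yogurt", "soda", "whole milk"],
})

# Parse the dates (assumed day-month-year) and derive the weekday name.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")
df["day"] = df["Date"].dt.day_name()

# Count the items sold for each (item, weekday) pair.
daily = (
    df.groupby(["itemDescription", "day"])
      .size()
      .reset_index(name="items_sold")
)
print(daily)
```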
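A swarm plot of these counts might be drawn as in the sketch below. Each point is one item category on one weekday; the numbers are invented placeholders and the column names are assumptions.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical counts per item and weekday (placeholder numbers).
daily = pd.DataFrame({
    "day": ["Monday"] * 3 + ["Wednesday"] * 3 + ["Friday"] * 3,
    "itemDescription": ["whole milk", "soda", "yogurt"] * 3,
    "items_sold": [40, 25, 18, 35, 28, 20, 45, 30, 22],
})
day_order = ["Monday", "Wednesday", "Friday"]  # keep weekdays in calendar order

fig, ax = plt.subplots(figsize=(8, 4))
# Points sharing a weekday are spread sideways so they do not overlap.
sns.swarmplot(data=daily, x="day", y="items_sold", order=day_order, ax=ax)
ax.set_title("Items sold per category by day of the week")
ax.set_xlabel("Day of the week")
ax.set_ylabel("Items sold")
fig.tight_layout()
# plt.show()  # uncomment when running as a script
```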
In this article, I have shown you how to visualize your data with different types of charts. If you find it helpful, you can save it and review it anytime you want. It can save you tons of time down the road. 😀