Data Visualization Cheat Sheet with Seaborn and Matplotlib | by Chi Nguyen | Nov, 2020


Exploratory Data Analysis (EDA) is an indispensable step in data mining. To interpret various aspects of a data set, such as its distribution, its main patterns, or what can be inferred from it, we need to visualize the data in different graphs or images. Fortunately, Python offers many libraries that make visualization more convenient than ever. Some of the most widely used today are Matplotlib, Seaborn, Plotly, and Bokeh.

My dataset is a grocery dataset downloaded from a public Kaggle dataset. Its columns include:

  • Date: date of purchase
  • itemDescription: item name
Figure 1: Data frame
Figure 2: Data’s description

There are some packages that we should import first.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
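
As a rough sketch of the loading step, the snippet below reads a few rows and prints a preview and a summary, similar in spirit to Figures 1 and 2. The rows are hypothetical stand-ins for the Kaggle file, the `Member_number` column is an assumption (only `Date` and `itemDescription` are described above), and the day-month-year date format is also assumed.

```python
import io

import pandas as pd

# Hypothetical rows standing in for the Kaggle CSV.
# Member_number is an assumed column name, not confirmed by the article.
csv_text = """Member_number,Date,itemDescription
1808,21-07-2015,tropical fruit
2552,05-01-2015,whole milk
2300,19-09-2015,pip fruit
1187,12-12-2014,other vegetables
3037,01-02-2014,whole milk
"""

# With the real file, pass its path to pd.read_csv instead of the buffer.
df = pd.read_csv(io.StringIO(csv_text))

# Assumed format: day-month-year strings, converted to datetimes.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")

print(df.head())                   # preview of the first rows
print(df.describe(include="all"))  # summary of every column
```

Converting `Date` early keeps later grouping by month or year simple.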

Line Chart

In this section, I will use a line chart to visualize the grocery store's sales over the two years 2014 and 2015.

Figure 3: Items Counted by Month-Year
Figure 4: Line Chart of Items Counted by Month-Year

Bar Chart

A bar chart is used to show how values change over time or to compare values across categories. Bar charts usually have two axes: one lists the categories being compared, and the other shows the measured value for each category.

Figure 4: Items Counted by Categories
Figure 5: Horizontal Bar Chart
Figure 6: Vertical Bar Chart
Figure 7: Bar Chart with Hue Value
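
The three bar chart variants can be sketched with `sns.barplot` as follows. The sample rows are hypothetical, and splitting by year for the `hue` example is an assumption about what the hue value represents.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical sample rows; the real data would come from the Kaggle CSV.
df = pd.DataFrame({
    "Date": ["05-01-2014", "02-02-2014", "11-03-2014", "20-03-2014",
             "09-01-2015", "14-01-2015", "03-04-2015"],
    "itemDescription": ["whole milk", "whole milk", "soda", "yogurt",
                        "whole milk", "soda", "whole milk"],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")  # assumed format
df["year"] = df["Date"].dt.year.astype(str)  # strings keep hue categorical

# Items counted by category, most frequent first.
counts = (
    df["itemDescription"].value_counts()
    .rename_axis("item")
    .reset_index(name="count")
)

# Items counted by category and year, used for the hue example.
by_year = df.groupby(["itemDescription", "year"]).size().reset_index(name="count")

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Horizontal bars: categories on the y-axis.
sns.barplot(data=counts, x="count", y="item", ax=axes[0])
axes[0].set_title("Horizontal bar chart")

# Vertical bars: categories on the x-axis.
sns.barplot(data=counts, x="item", y="count", ax=axes[1])
axes[1].set_title("Vertical bar chart")

# hue splits each category into one bar per year.
sns.barplot(data=by_year, x="itemDescription", y="count", hue="year", ax=axes[2])
axes[2].set_title("Bar chart with hue")

fig.tight_layout()
```

Horizontal bars tend to be easier to read when category names are long, since the labels do not need to be rotated.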


Histogram

Imagine that I want to discover how frequently customers buy whole milk, the best-selling item. I will use a histogram to obtain this information.

Figure 8: Frequency of customers buying whole milk in 2014 and 2015

Pie Chart

Pie charts are often a poor way to communicate data, because readers judge angles and areas less accurately than lengths. Still, it does not hurt to learn this visualization technique.

Figure 9: Pie Charts
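
Seaborn has no pie chart function, so a sketch like the one below falls back to Matplotlib's `Axes.pie`. The sample rows are hypothetical, and drawing one pie per year is only an assumption about what the figure shows.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample rows; the real data would come from the Kaggle CSV.
df = pd.DataFrame({
    "Date": ["05-01-2014", "02-02-2014", "11-03-2014", "20-03-2014",
             "09-01-2015", "14-01-2015", "03-04-2015"],
    "itemDescription": ["whole milk", "whole milk", "soda", "yogurt",
                        "whole milk", "soda", "whole milk"],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")  # assumed format
df["year"] = df["Date"].dt.year.astype(str)

years = ["2014", "2015"]
# Item counts for each year, keyed by year.
shares_by_year = {
    year: df.loc[df["year"] == year, "itemDescription"].value_counts()
    for year in years
}

fig, axes = plt.subplots(1, len(years), figsize=(10, 5))
for ax, year in zip(axes, years):
    shares = shares_by_year[year]
    # autopct prints each slice's percentage inside the wedge.
    ax.pie(shares.values, labels=list(shares.index),
           autopct="%1.1f%%", startangle=90)
    ax.set_title(f"Item share in {year}")
```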

Swarm Plot

Another way to review your data is a swarm plot. In a swarm plot, points are adjusted (along the categorical axis only) so that they do not overlap. This makes it a good complement to a box plot when you want to display all observations along with some representation of the underlying distribution.

Figure 10: Swarm Chart
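
To illustrate how a swarm plot complements a box plot, the sketch below overlays `sns.swarmplot` on `sns.boxplot`. The daily item counts are synthetic, generated with a fixed random seed purely for illustration; they are not taken from the grocery dataset.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic daily item counts for illustration only (not real data).
rng = np.random.default_rng(0)
daily = pd.DataFrame({
    "year": ["2014"] * 20 + ["2015"] * 20,
    "items": np.concatenate([rng.poisson(40, 20), rng.poisson(45, 20)]),
})

fig, ax = plt.subplots(figsize=(6, 4))
# The box plot summarizes the distribution of each year.
sns.boxplot(data=daily, x="year", y="items", color="lightgray", ax=ax)
# The swarm plot draws every observation, shifted sideways to avoid overlap.
sns.swarmplot(data=daily, x="year", y="items", color="black", size=4, ax=ax)
ax.set_xlabel("Year")
ax.set_ylabel("Items per day")
ax.set_title("Swarm plot over box plot")
```

Swarm plots work best with a modest number of points per category; with many observations the points can no longer be placed without overlap.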

In this article, I have shown you how to visualize your data with different types of charts. If you find it helpful, you can save it and review it anytime you want. It can save you tons of time down the road. 😀

