Data + Science



7/23/2018
How to Separate a Box Plot and Unit Histogram in Tableau


In a recent blog post entitled “Is this better than a Jitterplot? Could be”, Steve Wexler explores alternative ways to visualize a jitterplot (a dot plot with random jitter to separate the dots). In this post, I show how you can create a box plot that is separated from the unit histogram.

A Unit Histogram

Steve outlines the steps on how to make this unit histogram in his post and his Tableau visualization is here. I will use this as the starting point for this blog post, so if you want to follow along, download his workbook here.

Here is Steve’s Unit Histogram.

Removing the Distribution Bands and Adding a Box Plot


   Once you have Steve’s Tableau workbook downloaded and open, right-click on the main dashboard tab “Salary Comparison Jitterplot Dashboard (2)” and select “Unhide All Sheets”.

   Go to the worksheet Breakout with Quartiles. Click on the line in the distribution band and drag it off the Tableau Canvas. Another option is to right-click on the line and select “Remove”. Repeat this step for the distribution band as well. You should now have a Unit Histogram without the distribution bands.

   Click on the Analytics Tab in the top left corner. Click and drag Box Plot to the Tableau canvas and place it on SUM(Continuous Bin). This will add a box plot to the unit histogram.

That was pretty easy, but notice that the box plot centers in the pane and covers some of the dots.

Let’s explore a few options to shift the box plot out of the way.

Shifting the Position of the Box Plot

The first method shifts the box plot by creating more dots. We will do this by creating another pill and using a dual axis.

   Double-click on the Columns shelf in the white area to the right of the INDEX() pill. This will allow you to enter an inline formula. Type -20 and hit Enter.
   Right-click on the new pill and select “Dual Axis”. Right-click on the new secondary X-Axis at the top of the chart and select “Synchronize Axis”.
   On the Marks card for SUM(-20), click on Color and set the Opacity to 0% and the Border to “None”.

You should now have a Unit Histogram that starts in the center of the Box Plot.

However, you will notice that the dots are still hidden behind the box plot for many of the points. You can change the X-Axis to Fixed to adjust the position, but I think you will find that it’s difficult to position the box plot in a way that doesn’t overlap the dots and still shows all of the dots.

Separating the Box Plot from the Unit Histogram

The most obvious way to separate the box plot and the dot plot is to create a separate worksheet for each box plot and each unit histogram. In this case, there are four generations, so that would be eight worksheets: for each generation, one view with a box plot and one with a unit histogram. It’s pretty straightforward to duplicate the sheets and adjust the filters to create each view, but this isn’t ideal, and with more categories it could quickly get unwieldy.

Another method to separate them is to duplicate the data. This can be done with Custom SQL, Tableau Prep, or Alteryx, or by preprocessing the original data source. To make things easy for this demonstration, you can download the duplicated data here. This spreadsheet simply duplicates the original data and adds a new column called “Column”. The first set of the data has a value of “box” in this new column and the second set has the value “dots”.
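
If you would rather build the duplicated file yourself, the reshaping described above can be sketched in a few lines of Python. This is only an illustration, not how the downloadable spreadsheet was produced: the sample rows and the helper name duplicate_with_tag are hypothetical, and only the field names “Resp ID”, “Value” and “Column” come from this post.

```python
# Hypothetical sample rows; the real survey data has many more records and fields.
rows = [
    {"Resp ID": 1, "Value": 52000},
    {"Resp ID": 2, "Value": 61000},
    {"Resp ID": 3, "Value": 58500},
]


def duplicate_with_tag(records, tags=("box", "dots")):
    """Return one copy of every record per tag, labelled in a new "Column" field."""
    return [{**record, "Column": tag} for tag in tags for record in records]


duplicated = duplicate_with_tag(rows)

# Every original record now appears twice: once tagged "box", once tagged "dots".
for record in duplicated:
    print(record)
```

The first half of the output carries the “box” tag and the second half the “dots” tag, which matches the layout of the spreadsheet described above.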

   Select “Data” from the top menu and select “New Data Source”. Select “Microsoft Excel” and select the file that you downloaded, BoxandUnitHisto.xlsx. This will open the new data source. Click on the Breakout with Quartiles tab again to go back to the Tableau canvas.
   Select “Data” from the top menu again and select “Replace Data Source”. The “Current:” data source should be set to Reshaped Survey Data and the “Replacement:” data source should be set to Sheet1 (BoxandUnitHisto). This will replace the single data source with the new data source that duplicates all of the records and has a new field called “Column”.

One of the big issues with duplicating a data set is that calculations can be affected. For example, Continuous Bin, the main field on the Y-Axis, has now doubled because every record is counted twice. There are a number of ways to deal with this, but we’ll apply a simple fix: divide the value by 2. Note: this may not be the best solution in every case.
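
To see why the sum doubles, here is a small Python sketch using made-up numbers (the values are hypothetical and not taken from the survey data). It shows that a sum over duplicated records is twice the original and that halving each value restores it, while an average is not affected by the duplication at all.

```python
from statistics import mean

values = [40000, 55000, 70000]   # hypothetical original records
duplicated = values + values     # the "box" copy plus the "dots" copy

original_sum = sum(values)
doubled_sum = sum(duplicated)                 # every record is counted twice
corrected_sum = sum(v / 2 for v in duplicated)  # the divide-by-2 fix

print(original_sum, doubled_sum, corrected_sum)
print(mean(values), mean(duplicated))  # averages are unchanged by duplication
```

This is why the divide-by-2 fix is needed for a SUM but would be wrong for an average, and why it is worth checking each calculation in your own workbook after duplicating the data.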

   Right-click on Continuous Bin, select “Edit”, and add /2 to the end of the formula so that the value is divided by 2.
   Drag the Column pill to the Columns shelf after Breakdown. Drag SUM(-20) up and off of the canvas to remove the pill, or right-click on it and select “Remove”.
   Right-click on Continuous Bin and select “Duplicate”
   Right-click on the new field Continuous Bin (copy) and select “Edit”
   Adjust the Formula: IF [Column] = 'box' THEN INT([Value] / [Bin Size]) * [Bin Size] / 2 ELSE NULL END
   Drag the new Continuous Bin (copy) to the Rows shelf next to Continuous Bin
   Right-click on Continuous Bin (copy) and select “Dual Axis”
   Right-click on new secondary Y-Axis on the right-hand side and select “Synchronize Axis”
   Drag the box plot from the Dots pane off of the Tableau Canvas to remove it, or right-click on the box plot in the Dots pane and select “Remove”
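
For readers who want to check the arithmetic of the two fields outside Tableau, here is a rough Python sketch that mirrors the formulas as written in the steps above, including the divide-by-2. The function names and the bin size of 5000 are assumptions for illustration; the actual [Bin Size] in the workbook may differ.

```python
def continuous_bin(value, bin_size):
    """Round a value down to the start of its bin: INT(value / bin size) * bin size.

    Python's int() truncates toward zero, like Tableau's INT().
    """
    return int(value / bin_size) * bin_size


def box_only_bin(column, value, bin_size):
    """Mirror the edited copy: a halved bin value for "box" rows, None (NULL) otherwise."""
    if column == "box":
        return continuous_bin(value, bin_size) / 2
    return None


BIN_SIZE = 5000  # hypothetical bin size

print(continuous_bin(58500, BIN_SIZE))
print(box_only_bin("box", 58500, BIN_SIZE))
print(box_only_bin("dots", 58500, BIN_SIZE))
```

Because the copy returns NULL for every “dots” row, the second axis has nothing to draw in the “dots” pane, which is what lets the box plot live in the “box” pane only.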

You will now have a box plot that appears only in the left pane, marked “box”, and not in the second pane, marked “dots”. Making the dots disappear in the “box” pane is a bit tricky. I think you will find that the Opacity, Color, Border and Size settings don’t work for this, so we will create two more fields to deal with it.

   Calculated field: Col Num
   Formula: IF [Column] = 'box' THEN 0 ELSE 1 END

   Calculated field: Count
   Formula: IF AVG([Col Num]) = 0 THEN NULL ELSE INDEX() END

   Replace INDEX() on the Columns shelf with the new field Count
   Right-click Count on the Columns shelf and select “Compute Using” and set it to Resp ID
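
The logic of these two fields can be sketched in Python as follows. This is a simplified model, not Tableau’s implementation: it assumes INDEX() numbers the rows from 1 within each combination of Column and bin, whereas the real partitioning depends on the dimensions in your view. The sample rows and the “Bin” key are hypothetical.

```python
from collections import defaultdict


def col_num(column):
    """Col Num: 0 for "box" rows, 1 for everything else."""
    return 0 if column == "box" else 1


def count_positions(rows):
    """Count: 1-based position of each row within its (Column, Bin) group,
    or None (NULL) for "box" rows so that no dot is drawn for them."""
    counters = defaultdict(int)
    positions = []
    for row in rows:
        key = (row["Column"], row["Bin"])
        counters[key] += 1  # plays the role of INDEX() within the group
        positions.append(None if col_num(row["Column"]) == 0 else counters[key])
    return positions


# Hypothetical duplicated rows, already assigned to bins.
rows = [
    {"Resp ID": 1, "Column": "box", "Bin": 50000},
    {"Resp ID": 2, "Column": "box", "Bin": 50000},
    {"Resp ID": 1, "Column": "dots", "Bin": 50000},
    {"Resp ID": 2, "Column": "dots", "Bin": 50000},
    {"Resp ID": 3, "Column": "dots", "Bin": 55000},
]

print(count_positions(rows))
```

The “box” rows get no horizontal position, so they are not drawn as dots, while the “dots” rows are still numbered left to right within each bin.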

You should now have a box plot in the first column pane and a unit histogram in the second column pane. The rest is simply formatting as desired.

   Right-click in the chart area and select “Format”
   Select “Borders” and slide the “Column Divider Level:” one position to the left.
   Right-click on the secondary Y-Axis Continuous Bin (copy) and uncheck “Show Header” to remove it
   Right-click on the column header with “box” and “dots” and uncheck “Show Header” to remove it
   Adjust the color of the dots as needed.

There are certainly drawbacks to duplicating the data. However, the end result here is a box plot that does not overlap the unit histogram, which is a major advantage over the typical box plot that is plotted on top of the dots. This approach can also be used with distribution bands: simply use a distribution band in place of the box plot.

This could also be used with a jitterplot, but I find the unit histogram to be much more informative than the random placement of the dots using jitter.

I hope you find this information useful. If you have any questions feel free to email me at Jeff@DataPlusScience.com

Jeffrey A. Shaffer

Follow on Twitter @HighVizAbility


