Data + Science

[ad_1]


3/10/2020
A Journey to Custom Subtotals with Table Calculations


This past weekend Tableau Zen Master Klaus Schulte published a great blog post showing four different ways of Adding Custom Sub-Totals in Tableau. In this blog post, he demonstrates how to do this with data densification and table calculations. I posted my own solution later that evening, which I will describe below, but I wanted to discuss the journey to get to the solution.

We often see finished visualizations or technical blog posts relating to Tableau, but when we see the end result, we don’t see the hours of iteration that it can take to get there. Those iterations can include dead ends, mistakes and sometimes overlooking the simplest of solutions. So in an effort to show this process more, I thought I would walk through the process that I took. Keep in mind that I didn’t record every step along the way, but I will highlight my process as best I can.

In addition, I would point out that we often find these solutions through community work and collaboration. Had Klaus not posted this blog post, which references work by Rody Zakovich, then I surely wouldn’t be writing this post. As with many Tableau techniques, we build on the work of many others that come before us.

When I read Klaus’s blog post, I suspected there was an easier way to do this with tableau calculations. I can’t pinpoint exactly why I thought this, but there were a good many steps in his process and some long formulas. So I download Klaus’s workbook to see exactly what he did with the Tableau Table calculations.

Let’s start with the goal. This is what Klaus created. A table that shows the individual values for a subcategory, but instead of the typical subtotals, he has a running sum for the subtotals for each category. This is one of those things that would take 30 seconds in Excel, but as we’ll see, there’s a bit more involved in Tableau.

Using the Superstore data, the first thing I did was create a basic table with Category and Subcategory and then I added all subtotals from the analysis tab. My immediate thought was to put an index() function on this table so that I could see what was going on with these subtotals. Notice that the subcategories and categories each have their own sequence down the table. Not unexpected.

My next series of steps were based on this logic; “if the number is in the Subcategory sequence then use the Sum([Sales]), otherwise use a Running_Sum(Sum([Sales]))”. After a few quick attempts at a calculation around that logic, I decided to add a size() function to the table. When I set Size to compute using Subcategory it looks like this.

This was my first breadcrumb because this isolates the pieces for each part, one sum and one running sum.

My next step involved a few calculations. First, I tried a single calculation using the logic I outlined above.

   Calculated Field: New Calc
   Formula:
   if size() = 1 then RUNNING_SUM(SUM([Sales]))
   else sum([Sales])
   end

While the logic might appear to make sense, this was a failed attempt, because no matter how I set Compute Using or adjusted the table calculation, it didn’t solve the problem. This formula doesn’t calculate the running sum as I need it. We’ll come back to this, because this failure is actually a step in the solution that comes back later.

My next step was to break the calculation into two parts so that I could see what was going on for each part of the calculation. I figured that if I could break them up them I could control them easier for sum and running sum.

   Calculated Field: New Calc 1
   Formula:
   if size() = 1 then null
   else sum([Sales])
   end

   Calculated Field: New Calc 2
   Formula:
   if size() = 1 then RUNNING_SUM(SUM([Sales]))
   else null
   end

This New Calc 2 didn’t create a Running_Sum based on my calculations, so I created one more calculation to accomplish that piece.

   Calculated Field: Running Sum of New Calc 2
   Formula:
   RUNNING_SUM([New Calc 2])

Here’s what all of these calculations look look like at this point.

This was my first glimmer of hope for the correct result. In fact, it’s a variation of the solution where the values are in two separate columns. I’d call that a win. When I remove all of the other columns that I created it looks like this.

I went back to add Sum([Sales]) back into the forumla in place of the NULL.
   Calculated Field: New Calc 2
   Formula:
   if size() = 1 then RUNNING_SUM(SUM([Sales]))
   else Sum([Sales])
   end

At this point it looks like a step backwards. It’s now the running sum down the entire table. Not exactly what we are looking for.

However, when I changed the “Compute Using” to Category, I get this.

This is the moment where I explained, “Yes!” while making a fist, and my family watching TV looked at me funny. Don’t judge. This is exciting stuff. These ARE the droids we’ve been looking for.

My next thought was to clean up some of these calculations. Remember back at the beginning when the Running_Sum of Sales calculation wasn’t getting the result I wanted in New Calc. Well, if it’s not working the way I want, then why do I need it? So I removed Running_Sum from the equation.

   Calculated Field: New Calc 2
   Formula:
   if size() = 1 then Sum([Sales])
   else Sum([Sales])
   end

After making that change I verified that calculation was still working as expected. Yep, still good. What was a road block early on in the process and incorrect now helped me simplify the formula.

If both conditions of the IF statement produce the same result, Sum([Sales]), then I can just use Sum([Sales]) into the Running Sum Calculation and do this all with a single calculation. I didn’t need this IF calculation at all. It doesn’t get much simpler than this.

   Calculated Field: Running Sum of New Calc 2
   Formula:
   RUNNING_SUM(Sum([Sales]))

I now have the result I was looking for in a single calculation. And it makes complete sense. It’s very simple. It computes the running sum on Category, so we get individual values for the Subcategories and the Running Sum for the Subtotals which are the Category Subtotals.

The next step is dealing with the label. Klaus’s workbook used this solution from Rody Zakovich.

   Calculated Field: Label
   Formula:
   IF [first]=0 AND [last] =0
   THEN MIN(“Running Total after “+[Category])
   ELSE MIN([Sub-Category])
   END

I knew I could do this same calculation with the size() function that I used in my earlier iterations. Those earlier steps paid off again.

   Calculated Field: Label
   Formula:
   IF size()=1
   THEN MIN(“Running Total after “+[Category])
   ELSE MIN([Sub-Category])
   END

The key to this one is setting the Compute Using to Subcategory for the Label, instead of at Category like we did for the Running Sum Calc. This solution allows me to remove First() and Last() and therefore Measure Names and Measure Values. I’m down to two calculations, one for the numbers and one for the label. I adjusted the “spacer” calculation, and ended up doing it in a different way than Klaus did.

I also experimented with making the table using a dual-axis circle so that I could leverage the labels of the circles and control alignment. This approach eliminates the need for the spacer calculations.

The end result is a very simple and elegant solution, but I hope by showing you all of these steps that you will appreciate that I didn’t simply open up Klaus’s Tableau workbook and type two simple calculations to achieve this result. While the solution seems very simple when you see it, to end up at this solution required multiple steps, a few missteps and some silly mistakes. Ultimately, I broke things down, put them back together and worked through the logic. This took some time, but it was satisfying to solve the problem in this way and I learned something from the process.

Below is a Tableau Public Visualization putting all of this together.



Read More …

[ad_2]


Write a comment