Tukey, Design Thinking, and Better Questions · Simply Statistics
Roughly once a year, I read John Tukey’s paper “The Future of Data Analysis”, originally published in 1962 in the Annals of Mathematical Statistics. I’ve been doing this for the past 17 years, each time hoping to really understand what it was he was talking about. Thankfully, each time I read it I seem to get something new out of it. For example, in 2017 I wrote a whole talk around some of the basic ideas.
Well, it’s that time of year again, and I’ve been doing some reading.
Probably the most famous line from this paper is
Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.
The underlying idea in this sentence arises in at least two ways in Tukey’s paper. First is his warning that statisticians should not be called upon to produce the “right” answers. He argues that the idea that statistics is a “monolithic, authoritarian structure designed to produce the ‘official’ results” presents a “real danger to data analysis”. Second, Tukey criticizes the idea that much of statistical practice centers around optimizing statistical methods around precise (and inadequate) criteria. One can feel free to identify a method that minimizes mean squared error, but that should not be viewed as the goal of data analysis.
But that got me thinking: what is the ultimate goal of data analysis? In 64 pages of writing, I’ve found it difficult to identify a sentence or two where Tukey describes the ultimate goal, why it is we’re bothering to analyze all this data. It occurred to me in this year’s reading of the paper that maybe the reason Tukey’s writing about data analysis is often so confusing to me is that his goal is actually quite different from that of the rest of us.
More Questions, Better Questions
Most of the time in data analysis, we are trying to answer a question with data. I don’t think it’s controversial to say that, but maybe that’s the wrong approach? Or maybe that’s not what we’re trying to do at first. Maybe what we spend most of our time doing is figuring out a better question.
Hilary Parker and I have discussed at length the idea of design thinking on our podcast. One of the fundamental ideas from design thinking involves identifying the problem. It’s the first “diamond” in the “double diamond” approach to design.
Tukey describes the first three steps in a data analysis as:
- Recognition of problem
- One technique used
- Competing techniques used
In other words, try one approach, then try a bunch of other approaches! You might be thinking, why not just try the best approach (or perhaps the right approach) and save yourself all that work? Well, that’s the kind of path you go down when you’re trying to answer the question. Stop doing that! There are two reasons why you should stop thinking about answering the question:
- You’re probably asking the wrong question anyway, so don’t take yourself too seriously;
- The “best” approach is only defined as “best” according to some arbitrary criterion that probably isn’t suitable for your problem/question.
After thinking about all this I was inspired to draw the following diagram.
The goal in this picture is to get to the upper right corner, where you have a high quality question and very strong evidence. In my experience, most people assume that they are starting in the bottom right corner, where the quality of the question is at its highest. In that case, the only thing left to do is to choose the optimal procedure so that you can squeeze as much information as possible out of your data. The reality is that we almost always start in the bottom left corner, with a vague and poorly defined question and a similarly vague sense of what procedure to use. In that case, what’s a data scientist to do?
In my view, the most useful thing a data scientist can do is to devote serious effort towards improving the quality and sharpness of the question being asked. On the diagram, the goal is to move us as far as possible to the right hand side. Along the way, we will look at data, we will consider things outside the data like context, resources and subject matter expertise, and we will try a bunch of different procedures (some optimal, some less so).
Ultimately, we will develop some idea of what the data tell us, but more importantly we will have a better sense of what kinds of questions we can ask of the data and what kinds of questions we actually want to have answered. In other words, we can learn more about ourselves by looking at the data.
Exploring the Data
It would seem that the message here is that the goal of data analysis is to explore the data. In other words, data analysis is exploratory data analysis. Maybe this shouldn’t be so surprising given that Tukey wrote the book on exploratory data analysis. In this paper, at least, he essentially dismisses other goals as overly optimistic or not really meaningful.
For the most part I agree with that sentiment, in the sense that looking for “the answer” in a single set of data is going to result in disappointment. At best, you will accumulate evidence that will point you in a new and promising direction. Then you can iterate, perhaps by collecting new data, or by asking different questions. At worst, you will conclude that you’ve “figured it out” and then be shocked when someone else, looking at another dataset, concludes something completely different. In light of this, discussions about p-values and statistical significance are very much beside the point.
The following is from the very opening of Tukey’s book *Exploratory Data Analysis*:
It is important to understand what you CAN DO before you learn to measure how WELL you seem to have DONE it
(Note that the all caps are originally his!) Given this, it’s not too surprising that Tukey seems to equate exploratory data analysis with essentially all of data analysis.
There’s one story that, for me, perfectly captures the spirit of exploratory data analysis. Legend has it that Tukey once asked a student what the benefits were of the median polish technique, a technique he invented to analyze two-way tabular data. The student dutifully answered that the benefit of the technique is that it provided summaries of the rows and columns via the row and column medians. In other words, like any good statistical technique, it summarized the data by reducing it in some way. Tukey fired back, saying that this was wrong: the benefit was that the technique created more data. That “more data” was the residuals that are left over in the table itself after running the median polish. It is the residuals that really let you learn about the data, discover whether there is anything unusual, whether your question is well-formulated, and how you might move on to the next step. So in the end, you got row medians, column medians, and residuals, i.e. more data.
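To make the “more data” point concrete, here is a minimal sketch of median polish in plain Python. It is not Tukey’s own code or any library’s implementation: the function name `median_polish`, its parameters, and the toy table are all assumptions made up for illustration. The sketch alternately sweeps row and column medians out of the table, so that each cell equals an overall value plus a row effect plus a column effect plus a residual.

```python
from statistics import median


def median_polish(table, max_iter=10, tol=1e-8):
    """Hypothetical helper: fit cell = overall + row + column + residual."""
    resid = [[float(x) for x in row] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep each row's median out of the residuals into its row effect.
        row_med = [median(row) for row in resid]
        for i in range(n_rows):
            resid[i] = [x - row_med[i] for x in resid[i]]
            row_eff[i] += row_med[i]
        # Move the median of the column effects into the overall term.
        shift = median(col_eff)
        col_eff = [c - shift for c in col_eff]
        overall += shift
        # Sweep each column's median out of the residuals into its column effect.
        col_med = [median(resid[i][j] for i in range(n_rows))
                   for j in range(n_cols)]
        for i in range(n_rows):
            resid[i] = [resid[i][j] - col_med[j] for j in range(n_cols)]
        col_eff = [c + m for c, m in zip(col_eff, col_med)]
        # Move the median of the row effects into the overall term.
        shift = median(row_eff)
        row_eff = [r - shift for r in row_eff]
        overall += shift
        # Stop once a full pass no longer changes anything.
        if max(abs(m) for m in row_med + col_med) < tol:
            break
    return overall, row_eff, col_eff, resid


# Made-up toy table: perfectly additive except for one unusual cell.
table = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 100]]
overall, row_eff, col_eff, resid = median_polish(table)
for row in resid:
    print(row)
```

On this toy table the row and column effects absorb the additive structure, and the residual table is zero everywhere except the bottom-right cell. That is the sense in which the residuals are “more data”: they point directly at the part of the table the summary does not explain.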
If a good exploratory technique gives you more data, then maybe good exploratory data analysis gives you more questions, or better questions. More refined, more focused, and with a sharper point. The benefit of developing a sharper question is that it has a greater potential to provide discriminating information. With a vague question, the best you can hope for is a vague answer that may not lead to any useful decisions. Exploratory data analysis (or maybe just data analysis) gives you the tools that let the data guide you towards a better question.