## A statistical tool to use when the results are not significant (with example in R)

Let’s say you run a study to test differences between kids’ and adolescents’ moral disgust at unfair treatment. After you collect the data and run the analysis, you see that there are no statistically significant differences between the two groups. When this happens, it is common to interpret the result as evidence for the null hypothesis, that is, that there is no real difference between kids and adolescents regarding moral disgust. However, this is a misinterpretation of the non-significant result, since it is impossible to show the total absence of an effect in a population.

Quertemont (2011) stated that non-significant results can occur for three different reasons:

1. Mistakes have been made during the collection or encoding of the data, which mask otherwise significant results. This also includes measurement error (imprecision).

2. The study did not have enough statistical power to detect an otherwise real effect at the population level. The result is a “false equivalence” due to sampling error.

3. There is actually no real effect (or a negligible effect) at the population level. The result is a “true equivalence”.

Although it’s impossible to show the absence of an effect in a population, we can use statistics to show how likely it is that the size of the effect in the population is smaller than some value considered too small to be meaningful (Quertemont, 2011). This is precisely what equivalence testing does.

Equivalence testing arose in bioequivalence research, where two drugs are considered bioequivalent if their absorption rates and blood concentration levels after a certain amount of time are sufficiently similar.

Equivalence tests examine whether the hypothesis that there are effects extreme enough to be considered meaningful can be rejected (Lakens et al., 2018).
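In practice, an equivalence test is usually carried out as two one-sided tests (TOST): one test against the lower equivalence bound and one against the upper bound. As a minimal sketch in base R, here is a TOST for two independent means; all summary statistics and the equivalence bounds below are made up for illustration (the TOSTER R package by Lakens provides full-featured equivalence tests):

```r
# Minimal sketch of a TOST equivalence test for two independent means,
# built on Welch's t-test machinery from base R. All numbers are hypothetical.
tost_two_means <- function(m1, m2, sd1, sd2, n1, n2,
                           low_bound, high_bound, alpha = 0.05) {
  se <- sqrt(sd1^2 / n1 + sd2^2 / n2)            # Welch standard error
  df <- (sd1^2 / n1 + sd2^2 / n2)^2 /            # Welch-Satterthwaite df
        ((sd1^2 / n1)^2 / (n1 - 1) + (sd2^2 / n2)^2 / (n2 - 1))
  diff <- m1 - m2
  # One-sided test against each bound: H0a: diff <= low, H0b: diff >= high
  p_lower <- pt((diff - low_bound) / se, df, lower.tail = FALSE)
  p_upper <- pt((diff - high_bound) / se, df, lower.tail = TRUE)
  # Equivalence is declared only if BOTH one-sided tests reject,
  # i.e. the larger of the two p-values is below alpha
  list(diff = diff, p_tost = max(p_lower, p_upper),
       equivalent = max(p_lower, p_upper) < alpha)
}

# Hypothetical kids vs. adolescents moral-disgust ratings on a 1-7 scale,
# with made-up equivalence bounds of +/- 0.4 raw-scale points
res <- tost_two_means(m1 = 4.1, m2 = 4.0, sd1 = 0.9, sd2 = 1.0,
                      n1 = 150, n2 = 150,
                      low_bound = -0.4, high_bound = 0.4)
res$equivalent
```

If both one-sided tests are significant, the observed difference is statistically equivalent to zero within the chosen bounds; note that the conclusion depends entirely on how those bounds were justified.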

As you can see in the image above, to do equivalence testing the researcher must first define the smallest effect size of interest. Let’s look at the example from Lakens et al. (2018) to make it clearer:

> After an extensive discussion with experts, the researcher decides that as long as the gender difference does not deviate from the population difference by more than .06, it is too small to care about. Given an expected true difference in the population of .015, the researcher will test if the observed difference falls outside the boundary values (or equivalence bounds) of −.045 and .075. If differences at least as extreme as these boundary values can be rejected in two one-sided tests […], the researcher will conclude that the application rates are statistically equivalent; the gender difference will be considered trivially small, and no money will be spent on addressing a gender difference in participation.
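The scenario above can be sketched as a TOST on the difference between two proportions (the application rates), using two one-sided z-tests. The application counts below are made up for illustration; only the equivalence bounds, .015 ± .06, come from the example:

```r
# TOST for the difference between two application rates (proportions),
# run as two one-sided z-tests. The counts are hypothetical; the bounds
# come from the example: .015 - .06 and .015 + .06
applied_w <- 980; n_w <- 2000   # hypothetical: 980 of 2000 women applied
applied_m <- 990; n_m <- 2000   # hypothetical: 990 of 2000 men applied
p_w  <- applied_w / n_w
p_m  <- applied_m / n_m
diff <- p_w - p_m                                 # observed difference
se   <- sqrt(p_w * (1 - p_w) / n_w + p_m * (1 - p_m) / n_m)

low_bound  <- 0.015 - 0.06                        # lower equivalence bound
high_bound <- 0.015 + 0.06                        # upper equivalence bound
# Reject "diff <= low_bound" and "diff >= high_bound" in two one-sided tests
p_lower <- pnorm((diff - low_bound) / se, lower.tail = FALSE)
p_upper <- pnorm((diff - high_bound) / se, lower.tail = TRUE)
p_tost  <- max(p_lower, p_upper)
p_tost < 0.05   # if TRUE, the application rates are statistically equivalent
```

With these made-up counts the observed difference lies well inside the bounds, so both one-sided tests reject and the rates would be declared statistically equivalent.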

For guidance on how to justify the smallest effect size of interest, see the Lakens et al. (2018) paper.