vignettes/Truncated_t_test.Rmd
This vignette accompanies our recent manuscript "A truncated t-test - Excluding outliers without biasing the Bayes factor" (Godmann et al., 2024) and shows how to use the RoBTT R package to estimate a truncated Bayesian model-averaged independent samples \(t\)-test (TrBTT). TrBTT adapts the \(t\)-test to the researcher's outlier handling and thus mitigates the unwanted side effects of outlier exclusion on inference. For a general introduction to the RoBTT package, see the Introduction to RoBTT vignette.
Outliers can bias analysis results. However, the widely applied approach of simply excluding extreme observations without changing the analysis is not appropriate either, as it often inflates the evidence. This vignette introduces a truncated version of the Bayesian model-averaged independent samples \(t\)-test and demonstrates an alternative way of handling outliers in a Bayesian hypothesis testing framework. TrBTT combines Bayesian model-averaging with a truncated likelihood. As such, TrBTT offers a robust way of conducting independent samples \(t\)-tests that is less susceptible to the influence of outliers.
TrBTT truncates the likelihood at the same cut-offs that were applied to the data. This corrects the variance estimates, which are otherwise biased downward when outliers are excluded. It simultaneously model-averages across \(4\) models, which cross the absence or presence of an effect with equal or unequal variances between the groups.
In every model, the likelihood is truncated at the specified cut-offs. Inferences are based on a weighted average of the models, with weights reflecting each model's predictive performance.
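To make the idea of a truncated likelihood concrete, the following base R sketch defines a truncated normal density and compares a naive standard deviation estimate with a maximum likelihood estimate under truncation. The helper `dtnorm()`, the simulated data, and the cut-offs of -2 and 2 are illustrative assumptions and are not part of RoBTT, which fits its models by Bayesian estimation rather than by `optim()`.

```r
# Hypothetical helper (not part of RoBTT): density of a normal distribution
# truncated to [a, b], i.e. the normal density renormalised by the
# probability mass that lies inside the bounds.
dtnorm <- function(x, mean, sd, a, b) {
  mass <- pnorm(b, mean, sd) - pnorm(a, mean, sd)
  ifelse(x >= a & x <= b, dnorm(x, mean, sd) / mass, 0)
}

# The truncated density integrates to 1 over the retained range.
area <- integrate(dtnorm, lower = -2, upper = 2,
                  mean = 0, sd = 1, a = -2, b = 2)$value

# Simulated standard normal data (assumption), with values outside [-2, 2] excluded.
set.seed(1)
x_all  <- rnorm(5000, mean = 0, sd = 1)
x_kept <- x_all[x_all >= -2 & x_all <= 2]

# Naive estimate: ignores the exclusion and underestimates the true SD of 1.
sd_naive <- sd(x_kept)

# Maximum likelihood under the truncated density; the SD is optimised on the
# log scale so that it stays positive.
neg_loglik <- function(par) {
  -sum(log(dtnorm(x_kept, mean = par[1], sd = exp(par[2]), a = -2, b = 2)))
}
fit_ml   <- optim(c(0, 0), neg_loglik)
sd_trunc <- exp(fit_ml$par[2])

c(area = area, sd_naive = sd_naive, sd_trunc = sd_trunc)
```

The naive estimate should fall clearly below the data-generating standard deviation, whereas the estimate based on the truncated density should lie close to it.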
First, we ensure that the RoBTT package is installed and loaded into the R session:
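A minimal sketch of this setup step follows. The examples in this vignette also need two numeric vectors, `x1` and `x2`. The simulated data below, including the seed and the sample size of 100 per group, are assumptions for illustration, so results based on them need not match the output printed in this vignette.

```r
# Load RoBTT if it is installed; otherwise point to the CRAN installation command.
if (requireNamespace("RoBTT", quietly = TRUE)) {
  library(RoBTT)
} else {
  message('RoBTT is not installed; run install.packages("RoBTT") first.')
}

# Hypothetical example data: two groups drawn from the same standard normal
# distribution (seed and sample size are assumptions).
set.seed(42)
x1 <- rnorm(100, mean = 0, sd = 1)
x2 <- rnorm(100, mean = 0, sd = 1)
```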
We begin by demonstrating how to exclude outliers manually using specific cut-offs and then truncate the likelihood accordingly. Cut-offs can be specified for each group separately, as is the case with the box plot method for identifying outliers. Alternatively, a single cut-off range can be applied to both groups, for instance when all response times faster than \(200\) ms or slower than \(1000\) ms are excluded in both groups.
First, we apply the box plot method for excluding outliers and specify the cut-off range for each group:
# Identify outliers using boxplot statistics for each group
stats1 <- boxplot.stats(x1)
lower_whisker1 <- stats1$stats[1]
upper_whisker1 <- stats1$stats[5]
stats2 <- boxplot.stats(x2)
lower_whisker2 <- stats2$stats[1]
upper_whisker2 <- stats2$stats[5]
# Exclude outliers based on identified whiskers
x1_filtered <- x1[x1 >= lower_whisker1 & x1 <= upper_whisker1]
x2_filtered <- x2[x2 >= lower_whisker2 & x2 <= upper_whisker2]
# Define whiskers for truncated likelihood application
whisker1 <- c(lower_whisker1, upper_whisker1)
whisker2 <- c(lower_whisker2, upper_whisker2)
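As a quick sanity check on the whisker-based exclusion above, the following base R sketch uses a hypothetical sample, `x_demo`, with two planted extreme values. It verifies that the observations falling outside the whiskers are exactly those that `boxplot.stats()` reports in its `out` element, using the default `coef = 1.5`.

```r
# Hypothetical sample with two planted extreme values (assumption).
set.seed(1)
x_demo <- c(rnorm(50), -5, 5)

# Elements 1 and 5 of $stats are the ends of the lower and upper whisker.
stats_demo <- boxplot.stats(x_demo)
whiskers   <- stats_demo$stats[c(1, 5)]

# Observations removed by the whisker rule ...
excluded <- x_demo[x_demo < whiskers[1] | x_demo > whiskers[2]]

# ... are the same observations that boxplot.stats() flags as outliers.
setequal(excluded, stats_demo$out)
```

Counting `length(excluded)` for each group is a simple way to report how many observations the cut-offs removed.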
We can then fit the truncated RoBTT:
# Fit the RoBTT model with truncation using the filtered data
fit1_trunc <- RoBTT(
  x1 = x1_filtered, x2 = x2_filtered,
  truncation = list(x1 = whisker1, x2 = whisker2),
  seed = 1, parallel = FALSE)
We can summarize the fitted model using the summary() function.
summary(fit1_trunc, group_estimates = TRUE)
#> Call:
#> RoBTT(x1 = x1_filtered, x2 = x2_filtered, truncation = list(x1 = whisker1,
#> x2 = whisker2), parallel = FALSE, seed = 1)
#>
#> Robust Bayesian t-test
#> Components summary:
#> Models Prior prob. Post. prob. Inclusion BF
#> Effect 2/4 0.500 0.319 0.468
#> Heterogeneity 2/4 0.500 0.171 0.207
#>
#> Model-averaged estimates:
#> Mean Median 0.025 0.975
#> delta -0.070 0.000 -0.442 0.008
#> rho 0.498 0.500 0.406 0.574
#>
#> Model-averaged group parameter estimates:
#> Mean Median 0.025 0.975
#> mu[1] 0.041 0.034 -0.151 0.278
#> mu[2] -0.031 -0.022 -0.290 0.169
#> sigma[1] 1.055 1.047 0.906 1.258
#> sigma[2] 1.052 1.043 0.887 1.270
The printed output is structured into three sections. First, the Components summary table contains the inclusion Bayes factors for the presence of an effect and of heterogeneity, computed across all specified models. Second, the Model-averaged estimates table contains the model-averaged posterior mean, median, and 95% central credible interval for the effect (Cohen's d) and the variance allocation rho. Third, the Model-averaged group parameter estimates table (generated by setting the group_estimates = TRUE argument) summarizes the model-averaged mean and standard deviation estimates of each group.
We can also summarize information about the specified models by setting the type = "models" argument in the summary() function.
summary(fit1_trunc, group_estimates = TRUE, type = "models")
#> Call:
#> RoBTT(x1 = x1_filtered, x2 = x2_filtered, truncation = list(x1 = whisker1,
#> x2 = whisker2), parallel = FALSE, seed = 1)
#>
#> Robust Bayesian t-test
#> Models overview:
#>  Model     Distribution     Prior delta  Prior rho Prior prob. log(marglik) Post. prob. Inclusion BF
#>      1 truncated normal        Spike(0) Spike(0.5)       0.250      -261.28       0.564        3.884
#>      2 truncated normal        Spike(0) Beta(1, 1)       0.250      -262.86       0.117        0.397
#>      3 truncated normal Cauchy(0, 0.71) Spike(0.5)       0.250      -262.04       0.264        1.078
#>      4 truncated normal Cauchy(0, 0.71) Beta(1, 1)       0.250      -263.62       0.055        0.173
This output contains a table summarizing the specifics for each model: The type of likelihood distribution, the prior distributions on the effect parameter, the prior distributions on the rho parameter, the prior model probabilities, the log marginal likelihoods, posterior model probabilities, and the inclusion Bayes factors.
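The relation between these quantities can be sketched in base R. The snippet below takes the log marginal likelihoods and prior model probabilities printed in the Models overview table, derives posterior model probabilities with Bayes' rule, and computes inclusion Bayes factors as posterior inclusion odds divided by prior inclusion odds. This is the standard definition, used here as an assumption about how the package computes them. Because the printed log marginal likelihoods are rounded to two decimals, the results should agree with the tables only approximately. The helper `inclusion_bf()` is hypothetical and not part of RoBTT.

```r
# Values copied from the Models overview table (models 1 to 4).
log_marglik <- c(-261.28, -262.86, -262.04, -263.62)
prior_prob  <- rep(0.25, 4)

# Posterior model probabilities via Bayes' rule; subtracting the maximum
# before exponentiating avoids numerical underflow.
w         <- prior_prob * exp(log_marglik - max(log_marglik))
post_prob <- w / sum(w)

# Hypothetical helper: posterior inclusion odds divided by prior inclusion odds.
inclusion_bf <- function(included) {
  post_odds  <- sum(post_prob[included])  / sum(post_prob[!included])
  prior_odds <- sum(prior_prob[included]) / sum(prior_prob[!included])
  post_odds / prior_odds
}

has_effect        <- c(FALSE, FALSE, TRUE, TRUE)  # models with a Cauchy prior on delta
has_heterogeneity <- c(FALSE, TRUE, FALSE, TRUE)  # models with a Beta prior on rho

round(post_prob, 3)
bf_effect        <- inclusion_bf(has_effect)
bf_heterogeneity <- inclusion_bf(has_heterogeneity)
round(c(effect = bf_effect, heterogeneity = bf_heterogeneity), 3)
```

Inclusion Bayes factors below 1, as for both components here, indicate that the data favor the models without the respective component.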
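Second, we can specify a single cut-off range that applies to both groups. The filtered data are stored under new names so that the unfiltered x1 and x2 remain available for the examples below:
cut_off <- c(-2, 2)
# Exclude observations outside the shared cut-offs
x1_cut <- x1[x1 >= cut_off[1] & x1 <= cut_off[2]]
x2_cut <- x2[x2 >= cut_off[1] & x2 <= cut_off[2]]
# Fit RoBTT with the likelihood truncated at the same cut-offs
fit2_trunc <- RoBTT(
  x1 = x1_cut, x2 = x2_cut,
  truncation = list(x = cut_off),
  seed = 1, parallel = FALSE)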
The results can again be obtained using the summary() function (see above).
The RoBTT package also allows specifying truncation directly in terms of standard deviations, which simplifies outlier handling. In this case, the function itself excludes the extreme observations and truncates the likelihood accordingly. Note that the analyst should not exclude outliers manually and then also specify sigma truncation, as the data would then be truncated twice.
As before, the same standard deviation cut-off sigma can be applied to both groups, or a different cut-off can be specified for each group.
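To illustrate what a standard deviation based cut-off amounts to, the following base R sketch computes bounds at the group mean plus or minus k standard deviations and counts the observations outside them. The helper `sd_cutoffs()` and the sample `x_demo` are hypothetical, and the rule shown is an assumption about the general idea. The exact rule that RoBTT applies internally for sigma truncation may differ.

```r
# Hypothetical helper (not part of RoBTT): cut-offs at mean +/- k * SD of one group.
sd_cutoffs <- function(x, k) {
  c(lower = mean(x) - k * sd(x), upper = mean(x) + k * sd(x))
}

# Hypothetical sample with one planted extreme value (assumption).
set.seed(1)
x_demo <- c(rnorm(100), 6)

bounds <- sd_cutoffs(x_demo, k = 2.5)
kept   <- x_demo[x_demo >= bounds["lower"] & x_demo <= bounds["upper"]]

# Number of observations that this rule would exclude.
n_excluded <- length(x_demo) - length(kept)
n_excluded
```

When using the sigma arguments of RoBTT(), the unfiltered data should be passed to the function, because the exclusion is carried out by the function itself.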
First, we apply the same cut-off sigma to both groups:
# Fit the model with the same standard deviation cut-off in both groups
fit3_trunc <- RoBTT(
  x1 = x1, x2 = x2,
  truncation = list(sigma = 2.5),
  seed = 1, parallel = FALSE)
Second, we apply a different cut-off sigma to each group:
# Fit the model with a separate standard deviation cut-off for each group
fit4_trunc <- RoBTT(
  x1 = x1, x2 = x2,
  truncation = list(sigma1 = 2, sigma2 = 2.5),
  seed = 1, parallel = FALSE)
Just like before, the results can be obtained using the summary() function.