Abstract
The parametric Welch t-test and the non-parametric Wilcoxon-Mann-Whitney, empirical likelihood and exponential empirical likelihood tests are commonly used to test the equality of two population means. To circumvent the inflated type I error of the non-parametric likelihood testing procedures, two simple calibrations are proposed: one based on the t distribution and one based on bootstrapping. The testing procedures are then compared, in terms of type I error and power, via extensive Monte Carlo simulations. Evidence is provided that (a) the t calibration and the bootstrap calibration improve the type I error of the non-parametric likelihood tests, (b) the Welch t-test attains the nominal type I error and produces high levels of power, and (c) the Wilcoxon-Mann-Whitney test produces inflated type I error, while computation of its exact p-value is not feasible in the presence of ties. An application to real gene expression data illustrates the computational superiority of the Welch t-test.
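For readers less familiar with the Welch t-test, the following is a minimal sketch of its test statistic and the Welch-Satterthwaite degrees of freedom, using only the Python standard library. The function name `welch_statistic` and the two small samples are hypothetical and serve only as illustration; they are not taken from the paper's simulations or data.

```python
import math
from statistics import mean, variance


def welch_statistic(x, y):
    """Return (t, df) for the Welch two-sample t-test (unequal variances)."""
    n1, n2 = len(x), len(y)
    # Estimated variance of each sample mean (sample variance / sample size).
    v1 = variance(x) / n1
    v2 = variance(y) / n2
    # Standardised difference of the two sample means.
    t = (mean(x) - mean(y)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Illustrative (made-up) samples, not data from the paper.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10, 12]
t, df = welch_statistic(x, y)
print(f"t = {t:.4f}, df = {df:.4f}")
```

The p-value is then obtained by referring `t` to a t distribution with `df` degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind(x, y, equal_var=False)` performs the same test and returns the p-value directly.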