It is well documented that standard t-tests have a high Type I error rate and low replicability when used to compare model performance across repeated k-fold cross-validation. Various proposals have tried to correct the t-test for its inflated Type I error rate. The corrected repeated k-fold CV test implemented here does this while also improving replicability.
See: Bouckaert, R. R., & Frank, E. (2004, May). Evaluating the replicability of significance tests for comparing learning algorithms. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 3-12). Springer, Berlin, Heidelberg. (Yes, it is a conference paper, but consider the number of citations!)
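To make the correction concrete, here is a minimal sketch of the statistic as it is commonly stated for the corrected repeated k-fold CV test: the mean of the per-fold score differences divided by the square root of (1/(k*r) + n_test/n_train) times their sample variance, with k*r - 1 degrees of freedom. The function name `corrected_repeated_kfold_t` and its parameters are hypothetical and chosen for illustration only; this is not necessarily how the test is implemented here.

```python
import math
from statistics import mean, variance


def corrected_repeated_kfold_t(diffs, n_train, n_test):
    """Corrected t statistic for repeated k-fold CV (illustrative sketch).

    diffs   : flat sequence of score differences (model A minus model B),
              one value per fold per repetition, so its length is k * r.
    n_train : number of training instances in each fold (assumed constant).
    n_test  : number of test instances in each fold (assumed constant).

    Returns the t statistic and its degrees of freedom (k * r - 1).
    """
    n = len(diffs)  # k * r paired differences
    if n < 2:
        raise ValueError("need at least two differences")
    if n_train <= 0 or n_test <= 0:
        raise ValueError("n_train and n_test must be positive")

    m = mean(diffs)
    var = variance(diffs)  # sample variance, divisor n - 1
    if var == 0:
        raise ValueError("differences have zero variance")

    # The uncorrected paired t-test would scale the variance by 1/n only.
    # The extra n_test / n_train term inflates the variance to account for
    # the overlap between training sets across folds and repetitions.
    t = m / math.sqrt((1.0 / n + n_test / n_train) * var)
    return t, n - 1


# Example: 2 repetitions of 3-fold CV give 6 differences.
t_stat, df = corrected_repeated_kfold_t(
    [0.01, 0.03, 0.02, 0.04, 0.00, 0.02], n_train=80, n_test=20
)
print(t_stat, df)
```

A two-sided p-value can then be read from a Student t distribution with `df` degrees of freedom. Because the correction term only increases the denominator, the corrected statistic is never larger in magnitude than the uncorrected paired t statistic on the same differences.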