# Automatic test selection

11 Mar

This page describes the methodological choices behind the tests performed on pvalue.io. It is technical and is intended for users who want to know why one test is performed rather than another. The corresponding R code is available here.

## Univariable test

The test performed depends on the type of the response variable Y (the variable to be explained) and on the type of the explanatory variable X.

### Unpaired tests (independent measures)

#### Y is categorical with 2 classes

- If X is numerical:
  - if the sample size is over 30 in both classes: Welch's t-test [1]. Welch's t-test is preferred to Student's t-test because it is more robust to unequal variances, with almost the same power;
  - otherwise: Mann-Whitney non-parametric test.
- If X is categorical:
  - if the expected (theoretical) count in every cell of the contingency table is greater than 5: chi-squared test;
  - otherwise: Fisher's exact test, with
    - an exact p-value if X has 2 classes,
    - otherwise a p-value obtained by Monte Carlo simulation with 100,000 iterations.
- For survival analyses: log-rank test.
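
To make the numerical-X rule concrete, here is a minimal Python sketch (standard library only) of the statistic behind Welch's t-test. It is an illustration, not pvalue.io's R code, and the names `welch_t` and `choose_two_group_test` are hypothetical. The variances are not pooled, and the degrees of freedom come from the Welch-Satterthwaite approximation. The p-value would then be read from a Student t distribution with `df` degrees of freedom, which is what R's `t.test` does by default (`var.equal = FALSE`).

```python
import math
import statistics


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Uses sample variances (denominator n - 1). The variances are NOT pooled,
    which is what distinguishes Welch's test from Student's t-test.
    """
    n1, n2 = len(x), len(y)
    se1 = statistics.variance(x) / n1  # squared standard error of mean(x)
    se2 = statistics.variance(y) / n2  # squared standard error of mean(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


def choose_two_group_test(n1, n2):
    """Hypothetical helper: Welch's t-test only if both classes have more than 30 observations."""
    return "welch_t" if n1 > 30 and n2 > 30 else "mann_whitney"


t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 4), round(df, 4))  # prints -1.8974 5.8824
```

Here the second sample has four times the variance of the first, so `df` (about 5.88) drops below the 8 degrees of freedom that Student's pooled test would use.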

#### Y is categorical with more than 2 classes

- If X is numerical:
  - if the sample size is over 30 in every class and the variances are homogeneous (homoscedasticity): ANOVA;
  - otherwise: Kruskal-Wallis non-parametric test.
- If X is categorical:
  - if the expected (theoretical) count in every cell of the contingency table is greater than 5: chi-squared test;
  - otherwise: Fisher's exact test, with a p-value obtained by Monte Carlo simulation with 100,000 iterations.
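
The contingency-table rule applies to both categorical-Y cases. The following is a minimal Python sketch of it, using only the standard library and hypothetical helper names. Under independence, the expected count of a cell is its row total times its column total divided by the grand total. A 2 x 2 table is the only shape that keeps the exact Fisher p-value. That covers "X has 2 classes" when Y has 2 classes, and it explains why the p-value is always simulated when Y has more than 2 classes.

```python
def expected_counts(table):
    """Expected (theoretical) cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def choose_categorical_test(table):
    """Hypothetical helper applying the contingency-table rule described above."""
    if all(e > 5 for row in expected_counts(table) for e in row):
        return "chi_squared"
    if len(table) == 2 and len(table[0]) == 2:
        return "fisher_exact"        # 2 x 2 table: exact p-value
    return "fisher_monte_carlo"      # larger table: simulated p-value


print(expected_counts([[10, 20], [30, 40]]))
print(choose_categorical_test([[1, 2], [3, 4]]))
```

In R, the simulated branch corresponds to a call such as `fisher.test(tab, simulate.p.value = TRUE, B = 1e5)`. Whether pvalue.io makes exactly this call is an assumption.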

#### Y is numerical

- If X is numerical:
  - if the sample size is over 30 and the homoscedasticity condition holds: Pearson's correlation coefficient;
  - otherwise: Spearman's rank correlation coefficient (rho).
- If X is categorical with 2 classes:
  - if the sample size is over 30 in both classes: Welch's t-test;
  - otherwise: Mann-Whitney non-parametric test.
- If X is categorical with more than 2 classes:
  - if the sample size is over 30 in every class and the variances are homogeneous (homoscedasticity): ANOVA;
  - otherwise: Kruskal-Wallis non-parametric test.

### Paired tests (2 measures for the same patient)

#### X is categorical

- McNemar's test:
  - if X has 2 classes: McNemar's exact test;
  - if X has more than 2 classes: McNemar-Bowker's test.
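
For a 2 x 2 paired table, McNemar's exact test uses only the discordant pairs. Under the null hypothesis, a pair that changed class is equally likely to have changed in either direction, so the test reduces to a binomial test with probability 0.5. Below is a minimal Python sketch of this formulation. The function name is hypothetical, and the McNemar-Bowker extension to more than 2 classes is not shown.

```python
from math import comb


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: pairs that moved from class A (first measure) to class B (second measure)
    c: pairs that moved from class B to class A
    Under H0, min(b, c) is the lower tail of a Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of change
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(mcnemar_exact_p(1, 9))
```

Concordant pairs (same class at both measures) do not enter the computation at all.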

#### X is numerical

- If the sample size is over 30 for both measures: paired t-test, computed on the within-pair differences;
- otherwise: Wilcoxon signed-rank test (non-parametric paired test).
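
The paired t-test is a one-sample t-test on the within-pair differences, so each patient acts as their own control. Below is a minimal sketch of the statistic in Python, using only the standard library. `paired_t` is a hypothetical name, and it returns the t statistic together with its `n - 1` degrees of freedom.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t statistic: a one-sample t-test of the differences against 0."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(after, before)]  # one difference per patient
    n = len(d)
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1


t, df = paired_t([10, 12, 14, 16], [11, 14, 15, 19])
print(t, df)
```

Because only the differences enter the formula, the variances of the two measures never need to be compared. This is why no Welch-type correction arises for paired data.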

## Multivariable statistical models

The choice of the multivariable model depends on the response variable Y:

- If Y is numerical: linear regression model.
- If Y is categorical with 2 classes:
  - for survival analyses: Cox model;
  - otherwise: logistic regression model.
- If Y is categorical with more than 2 classes: no multivariable analysis is available.
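
The mapping from the response variable to the model can be sketched as a small dispatch function. The string labels and the function name are hypothetical. In R, such models are commonly fitted with `lm`, `glm(..., family = binomial)` and `survival::coxph`, but the exact calls pvalue.io makes are not shown here.

```python
def choose_multivariable_model(y_kind, n_classes=None, survival=False):
    """Hypothetical mapping from the response variable Y to a multivariable model.

    y_kind: "numerical" or "categorical" (illustrative encoding).
    Returns None when no multivariable model is proposed.
    """
    if y_kind == "numerical":
        return "linear_regression"
    if y_kind == "categorical" and n_classes == 2:
        return "cox" if survival else "logistic_regression"
    return None  # categorical Y with more than 2 classes


print(choose_multivariable_model("categorical", n_classes=2, survival=True))
```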

1. Welch, B. L. The generalization of ‘Student’s’ problem when several different population variances are involved. Biometrika 34, 28–35 (1947).