11 Mar Automatic test selection

Posted at 13:25h in Methodological choices by Kevin

This page describes the methodological choices behind the tests performed on pvalue.io. It is technical and intended for users who wonder why one test is performed rather than another. The corresponding R code is available here.

Univariable test

The test performed depends on the type of the response variable Y (the variable to explain) and on the type of the explanatory variable X.

Unpaired tests (independent measures)

Y is categorical with 2 classes

If the variable X is numerical
- If the distribution of X is normal* for the 2 classes or the sample size is over 30 for both classes: Welch’s t-test. Welch’s t-test is used because it is more robust to unequal variances than Student’s t-test, with almost the same power[1].
- Otherwise: the non-parametric Mann-Whitney test
If the variable X is categorical
- If the theoretical (expected) count in each cell of the contingency table is greater than 5: Chi-square test
- Otherwise: Fisher’s test
  - Exact test if X contains 2 classes
  - Otherwise: Fisher’s test whose p-value is obtained by a Monte-Carlo simulation with 100 000 iterations
For survival analyses, the test performed is the log-rank test
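
As an illustration of the numerical branch above, here is a minimal Python sketch of the selection rule and of Welch’s statistic. It is not the site’s R code: the function names `choose_two_group_test` and `welch_t` are hypothetical, and the normality flags are assumed to be computed elsewhere. Welch’s test does not pool the two variances, and its degrees of freedom come from the Welch-Satterthwaite approximation.

```python
import math
import statistics


def choose_two_group_test(normal_a, normal_b, n_a, n_b):
    """Hypothetical helper: pick the test for a numerical X and a 2-class Y."""
    if (normal_a and normal_b) or (n_a > 30 and n_b > 30):
        return "welch"
    return "mann-whitney"


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (variances are not pooled).
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (statistics.fmean(sample_a) - statistics.fmean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df
```

With equal group sizes and equal variances, `df` equals the Student value `n_a + n_b - 2`; it shrinks as the variances diverge, which is where the extra robustness comes from.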

Y is categorical with over 2 classes

If the variable X is numerical
- If the distribution of X is normal* for all the classes or the sample size is over 30 for all the classes, and the variances are homoscedastic: ANOVA
- Otherwise: the non-parametric Kruskal-Wallis test
If the variable X is categorical
- If the theoretical (expected) count in each cell of the contingency table is greater than 5: Chi-square test
- Otherwise: Fisher’s test whose p-value is obtained by a Monte-Carlo simulation with 100 000 iterations
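
The categorical rules (for Y with 2 classes or more) can be sketched as follows. This is an illustrative Python sketch, not the site’s R code; `expected_counts` and `choose_categorical_test` are hypothetical names, and the table is assumed to be a non-empty list of rows of observed counts. The theoretical count of a cell is the count expected under independence: row total × column total / grand total.

```python
def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def choose_categorical_test(table):
    """Hypothetical helper: rows are the classes of X, columns the classes of Y."""
    expected = expected_counts(table)
    if all(cell > 5 for row in expected for cell in row):
        return "chi-square"
    # Exact Fisher test only when both X and Y have 2 classes (2x2 table).
    if len(table) == 2 and len(table[0]) == 2:
        return "fisher-exact"
    # Larger tables: p-value estimated by Monte-Carlo simulation.
    return "fisher-monte-carlo"
```

Note that the rule looks at expected counts, not observed ones: a table can contain a small observed cell and still qualify for the Chi-square test.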

Y is numerical

If the variable X is numerical
- If the distributions of the variables Y and X are normal* or the sample size is over 30, and the variances are homoscedastic: Pearson’s correlation coefficient
- Otherwise: Spearman’s correlation coefficient (rho)
If the variable X is categorical with 2 classes
- If the distribution of the variable Y is normal* or the sample size is over 30 for both classes: Welch’s t-test
- Otherwise: the non-parametric Mann-Whitney test
If the variable X is categorical with over 2 classes
- If the distribution of the variable Y is normal* or the sample size is over 30 for all the classes, and the variances are homoscedastic: ANOVA
- Otherwise: the non-parametric Kruskal-Wallis test
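
To make the difference between the two correlation coefficients concrete, here is a small Python sketch (an illustration only, with hypothetical function names). Spearman’s rho is Pearson’s coefficient computed on the ranks of the data, with tied values sharing their average rank, which is why it does not require normality.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's coefficient applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For a monotonic but non-linear relation such as `y = x ** 2` on positive values, `spearman` returns 1 while `pearson` stays below 1.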

Paired tests (2 measures for the same patient)

X is categorical

McNemar’s test
- If X contains 2 classes: McNemar’s exact test
- If X contains over 2 classes: McNemar-Bowker’s test
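
For the 2-class case, the exact McNemar test is commonly computed as a two-sided binomial test on the discordant pairs only. The Python sketch below illustrates that computation; it is not the site’s R code, and the name `mcnemar_exact_p` and the arguments `b` and `c` (the two discordant counts of the paired 2x2 table) are hypothetical.

```python
from math import comb


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value from the two discordant counts.

    b: pairs that changed from class 1 to class 2
    c: pairs that changed from class 2 to class 1
    """
    n = b + c
    k = min(b, c)
    # P(X <= k) for X ~ Binomial(n, 0.5): under H0 both changes are equally likely.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the smaller tail and cap at 1.
    return min(1.0, 2 * tail)
```

Concordant pairs (no change between the two measures) do not enter the calculation at all.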

X is numerical

If the distribution of the variable X is normal* or the sample size is over 30 for both measures: the test performed is the paired t-test
Otherwise: the non-parametric Wilcoxon signed-rank test (the paired counterpart of the Mann-Whitney test)
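
The parametric paired test reduces to a one-sample t-test on the within-patient differences. A minimal Python sketch, with the hypothetical name `paired_t` and assuming the two lists are ordered by patient:

```python
import math
import statistics


def paired_t(first, second):
    """Return (t, df) for a paired t-test on two measures of the same patients."""
    if len(first) != len(second):
        raise ValueError("both measures must cover the same patients")
    # Work on the per-patient differences, not on the two samples separately.
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    t = statistics.fmean(diffs) / standard_error
    return t, n - 1
```

Because only the differences are used, between-patient variability cancels out, which is what makes paired designs more sensitive than unpaired ones on the same data.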

Multivariable statistical models

The choice of the multivariable model depends on the response variable Y:

If the variable Y is numerical: a linear regression model is performed
If the variable Y is categorical with two classes
- For a survival analysis: the Cox model is performed
- Otherwise: a logistic regression model is performed
If the variable Y is categorical with over two classes: no analysis is possible
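
The model choice above can be written as a small dispatch function. This Python sketch is only an illustration: the name `choose_model`, its arguments and the returned labels are hypothetical, not part of pvalue.io.

```python
def choose_model(y_kind, n_classes=None, survival=False):
    """Hypothetical helper: return the multivariable model for a response Y.

    y_kind: "numerical" or "categorical"
    n_classes: number of classes when Y is categorical
    survival: True when Y is the event of a survival analysis
    """
    if y_kind == "numerical":
        return "linear regression"
    if y_kind == "categorical" and n_classes == 2:
        return "cox" if survival else "logistic regression"
    # Categorical Y with over two classes: no analysis is possible.
    return None
```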

* pvalue.io considers a distribution to be normal when 1) the mean of the distribution is between the 40th and 60th percentiles and 2) the skewness value is less than 0.6.
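
A minimal Python sketch of this normality criterion follows. Several details are assumptions of the sketch and are not stated above: the skewness is taken as the moment coefficient m3 / m2^1.5 and compared in absolute value, the percentiles use linear interpolation, and samples that are too small or constant are treated as non-normal. The name `looks_normal` is hypothetical.

```python
import statistics


def looks_normal(values, max_skew=0.6):
    """Heuristic check: mean within the 40th-60th percentiles and low skewness."""
    n = len(values)
    if n < 3:
        return False  # assumption: too few points to judge
    mean = statistics.fmean(values)
    # quantiles(n=100) returns 99 cut points: index 39 is the 40th percentile,
    # index 59 the 60th (interpolation method is an assumption).
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    p40, p60 = cuts[39], cuts[59]
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        return False  # assumption: constant data, skewness undefined
    m3 = sum((v - mean) ** 3 for v in values) / n
    skewness = m3 / m2 ** 1.5
    # Assumption: the 0.6 threshold applies to the absolute skewness.
    return p40 <= mean <= p60 and abs(skewness) < max_skew
```

A single large outlier typically fails both conditions at once: it pulls the mean outside the central percentiles and inflates the skewness.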

[1] Welch, B. L. The generalization of ‘Student’s’ problem when several different population variances are involved. Biometrika 34, 28–35 (1947).
