03 Jul
How to perform a multivariable analysis when you have too few observations
It can be surprising to learn that a multivariable analysis cannot be carried out because there are too few subjects, even though the file contains several hundred observations (patients, subjects).
For linear regressions, i.e. multivariable analyses in which the outcome variable is numerical, you need at least 10 observations per explanatory variable.
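As a minimal sketch of this rule of thumb, the maximum number of explanatory variables is the number of observations divided by 10, rounded down. The function name and the `per_variable` parameter are illustrative assumptions, not part of any statistics library.

```python
def max_predictors_linear(n_observations: int, per_variable: int = 10) -> int:
    """Rule of thumb for linear regression (hypothetical helper):
    at least `per_variable` observations per explanatory variable."""
    if n_observations < 0 or per_variable <= 0:
        raise ValueError("n_observations must be >= 0 and per_variable > 0")
    # Integer division rounds down: a fraction of a variable is not allowed.
    return n_observations // per_variable


print(max_predictors_linear(179))  # 17
print(max_predictors_linear(9))    # 0: too few observations for any variable
```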
One subtlety: when an explanatory variable is categorical with N classes, it counts as N-1 variables. For example, take the categorical variable "satisfaction" with the following 5 classes:
- Not at all satisfied
- Somewhat dissatisfied
- Moderately satisfied
- Somewhat satisfied
- Very satisfied
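When this variable is used in a statistical model, it is automatically recoded into 4 binary variables, each equal to 0 or 1. The class without its own binary variable, here "Not at all satisfied", is the reference category: it is coded as 0 on all four.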
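| Satisfaction | Very satisfied | Somewhat satisfied | Moderately satisfied | Somewhat dissatisfied |
|---|---|---|---|---|
| Not at all satisfied | 0 | 0 | 0 | 0 |
| Somewhat dissatisfied | 0 | 0 | 0 | 1 |
| Moderately satisfied | 0 | 0 | 1 | 0 |
| Somewhat satisfied | 0 | 1 | 0 | 0 |
| Very satisfied | 1 | 0 | 0 | 0 |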
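The recoding can be sketched in plain Python, without any statistics library. The helper names `dummy_code` and `model_terms` are hypothetical, and the sketch assumes, as in the table, that "Not at all satisfied" is the reference category; statistical packages may pick a different reference class by default.

```python
SATISFACTION_LEVELS = [
    "Not at all satisfied",   # reference class: coded as all zeros
    "Somewhat dissatisfied",
    "Moderately satisfied",
    "Somewhat satisfied",
    "Very satisfied",
]


def dummy_code(value: str, levels: list[str]) -> dict[str, int]:
    """Recode one categorical value into len(levels) - 1 binary indicators.

    The first level is the reference class and gets no indicator of its own.
    """
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return {level: int(value == level) for level in levels[1:]}


def model_terms(numeric: int, categorical_classes: list[int]) -> int:
    """Count the 'variables' a model uses: 1 per numeric variable,
    N - 1 per categorical variable with N classes."""
    return numeric + sum(n - 1 for n in categorical_classes)


for level in SATISFACTION_LEVELS:
    print(level, dummy_code(level, SATISFACTION_LEVELS))

# Two numeric variables plus "satisfaction" (5 classes) use 2 + 4 = 6 variables.
print(model_terms(numeric=2, categorical_classes=[5]))  # 6
```

With the 10-observations rule, such a model would therefore need at least 60 observations.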
Logistic regressions and survival analyses
For logistic regressions and survival analyses, i.e. when the outcome variable is binary (for survival analyses, whether or not the event occurred), it is slightly more complex. There must still be at least 10 observations per explanatory variable, but be careful: this is not calculated on the total number of subjects. It is calculated on the number of subjects for whom the outcome variable is 0 and on the number for whom it is 1, and the smaller of these two groups sets the limit.
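Thus, with 179 subjects distributed as 29 patients with Y = 0 and 150 with Y = 1, the maximum number of explanatory variables is 2: the smaller group has 29 subjects, and 29 / 10 = 2.9, rounded down to 2. The total of 179 subjects would wrongly suggest 17.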
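A minimal sketch of this computation, assuming the rule as stated above (10 subjects per variable in the smaller outcome group). The helper name `max_predictors_binary` is hypothetical. It reproduces the example: the total would allow 17 variables, but the 29 subjects with Y = 0 limit the model to 2.

```python
def max_predictors_binary(n_y0: int, n_y1: int, per_variable: int = 10) -> int:
    """Rule of thumb for logistic regression / survival analysis
    (hypothetical helper): at least `per_variable` subjects in the
    SMALLER outcome group per explanatory variable."""
    if min(n_y0, n_y1) < 0 or per_variable <= 0:
        raise ValueError("group sizes must be >= 0 and per_variable > 0")
    limiting_group = min(n_y0, n_y1)  # the rarer outcome is the bottleneck
    return limiting_group // per_variable


n_y0, n_y1 = 29, 150
print(n_y0 + n_y1)                        # 179 subjects in total
print((n_y0 + n_y1) // 10)                # 17: what the total alone would suggest
print(max_predictors_binary(n_y0, n_y1))  # 2: the limit set by the Y = 0 group
```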