How to perform a multivariate analysis when you have too few subjects

It can be surprising to find that a multivariate analysis cannot be carried out because there are too few subjects, even though the file contains several hundred observations (patients, subjects).

Linear regressions

For linear regressions, i.e. multivariate analyses for which the outcome variable is numerical, it is necessary to have at least 10 observations per explanatory variable.
One subtlety: when an explanatory variable is categorical with N classes, it counts as N-1 variables. For example, take the categorical variable “satisfaction” with the following 5 classes:

  • Not at all satisfied
  • Somewhat dissatisfied
  • Moderately satisfied
  • Somewhat satisfied
  • Very satisfied

When this variable is used in a statistical model, it is automatically recoded into 4 binary variables, each of which is 0 or 1.

Satisfaction            Very satisfied   Somewhat satisfied   Moderately satisfied   Somewhat dissatisfied
Very satisfied          1                0                    0                      0
Somewhat satisfied      0                1                    0                      0
Moderately satisfied    0                0                    1                      0
Somewhat dissatisfied   0                0                    0                      1
Not at all satisfied    0                0                    0                      0

Tip: if you do not have enough subjects, start by grouping the classes of the categorical variables.
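
The recoding and the counting rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (dummy_code, effective_variables, max_variables_linear) are hypothetical and not taken from any statistical package, “Not at all satisfied” is assumed to be the reference class as in the table, and the example counts (2 numerical predictors, 45 observations) are made up.

```python
# Classes of the "satisfaction" variable; the first one is assumed
# to be the reference class (all indicators equal to 0).
LEVELS = [
    "Not at all satisfied",
    "Somewhat dissatisfied",
    "Moderately satisfied",
    "Somewhat satisfied",
    "Very satisfied",
]


def dummy_code(value, levels=LEVELS):
    """Return the N-1 binary indicators for one observation."""
    if value not in levels:
        raise ValueError(f"unknown class: {value!r}")
    # One 0/1 indicator per non-reference class.
    return {level: int(value == level) for level in levels[1:]}


def effective_variables(n_numeric, categorical_class_counts):
    """Count 1 per numerical variable and N-1 per categorical variable with N classes."""
    return n_numeric + sum(n - 1 for n in categorical_class_counts)


def max_variables_linear(n_observations, per_variable=10):
    """Largest number of variables allowed with 10 observations per variable."""
    return n_observations // per_variable


print(dummy_code("Very satisfied"))
print(dummy_code("Not at all satisfied"))

# Hypothetical model: 2 numerical predictors plus "satisfaction".
print(effective_variables(2, [5]))  # 5 classes kept as they are
print(effective_variables(2, [3]))  # after grouping into 3 classes
print(max_variables_linear(45))     # hypothetical file of 45 observations
```

With these made-up numbers, the 5-class version needs 6 variables while 45 observations allow only 4; grouping satisfaction into 3 classes brings the model down to 4 variables.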

Logistic regressions and survival analyses

For logistic regressions and survival analyses, i.e. when the outcome variable is binary, it is slightly more complex. There must still be at least 10 observations per explanatory variable, but be careful: this is not calculated on the total number of subjects. It is calculated separately on the number of subjects for whom the outcome variable is 0 and on the number for whom it is 1, so the smaller of the two groups sets the limit.

Thus, if the 179 subjects are distributed as 29 patients with Y = 0 and 150 with Y = 1, the limit comes from the smaller group: 29 / 10 = 2.9, so the maximum number of explanatory variables is 2.
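
The same calculation can be written as a small Python helper. This is a minimal sketch of the rule as stated in this article, not a function from any statistics library; the name max_variables_binary and its parameters are assumptions made for the illustration.

```python
def max_variables_binary(n_y0, n_y1, per_variable=10):
    """Maximum number of explanatory variables for a binary outcome.

    The limit is set by the smaller of the two outcome groups,
    with at least `per_variable` observations per variable.
    """
    if n_y0 < 0 or n_y1 < 0:
        raise ValueError("group sizes must be non-negative")
    return min(n_y0, n_y1) // per_variable


# Example from the text: 29 patients with Y = 0 and 150 with Y = 1.
print(max_variables_binary(29, 150))  # prints 2
```

Remember that a categorical variable with N classes uses up N-1 of these variables.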
