# How to perform a multivariable analysis when you have too few observations

03 Jul

It can be surprising to find that a multivariable analysis cannot be carried out because the number of subjects is too small, even though the file contains several hundred observations (patients, subjects).

## Linear regressions

For linear regressions, i.e. multivariable analyses in which the outcome variable is numerical, at least 10 observations are needed per explanatory variable.

One subtlety: when an explanatory variable is categorical with N classes, it counts as N-1 variables. For example, take the categorical variable “satisfaction” with the following 5 classes:

- Not at all satisfied
- Somewhat dissatisfied
- Moderately satisfied
- Somewhat satisfied
- Very satisfied

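When this variable is used in a statistical model, it is automatically recoded into 4 binary variables, each of which is 0 or 1. The remaining class (here “Not at all satisfied”) serves as the reference and is coded 0 on all four.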

| Satisfaction | Very satisfied | Somewhat satisfied | Moderately satisfied | Somewhat dissatisfied |
| --- | --- | --- | --- | --- |
| Very satisfied | 1 | 0 | 0 | 0 |
| Somewhat satisfied | 0 | 1 | 0 | 0 |
| Moderately satisfied | 0 | 0 | 1 | 0 |
| Somewhat dissatisfied | 0 | 0 | 0 | 1 |
| Not at all satisfied | 0 | 0 | 0 | 0 |
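The recoding described above can be sketched in plain Python. This is only an illustration: the helper `dummy_code` and the constant names are hypothetical, and statistical software normally performs this step for you. It reproduces the coding of the satisfaction variable and counts how many observations the 10-per-variable rule would require for this variable alone.

```python
# Illustrative sketch: names below are hypothetical, not part of any library.
LEVELS = [
    "Not at all satisfied",
    "Somewhat dissatisfied",
    "Moderately satisfied",
    "Somewhat satisfied",
    "Very satisfied",
]
REFERENCE = "Not at all satisfied"  # class coded 0 on every indicator


def dummy_code(value, levels, reference):
    """Return the N-1 binary indicators (0 or 1) for one observation."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return {level: int(value == level) for level in levels if level != reference}


# One coded row per class, mirroring the table above
coded = {level: dummy_code(level, LEVELS, REFERENCE) for level in LEVELS}

# A 5-class variable counts as 4 explanatory variables
n_indicators = len(LEVELS) - 1

# Minimum observations needed for this variable alone (10 per variable)
min_observations = 10 * n_indicators

print(n_indicators, min_observations)
```

With 5 classes, the variable uses up 4 of the available "slots", so it alone calls for 40 observations in a linear regression.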

## Logistic regressions and survival analyses

For logistic regressions and survival analyses, i.e. when the outcome variable is binary, the rule is slightly more complex. There must still be at least 10 observations per explanatory variable, but the count is not based on the total number of subjects. It is based on the number of subjects for whom the outcome variable is 0 and the number for whom it is 1, so the smaller of the two groups sets the limit.

Thus, if 179 subjects are distributed as 29 patients with Y = 0 and 150 with Y = 1, the smaller group (29) limits the model: 29 / 10 = 2.9, so the maximum number of explanatory variables is 2.
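The arithmetic behind both rules can be written as a minimal sketch. The function names are hypothetical and the threshold of 10 is the rule of thumb used in this post, not a property of any particular software.

```python
def max_variables_linear(n_observations, per_variable=10):
    """Linear regression: the limit is based on the total number of observations."""
    return n_observations // per_variable


def max_variables_binary(n_zero, n_one, per_variable=10):
    """Binary outcome: the limit is based on the smaller outcome group."""
    return min(n_zero, n_one) // per_variable


# Example from the text: 29 patients with Y = 0 and 150 with Y = 1
binary_limit = max_variables_binary(29, 150)

# Same 179 subjects if the outcome were numerical instead
linear_limit = max_variables_linear(29 + 150)

print(binary_limit, linear_limit)
```

The same 179 subjects would allow 17 explanatory variables with a numerical outcome, but only 2 with this binary outcome. Remember that a categorical variable with N classes uses N-1 of these.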

## HHShah

Posted at 18:44h, 02 December

We want to evaluate three different components, each having three or more variables, to analyze the outcome variable (four outcome variables), and we need to infer which components are determinants of the outcome variable.

## Kevin

Posted at 13:08h, 03 December

I’m not sure I understand perfectly. Do you mean that you want to perform an explanatory analysis of a categorical outcome variable having more than 2 categories? If that’s the case, you need to perform a so-called multinomial logistic regression, which is currently not possible with pvalue.io.