03 Jul How to perform a multivariable analysis when you have too few observations

Posted at 14:20h in Statistical Methods, Tutorials by Kevin 6 Comments

It is sometimes surprising to find that a multivariable analysis cannot be carried out because the number of subjects is too small, even though the file contains several hundred observations (patients, subjects).

Linear regressions

For linear regressions, i.e. multivariable analyses for which the outcome variable is numerical, the usual rule is to have at least 10 observations per covariate.
A small refinement: a categorical covariate with N classes counts as N-1 variables. For example, take the categorical variable “satisfaction” with the following 5 classes:

Not at all satisfied
Somewhat dissatisfied
Moderately satisfied
Somewhat satisfied
Very satisfied

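When this variable is used in a statistical model, it is automatically coded into 4 dummy variables, each of which is 0 or 1. The remaining class (here “Not at all satisfied”) serves as the reference and is coded 0 on all four.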

Satisfaction	Very satisfied	Somewhat satisfied	Moderately satisfied	Somewhat dissatisfied
Very satisfied	1	0	0	0
Somewhat satisfied	0	1	0	0
Moderately satisfied	0	0	1	0
Somewhat dissatisfied	0	0	0	1
Not at all satisfied	0	0	0	0
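
To make the coding concrete, here is a minimal sketch in Python of the table above. The function name dummy_code and the LEVELS list are made up for illustration; statistical software does this step for you, and this is not how pvalue.io implements it.

```python
# Illustrative sketch only: reference-level dummy coding done by hand.
# LEVELS and dummy_code are hypothetical names, not part of any library.
LEVELS = [
    "Not at all satisfied",   # reference class: all dummies are 0
    "Somewhat dissatisfied",
    "Moderately satisfied",
    "Somewhat satisfied",
    "Very satisfied",
]

def dummy_code(value, levels=LEVELS):
    """Return the N-1 dummy variables (0 or 1) for one observation."""
    if value not in levels:
        raise ValueError(f"unknown class: {value!r}")
    reference, *others = levels  # the first level is the reference
    return {level: int(value == level) for level in others}

row = dummy_code("Moderately satisfied")
reference_row = dummy_code("Not at all satisfied")
```

Here row contains four entries with a single 1 (for “Moderately satisfied”), while reference_row contains four zeros, exactly as in the last line of the table.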

Tip: if you do not have enough subjects, start by grouping the classes of the categorical variables.
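
The counting rule and the effect of grouping classes can be sketched as follows. The study (50 subjects, covariates age, sex and satisfaction) and the function names are hypothetical, chosen only to illustrate the arithmetic.

```python
# Rule of thumb from the text: at least 10 observations per model variable.
OBSERVATIONS_PER_VARIABLE = 10

def count_model_variables(covariates):
    """Count model variables for a dict {name: number_of_classes}.

    Use None for a numerical covariate (counts as 1); a categorical
    covariate with N classes counts as N - 1.
    """
    return sum(1 if n_classes is None else n_classes - 1
               for n_classes in covariates.values())

def max_model_variables(n_observations):
    """Largest number of model variables allowed for a numerical outcome."""
    return n_observations // OBSERVATIONS_PER_VARIABLE

# Hypothetical study: 50 subjects, three covariates.
covariates = {"age": None, "sex": 2, "satisfaction": 5}
budget = max_model_variables(50)              # 50 // 10 = 5
needed = count_model_variables(covariates)    # 1 + 1 + 4 = 6, too many

# Grouping satisfaction into 3 classes frees up two model variables.
grouped = dict(covariates, satisfaction=3)
needed_grouped = count_model_variables(grouped)  # 1 + 1 + 2 = 4, fits
```

With 5 classes the model needs 6 variables for a budget of 5; after grouping into 3 classes it needs only 4.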

Logistic regressions and survival analyses

For logistic regressions and survival analyses, i.e. when the outcome variable is binary, it is slightly more complex. There must still be at least 10 observations per variable, but this is counted not on the total number of subjects: it is counted separately among the subjects for whom the outcome variable is 0 and among those for whom it is 1. In practice, the smaller of the two groups sets the limit.

Thus, if the 179 subjects are distributed as 29 patients with Y = 0 and 150 with Y = 1, the smaller group allows 29 / 10 = 2.9, so the maximum number of covariates will be 2.
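
The same calculation as a short Python sketch. The function name max_covariates_binary is hypothetical; it simply applies the 10-per-variable rule to the smaller outcome group.

```python
def max_covariates_binary(n_outcome_0, n_outcome_1, per_variable=10):
    """Maximum number of model variables for a binary outcome.

    The limit is set by the smaller of the two outcome groups,
    with at least `per_variable` subjects required per variable.
    """
    return min(n_outcome_0, n_outcome_1) // per_variable

# Example from the text: 29 patients with Y = 0 and 150 with Y = 1.
limit = max_covariates_binary(29, 150)  # 29 // 10 = 2
```

Remember that a categorical covariate with N classes uses up N-1 of these variables.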


6 Comments

HHShah
Posted at 18:44h, 02 December
We want to evaluate three different components, each having three or more variables, in relation to the outcome variable (four outcome variables), and we need to infer which components are determinants of the outcome variable?
- Kevin
  Posted at 13:08h, 03 December
  I’m not sure I understand perfectly. Do you mean that you want to perform an explanatory analysis of a categorical outcome variable having more than 2 categories? If that’s the case, you need to perform a so-called multinomial logistic regression, which is currently not possible with pvalue.io.
alaa roushdy
Posted at 00:44h, 14 June
I want to do a multivariate analysis with a binary outcome variable and include as explanatory variables all the variables which were significant in the univariate analysis,
but your site gives me a pop-up message saying that too many explanatory variables were chosen.
I used to do this with MedCalc software without any problem.
Why are you putting a limit on the number of variables that can be used in a multivariate model?
- Kevin
  Posted at 06:24h, 14 June
  The difference between pvalue.io and other statistical software is that it is aimed at people who are not professionals in statistical analysis. In particular, there are a certain number of conditions to be met in the statistical models, including a limited number of covariates. Typically these are not checked in other software. This is why pvalue.io allows you to make correct statistical analyses even without advanced knowledge in statistics.
Teshome Gensa GETA
Posted at 14:12h, 11 February
Is it proper to run a logistic regression analysis with extremely few observations in one category of the outcome variable? For example, with a total sample of five thousand participants, 99% have outcome yes (Y=0) whereas 1% have outcome no (Y=1). Can anyone help me with the explanation behind it?
- Kevin
  Posted at 09:09h, 26 July
  Yes, please visit: How to perform a multivariable analysis when you have too few observations, based on Peduzzi, P., Concato, J., Kemper, E., Holford, T. R. & Feinstein, A. R. A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol 49, 1373–1379 (1996).
