08 Jul Logistic regression
- When the outcome variable is binary and not censored, the appropriate statistical model is logistic regression;
- When there is only one explanatory variable and it is qualitative, logistic regression yields a result similar to a chi-squared test.
For example, if we want to explain the probability of being born male as a function of dietary intake, Y is the child's sex and X is the dietary intake.
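The similarity with the chi-squared test can be checked numerically. With a single qualitative X, the likelihood-ratio test of the logistic model against the intercept-only model is exactly the G statistic of the contingency table, which is usually close to Pearson's chi-squared statistic. This minimal sketch computes both with the Python standard library for a hypothetical 2x2 table (the counts are invented for illustration). The 1-degree-of-freedom p-value uses the identity P(χ² > s) = erfc(√(s/2)).

```python
import math

# Hypothetical 2x2 table (counts invented for illustration): rows are the groups
# of X, columns are (number with Y=1, number with Y=0).
table = [[30, 70],  # X = A
         [45, 55]]  # X = B

n = sum(sum(row) for row in table)
row_tot = [sum(row) for row in table]
col_tot = [sum(col) for col in zip(*table)]

pearson = 0.0  # Pearson chi-squared statistic
g_stat = 0.0   # likelihood-ratio (G) statistic, equal to the logistic LR test here
for i in range(2):
    for j in range(2):
        observed = table[i][j]
        expected = row_tot[i] * col_tot[j] / n  # expected count under independence
        pearson += (observed - expected) ** 2 / expected
        g_stat += 2 * observed * math.log(observed / expected)

# Upper-tail probability of a chi-squared distribution with 1 degree of freedom
p_pearson = math.erfc(math.sqrt(pearson / 2))
p_lr = math.erfc(math.sqrt(g_stat / 2))
print(f"Pearson chi2 = {pearson:.3f} (p = {p_pearson:.4f})")
print(f"LR (G) stat  = {g_stat:.3f} (p = {p_lr:.4f})")
```

The two statistics differ only slightly, and both lead to the same conclusion at the 5% level.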
What is the purpose of a logistic regression?
As with linear regression and statistical models in general, logistic regression makes it possible to perform multivariable analyses, i.e. to take confounding factors into account. This is not possible with the tests widely used in medicine (t-test, chi-squared test, McNemar test, etc.), which only compare two variables with each other.
In practice, confounding factors very often need to be taken into account, and in that case a simple two-variable test is not appropriate.
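To make "taking confounding factors into account" concrete, here is a minimal, dependency-free sketch of a multivariable logistic regression fitted by maximum likelihood with Newton-Raphson iterations. This is one standard fitting method, not necessarily the one pvalue.io uses internally; the function names `solve` and `fit_logistic` and the data are hypothetical. The invented data are built so that a second factor x2 is linked both to x1 and to Y, while within each level of x2 the odds of Y = 1 are the same whatever x1. The crude odds ratio for x1 is therefore far from 1, but adjusting for x2 brings it back to 1.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression fitted by Newton-Raphson.

    X: one list of explanatory values per subject (no intercept column).
    y: 0/1 outcomes. Returns [b0, b1, ..., bk], b0 being the intercept.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend the intercept column
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        score = [0.0] * k                     # gradient of the log-likelihood
        info = [[0.0] * k for _ in range(k)]  # information matrix X'WX
        for r, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
            for i in range(k):
                score[i] += (yi - p) * r[i]
                for j in range(k):
                    info[i][j] += p * (1.0 - p) * r[i] * r[j]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Invented data as (x1, x2, number with Y=1, number with Y=0)
cells = [(0, 0, 10, 90), (1, 0, 2, 18), (0, 1, 10, 10), (1, 1, 50, 50)]
X, y = [], []
for x1, x2, n1, n0 in cells:
    X += [[x1, x2]] * (n1 + n0)
    y += [1] * n1 + [0] * n0

crude = fit_logistic([[x1] for x1, _ in X], y)  # x1 alone
adjusted = fit_logistic(X, y)                   # x1 adjusted for x2
print("crude OR for x1:   ", round(math.exp(crude[1]), 3))     # 3.824
print("adjusted OR for x1:", round(math.exp(adjusted[1]), 3))  # 1.0
print("adjusted OR for x2:", round(math.exp(adjusted[2]), 3))  # 9.0
```

The sketch only returns point estimates; standard errors, and hence confidence intervals, come from the inverse of the information matrix `info` at convergence.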
How to perform logistic regressions with pvalue.io
Performing a logistic regression with pvalue.io is straightforward:
- Select the outcome variable (Y) and the variables known to have an impact on it (X)
- Use the descriptive analysis (figures and tables) to check the data for errors
- Deselect any factors that pvalue.io selected automatically because of a statistical association with Y but that make no clinical sense
- Transform quantitative variables that are not linearly related to the log-odds (logit) of the outcome variable
Interpretation of the results
By default, statistical software reports, for each variable X included in the model, a coefficient, its confidence interval and a p-value.
The coefficient and its confidence bounds can take any value between -∞ and +∞.
However, it is common practice to present not the coefficient itself but its exponential, which lies between 0 and +∞.
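The conversion is a plain exponential applied to the estimate and to each bound. A minimal Python sketch; the helper name `coef_to_odds_ratio` and the input values are hypothetical, the values being chosen so that, after rounding, they match the ICSI vs IVF row of the example table in this article.

```python
import math


def coef_to_odds_ratio(coef, ci_low, ci_high):
    """Map a coefficient and its confidence bounds from the log-odds scale to the odds-ratio scale."""
    # exp() is strictly increasing, so the bounds keep the same order
    return math.exp(coef), math.exp(ci_low), math.exp(ci_high)


# Hypothetical coefficient and confidence interval on the log-odds scale
or_, low, high = coef_to_odds_ratio(-0.179, -0.416, 0.058)
print(f"OR = {or_:.3f} [{low:.3f}, {high:.3f}]")  # OR = 0.836 [0.660, 1.060]
```

A coefficient of 0 maps to an odds ratio of 1 (no association), a negative coefficient to an odds ratio below 1, and an interval that is symmetric on the log-odds scale becomes asymmetric around the odds ratio.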
It can be shown mathematically that, when X is categorical, the exponential of its coefficient is the odds ratio between a category of X and the reference category.
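A short derivation for a model with a single explanatory variable X, where the symbols β₀ (intercept) and β₁ (coefficient of X) are introduced here for illustration: writing the model on the log-odds scale and comparing two values of X one unit apart, the intercept cancels. With X coded 0 for the reference category A and 1 for B, taking x = 0 gives the odds ratio of B vs A. Other variables in the model cancel in the same way, as long as they are held fixed.

```latex
\ln\frac{p(x)}{1 - p(x)} = \beta_0 + \beta_1 x
\quad\Longrightarrow\quad
\mathrm{odds}(x) = \frac{p(x)}{1 - p(x)} = e^{\beta_0 + \beta_1 x},
\qquad
\frac{\mathrm{odds}(x + 1)}{\mathrm{odds}(x)}
  = \frac{e^{\beta_0 + \beta_1 (x + 1)}}{e^{\beta_0 + \beta_1 x}}
  = e^{\beta_1}
```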
For example, suppose Y takes two values, 0 (the reference) and 1, and X takes two values, A (the reference) and B:
If the odds ratio of B vs A is greater than 1, the proportion of Y=1 is higher when X=B than when X=A.
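The same comparison with hypothetical counts (invented for illustration): the odds ratio of B vs A is the odds of Y=1 in group B divided by the odds of Y=1 in group A.

```python
# Hypothetical counts for a binary outcome Y and a binary factor X (A = reference)
a_y1, a_y0 = 30, 70  # X = A: 30 with Y=1, 70 with Y=0
b_y1, b_y0 = 45, 55  # X = B: 45 with Y=1, 55 with Y=0

odds_a = a_y1 / a_y0
odds_b = b_y1 / b_y0
or_b_vs_a = odds_b / odds_a  # equals (45 * 70) / (55 * 30)

prop_a = a_y1 / (a_y1 + a_y0)  # proportion of Y=1 when X = A
prop_b = b_y1 / (b_y1 + b_y0)  # proportion of Y=1 when X = B
print(round(or_b_vs_a, 3), prop_a, prop_b)  # 1.909 0.3 0.45
```

A logistic regression with X as its only explanatory variable, fitted to these data, would return an exponentiated coefficient equal to this cross-product ratio, because such a model reproduces the observed proportion of Y=1 in each group.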
Interpreting the result for a quantitative variable X requires some knowledge of the mathematics underlying logistic regression: the exponential of its coefficient is the odds ratio for a one-unit increase in X (X + 1 vs. X), whatever the starting value of X and with the other variables in the model held constant.
Simply put, if this odds ratio is less than 1, higher values of X are associated with a lower probability that Y = 1; if it is greater than 1, with a higher probability.
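A numerical check of the one-unit interpretation, assuming a hypothetical fitted model with intercept `b0 = -1.0` and slope `b1 = -0.05` (invented values, not taken from the example below): the ratio of predicted odds between x + 1 and x is the same at every x and equals exp(b1), and a k-unit increase multiplies the odds by exp(k × b1).

```python
import math

# Hypothetical fitted model: logit(p) = b0 + b1 * x, with x quantitative (e.g. age in years)
b0, b1 = -1.0, -0.05


def odds_at(x):
    """Odds of Y = 1 predicted by the hypothetical model at value x."""
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
    return p / (1.0 - p)


# The odds ratio for a one-unit increase does not depend on the starting value of x
for x in (20, 30, 40):
    print(x, round(odds_at(x + 1) / odds_at(x), 6))  # 0.951229 each time
print(round(math.exp(b1), 6))       # 0.951229, i.e. exp(b1)
print(round(math.exp(10 * b1), 6))  # 0.606531, odds ratio for a 10-unit increase
```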
In the following example, we wanted to know whether the child's sex (0 = female, 1 = male) depended on the assisted reproduction technique used (IVF, ICSI or IMSI) and on the ages of the parents.
| Variable | Comparison | Odds ratio [CI] | p | Global p |
|---|---|---|---|---|
| Age Mother | | 1.00 [0.974, 1.03] | 0.9 | |
| Age Father | | 0.995 [0.977, 1.01] | 0.6 | |
| Technique | ICSI vs IVF | 0.836 [0.66, 1.06] | 0.1 | 0.2 |
| | IMSI vs IVF | 0.836 [0.64, 1.09] | 0.2 | |
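As a consistency check, the p-values in the table above can be approximately recovered from each odds ratio and its confidence interval. This sketch assumes 95% Wald intervals (symmetric around the coefficient on the log-odds scale), which the table does not state, and it inherits the rounding of the published figures; the helper name `wald_p_from_ci` is hypothetical. The Age Mother row is left out because its odds ratio is rounded to 1.00, and the global p-value for Technique comes from a joint test of both contrasts that cannot be reproduced from the table alone.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal: assumes 95% CIs


def wald_p_from_ci(odds_ratio, low, high, z_crit=Z_975):
    """Approximate two-sided Wald p-value recovered from an odds ratio and its CI."""
    se = (math.log(high) - math.log(low)) / (2 * z_crit)  # standard error of the coefficient
    z = math.log(odds_ratio) / se                          # Wald statistic
    return math.erfc(abs(z) / math.sqrt(2))                # = 2 * (1 - Phi(|z|))


# (OR, lower bound, upper bound, p reported in the table)
rows = {
    "Age Father": (0.995, 0.977, 1.01, 0.6),
    "ICSI vs IVF": (0.836, 0.66, 1.06, 0.1),
    "IMSI vs IVF": (0.836, 0.64, 1.09, 0.2),
}
for name, (odds_ratio, low, high, reported) in rows.items():
    p = wald_p_from_ci(odds_ratio, low, high)
    print(f"{name}: recomputed p = {p:.2f}, reported p = {reported}")
```

Rounded to one decimal, each recomputed value agrees with the reported one.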
We conclude as follows:
- Neither the mother's age nor the father's age is significantly associated with the probability that the child is male (p > 0.05);
- Compared with IVF, the ICSI technique does not significantly change the probability that the child is male (p = 0.1);
- Compared with IVF, the IMSI technique does not significantly change the probability that the child is male (p = 0.2);
- Overall, the technique used is not significantly associated with the probability that the child is male (global p = 0.2 > 0.05).