08 Jul Logistic regression
- When the response variable is binary and not censored, the appropriate statistical model is logistic regression;
- When there is only one explanatory variable and it is categorical, logistic regression yields a result similar to a chi-squared test.
For example, if we want to explain the probability of being born male as a function of dietary intake, Y is the gender, and X is the dietary intake.
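To make the similarity with the chi-squared test concrete, here is a minimal sketch on a 2x2 table. The counts are made up for illustration (they do not come from any study), and the logistic regression side uses the closed-form Wald test that applies when the only explanatory variable is binary.

```python
import math

# Hypothetical 2x2 counts, invented for illustration only.
# Rows: X = A and X = B; columns: Y = 1 and Y = 0.
a, b = 30, 70   # X = A: 30 with Y = 1, 70 with Y = 0
c, d = 45, 55   # X = B: 45 with Y = 1, 55 with Y = 0
n = a + b + c + d

# Pearson chi-squared statistic for a 2x2 table (no continuity correction).
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
# Upper tail of a chi-squared distribution with 1 degree of freedom.
p_chi2 = math.erfc(math.sqrt(chi2 / 2))

# Logistic regression with a single binary X: the fitted coefficient is the
# log odds ratio, and its Wald standard error is sqrt(1/a + 1/b + 1/c + 1/d).
log_or = math.log((c / d) / (a / b))
se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
z = log_or / se
p_wald = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value

print(f"chi-squared p-value:       {p_chi2:.4f}")
print(f"logistic (Wald) p-value:   {p_wald:.4f}")
```

The two p-values are close but not identical, which is what "similar" means here: the tests use different statistics to answer the same question.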
What is the purpose of a logistic regression?
As with linear regression and statistical models in general, logistic regression makes it possible to perform multivariable analyses, i.e. to take confounding factors into account. The tests widely used in medicine (t-tests, chi-squared, McNemar, etc.) cannot do this: they only compare two variables with each other.
However, it is very often necessary to take confounding factors into account, and in that case a univariable test is not appropriate.
How to perform logistic regressions with pvalue.io
Performing logistic regressions with pvalue.io should be straightforward:
- Choose to perform an explanatory analysis
- Select the outcome variable (Y) and the variables known to have an impact on it (X)
- Check that there are no errors according to the single-variable descriptive statistics (by looking at the figures and tables).
- Transform variables that are not linearly related to the response variable
Interpretation of the results
By default, statistical software reports, for each variable X included in the model, a coefficient, the confidence interval of this coefficient, and a p-value.
The coefficient and its confidence interval are between -∞ and +∞.
However, it is common practice to display not the coefficient, but rather its exponential value; this value is then bounded between 0 and +∞.
It can be shown mathematically that, when the variable X is categorical, the exponential of the coefficient is the odds ratio of the corresponding category vs. the reference category.
Suppose Y can take two values, 0 and 1, and X can take two values, A and B. If the odds ratio of B vs. A is greater than 1, the proportion of Y=1 is higher when X=B than when X=A.
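The link between the coefficient and the odds ratio can be checked numerically. The sketch below fits a logistic regression with one binary X using a hand-written Newton-Raphson loop (an illustrative choice, not what any particular software does internally) and compares the exponential of the coefficient with the odds ratio computed directly from the counts. The data and the function name `fit_logistic_binary_x` are hypothetical.

```python
import math

def fit_logistic_binary_x(x, y, iterations=25):
    """Fit logit P(Y=1) = b0 + b1*x by Newton-Raphson (x and y are 0/1 lists)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Gradient (g) and information matrix (h) of the log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system h * delta = g.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: X = A coded 0 (20 of 100 with Y = 1),
# X = B coded 1 (40 of 100 with Y = 1).
x = [0] * 100 + [1] * 100
y = [1] * 20 + [0] * 80 + [1] * 40 + [0] * 60

b0, b1 = fit_logistic_binary_x(x, y)
cross_product_or = (40 / 60) / (20 / 80)  # odds of Y=1 for B divided by odds for A

print("exp(coefficient):      ", math.exp(b1))
print("odds ratio from counts:", cross_product_or)
```

Both numbers agree up to numerical precision, and since the odds ratio is greater than 1, the proportion of Y=1 is indeed higher for X=B (40%) than for X=A (20%).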
For a numerical variable X, the interpretation requires some knowledge of the mathematics underlying logistic regression: the exponential of the coefficient is the odds ratio associated with a one-unit increase in X (X+1 vs. X), whatever the starting value of X.
Simply put, if the odds ratio is less than 1, higher values of X are associated with a lower probability that Y=1.
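For readers who want the reasoning behind the numerical case, here is a short derivation. It assumes the usual logistic model with coefficients written β0 and β1 (notation introduced here for illustration) and all other variables held fixed.

$$
\begin{aligned}
\log\frac{P(Y=1 \mid X=x)}{1 - P(Y=1 \mid X=x)} &= \beta_0 + \beta_1 x \\
\text{odds}(x) &= e^{\beta_0 + \beta_1 x} \\
\frac{\text{odds}(x+1)}{\text{odds}(x)} &= \frac{e^{\beta_0 + \beta_1 (x+1)}}{e^{\beta_0 + \beta_1 x}} = e^{\beta_1}
\end{aligned}
$$

The ratio does not depend on x, so the same odds ratio applies to any one-unit increase in X. If β1 is negative, the odds ratio is below 1 and the odds of Y=1 decrease as X increases.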
In the following example, we wanted to know if the child’s gender (0 = female) depended on the technique used and the age of the parents.
|Variable|Comparison|Odds Ratio [CI]|p|p global|
|---|---|---|---|---|
|Age Mother||1.00 [0.974, 1.03]|0.9||
|Age Father||0.995 [0.977, 1.01]|0.6||
|Technique|ICSI vs IVF|0.836 [0.66, 1.06]|0.1|0.2|
||IMSI vs IVF|0.836 [0.64, 1.09]|0.2||
We conclude as follows:
- Neither the age of the mother nor the age of the father has a statistically significant influence on the probability that the child will be male (p > 0.05);
- Compared to IVF, the ICSI technique does not significantly modify the probability that the child is male (p > 0.05);
- Compared to IVF, the IMSI technique does not significantly modify the probability that the child is male (p > 0.05);
- Overall, the technique used has no influence on the probability that the child is male (p global > 0.05).
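To see how the columns of the table relate to each other, the sketch below recomputes approximate p-values from the odds ratios and their confidence limits. It assumes the intervals are 95% Wald intervals built on the log scale, which the table itself does not state; the helper name `wald_p_from_ci` is hypothetical. The Age Mother row is left out because its odds ratio is displayed rounded to 1.00, which loses the information needed for this calculation.

```python
import math

# Odds ratio, lower limit, upper limit, copied from the table above.
rows = {
    "Age Father": (0.995, 0.977, 1.01),
    "ICSI vs IVF": (0.836, 0.66, 1.06),
    "IMSI vs IVF": (0.836, 0.64, 1.09),
}

def wald_p_from_ci(odds_ratio, lower, upper, z_crit=1.96):
    """Approximate two-sided p-value, ASSUMING a 95% Wald interval on the log scale."""
    coef = math.log(odds_ratio)                                # back to the coefficient
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)    # standard error from the CI width
    z = coef / se
    return math.erfc(abs(z) / math.sqrt(2))

for name, (odds_ratio, lower, upper) in rows.items():
    p = wald_p_from_ci(odds_ratio, lower, upper)
    print(f"{name}: p is approximately {p:.2f}")
```

Under that assumption, the recomputed values round to the p-values shown in the table. The global p-value for the technique cannot be recovered this way, since it tests both comparisons at once.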