04 Jul Linear regressions
- When the outcome variable is numerical and continuous, the appropriate statistical model is linear regression
- When the only explanatory variable is categorical with two classes, linear regression gives the same result as a Student t test, and a result close to a Welch t test
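To see why the two approaches agree, here is a minimal Python sketch with made-up birth weights. The data and the helper names `pooled_t_test` and `regression_on_dummy` are hypothetical, not part of pvalue.io. When the categorical variable is coded 0 (reference class) / 1, the regression slope equals the difference between the two group means, and its t statistic equals the Student t statistic computed with a pooled variance.

```python
from statistics import mean


def pooled_t_test(a, b):
    """Student's t statistic for two independent samples (equal variances assumed)."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    # Pooled variance: within-group sum of squares over (na + nb - 2) degrees of freedom
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    sp2 = ss / (na + nb - 2)
    se = (sp2 * (1 / na + 1 / nb)) ** 0.5
    return (mb - ma) / se


def regression_on_dummy(a, b):
    """OLS of y on a 0/1 indicator (0 = group a, the reference class; 1 = group b).
    Returns the slope and its t statistic."""
    x = [0] * len(a) + [1] * len(b)
    y = list(a) + list(b)
    n = len(y)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    # Residual variance on n - 2 degrees of freedom, then standard error of the slope
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = (rss / (n - 2) / sxx) ** 0.5
    return slope, slope / se_slope


# Hypothetical birth weights (g); girls are the reference class
girls = [3100, 3250, 2980, 3400, 3150]
boys = [3300, 3450, 3200, 3600, 3350]
slope, t_reg = regression_on_dummy(girls, boys)
print(f"slope = {slope:.1f} g (difference in means)")
print(f"t (regression) = {t_reg:.4f}, t (Student) = {pooled_t_test(girls, boys):.4f}")
```

Welch's test drops the equal-variance assumption that both functions above rely on, which is why its result is only close to the regression's rather than identical.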
For example, to study a child’s height according to their mother’s height, Y (the outcome variable) is the child’s height and X (the explanatory variable) is the mother’s height.
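In this example, the model can be written as below, where β0 is the intercept, β1 the coefficient of X (the change in the child’s height per 1-unit increase in the mother’s height) and ε the residual error. The least-squares estimates are the standard ordinary least squares formulas for one explanatory variable, shown as a reminder.

$$
Y = \beta_0 + \beta_1 X + \varepsilon
$$

$$
\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
$$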
What’s the purpose of this?
Traditional statistical tests (Student’s t test and the Chi² test are the most commonly used in medicine) determine whether the differences observed between 2 or more groups may result from chance through sampling fluctuation (the null hypothesis of no difference then cannot be rejected), or whether such a difference is unlikely to be due to chance (the null hypothesis is rejected).
These univariable tests raise a key issue: they do not take potential confounding factors into account, although such factors are common in medicine. It is therefore necessary to use more complex statistical methods, known as statistical regression models (Wikipedia).
Thus, it is possible to test simultaneously each of the factors X that may influence the variable Y, and to give each of them a weight (a coefficient) adjusted for the other factors in the model.
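The confounding problem can be illustrated with a minimal Python simulation. The setup is an assumption chosen for illustration, with hypothetical variable and helper names: a continuous confounder `z` influences both the factor studied `x` and the outcome `y`, while `x` has no effect of its own on `y`. A regression of `y` on `x` alone finds a spurious coefficient, whereas a regression that also includes `z` gives `x` a coefficient close to 0.

```python
import random


def cross(u, v):
    """Sum of cross-products of deviations from the mean."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def simple_slope(x, y):
    """Unadjusted OLS slope of y on x alone."""
    return cross(x, y) / cross(x, x)


def two_predictor_slopes(x1, x2, y):
    """OLS slopes of y on x1 and x2 fitted together (each adjusted for the other),
    solving the 2x2 centered normal equations."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


random.seed(42)
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]      # hypothetical confounder
x = [zi + random.gauss(0, 1) for zi in z]       # factor studied, higher when z is high
y = [2 * zi + random.gauss(0, 1) for zi in z]   # outcome driven by z only, not by x

crude = simple_slope(x, y)
adj_x, adj_z = two_predictor_slopes(x, z, y)
print(f"unadjusted coefficient of x: {crude:.2f}")   # spurious, close to 1
print(f"adjusted coefficient of x:   {adj_x:.2f}")   # close to 0
print(f"adjusted coefficient of z:   {adj_z:.2f}")   # close to 2
```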
For example, if a univariable test found a significant association between coffee consumption and cancer, this association would be due both to the association between smoking and cancer and to the fact that smokers drink coffee more often: smoking is a known confounding factor, and ignoring it creates a confounding bias.
Assumptions of linear regression
There are always assumptions to check for statistical models. If you would like to know more, we suggest reading the following post.
How to perform linear regressions with pvalue.io
Let the intuitive software interface guide you
- Choose to perform an explanatory analysis
- Select the outcome variable (Y) and the factors known to influence the outcome variable (X)
- Check the descriptive analysis (graphs and tables) for data errors
- Transform variables that are not linearly related to the outcome variable
- That’s it
If the assumptions of linear regression are not met, pvalue.io will tell you whether action is required.
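pvalue.io handles the transformation step through its interface; the Python sketch below is not its implementation, only an illustration of why that step matters. It uses made-up data in which Y is linear in log(X) rather than in X, and a hypothetical helper `r_squared`. The simple linear regression fits noticeably worse with X than with log(X). In practice, the choice of a transformation is usually guided by graphs such as scatter plots or residual plots.

```python
import math


def r_squared(x, y):
    """R^2 of the simple linear regression of y on x (squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


x = list(range(1, 31))
y = [5 + 3 * math.log(v) for v in x]   # hypothetical outcome, linear in log(x), not in x
r2_raw = r_squared(x, y)
r2_log = r_squared([math.log(v) for v in x], y)
print(f"R^2 with x:      {r2_raw:.3f}")
print(f"R^2 with log(x): {r2_log:.3f}")   # 1.000: the relationship is linear on the log scale
```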
Interpretation of the results
For a numerical variable, the coefficient is the change in Y when X increases by 1 unit, the other variables in the model being held constant.
For a categorical variable, the coefficient of each class is the difference in Y between that class and the reference class, the other variables in the model being held constant.
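The Python sketch below illustrates how a categorical variable enters the model, with made-up birth weights and hypothetical helper names (`dummy_code`, `fit_two_dummies`). Each non-reference class gets its own 0/1 indicator column, and the reference class (here "single") is the one with all indicators at 0. With no other variable in the model, each coefficient is exactly the difference between that class’s mean and the reference class’s mean; in a multivariable model, it is that difference adjusted for the other variables.

```python
def mean(v):
    return sum(v) / len(v)


def cross(u, v):
    """Sum of cross-products of deviations from the mean."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def dummy_code(values, reference):
    """One 0/1 indicator column per non-reference class (the reference class gets all zeros)."""
    classes = sorted(set(values) - {reference})
    return {c: [1 if v == c else 0 for v in values] for c in classes}


def fit_two_dummies(d1, d2, y):
    """OLS coefficients of y on two indicator columns (centered normal equations)."""
    s11, s22, s12 = cross(d1, d1), cross(d2, d2), cross(d1, d2)
    s1y, s2y = cross(d1, y), cross(d2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Hypothetical birth weights (g) by pregnancy rank; "single" is the reference class
rank = ["single", "single", "single", "twin", "twin", "triplet", "triplet"]
weight = [3300, 3400, 3200, 2950, 3050, 2800, 2900]
dummies = dummy_code(rank, reference="single")   # columns "triplet" and "twin"
b_triplet, b_twin = fit_two_dummies(dummies["triplet"], dummies["twin"], weight)
print(f"twin vs single:    {b_twin:.1f} g")      # -300.0 = 3000 - 3300
print(f"triplet vs single: {b_triplet:.1f} g")   # -450.0 = 2850 - 3300
```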
It is common to set the alpha risk at 5%: it is the risk, accepted a priori, of wrongly concluding that a coefficient is not due to chance when it actually is (type I error). In other words, it is the risk of wrongly rejecting the null hypothesis that the coefficient is zero.
The p-value is computed a posteriori: it is the probability of observing a coefficient at least as large (in absolute value) as the one obtained if chance alone were at work, that is, if the true coefficient were zero.
Thus, when the p-value is lower than the alpha risk, the null hypothesis that the coefficient is zero is rejected.
When a categorical (qualitative) variable has more than 2 classes, it is possible to calculate a global p-value for the variable; this p-value tests the hypothesis that the coefficients of all the non-reference classes are simultaneously zero.
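For a variable with 3 classes (2 non-reference coefficients), one common way to compute such a global p-value is an F test comparing the model with and without the variable; this is not necessarily the test pvalue.io uses. The Python sketch below, with made-up data and a hypothetical helper `global_f_test`, assumes the categorical variable is the only predictor (the one-way ANOVA case), so the full model’s fitted values are simply the class means. In a multivariable model, the same comparison is made with the other covariates kept in both models. The p-value uses the exact survival function of the F(2, d) distribution, (1 + 2F/d)^(-d/2), which only holds for 2 numerator degrees of freedom.

```python
def mean(v):
    return sum(v) / len(v)


def global_f_test(groups):
    """Global test that all non-reference class coefficients are zero, for a single
    categorical predictor with no other covariates. Returns (F statistic, p-value)."""
    y = [v for g in groups.values() for v in g]
    n, k = len(y), len(groups)
    df1, df2 = k - 1, n - k
    if df1 != 2:
        raise ValueError("the closed-form p-value below only covers 3 classes (df1 = 2)")
    grand = mean(y)
    rss_reduced = sum((v - grand) ** 2 for v in y)                          # intercept-only model
    rss_full = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)  # fitted values = class means
    f_stat = ((rss_reduced - rss_full) / df1) / (rss_full / df2)
    p_value = (1 + 2 * f_stat / df2) ** (-df2 / 2)   # exact survival function of F(2, df2)
    return f_stat, p_value


# Hypothetical birth weights (g) by pregnancy rank
groups = {
    "single": [3300, 3400, 3200],
    "twin": [2950, 3050],
    "triplet": [2800, 2900],
}
f_stat, p = global_f_test(groups)
print(f"F = {f_stat:.2f}, global p = {p:.4f}")
```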
In the table below, we wanted to know whether the child’s birth weight (in grams) was related to the mother’s age (Age mother), the child’s sex, the rank of pregnancy (single, gemellar or triplet) and the presence of a malformation.
| Variable | Class | Estimate (g) [95% CI] | p | Global p |
|---|---|---|---|---|
| Age mother (per year) | | 4.45 [-0.152, 9.0] | 0.058 | |
| Sex | M vs F | 138 [100, 180] | <0.001 | |
| Rank of pregnancy | gemellar vs single | -285 [-335, -234] | <0.001 | <0.001 |
| | triplet vs single | -442 [-589, -295] | <0.001 | |
| Malformation | yes vs no | -71.4 [-138, -4.87] | 0.035 | |
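As a rough consistency check, the p-values in the table can be approximately recovered from the confidence intervals. This Python sketch assumes each 95% CI is estimate ± 1.96 × SE (a normal approximation; the software may use a t distribution, and the reported bounds are rounded), so its results only approximate the published p-values. It also shows that p < 0.05 goes together with a 95% CI that excludes 0. The helper name `approx_p_value` is hypothetical.

```python
import math

ALPHA = 0.05
Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

# (label, estimate, lower bound, upper bound), copied from the table (grams)
rows = [
    ("Age mother (per year)", 4.45, -0.152, 9.0),
    ("Sex M vs F", 138, 100, 180),
    ("Rank gemellar vs single", -285, -335, -234),
    ("Rank triplet vs single", -442, -589, -295),
    ("Malformation yes vs no", -71.4, -138, -4.87),
]


def approx_p_value(estimate, lower, upper):
    """Two-sided Wald p-value recovered from a 95% CI, normal approximation."""
    se = (upper - lower) / (2 * Z_975)         # CI half-width divided by 1.96
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))    # equals 2 * (1 - Phi(|z|))


p_values = {}
for label, est, lo, hi in rows:
    p = approx_p_value(est, lo, hi)
    p_values[label] = p
    ci_excludes_zero = lo > 0 or hi < 0
    print(f"{label:24s} p ~ {p:.3g}  reject H0: {p < ALPHA}  CI excludes 0: {ci_excludes_zero}")
```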
We conclude as follows:
- The association between the mother’s age and the child’s weight is not statistically significant (p = 0.058 > 0.05): each additional year is associated with a 4.45 g increase in the child’s weight, but the confidence interval includes 0: [-0.152, 9.0]
- Being a boy significantly increases the child’s weight compared with being a girl (+138 g [100, 180])
- A gemellar pregnancy significantly reduces the child’s weight (-285 g [-335, -234]) compared to a single pregnancy
- A triplet pregnancy significantly reduces the child’s weight (-442 g [-589, -295]) compared to a single pregnancy
- Overall, the rank of pregnancy is significantly associated with the child’s weight, multiple pregnancies resulting in lower weights (global p-value < 0.001)
- Having a malformation significantly reduces the child’s weight (-71.4 g [-138, -4.87])