11 Mar Descriptive, explanatory and predictive analyses
One typology of statistical analyses is based on their purpose:
- Descriptive analyses, to describe the variables, either individually (descriptive statistics) or by cross-tabulating them with another variable (univariable analyses)
- Explanatory analyses, to determine the influence of one or more variables on another (for example using an Odds Ratio)
- Predictive analyses, to classify patients into two groups based on their characteristics
pvalue.io uses this typology to describe the type of analysis to implement.
Descriptive analyses cover single-variable statistics and univariable analyses. They are useful for quickly spotting extreme values or outliers, and for obtaining p-values. Results are displayed in tables and figures. Descriptive analyses are what produce the Table 1 found in virtually all medical scientific articles (a univariable analysis of all patient characteristics according to exposure/treatment).
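As a minimal sketch of the kind of check a descriptive analysis allows, the snippet below summarises one variable and flags extreme values. The ages are invented, and the 1.5 × IQR fence (Tukey's rule) is an assumption chosen for illustration, not necessarily the rule pvalue.io applies.

```python
import statistics

def describe(values):
    """Return the quartiles and the values lying outside the 1.5 * IQR fences."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}

# Hypothetical ages in years; 140 looks like a data-entry error.
ages = [34, 41, 45, 52, 58, 61, 67, 140]
summary = describe(ages)
print(summary)
```

A value flagged this way is only a candidate for review: it may be a typing error or a genuine extreme observation.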
Explanatory analyses use more complex statistical models, such as regressions. These analyses make it possible to measure the strength of the association between a response variable (the explained variable, Y) and one or more explanatory variables (X). They also make it possible to test the statistical significance of this association (in the form of a p-value). Depending on the model, the strength of the association is expressed as:
- Odds Ratios for logistic regressions
- Hazard Ratios for Cox’s Model
- Estimates (coefficients) for linear regressions
When there is only one explanatory variable, the analysis is univariable; otherwise, it is multivariable.
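For the simplest univariable case, a binary outcome and a single binary explanatory variable, the odds ratio can be computed directly from the 2×2 table, and it equals the exponential of the coefficient that a logistic regression would estimate. The sketch below uses invented counts and a Wald 95% confidence interval; the function name `odds_ratio` is hypothetical.

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio with a Wald 95% confidence interval from a 2x2 table.

    a: exposed with the outcome      b: exposed without the outcome
    c: unexposed with the outcome    d: unexposed without the outcome
    """
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    low = math.exp(math.log(estimate) - 1.96 * se_log)
    high = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, low, high

# Hypothetical counts: 30/100 exposed and 15/100 unexposed patients have the outcome.
estimate, low, high = odds_ratio(30, 70, 15, 85)
print(f"OR = {estimate:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

An odds ratio above 1 with a confidence interval that excludes 1 indicates a statistically significant positive association at the 5% level.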
In this type of analysis, the result of the modeling must be simple to interpret. This is why, when linearity cannot be assumed for an explanatory variable, pvalue.io proposes splitting that variable into categories rather than transforming it into a natural spline. This is also why the coefficients of the extraneous variables are not displayed.
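Splitting a continuous variable can be sketched as follows. The cut-points (40 and 60 years) and the labels are arbitrary assumptions for illustration; in practice they would be chosen on clinical grounds or from the data.

```python
from bisect import bisect_right

CUT_POINTS = (40, 60)               # hypothetical thresholds, in years
LABELS = ("<40", "40-59", ">=60")   # one label per resulting category

def age_category(age):
    """Map a continuous age to one of the categories defined by CUT_POINTS."""
    return LABELS[bisect_right(CUT_POINTS, age)]

print([age_category(a) for a in (25, 40, 59, 60, 83)])
```

Each category then gets its own odds ratio relative to a reference category, which is easier to read than a spline curve.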
The objective of a predictive analysis with pvalue.io is to develop a prediction model, such as a score. These models aim to classify a given patient according to their demographic, clinical or paraclinical characteristics.
With a predictive analysis, we obtain the probability that the patient belongs to one group rather than the other. This type of analysis makes it possible to answer questions such as: what is the probability that a 68-year-old male patient who smokes, is asymptomatic and was last vaccinated 3 months ago has long COVID?
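To make this concrete, here is a sketch of how a fitted logistic model turns patient characteristics into a probability. Every coefficient below is invented purely for illustration; none comes from a real long COVID model or from pvalue.io.

```python
import math

# Hypothetical intercept and coefficients (log odds ratios), for illustration only.
INTERCEPT = -2.0
COEFFICIENTS = {
    "male": -0.3,
    "age_per_10_years": 0.1,
    "smoker": 0.4,
    "asymptomatic": -0.8,
    "months_since_vaccination": 0.05,
}

def predicted_probability(patient):
    """Apply the inverse logit to the linear predictor of the model."""
    linear_predictor = INTERCEPT + sum(
        COEFFICIENTS[name] * patient[name] for name in COEFFICIENTS
    )
    return 1 / (1 + math.exp(-linear_predictor))

# The patient from the question above: male, 68 years old, smoker,
# asymptomatic, last vaccinated 3 months ago.
patient = {
    "male": 1,
    "age_per_10_years": 6.8,
    "smoker": 1,
    "asymptomatic": 1,
    "months_since_vaccination": 3,
}
probability = predicted_probability(patient)
print(f"predicted probability: {probability:.3f}")
```

The patient is then assigned to one group or the other by comparing this probability with a chosen threshold.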
Prediction models are based on multivariable models (multiple logistic regression). Unlike explanatory analyses, whose results must be simple to interpret, predictive analyses are judged above all on their ability to classify patients correctly. This requires good discrimination (a high area under the ROC curve) and good calibration (predicted probabilities that agree with the observed frequencies). This is why pvalue.io proposes transforming into a natural spline the variables for which log-linearity cannot be assumed: even if the interpretation is tricky, the prediction is more accurate.
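The two performance criteria can be sketched in a few lines. The area under the curve is computed here as the proportion of (case, non-case) pairs in which the case receives the higher predicted probability, and calibration is summarised by the crudest possible measure, the gap between the mean predicted probability and the observed frequency. The data are invented, and both function names are hypothetical.

```python
def auc(outcomes, probabilities):
    """Share of (case, non-case) pairs ranked correctly; ties count as one half."""
    cases = [p for y, p in zip(outcomes, probabilities) if y == 1]
    non_cases = [p for y, p in zip(outcomes, probabilities) if y == 0]
    score = sum((p > q) + 0.5 * (p == q) for p in cases for q in non_cases)
    return score / (len(cases) * len(non_cases))

def mean_calibration_gap(outcomes, probabilities):
    """Mean predicted probability minus observed outcome frequency (0 is ideal)."""
    return sum(probabilities) / len(probabilities) - sum(outcomes) / len(outcomes)

# Hypothetical outcomes (1 = has the condition) and model predictions.
outcomes = [0, 0, 1, 1]
probabilities = [0.1, 0.4, 0.35, 0.8]

print("AUC:", auc(outcomes, probabilities))
print("calibration gap:", mean_calibration_gap(outcomes, probabilities))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking. A negative calibration gap means the model underestimates risk on average.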
Once the model is developed, it must be validated by estimating its performance on patients other than those used to develop it.
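One common way to obtain such patients, when no external cohort is available, is to set aside part of the dataset before developing the model. The sketch below assumes a simple random hold-out split; the 30% fraction and the function name `split_patients` are illustrative choices, not a description of what pvalue.io does.

```python
import random

def split_patients(patient_ids, validation_fraction=0.3, seed=42):
    """Randomly split patients into a development set and a validation set."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # fixed seed makes the split reproducible
    n_validation = round(len(ids) * validation_fraction)
    return ids[n_validation:], ids[:n_validation]

development, validation = split_patients(range(10))
print("development:", sorted(development))
print("validation:", sorted(validation))
```

The model is fitted on the development set only, and discrimination and calibration are then measured on the validation set.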