# 03 Jul Univariable and multivariable analyses

• We can consider the following three types of analyses: single variable descriptive statistics, univariable analyses (often named univariable) and multivariable (often unproperly named multivariate) analyses
• Single variable descriptive statistics are used to describe the data, and are useful for detecting problems
• Univariable and multivariable analyses allow statistical comparisons (obtaining a p-value), and only multivariable analyses allow confounding factors to be taken into account

NB: The terms univariate and multivariate are no longer recommended[1] but in practice still widely used.

### Single variable descriptive statistics

Before starting a statistical analysis, it is necessary to have a good knowledge of your data. What is the proportion of women? How old is the oldest patient?
Descriptive statistics answer these questions, and have the purpose of:

• Identifying outliers, i.e. patients with extreme values.
• Checking the distribution of the data: are they normally distributed?

Let’s imagine that in the age column, a patient is 182 years old; it is likely (unless you are studying the Jedis) that there was a mistake somewhere.
It will therefore be necessary to find the real age of this patient or assign him a missing value. If this error is not detected and fixed, statistical analyses taking into account age will be completely wrong.

Analysing the descriptive statictics of the variables is therefore a prerequisite for any statistical analysis, whether univariable or multivariable.

Figures are an integral part of the descriptive statistics because they allow to quickly visualize the distribution of the data.

Once you have selected the variables you want to describe, pvalue.io automatically creates a table and a figure. If the variable is quantitative, the table contains mean, standard deviation, median, 25th and 75th percentile, minimum and maximum; the figure then represents the distribution of the variable in the form of a histogram.
If the variable is qualitative, the table provides the number of subjects in each class; the graph represents the distribution in each class in the form of a bar graph.

Be careful not to confuse the term “Descriptive statistics of a single variable” with descriptive analysis. The term “descriptive analysis” is related to the objective of the analysis (to describe the data), but not to the way in which the data are analyzed (by cross-tabulating them with other data (univariate analysis) or not).

### Univariable analyses

Univariable analyses make it possible to specify the relationship between two variables: is blood pressure (variable 1) different according to sex (variable 2)? Is the proportion of smokers different according to eye colour? etc.
The purpose of univariable analyses is to answer the question: is the difference observed between my patients a real difference or is it due to chance? Univariable analyses are based on statistical tests, which provide a p-value (which is the probability that the observed difference is due to chance). The choice of the appropriate test depends on the variables to compare.

pvalue.io will automatically perform these tests in a table and generate:

• A boxplot if you cross-tabulate a numerical variable with a qualitative variable
• A bar plot if you cross-tabulate two qualitative variables
• Kaplan-Meier curves if you perform survival analyses
• A scatterplot if you cross-tabulate two numerical variables, as well as the calculated linear relationship.

Be careful, univariable analyses do not allow for confounding factors to be taken into account. Let us take a sample in which women are younger than men. We want to know if the treatment has a different effect on survival by gender. If we find a p < 0.05, is it because of sex or because of age?
Only a randomized trial can guarantee comparability of patient characteristics between groups. In this study design and only this one, univariate analyses alone are sufficient. Outside of a randomized trial, it is necessary to adjust with covariates. This is the purpose of multivariable analyses.

To remember: The well-known table 1 in medical articles is most often the result of a univariablee analysis, and describes all the characteristics of the patient at inclusion according to the main explanatory variable.

### Multivariable analyses

Multivariable analyses make it possible to take into account covariates, including confounding variables, by adjusting for these covariates. They are therefore recommended when attempting to identify a statistical relationship between several variables. Multivariable analyses use more sophisticated statistical methods than univariable analyses, and are rarely available in software for non-statisticians.

In the previous example, the adjustment on age allows us to conclude: if the men and women in my sample were the same age, then the effect of treatment would be (or not) statistically significant.

Multivariable analyses are performed using statistical models. The most commonly used in medicine are the linear and logistic regressions, as well as Cox models.

Statistical models make it possible to obtain p-values. They have a non-negligible additional interest: they make it possible to measure the extent to which a factor affects the outcome variable. These association measures are:

• Odds Ratio for logistics regressions
• Hazards Ratio for Cox models
• Estimates or coefficients for linear regressions

The p-value gives information on statistical significance, the association measures quantify the relationship between two variables.

Statistical models require that a number of conditions are met.

[1] 1. Tsai AC. Achieving Consensus on Terminology Describing Multivariable Analyses. Am J Public Health. 2013;103(6):e1. doi:10.2105/AJPH.2013.301234