Group Randomized Clinical Trials: Statistical Analysis
This section presents approaches for the statistical analysis of group-randomized (cluster-randomized) clinical
trials. Recall from the last lecture that you cannot simply use the analysis that would be appropriate if patients
had been randomized individually or if there were no within-group correlation. The choice of analysis depends, as usual, on:
• The research question (or specific aim);
• The study design (what treatments, how assigned, when were patients observed);
• The endpoint or outcome measure (dichotomous, continuous, survival).
A classification scheme for models of the relationship of outcome to treatment and covariates
Group-randomized (cluster-randomized) studies need to reduce variance as much as possible, because within-group
clustering inflates the variance of the treatment comparison. Usually this leads to some model-based analysis. The
choice of model depends on the outcome, its relationship to the fixed effects (treatment, other covariates), and the
sources of random variation.
Is there just one source of random variation (typically patient-to-patient variance), or more?
• If only one source, between-patient, we can use standard approaches such as proportional hazards, linear
models, logistic regression.
• If we have two or more sources of random variation, we will need mixed models.
  - Several measurements per patient and within-patient random variation as well as between-patient:
    random effects models for repeated measures.
  - Patients nested within randomly chosen groups (e.g. clinic, town, physician practice): hierarchical
    or nested random effects models.
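To make the two-source case concrete, here is a minimal sketch (not taken from the lecture) that simulates patients nested within clinics and recovers the between-clinic and within-clinic variance components using the balanced one-way ANOVA (method-of-moments) estimators. The function names, sample sizes, and true standard deviations are illustrative assumptions, and the estimators shown assume equal cluster sizes.

```python
import random
import statistics


def simulate_nested(n_clusters, n_per_cluster, sd_cluster, sd_patient, seed=0):
    """Simulate y_ij = a_i + e_ij: patients (j) nested within clusters (i)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_clusters):
        a_i = rng.gauss(0.0, sd_cluster)  # between-cluster random effect
        data.append([a_i + rng.gauss(0.0, sd_patient) for _ in range(n_per_cluster)])
    return data


def variance_components(data):
    """Balanced one-way ANOVA estimators of (between, within) variance."""
    m = len(data[0])  # patients per cluster (assumed equal across clusters)
    cluster_means = [statistics.fmean(c) for c in data]
    msw = statistics.fmean([statistics.variance(c) for c in data])  # within mean square
    msb = m * statistics.variance(cluster_means)  # between mean square
    var_between = max((msb - msw) / m, 0.0)  # truncate negative estimates at zero
    return var_between, msw


# Hypothetical design: 400 clinics, 20 patients each (values chosen for illustration).
data = simulate_nested(n_clusters=400, n_per_cluster=20, sd_cluster=1.0, sd_patient=2.0)
var_between, var_within = variance_components(data)
icc = var_between / (var_between + var_within)  # intraclass correlation
print(f"between={var_between:.2f} within={var_within:.2f} ICC={icc:.2f}")
```

With these assumed inputs the true variances are 1 (between) and 4 (within), so the intraclass correlation is 0.2; the estimates should land near those values but will vary with the seed.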
Is the outcome measure continuous, with residuals that are approximately normal (or that can be transformed to
yield a normal distribution)?
• Gaussian data: usually some linear model setting.
• Non-Gaussian data: generalized linear model, allowing for variation to be binomial, Poisson, or other.
• Survival data: typically uses proportional hazards models.
Is the relationship between the outcome and the predictor linear or other?
• Linear relationship: more typical for continuous data.
• Non-linear relationship, e.g. logistic: some appropriate link function, e.g. logit.
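As an illustration of what a link function does, the sketch below shows the logit link and its inverse: the model is linear on the logit scale, and the inverse link maps the linear predictor back to a probability. The coefficient values and function names are hypothetical, chosen only for illustration.

```python
import math


def logit(p):
    """Link function: map a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Inverse link: map a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def predicted_probability(beta0, beta1, treated):
    """P(event) when the logit of the probability is linear in treatment."""
    eta = beta0 + beta1 * treated  # linear predictor on the logit scale
    return inv_logit(eta)


# Hypothetical coefficients, chosen only for illustration.
p_control = predicted_probability(beta0=-1.0, beta1=0.7, treated=0)
p_treated = predicted_probability(beta0=-1.0, beta1=0.7, treated=1)
print(round(p_control, 3), round(p_treated, 3))
```

The treatment effect is additive on the logit scale (here 0.7, a log odds ratio) even though the two probabilities do not differ by a constant amount.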
The linear mixed model is widely discussed in the literature; Laird and Ware is a standard paper. The model is
usually written as:
$$Y = X\beta + ZA + e.$$
The outcome $Y$ represents a vector of observations from the same person (in repeated measures data) or
from the same group or cluster. The observed values are the sum of fixed effects based on treatment and
covariates, random effects $A$ at the level of the cluster, and random effects $e$ at the level of the observation
within cluster. The random effects are assumed here to be normally distributed.
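The simplest special case of this model for a cluster-randomized trial is a random intercept: each patient's outcome is a fixed part (intercept plus treatment effect), plus a cluster effect $A$ shared by everyone in the cluster, plus an individual error $e$. The sketch below simulates data from that special case and compares two standard errors for the treatment effect: one computed from cluster means, and a naive one that treats patients as independent. This is a simple cluster-level analysis used for illustration, not a fitted mixed model, and all names and parameter values are assumptions.

```python
import math
import random
import statistics


def simulate_trial(n_clusters_per_arm, m, beta0, beta1, sd_a, sd_e, seed=1):
    """Return {arm: list of clusters}, each cluster a list of m outcomes."""
    rng = random.Random(seed)
    trial = {0: [], 1: []}
    for arm in (0, 1):
        for _ in range(n_clusters_per_arm):
            a_i = rng.gauss(0.0, sd_a)  # cluster-level random effect A
            fixed = beta0 + beta1 * arm  # fixed-effect part X * beta
            trial[arm].append([fixed + a_i + rng.gauss(0.0, sd_e) for _ in range(m)])
    return trial


def cluster_level_estimate(trial):
    """Treatment effect and SE computed from cluster means (unit = cluster)."""
    means = {arm: [statistics.fmean(c) for c in clusters] for arm, clusters in trial.items()}
    effect = statistics.fmean(means[1]) - statistics.fmean(means[0])
    se = math.sqrt(sum(statistics.variance(v) / len(v) for v in means.values()))
    return effect, se


def naive_se(trial):
    """SE that wrongly treats every patient as an independent observation."""
    pooled = {arm: [y for c in clusters for y in c] for arm, clusters in trial.items()}
    return math.sqrt(sum(statistics.variance(v) / len(v) for v in pooled.values()))


# Hypothetical trial: 30 clusters per arm, 25 patients per cluster.
trial = simulate_trial(n_clusters_per_arm=30, m=25, beta0=10.0, beta1=1.5, sd_a=1.0, sd_e=2.0)
effect, se_cluster = cluster_level_estimate(trial)
se_naive = naive_se(trial)
print(f"effect={effect:.2f} cluster SE={se_cluster:.2f} naive SE={se_naive:.2f}")
```

Because the cluster effect is shared within a cluster, the naive standard error should come out noticeably smaller than the cluster-level one, which is why ignoring the cluster-level random effect overstates precision.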