Data that contain multiple measures (on the same dependent variable) for each subject are very common. Time-series data (many measures on Y at equally spaced time intervals, with one case), panel data (usually a smaller number of observations in time, not necessarily equally spaced, for multiple cases), as well as many experimental and quasi-experimental designs (simple pre-test, post-test; trend studies) are all "repeated measures" designs.
Multiple observations on the same case allow us to use a case as its own comparison, or control, to some degree. That is, many factors that might be related to the outcome of interest may be constants within individuals. Statistically, though, the observations of the dependent variable are no longer independent -- and to get proper estimates of variability for doing hypothesis tests, we need to take this non-independence into account.
The text gives an example of measuring a medical outcome, FEV1, at the beginning of a drug trial and then again each hour for eight hours; three drugs are randomly assigned to 24 patients each. So, with 72 patients in all, the data set would contain 72*9 = 648 observations of the dependent variable, with 24*9 = 216 in each treatment group. The model:
$Y_{ijk} = \mu + \alpha_i + \gamma_k + (\alpha\gamma)_{ik} + e_{ijk}$ describes drug ($\alpha$) by time ($\gamma$) and their interaction, across individuals ($j$). The main effect of drug can be thought of as an average effect; the main effect of time can be thought of as a trend; and the interaction asks whether the trends differ between drugs. This example is basically a factorial design of drug by time, but with the unique feature that the observations occur within the same individual persons.
In this case, as in most other panel studies, we cannot assume that the errors $e_{ijk}$ are independent. First, any two observations for the same person are likely to be more correlated than any two random observations because of unmeasured factors unique to that individual (called cross-sectional correlation of errors in the literature on pooled cross sections and time series or panel analysis). Second, any two observations that are closer in time are more likely to be correlated than two observations more distant in time (called serially autocorrelated error in time series analysis).
An analogy is made to the agricultural "split plot" experiment -- where each of a number of different treatments is administered across spatial areas. Any two observations, regardless of treatment, within an area (main plot) are likely to be more similar than any randomly drawn pair of observations. The split-plot design, though, administers all levels of the treatment within each plot (splitting the plot). In the current case, there is only one treatment for each plot (patient).
Three general methods have been used to analyze repeated measures types of data, and will be discussed in the chapter:
1) Univariate ANOVA could be applied, treating the data as a "split plot" design of a sort. The patient would be considered the main "plot." Time would be considered as the sub-plot. This approach deals with the correlated error by subject, but does not deal with errors that are correlated over time. This approach was called the "split plot in time" approach. If the correlations between measures for subjects are the same regardless of how far apart they are in time, this approach is ok. The means of the dependent variable over time can change without violating this; but often the variance (and hence the covariance, and hence the correlation) of the dependent variable changes over time (e.g. regression to the mean), in which case ignoring the serially correlated error can lead to mistaken inferences.
2) The "Analysis of Contrasts" method transforms the data to remove subject and time variance. For example, one could regress the 9 observations for each subject on time, and then use the slopes as a new dependent variable (with a simple before-and-after design, this would be the same thing as taking the difference as the dependent variable). The slopes of the within-subjects regression (or trends) adjust for some individual differences because the within-subject mean comes out in the intercept. The serial correlation of errors is essentially assumed to be the same across subjects, and is ignored as residual. This method does not actually examine the error or covariance structure, and is not recommended. The REPEATED command in GLM can implement this approach.
3) The contemporary approach is to analyze panel data using MIXED model methods. There are two steps. First, the covariance of errors is estimated, then this is used as constraints on the error covariance matrix to derive GLS estimates of the effects.
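The within-subject slopes used by the "Analysis of Contrasts" method (item 2 above) can be sketched in a few lines. This is only an illustrative Python sketch, not the text's SAS code; the function name and the two example series are hypothetical.

```python
def within_subject_slope(times, values):
    """Ordinary least-squares slope of one subject's values on time."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(values) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    sxx = sum((t - t_bar) ** 2 for t in times)
    return sxy / sxx

# Hypothetical FEV1-like series for two subjects, measured at hours 0..8.
# The subjects differ in level (2.0 vs 3.5) and in trend.
hours = list(range(9))
subjects = {
    "s1": [2.0 + 0.10 * h for h in hours],
    "s2": [3.5 - 0.05 * h for h in hours],
}

# One slope per subject becomes the new dependent variable.
slopes = {sid: within_subject_slope(hours, y) for sid, y in subjects.items()}
```

The subject-specific level drops out of the slope, which is the sense in which the method adjusts for individual differences. With only two time points one unit apart, the slope is simply the difference between the two measures.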
The data set FEV1MULT is displayed which shows the multiple observations over time for each case. For the ANOVA and MIXED models approaches, the data are re-arranged into the data set FEV1UNI which has a single data line for each observation (that is, 8 lines for each patient with the baseline score as a covariate, a variable indicating which drug, and a numerical variable (1-8) for time of observation).
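The re-arrangement from one line per patient to one line per observation can be sketched as follows. This is an illustrative Python sketch under assumptions: the hourly column names follow the fev11h, fev12h, ... pattern used in the GLM code in these notes, while the baseline column name (basefev1) and all data values are hypothetical.

```python
def wide_to_long(wide_rows, n_hours=8):
    """Turn one row per patient into one row per patient-hour."""
    long_rows = []
    for row in wide_rows:
        for hour in range(1, n_hours + 1):
            long_rows.append({
                "patient": row["patient"],
                "drug": row["drug"],
                "basefev1": row["basefev1"],   # baseline kept as a covariate
                "hour": hour,                  # numeric time of observation
                "fev1": row[f"fev1{hour}h"],   # the hourly measurement
            })
    return long_rows

# One hypothetical patient in the wide (multivariate) layout.
wide = [{
    "patient": 201, "drug": "a", "basefev1": 2.46,
    **{f"fev1{h}h": 2.0 + 0.1 * h for h in range(1, 9)},
}]

long_rows = wide_to_long(wide)   # 8 rows for this patient
```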
One approach to these data would be to treat each observation separately, have an effect of drug ($\alpha$), an effect of patient ($b$, treated as a random effect), a common effect of time ($\gamma$), and a deviation of the time slopes for the drugs.
$Y_{ijk} = \mu + \alpha_i + b_{ij} + \gamma_k + (\alpha\gamma)_{ik} + e_{ijk}$
The mean for subjects ($b$) is assumed independent of the error. The vector of effects of subjects ($b$) is often called the "between-subjects" effect. In some approaches to the analysis of pooled cross-sections and time series, the vector of adjusted means for individuals is also used (e.g. "least-squares dummy-variables" or (confusingly) "fixed effects"). Note, however, that the approach illustrated here more correctly treats between-subject variance as random. The residual variance of $e_{ijk}$ is sometimes called the "within subject effect."
The total variance of an observation is equal to the sum of the between-subject and within-subject variance components. The covariance between the observations on the same subject at any two points in time is the between-subject variance component. This assumes that there is no serial correlation of errors.
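Written out, the variance decomposition described above takes the following form. The symbols $\sigma_b^2$ (between-subject variance) and $\sigma_e^2$ (within-subject variance) are notation assumed here, not taken from the text.

```latex
\begin{aligned}
\operatorname{Var}(Y_{ijk}) &= \sigma_b^2 + \sigma_e^2, \\
\operatorname{Cov}(Y_{ijk}, Y_{ijk'}) &= \sigma_b^2 \quad (k \neq k'), \\
\operatorname{Corr}(Y_{ijk}, Y_{ijk'}) &= \frac{\sigma_b^2}{\sigma_b^2 + \sigma_e^2}.
\end{aligned}
```

The correlation is the same for every pair of time points, which is the compound symmetry pattern.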
8.2.1 Using GLM to do Univariate ANOVA of repeated measures
All three variables (treatment, subject, time) are treated as class
variables. The model used is:
model fev1=drug patient(drug) hour drug*hour / ss3;
random patient(drug) / test;
Note that the patients are nested within drugs -- that is, each patient receives only one drug. This patient effect is treated as random.
In interpreting the results, the correct error terms need to be used because of the nested and random effects. So the default GLM F tests are not useful, and the corrected tests based on the expected mean squares (requested with the TEST option on the RANDOM statement) must be used. Note that the two tests for drug are not the same.
8.2.2 Contrast, Estimate, and LS Means
All of these tools can be used to describe effects, but the inference tests produced by GLM are wrong because they do not take into account the assumption of random effects of subject. LS means inference results are not correct, but tests for contrasts can be corrected by specifying the correct denominator.
The previous analysis is valid if we assume "compound symmetry" where repeated measures have equal variance, and the correlations between any two measures are the same.
8.3.1 Univariate ANOVA of Repeated Measures at Each Time
A useful diagnostic step in working with repeated measures is to run each cross section separately (e.g. does drug have an effect at time 1? at time 2? etc.). This allows one to see whether the residual variances are approximately equal (compare MS error across times), whether there are significant associations at any points in time and possible trends, and which effects are significant.
8.3.2 Using the REPEATED statement in PROC GLM
If the model statement lists multiple dependent variables, and there is no repeated statement, multiple analyses are done. If the REPEATED statement is included, GLM treats this as a multivariate ANOVA.
model fev11h fev12h fev13h= drug;
repeated hour / printe;
"hour" is just a label here. PRINTE produces sums and cross
products of the residuals of the multiple dependent variables -- useful for
examining error structures.
The correlation matrix of the residuals is a useful diagnostic tool. If the correlations display trends over time (usually with two dependent variables closer in time having stronger residual correlation than two further apart in time) then the univariate ANOVA approach discussed above is flawed, because there is not compound symmetry.
The Mauchly criterion "test for sphericity" is also printed by PRINTE. This is a chi-square test of the null hypothesis of random errors and equal error variances. Rejection of this null suggests that compound symmetry is not appropriate.
The multivariate tests generated (Manova F tests) examine an overall effect. The text puzzles me here. The code seems to suggest a model with drug as the only IV -- which seems reasonable. The output shows tests for drug and drug*hour interaction -- but the code doesn't mention "hour" as a variable...
8.3.3 Univariate ANOVA of contrasts of Repeated Measures
Adding SUMMARY to the repeated statement tests each of the hourly means against the last, and tests the contrast between the two drugs at that hour versus the last hour.
Using the REPEATED statement to get the covariance structure of the errors is very useful, and even produces a test that allows the rejection (or not) of the simplest ANOVA treatment of the data. This is very useful diagnostic work, but the best modeling strategy, overall, is to use a MIXED models approach.
The split plot in time ANOVA approach discussed earlier was a common approach to panel data. But this approach does assume no "within-subjects" error correlation -- that is, no residual autocorrelation. This is often a poor assumption, and, if ignored, can lead to false-positive inference about treatment effects. The GG and HF corrections in GLM repeated attempt to adjust the F tests, but there are better approaches.
MANOVA, in contrast, assumes a unique, non-zero correlation between each pair of observations. This will fit, but uses up a lot of power, when simpler error structures would probably suffice. Also, for MANOVA, if any observation for a subject is missing, the whole subject is lost.
MIXED provides more convenient ways for modeling error structures among the
repeated dependent variables. But one needs to be sure these assumptions
are realistic, so a multi-step strategy of analysis is wise:
1) Model the structure of means using fixed effects
2) Specify a covariance structure both between subjects and within
subjects
3) Fit the means model accounting for the covariance structure specified
4) Then make tests and inferences, including simplifying the means model
if possible
The remainder of this section illustrates techniques and approaches...
8.4.1 The Fixed Effects Model and Related Considerations
Examine the model:
$Y_{ijk} = \text{intercept} + \alpha_i + b_{ij} + \gamma_k + (\alpha\gamma)_{ik} + e_{ijk}$
that is, the outcome is a constant plus an effect of drug, an effect of hour, an
effect of drug*hour interaction, and a subject effect within drug (bij).
To this model, it may be helpful to add the covariate X of the baseline score
for each subject.
The author argues that the inclusion of the baseline score as a covariate is a good thing, but difficult in GLM.
In the mixed model approach, the effect of subject (bij) and the error within subject (eijk) are considered random effects, and are assumed to be independent of one another. The between subjects errors (bij) are assumed normally distributed.
The within-subjects errors (general residuals) cannot be assumed to be uncorrelated, since they (in this case) have both a time and a subject component. PROC MIXED (and SPSS mixed models) allow the specification of constraints on the residuals.
These constraints are selected by specifying a form for the matrix of within-subject residuals, known as sigma. This matrix is a picture of the correlations of the residuals for the repeated measures (e.g. time 1 with time 2, time 1 with time 3, etc.). The main alternatives:
Compound symmetry covariance model: Each dependent variable has an error variance, and there is a constant (non-zero) correlation among the error terms. This model treats the multiple measures as independent in time, but possibly correlated because of uniqueness of the subjects.
Independence covariance model: Each dependent variable has an error variance, but there are no off-diagonal elements. That is, the errors are uncorrelated in time or across subjects.
First-order autoregressive covariance model: Correlations among errors decline exponentially with distance (e.g. $r_{12} = \rho$; $r_{13} = \rho^2$; $r_{14} = \rho^3$, etc.).
Unstructured covariance model: Each correlation is non-zero, but unique. E.g. r12, r13, r14... display no fixed pattern, but are not assumed zero. This argues that there are unique components in each response that may co-vary.
Toeplitz covariance model: Similar to AR(1) in that all pairs of errors at the same distance have the same correlation, but with no assumption of exponential decay. The AR(1) model can be estimated with a single correlation parameter (the correlation at distance one, which is raised to the power of the distance); the Toeplitz model has as many correlation parameters as there are distances (e.g. if there were 3 measures, then there would be two distances -- one unit distant, two units distant -- and two parameters would be estimated).
Toeplitz and AR1 are reasonable choices for evenly-spaced observations where we have no reason to suppose that the error structure is changing over time.
ANTE(1) covariance model: first-order ante-dependence is more general than Toeplitz or AR1. Aside: this looks a lot like a moving-average model. Here, the covariance between two time points is a function of the product of variances at both points (hence allowing heterogeneity of error variance across measures to affect the correlation) and the product of the correlations at the distances up to the one chosen. For example, if the correlation of 12 = .50, and the correlation of 23 = .20, then the correlation of 13 would be .10 (then weighted by the variances of 1 and 3). This is more general and sensible, but does require estimating 2K-1 parameters.
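The structured alternatives listed above can be made concrete by building the implied correlation matrices. This is a minimal Python sketch with hypothetical function names; it does not reproduce how PROC MIXED parameterizes or estimates these structures.

```python
def compound_symmetry(k, rho):
    """Same correlation rho between every pair of occasions."""
    return [[1.0 if i == j else rho for j in range(k)] for i in range(k)]

def ar1(k, rho):
    """Correlation rho**d at distance d."""
    return [[rho ** abs(i - j) for j in range(k)] for i in range(k)]

def toeplitz(k, lag_corrs):
    """lag_corrs[d-1] is the correlation at distance d (k-1 values)."""
    band = [1.0] + list(lag_corrs)
    return [[band[abs(i - j)] for j in range(k)] for i in range(k)]

def ante1(adjacent_corrs):
    """adjacent_corrs[t] is the correlation between occasions t and t+1;
    more distant pairs get the product of the correlations in between."""
    k = len(adjacent_corrs) + 1
    corr = [[1.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            prod = 1.0
            for t in range(i, j):
                prod *= adjacent_corrs[t]
            corr[i][j] = corr[j][i] = prod
    return corr

def to_covariance(corr, sds):
    """Scale a correlation matrix by per-occasion standard deviations."""
    k = len(sds)
    return [[corr[i][j] * sds[i] * sds[j] for j in range(k)] for i in range(k)]

# The ANTE(1) example from the notes: r12 = .50 and r23 = .20 give r13 = .10.
example = ante1([0.50, 0.20])
```

Passing unequal standard deviations to `to_covariance` gives the variance heterogeneity that ANTE(1) allows; equal ones give the homogeneous forms of the other structures.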
8.4.2 Selecting an Appropriate Covariance model.
The author's research suggests that the MIXED model approach to repeated measures is fairly robust against minor errors in specification of the error terms, but is compromised by major errors.
One approach to diagnosis is to examine the error covariance and correlation matrices graphically. The author suggests running the model in MIXED fully specified, but asking for an unstructured covariance matrix. Recover the residual correlations and covariances. For trend analysis, it is suggested that the error correlations or covariances be plotted separately for each starting time. That is, one trend line is lag 1, lag2, lag3....for the errors starting at time zero. A second trend is the lag 1, lag 2, lag 3... for errors starting at time 1, etc.
Conventional diagnostics for time-series apply. Linearly declining correlations or covariances with increasing lag suggest an AR(1) or ANTE(1) structure. If the lines for the multiple trends overlay (e.g. have the same mean) then the variances are approximately equal. If they don't, then an error structure that allows variance heterogeneity is more appropriate. The first example does not include the baseline measurement (why???).
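The "one trend line per starting time" layout described above can be sketched as a small helper that slices an estimated correlation (or covariance) matrix by lag. This is an illustrative Python sketch; the function name and the 4x4 matrix are hypothetical, not output from the FEV1 analysis.

```python
def lag_profiles(matrix):
    """For each starting occasion, list the entries at lag 1, 2, ..."""
    k = len(matrix)
    return {
        start: [matrix[start][start + lag] for lag in range(1, k - start)]
        for start in range(k - 1)
    }

# Hypothetical residual correlation matrix for four occasions.
corr = [
    [1.0, 0.8, 0.6, 0.4],
    [0.8, 1.0, 0.7, 0.5],
    [0.6, 0.7, 1.0, 0.9],
    [0.4, 0.5, 0.9, 1.0],
]

profiles = lag_profiles(corr)
# profiles[0] is the line starting at the first occasion: lags 1, 2, 3.
```

Each list in `profiles` is one line to plot against lag; applying the same helper to the covariance matrix gives the covariance version of the plot.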
8.4.3 Reassessing the Covariance Structure with a Means Model accounting for Baseline Measurement
As in the previous section, estimate the model with an unstructured error covariance matrix, and plot or print the covariances and correlations. Adjustment for the starting point of the series may well reduce variance heterogeneity, and allow for the application of a simple AR(1) model.
8.4.4 Information Criteria to Compare Covariance Models
Model fit can be compared using -2 residual log likelihood (which decreases as fit improves), or three information criteria measures. The latter measures are based on the likelihood, but make corrections for the number of parameters in the estimated model. AIC and Schwarz are a bit older; Burnham and Anderson's (AICC) is newer.
Run the model for each plausible error structure, recover the fit statistics, and compare all indices. Choose the most restrictive model consistent with the data.
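The comparison can be sketched with the generic "smaller is better" forms of the criteria. This is an illustrative Python sketch under assumptions: software packages differ in exactly which sample size enters AICC and the Schwarz criterion, so `n` here is a stand-in, and the -2 log likelihood values below are made up for illustration rather than taken from the FEV1 output.

```python
import math

def information_criteria(neg2_loglik, n_cov_params, n):
    """Penalized fit measures in smaller-is-better form."""
    q = n_cov_params
    return {
        "AIC": neg2_loglik + 2 * q,
        "AICC": neg2_loglik + 2 * q * n / (n - q - 1),
        "BIC": neg2_loglik + q * math.log(n),  # Schwarz criterion
    }

# Hypothetical fits: (-2 residual log likelihood, covariance parameters).
# An unstructured matrix for 8 occasions has 8*9/2 = 36 parameters.
fits = {"CS": (400.0, 2), "AR(1)": (380.0, 2), "UN": (370.0, 36)}
n = 72  # assumed effective sample size

table = {name: information_criteria(ll, q, n) for name, (ll, q) in fits.items()}
best_by_aic = min(table, key=lambda name: table[name]["AIC"])
```

In this made-up comparison the unstructured model has the smallest -2 log likelihood but pays a large penalty for its 36 parameters, so a more restrictive structure is preferred.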
8.4.5 PROC MIXED analysis of the FEV1 data
Having selected an error model, one can proceed to interpretation of the results.
Correct tests are given for the fixed effects (in this example, the effects are drug, trend, drug*trend interaction, and baseline score). All fixed effects are found to be significant. The "naive tests" may be too optimistic in the presence of error correlation. The Kenward-Roger correction is better, and is requested with ddfm=kr on the model statement.
Plotting the means for the drug*hour interaction may help interpretation of this (the most interesting) effect. Ask for LS means for the effect, output them, then plot the estimates.
8.4.6 Inference on the Treatment and Time Effects of FEV1 data using PROC MIXED
Having demonstrated that the three trends of drug*hour do differ, how can one specify and test the differences? Two approaches are suggested.
8.4.6.1 Comparison of Drug*Hour means
One approach is to look at the differences in LS means for the interaction "sliced" by hour, and to design contrasts on the interaction.
8.4.6.2 Comparisons using Regression
To test for differences in a linear trend (or a quadratic trend, as shown in the example) one can create the continuous covariate hour (1,2,3,...) and include it and its interactions in the mixed model. A bit tricky, as priors must be specified -- see the book.