ANCOVA is a combination of regression (continuous variables entered directly as predictors) and ANOVA. It can be thought of as partitioning the X side into a full-rank matrix of group effects plus a vector of direct variable effects. The direct variables are termed "covariables."
Main applications are:
1) variance in Y is reduced by accounting for the covariates, prior to testing for
group effects
2) in some cases, group means on Y can be seen as adjusted for differences
across groups on the covariates
3) in other cases, interest focuses on how the regressions (effects of
direct variables) differ across groups.
The key modeling choice is a common-slope versus a separate-slopes model. Generally, test whether a common slope is adequate, unless the problem explicitly requires separate slopes.
7.2.1 Covariance model
Equations for the common slopes model are introduced. The values of coefficients in the categorical variable effects matrix are affected by reference group; the slope is unaffected by the parameterization of the group variable.
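One way to write the common slopes model, as a sketch: the symbols below are my own notation (not necessarily the book's), with group i, observation j, and the last group taken as the reference, as GLM does by default.

```latex
% Common slopes ANCOVA (notation assumed for illustration)
% y_{ij}: response, x_{ij}: covariable, \tau_i: effect of group i
y_{ij} = \beta_0 + \tau_i + \beta\, x_{ij} + \varepsilon_{ij},
\qquad \tau_t = 0 \ \text{(reference group)}
% Intercept for group i is \beta_0 + \tau_i; the slope \beta is shared.
```

Changing the reference group changes the individual values of the intercept and group effects, but not the sums that give each group's intercept, and not the slope.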
An example (see data for oysters.sas) is developed. Four replications (rep) are done in each of five experimental conditions (trt); the dependent variable is measured at the beginning (initial) and end (final) of the experiment.
Run GLM with trt as the class variable, and a model that includes trt and initial (the covariable). Ask for the parameters with /solution.
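A minimal sketch of the run described above. It assumes the data have been read into a data set named oysters (the data set name is my assumption) with the variables trt, initial and final named in these notes.

```sas
/* Common slopes ANCOVA; data set name oysters is assumed */
proc glm data=oysters;
  class trt;                             /* treatment as class variable  */
  model final = trt initial / solution;  /* initial is the covariable    */
run;
quit;
```

The output should include both Type I and Type III SS by default, plus the parameter estimates requested by solution.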
The Type I SS for treatment is unadjusted (i.e. treatment is entered first into the equation). A test of this effect, without the covariable, would conclude that treatment has a significant effect. Adjustment (using type III SS -- or entering treatment last) dramatically reduces the SS (but the effect is still significant). Note that, while the SS for treatment is much smaller with the covariable, the test can actually be more powerful, because the error SS is also reduced once the variation explained by the covariable is removed from it.
The parameter estimates use treatment 5 as the reference group, so its effect is absorbed into the intercept; the other treatment effects test differences from this group, controlling for the covariate. The slope is the common (pooled within-group) slope across the 5 conditions.
7.2.2 Means and Least Squares Means
Unadjusted means can be obtained with a means statement. Adjusted means can be obtained with lsmeans, specifying trt as the effect, and asking for / stderr and tdiff. A table of unadjusted and adjusted means, along with the mean score on the covariate, can be helpful. A plot is produced to show the parallel slopes with different intercepts.
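A sketch of how the unadjusted and adjusted means might be requested together, again assuming a data set named oysters. The separate PROC MEANS step is my addition, to get the group means on the covariate for the comparison table.

```sas
/* Unadjusted and adjusted (LS) means; data set name is assumed */
proc glm data=oysters;
  class trt;
  model final = trt initial / solution;
  means trt;                   /* unadjusted group means              */
  lsmeans trt / stderr tdiff;  /* adjusted means, SEs, pairwise t's   */
run;
quit;

/* Group means on the covariate and the response, for the table */
proc means data=oysters mean;
  class trt;
  var initial final;
run;
```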
Care is needed in interpreting adjusted means. If the covariate (or adjustment) is causally related to the treatment, the logic of interpretation should be one of direct and indirect effects -- not partial adjustment.
Estimates of means can also be obtained with the estimate statement. This can also be used to test whether specific group differences in adjusted or unadjusted means are significant.
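As an illustration, a difference between two adjusted means needs only the trt coefficients, because the covariate term cancels under common slopes. The choice of treatments 1 and 2 is arbitrary, and the data set name oysters is assumed.

```sas
/* Adjusted difference between two treatments (levels chosen arbitrarily) */
proc glm data=oysters;
  class trt;
  model final = trt initial / solution;
  estimate 'trt 1 vs trt 2, adjusted' trt 1 -1 0 0 0;
run;
quit;

/* Same comparison, unadjusted: drop the covariable from the model */
proc glm data=oysters;
  class trt;
  model final = trt;
  estimate 'trt 1 vs trt 2, unadjusted' trt 1 -1 0 0 0;
run;
quit;
```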
7.2.3 Contrasts
In this example, the five treatments could also be conceptualized as a 2x2 treatment design, with an additional control group. One might want to use contrasts to test specific differences in adjusted means (e.g. control versus all others combined; the two conditions of factor A; the two conditions of factor B).
The point is that contrasts can be applied to test virtually any function of means, adjusted for the covariates (again, remember that all of this assumes parallel slopes).
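A sketch of what such contrasts could look like. The mapping of treatment levels to the design is an assumption made for illustration: trt 1-4 are taken to be the 2x2 cells in the order A1B1, A1B2, A2B1, A2B2, and trt 5 the control. Check the actual coding before using these coefficients.

```sas
/* Contrasts on adjusted means; level-to-cell mapping is hypothetical */
proc glm data=oysters;
  class trt;
  model final = trt initial / solution;
  contrast 'control vs others' trt -1 -1 -1 -1  4;
  contrast 'factor A'          trt -1 -1  1  1  0;
  contrast 'factor B'          trt -1  1 -1  1  0;
run;
quit;
```

Because the covariable is in the model, each contrast is a test on the adjusted means.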
7.2.4 Multiple Covariates
No problem with adding multiple covariates. Best to put the class variable in first, so that the sequential SS gives the unadjusted (type I) SS for the treatment -- so it can be compared to the adjusted (type III).
Adjustments of means, and tests of adjusted effects are valid only if the slopes are actually parallel.
The model shows Y = intercept for class + vector of class differences + intercept for slope + vector of slope differences + error.
Estimated by model y = a x x*a / solution.
Interesting: an alternative approach is to see the covariable as nested within group, and ask for no intercept: model y = a x(a) / noint solution.
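The two parameterizations can be written side by side. This is a sketch in my own notation (not the book's); both describe the same fitted lines.

```latex
% Interaction form: model y = a x x*a
% \tau_i, \delta_i are group deviations in intercept and slope (reference group: 0)
y_{ij} = \beta_0 + \tau_i + (\beta + \delta_i)\, x_{ij} + \varepsilon_{ij}

% Nested, no-intercept form: model y = a x(a) / noint
% each group gets its own intercept \alpha_i and slope \beta_i directly
y_{ij} = \alpha_i + \beta_i\, x_{ij} + \varepsilon_{ij},
\qquad \alpha_i = \beta_0 + \tau_i, \quad \beta_i = \beta + \delta_i
```

The interaction form is convenient for testing whether the slopes differ; the nested form is convenient for reading off each group's line.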
7.3.1 Testing heterogeneity of slopes.
From the example: model final=trt initial trt*initial / solution
The Type I SS for trt*initial tests whether the addition of the interaction (differences in slopes) is significant (in the example, it's not).
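A sketch of both runs for the oysters example, assuming a data set named oysters. The first step gives the heterogeneity test; the second is the nested form, which gives each treatment's intercept and slope directly.

```sas
/* Test for heterogeneity of slopes: look at the trt*initial line */
proc glm data=oysters;
  class trt;
  model final = trt initial trt*initial / solution;
run;
quit;

/* Separate intercept and slope per treatment (nested form) */
proc glm data=oysters;
  class trt;
  model final = trt initial(trt) / noint solution;
run;
quit;
```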
7.3.2 Estimating different slopes
A new data example is introduced, on the sale of two varieties of oranges (Q1 and Q2) per customer, at six stores (store), over six days (day), at prices that changed from time to time at each store (P1 and P2 for the two varieties of oranges). Here, only Q1 is analyzed.
Run model q1=p1 day p1*day / solution. That is, how do sales depend on day of the week (regardless of price), on price (regardless of day), and whether the effect of price depends on day (or the effect of day depends on price).
The result does not show an interaction. But, for illustration, we will continue...
The estimate statement can (rather complexly) be used to get the estimated effect of price for each day. An alternative (simpler) method, seeing price as nested within day, is shown.
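A sketch of the nested approach for the oranges data, assuming a data set named oranges with the variables q1, p1 and day. Each p1(day) estimate in the solution is the price slope for one day.

```sas
/* Price slope estimated separately for each day (nested form) */
proc glm data=oranges;
  class day;
  model q1 = day p1(day) / noint solution;
run;
quit;
```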
7.3.3 Testing treatment differences with unequal slopes
Where slopes differ, the differences between groups depend on what level of the covariable is examined -- because the group difference covaries with the adjustment variable.
The most common method of describing a group difference in the presence of unequal slopes is to compare the means for the groups predicted at the overall mean of the covariable (this can be misleading substantively, if the means of groups really do differ a lot from the overall mean!).
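To make the dependence on the covariable explicit, here is a sketch in assumed notation, with group-specific intercepts and slopes:

```latex
% Predicted difference between groups i and k at covariable value x
% (\alpha_i, \beta_i: intercept and slope for group i; notation assumed)
\hat{y}_i(x) - \hat{y}_k(x) = (\alpha_i - \alpha_k) + (\beta_i - \beta_k)\, x

% The conventional summary evaluates this at the overall covariable mean
\hat{y}_i(\bar{x}) - \hat{y}_k(\bar{x}) = (\alpha_i - \alpha_k) + (\beta_i - \beta_k)\, \bar{x}
```

With equal slopes the second term vanishes and the difference is the same at every x; with unequal slopes it can shrink, grow, or change sign as x moves.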
SAS will produce LS means for each level of the treatment at the covariable mean: lsmeans day / at means;
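A sketch of the full step, assuming a data set named oranges. The stderr option is my addition.

```sas
/* LS means for day under unequal slopes, at the mean of p1 */
proc glm data=oranges;
  class day;
  model q1 = day p1 p1*day / solution;
  lsmeans day / at means stderr;
run;
quit;
```

The at option also accepts a specific covariable value (at p1=value), which is one way to see how the day differences change along the price range.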
An example is shown of how to construct contrasts to test differences in adjusted means evaluated at the overall covariable mean (p. 246).
Here a two-factor factorial design with two covariates is examined, using the same oranges data for one variety.
The two treatment factors are day of the week and store -- completely crossed. Two covariables are the price of the variety in question (P1), and the price of its competitor (P2). Since the two design factors, crossed, give a cell with one observation (as in a panel analysis, one observation for each store*day), it is not possible to evaluate the interaction of these two variables -- which implicitly shows up in error.
A model assuming equal slopes enters the two class variables, followed by the two price covariates (entry in blocks in SPSS would be helpful here). Ask for LS means for day, which is the primary interest. From the type I SS one can reconstruct the unadjusted effects of store and day, and test the increment in explained variance due to the two prices. The parameter estimates test differences from the joint reference group (last store, last day). The adjusted LS means for day show the trends.
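A sketch of this model, assuming a data set named oranges with the variables named in these notes. The class variables come first, so the Type I SS for store and day are unadjusted for price.

```sas
/* Two-way design (no store*day term: one obs per cell) plus two covariates */
proc glm data=oranges;
  class store day;
  model q1 = store day p1 p2 / solution;
  lsmeans day / stderr;   /* day means adjusted for store, p1, p2 */
run;
quit;
```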
Contrasts and estimates could be applied.
This example uses the data set cotton. The dependent variable is the weight of useable lint; one treatment is the variety of cotton; another treatment is the spacing between plants (two levels, 30 or 40). There are two plants at each variety*spacing factorial level (2 varieties * 2 spacings, with 2 plants in each). There are five to nine observations (bolls) per plant. We want to adjust for the covariate of the total weight of the boll to test, essentially, the proportion of useable cotton.
The effects are variety, spacing, variety*spacing, plant nested within variety*spacing, total weight, and error (which is the sum of the variation between bolls within plants).
This is a complex, but not wildly complex design.
This could be run in GLM, but the effect of plant (nested in variety by spacing) should be regarded as a random effect. This can be done by adding a random statement to GLM; or by using MIXED.
The results of MIXED differ from GLM because MIXED estimates a variance component for the random effect and uses it in the fixed-effect estimates and their standard errors -- while GLM fits plant as though it were fixed, and uses the random statement only to construct expected mean squares and tests.
The analysis suggests that the variety*spacing effect is not needed, so the nesting of plant within that can also be dropped. Coefficients and LS means can be recovered, as usual.
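A sketch of the two approaches side by side. The variable names lint and bollwt are hypothetical stand-ins for the lint weight and total boll weight (check the actual names in the cotton data set); variety, spacing and plant follow these notes.

```sas
/* GLM: plant fitted as fixed; RANDOM / TEST gives tests using EMS */
proc glm data=cotton;
  class variety spacing plant;
  model lint = variety spacing variety*spacing
               plant(variety*spacing) bollwt;   /* bollwt: hypothetical name */
  random plant(variety*spacing) / test;
run;
quit;

/* MIXED: plant enters only through the RANDOM statement */
proc mixed data=cotton;
  class variety spacing plant;
  model lint = variety spacing variety*spacing bollwt / solution;
  random plant(variety*spacing);
run;
```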
Where the treatment levels of a design are ordinal rather than nominal, orthogonal polynomials are sometimes used to summarize and analyze effects. In clinical and many other cases, the treatment groups differ in intensity (e.g. lower doses, shorter exposure times).
Polynomial contrasts test for linear effects, then for additional quadratic effects, etc. (rarely, if ever, do we find more than 3rd order effects that are interpretable).
7.6.1 An example is given on doses of two drugs (type), conducted across four trials of sites (bloc). For each drug, three levels of dose (1 unit, 10 units, 100 units) are used -- here these are converted (log10) to logdose. So, the design has:
Block (3 df for four trials)
Type (1 df for two drugs)
Log dose (2 df for three levels of intensity)
Type*logdose (2 df to see if effects of drug differ with dose, or if effects of
dose differ across drugs.)
error (15df = 24 observations less the above effects and grand mean)
The analysis examines a linear and possibly quadratic effect of dose -- orthogonal polynomial contrasts are shown. One might also be interested in testing for non-linear effects within each drug type. Contrast coefficients can be calculated using an IML routine given and discussed later.
Note, and explain the form of the model statement (p. 257). Contrasts are used to test the effects.
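A sketch of polynomial contrasts for this design, not necessarily the model statement used on p. 257. It assumes a data set named dose, and it relies on the three log10 doses (0, 1, 2) being equally spaced, so the standard coefficients -1 0 1 (linear) and 1 -2 1 (quadratic) apply. The interaction coefficients assume the type*logdose cells are ordered with type varying slowest.

```sas
/* Orthogonal polynomial contrasts; logdose treated as a class variable */
proc glm data=dose;
  class bloc type logdose;
  model y = bloc type logdose type*logdose;
  contrast 'dose linear'      logdose -1  0  1;
  contrast 'dose quadratic'   logdose  1 -2  1;
  /* does the linear trend differ between the two drugs? */
  contrast 'type x linear'    type*logdose -1  0  1   1  0 -1;
  contrast 'type x quadratic' type*logdose  1 -2  1  -1  2 -1;
run;
quit;
```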
7.6.2 IML to get contrasts
not read
7.6.3 Use of ANCOVA to accomplish the same tasks, more easily
The overall utility of the linear and quadratic components can be tested in GLM with:
model y = bloc type logdose logd2 type*logdose type*logd2;
Here, logd2 is log dose squared. So, the model looks at whether there is a difference between the two drugs in non-linearity (type*logd2); whether the linear effect of dose differs between drugs (type*logdose), as well as general non-linearities. The regressions for each group can be constructed from the intercepts and slopes, or by use of estimate statements.
Another approach uses PROC MIXED:
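A sketch of the GLM version as a complete step. The data set names dose and dose2 are assumptions; logdose is left out of the class statement so that it is treated as a continuous covariable.

```sas
/* Add the squared term, then fit the polynomial ANCOVA */
data dose2;
  set dose;                 /* assumed input data set */
  logd2 = logdose**2;       /* log dose squared       */
run;

proc glm data=dose2;
  class bloc type;
  model y = bloc type logdose logd2 type*logdose type*logd2 / solution;
run;
quit;
```

With the terms in this order, the Type I SS show what each polynomial component adds beyond the terms entered before it.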
class bloc type;
model y=type logdose(type) / noint solution;
random bloc;
This is clever, but I don't completely understand it.
outputs 7.1 to 7.6 data set oysters.sas
outputs 7.7 to 7.16 data set oranges.sas
outputs 7.17 to 7.20 data set cotton.sas
outputs 7.21 to 7.24 data set dose.sas
output 7.25 IML routine, commands iml1.sas