Linear Models

Analyzing data with random effects (Littell, chapter 4)

This site supports tutorial instruction on an linear models, based on Littell, et al. (2002)

The materials in this page are parallel to Littell's text:

- 4.1 Introduction
- 4.2 Nested classifications
- 4.3 Blocked designs with random blocks
- 4.4 The two-way mixed model
- 4.5 A classification with both crossed and nested effects
- 4.6 Split-plot experiments
- Data sets

4.1 Introduction

The key conceptual distinction is between a fixed effect and a random effect. This is important because the inference and analysis of fixed and random effects are different.

**Fixed effects** can be thought of as "treatment" levels that
we have selected for inclusion in a study, which are the only levels of the
variable in question in which we have an interest. In an experiment, we
might have a treatment group and a control group. The purpose of the study
is to compare these two groups -- we are not trying to generalize to other
treatments that we might have included, but didn't. In a non-experimental
setting, a variable may have a small number of levels, and we have included them
all -- gender, for example, might be treated as a fixed effect because we have
included all possible levels of this "treatment" variable in our study
(i.e. males and females). Or, we may have a variable that has, many
possible levels, but we are only interested in generalizing study results to the
ones that we happened to include. For example, suppose that we did a
survey using cluster sampling -- and selected 10 cities at random.
Normally, "city" could be thought of as a factor in the design, with
10 levels. If we did not want to generalize our results to all of the
cities that might have been selected (for example, the 10 cities that we did
include could be regarded as a sample of 10 drawn from a larger population of
cities) -- but only wanted to generalize to the 10 we included, then
"city" could be treated as a "fixed" effect.

In the case of selecting 10 cities from a larger population of cities, just
discussed, we would more naturally treat the variable "city" as a **random
effect**. That is, we would regard the effects of "city" as a
random sample of the effects of all the cities in the full population of
cities. Conceptually, a variable's effects might be treated as random
effects if we can think of the levels of the variable that we included in the
study as a sample drawn from some larger (conceptual) population of levels that
could (in principle) have been selected.

One key difference between fixed and random effects is in the kind of information we want from the analysis of the effects. In the case of fixed effects, we are usually interested in making explicit comparisons of one level against another. For example, we very much would want to compare the mean of the "control group" to the mean of the "treatment group" in an experiment. If explicit comparison of the levels of a variable against one another is the goal of the research, then the levels of the variable are usually treated as "fixed." If, on the other hand, our primary interest is in the effects of other variables or treatments across the levels of a factor (e.g. the effect of gender on voting, across samples from 15 nations), then the "blocking" or "control" variable might be treated as a "random" effect. In this example, the dependent variable is voting, the independent variable is gender (which would be treated as a fixed factor so we can compare men and women), and the national context or sampling design variable (which of 15 nations) might be treated as a random factor.

In the case of a fixed factor, then, we are usually interested in comparing the scores on the dependent variable among the levels of the factor; our interest will be in differences between means. In the case of a random factor, we are not really interested in the specific differences in means from one level of the factor to another -- but, we are interested in the extent to which the random factor accounts for variance in the dependent variable, because we want to control for this. So, rather than being interested in the individual means across the levels of the fixed factor, we are interested in the variance of means across the levels of a random factor.

In modern mixed-models methodology, random factors are actually treated in a more complicated way. While we are interested in the total variation in outcomes attributable to the random factor (it's variance), we might also be interested in specific levels of random factors, in addition. Two examples: a) in using cities as PSUs for surveying how gender affects attitudes, we want to estimate variance in mean attitudes across cities, and remove this component of the variance before making the key gender comparison; but, we might also be interested in the variance across cities itself, and might want to test hypotheses about these differences (in a hierarchical model, we might want to introduce other variables to explain the mean differences across cities). b) In looking at differences between individuals in a treatment group and those in a control group, we might measure each individual a number of times. Some individuals will consistently score higher than others for reasons other than whether they were in the treatment group or not. In testing the fixed effect (does the treatment group differ from the control group), we might want to control for a "random" effect of individual differences. We would want to remove this individual level variance from the outcome in testing for a treatment effect (this is, by the way, the basic idea of how random effects models apply to repeated measures designs). But, in addition to estimating the variance due to the random effect of "individual," we might want to compare particular individuals, or even (again by a hierarchical model) seek to explain this variation with other predictors.

The key issue between fixed and random effects, statistically, is whether the effects of the levels of a factor are thought of as being a draw from a probability distribution of such effects. If so, the effect is random. If the levels of a factor are not a sample of possible levels, the effects are fixed. Usually treatment effects are fixed. Many naturally occurring qualitative variables can be thought of as having fixed effects. Most blocking (sampling design), control, and repeated measures factors are usually treated as random.

Random effects raise two issues: how to construct tests and confidence intervals -- since the effects are a sample of possible effects, we need assumptions about the underlying distribution of all possible effects that might have been observed. Second, if we want to compare or predict specific levels of random effects, then we need to worry about BLUE estimation of such effects.

With balanced data, random factors do not cause inferential problems for tests of fixed effects. But for unbalanced data (very common in non-experimental applications) improper treatment can lead to mistaken inference about treatment effects.

In SAS, the "random" statement in GLM can handle many situations. More complex situations are best dealt with using PROC MIXED. In SPSS, both analyze GLM and analyze Mixed Models do fixed and random effects. The mixed models application is more general and provides some "wizards" to help specify models.

return to the chapter table of contents

4.2 Nested classifications

Probably the most common sampling approach in non-experimental work (and in many experiments) is the crossed design. In such a design, each level of each factor may occur with each level of each other factor. If we had males and females in a treatment and a control group -- and there were some of each gender in each group, the design would be crossed. If we looked at the effects of income (with, potentially, almost as many levels as observations) and years on the job (a discrete variable with a large but finite number of levels) on prestige -- we can think of this as a crossed observational design where we observe the mean prestige of all individuals who fall in a particular income-by-prestige combination. Carefully constructed laboratory studies often have one and only one observation at each level of a crossed design (e.g. latin squares or randomized blocks); some studies have multiple replications in each cell of the design; many non-experimental designs have many observations in some cells, and no observations in many others. Regardless, all are basically crossed classifications. Stratified sampling is a case where we may pay attention to one or more factors in selecting cases, but allow all levels of other variables to occur at each level of the stratified factors (at least in theory).

Nested classifications have some levels of one factor occur only within certain levels of a first factor. Many sampling designs involve this sort of clustering. Individual children are selected within classrooms (but the children from classroom B can never be selected from the population of classroom A); classrooms may occur within schools, which may occur within districts that occur within states, etc. Observations of Y are nested within classrooms, which are nested in....

Another common application of the same idea is in "repeated measures" where we make multiple observations on Y (or multiple Ys) for each individual subject. The simplest example being a "before-after" comparison, where the scores on Y are nested within individuals.

The text gives the example of two bacterial counts being made on each of three samples drawn from each of 20 packages of ground beef. There are 2x3*20 observations (120 total). These data are reproduced in the data set MICROBE.sav. For this example, the model of interest is ln(bacteria) = grand mean + effect of package + effect of sample(within package) + error. With the two observations of each sample being treated as replications to define the error.

It is argued that the main variable of interest (package) is a random draw from all possible packages. It is argued that the samples within each package are a random draw of possible samples, and that the replications are also random outcomes. So, all factors are random. We assume that all are normal and independent.

Notation: fixed effects are labeled with greek (e.g. alpha), random effects are labeled with latin (e.g. a). the notation b(a) means the effect of b is nested within a.

The variance in the dependent variable can be thought of as independent
components: variance due to package, to samples(within packages) and to
observations(within samples, or "error"). These are called **components
of the variance**. In this example, this is our primary interest -- to
identify the relative sizes of the components of variance.

**4.2.1** ANOVA for Nested Classifications

Nested designs with random factors are easy to specify in SAS. In GLM,
the random command specifies the effects that are to be treated as random -- in
the example:

*random package sample(package);*

specifies that both the package effect, and the nested effect of sample within
package are random effects. Specifying an effect as nested is as simple as
using the sample(package) notation shown above in both the model and
random commands. I have not located a way in the GUI approach to SPSS to
get it to estimate this basic nested model. It is possible to specify this
nested random effects model in the mixed models module, but this does not
produce SS, etc. There may be a way using the command language editor,
rather than the GUI.

The analysis produces a partition of the SS (type I and type III are identical for a balanced model). There are 19 df for package (20 packages), and 40 df for sample (1 df is lost within each package, so the 60 samples from 20 packages have 40 df). Caution, the F tests produced by default for nested models are not correct -- the wrong denominator is used -- see section 4.2.3.

The **variance components** of the model are estimated from the mean
squares. The error variance component is simply the error mean
squares. The sample component, in this case is ms sample - ms error /
2. The weights to construct the variance components are given by the Type
III expected mean squares output. What is critical about this, is that
these variance components decompose the total sample variance. In this
case, 8.5% of the variance is error, 50.7% is due to variance from sample to
sample within packages, and 40.1% is due to packages. This gives you a
clear sense of the "reliability" of the package variance estimates
(the real substantive interest in the problem).

**4.2.2** Computing variances of means and optimum
sampling plans

The variance of a mean can also be computed from the ms. This could be used to minimize cost in deriving a sampling design -- most effort should go to making more observations where the variance is high (in the current case, more samples within packages and fewer packages -- given cost constraints -- would improve the standard errors of means.

**4.2.3** Using expected MS to obtain valid inference tests

In nested classifications, the "right" denominator for F tests is not always the "error" SS -- in this case, the sum of the variance between the two readings on each sample within each package. For testing package to package differences, for example, the denominator should be the variance attributable to sample-to-sample variance within packages, but ignore the variance due to multiple measurements. Construction of proper F tests can be done by hand using the expected mean squares. Or, the test sub-command can be used to specify the hypothesis (numerator) MS and the error (denominator) SS.

**4.2.4** Variance component estimation for nested models with PROC
MIXED

MIXED has a slightly different syntax, but provides an easy specification of
fixed and random factors, as well as nesting. SPSS proc mixed looks very
similar, but I can't find any way to specify nested effects. Produces the
"**covariance parameter estimates" which are the variance
components. ** For unbalanced data, MIXED should be used to produce
variance components estimates.

Rather than F tests of variance components, MIXED uses -2 residual log
likelihoods (known elsewhere as "**deviance**."). Hypotheses
about effects are tested by comparing the deviance of nested alternative
models. Differences in the deviance are distributed as chi-square with 1
df per variance component. Wald (chi square difference) tests can also be
produced by running:

proc mixed covtest;

class...

model = depvar = fixed
factors;

random = random effect
nested_effect(effect);

**Warning:** The Wald tests are not good approximations to normal
unless sample size is very large. It is better to use F tests.

**4.2.5 ** Additional tests for nested data: overall mean and
BLUE

Using the "solution" or the "estimate" option in MIXED allows computation of the overall mean.

We might also be interested in whether the mean of a particular package (cluster or PSU) differs from the overall mean. This can be described from the means, but to test the differences, the standard errors of the means must be corrected for the nested sampling design. This is done using BLUP.

return to the chapter table of contents

4.3 Blocked designs with random blocks

The complete randomized blocks design administers each treatment level within each of a number of experimental blocks (e.g. sites or clusters). In chapter 2, the effect of block was treated as fixed. More realistically, the blocks might be regarded as a sample of experimental sites drawn from a population of possible experimental sites -- a random effect. In this case, the variance component and inference should be treated differently.

Often the sampling of blocks includes all blocks -- and then they are treated as fixed. Or, they may be a sample of blocks, but not drawn with any kind of random process -- also, fixed treatment would be better. The choice of fixed versus random effect treatment of the blocking factor does not matter much for testing treatment mean differences -- usually the main interest. But, if getting proper estimates of the treatment group means and their confidence intervals is important, then random treatment of the blocking factor is superior.

This section compares the treatment of a complete random block design to that in chapter 3, where the blocking factor was treated as fixed.

**4.3.1** Random blocks analysis using PROC MIXED

The model is a basic linear model of y = intercept + vector of treatment effects + vector of block effects + individual error. Because block is now treated as random, it is represented with latin rather than greek letter. The data set is that of fruit weights for irrigation methods with areas as blocking factor (data set METHOD).

Analysis using proc mixed to specify block as random gives covariance parameter, residual, and liklihood tests for the random factor, and F tests for the fixed factor (irrigation method). Requesting means gives treatment means and estimated standard errors. The treatment means are identical to the analysis treating block as fixed. But...

The standard errors of the treatment means and differences among treatment means may differ from fixed effects. The standard error of a treatment mean is sqr ( variance of blocks + variance of error) / number of blocks); the standard error of the difference between treatment means is sqr (2 variance of error / number of blocks).

**4.3.2** Differences between Fixed and Random effects analysis of the
complete blocks design

Inference about treatment group differences is not affected by fixed versus random treatment of the blocking factor in a complete and balanced design (but is affected in incomplete designs). But, inferences about treatment means themselves (e.g. do they differ from zero) is affected.

**4.3.2.1** Treatment means IMPORTANT

The estimated standard error of a treatment mean differs, depending on whether we assume the blocking factor to be fixed or random. If fixed, the standard error is sqrt (MSerror/blocks). If random, the standard error is sqrt (var block + var error) / blocks. That is: if the blocks are fixed, the only error component is the sum of the individual residuals. But, if the blocks are random, then the variation between the blocks is also error. Hence, the certainty about treatment means is reduced by the desire to generalize beyond the blocks observed to a full population of possible blocks. The differences between blocks are treated as an estimator of this component of the uncertainty or error variance.

This may be important, for example, in setting a confidence interval around the overall mean of the dependent variable. If we assume that the blocks we have observed are all there are, then the only uncertainty is estimated by variation across the sample of individuals. But, if our mean is uncertain both because of individual sampling variability, and because of uncertainty in which blocks were chosen, the confidence intervals are wider.

Often, this difference is logically equivalent to saying: if we want to make an inference about treatment effects for this experiment or study only -- then we can treat the blocking factor as fixed. But, if we want to consider making inferences about replications (which, by definition, use new blocks), then we should treat blocking as random. The latter type of inference, and our confidence, will always be less than in the fixed case.

**4.3.2.2** Treatment differences

It is possible to validly estimate and evaluate differences in treatment means, even in designs where it may not be possible to estimate the confidence interval on the overall mean.

return to the chapter table of contents4.4 The two-way mixed model

This section examines an alternative treatment of a two-way factorial experiment using the GRASSES data set (section 3.7).

Five varieties of seeds are examined with each of three propagation methods, with six replications in each of the 15 conditions. Let us suppose that the five varieties are a sample of varieties -- and we are interested in broader inference about variety effects. Method will remain fixed (say, these are the only three methods that are ever used), but the interaction of method*variety now also becomes random, because one of its components is random.

To properly estimate standard errors for means and differences, mixed must be used. And, if there is an interest in examining specific variety effects with BLUP, then mixed must be used.

**4.4.1** Working with expected mean squares to get valid tests.
Code for this example is contained in the file GRASSES.sas.

Y = mu + alphai + bj + (ab) + eijk

the method mean mui = grand mean plus the method mean averaged across all levels of b -- "population average" model

the b effects, representing varieties in this case, are assumed normal, equal variance, mean zero

the interaction effects are also assumed mean zero, normal, equal variance

error is the variance among cases using the same variety and method

The "random" statement in SAS causes the program to produce the forumulas for the correct expected mean squares of the effects -- which can then be used for constructing tests. Adding a test statement after the random statement allows specification of hypothesis error component and error error components for tests.

The danger in not properly specifying effects as random, and in not using the correct expected mean squares to construct tests is that estimated errors are too small, and significance of treatment effects may be overstated.

**4.4.2** Standard errors for the mixed two-way model GLM versus
MIXED

GLM standard errors are not correct for the mixed 2-way factoral model. However, the variance component method of mixed analysis does not always yield non-zero components. Adding the "nobound" statement is one fix-up to this problem. lsmeans and differences are correctly estimated when nobounds is used.

**4.4.3 ** More on Expected Mean Squares -- null hypotheses for fixed
effects

Deals with correct forms for testing fixed effects in mixed models with unbalanced data using quadratic forms. Beyond my comprehension.

return to the chapter table of contents4.5 A classification with both crossed and nested effects

This section uses the data CHIPS, available for both SPSS and SAS.

The dependent variable is the amount of resistance measured across a computer chip. The primary concern is with the effects of a treatment factor (et) which has four levels. We assume that this treatment effect is fixed. Twelve wafers of silicon (you might think of them as, for example cities in another context) are selected, and three are assigned to each treatment. Wafers, then, are nested within treatment -- as each wafer occurs only within a given treatment. Four positions are selected for testing on each wafer; these four positions are taken here as an experimental factor and occur in each wafer in each treatment. Imagine, analogously, that we have selected a hispanic, a white, a black and a mixed race neighborhood in each city. We are interested in whether our four kinds of treatments work and whether there are systematic differences due to ethinicity, but we have used a cluster sampling technique to select cases.

For this design, where we are interested in et effects, the variance against which they are tested is wafers. Analogously, for examining our program effects, the SSerror is city. For testing effects of pos, a different error term is needed. Analogously, for testing ethnicity effects, we need a different error term than for testing program effects.

So, the effects here are both crossed and nested. However, all effects are treated as fixed.

**4.5.1** Analysis of Variance

With all fixed effects, SS are used. There are four levels of et (program); 8 df for wafer (12 wafers, less one degree of freedom within each level of et -- since wafer is nested within et (analogously, effects of city have 8 df, since four cities are assigned to each treatment, and the city effects are therefore nested within treatment. There are 3 df associated with position on wafer -- there are four position that are crossed with all other factors of the design. Analogously, there are 3 df associated with ethnicity effects, because there are four groups that occur in every one of the 12 cities. The interaction of the two treatment factors (et and position) has 9 df (3 from et, times 3 from pos); analogously (3 from program, times 3 from ethnicity); individual residual has 24 df. Analogously, variation across neighborhoods within cities within programs has 24 df.).

**4.5.2** Expected mean squares

Shows how to design contrasts to test hypotheses about treatment (ET)
effects, hypotheses about ethnicity (pos) effects within program (et); and
hypotheses about program (et) effects within ethnicity (pos). Table 4.3 (p
127) shows the proper error terms for examining each of the effects -- *this
is actually the key thing to try to understand*.

**4.5.3** Satterthwaite's formula for approximate degrees of freedom

All I get from this is that the df(error) in complex designs like this can be very complex, and there are methods for approximating the correct df.

**4.5.4** MIXED analysis of the same problem

If the model is correctly specified in PROC MIXED, the complexity of finding correct error terms and df is handled by the program -- and hence this is preferred to GLM. In this case, the example suggests that the effect wafer(et) be treated as random. By analogy, we might say that the effect of city(nested within program) might reasonably be treated as a random effect, because we may be interested in attempting to generalize beyond the 12 cities actually examined -- but program and ethnicity are reasonably treated as fixed effects. This seems a better specification of the problem.

The example shows how the correct variance components and tests of effects and contrasts are estimated by MIXED.

return to the chapter table of contents

4.6 Split-plot experiments

There are a wide variety of experiments of this general type. One common occurrence is where there are two or more treatment effects, and one is applied at broad areas (like cities) and another is applied at narrower areas (like neighborhoods) within each city. Designs vary in whether they are complete or not, re-use areas or not, etc. The key issues are that the error terms for effects applied at the larger level differ from (and are larger than) the error terms applied at the local level -- that is, macro effects need to be tested against both macro and micro variability; micro effects can be tested against micro variability pooled across contexts. This section shows the analysis of the same split plot design using GLM and MIXED.

**4.6.1** A standard split-plot experiment.

Data for this example are contained in the datasets CULTIVAR.sav and CULTIVAR.sav.

Four main plots (rep) are selected at random from a population of plots that might have been used to do the study. Within each plot, three kinds of inoculation treatments (innoc) are applied to each of two types of grasses (cult). The dependent variable is the dry weight of the crop. There are 24 observations, being six cult*innoc conditions within each of four plots. Replication has 3df (four plots); cultivar has 1 df (two types of grasses); replication by cultivar interaction (differences between grasses across plots -- the error A or whole-plot error) has 3 df; innoculi (treatments) have 2 df.

There are two relevant error terms. For testing differences between grasses, the replication (and hence source of error mean square) is replication by grass (the four plots by the two grasses) -- ignoring any variation due to the treatment innoculi. For testing effects of the treatment (innoculi), the replication (and hence error mean squares) is the replication by cultivar by innoculi -- or the variation across all the observations.

Think of a parallel case, substituting sociological variables.

**4.6.1.1** Analysis using GLM

GLM yields the formulas for expected mean squares in terms of variance components.

**4.6.1.2** Analysis with MIXED

Treatment main effects and interactions are treated as fixed (in this case, cult, innoc and cult*innoc). The replication across the macro units (plots) and it's interactions are treated as random (in this case, rep and rep*cult). That is, usually the replication or blocking factor in a split-plot design is treated as random.

MiXED yields variance component tests for rep and rep*cult; and yields ss and F for fixed cult, innoc, and innoc*cult.

return to the chapter table of contents

Data sets

return to the chapter table of contents

return to Linear Models home page