Multinomial logit models with PROC CATMOD


This page is part of the documentation of the course in Generalized Linear Models offered by Robert Hanneman of the Department of Sociology at the University of California, Riverside. Your comments and suggestions are welcome. You can reach me at: rhannema@wizard.ucr.edu


This page has several parts:
The problem
SAS code
Results: Goodness of fit of the model
Results: Parameters
Results: Predicted probabilities and residuals


The problem

Substantively

The model and data are taken directly from Agresti, 1996: 207. The goal is to examine the effect of the length of alligators on the type of food that they consume. At some point, I will try to find a meaningful social science example. The length of alligators is measured in meters, and may be a proxy for age, speed, and other unmeasured factors. The types of prey are classified as fish, invertebrate, or other. A total of 59 alligators are observed, at 45 distinct measured lengths. The goal is to predict the effect of length on the probability of each of the three types of prey, under the constraint that the three response probabilities must sum to unity.

Statistically

There are a number of alternative approaches to this problem. However, since the dependent variable is truly categorical, an extension of the binomial sampling model to multiple outcomes is reasonable. The response probabilities may be linked to the linear combination of predictors by a variety of functions, but the logistic is a good choice (a common alternative might be the probit).

One could estimate the effects of length on the log odds of each pair of outcomes, or the effects of length on the odds of each outcome against the other two pooled. There are a couple of problems with this, however. First, since the equations of each approach are not truly independent (the same data are used in more than one equation), the estimated standard errors and inferential statistics may be too optimistic. Second, there is no guarantee that the estimated probabilities of the three outcomes will sum to unity if the equations are estimated independently.

So, a better approach is to choose contrasts that will enable the estimation of the log odds of any two of the three outcomes, and to derive the effects with regard to the third. It is also necessary that the two equations for two outcomes be estimated simultaneously, to ensure consistency. PROC CATMOD is a good tool for this kind of task.

In CATMOD, the standard approach (others are possible) is to define two equations for the generalized logits of two outcomes with respect to the third, and to estimate the parameters of both equations simultaneously by ML. From the parameters of these two equations, it is possible to derive the effects of unit changes in independent variables on the probability of each of the three outcomes. These effects are, by the nature of the logistic linking function, non-linear. However, they can be easily understood by graphic methods or by calculation of elasticities.

return to the table of contents


SAS code

Code

data gator ;
input length choice $ ;
cards ;
 1.24 I
 1.45 I
.
.
.
2.36 F
2.72 I
3.66 F
; 
proc catmod ;
   response logits ;
   direct length ;
   model choice = length / pred=prob pred=freq ;
run ;

Comments

The length of the alligator in meters is input, followed by the qualitative variable indicating the type of prey (Invertebrate, Fish, or Other). Character codes are used for the levels of the dependent variable in this case. It is often better to use numeric codes, and to use a PROC FORMAT to assign labels, as controlling the order of the categories of variables in CATMOD is sometimes troublesome.
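
For example, a minimal sketch of the numeric-code approach might look like the following (the data set name, variable name, format name, numeric codes, and the ORDER= option are illustrative choices, not part of the program above; only a few data lines are shown):

proc format ;
   value preyfmt  1 = 'Fish'
                  2 = 'Invertebrate'
                  3 = 'Other' ;
run ;

data gator2 ;
   input length prey ;         /* prey coded 1, 2, 3                     */
   format prey preyfmt. ;      /* attach the labels to the numeric codes */
   cards ;
 1.24 2
 1.45 2
 3.66 1
;
proc catmod order=internal ;   /* sort response levels by their numeric values */
   response logits ;
   direct length ;
   model prey = length ;
run ;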

The "response logits" statement tells CATMOD to model generalized logits. CATMOD calculates the log odds of each category of the dependent variable relative to the last category of the dependent variable. Two alternative response functions are ALOGIT which calculates the log odds of a category relative to the next highest (adjacent) category; and, CLOGIT which calculates the log odds of a category relative to all lower categories. These latter two response functions are normally used for the analysis of ordinal variables.

The "direct length" statement tells CATMOD that the variable length is to be treated as a continuous variable (CATMOD, being a program for the analysis of categorical data, tends to assume that all variables are CLASS, unless told otherwise).

The model statement simply defines the dependent and independent variables. A large number of options are available to control the type of estimation (ML is the default for multinomial logits, but not for everything CATMOD does). Here, we ask to see the predicted probabilities and frequencies for each case. The purpose is to examine residuals, and to recover case probabilities.

return to the table of contents


Results: Goodness of fit of the model

Output

                              CATMOD PROCEDURE

      Response: CHOICE                Response Levels (R) =  3
      Weight Variable: None           Populations     (S) = 45
      Data Set: GATOR                 Total Frequency (N) = 59
      Frequency Missing: 0            Observations  (Obs) = 59
                             POPULATION PROFILES
                                            Sample
                          Sample  LENGTH     Size
                          ------------------------
                              1    1.24          1
                              2     1.3          2

                             43    3.68          1
                             44    3.71          1
                             45    3.89          1

                             RESPONSE PROFILES
                              Response  CHOICE
                              ----------------
                                   1      F
                                   2      I
                                   3      O

                         MAXIMUM-LIKELIHOOD ANALYSIS

                             Sub        -2 Log     Convergence
              Iteration   Iteration   Likelihood    Criterion
              ------------------------------------------------
                   0           0       129.63625       1.0000
                   1           0         101.198       0.2194
                   2           0       98.499956       0.0267
                   3           0       98.342152     0.001602
                   4           0       98.341244    9.2344E-6
                   5           0       98.341244    6.662E-10

                                  Parameter Estimates
         Iteration        1           2           3           4
         ---------------------------------------------------------
              0              0           0           0           0
              1         0.2106      2.9900      0.4501     -1.1171
              2         1.5508      5.1826     -0.1153     -2.2039
              3         1.6135      5.6556     -0.1089     -2.4415
              4         1.6177      5.6971     -0.1101     -2.4653
              5         1.6177      5.6974     -0.1101     -2.4654




               MAXIMUM-LIKELIHOOD ANALYSIS-OF-VARIANCE TABLE
             Source                   DF   Chi-Square      Prob
             --------------------------------------------------
             INTERCEPT                 2        10.71    0.0047
             LENGTH                    2         8.94    0.0115

             LIKELIHOOD RATIO         86        75.11    0.7929

Comments

CATMOD reports the number of populations (that is, unique scores on X) and the number of observations. Next, the iteration history of the likelihood function and the parameter estimates are reported. The final estimate of the -2 log likelihood should be noted, as it is useful in assessing badness of fit, and improvement in fit. It would be useful if CATMOD reported the likelihood results for a two-parameter model (that is, intercept-only equations for the two generalized logits), which could then be used to assess the improvement due to adding the independent variables to the model. The four parameters cited in the iteration history are the intercepts and slopes of length on the two generalized logits.

A summary chi-square test is provided for the effect of the intercept (with two degrees of freedom, as two intercepts are estimated) and -- more importantly -- the independent variable, length. Here, we see that we can be reasonably confident that length affects one or the other, or both, of the log odds of consuming fish versus other prey and invertebrates versus other prey. That is, length does have an effect.

The likelihood ratio statistic for the overall model has 86 degrees of freedom (45 scores or levels of X are observed for each of two logits, yielding 90 pieces of information, four of which are consumed by the estimation of the four free parameters of the model). One cannot readily reject the hypothesis that the model fits the data (that is, the differences between the predicted frequencies and the actual frequencies in the 45x2 table are not so large that they could not reasonably have occurred by chance).
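
As a small check on the bottom line of this table, the tail probability of the reported likelihood ratio statistic can be reproduced with a short data step (the data set and variable names below are just illustrative). To test the improvement in fit from adding length, one would likewise subtract the fitted model's -2 log likelihood (98.34) from that of an intercept-only model (not shown in this output), and refer the difference to a chi-square distribution with two degrees of freedom.

data lrcheck ;
   lr_chisq = 75.11 ;                       /* reported goodness-of-fit statistic */
   df       = 86 ;                          /* (45 populations x 2 logits) - 4    */
   p_value  = 1 - probchi(lr_chisq, df) ;   /* upper-tail chi-square probability  */
   put p_value= ;                           /* prints approximately 0.79          */
run ;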

return to the table of contents


Results: Parameters

Output

                  ANALYSIS OF MAXIMUM-LIKELIHOOD ESTIMATES

                                             Standard    Chi-
      Effect            Parameter  Estimate    Error    Square   Prob
      ----------------------------------------------------------------
      INTERCEPT                 1    1.6177    1.3073     1.53  0.2159
                                2    5.6974    1.7937    10.09  0.0015
      LENGTH                    3   -0.1101    0.5171     0.05  0.8314
                                4   -2.4654    0.8996     7.51  0.0061

Comments

The four parameters are the intercepts and slope coefficients for the two equations predicting the log odds of "fish" versus "other" and "invertebrate" versus "other" prey. As in most regressions, the intercepts are of little interest. They do suggest that fish and invertebrate prey are more common than other prey at low levels of alligator length. The effects of length on the log odds of fish versus other prey are not significant at the 5% level, whereas the effects of length on the log odds of invertebrate versus other prey are highly significant. Facing this result, one might elect to collapse fish and other, and contrast that category with invertebrates. However, given the small sample size, and given a substantive concern that calls for treating the three prey categories separately, we would be more likely to simply note this result, and to proceed to trying to understand the pattern of effects.
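
If one did want to pursue the collapsed contrast, a minimal sketch is shown below; the recoded data set and variable names are illustrative, and PROC LOGISTIC would be an equally reasonable tool for the resulting binary logit.

data gator_bin ;
   set gator ;
   invert = (choice = 'I') ;   /* 1 = invertebrate, 0 = fish or other prey */
run ;

proc catmod data=gator_bin ;   /* with only two response levels, this is   */
   response logits ;           /* an ordinary binary logit                 */
   direct length ;
   model invert = length ;
run ;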

One could, of course, simply discuss the effects of one meter differences in alligator lengths on the log-odds that an alligator is consuming fish versus other prey and on the log odds that it is consuming invertebrates versus other prey. However, effects on log odds tend to not speak very well to audiences. A much better strategy is suggested by Agresti, who shows how the probability of each of the three outcomes can be calculated from the regression parameters for any given value of X (here X is a single continuous variable, but the approach holds for any vector of X values in models with multiple independent variables). The transformation looks like this:

probability that a case falls in category one on Y =

exp ( a1 + b1X) / [1 + exp (a1 + b1X) + exp (a2 + b2X)]

probability that a case falls in category two on Y =

exp (a2 + b2X) / [1 + exp (a1 + b1X) + exp (a2 + b2X)]

probability that a case falls in category three (the reference, or last, category) of Y =

1 / [1 + exp (a1 + b1X) + exp (a2 + b2X)]

That is, to calculate the probability of the first category of the outcome, we exponentiate the equation for that outcome, evaluated at the selected value of X, as the numerator; the denominator is one plus the sum of the exponentiated equations for all of the estimated logits. Here, since the dependent variable has only three categories, the denominator has two exponentiated terms, one for each logit estimated.

The same calculation is performed for the probability of each category of Y except the last, or reference, category. To calculate the probability of the reference category of the dependent variable, the numerator is simply one, with the same denominator.
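
As a concrete illustration, the data step below (a sketch; the data set and variable names are arbitrary) applies these formulas to the estimates reported above for the last population in the output shown further down (length 3.89 meters). It reproduces the predicted values given there: about .76 for fish, .005 for invertebrate, and .23 for other prey.

data probcalc ;
   a1 = 1.6177 ;  b1 = -0.1101 ;   /* fish versus other logit          */
   a2 = 5.6974 ;  b2 = -2.4654 ;   /* invertebrate versus other logit  */
   len   = 3.89 ;                  /* length of the alligator (meters) */
   denom = 1 + exp(a1 + b1*len) + exp(a2 + b2*len) ;
   p_fish   = exp(a1 + b1*len) / denom ;
   p_invert = exp(a2 + b2*len) / denom ;
   p_other  = 1 / denom ;
   put p_fish= p_invert= p_other= ;   /* roughly .763, .005, .232 */
run ;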

This transformation allows us to calculate the predicted probability of each of the three scores on Y for any given score (or scores) on X. Then what? There are two approaches. For models with very simple X vectors (one or two X variables) one can present the response surface in the form of a line chart or graph. For models with more X variables, one can hold all but one (or two) of the X variables constant (usually at their mean values), and plot the partial response surfaces.
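
For this example, one way to construct such a chart is to evaluate the three probabilities over a grid of lengths and plot the results; the sketch below assumes SAS/GRAPH is available for PROC GPLOT (PROC PLOT, or simply printing the data set, would serve as well).

data surface ;
   length prey $ 12 ;
   a1 = 1.6177 ;  b1 = -0.1101 ;
   a2 = 5.6974 ;  b2 = -2.4654 ;
   do len = 1.0 to 4.0 by 0.1 ;       /* grid covering the observed lengths */
      denom = 1 + exp(a1 + b1*len) + exp(a2 + b2*len) ;
      prey = 'Fish' ;         prob = exp(a1 + b1*len) / denom ; output ;
      prey = 'Invertebrate' ; prob = exp(a2 + b2*len) / denom ; output ;
      prey = 'Other' ;        prob = 1 / denom ;                output ;
   end ;
run ;

proc gplot data=surface ;
   plot prob * len = prey ;            /* one curve per prey type */
run ;
quit ;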

For those who need a regression-like coefficient expressing the effect of units of X on the probability of Y, the "elasticity" is suggested. First, calculate the predicted probability of each outcome when all X values are fixed at their sample means. Then, change one of the X variables by one unit (or, if you want a "standardized coefficient," by one standard deviation) and recalculate the probabilities of each outcome. The difference in the predicted probabilities can be interpreted as the effect of a one unit change in X on the probability of each Y outcome, when all other variables are held constant at their sample mean values. Of course, values other than the sample means could be used, if there were some good reason for doing so.
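
A sketch of that calculation for this example follows. With a single independent variable there is nothing else to hold constant, and the sample mean length is not reported in the output above, so the value 2.5 used below is only a placeholder to be replaced by the actual mean.

data elastic ;
   a1 = 1.6177 ;  b1 = -0.1101 ;
   a2 = 5.6974 ;  b2 = -2.4654 ;
   xbar = 2.5 ;                        /* placeholder: substitute the sample mean length */
   do shift = 0 to 1 ;                 /* probabilities at the mean, then at mean + 1    */
      len      = xbar + shift ;
      denom    = 1 + exp(a1 + b1*len) + exp(a2 + b2*len) ;
      p_fish   = exp(a1 + b1*len) / denom ;
      p_invert = exp(a2 + b2*len) / denom ;
      p_other  = 1 / denom ;
      output ;
   end ;
run ;

proc print data=elastic ;
   var len p_fish p_invert p_other ;   /* differences between the two rows are the "elasticities" */
run ;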

return to the table of contents


Results: Predicted probabilities and residuals

Output

 MAXIMUM-LIKELIHOOD PREDICTED VALUES FOR RESPONSE FUNCTIONS AND FREQUENCIES

                 -------Observed-------  -------Predicted------
        Function              Standard                Standard
 Sample  Number   Function      Error     Function      Error     Residual
 --------------------------------------------------------------------------
    1       1             .           .  1.48119627  0.72302381           .
            2             .           .  2.64029098  0.78957983           .

           F1             0           0  0.22653072  0.09187621  -0.2265307
           F2             1           0    0.721964   0.1062234    0.278036
           F3             0           0  0.05150528  0.03606738  -0.0515053

    2       1             .           .  1.47458973  0.69729619           .
            2             .           .  2.49236421  0.74902659           .

           F1             0           0  0.50051272  0.18267535  -0.5005127
           F2             2           0  1.38493363  0.20940992  0.61506637
           F3             0           0  0.11455365  0.07564352  -0.1145537
.
.
.

   44       1             .           .  1.20922692  0.78106743           .
            2             .           .   -3.449361  1.69477195           .

           F1             1           0  0.76457993  0.13741989  0.23542007
           F2             0           0   0.0072481  0.01134416  -0.0072481
           F3             0           0  0.22817198  0.13731548   -0.228172

   45       1             .           .   1.1894073  0.86253435           .
            2             .           .  -3.8931413  1.84990868           .

           F1             1           0  0.76300599  0.15361043  0.23699401
           F2             0           0  0.00473375   0.0080913  -0.0047337
           F3             0           0  0.23226027  0.15362293  -0.2322603

Comments

It is a good idea to collect and examine residual statistics carefully in multinomial logistic regression, just as it is in any other variety of regression modeling. The output above shows, for each "population" (i.e. unique score on X), the predicted log odds on the "first function" (i.e. the log odds of fish versus other) and the predicted log odds on the "second function" (i.e. the log odds of invertebrate versus other prey). Perhaps more usefully, the output shows the frequency distribution of cases in each population on the dependent variable (i.e. how many fish eaters, invertebrate eaters, and other eaters), and the frequencies of each outcome predicted by the regression function (for populations with a single case, these are simply the predicted probabilities). For the last population, for example, the one case was actually a fish eater -- and the model predicted a .76 probability of that outcome (generating a residual of .24).

This type of output has a couple of uses. First, it enables us to construct alternative measures of goodness of fit, if we are so inclined. It is not uncommon to count the numbers of "correct" and "incorrect" classifications produced by the model, or the false positives and false negatives for each outcome (one must, of course, select some reasonable rule for assigning cases to categories from the predicted probabilities). Second, and probably more important (at least in somewhat more complex cases than this example), we can identify outliers (possible indications of measurement errors or omitted variables), and places where the model consistently over- or under-predicts (indicating, perhaps, a less than optimal choice of the linking function).
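
For instance, assuming the case-level predicted probabilities have been assembled into a data set (the data set name preds and the variables choice, p_fish, p_invert, and p_other below are hypothetical), a modal-category classification table could be built along these lines:

data classify ;
   set preds ;                         /* hypothetical: one record per case with  */
   length predicted $ 1 ;              /* observed choice and the three p_ values */
   if p_fish >= max(p_invert, p_other)  then predicted = 'F' ;
   else if p_invert >= p_other          then predicted = 'I' ;
   else                                      predicted = 'O' ;
run ;

proc freq data=classify ;
   tables choice * predicted / nopercent norow nocol ;   /* observed by predicted */
run ;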

return to the table of contents