Multinomial logit models with PROC CATMOD

This page is part of the documentation of the course in Generalized Linear Models offered by Robert Hanneman of the Department of Sociology at the University of California, Riverside. Your comments and suggestions are welcome. You can reach me at: rhannema@wizard.ucr.edu

This page has several parts:

- The problem
- SAS code
- Results: Goodness of fit of the model
- Results: Parameters
- Results: Predicted probabilities and residuals

The problem

One could estimate the effects of length on the log odds of each pair of outcomes, or the effects of length on the odds of each outcome against the other two pooled. There are a couple of problems with this, however. First, since the equations of each approach are not truly independent (the same data are used in more than one equation), the estimated standard errors and inferential statistics may be too optimistic. Second, there is no guarantee that the estimated probabilities of the three outcomes will sum to unity if the equations are estimated independently.
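The second problem can be seen with a small numeric sketch (Python here purely for illustration; the log odds values are made up, not estimates from the alligator data): probabilities from separately estimated one-versus-rest equations need not sum to one.

```python
import math

def inv_logit(z):
    """Logistic function: converts a log odds into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical log odds at a single length value from three separately
# estimated one-versus-rest logit equations (illustrative values only).
z_fish, z_invertebrate, z_other = 0.4, -0.2, -1.5

probs = [inv_logit(z) for z in (z_fish, z_invertebrate, z_other)]
total = sum(probs)
print(total)  # about 1.23 -- the equations were never constrained to agree
```

Simultaneous estimation of the generalized logits, as described below, builds the sum-to-one constraint into the model itself.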

So, a better approach is to choose contrasts that will enable the estimation of the log odds of any two of the three outcomes, and to derive the effects with regard to the third. It is also necessary that the two equations for two outcomes be estimated simultaneously, to ensure consistency. PROC CATMOD is a good tool for this kind of task.

In CATMOD, the standard approach (others are possible) is to define two equations for the generalized logits of two outcomes with respect to the third, and to derive parameter estimates simultaneously by ML. From the parameters of these two equations, it is possible to derive the effects of unit changes in the independent variables on the probability of each of the three outcomes. These effects are, by the nature of the logistic linking function, non-linear. However, they can be easily understood by graphical methods or by calculation of elasticities.


SAS code

data gator ;
  input length choice $ ;
  cards ;
1.24 I
1.45 I
. . .
2.36 F
2.72 I
3.66 F
;
proc catmod ;
  response logits ;
  direct length ;
  model choice = length / pred=prob pred=freq ;
run ;

The "response logits" statement tells CATMOD to model generalized logits. CATMOD calculates the log odds of each category of the dependent variable relative to the last category of the dependent variable. Two alternative response functions are ALOGIT, which calculates the log odds of each category relative to the next (adjacent) category, and CLOGIT, which calculates cumulative logits: the log odds of being at or below a category relative to all higher categories. These latter two response functions are normally used for the analysis of ordinal variables.
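The differences among the three response functions can be sketched numerically (a Python illustration with made-up category probabilities, not values from this output; the cumulative logit shown follows the usual at-or-below versus above contrast):

```python
import math

# Probabilities for a three-category response (illustrative values).
p = [0.5, 0.3, 0.2]

# LOGITS: generalized logits, each category versus the last
glogits = [math.log(p[i] / p[2]) for i in range(2)]

# ALOGIT: each category versus the next (adjacent) category
alogits = [math.log(p[i] / p[i + 1]) for i in range(2)]

# CLOGIT: cumulative logits, at-or-below a category versus above it
clogits = [math.log(sum(p[:i + 1]) / sum(p[i + 1:])) for i in range(2)]

print(glogits, alogits, clogits)
```

Note that the adjacent logits telescope into the generalized logit (log(p1/p2) + log(p2/p3) = log(p1/p3)), so the three parameterizations carry the same information arranged differently.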

The "direct length" statement tells CATMOD that the variable length is to be treated as a continuous variable (CATMOD, being a program for the analysis of categorical data, tends to assume that all variables are CLASS, unless told otherwise).

The model statement simply defines the dependent and independent variables. A large number of options are available to control the type of estimation (ML is the default for multinomial logits, but not for everything CATMOD does). Here, we ask to see the predicted probabilities and frequencies for each case. The purpose is to examine residuals, and to recover case probabilities.


CATMOD PROCEDURE

Response: CHOICE              Response Levels (R) =  3
Weight Variable: None         Populations     (S) = 45
Data Set: GATOR               Total Frequency (N) = 59
Frequency Missing: 0          Observations  (Obs) = 59

POPULATION PROFILES

Sample    LENGTH    Sample Size
-------------------------------
   1       1.24          1
   2       1.3           2
   .        .            .
  43       3.68          1
  44       3.71          1
  45       3.89          1

RESPONSE PROFILES

Response    CHOICE
------------------
    1         F
    2         I
    3         O

MAXIMUM-LIKELIHOOD ANALYSIS

              Sub        -2 Log      Convergence
Iteration  Iteration   Likelihood     Criterion
------------------------------------------------
    0          0       129.63625      1.0000
    1          0       101.198        0.2194
    2          0        98.499956     0.0267
    3          0        98.342152     0.001602
    4          0        98.341244     9.2344E-6
    5          0        98.341244     6.662E-10

                 Parameter Estimates
Iteration       1         2         3         4
---------------------------------------------------------
    0           0         0         0         0
    1        0.2106    2.9900    0.4501   -1.1171
    2        1.5508    5.1826   -0.1153   -2.2039
    3        1.6135    5.6556   -0.1089   -2.4415
    4        1.6177    5.6971   -0.1101   -2.4653
    5        1.6177    5.6974   -0.1101   -2.4654

MAXIMUM-LIKELIHOOD ANALYSIS-OF-VARIANCE TABLE

Source              DF    Chi-Square    Prob
--------------------------------------------------
INTERCEPT            2       10.71      0.0047
LENGTH               2        8.94      0.0115
LIKELIHOOD RATIO    86       75.11      0.7929


ANALYSIS OF MAXIMUM-LIKELIHOOD ESTIMATES

                                  Standard     Chi-
Effect      Parameter  Estimate     Error     Square     Prob
----------------------------------------------------------------
INTERCEPT       1        1.6177    1.3073       1.53    0.2159
                2        5.6974    1.7937      10.09    0.0015
LENGTH          3       -0.1101    0.5171       0.05    0.8314
                4       -2.4654    0.8996       7.51    0.0061

One could, of course, simply discuss the effects of one-meter differences in alligator lengths on the log odds that an alligator is consuming fish versus other prey, and on the log odds that it is consuming invertebrates versus other prey. However, effects on log odds tend not to speak very well to audiences. A much better strategy is suggested by Agresti, who shows how the probability of each of the three outcomes can be calculated from the regression parameters for any given value of X (here X is a single continuous variable, but the approach holds for any vector of X values in models with multiple independent variables). The transformation looks like this:

probability that a case falls in category one on Y =

exp(a1 + b1X) / [1 + exp(a1 + b1X) + exp(a2 + b2X)]

probability that a case falls in category two on Y =

exp(a2 + b2X) / [1 + exp(a1 + b1X) + exp(a2 + b2X)]

probability that a case falls in category three (the "reference" or last category) of Y =

1 / [1 + exp(a1 + b1X) + exp(a2 + b2X)]

That is, to calculate the probability of the first category of the outcome, we exponentiate the equation for that outcome, evaluated at the selected value of X, as the numerator; the denominator is one plus the sum of the exponentiated equations for all of the estimated logits. Here, since the dependent variable has only three categories, the denominator has two such terms, one for each logit estimated.

The same calculation is performed for the probability of each category of Y, except the last, or reference category. To calculate the probabilities for the reference category of the dependent variable, one is used in the denominator.
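As a sketch of this calculation (in Python rather than SAS), the parameter estimates reported above reproduce CATMOD's predicted probabilities for the shortest alligator in the data (length 1.24 meters):

```python
import math

# Parameter estimates from the CATMOD output above:
# equation 1 is the log odds of fish (F) versus other (O),
# equation 2 is the log odds of invertebrates (I) versus other (O).
a1, b1 = 1.6177, -0.1101   # intercept and length effect, F vs O
a2, b2 = 5.6974, -2.4654   # intercept and length effect, I vs O

def choice_probs(length):
    """Return (P(F), P(I), P(O)) at a given alligator length in meters."""
    e1 = math.exp(a1 + b1 * length)
    e2 = math.exp(a2 + b2 * length)
    denom = 1.0 + e1 + e2
    return e1 / denom, e2 / denom, 1.0 / denom

pF, pI, pO = choice_probs(1.24)
print(round(pF, 4), round(pI, 4), round(pO, 4))  # 0.2265 0.722 0.0515
```

The three probabilities sum to one by construction, which is exactly the consistency that separate one-versus-rest equations fail to guarantee.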

This transformation allows us to calculate the predicted probability of each of the three scores on Y for any given score (or scores) on X. Then what? There are two approaches. For models with very simple X vectors (one or two X variables) one can present the response surface in the form of a line chart or graph. For models with more X variables, one can hold all but one (or two) constant (usually at their mean values), and plot the partial response surfaces.

For those who need a regression coefficient expressing effects of units of X on the probability of Y, the "elasticity" is suggested. First, calculate the predicted probability of each outcome when all X values are fixed at their sample means. Then, change one of the X variables by one unit (or, if you want a "standardized coefficient," by one standard deviation) and recalculate the probabilities of each outcome. The difference in the predicted probabilities can be interpreted as the effect of a one unit change in X on the probability of each Y outcome, when all other variables are held constant at sample mean values. Of course, values other than the sample means could be used, if there were some good reason for doing so.
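This recipe can be sketched in Python, reusing the fitted parameters from the output above; note that the sample mean length used here is an assumed illustrative value, since the actual mean of the gator data is not shown on this page:

```python
import math

# Fitted parameters from the CATMOD output above (F vs O, I vs O)
a1, b1 = 1.6177, -0.1101
a2, b2 = 5.6974, -2.4654

def choice_probs(length):
    """Return (P(F), P(I), P(O)) at a given alligator length in meters."""
    e1 = math.exp(a1 + b1 * length)
    e2 = math.exp(a2 + b2 * length)
    d = 1.0 + e1 + e2
    return e1 / d, e2 / d, 1.0 / d

# Assumed sample mean length (illustrative; compute it from the real data)
mean_length = 2.13

base = choice_probs(mean_length)
bumped = choice_probs(mean_length + 1.0)   # one-unit (one meter) change
effects = [b - a for a, b in zip(base, bumped)]
print(effects)  # change in P(F), P(I), P(O) per meter of length
```

Because the probabilities always sum to one, the three effects must sum to zero: a one-meter increase in length shifts probability away from invertebrates and toward fish and other prey.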


MAXIMUM-LIKELIHOOD PREDICTED VALUES FOR RESPONSE FUNCTIONS AND FREQUENCIES

                   -------Observed-------    -------Predicted------
        Function              Standard                  Standard
Sample   Number    Function    Error        Function     Error       Residual
------------------------------------------------------------------------------
   1       1          .          .          1.48119627  0.72302381      .
           2          .          .          2.64029098  0.78957983      .
          F1          0          0          0.22653072  0.09187621  -0.2265307
          F2          1          0          0.721964    0.1062234    0.278036
          F3          0          0          0.05150528  0.03606738  -0.0515053
   2       1          .          .          1.47458973  0.69729619      .
           2          .          .          2.49236421  0.74902659      .
          F1          0          0          0.50051272  0.18267535  -0.5005127
          F2          2          0          1.38493363  0.20940992   0.61506637
          F3          0          0          0.11455365  0.07564352  -0.1145537
   .       .          .          .              .           .           .
  44       1          .          .          1.20922692  0.78106743      .
           2          .          .          -3.449361   1.69477195      .
          F1          1          0          0.76457993  0.13741989   0.23542007
          F2          0          0          0.0072481   0.01134416  -0.0072481
          F3          0          0          0.22817198  0.13731548  -0.228172
  45       1          .          .          1.1894073   0.86253435      .
           2          .          .          -3.8931413  1.84990868      .
          F1          1          0          0.76300599  0.15361043   0.23699401
          F2          0          0          0.00473375  0.0080913   -0.0047337
          F3          0          0          0.23226027  0.15362293  -0.2322603

This type of output has a couple of uses. First, it enables us to construct alternative measures of goodness of fit, if we are so inclined. It is not uncommon to count the numbers of "correct" and "incorrect" classifications produced by the model, or false positives and false negatives for each outcome (one must, of course, select some reasonable rule for assigning cases to categories from the predicted probabilities). Second, and probably more important (at least in somewhat more complex cases than this example), we can identify outliers (possible indications of measurement errors or omitted variables), and places where the model consistently over- or under-predicts (indicating, perhaps, a less than optimal choice of the linking function).
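A count of correct classifications might be sketched like this (Python for illustration; the rule of assigning each case to its highest-probability category is just one reasonable choice, and only the few cases visible in the data step above are used):

```python
import math

# Parameters from the CATMOD output above (F vs O, I vs O)
a1, b1 = 1.6177, -0.1101
a2, b2 = 5.6974, -2.4654

def choice_probs(length):
    """Predicted probability of each food choice at a given length."""
    e1 = math.exp(a1 + b1 * length)
    e2 = math.exp(a2 + b2 * length)
    d = 1.0 + e1 + e2
    return {"F": e1 / d, "I": e2 / d, "O": 1.0 / d}

def classify(length):
    """Assign a case to its highest-probability category (one simple rule)."""
    probs = choice_probs(length)
    return max(probs, key=probs.get)

# The (length, observed choice) pairs shown in the data step above
cases = [(1.24, "I"), (1.45, "I"), (2.36, "F"), (2.72, "I"), (3.66, "F")]
correct = sum(classify(length) == observed for length, observed in cases)
print(correct, "of", len(cases), "classified correctly")  # 4 of 5
```

The one miss (the 2.72-meter alligator observed eating invertebrates) is exactly the kind of case-level residual this output lets us inspect.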
