# Ordered Logit and Probit Models with PROC LOGISTIC and PROC PROBIT

This page is part of the documentation of the course in Generalized Linear Models offered by Robert Hanneman of the Department of Sociology at the University of California, Riverside. Your comments and suggestions are welcome. You can reach me as: rhannema@wizard.ucr.edu

- The problem
- SAS code
- Baseline 1: Regression by OLS on a linear contrast of Y
- Baseline 2: Multinomial logistic model (Y treated as nominal)
- PROC LOGISTIC
- PROC PROBIT

## The Problem

### Substantively

We are interested in predicting success in completion of the various stages in a graduate program. Students can be sorted into a set of qualitative outcomes:
1. left the program without completion of the master's degree
2. in progress toward the master's degree
3. completed the master's and left the program
4. in progress toward the PhD
5. completed the PhD

Information on a number of predictors is obtained for each of 117 students (missing data on some variables cause the loss of 21 cases, however, leaving 96 for analysis). We will examine the predictive power of scores on the Graduate Record Examination verbal (GREV) and mathematical (GREM) sections, the student's gender (treated as a dummy variable with males serving as the reference group), United States citizenship (a dummy variable with non-citizens serving as the reference group), and the year that the student entered the program. This last variable serves primarily as a control for students who have not yet completed their careers due to recency. This effect is confounded with secular trends in the success probabilities of students in the program, in general.

Leaving aside the control variable, the research hypotheses anticipate significant positive partial effects of both verbal skills (above those expected from math skills and other factors) and mathematical skills on attainment. They also anticipate no effect of gender or U.S. citizenship on the probabilities of attainment, once "ability" has been taken into account.

It is pretty obvious that this exercise should not be taken very seriously, because of the probable incompleteness of the model and the lack of good theoretical guidance about what kind of model to fit. In part because of this lack of specification of the problem substantively, there are a large number of plausible ways to approach the data analysis.

### Statistically

Since the data include students who are currently enrolled, as well as those who have completed their careers, some question arises about how to think about these outcomes. There are a number of plausible possibilities. There are many other choices of strategy that need to be carefully considered in this problem, but we will focus our attention on how to conceptualize and treat the dependent variable as our main issue.

One idea is to think about the five outcomes as representing points along a linear scale (or, of course, one could impose some other weighting scheme, e.g. 1, 1.5, 5, 7.5, 10). One could then suppose that the degree of attainment is a function of the X vector. The choice of a link function is not obvious, nor is the nature of the sampling distribution underlying the conditional central tendency of the scale. One could, however, attack the problem by fitting a range of models with differing assumptions about the distribution of Y (normal, Poisson, gamma), the interval distances in the weights assigned to levels of Y, and the link connecting X to Y (direct, logistic, cumulative normal, etc.). We don't like this approach much. But, to illustrate a very crude example of it, we have coded the dependent variable 1-5, assumed that its sampling distribution is normal, and that the link with X is direct. Then, we apply OLS with SAS PROC REG to estimate parameters. A more elegant approach would be to use SAS PROC GENMOD to fit with ML for different distributions of Y and link functions. Since we don't like this approach, we won't go to the trouble to do so. The OLS linear regression model is reported below as a baseline, but is not recommended as an approach to this problem.
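For readers who do want to try the GENMOD route, a minimal sketch might look like the following. It assumes the data set JOB and the variable names from the SAS code section of this page; the two distribution and link pairings are illustrative choices only and were not part of the original analysis.

```sas
/* Sketch only: the OLS baseline refit by maximum likelihood.             */
/* Assumes the data set JOB built in the SAS code section of this page.   */
proc genmod data=job;
  model attain = gender grev grem enteryr uscit
        / dist=normal link=identity;
run;

/* Same predictors with a different distribution and link for Y.          */
/* The Poisson/log pairing is an illustration, not a recommendation.      */
proc genmod data=job;
  model attain = gender grev grem enteryr uscit
        / dist=poisson link=log;
run;
```

Comparing the log likelihoods and deviances that PROC GENMOD reports for each run would give a rough sense of how sensitive the conclusions are to these assumptions.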

A second approach goes to the opposite extreme, and ignores completely the ordering of the levels of the dependent variable. A simple multinomial model could be fit to the five outcomes, treating them as simple alternatives. With this approach, we could constrain the effects of X on the probability of each level of Y to be identical; a more logical approach is to allow the effects of X to differ across the outcomes. SAS PROC CATMOD can be used to fit this model, using generalized logits. Since this model ignores the ordering in the data, however, it too must be treated as a baseline, and is not recommended as an approach to the problem.
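In symbols, the generalized logits contrast each of the first four outcomes with the last one. The notation here ($\alpha_j$, $\beta_j$, and $x$ for the vector of predictors) is introduced only for illustration:

$$
\log\frac{P(Y = j \mid x)}{P(Y = 5 \mid x)} = \alpha_j + x'\beta_j , \qquad j = 1, \dots, 4 .
$$

Each logit has its own slope vector $\beta_j$; constraining $\beta_1 = \dots = \beta_4$ would give the version in which the effects of X are identical across outcomes.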

A third approach conceptualizes the dependent variable as a set of sequential stages, such that each subsequent outcome depends on the attainment of the prior one. Alternatively, one might think of this as a series of events, with people being "at risk" for an event only if the prior event in the sequence has occurred. This seems a more logical approach, as the process generating the data clearly has this character: one cannot complete the PhD without first completing the master's. But the situation here is made quite messy by including students who have not yet completed their degrees. One might think of these as sequential stages as well. For example, one must pass through the "event" of "in progress toward the master's" as a condition of attaining the "event" "earned the master's." Thinking about the problem this way leads us to see it as a set of equations, each predicting the probability of a "success" at each stage among those at risk for that event. That is, we would estimate a number of equations. If we were using the logistic formulation, we might look at the log odds of being in progress toward the master's or any further attainment, relative to having dropped out without the master's; the log odds of earning the master's (or more) relative to dropping out or still being in progress toward the master's, etc. For more developed examples of estimating nested or sequential probability models, see Liao (1994).
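One way to encode the "at risk" logic is sketched below. This is only an illustration: the indicator names (PAST_NOMA, EARNED_MA, TO_PHD, EARNED_PHD) are hypothetical, the data set JOB and ATTAIN coding (1-5) are taken from the SAS code section of this page, and students still in progress are simply counted as not yet successful, which ignores the censoring problem discussed in the text.

```sas
/* Sketch: one binary "success" indicator per stage, left missing for    */
/* students who are not at risk for that stage. Names are hypothetical.  */
data seq;
  set job;
  /* Stage 1: everyone is at risk; success = did not drop out before MA work */
  past_noma = (attain > 1);
  /* Stage 2: at risk if not a dropout; success = earned the master's or more */
  if attain > 1 then earned_ma = (attain >= 3);
  /* Stage 3: at risk with the master's in hand; success = went on toward PhD */
  if attain >= 3 then to_phd = (attain >= 4);
  /* Stage 4: at risk if working toward the PhD; success = completed it */
  if attain >= 4 then earned_phd = (attain = 5);
run;

/* One binary logistic model per stage; shown here for stage 2 only.     */
/* Observations with a missing response are dropped automatically.       */
proc logistic data=seq descending;
  model earned_ma = gender grev grem enteryr uscit;
run;
```

With only 96 usable cases, the later stages have few students at risk, so estimates from such stage-specific models should be expected to be unstable.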

A fourth approach is to treat the five stages as ordered categories with different overall response probabilities, but having the same responses to differences in X. That is, higher GREV scores might be hypothesized to increase the probability of falling in the next higher category of the dependent variable. This model (implemented in a number of variations below) analyzes the cumulative probability distribution of Y as we move from low to high on the ordered categories. The "distances" between the categories are adjusted by a series of intercepts, and the effect of X on falling in the next higher category of Y, relative to all prior categories, is assumed to be homogeneous across all levels of Y. This last assumption (essentially the assumption of the homogeneity of association, or "proportionality" of effects of X) is tested with a diagnostic chi-square statistic. PROC LOGISTIC and PROC PROBIT in SAS can be used to fit these proportional effects models and to test the proportionality assumptions. As an alternative, we might not wish to impose the constraint of proportional effects, but still analyze the cumulative logits (or normits, or gompits). This approach, in principle, can be accomplished in PROC CATMOD by specifying that the cumulative logits rather than the default generalized logits are to be analyzed in the multinomial model (I was not, however, able to get an example of this to run). Again, Liao (1994, chapter 5) provides a discussion of the proportional effects ordered logit and probit models.
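Written out, the proportional effects model has one intercept per cut point and a single slope vector. The symbols $g$, $\alpha_j$, $\beta$, and $x$ are notation introduced here for illustration, with the cumulative probabilities running from the lowest category upward as in the output below:

$$
g\big[P(Y \le j \mid x)\big] = \alpha_j + x'\beta , \qquad j = 1, \dots, 4 ,
$$

where the link $g$ is one of

$$
g(p) = \log\frac{p}{1-p} \ \text{(logit)}, \qquad
g(p) = \Phi^{-1}(p) \ \text{(normit/probit)}, \qquad
g(p) = \log\big[-\log(1-p)\big] \ \text{(complementary log-log)} .
$$

Relaxing the proportionality constraint amounts to replacing $\beta$ with a separate $\beta_j$ for each cut point.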

A final word, before turning to the output. It is probably true that none of the approaches discussed above is ideal. The outcome categories are clearly dependent on one another, and are generally sequential -- but not purely so, due to the inclusion of students in progress. The inclusion of recency of admission as a predictor is an attempt "after the fact" to deal with this messiness. It is not necessarily an efficacious approach. In the original analysis of these data, two efforts were made to overcome the problem. In one, the information on length of time to event was used, and event-history regression techniques were used to deal with the censoring of outcomes. This, however, changes the research question somewhat. In the other approach, we ran two simple logistic models, following the sequential outcome logic described above. In one variant, only students with completed careers were used; in the other variant, we made the assumption that students in progress toward a particular degree would attain it. The results of the various approaches, while hardly identical, were sufficiently similar to be convincing about the main patterns of effects (which were, in fact, somewhat surprising, and did not support the research hypotheses).

## SAS code

### Code

```
proc format ;
value afmt 1='noma' 2='maip' 3='maonly' 4='phdip' 5='phd' ;
/* The DATA statement is missing from the original listing;      */
/* the data set name RAW is an assumption added so the step runs. */
data raw ;
input id sex race enterage uscit priorma priorsoc priorgpa
ucrug toefl grev grem enteryr macen phdcen ninter
nfcomp nfspec gpa grea
matime phdtime tmatime tphdtime mai maii phdi phdii
attain;
format attain afmt. ;
cards;
8 1 1 30 1 1 1 3.42 0 .   700 770 92 1 1 0 . . .    580
1 .   1 .  . 1 . .
2
1 1 1 22 1 0 1 3.42 0 .   520 620 92 1 1 0 . . .    660
1 .   1 .  . 1 . .
2

116 2 3 53 1 1 1   4. 0 .   540 410 78 0 0 1 0 1 3.61 320
13 21 .  .  1 1 1 1
5
117 1 1 38 1 1 0  3.7 0 .   650 660 87 1 1 0 . . 3.17 610
.  .   4 .  0 0 . .
1
;
/* Housekeeping step; SET (also an assumption) reads the data entered above. */
data job ;
set raw ;
options linesize=80 compress=yes nocenter;
gender=0;
if sex=1 then gender=0;
else gender=1;
racein=0;
if race>1 and uscit=1 then racein=1;
else racein=0;
if race>1 and uscit=0 then raceout=1;
else raceout=0;
gre=grev+grem+grea;
proc sort ;
by attain ;
proc reg ;
model attain=gender grev grem enteryr uscit ;
proc catmod data=job ;
direct grev grem ;
model attain=gender grev grem uscit;
proc logistic  order=data ;
model attain=gender grev grem enteryr  uscit/ link=cloglog ;
proc logistic  order=data ;
model attain=gender grev grem enteryr uscit / link=logit ;
proc logistic  order=data ;
model attain=gender grev grem enteryr uscit/ link=normit ;
proc probit  order=data ;
class attain ;
model attain=gender grev grem enteryr uscit / d=normal;
proc probit  order=data ;
class attain ;
model attain=gender grev grem enteryr uscit / d=logistic;
proc probit  order=data ;
class attain ;
model attain=gender grev grem enteryr uscit / d=gompertz;
run;
```

The SAS code is quite ordinary, if rather lengthy due to data handling and the number of models being run. We begin by defining a format for the dependent variable (just for labeling output), and reading the data. Subsequent to this, another data step is executed to do some housekeeping (not all of it relevant to these examples), most notably to try to separate non-citizen non-white students from citizen non-white students, and to create a composite GRE score.

The data are then sorted from low to high on the dependent variable, ATTAIN. In the various models that follow, we instruct SAS to order the categories of the dependent variable according to the order in which they are read from the data set (ORDER=DATA). These steps are one way (there are others) to make sure that SAS PROC LOGISTIC and PROC PROBIT are actually ordering the categories of the dependent variable for analysis as we intended.
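A quick way to confirm the ordering before fitting any models is to tabulate ATTAIN with the same ORDER=DATA option. This check is a suggestion added here, not part of the original program; it assumes the sorted data set JOB from the code above.

```sas
/* Sketch: list the categories of ATTAIN in the order they occur in the  */
/* sorted data set, which is the order ORDER=DATA will hand to the       */
/* modeling procedures.                                                  */
proc freq data=job order=data;
  tables attain;
run;
```

If the categories do not print as noma, maip, maonly, phdip, phd, the sort or the format should be fixed before the response profiles in the later output are trusted.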

The various logistic models analyze the cumulative log odds of falling at or below each category, for every category except the last level of the variable (that is, attained the PhD). Three alternative link functions are examined, with the second (logit) being the "conventional" ordered logistic model and the "normit" being the cumulative normal link. The various probit models analyze the cumulative normal probability function from "no MA" to "no MA or MA in progress" etc. Again, three alternative linking functions are examined: the first (d=normal) is the normal "probit" model; the second imposes a logistic link, and the third a Gompertz link.

## Baseline 1: Regression by OLS on a linear Y

### Output

```
Dependent Variable: ATTAIN

Analysis of Variance

                         Sum of         Mean
Source          DF      Squares       Square      F Value       Prob>F

Model            5    105.85684     21.17137       15.882       0.0001
Error           90    119.97650      1.33307
C Total         95    225.83333

Root MSE       1.15459     R-square       0.4687
Dep Mean       2.95833     Adj R-sq       0.4392
C.V.          39.02831

Parameter Estimates

                 Parameter      Standard    T for H0:
Variable  DF      Estimate         Error   Parameter=0    Prob > |T|

INTERCEP   1     23.622594    2.56256668         9.218        0.0001
GENDER     1      0.222260    0.24749930         0.898        0.3716
GREV       1     -0.002833    0.00123905        -2.287        0.0246
GREM       1      0.000254    0.00141177         0.180        0.8576
ENTERYR    1     -0.225816    0.02819551        -8.009        0.0001
USCIT      1      0.138575    0.42538609         0.326        0.7454
```

The point of this OLS normal linear model is twofold: to give us a very crude baseline of what partial effects ought to look like, so that the probability model output will be correctly interpreted, and to illustrate why using the wrong approach can be a bad thing.

In the former regard, these results suggest a direct relationship between the X vector and the five stages of the graduate career, where these stages are treated as interval. The R-square is statistically significant, and suggests that about 47% of the variance is accounted for (44% after adjustment). The precision of the regression (RMSE and CV) appears fairly reasonable, though hardly impressive. An examination of residuals, however, would have shown very severe problems, and the sums of squares, in particular, are quite suspect.

Even given all of that, we form the (very tentative) impression that perhaps female gender is, if anything, an advantage, as is U.S. citizenship. The effects, however, look quite weak. More recent admissions to the program have not had as high a level of attainment. This clearly reflects the nature of the data (with censored observations included) and may or may not reflect any secular tendency in attainment probabilities. Math ability that is higher or lower than would be expected from the other independent variables appears to have no effect on attainment; unusual verbal ability appears to have the perverse effect of retarding attainment.

## Baseline 2: Multinomial logistic model (Y treated as nominal)

### Output

```
CATMOD PROCEDURE

Response: ATTAIN                      Response Levels (R)=     5
Weight Variable: None                 Populations     (S)=    95
Data Set: JOB                         Total Frequency (N)=    96
Frequency Missing: 21                 Observations  (Obs)=    96

POPULATION PROFILES
Sample
Sample  GENDER  USCIT  GREV  GREM     Size
-------------------------------------------
1     0       0    260   700          1
2     0       0    400   780          1
3     0       0    430   620          1
.
.
93     1       1    700   520          1
94     1       1    730   600          1
95     1       1    740   540          1

RESPONSE PROFILES

Response  ATTAIN
----------------
1    noma
2    maip
3    maonly
4    phdip
5    phd

MAXIMUM-LIKELIHOOD ANALYSIS

Sub       -2 Log    Convergence        Parameter Estimates
Iteration  Iteration  Likelihood   Criterion        1          2          3
------------------------------------------------------------------------------
0          0      309.01208      1.0000           0          0          0
1          0      278.85294      0.0976     -4.3961    -5.4860    -4.5222
5          0      276.07723   9.412E-10     -3.7968    -4.5767    -5.7760

Parameter Estimates
Iteration       4          5          6          7          8          9
---------------------------------------------------------------------------
0             0          0          0          0          0          0
1       -3.8980     0.0900    -0.4065    -0.0122    -0.3984   0.008763
5       -3.3262     0.0971    -0.3227     0.3275    -0.4092   0.008562

Parameter Estimates
Iteration      10         11         12         13         14         15
---------------------------------------------------------------------------
0             0          0          0          0          0          0
1      0.008040   0.007623   0.002578  -0.000062   0.002586   0.000663
5      0.007488     0.0112   0.001638  -0.000647   0.001796  -0.000438

Parameter Estimates
Iteration      16          17          18          19          20
--------------------------------------------------------------------
0             0           0           0           0           0
1      0.003844      0.4723      0.3534      1.0175      0.0401
5      0.003799      0.5620      0.4500      1.6499   -0.005137

MAXIMUM-LIKELIHOOD ANALYSIS-OF-VARIANCE TABLE

Source                   DF   Chi-Square      Prob
--------------------------------------------------
INTERCEPT                 4         5.01    0.2859
GENDER                    4         4.27    0.3702
GREV                      4         9.97    0.0409
GREM                      4         1.53    0.8205
USCIT                     4         6.37    0.1729

LIKELIHOOD RATIO        360       273.30    0.9998

ANALYSIS OF MAXIMUM-LIKELIHOOD ESTIMATES

                                       Standard    Chi-
Effect            Parameter  Estimate    Error    Square   Prob
----------------------------------------------------------------
INTERCEPT                 1   -3.7968    2.3860     2.53  0.1115
                          2   -4.5767    2.3686     3.73  0.0533
                          3   -5.7760    3.4117     2.87  0.0905
                          4   -3.3262    2.5562     1.69  0.1932
GENDER                    5    0.0971    0.3160     0.09  0.7586
                          6   -0.3227    0.3156     1.05  0.3065
                          7    0.3275    0.4795     0.47  0.4947
                          8   -0.4092    0.3447     1.41  0.2351
GREV                      9   0.00856   0.00354     5.86  0.0155
                         10   0.00749   0.00340     4.85  0.0276
                         11    0.0112   0.00468     5.70  0.0169
                         12   0.00164   0.00365     0.20  0.6537
GREM                     13  -0.00065   0.00377     0.03  0.8638
                         14   0.00180   0.00368     0.24  0.6253
                         15  -0.00044   0.00501     0.01  0.9304
                         16   0.00380   0.00403     0.89  0.3464
USCIT                    17    0.5620    0.6520     0.74  0.3887
                         18    0.4500    0.6113     0.54  0.4617
                         19    1.6499    0.7312     5.09  0.0240
                         20  -0.00514    0.6678     0.00  0.9939

```

The results are not directly comparable here, as the year of admission to the program was somehow left out. Tests for significant effects of each independent variable (across the four logit equations) suggest that only GREV has general effects. By examining the effects of each variable across the four logits, we can get a sense of whether, and to what degree, the proportionality assumption may be justified (i.e. homogeneity of effects).

To interpret the coefficients, we must remember that the logits here compare the first outcome (dropped out without MA) to the last (completed PhD); the second outcome (in progress toward the MA) to the last (completed PhD), etc. We see some evidence of non-proportional effects. The coefficients for gender differ across the four logits in sign, but none are significant. Mathematics test scores also display differences in sign, with none significant. U.S. citizenship has a significant coefficient only in the third logit (parameter 19: completed the master's and left versus completed the PhD), and does not matter otherwise (I interpret this effect as one due to high non-resident tuition charges forcing non-citizens onto the job market more quickly than citizens). Note that GENDER and USCIT are not named on the DIRECT statement, so CATMOD treats them as classification variables and applies its own effect coding; the signs of their coefficients should be read with that in mind. Finally, we note that the perverse partial effect of GRE verbal scores is moderately homogeneous across the logits (although it is not significant on the distinction between being in progress toward the PhD versus having achieved it).

The multinomial model is also only a baseline model, in that it does not take the ordering of the categories of the dependent variable into account. We turn now to a number of variations on the proportional effects model for cumulative logits -- which are explicitly ordinal.

## PROC LOGISTIC: Complementary log/log link

### Output

```
Response Variable: ATTAIN
Response Levels: 5
Number of Observations: 96

Response Profile

Ordered
Value  ATTAIN     Count

1  noma          22
2  maip          25
3  maonly         8
4  phdip         17
5  phd           24

Score Test for the Equal Slopes Assumption

Chi-Square = 229.8320 with 15 DF (p=0.0001)

Model Fitting Information and Testing Global Null Hypothesis BETA=0

                           Intercept
             Intercept        and
Criterion       Only       Covariates    Chi-Square for Covariates
AIC             305.258       245.000         .
SC              315.516       268.080         .
-2 LOG L        297.258       227.000       70.258 with 5 DF (p=0.0001)
Score              .             .          55.394 with 5 DF (p=0.0001)

Analysis of Maximum Likelihood Estimates

                Parameter   Standard      Wald         Pr >      Standardized
Variable   DF    Estimate     Error    Chi-Square   Chi-Square     Estimate

INTERCP1   1     -27.4623     3.8850      49.9669       0.0001              .
INTERCP2   1     -26.2261     3.8467      46.4836       0.0001              .
INTERCP3   1     -25.7676     3.8231      45.4279       0.0001              .
INTERCP4   1     -24.7244     3.7326      43.8749       0.0001              .
GENDER     1      -0.3047     0.2629       1.3425       0.2466      -0.117275
GREV       1      0.00339    0.00135       6.2714       0.0123       0.280696
GREM       1     -0.00038    0.00148       0.0667       0.7962      -0.028804
ENTERYR    1       0.2805     0.0416      45.4039       0.0001       0.940938
USCIT      1      -0.2842     0.4301       0.4366       0.5088      -0.073674

Association of Predicted Probabilities and Observed Responses

Concordant = 76.2%          Somers' D = 0.527
Discordant = 23.4%          Gamma     = 0.530
Tied       =  0.4%          Tau-a     = 0.415
(3589 pairs)                c         = 0.764
```

Using the -2LL approach, we see that the independent variables jointly reduce badness of fit by 70.3 units over the intercept-only model, and that this improvement is significant at the .05 level. Of the original -2LL of 297.258, some 24% is accounted for by non-zero slope parameters. Note that this estimate of goodness-of-fit is markedly less than the biased R-squared from OLS regression.
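The 24% figure comes directly from the "-2 LOG L" row of the model fitting table. Writing $-2LL_0$ for the intercept-only model and $-2LL_1$ for the model with covariates (notation introduced here for convenience):

$$
\frac{(-2LL_0) - (-2LL_1)}{-2LL_0}
= \frac{297.258 - 227.000}{297.258}
= \frac{70.258}{297.258}
\approx 0.236 .
$$

The same calculation can be repeated for the other link functions, since the intercept-only value of 297.258 does not change.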

A test is performed to examine the viability of the proportionality or equal slopes assumption. The null hypothesis is that of proportionality, and we see that we can be confident in rejecting the idea that effects are proportional. That is, we really ought to stop at this point, and not move on to the parameters. But, as this is an exercise, we will continue.
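If one did stop here and wanted to relax the constraint, more recent releases of SAS/STAT offer an UNEQUALSLOPES option on the MODEL statement of PROC LOGISTIC. The sketch below is an assumption about how that might be applied to these data; it was not available for, and is not part of, the original analysis, and with only 96 cases and four sets of slopes the model may well fail to converge.

```sas
/* Sketch only: cumulative logit model with a separate set of slopes for */
/* each cut point. Requires a SAS/STAT release that supports the         */
/* UNEQUALSLOPES option; data set JOB is from the code above.            */
proc logistic data=job order=data;
  model attain = gender grev grem enteryr uscit
        / link=logit unequalslopes;
run;
```

Comparing the slopes across the four cut points would show which predictors are responsible for the failure of the equal slopes test.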

To interpret the parameters, one must know what logits are being examined. Here, the logits are composed by comparing the odds of the first category to all higher categories; the first and second to all higher categories; the first, second, and third to the fourth and fifth; etc. That is, cumulative logits. A "common sense" interpretation of parameters then is that parameters show the effects of a unit of X on the log odds of falling below a certain score versus achieving a certain score.

The multiple intercept terms are not normally directly interpreted, although one could use them to create the equation describing each cumulative logit, and to calculate the response probabilities. The slope coefficients for the independent variables describe effects on the log odds of failing to achieve a given point in the attainment continuum. Gender (female) reduces the odds of "failure" or, conversely, increases the odds of success, albeit not in a statistically significant way. Similarly, mathematical test scores and U.S. citizenship appear to have (non-significant) effects in reducing the odds of failure to attain. More recent entrants to the program have higher odds of falling lower on the attainment scale; and the perverse partial effect of verbal ability again surfaces in this analysis.

Most frequently, we would go no further than noting the directions and significance of effects. If we did wish to go further, several approaches are possible. A fairly reasonable thing to do is to calculate the expected cumulative logits at the sample mean values of the independent variables (or for a baseline "type" of case) and contrast the difference in the expected cumulative logits as each X score is varied by one unit (e.g. when we consider males instead of females, holding GREV, GREM, USCIT, ENTERYR constant at sample means or a baseline). The differences between the calculated logits are an "elasticity" of a sort, and can be compared across the variables to get a sense of the relative magnitudes of effects.
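The calculation just described can be sketched in a DATA step using the complementary log-log estimates printed above. The baseline values for GREV, GREM, ENTERYR, and USCIT below are hypothetical (the sample means are not reported on this page), so the printed numbers illustrate the method only.

```sas
/* Sketch: expected cumulative probabilities for a male and a female     */
/* baseline case, from the complementary log-log estimates above.        */
/* Baseline predictor values are assumptions, not sample means.          */
data _null_;
  array a{4} _temporary_ (-27.4623 -26.2261 -25.7676 -24.7244); /* INTERCP1-4 */
  grev = 600; grem = 600; enteryr = 85; uscit = 1;   /* assumed baseline */
  do gender = 0 to 1;
    xb = -0.3047*gender + 0.00339*grev - 0.00038*grem
         + 0.2805*enteryr - 0.2842*uscit;
    do j = 1 to 4;
      eta  = a{j} + xb;               /* linear predictor for P(Y <= j)  */
      cump = 1 - exp(-exp(eta));      /* inverse complementary log-log   */
      put gender= j= eta= 8.4 cump= 8.4;
    end;
  end;
run;
```

The difference in `eta` between the two gender values is the same at every cut point (that is what the equal slopes assumption means), while the differences in `cump` vary with the cut point, which is why predicted probabilities are often easier to communicate than the coefficients themselves.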

In this model, we specified that the link function was to be the complementary log-log formulation. Several other possible links might be examined, including the more common logistic and cumulative normal links.

## PROC LOGISTIC: Logit link

### Output

```
Link Function: Logit

Score Test for the Proportional Odds Assumption

Chi-Square = 174.9428 with 15 DF (p=0.0001)

Model Fitting Information and Testing Global Null Hypothesis BETA=0

                           Intercept
             Intercept        and
Criterion       Only       Covariates    Chi-Square for Covariates

AIC             305.258       262.634         .
SC              315.516       285.713         .
-2 LOG L        297.258       244.634       52.625 with 5 DF (p=0.0001)
Score              .             .          41.430 with 5 DF (p=0.0001)

Analysis of Maximum Likelihood Estimates

            Parameter Standard    Wald       Pr >    Standardized     Odds
Variable DF  Estimate   Error  Chi-Square Chi-Square   Estimate      Ratio

INTERCP1 1   -32.9038   5.3282    38.1359     0.0001            .     .
INTERCP2 1   -31.3751   5.2477    35.7459     0.0001            .     .
INTERCP3 1   -30.7907   5.2124    34.8956     0.0001            .     .
INTERCP4 1   -29.3892   5.1183    32.9703     0.0001            .     .
GENDER   1    -0.4927   0.4080     1.4588     0.2271    -0.134123    0.611
GREV     1    0.00471  0.00211     4.9874     0.0255     0.275859    1.005
GREM     1   -0.00068  0.00232     0.0853     0.7702    -0.036258    0.999
ENTERYR  1     0.3387   0.0570    35.2937     0.0001     0.803180    1.403
USCIT    1    -0.3279   0.6933     0.2236     0.6363    -0.060097    0.720

Association of Predicted Probabilities and Observed Responses

Concordant = 76.6%          Somers' D = 0.535
Discordant = 23.2%          Gamma     = 0.536
Tied       =  0.2%          Tau-a     = 0.421
(3589 pairs)                c         = 0.767
```

Briefly: the failure of the proportionality assumption test is not mitigated by using the logistic link function. The model fits significantly better than the intercept-only model, and the -2LL is reduced by 17.7%, noticeably worse than the log-log model. The same pattern of significant effects and their directions is found. This time, the "odds ratio" is also reported. The "odds ratio" is exp(B), and expresses the multiplicative effect of a unit difference in X on the odds of failing to attain a level relative to attaining it (i.e. the same cumulative logits are being modeled here). Where the metrics of the independent variables differ as much as they do in this case (GRE scores being hundreds of points, USCIT being either zero or one), it is not clear that either the parameter or its exponentiation is really much help, and elasticities or predicted probabilities should probably be calculated.
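The metric problem is easy to see with the GREV coefficient from the table above. Rescaling to a 100-point difference (the choice of 100 points is ours, purely for illustration) gives a much more readable figure than the per-point odds ratio:

$$
\exp(0.00471) \approx 1.005 , \qquad
\exp(100 \times 0.00471) = \exp(0.471) \approx 1.60 .
$$

That is, under this model a 100-point higher verbal score multiplies the odds of falling at or below any given attainment level by roughly 1.6, which is on a scale that can be set beside the odds ratio for a 0/1 variable such as GENDER, $\exp(-0.4927) \approx 0.611$.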

## PROC LOGISTIC: Cumulative normal (normit) link

### Output

```
Link Function: Normit

Score Test for the Equal Slopes Assumption

Chi-Square = 101.2743 with 15 DF (p=0.0001)

Model Fitting Information and Testing Global Null Hypothesis BETA=0

                           Intercept
             Intercept        and
Criterion       Only       Covariates    Chi-Square for Covariates
AIC             305.258       263.665         .
SC              315.516       286.744         .
-2 LOG L        297.258       245.665       51.594 with 5 DF (p=0.0001)
Score              .             .          39.060 with 5 DF (p=0.0001)

Analysis of Maximum Likelihood Estimates

                Parameter   Standard      Wald         Pr >      Standardized
Variable   DF    Estimate     Error    Chi-Square   Chi-Square     Estimate

INTERCP1   1     -19.6980     2.9166      45.6143       0.0001              .
INTERCP2   1     -18.8136     2.8807      42.6523       0.0001              .
INTERCP3   1     -18.4728     2.8663      41.5345       0.0001              .
INTERCP4   1     -17.6519     2.8265      39.0014       0.0001              .
GENDER     1      -0.2451     0.2396       1.0469       0.3062      -0.121013
GREV       1      0.00277    0.00123       5.0567       0.0245       0.294168
GREM       1     -0.00033    0.00137       0.0573       0.8108      -0.031761
ENTERYR    1       0.2030     0.0313      42.1107       0.0001       0.873398
USCIT      1      -0.1450     0.4090       0.1257       0.7230      -0.048203

Association of Predicted Probabilities and Observed Responses

Concordant = 76.6%          Somers' D = 0.534
Discordant = 23.2%          Gamma     = 0.535
Tied       =  0.2%          Tau-a     = 0.420
(3589 pairs)                c         = 0.767
```

The cumulative normal linking function, or normit (not to be confused with the normit or probit regression model), produces a model which reduces -2LL by only 17.4%, the worst of the three. Again, the proportionality assumption fails. The same variables with the same directions are identified as having significant effects on "failure to attain" the various levels.

## PROC PROBIT

### Output

```
Class Level Information

Class    Levels    Values

ATTAIN        5    noma maip maonly phdip phd

Number of observations used = 96
Data Set          =WORK.JOB
Dependent Variable=ATTAIN

Weighted Frequency Counts for the Ordered Response Categories

Level     Count
noma        22
maip        25
maonly         8
phdip        17
phd        24

Observations with Missing Values=  21

Log Likelihood for NORMAL -122.8322946

Variable  DF   Estimate  Std Err ChiSquare  Pr>Chi Label/Value

INTERCPT   1 -19.698018 3.011221  42.79171  0.0001 Intercept
GENDER     1 -0.2451083 0.240532  1.038413  0.3082
GREV       1 0.00277196 0.001228  5.093648  0.0240
GREM       1 -0.0003277  0.00139  0.055579  0.8136
ENTERYR    1 0.20303478   0.0324  39.26857  0.0001
USCIT      1 -0.1449933 0.403835   0.12891  0.7196
INTER.2    1 0.88437332 0.154668                   maip
INTER.3    1 1.22515554 0.179669                   maonly
INTER.4    1 2.04603752 0.236945                   phdip
```

The probit model differs from the logistic model in that the dependent variable is no longer the log odds of one outcome versus another; rather, we are now seeking to predict the probit, or cumulative normal probability, of one outcome versus another. However, since the probits are formed in the same fashion as the logits (cumulatively, splitting the ordered categories at each of the four cut points), the general approach to interpretation is identical to that of the logit model.

It is clear at a glance that the main patterns of this model are the same as those of the logistic models examined above. Interpretation of the parameters beyond direction, size, and significance is again probably best done by calculating elasticities. The only difference between calculating elasticities from the probit rather than the logit model is that one calculates the expected probits for a given X (say, the baseline case where all variables are at sample means), and then consults a table of Z scores to determine the cumulative probability at each cut point (that is, the probability of not getting an MA; the probability of not getting an MA or having one in progress, etc.).
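In place of a printed Z table, the PROBNORM function can do the lookup. The sketch below uses the d=normal estimates above; note that PROC PROBIT reports the later cut points as increments, so INTERCPT plus INTER.2 (-19.698018 + 0.88437332 = -18.8136) reproduces INTERCP2 from the PROC LOGISTIC normit output, and similarly for the others. The baseline predictor values are hypothetical, as the sample means are not reported here.

```sas
/* Sketch: cumulative and category probabilities from the d=normal       */
/* estimates. Baseline predictor values are assumptions, not sample means.*/
data _null_;
  intercpt = -19.698018;
  /* Increments added to INTERCPT for cut points 1-4 (first is zero) */
  array inter{4} _temporary_ (0 0.88437332 1.22515554 2.04603752);
  gender = 0; grev = 600; grem = 600; enteryr = 85; uscit = 1;
  xb = -0.2451083*gender + 0.00277196*grev - 0.0003277*grem
       + 0.20303478*enteryr - 0.1449933*uscit;
  prev = 0;
  do j = 1 to 4;
    z    = intercpt + inter{j} + xb;  /* expected probit for P(Y <= j)   */
    cump = probnorm(z);               /* cumulative normal probability   */
    p    = cump - prev;               /* probability of category j alone */
    put j= z= 8.4 cump= 8.4 p= 8.4;
    prev = cump;
  end;
  p = 1 - prev;                       /* remaining mass: completed PhD   */
  put 'j=5 ' p= 8.4;
run;
```

Repeating the step with one predictor changed by a unit (or by a more meaningful amount, such as 100 GRE points) gives the "elasticity" style comparison described in the text.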

### Output

```
Class Level Information

Class    Levels    Values

ATTAIN        5    noma maip maonly phdip phd

Weighted Frequency Counts for the Ordered Response Categories

Level     Count
noma        22
maip        25
maonly         8
phdip        17
phd        24

Log Likelihood for LOGISTIC  -122.316893

Probit Procedure

Variable  DF   Estimate  Std Err ChiSquare  Pr>Chi Label/Value

INTERCPT   1 -32.904096 5.313054  38.35407  0.0001 Intercept
GENDER     1 -0.4927412 0.410173  1.443127  0.2296
GREV       1 0.00471487 0.002056  5.261113  0.0218
GREM       1 -0.0006785 0.002419  0.078669  0.7791
ENTERYR    1 0.33865984 0.056532  35.88773  0.0001
USCIT      1  -0.327885 0.676406  0.234978  0.6279
INTER.2    1 1.52865774 0.276664                   maip
INTER.3    1 2.11305694 0.324375                   maonly
INTER.4    1 3.51462307 0.439274                   phdip
```

A pattern should be forming in the reader's mind by now -- if they have not fallen asleep. That is that the cumulative logits and cumulative probits seem to tell us pretty much the same thing about these data, and that choosing different linking functions between either logits or probits and the X vectors does not seem to produce any notable differences in directions, magnitudes, or significance test results.

### Output

```
Log Likelihood for GOMPERTZ  -113.500196

Variable  DF   Estimate  Std Err ChiSquare  Pr>Chi Label/Value

INTERCPT   1 -27.462548 3.980509  47.59972  0.0001 Intercept
GENDER     1 -0.3046536 0.262168  1.350376  0.2452
GREV       1 0.00339238 0.001404  5.834209  0.0157
GREM       1 -0.0003812 0.001514  0.063352  0.8013
ENTERYR    1 0.28054145 0.043464  41.66055  0.0001
USCIT      1 -0.2842309 0.423537   0.45036  0.5022
INTER.2    1 1.23624474 0.223105                   maip
INTER.3    1 1.69472946 0.256916                   maonly
INTER.4    1 2.73800891 0.336051                   phdip
```