Sociology 203B
Cronbach's alpha reliability analysis with SAS

This course is offered in the spring quarter of 2006-07  by Robert A. Hanneman of the Department of Sociology at the University of California, Riverside. When the course is in session, announcements, discussion groups, and other features may be found on the U.C.R. instructional web site. Your comments and suggestions are welcome by email to the instructor.
This example has several parts: descriptive statistics for the items, the alpha reliability analysis itself, and an examination of the correlation matrix.
From items that are ordered categorical or continuous, where the items are believed to be uni-dimensional and "parallel," it is common to construct scales by simply adding the items together (assuming that they are scaled in the same direction from low to high). Batteries of Likert-type items from surveys are commonly treated this way.

For scales of this type, the most common approach to assessing the reliability of the resulting scale is to use a measure of "internal consistency." Roughly, a scale is internally consistent if all of its items are strongly correlated. A high average correlation among the items suggests that they are all measuring "the same thing." While each item may have an error component, the common components may be expected to "add up" when the items are combined, while errors across items would be expected to "cancel out."

Cronbach's alpha coefficient is widely used to assess internal consistency reliability. Coefficient alpha is a positive function of the average correlation between items in a scale, and the number of items in the scale. The logic is quite straight-forward: the higher the average correlation, the lower the "error" or "unique" components of items; the more items, the greater the likelihood that errors will cancel out.
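For standardized items, this logic is captured in a simple formula: alpha = k*rbar / (1 + (k-1)*rbar), where k is the number of items and rbar is the average inter-item correlation. A quick sketch of the formula (in Python rather than SAS, with illustrative values):

```python
def standardized_alpha(k, r_bar):
    """Standardized Cronbach's alpha for k items with
    average inter-item correlation r_bar."""
    return (k * r_bar) / (1 + (k - 1) * r_bar)

# Either more items or a higher average correlation raises alpha:
print(standardized_alpha(6, 0.25))   # about .67
print(standardized_alpha(12, 0.25))  # about .80
print(standardized_alpha(6, 0.50))   # about .86
```

Doubling the number of items (at the same average correlation) has much the same effect on alpha as doubling the average correlation itself, which is why long scales of modestly correlated items can still be quite reliable.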

SAS computes a basic alpha-reliability analysis for a scale within PROC CORR. This is done as:

proc corr data=mydata alpha nomiss;
   var var1 var2 ... varK;
run;

Suppose we had collected data from 50 people intended to measure "pro-social" or "altruistic" behavior. This example is taken from Hatcher, chapter 3. Six items are asked, and responses are collected on a 7 point grouped-ordinal Likert-type scale.

Descriptive statistics

Correlation Analysis

6 'VAR' Variables: V1 V2 V3 V4 V5 V6

Simple Statistics

Variable N    Mean       Std Dev    Sum   Minimum    Maximum

V1      50    5.180000   1.395181   259   1.000000   7.000000
V2      50    5.400000   1.106567   270   3.000000   7.000000
V3      50    5.520000   1.216217   276   2.000000   7.000000
V4      50    3.640000   1.792957   182   1.000000   7.000000
V5      50    4.220000   1.669535   211   1.000000   7.000000
V6      50    3.100000   1.555110   155   1.000000   7.000000

note: Since alpha depends on the correlation coefficient, it is important that the correlations be valid measures of the strength of inter-item association. Ideally, all variables will be normally distributed, and the relationships among them will be strictly linear. It is a very good idea to scatter-plot each pair of variables (we haven't done that here) and, if necessary, to test for non-linearity. Let's assume that our 15 associations have passed these tests.

In examining the descriptive statistics, we note that the items do not all center at the middle of the range (V6 has a mean of 3.1, V3 has a mean of 5.5). So long as the degree of skew and non-normality is not extreme between items, the correlations will be reasonably robust. Note that there are differences in the variability of the items, but that all have enough variation to be useful (e.g. V2 has a S.D. of 1.1 and a mean of 5.4, for a coefficient of variation of about 20 percent -- low, but not so low that the item should be thrown away).

Items that have restricted variability, or which are skewed, may fail to correlate well with other items, even if they really do measure the same underlying concept. We will be interested in assessing, however, whether our scale might be better off without one or more of these potentially troublesome items.

Reliability analysis

Correlation Analysis

Cronbach Coefficient Alpha

for RAW variables : 0.667326
for STANDARDIZED variables: 0.667953

          Raw Variables           Std. Variables

Deleted    Correlation               Correlation
Variable  with Total  Alpha       with Total  Alpha

V1        0.319049    0.650895    0.410336    0.621520
V2        0.274983    0.661642    0.314462    0.654488
V3        0.373226    0.634994    0.426554    0.615776
V4        0.433111    0.614673    0.370018    0.635590
V5        0.490013    0.588893    0.424555    0.616487
V6        0.498449    0.586887    0.436779    0.612129

note: Alpha is given for "RAW" and for "STANDARDIZED" variables. The former is a scale that is constructed by simply adding the V1 to V6 together; the latter forms a scale by z-scoring each of the variables, then summing. In the "raw" scale, items that have more variability contribute more to the variability of the resulting scale; in the "standardized" form, each item gets equal weight. Since our items are measured on the same scale, and have similar standard deviations, it makes little difference in this case.

The overall scale alphas of .667 would not usually be regarded with great enthusiasm. The most common "rule of thumb" is that alpha should exceed .80. In practice, scales with lower reliabilities are often used (and productively so).

The results are helpful for identifying individual items that might be troublesome. We are given two ways of assessing this. First, we ask, how strong is the correlation between an item and a scale composed of all of the other items? We note here that V1 and V2 appear to have the least in common with the sum of the remaining items. Second, we ask, would the alpha reliability of my scale be better if I deleted the item? Here, we see that the answer is "no" for all of the items (even V1 and V2). It appears that we would lose more power by shortening our test than we would gain from a higher average correlation.
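Both of these diagnostics can be reproduced by hand from the correlation matrix reported below. As a check (in Python rather than SAS), the sketch below computes the standardized alpha from the full matrix, and the "alpha if deleted" value for V1 by recomputing alpha on the five-item submatrix:

```python
# Inter-item correlations copied from the SAS PROC CORR output.
R = [
    [ 1.00000,  0.49439,  0.71345, -0.10410,  0.11407, 0.07619],
    [ 0.49439,  1.00000,  0.38820,  0.05349, -0.05965, 0.14231],
    [ 0.71345,  0.38820,  1.00000, -0.02471,  0.20383, 0.05827],
    [-0.10410,  0.05349, -0.02471,  1.00000,  0.62014, 0.63532],
    [ 0.11407, -0.05965,  0.20383,  0.62014,  1.00000, 0.45512],
    [ 0.07619,  0.14231,  0.05827,  0.63532,  0.45512, 1.00000],
]

def standardized_alpha(R):
    """Standardized alpha from a correlation matrix:
    k * r_bar / (1 + (k - 1) * r_bar)."""
    k = len(R)
    off = [R[i][j] for i in range(k) for j in range(k) if i != j]
    r_bar = sum(off) / len(off)
    return k * r_bar / (1 + (k - 1) * r_bar)

def alpha_if_deleted(R, drop):
    """Alpha recomputed with item `drop` removed from the matrix."""
    keep = [i for i in range(len(R)) if i != drop]
    return standardized_alpha([[R[i][j] for j in keep] for i in keep])

print(round(standardized_alpha(R), 3))   # about .668, the STANDARDIZED alpha
print(round(alpha_if_deleted(R, 0), 4))  # about .6215, the V1 row of the table
```

This only reproduces the "STANDARDIZED" column; the "RAW" column additionally weights each item by its standard deviation.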

So, what to do?

One possibility is to add more items. This might not be very practical, since the study has already been completed.

Another possibility is that maybe we should delete more than one item at a time. This sounds odd, but all of the tests we've performed assume that we are looking for bad individual items. What if our whole scale is bad? That is, what if our scale really isn't one-dimensional, but contains items measuring several different things? If we could find sub-sets of items that have very high correlations, we might create a better scale.

This problem would usually be tackled with factor analysis. But, we can get a sense for whether it might work by examining the correlation matrix.

Correlation matrix

Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / N = 50

       V1      V2        V3       V4       V5     V6

V1  1.00000  0.49439  0.71345 -0.10410  0.11407 0.07619
       0.0   0.0003   0.0001   0.4719   0.4302  0.5990

V2  0.49439  1.00000  0.38820  0.05349 -0.05965 0.14231
    0.0003   0.0      0.0053   0.7122   0.6807  0.3242

V3  0.71345  0.38820  1.00000 -0.02471  0.20383 0.05827
    0.0001   0.0053   0.0      0.8648   0.1557  0.6877

V4 -0.10410  0.05349 -0.02471  1.00000  0.62014 0.63532
    0.4719   0.7122   0.8648   0.0      0.0001  0.0001

V5  0.11407 -0.05965  0.20383  0.62014  1.00000 0.45512
    0.4302   0.6807   0.1557   0.0001   0.0     0.0009

V6  0.07619  0.14231  0.05827  0.63532  0.45512 1.00000
    0.5990   0.3242   0.6877   0.0001   0.0009  0.0

note: If you scan down the column of correlations for V1, you see weak or negative associations with V4, V5, and V6. The same is true for V2, which correlates pretty strongly with V1. V3 correlates well with V1 and V2, but poorly with V4, V5, and V6. The correlations among V4, V5, and V6 are all pretty strong. There is (in this contrived example) an obvious pattern. It appears that the first three items "go together" and that the remaining three items "go together." In fact, the alpha reliability of a scale composed of only the first three items is .78. It turns out that the two-item scale composed of V1 and V3 is even better, with an alpha of .83.
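These sub-scale alphas can be checked directly from the correlations reported above (a Python sketch, not SAS; note that it gives the standardized alphas, so the three-item value comes out at about .77 rather than the .78 raw-variable figure quoted above):

```python
def standardized_alpha(k, r_bar):
    """Standardized alpha: k * r_bar / (1 + (k - 1) * r_bar)."""
    return k * r_bar / (1 + (k - 1) * r_bar)

# V1-V3 sub-scale: average of r(V1,V2), r(V1,V3), r(V2,V3)
r_bar_123 = (0.49439 + 0.71345 + 0.38820) / 3
print(round(standardized_alpha(3, r_bar_123), 2))  # about .77

# Two-item scale of V1 and V3 alone
print(round(standardized_alpha(2, 0.71345), 2))    # about .83
```

Dropping V2 raises the average correlation (from .53 to .71) by enough to offset the loss of an item, which is why the shorter scale wins here.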

For small problems, we can often see patterns in the correlation matrix -- and by shuffling rows and columns get some new ideas. These can easily be tested by including different VAR lists in PROC CORR. For larger or messier problems, factor analysis is a better approach.

Alpha reliabilities are the standard approach for summated scales built from grouped-ordinal or continuous items. A variation, KR-20 (the Kuder-Richardson formula 20), can be used with dichotomous items.
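KR-20 is just coefficient alpha specialized to 0/1 items, where each item's variance reduces to p*(1-p). A sketch of the computation (in Python, on made-up data, not the study analyzed above):

```python
# Hypothetical 0/1 responses: rows = respondents, columns = 4 items.
items = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

k = len(items[0])            # number of items
n = len(items)               # number of respondents
p = [sum(row[j] for row in items) / n for j in range(k)]  # proportion passing

totals = [sum(row) for row in items]
mean_t = sum(totals) / n
var_t = sum((t - mean_t) ** 2 for t in totals) / n  # population variance of totals

# KR-20 = (k / (k-1)) * (1 - sum of item variances / total-score variance)
kr20 = (k / (k - 1)) * (1 - sum(pj * (1 - pj) for pj in p) / var_t)
print(round(kr20, 3))  # about .656 for these illustrative data
```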

Don't proceed too quickly to coefficient alpha. It requires that the correlations be "good" in the sense of multi-normal linear relations with unrestricted variation. It also assumes uni-dimensionality. Both are strong assumptions, so test them carefully before and after calculating alpha.

Also remember that there are other "scaling models," most notably the Guttman and Thurstone logics of ordered items of differing difficulty. Cronbach's alpha is not really an appropriate measure of scale and item reliability for items that do not satisfy the "domain sampling" assumptions. Watch for this when you read; many analysts, unfortunately, apply alpha to all scaling problems.
