Sociology 203B

Cronbach's alpha reliability analysis with SAS

This course is offered in the spring quarter of 2006-07 by Robert A. Hanneman of the Department of Sociology at the University of California, Riverside. When the course is in session, announcements, discussion groups, and other features may be found on the U.C.R. instructional web site. Your comments and suggestions are welcome by email to the instructor.

This example has several parts:

- Introduction
- Descriptive statistics
- Reliability analysis
- Correlation matrix
- Discussion

Introduction

From items that are ordered categorical or continuous, where the items are believed to be uni-dimensional and "parallel," it is common to construct scales by simply adding the items together (assuming that they are scaled in the same direction from low to high). Batteries of Likert-type items from surveys are commonly treated this way.

For scales of this type, the most common approach to assessing the reliability of the resulting scale is to use a measure of "internal consistency." Roughly, a scale is internally consistent if all of its items are strongly correlated. A high average correlation among the items suggests that they are all measuring "the same thing." While each item may have an error component, the common components may be expected to "add up" when the items are combined, while errors across items would be expected to "cancel out."

Cronbach's alpha coefficient is widely used to assess internal consistency reliability. Coefficient alpha is a positive function of the average correlation between items in a scale, and the number of items in the scale. The logic is quite straight-forward: the higher the average correlation, the lower the "error" or "unique" components of items; the more items, the greater the likelihood that errors will cancel out.
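Both effects can be written compactly: for k standardized items with mean inter-item correlation r-bar, alpha = k*r_bar / (1 + (k - 1)*r_bar). A minimal Python sketch (Python rather than SAS, since the formula itself does not depend on any package) shows both effects:

```python
def standardized_alpha(k, r_bar):
    """Alpha for k standardized items whose mean inter-item correlation is r_bar."""
    return k * r_bar / (1 + (k - 1) * r_bar)

# Higher average correlation raises alpha; so does adding items:
a6 = standardized_alpha(6, 0.25)    # about .67 with six items
a12 = standardized_alpha(12, 0.25)  # about .80 with twelve items at the same r_bar
```

Doubling the number of (equally good) items pushes a mediocre .67 up to .80, which is the sense in which errors "cancel out" as items accumulate.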

SAS computes a basic alpha-reliability analysis for a scale within PROC CORR. This is done as:

proc corr data=mydata alpha nomiss;
   var var1 var2 ... varK;
run;

Suppose we had collected data from 50 people intended to measure "pro-social" or "altruistic" behavior. This example is taken from Hatcher, chapter 3. Six items are asked, and responses are collected on a 7 point grouped-ordinal Likert-type scale.
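If you want to check SAS's raw-variable alpha by hand, it can be computed from the variance decomposition alpha = (k/(k-1)) * (1 - sum of item variances / variance of the summed scale). A minimal Python sketch (the data below are made up for illustration, not Hatcher's):

```python
def cronbach_alpha(items):
    """Raw-variable Cronbach's alpha; 'items' is a list of equal-length score columns."""
    k = len(items)

    def var(xs):  # sample variance, matching SAS's n-1 divisor
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(person) for person in zip(*items)]  # each person's summed scale score
    return k / (k - 1) * (1 - sum(var(col) for col in items) / var(totals))

# Three identical items are perfectly consistent, so alpha comes out at 1:
a = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
```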


Descriptive statistics

Correlation Analysis

6 'VAR' Variables: V1 V2 V3 V4 V5 V6

Simple Statistics

Variable    N       Mean    Std Dev    Sum   Minimum   Maximum

V1         50   5.180000   1.395181   259  1.000000  7.000000
V2         50   5.400000   1.106567   270  3.000000  7.000000
V3         50   5.520000   1.216217   276  2.000000  7.000000
V4         50   3.640000   1.792957   182  1.000000  7.000000
V5         50   4.220000   1.669535   211  1.000000  7.000000
V6         50   3.100000   1.555110   155  1.000000  7.000000

**note:** Since alpha depends on the correlation coefficient, it is important that the correlations be valid measures of the strength of inter-item association. Ideally, all variables will be normally distributed, and the relationships among them will be strictly linear. It is a very good idea to scatter-plot each pair of variables (we haven't done that here) and, if necessary, to test for non-linearity. Let's assume that our 15 pairwise associations have passed these tests.

In examining the descriptive statistics, we note that the items do not all center at the middle of the range (V6 has a mean of 3.1, V3 has a mean of 5.5). So long as the degree of skew and non-normality is not extreme between items, the correlations will be reasonably robust. Note that there are differences in the variability of the items, but that all have enough variation to be useful (e.g. V2 has a S.D. of 1.1 and a mean of 5.4, for a coefficient of variation of about 20 percent -- low, but not so low that the item should be thrown away).
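The coefficient of variation quoted above is just the standard deviation expressed as a percentage of the mean; checking V2 from the table:

```python
# Coefficient of variation (percent) for V2, from the reported mean and S.D.:
cv_v2 = 1.106567 / 5.400000 * 100  # about 20.5 percent
```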

Items that have restricted variability, or which are skewed, may fail to correlate well with other items, even
if they really do measure the same underlying concept. We will be interested in assessing, however, whether our
scale might be better off without one or more of these potentially troublesome items.


Reliability analysis

Correlation Analysis

Cronbach Coefficient Alpha

for RAW variables : 0.667326

for STANDARDIZED variables: 0.667953

                   Raw Variables           Std. Variables

Deleted     Correlation               Correlation
Variable    with Total       Alpha    with Total       Alpha

V1            0.319049    0.650895      0.410336    0.621520
V2            0.274983    0.661642      0.314462    0.654488
V3            0.373226    0.634994      0.426554    0.615776
V4            0.433111    0.614673      0.370018    0.635590
V5            0.490013    0.588893      0.424555    0.616487
V6            0.498449    0.586887      0.436779    0.612129

**note:** Alpha is given for "RAW" and for "STANDARDIZED" variables. The former is a scale constructed by simply adding V1 through V6 together; the latter forms a scale by z-scoring each of the variables, then summing. In the "raw" scale, items that have more variability contribute more to the variability of the resulting scale; in the "standardized" form, each item gets equal weight. Since our items are measured on the same scale, and have similar standard deviations, it makes little difference in this case.
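To see the raw/standardized distinction concretely, one can z-score each item before summing and recompute alpha. The items below are hypothetical, chosen to have very unequal spreads so that the two versions differ noticeably:

```python
def alpha(items):
    """Raw-variable Cronbach's alpha for a list of equal-length score columns."""
    k = len(items)

    def var(xs):  # sample variance (n-1 divisor)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(p) for p in zip(*items)]
    return k / (k - 1) * (1 - sum(var(c) for c in items) / var(totals))

def z_scores(xs):
    """Standardize one column to mean 0, sample S.D. 1."""
    m = sum(xs) / len(xs)
    s = (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5
    return [(x - m) / s for x in xs]

items = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [1, 3, 2, 5, 4]]  # unequal spreads
raw_a = alpha(items)                          # about .89
std_a = alpha([z_scores(c) for c in items])   # about .95
```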

The overall scale alphas of .667 would not usually be regarded with great enthusiasm. The most common "rule
of thumb" is that alpha should exceed .80. In practice, scales with lower reliabilities are often used (and
productively so).

The results are helpful for identifying individual items that might be troublesome. We are given two ways of assessing
this. First, we ask, how strong is the correlation between an item and a scale composed of all of the other items?
We note here that V1 and V2 appear to have the least in common with the sum of the remaining items. Second, we
ask, would the alpha reliability of my scale be better if I deleted the item? Here, we see that the answer is
"no" for all of the items (even V1 and V2). It appears that we would lose more power by shortening our
test than we would gain from a higher average correlation.
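The "alpha if deleted" figures can be reproduced by brute force: recompute alpha with each item left out in turn. A Python sketch on deliberately contrived data, where one reverse-scored item wrecks the scale:

```python
def alpha(items):
    """Raw-variable Cronbach's alpha for a list of equal-length score columns."""
    k = len(items)

    def var(xs):  # sample variance (n-1 divisor)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(p) for p in zip(*items)]
    return k / (k - 1) * (1 - sum(var(c) for c in items) / var(totals))

def alpha_if_deleted(items):
    """Alpha recomputed with each item removed, in item order."""
    return [alpha(items[:i] + items[i + 1:]) for i in range(len(items))]

# Two items that hang together, plus a reverse-scored third (contrived data):
data = [[1, 2, 3, 4, 5], [2, 1, 3, 5, 4], [5, 4, 4, 2, 1]]
full = alpha(data)                # negative: the three-item scale is incoherent
dropped = alpha_if_deleted(data)  # only dropping the third item helps (about .89)
```

Unlike our six-item example, here deleting an item (the third) clearly improves the scale.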

So, what to do?

One possibility is to add more items. This might not be very practical, since the study has already been completed.

Another possibility is that maybe we should delete more than one item at a time. This sounds odd, but all of the
tests we've performed assume that we are looking for bad individual items. What if our whole scale is bad? That
is, what if our scale really isn't one-dimensional, but contains items measuring several different things? If
we could find sub-sets of items that have very high correlations, we might create a better scale.

This problem would usually be tackled with factor analysis. But, we can get a sense for whether it might work
by examining the correlation matrix.


Correlation matrix

Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / N = 50

          V1        V2        V3        V4        V5        V6

V1   1.00000   0.49439   0.71345  -0.10410   0.11407   0.07619
      0.0       0.0003    0.0001    0.4719    0.4302    0.5990

V2   0.49439   1.00000   0.38820   0.05349  -0.05965   0.14231
      0.0003    0.0       0.0053    0.7122    0.6807    0.3242

V3   0.71345   0.38820   1.00000  -0.02471   0.20383   0.05827
      0.0001    0.0053    0.0       0.8648    0.1557    0.6877

V4  -0.10410   0.05349  -0.02471   1.00000   0.62014   0.63532
      0.4719    0.7122    0.8648    0.0       0.0001    0.0001

V5   0.11407  -0.05965   0.20383   0.62014   1.00000   0.45512
      0.4302    0.6807    0.1557    0.0001    0.0       0.0009

V6   0.07619   0.14231   0.05827   0.63532   0.45512   1.00000
      0.5990    0.3242    0.6877    0.0001    0.0009    0.0

**note:** If you scan down the column of correlations for V1, you see weak or negative associations with V4, V5, and V6. The same is true for V2, which correlates pretty strongly with V1. V3 correlates well with V1 and V2, but poorly with V4, V5, and V6. The correlations among V4, V5, and V6 are all pretty strong. There is (in this contrived example) an obvious pattern. It appears that the first three items "go together" and that the remaining three items "go together." In fact, the alpha reliability of a scale composed of only the first three items is .78. It turns out that the two-item scale composed of V1 and V3 is even better, with an alpha of .83.
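Those figures can be checked (approximately) from the correlation matrix itself, using the standardized-alpha formula k*r_bar / (1 + (k - 1)*r_bar); the small discrepancy on the three-item scale reflects the raw/standardized distinction noted earlier:

```python
def standardized_alpha(k, r_bar):
    """Alpha for k standardized items with mean inter-item correlation r_bar."""
    return k * r_bar / (1 + (k - 1) * r_bar)

# Two-item scale from V1 and V3, using their reported correlation:
a_13 = standardized_alpha(2, 0.71345)   # about .83, as stated above

# Three-item scale from V1, V2, V3, averaging their inter-correlations:
r_bar = (0.49439 + 0.71345 + 0.38820) / 3
a_123 = standardized_alpha(3, r_bar)    # about .77, close to the .78 raw figure
```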

For small problems, we can often see patterns in the correlation matrix -- and by shuffling rows and columns get
some new ideas. These can easily be tested by including different VAR lists in PROC CORR. For larger or messier
problems, factor analysis is a better approach.


Discussion

Alpha reliabilities are the standard approach for summated scales built from grouped-ordinal or continuous items. A variant, the Kuder-Richardson formula (KR-20), can be used with dichotomous items.

Don't proceed too quickly to coefficient alpha. It requires that the correlations be "good" in the sense of multi-normal linear relations with unrestricted variation. It also assumes uni-dimensionality. Both are strong assumptions, so test them carefully before and after calculating alpha.

Also remember that there are other "scaling models" -- most notably, the Guttman and Thurstone logics of ordered items of differing difficulty. Cronbach's alpha is not really an appropriate measure of scale and item reliability for items that do not satisfy the "domain sampling" assumptions. Watch for this when you read: many analysts, unfortunately, apply alpha to all scaling problems.
