Sociology 110: Multivariate Analysis

Assignments


This page describes the homework assignments for Sociology 110C (multivariate analysis) taught by Robert A. Hanneman of the Sociology Department at the University of California, Riverside, in the spring quarter of 2000-2001. Your comments are welcome.
A portion (50%) of your grade in this course is based on the homework assignments described in this page. Most of these assignments require that you use the SPSSx package of computer programs.  You may have SPSS version 8 on your own personal computer; this can be used for all but the logistic regression assignment.  SPSS version 10 is available in the Sproul and Watkins labs, and can be used for all assignments.  

We will be analyzing some data from the "World" data set that is available on the SPSS student version CD, and on campus lab machines.  The codebook for this data set is available on reserve.  We will also be examining some data from the 1996 General Social Survey that is also available in labs and on the SPSS CD.

If you have problems with your personal computer, please see the student helpdesk at the Watkins facility.  If you have difficulties with SPSS or with the data set, please see your TA or the instructor.


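The homework assignments are intended to be brief, and to help you apply the lessons from the lectures and readings to real data analysis problems. The exercises parallel the topics in the course:

1) Using causal diagrams to represent multivariate hypotheses
2) Multi-way cross-tabulation
3) Multiple regression
4) Analysis of variance
5) Analysis of covariance
6) Logistic regression
7) Path analysis

(There are no assignments covering non-linear models and factor analysis. Depending on time constraints, the homework on path analysis may or may not be required.)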


Using causal diagrams to represent multivariate hypotheses

A question that interests researchers on social inequality is how people make sense of the inequalities in their society.  Some people are more successful than others, and what people believe about the causes of this may be quite important for "legitimating" social inequality.

Respondents in the 1996 General Social Survey were asked whether they believed that successful people got ahead because of "hard work," "luck or help from others," or "both" (the variable is named getahead and has values 1 = work, 2 = both, 3 = luck or help).  Higher scores on this variable therefore indicate beliefs that success is due more to "external" factors than to "internal" ones.

Any number of factors might help to explain why people's opinions differ.  People's place in the system of social stratification, in particular, might be expected to affect their beliefs about social inequality.  We will explore three such factors in this exercise.  We might believe that a person's sex (variable sex, coded 1 = male and 2 = female) could predispose one to different attitudes.  Another major form of stratification in the U.S. is race (variable race, coded 1 = white, 2 = black, 3 = other).  Another very important form of stratification in the U.S. is on the basis of educational attainment (variable degree, coded 0 = less than H.S., 1 = high school, 2 = some college, 3 = bachelor's degree, and 4 = graduate study).

Your assignment is to develop a causal model that can be used to examine the relationships among getahead, sex, race, and degree.   Prepare a properly labeled diagram to summarize your model. Provide arguments that justify the causal ordering of the variables, and each of the causal effects hypothesized.

A good answer to this exercise does not have to be long. If you write clearly and carefully, you should be able to give a good answer in a couple of (typewritten) pages.


Multi-way cross-tabulation

Guided by the model you developed in the first exercise, analyze the data from the GSS96 data set to estimate the magnitude and significance of effects among sex, race, degree, and getahead.  So that our tables don't become too big and complicated, drop cases that responded "other" to race; combine the categories "less than high school" and "high school" on degree; and combine the categories "junior or some college" and "bachelor's degree" on degree.  On the variable getahead, treat the value 4 ("other") as missing.
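
The recoding rules above can be sketched in a few lines of code. The course uses SPSS for the actual work; the Python below only illustrates the logic. The function name, the new group numbers for degree, and the use of None for missing values are assumptions made for this sketch, not part of the GSS data set.

```python
# Illustrative recode of a single GSS96 case (not SPSS syntax).
# Function name and the None-for-missing convention are assumptions.

def recode_case(race, degree, getahead):
    """Return (race, degree3, getahead) after recoding, or None to drop the case."""
    # Drop respondents coded 3 ("other") on race.
    if race == 3:
        return None
    # Collapse degree (0-4) into three groups (new codes 1-3 are arbitrary):
    #   0, 1 -> 1 (high school or less)
    #   2, 3 -> 2 (some college or bachelor's degree)
    #   4    -> 3 (graduate study)
    degree_map = {0: 1, 1: 1, 2: 2, 3: 2, 4: 3}
    degree3 = degree_map.get(degree)  # None if degree is missing or out of range
    # Treat getahead = 4 ("other") as missing.
    getahead_clean = getahead if getahead in (1, 2, 3) else None
    return race, degree3, getahead_clean
```

For example, recode_case(2, 3, 2) keeps the case and places it in the middle education group, while any case with race = 3 is dropped.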

Write a short report that:

1) Explains the research question and states and justifies the hypotheses. Use a causal diagram to do this efficiently.

2) Reports univariate and bivariate statistics for the variables.

3) Reports the relevant multivariate results.

4) Reaches a conclusion about the hypotheses - were your hypotheses supported by the data analysis? What did we learn by doing this analysis?


Multiple regression

For a number of the remaining exercises, we will be focusing on an important social problem at the world-system level: rapid growth of the human population.  We will be using the data set on world nations in the 1990s that is distributed with the SPSS CD (also available in labs).  This data file contains information on about 175 nation-states in the mid-1990s.

The primary dependent variable that will interest us is the annual rate of growth in population (populgr), which is measured in percentage terms.  We are going to examine several different ideas about why some nations have rapid population growth and others have slow (or even negative) population growth rates.

One idea is that economic development is a cause of variation in the population growth rate.  We will measure the level of economic development using the gross national product per capita, measured in dollars (gnppc).  Some theorists believe that, as the level of material well-being increases, people are less dependent on their children for support in later life, and may consequently restrict births.  Increased material standards of living may also affect population growth rates in other ways.

A second idea is that modernization of beliefs, rather than increasing material standards of living, may be at the root of lowering population growth rates.  We will examine the effect of modernization by using the proxy variable of urbanization (urban), measured as the percentage of the national population who live in urban areas.

A third notion about the causes of variation in population growth rates deals with gender equality.  Feminist theorists argue that increasing education and employment opportunities for women lead to reduced population growth rates as women restrict their fertility.  We will examine the evidence for this idea by using the indicator of the ratio of the level of female education to the level of male education in societies (femed).

Perform the necessary univariate, bivariate, and multivariate analyses to assess the relative utility of these three theories, and prepare a brief report on your results.
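
Because gnppc is measured in dollars, urban in percentage points, and femed as a ratio, the unstandardized slopes are not directly comparable. One common way to compare predictors is the standardized coefficient, which SPSS reports in the "Beta" column of its regression output. The sketch below shows the arithmetic only; the function name is an assumption, and it is not a substitute for running the regression in SPSS.

```python
import statistics

def standardized_beta(b, x_values, y_values):
    """Convert an unstandardized slope b into a standardized coefficient.

    beta = b * (sd of x) / (sd of y), using sample standard deviations.
    """
    sd_x = statistics.stdev(x_values)
    sd_y = statistics.stdev(y_values)
    return b * sd_x / sd_y
```

A standardized coefficient tells you how many standard deviations the dependent variable is expected to change for a one standard deviation change in the predictor, holding the other predictors constant.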


Analysis of Variance

Some analysts feel that there are important additional factors that cause variation in population growth rates across nations.

One idea is that there are major cultural, climatological, and economic differences among world regions that may affect population growth rates.  A quite different idea is that the political arrangements of nations have an impact on population growth by affecting people's freedom of choice in matters of fertility.  We will try to explore these two ideas in this exercise.  As before, our dependent variable is the population growth rate (populgr).

The data set codes nations as being located in one of nine regions.  This classification is a bit too complicated for us to use, so we will simplify it.  You should create a new variable from the existing variable region, as follows.  Combine region 1 (North Africa) and 8 (Asia) into a new "Greater Asia" region; leave region 2 (Sub-Saharan Africa) as it is; create a new "European" region by combining regions 3 (North America), 7 (Oceania), and 9 (Europe); lastly, create an "American" region by combining regions 4 (Central America), 5 (Caribbean), and 6 (South America).  We will use the new set of four regions for our analysis.
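
The regional recode described above amounts to a lookup table. A minimal Python sketch follows (the course itself uses SPSS); the names REGION4 and recode_region are assumptions made for illustration.

```python
# Map the nine original region codes onto the four combined regions.
# The dictionary and function names are hypothetical.
REGION4 = {
    1: "Greater Asia",        # North Africa
    8: "Greater Asia",        # Asia
    2: "Sub-Saharan Africa",  # left as it is
    3: "European",            # North America
    7: "European",            # Oceania
    9: "European",            # Europe
    4: "American",            # Central America
    5: "American",            # Caribbean
    6: "American",            # South America
}

def recode_region(region):
    """Return the combined region label, or None for a missing or invalid code."""
    return REGION4.get(region)
```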

The data set contains an ordinal variable that ranks nation states in terms of their degree of political freedom (democ).  Scores on this scale range from 1, indicating little political freedom, to 7, indicating fairly high levels.  For our analysis, we will recode democ into a new freedom variable by combining levels 1 and 2 to be "low", levels 3, 4, and 5 to be "medium", and levels 6 and 7 to be "high."
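
The same recode, written as a small illustrative function. This is Python rather than SPSS, and the function name is an assumption.

```python
def recode_freedom(democ):
    """Collapse the 7-point democ scale into low / medium / high."""
    if democ in (1, 2):
        return "low"
    if democ in (3, 4, 5):
        return "medium"
    if democ in (6, 7):
        return "high"
    return None  # missing or out-of-range score
```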

Create the necessary dummy variables to represent which groups each nation falls in on the two categorical independent variables.  Also create dummy variables to represent the interaction of these two variables.
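
As a sketch of what this step produces: with four regions and three freedom levels, you need three region dummies, two freedom dummies, and six interaction dummies (each region dummy multiplied by each freedom dummy). The Python below illustrates the idea for one nation. The choice of "Sub-Saharan Africa" and "low" as the omitted reference categories, and all of the names, are assumptions; you may choose different reference groups.

```python
# Categories that get a dummy variable. The omitted (reference) categories,
# "Sub-Saharan Africa" and "low", are an arbitrary choice for this sketch.
REGION_DUMMIES = ["Greater Asia", "European", "American"]
FREEDOM_DUMMIES = ["medium", "high"]

def dummy_code(region, freedom):
    """Return a dict of 0/1 dummy variables and their interactions for one nation."""
    row = {}
    for r in REGION_DUMMIES:
        row[f"region={r}"] = int(region == r)
    for f in FREEDOM_DUMMIES:
        row[f"freedom={f}"] = int(freedom == f)
    # Interaction dummies: the product of each region dummy and each freedom dummy.
    for r in REGION_DUMMIES:
        for f in FREEDOM_DUMMIES:
            row[f"region={r} x freedom={f}"] = row[f"region={r}"] * row[f"freedom={f}"]
    return row
```

A nation in both reference categories scores 0 on every dummy; a "European" nation with "high" freedom scores 1 on exactly one region dummy, one freedom dummy, and one interaction dummy.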

Do the necessary univariate analysis.  Then, perform one-way analysis of variance to examine each bivariate relationship.  Do a two-way ANOVA, and finally a two-way ANOVA with interaction.
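
To make the one-way ANOVA concrete, the sketch below computes the F ratio from the between-group and within-group sums of squares. SPSS does this for you; the Python function and its name are assumptions, intended only to show where the F statistic comes from.

```python
def one_way_anova_f(groups):
    """Compute the one-way ANOVA F ratio.

    groups: a list of lists, one list of dependent-variable values per group.
    Returns (F, df_between, df_within).
    """
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group size times squared distance
    # of each group mean from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: squared distance of each case from its group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within
```

In this exercise each group would hold the populgr values for the nations in one region (or one freedom level).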

Prepare a brief report on your analyses.  As always, include brief problem statements, a hypothesis in the form of a model, a report of results, and a conclusion.


Analysis of Covariance

We noted in our earlier regression analysis that gnppc and femed appear to have effects on populgr.  In our analysis of variance, we noted that there were regional differences in populgr.  In this exercise you will combine these two approaches.  We are particularly interested in whether the effects of gnppc and femed "hold up" when we control for region, and whether the effects of these two variables on populgr are the same in each of the four regions.

Build, examine, and discuss two ANCOVA models.  In the first, enter gnppc, femed, and the recoded region variable.  In the second, enter these three effects plus an interaction of femed with region (we will not explore the question of whether gnppc effects are the same across regions).
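
Written as equations, the two models might look like the sketch below. The notation is an assumption made for illustration: the D terms are 0/1 dummy variables for the combined regions, Sub-Saharan Africa is arbitrarily taken as the omitted reference region, and the Greek letters are simply labels for the coefficients to be estimated.

```latex
% Model 1: one femed slope for all regions; region shifts the intercept.
\mathit{populgr}_i = \beta_0 + \beta_1\,\mathit{gnppc}_i + \beta_2\,\mathit{femed}_i
  + \gamma_1 D^{\mathrm{Asia}}_i + \gamma_2 D^{\mathrm{Eur}}_i + \gamma_3 D^{\mathrm{Amer}}_i
  + \varepsilon_i

% Model 2: adds the femed-by-region interaction, so the femed slope may differ by region.
\mathit{populgr}_i = \beta_0 + \beta_1\,\mathit{gnppc}_i + \beta_2\,\mathit{femed}_i
  + \gamma_1 D^{\mathrm{Asia}}_i + \gamma_2 D^{\mathrm{Eur}}_i + \gamma_3 D^{\mathrm{Amer}}_i
  + \delta_1 \bigl(\mathit{femed}_i \times D^{\mathrm{Asia}}_i\bigr)
  + \delta_2 \bigl(\mathit{femed}_i \times D^{\mathrm{Eur}}_i\bigr)
  + \delta_3 \bigl(\mathit{femed}_i \times D^{\mathrm{Amer}}_i\bigr)
  + \varepsilon_i
```

Under this coding, the femed slope in the reference region is \beta_2, and the slope in, say, Greater Asia is \beta_2 + \delta_1. Testing whether the three \delta coefficients are jointly zero is one way to ask whether the effect of femed is the same in each region.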

Write a short research report stating the problem, reporting and interpreting your statistical results, and reaching conclusions.


Logistic Regression

Activities that are legal in some nations are illegal in others.  Abortion is one of the most controversial issues in discourse about "human rights" among nations, just as it is very controversial within the United States.  In this exercise, you will develop a statistical model to predict which nations prohibit abortion, and which have legalized it.  The dependent variable (abortleg) is measured as a dichotomy (0 = abortion is illegal, 1 = abortion is legal) in the world data set.

A structural-functional theoretical perspective might suggest that legalization of abortion may be a response to problems of overpopulation.  If this theory is correct, then nations with lower material standards of living should have legal abortion as a mechanism of population control.  We will use the infant mortality rate (infmrt) to indicate problems with overpopulation and low material standards of living.

A modernization perspective might suggest that nations in which the populations have more secular and individualistic world views would be more accepting of abortion.  We will use the level of formal schooling attained (educ) to indicate modernization and secularization.

A feminist perspective might suggest that where women have achieved greater equality in various spheres of social life, abortion laws will be less restrictive.  We will use the variable femleg (percentage of seats in parliament held by women) to indicate the level of political equality of women.

Prepare a short research report that builds a model to predict abortleg, and provides an interpretation of the results.
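
When you interpret the results, it can help to remember how logistic regression coefficients translate into odds and probabilities. The sketch below shows the arithmetic in Python. The function names are assumptions, and any coefficients you pass in must come from your own SPSS output (SPSS labels the odds ratio "Exp(B)"); none are supplied here.

```python
import math

def predicted_probability(intercept, coefs, values):
    """Predicted probability that abortleg = 1 for one nation.

    coefs:  dict mapping predictor name -> logistic regression coefficient (B)
    values: dict mapping predictor name -> that nation's score on the predictor
    """
    # The linear predictor is the log-odds (logit) of the outcome.
    logit = intercept + sum(coefs[name] * values[name] for name in coefs)
    # The logistic function converts log-odds into a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-logit))

def odds_ratio(coef):
    """Multiplicative change in the odds for a one-unit increase in a predictor."""
    return math.exp(coef)
```

An odds ratio above 1 means the odds of legal abortion rise as the predictor increases; an odds ratio below 1 means they fall.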


Path Analysis

Reformulate the variables in the multiple regression problem (above) into a recursive path model.  Justify the causal order and the effects hypothesized.

Perform a path analysis to evaluate the model you developed. Write a brief report that 1) states the causal model used and justifies the causal ordering chosen, 2) reports the proper statistical results on effects (using a path diagram), 3) performs a decomposition of correlations of the three independent variables with populgr, and 4) reaches conclusions about the processes that cause variation in population growth rates across nations.
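
The decomposition in step 3 rests on the identity that each predictor's correlation with the dependent variable equals the sum, over all predictors, of the path coefficient (standardized regression coefficient) times that predictor's correlation with the one in question. The Python sketch below splits each correlation into its direct effect and everything else. The function name is an assumption, and dividing the "other" component further into indirect effects and spurious or unanalyzed association depends on the causal order in your own model.

```python
def decompose_correlations(pred_corr, betas):
    """Split each predictor's correlation with the dependent variable into parts.

    pred_corr: square matrix (list of lists) of correlations among the predictors
    betas:     path coefficients (standardized slopes) of each predictor on the
               dependent variable, in the same order as pred_corr
    Returns a list of (direct, other, total) tuples, one per predictor, where
    total = direct + other reproduces the predictor's correlation with the outcome.
    """
    results = []
    for i, row in enumerate(pred_corr):
        direct = betas[i]
        # Association that runs through, or is shared with, the other predictors.
        other = sum(betas[j] * row[j] for j in range(len(betas)) if j != i)
        results.append((direct, other, direct + other))
    return results
```

If the totals do not match the observed correlations with populgr (apart from rounding), check that the coefficients and correlations were taken from the same set of cases.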
