MCMCpack Software Tutorial

MCMCpack Introduction

An R package that contains functions to perform Bayesian inference using posterior simulation.
Aimed primarily at social scientists.
The implementation of MCMC algorithms are model-specific.

Functions in MCMCpack

Statistical Models:

Linear regression models (linear regression with Gaussian errors, a singular value decomposition regression, and regression for a censored dependent variable)
Discrete choice models (logistic/multinomial logistic/ordinal probit/probit regression)
Measurement models (one-dimensional IRT,k-dimensional IRT, k-dimensional ordinal factor, k-dimensional linear factor, k-dimensional mixed factor, and k-dimensional robust IRT)
A model for count data (Poisson regression model)
Models for ecological inference (hierarchical ecological inference model and dynamic ecological inference model)
Time-series models for change-point problems (binary/ probit/ ordinal probit/ Poisson change-point model).

Also contains some useful utility functions, including some additional density functions and pseudo-random number generators for statistical distributions, a general purpose Metropolis sampling algorithm, and tools for visualization.

Installation of MCMCpack

install.packages(MCMCpack)

library(MCMCpack)

## Warning: package 'MCMCpack' was built under R version 3.4.3

## Warning: package 'MASS' was built under R version 3.4.3

Ex1: Linear Regression Model

The model takes the following form:

\[y_i = x_i^{'}\beta + \varepsilon_{i}, \quad \varepsilon_{i} \sim \mathcal{N}(0, 1/\tau)\]

We assume standard, semi-conjugate priors:

\(\beta \sim \mathcal{N}(b_0,B_0^{-1})\) and \(\tau \sim \mathcal{G}amma(c_0/2, d_0/2)\)

where \(\beta\) and \(\tau\) are assumed a priori independent.

MCMCregress(formula, data, burnin = 1000, mcmc = 10000, b0 = 0, B0 = 0,
            
            marginal.likelihood = c("none", "Laplace", "Chib95"), ...)

burnin: The number of burn-in iterations for the sampler.

mcmc: The number of MCMC iterations after burnin.

b0: The prior mean of \(\beta\).

B0: The prior precision of \(\beta\).Default value of 0 is equivalent to an improper uniform prior for beta.

marginal.likelihood: "none" in which case the marginal likelihood will not be calculated, "Laplace" in which case the Laplace approximation is used, and "Chib95" in which case the method of Chib (1995) is used.

Birthwt example

data(birthwt)
post.lm1 <- MCMCregress(bwt~age+lwt+as.factor(race) + smoke + ht,
                       data=birthwt,mcmc=50000,
                       b0=c(2700, 0, 0, -500, -500, -500, -500),
                       B0=c(1e-6, .01, .01, 1.6e-5, 1.6e-5, 1.6e-5,
                            1.6e-5),  marginal.likelihood="Chib95")

summary(post.lm1)

## 
## Iterations = 1001:51000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 50000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                        Mean        SD  Naive SE Time-series SE
## (Intercept)        2766.216   266.744 1.193e+00      1.193e+00
## age                  -1.613     6.867 3.071e-02      3.071e-02
## lwt                   4.789     1.649 7.375e-03      7.375e-03
## as.factor(race)2   -497.656   129.948 5.811e-01      5.811e-01
## as.factor(race)3   -405.334   104.238 4.662e-01      4.662e-01
## smoke              -412.213    97.384 4.355e-01      4.355e-01
## ht                 -510.062   159.032 7.112e-01      7.112e-01
## sigma2           453136.179 47854.439 2.140e+02      2.182e+02
## 
## 2. Quantiles for each variable:
## 
##                        2.5%        25%        50%        75%      97.5%
## (Intercept)        2244.254   2586.351   2766.107   2944.028   3294.325
## age                 -15.057     -6.282     -1.654      3.061     11.908
## lwt                   1.545      3.686      4.785      5.897      8.034
## as.factor(race)2   -751.984   -584.729   -497.411   -410.831   -241.960
## as.factor(race)3   -610.201   -475.504   -405.602   -335.402   -201.463
## smoke              -604.018   -477.380   -412.223   -346.538   -220.036
## ht                 -822.476   -616.279   -510.280   -403.291   -197.281
## sigma2           368846.617 419405.802 450062.914 483105.313 555700.305

Model Comparison

Suppose that the observed data y could have been generated under one of two models \(M_1\) and \(M_2\).

A natural thing to ask from the Bayesian perspective is, “what is the posterior probability that \(M_1\) is true (assuming either \(M_1\) or \(M_2\) is true)?”

Using Bayes theorem we can write:

\[Pr(M_k|y)=\frac{p(y|M_k)Pr(M_k)}{p(y|M_1)Pr(M_1)+p(y|M_1)Pr(M_1)}, \quad k=1,2\]

\(p(y|M_1)\) and \(p(y|M_2)\) here are marginal liklihoods.

Model Comparison

It is instructive to look at the posterior odds in favor of one model (say \(M_1\)): \[\frac{Pr(M_1|y)}{Pr(M_2|y)}=\frac{p(y|M_1)}{p(y|M_2)} \times \frac{Pr(M_1)}{Pr(M_2)}\]
Define Bayes factor for \(M_1\) relative to \(M_2\) as

\[B_{12}=\frac{p(y|M_1)}{p(y|M_2)} \]

Since the posterior odds equal the Bayes factor when the models are equally likely a priori, the Bayes factor is a measure of how much support is available in the data for one model relative to another.

Model Comparison Example

PostProbMod: calculate the posterior probability that each model under study is correct given that one of the models under study is correct.

post.lm1 <- MCMCregress(bwt~age+lwt+as.factor(race) + smoke + ht,
                       data=birthwt,mcmc=50000,
                       b0=c(2700, 0, 0, -500, -500, -500, -500),
                       B0=c(1e-6, .01, .01, 1.6e-5, 1.6e-5, 1.6e-5,
                            1.6e-5),  marginal.likelihood="Chib95")
post.lm2 <- MCMCregress(bwt~age+lwt+as.factor(race) + smoke,
                       data=birthwt,mcmc=50000,
                       b0=c(2700, 0, 0, -500, -500, -500),
                       B0=c(1e-6, .01, .01, 1.6e-5, 1.6e-5, 1.6e-5),                                               marginal.likelihood="Chib95")
BF <- BayesFactor(post.lm1,post.lm2)
mod.probs <- PostProbMod(BF)

Model Comparison Example

print(mod.probs)

##   post.lm1   post.lm2 
## 0.93357327 0.06642673

Ex2: Item Response Theory (IRT) Model

\(y_{ij}\) represents the observed choice by subject \(i\) on item \(j\).
each subject has an ability (ideal point) dentoed by \(\theta_{i(K \times 1)}\).
each item has a difficulty parameter \(\alpha_j\) and discrimination parameter \(\beta_{j(K \times 1)}\).

Consider

\[y_{ij} \sim Bernoulli(\pi_{ij}),\quad i=1,...,I \quad j=1,...,J\] \[\pi_{ij}=\Phi (-\alpha_{j}+\beta_j^T\theta_i)\]

where \(\Phi\) is the the CDF of the standard normal distribution.

Prior distribution for the model parameters are: \[(\alpha_j,\beta_j)^T \sim N(a_0,A_0^{-1}) \quad j=1,...,J\] and \[\theta_i \sim N(0,1) \quad i=1,...,I\]

MCMCirt1d(datamatrix, theta.constraints = list(), burnin = 1000, mcmc = 20000)

datamatrix: The matrix of data. Must be 0, 1, or missing values. The rows of datamatrix correspond to subjects and the columns correspond to items.
theta.constraints: A list specifying possible simple equality or inequality constraints on the ability parameters. A typical entry in the list has one of three forms:

(1). varname=c which will constrain the ability parameter for the subject named varname to be equal to c,

(2). varname="+" which will constrain the ability parameter for the subject named varname to be positive,

(3). varname="-" which will constrain the ability parameter for the subject named varname to be negative.

SupremeCourt Example

Data for 9 justices Rehnquist, Stevens, O'Connor, Scalia, Kennedy, Souter, Thomas, Ginsburg, and Breyer for the U.S. Supreme Court from 43 non-unanimous cases.
The votes are coded liberal (1), conservative (0) or missing values.

data(SupremeCourt)
post.irt <- MCMCirt1d(t(SupremeCourt),
  theta.constraints=list(Stevens="-",
  Scalia="+"), burnin=5000, mcmc=100000)

summary(post.irt)

## 
## Iterations = 5001:105000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1e+05 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                    Mean     SD  Naive SE Time-series SE
## theta.Rehnquist  1.2435 0.3380 0.0010689       0.008874
## theta.Stevens   -1.5566 0.4275 0.0013519       0.013209
## theta.O'Connor   0.2106 0.2040 0.0006450       0.004721
## theta.Scalia     2.3383 0.5590 0.0017678       0.020324
## theta.Kennedy    0.4647 0.2218 0.0007014       0.005229
## theta.Souter    -0.9754 0.2962 0.0009368       0.007523
## theta.Thomas     2.0760 0.5154 0.0016300       0.017308
## theta.Ginsburg  -1.2929 0.3603 0.0011395       0.010085
## theta.Breyer    -1.4533 0.4000 0.0012648       0.011456
## 
## 2. Quantiles for each variable:
## 
##                     2.5%      25%     50%     75%   97.5%
## theta.Rehnquist  0.64611  1.00643  1.2227  1.4582  1.9691
## theta.Stevens   -2.49232 -1.81881 -1.5167 -1.2525 -0.8294
## theta.O'Connor  -0.18238  0.07256  0.2072  0.3441  0.6223
## theta.Scalia     1.34693  1.94596  2.3028  2.6873  3.5606
## theta.Kennedy    0.04852  0.31259  0.4575  0.6090  0.9206
## theta.Souter    -1.62223 -1.15759 -0.9554 -0.7687 -0.4555
## theta.Thomas     1.17013  1.71491  2.0401  2.3928  3.1980
## theta.Ginsburg  -2.08412 -1.51247 -1.2621 -1.0394 -0.6743
## theta.Breyer    -2.33480 -1.69472 -1.4166 -1.1738 -0.7753

EX3:Metropolis Sampling for a User-Defined Model

MCMCpack also has some facilities for fitting user-specified models.

The MCMCmetrop1R() function uses a random walk Metropolis algorithm to sample from a user-defined log-posterior density.

MCMCmetrop1R(fun, theta.init, burnin = 500, mcmc = 20000,
  tune = 1, logfun = TRUE, force.samp = FALSE,
  V = NULL, optim.method = "BFGS", ...)

fun: The unnormalized (log)density of the distribution from which to take a sample.

theta.init: Starting values for the sampling. Must be of the appropriate dimension. It must also be the case that fun(theta.init, …) is greater than -Inf if fun() is a logdensity or greater than 0 if fun() is a density.

tune: The tuning parameter for the Metropolis sampling. Can be either a positive scalar or a k-vector, where k is the length of θ.

Example for a User-Defined Model

Suppose one is interested in fitting a logistic regression with an improper uniform prior.

One could do the following:

 logitfun <- function(beta, y, X){
      eta <- X %*% beta
      p <- 1.0/(1.0+exp(-eta))
      sum( y * log(p) + (1-y)*log(1-p) )
    }

x1 <- rnorm(1000)
x2 <- rnorm(1000)
Xdata <- cbind(1,x1,x2)
p <- exp(.5 - x1 + x2)/(1+exp(.5 - x1 + x2))
yvector <- rbinom(1000, 1, p)

post.samp <- MCMCmetrop1R(logitfun, theta.init=c(0,0,0),
                              X=Xdata, y=yvector,
                              mcmc=40000, burnin=500,
                              tune=c(1.5, 1.5, 1.5),
                              logfun=TRUE)

## 
## 
## @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
## The Metropolis acceptance rate was 0.28891
## @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

plot(post.samp)

summary(post.samp)

## 
## Iterations = 501:40500
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 40000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##        Mean      SD  Naive SE Time-series SE
## [1,]  0.561 0.08153 0.0004076       0.001298
## [2,] -1.086 0.09502 0.0004751       0.001552
## [3,]  1.139 0.09813 0.0004906       0.001589
## 
## 2. Quantiles for each variable:
## 
##         2.5%     25%     50%     75%   97.5%
## var1  0.4055  0.5041  0.5597  0.6165  0.7208
## var2 -1.2748 -1.1511 -1.0854 -1.0206 -0.9002
## var3  0.9469  1.0724  1.1379  1.2038  1.3343