---
title: "MCMCpack Software Tutorial"
author: "Huiling Liu"
output:
  ioslides_presentation:
    smaller: yes
---

## MCMCpack Introduction

- An R package that contains functions to perform Bayesian inference using posterior simulation.
- Aimed primarily at social scientists.
- The implementation of each MCMC algorithm is model-specific.

## Functions in MCMCpack

Statistical Models:

- Linear regression models (linear regression with Gaussian errors, a singular value decomposition regression, and regression for a censored dependent variable)
- Discrete choice models (logistic, multinomial logistic, ordinal probit, and probit regression)
- Measurement models (one-dimensional IRT, k-dimensional IRT, k-dimensional ordinal factor, k-dimensional linear factor, k-dimensional mixed factor, and k-dimensional robust IRT)
- A model for count data (Poisson regression model)
- Models for ecological inference (hierarchical ecological inference model and dynamic ecological inference model)
- Time-series models for change-point problems (binary, probit, ordinal probit, and Poisson change-point models)

MCMCpack also contains some useful utility functions, including additional density functions and pseudo-random number generators for statistical distributions, a general-purpose Metropolis sampling algorithm, and tools for visualization.

## Installation of MCMCpack

```{r,eval=FALSE}
# The package name must be quoted
install.packages("MCMCpack")
```

```{r,message=FALSE}
library(MCMCpack)
```

## Ex1: Linear Regression Model

The model takes the following form:

$$y_i = x_i^{'}\beta + \varepsilon_{i}, \quad \varepsilon_{i} \sim \mathcal{N}(0, 1/\tau)$$

We assume standard, semi-conjugate priors: $\beta \sim \mathcal{N}(b_0,B_0^{-1})$ and $\tau \sim \mathcal{G}amma(c_0/2, d_0/2)$, where $\beta$ and $\tau$ are assumed a priori independent.

##

```{r,eval=FALSE}
MCMCregress(formula, data, burnin = 1000, mcmc = 10000, b0 = 0, B0 = 0,
            marginal.likelihood = c("none", "Laplace", "Chib95"), ...)
```

burnin: The number of burn-in iterations for the sampler.

mcmc: The number of MCMC iterations after burn-in.

b0: The prior mean of $\beta$.

B0: The prior precision of $\beta$. The default value of 0 is equivalent to an improper uniform prior for $\beta$.

marginal.likelihood: How the marginal likelihood is calculated. With "none" it is not calculated, with "Laplace" the Laplace approximation is used, and with "Chib95" the method of Chib (1995) is used.

## Birthwt example

```{r}
# birthwt is provided by the MASS package
data(birthwt, package = "MASS")
post.lm1 <- MCMCregress(bwt ~ age + lwt + as.factor(race) + smoke + ht,
                        data = birthwt, mcmc = 50000,
                        b0 = c(2700, 0, 0, -500, -500, -500, -500),
                        B0 = c(1e-6, .01, .01, 1.6e-5, 1.6e-5, 1.6e-5, 1.6e-5),
                        marginal.likelihood = "Chib95")
```

##

```{r}
summary(post.lm1)
```

## Model Comparison

Suppose that the observed data $y$ could have been generated under one of two models, $M_1$ and $M_2$. A natural question from the Bayesian perspective is: "What is the posterior probability that $M_1$ is true (assuming either $M_1$ or $M_2$ is true)?" Using Bayes' theorem we can write:

$$\Pr(M_k \mid y)=\frac{p(y \mid M_k)\Pr(M_k)}{p(y \mid M_1)\Pr(M_1)+p(y \mid M_2)\Pr(M_2)}, \quad k=1,2$$

Here $p(y \mid M_1)$ and $p(y \mid M_2)$ are the marginal likelihoods.

## Model Comparison

It is instructive to look at the posterior odds in favor of one model (say $M_1$):

$$\frac{\Pr(M_1 \mid y)}{\Pr(M_2 \mid y)}=\frac{p(y \mid M_1)}{p(y \mid M_2)} \times \frac{\Pr(M_1)}{\Pr(M_2)}$$

Define the Bayes factor for $M_1$ relative to $M_2$ as

$$B_{12}=\frac{p(y \mid M_1)}{p(y \mid M_2)}$$

Since the posterior odds equal the Bayes factor when the models are equally likely a priori, the Bayes factor is a measure of how much support is available in the data for one model relative to another.

## Model Comparison Example

PostProbMod: calculates the posterior probability that each model under study is correct, given that one of the models under study is correct.
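
The arithmetic behind these probabilities can be sketched in base R. This is only an illustration: the log marginal likelihoods are made-up values (not results from the birthwt models), the names `log_ml`, `prior`, and `post_prob` are hypothetical, and equal prior model probabilities are assumed.

```r
# Hypothetical log marginal likelihoods for two models (illustrative values only)
log_ml <- c(M1 = -1500.2, M2 = -1502.9)
prior  <- c(M1 = 0.5, M2 = 0.5)   # assumed equal prior model probabilities

# Bayes factor for M1 relative to M2, computed on the log scale
log_B12 <- log_ml[["M1"]] - log_ml[["M2"]]
B12 <- exp(log_B12)

# Posterior model probabilities; subtracting the maximum avoids underflow
w <- exp(log_ml - max(log_ml)) * prior
post_prob <- w / sum(w)

B12
post_prob
```

Working with differences of log marginal likelihoods matters in practice, since the marginal likelihoods themselves are usually far too small to represent directly.
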

```{r}
post.lm1 <- MCMCregress(bwt ~ age + lwt + as.factor(race) + smoke + ht,
                        data = birthwt, mcmc = 50000,
                        b0 = c(2700, 0, 0, -500, -500, -500, -500),
                        B0 = c(1e-6, .01, .01, 1.6e-5, 1.6e-5, 1.6e-5, 1.6e-5),
                        marginal.likelihood = "Chib95")
# Same model without the hypertension indicator ht
post.lm2 <- MCMCregress(bwt ~ age + lwt + as.factor(race) + smoke,
                        data = birthwt, mcmc = 50000,
                        b0 = c(2700, 0, 0, -500, -500, -500),
                        B0 = c(1e-6, .01, .01, 1.6e-5, 1.6e-5, 1.6e-5),
                        marginal.likelihood = "Chib95")
# Bayes factors for the two models, then posterior model probabilities
BF <- BayesFactor(post.lm1, post.lm2)
mod.probs <- PostProbMod(BF)
```

## Model Comparison Example

```{r}
print(mod.probs)
```

## Ex2: Item Response Theory (IRT) Model

- $y_{ij}$ represents the observed choice by subject $i$ on item $j$.
- Each subject has an ability (ideal point) denoted by $\theta_{i(K \times 1)}$.
- Each item has a difficulty parameter $\alpha_j$ and a discrimination parameter $\beta_{j(K \times 1)}$.

Consider

$$y_{ij} \sim Bernoulli(\pi_{ij}),\quad i=1,\ldots,I \quad j=1,\ldots,J$$

$$\pi_{ij}=\Phi (-\alpha_{j}+\beta_j^T\theta_i)$$

where $\Phi$ is the CDF of the standard normal distribution. The prior distributions for the model parameters are

$$(\alpha_j,\beta_j)^T \sim N(a_0,A_0^{-1}) \quad j=1,\ldots,J$$

and

$$\theta_i \sim N(0,1) \quad i=1,\ldots,I$$

##

```{r,eval=FALSE}
MCMCirt1d(datamatrix, theta.constraints = list(), burnin = 1000, mcmc = 20000)
```

- datamatrix: The matrix of data. Entries must be 0, 1, or NA (missing). The rows of datamatrix correspond to subjects and the columns correspond to items.
- theta.constraints: A list specifying possible simple equality or inequality constraints on the ability parameters. A typical entry in the list has one of three forms:
    1. varname=c constrains the ability parameter for the subject named varname to be equal to c.
    2. varname="+" constrains the ability parameter for the subject named varname to be positive.
    3. varname="-" constrains the ability parameter for the subject named varname to be negative.
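
## Sketch: Simulating a datamatrix

To make the layout of datamatrix concrete, the following base R sketch simulates responses from the one-dimensional ($K=1$) version of the model above. Everything here is an assumption made for illustration: the sizes `I` and `J`, the item parameter values, and the names `subj1`, `item1`, and so on. It also assumes that subjects are identified by row names, as with the justices' names in the SupremeCourt example.

```r
set.seed(123)                      # arbitrary seed for reproducibility
I <- 6                             # number of subjects (hypothetical)
J <- 10                            # number of items (hypothetical)

theta <- rnorm(I)                  # abilities, drawn from the N(0, 1) prior
alpha <- rnorm(J)                  # item difficulties (assumed values)
beta  <- runif(J, 0.5, 1.5)        # item discriminations (assumed values)

# pi[i, j] = Phi(-alpha_j + beta_j * theta_i)
eta    <- outer(theta, beta) - matrix(alpha, I, J, byrow = TRUE)
pi_mat <- pnorm(eta)

# Bernoulli draws arranged with subjects in rows and items in columns
Y <- matrix(rbinom(I * J, size = 1, prob = pi_mat), nrow = I, ncol = J)
rownames(Y) <- paste0("subj", seq_len(I))
colnames(Y) <- paste0("item", seq_len(J))
Y[2, 3] <- NA                      # a missing response is coded as NA

# Sign constraints on the subjects with the lowest and highest simulated ability
constraints <- setNames(list("-", "+"),
                        rownames(Y)[c(which.min(theta), which.max(theta))])
```

A matrix built this way could then be passed on as `MCMCirt1d(Y, theta.constraints = constraints)`. The two sign constraints fix the direction of the latent scale, which the likelihood alone does not determine.
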

## SupremeCourt Example

- Votes of the 9 justices (Rehnquist, Stevens, O'Connor, Scalia, Kennedy, Souter, Thomas, Ginsburg, and Breyer) of the U.S. Supreme Court in 43 non-unanimous cases.
- The votes are coded liberal (1), conservative (0), or missing (NA).

```{r,message=FALSE}
data(SupremeCourt)
# MCMCirt1d expects subjects (justices) in rows, hence the transpose
post.irt <- MCMCirt1d(t(SupremeCourt),
                      theta.constraints = list(Stevens = "-", Scalia = "+"),
                      burnin = 5000, mcmc = 100000)
```

##

```{r}
summary(post.irt)
```

##

```{r,echo=FALSE}
theta.post <- as.matrix(post.irt)
set.seed(1)
# Simulated draws from the N(0, 1) prior, plotted in gray for reference
x <- rnorm(20000, 0, 1)
plot(density(x), ylim = c(0, 2), xlim = c(-4, 4), col = gray(6/8), lwd = 2,
     xlab = "Ideal Point", ylab = "f", main = "")
# Column indices of the justices, ordered to match the legend
r <- c(2, 9, 8, 6, 3, 5, 1, 7, 4)
# Blue-to-red color ramp, one color per justice
cols <- rgb(red = 31 * (0:8), green = 0, blue = 248 - 31 * (0:8), maxColorValue = 248)
for (i in 1:9) {
  lines(density(theta.post[, r[i]]), col = cols[i], lwd = 2)
}
legend("topleft", col = c(cols, gray(6/8)),
       legend = c("Stevens", "Breyer", "Ginsburg", "Souter", "O'Connor",
                  "Kennedy", "Rehnquist", "Thomas", "Scalia", "prior"),
       lty = 1, lwd = 2, cex = 0.6)
```

## Ex3: Metropolis Sampling for a User-Defined Model

MCMCpack also has some facilities for fitting user-specified models. The MCMCmetrop1R() function uses a random walk Metropolis algorithm to sample from a user-defined log-posterior density.

##

```{r,eval=FALSE}
MCMCmetrop1R(fun, theta.init, burnin = 500, mcmc = 20000, tune = 1, logfun = TRUE,
             force.samp = FALSE, V = NULL, optim.method = "BFGS", ...)
```

fun: The unnormalized (log-)density of the distribution from which to take a sample.

theta.init: Starting values for the sampling. Must be of the appropriate dimension. In addition, fun(theta.init, ...) must be greater than -Inf if fun() is a log-density, or greater than 0 if fun() is a density.

tune: The tuning parameter for the Metropolis sampling. Can be either a positive scalar or a k-vector, where k is the length of $\theta$.
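
## Sketch: Random Walk Metropolis

The idea behind a random walk Metropolis sampler can be sketched in a few lines of base R. This is a simplified illustration and not MCMCpack's implementation: the function `rw_metropolis` and the target `log_target` are hypothetical names, and the proposal here is an independent normal step with standard deviation `tune`. According to its documentation, MCMCmetrop1R instead builds a Gaussian proposal covariance from `tune` together with either `V` or a numerically approximated Hessian.

```r
# Minimal random walk Metropolis sampler (illustrative, not MCMCpack's code)
rw_metropolis <- function(logdens, theta.init, n.iter, tune, ...) {
  k <- length(theta.init)
  draws <- matrix(NA_real_, nrow = n.iter, ncol = k)
  theta <- theta.init
  cur <- logdens(theta, ...)
  stopifnot(is.finite(cur))          # log-density must be finite at the start
  accepted <- 0
  for (iter in seq_len(n.iter)) {
    cand <- theta + rnorm(k, mean = 0, sd = tune)   # symmetric proposal
    cand.val <- logdens(cand, ...)
    # Accept with probability min(1, exp(cand.val - cur))
    if (isTRUE(log(runif(1)) < cand.val - cur)) {
      theta <- cand
      cur <- cand.val
      accepted <- accepted + 1
    }
    draws[iter, ] <- theta           # a rejected step repeats the current value
  }
  list(draws = draws, accept.rate = accepted / n.iter)
}

# Target: unnormalized log-density of a standard bivariate normal
log_target <- function(theta) -0.5 * sum(theta^2)

set.seed(42)                         # arbitrary seed
out <- rw_metropolis(log_target, theta.init = c(0, 0), n.iter = 5000, tune = 1)
out$accept.rate
colMeans(out$draws)
```

Larger values of `tune` give bolder proposals that are rejected more often, while smaller values are accepted more often but explore the target slowly, which is why the tuning parameter usually needs some experimentation.
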

## Example for a User-Defined Model

Suppose one is interested in fitting a logistic regression with an improper uniform prior. One could do the following:

```{r}
# Log-posterior: with a flat prior this is the Bernoulli log-likelihood
logitfun <- function(beta, y, X){
  eta <- X %*% beta
  p <- 1.0/(1.0+exp(-eta))
  sum( y * log(p) + (1-y)*log(1-p) )
}

# Simulated data with true coefficients (0.5, -1, 1)
x1 <- rnorm(1000)
x2 <- rnorm(1000)
Xdata <- cbind(1,x1,x2)
p <- exp(.5 - x1 + x2)/(1+exp(.5 - x1 + x2))
yvector <- rbinom(1000, 1, p)
```

```{r}
# X and y are passed through ... to logitfun
post.samp <- MCMCmetrop1R(logitfun, theta.init=c(0,0,0),
                          X=Xdata, y=yvector,
                          mcmc=40000, burnin=500,
                          tune=c(1.5, 1.5, 1.5), logfun=TRUE)
```

##

```{r}
plot(post.samp)
```

##

```{r}
summary(post.samp)
```
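
## Sketch: Checking Against the MLE

Because the prior is improper uniform, the log-posterior equals the log-likelihood up to a constant, so the posterior mode coincides with the maximum likelihood estimate. A rough sanity check, sketched here in base R, is to compare the posterior summary with a `glm()` fit. The data are re-simulated so that the block runs on its own, and the seed is an arbitrary assumption, so the numbers will differ from those in the slides above.

```r
# Same log-posterior as in the example (log-likelihood under a flat prior)
logitfun <- function(beta, y, X) {
  eta <- X %*% beta
  p <- 1.0 / (1.0 + exp(-eta))
  sum(y * log(p) + (1 - y) * log(1 - p))
}

# Simulate data as in the example; the seed is an arbitrary choice
set.seed(2024)
x1 <- rnorm(1000)
x2 <- rnorm(1000)
Xdata <- cbind(1, x1, x2)
yvector <- rbinom(1000, 1, plogis(0.5 - x1 + x2))

# Maximum likelihood fit; its coefficients maximize logitfun
fit <- glm(yvector ~ x1 + x2, family = binomial)
mle <- unname(coef(fit))

mle
logitfun(mle, y = yvector, X = Xdata)   # log-posterior at the mode
as.numeric(logLik(fit))                 # log-likelihood reported by glm
```

With 1000 observations the posterior means reported by `summary(post.samp)` would be expected to lie close to these estimates. A large discrepancy would suggest a problem with the chain, such as poor mixing or too short a run.
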