---
title: "`MCMCpack` Software Tutorial"
author: "Huiling Liu"
output:
  ioslides_presentation:
    smaller: yes
---

## MCMCpack Introduction
- An R package that contains functions to perform Bayesian inference using posterior simulation.
- Aimed primarily at social scientists.
- The implementation of MCMC algorithms are model-specific.


## Functions in MCMCpack

Statistical Models:

- Linear regression models (linear regression with Gaussian errors, a singular value decomposition regression, and regression for a censored dependent variable)
- Discrete choice models (logistic/multinomial logistic/ordinal probit/probit regression)
- Measurement models (one-dimensional IRT,k-dimensional IRT, k-dimensional ordinal factor, k-dimensional linear factor, k-dimensional mixed factor, and k-dimensional robust IRT)
- A model for count data (Poisson regression model)
- Models for ecological inference (hierarchical ecological inference model and dynamic ecological inference model)
- Time-series models for change-point problems (binary/ probit/ ordinal probit/ Poisson change-point model).

Also contains some useful utility functions, including some additional density functions and pseudo-random number generators for statistical distributions, a general purpose Metropolis sampling algorithm, and tools for visualization.


## Installation of MCMCpack
```{r,eval=FALSE}
install.packages(MCMCpack)
```

```{r,message=FALSE}
library(MCMCpack)
```

## Ex1: Linear Regression Model
The model takes the following form:

$$y_i = x_i^{'}\beta + \varepsilon_{i}, \quad  \varepsilon_{i} \sim \mathcal{N}(0, 1/\tau)$$

We assume standard, semi-conjugate priors:

$\beta \sim \mathcal{N}(b_0,B_0^{-1})$ and $\tau \sim \mathcal{G}amma(c_0/2, d_0/2)$

where $\beta$ and $\tau$ are assumed a priori independent. 

## 
```{r,eval=FALSE}
MCMCregress(formula, data, burnin = 1000, mcmc = 10000, b0 = 0, B0 = 0,
            
            marginal.likelihood = c("none", "Laplace", "Chib95"), ...)
```

burnin:	The number of burn-in iterations for the sampler.

mcmc:	The number of MCMC iterations after burnin.

b0:	The prior mean of $\beta$. 

B0:	The prior precision of $\beta$.Default value of 0 is equivalent to an improper uniform prior for beta.

marginal.likelihood:  "none" in which case the marginal likelihood will not be calculated, "Laplace" in which case the Laplace approximation is used, and "Chib95" in which case the method of Chib (1995) is used.


## Birthwt example
```{r}
data(birthwt)
post.lm1 <- MCMCregress(bwt~age+lwt+as.factor(race) + smoke + ht,
                       data=birthwt,mcmc=50000,
                       b0=c(2700, 0, 0, -500, -500, -500, -500),
                       B0=c(1e-6, .01, .01, 1.6e-5, 1.6e-5, 1.6e-5,
                            1.6e-5),  marginal.likelihood="Chib95")
```

##
```{r}
summary(post.lm1)
```

## Model Comparison
Suppose that the observed data y could have been generated under one of two models $M_1$ and $M_2$.

A natural thing to ask from the Bayesian perspective is, “what is the posterior probability that $M_1$ is true (assuming either $M_1$ or $M_2$ is true)?”

Using Bayes theorem we can write:

$$Pr(M_k|y)=\frac{p(y|M_k)Pr(M_k)}{p(y|M_1)Pr(M_1)+p(y|M_1)Pr(M_1)}, \quad k=1,2$$

$p(y|M_1)$ and $p(y|M_2)$ here are marginal liklihoods.

## Model Comparison
It is instructive to look at the posterior odds in favor of one model (say $M_1$):
        $$\frac{Pr(M_1|y)}{Pr(M_2|y)}=\frac{p(y|M_1)}{p(y|M_2)} \times \frac{Pr(M_1)}{Pr(M_2)}$$                                                                        
Define Bayes factor for $M_1$ relative to $M_2$ as

$$B_{12}=\frac{p(y|M_1)}{p(y|M_2)} $$
                                                                    
Since the posterior odds equal the Bayes factor when the models are equally likely a priori, the Bayes factor is a measure of how much support is available in the data for one model relative to another.


## Model Comparison Example
PostProbMod: calculate the posterior probability that each model under study is correct given that one of the models under study is correct.
```{r}
post.lm1 <- MCMCregress(bwt~age+lwt+as.factor(race) + smoke + ht,
                       data=birthwt,mcmc=50000,
                       b0=c(2700, 0, 0, -500, -500, -500, -500),
                       B0=c(1e-6, .01, .01, 1.6e-5, 1.6e-5, 1.6e-5,
                            1.6e-5),  marginal.likelihood="Chib95")
post.lm2 <- MCMCregress(bwt~age+lwt+as.factor(race) + smoke,
                       data=birthwt,mcmc=50000,
                       b0=c(2700, 0, 0, -500, -500, -500),
                       B0=c(1e-6, .01, .01, 1.6e-5, 1.6e-5, 1.6e-5),                                               marginal.likelihood="Chib95")
BF <- BayesFactor(post.lm1,post.lm2)
mod.probs <- PostProbMod(BF)
```

## Model Comparison Example
```{r}
print(mod.probs)
```

## Ex2: Item Response Theory (IRT) Model

- $y_{ij}$ represents the observed choice by subject $i$ on item $j$.

- each subject has an ability (ideal point) dentoed by $\theta_{i(K \times 1)}$.

- each item has a difficulty parameter $\alpha_j$ and discrimination parameter $\beta_{j(K \times 1)}$. 

Consider 

$$y_{ij} \sim Bernoulli(\pi_{ij}),\quad  i=1,...,I \quad  j=1,...,J$$
$$\pi_{ij}=\Phi (-\alpha_{j}+\beta_j^T\theta_i)$$

where $\Phi$ is the the CDF of the standard normal distribution.

Prior distribution for the model parameters are:
$$(\alpha_j,\beta_j)^T \sim N(a_0,A_0^{-1}) \quad j=1,...,J$$
and $$\theta_i \sim N(0,1) \quad i=1,...,I$$

## 
```{r,eval=FALSE}
MCMCirt1d(datamatrix, theta.constraints = list(), burnin = 1000, mcmc = 20000)
```
- datamatrix:	The matrix of data. Must be 0, 1, or missing values. The rows of datamatrix correspond to subjects and the columns correspond to items.

- theta.constraints: A list specifying possible simple equality or inequality constraints on the ability parameters. A typical entry in the list has one of three forms: 

(1). varname=c which will constrain the ability parameter for the subject named varname to be equal to c, 

(2). varname="+" which will constrain the ability parameter for the subject named varname to be positive, 

(3). varname="-" which will constrain the ability parameter for the subject named varname to be negative.


## SupremeCourt Example

- Data for 9 justices Rehnquist, Stevens, O'Connor, Scalia, Kennedy, Souter, Thomas, Ginsburg, and Breyer for the U.S. Supreme Court from 43 non-unanimous cases.
- The votes are coded liberal (1), conservative (0) or missing values.
```{r,message=FALSE}
data(SupremeCourt)
post.irt <- MCMCirt1d(t(SupremeCourt),
  theta.constraints=list(Stevens="-",
  Scalia="+"), burnin=5000, mcmc=100000)
```


##
```{r}
summary(post.irt)
```

## 
```{r,echo=0}
library(grDevices)
theta.post <- as.matrix(post.irt)
set.seed(1)
x <- rnorm(20000,0,1)
plot(density(x),ylim=c(0,2),xlim=c(-4,4),col=gray(6/8),lwd=2,xlab="Ideal Point",ylab="f",main="")
cl <- colors()
col <- c()
r <- c(2,9,8,6,3,5,1,7,4)
for(i in 1:9){
  lines(density(theta.post[,r[i]]),col=rgb(red=31*(i-1),blue=248-31*(i-1),green=0,maxColorValue = 248),lwd=2)
  col[i] <- rgb(red=31*(i-1),blue=248-31*(i-1),green=0,maxColorValue = 248)
}
legend("topleft",col=c(col,gray(6/8)),c("stevens","breyer","ginsburg","souter","o'connor","kennedy","rehnquist","thomas","scalia","prior"),lty=c(1,1,1,1,1,1,1,1,1,1),lwd=2,cex=0.6)
```

## EX3:Metropolis Sampling for a User-Defined Model
MCMCpack also has some facilities for fitting user-specified models.

The MCMCmetrop1R() function uses a random walk Metropolis algorithm to
sample from a user-defined log-posterior density.

## 
```{r,eval=FALSE}
MCMCmetrop1R(fun, theta.init, burnin = 500, mcmc = 20000,
  tune = 1, logfun = TRUE, force.samp = FALSE,
  V = NULL, optim.method = "BFGS", ...)
```

fun: The unnormalized (log)density of the distribution from which to take a sample.

theta.init: Starting values for the sampling. Must be of the appropriate dimension.
It must also be the case that fun(theta.init, ...) is greater than -Inf if fun() is a logdensity or greater than 0 if fun() is a density.

tune: The tuning parameter for the Metropolis sampling. Can be either a positive scalar or a k-vector, where k is the length of θ.

## Example for a User-Defined Model
Suppose one is interested in fitting a logistic regression with an improper uniform prior. 

One could do the following:
```{r}
 logitfun <- function(beta, y, X){
      eta <- X %*% beta
      p <- 1.0/(1.0+exp(-eta))
      sum( y * log(p) + (1-y)*log(1-p) )
    }

x1 <- rnorm(1000)
x2 <- rnorm(1000)
Xdata <- cbind(1,x1,x2)
p <- exp(.5 - x1 + x2)/(1+exp(.5 - x1 + x2))
yvector <- rbinom(1000, 1, p)
```

```{r}
post.samp <- MCMCmetrop1R(logitfun, theta.init=c(0,0,0),
                              X=Xdata, y=yvector,
                              mcmc=40000, burnin=500,
                              tune=c(1.5, 1.5, 1.5),
                              logfun=TRUE)
```

## 
```{r}
plot(post.samp)
```

## 
```{r}
summary(post.samp)
```