Agenda

  • Markov chain Monte Carlo, again
  • Gibbs sampling
  • Output analysis for MCMC
  • Convergence diagnostics
  • Examples: Capture-recapture and toy example

Gibbs Sampling

  1. Select starting values \(x_0\) and set \(t=0\)
  2. Generate in turn (deterministic scan Gibbs sampler)
    • \(x^{(1)}_{t+1} \sim f( x^{(1)} | x^{(-1)}_t)\)
    • \(x^{(2)}_{t+1} \sim f( x^{(2)} | x^{(1)}_{t+1}, x^{(3)}_t, \dots, x^{(p)}_t)\)
    • \(x^{(3)}_{t+1} \sim f( x^{(3)} | x^{(1)}_{t+1}, x^{(2)}_{t+1}, x^{(4)}_t, \dots, x^{(p)}_t)\)
    • \(\vdots\)
    • \(x^{(p)}_{t+1} \sim f( x^{(p)} | x^{(-p)}_{t+1})\)
  3. Increment \(t\) and go to Step 2

Gibbs Sampling

  • Common to have one or more components not available in closed form
  • Then one can use an MH update for those components, an approach known as Metropolis within Gibbs or hybrid Gibbs sampling (a minimal sketch follows this list)
  • Common to ``block'' groups of random variables
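
For instance, the following Metropolis-within-Gibbs sketch is illustrative and not part of the original example; it assumes a toy target \(f(x,y) \propto \exp\{ -x^2(1+y^2)/2 - y^4/4 \}\), where \(x \mid y\) is Normal and sampled directly, while \(y \mid x\) has no standard form and is updated with a random-walk MH step

log.cond.y <- function(y, x) - x^2 * y^2 / 2 - y^4 / 4  # log f(y | x) up to a constant

mwg.chain <- function(n, x.start = 0, y.start = 0, prop.sd = 1) {
    out <- matrix(0, nrow = n, ncol = 2)
    x <- x.start
    y <- y.start
    for (t in 1:n) {
        # Gibbs step: x | y ~ N(0, 1 / (1 + y^2)) is available in closed form
        x <- rnorm(1, 0, sqrt(1 / (1 + y^2)))
        # MH step: random-walk proposal for y, accepted with the usual ratio
        y.prop <- y + rnorm(1, 0, prop.sd)
        if (log(runif(1)) < log.cond.y(y.prop, x) - log.cond.y(y, x)) y <- y.prop
        out[t, ] <- c(x, y)
    }
    return(out)
}

chain <- mwg.chain(5000)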

Example: Capture-recapture

  • Data from a fur seal pup capture-recapture study with \(I=7\) census attempts
  • Goal is to estimate the number of pups in a fur seal colony using a capture-recapture study
  Count           Parameter    1   2   3   4   5   6   7
  Captured        \(c_i\)     30  22  29  26  31  32  35
  Newly caught    \(m_i\)     30   8  17   7   9   8   5

Example: Capture-recapture

  • Let \(N\) be the population size, \(I\) be the number of census attempts with \(c_i\) animals captured on attempt \(i\) (\(I=7\) in our case), and \(r\) be the total number of distinct animals captured (\(r = \sum_{i=1}^I m_i = 84\))

  • We consider a separate unknown capture probability for each census, \((\alpha_1, \dots, \alpha_I)\), where within a census all animals are equally ``catchable''
  • Then \[ L(N,\alpha | c,r) \propto \frac{N!}{(N-r)!} \prod_{i=1}^{I} \alpha_i ^{c_i} (1-\alpha_i) ^ {N-c_i} \]

Example: Capture-recapture

  • Assume \(N\) and \(\alpha\) are a priori independent with \[ f(N) \propto 1 \mbox{ and } f(\alpha_i | \theta_1, \theta_2) \stackrel{i.i.d.}{\sim} \mbox{Beta} (\theta_1, \theta_2) \]
  • We use \(\theta_1 = \theta_2 = 1/2\), which is the Jeffreys prior
  • The resulting posterior is proper when \(I>2\), and this prior choice is recommended when \(I>5\)

Example: Capture-recapture

  • Then it is easy to show the posterior is \[ f(N,\alpha | c,r) \propto \frac{N!}{(N-r)!} \prod_{i=1}^{I} \alpha_i ^{c_i} (1-\alpha_i) ^ {N-c_i} \prod_{i=1}^{I} \alpha_i^{-1/2} (1-\alpha_i)^{-1/2} . \]
  • Further, one can show \[ \begin{aligned} N - 84 | \alpha & \sim \mbox{NB} \left( 85, 1 - \prod_{i=1}^{I} (1-\alpha_i) \right) \mbox{ and }\\ \alpha_i | N & \sim \mbox{Beta} \left( c_i + 1/2 , N -c_i + 1/2 \right) \mbox{ for all }i \end{aligned} \]

Example: Capture-recapture

Then we can consider the chain \[ \left(N , \alpha\right) \rightarrow \left(N' , \alpha\right) \rightarrow \left(N' , \alpha'\right) \] or \[ \left(N , \alpha\right) \rightarrow \left(N , \alpha'\right) \rightarrow \left(N' , \alpha'\right) , \] where both involve a ``block'' update of \(\alpha\)

Example: Capture-recapture

First, we enter the data in R

captured <- c(30, 22, 29, 26, 31, 32, 35)
new.captures <- c(30, 8, 17, 7, 9, 8, 5)
total.r <- sum(new.captures)

Example: Capture-recapture

The following R code implements the Gibbs sampler

gibbs.chain <- function(n, N.start = 94, alpha.start = rep(.5, 7)) {
    output <- matrix(0, nrow = n, ncol = 8)  # columns: N, alpha_1, ..., alpha_7
    for (i in 1:n) {
        # Update N given alpha: N - r | alpha ~ NB(r + 1, 1 - prod(1 - alpha))
        neg.binom.prob <- 1 - prod(1 - alpha.start)
        N.new <- rnbinom(1, 85, neg.binom.prob) + total.r
        # Block update of alpha given N: alpha_i ~ Beta(c_i + 1/2, N - c_i + 1/2)
        beta1 <- captured + .5
        beta2 <- N.new - captured + .5
        alpha.new <- rbeta(7, beta1, beta2)
        output[i, ] <- c(N.new, alpha.new)
        # carry the current state forward to the next iteration
        N.start <- N.new
        alpha.start <- alpha.new
    }
    return(output)
}

MCMC output analysis

How can we tell if the chain is mixing well?

  • Trace plots or time-series plots
  • Autocorrelation plots
  • Plot of estimate versus Markov chain sample size
  • Effective sample size (ESS) \[ \text{ESS} (n) = \frac{n}{1+2 \sum_{k=1}^{\infty} \rho_{k}(g)}, \] where \(\rho_{k}(g)\) is the autocorrelation of lag \(k\) for \(g\)
  • Alternatively, ESS can be written as \[ \text{ESS} (n) = \frac{n}{\sigma ^2 / \text{Var}_{\pi} g}, \] where \(\sigma ^2\) is the asymptotic variance from a Markov chain CLT; a rough empirical version of the autocorrelation form is sketched below
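
The sketch below estimates ESS by truncating the empirical autocorrelation sum at the first non-positive lag; this is only a simple heuristic, and the mcmcse package used later provides a better estimator

ess.naive <- function(x, max.lag = 100) {
    rho <- acf(x, lag.max = max.lag, plot = FALSE)$acf[-1]  # estimated autocorrelations at lags 1, ..., max.lag
    cut <- which(rho <= 0)[1]                                # truncate at the first non-positive autocorrelation
    if (!is.na(cut)) rho <- rho[seq_len(cut - 1)]
    length(x) / (1 + 2 * sum(rho))
}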

Example: Capture-recapture

Then we run some preliminary simulations to check that the chain is mixing well

trial <- gibbs.chain(1000)
plot.ts(trial[,1], main = "Trace Plot for N")
for(i in 1:7){
    plot.ts(trial[,(i+1)], main = paste("Alpha", i))
    }

acf(trial[,1], main = "Lag Plot for N")
for(i in 1:7){
    acf(trial[,(i+1)], main = paste("Lag Alpha", i))
    }

Example: Capture-recapture

[Trace plots and autocorrelation plots for \(N\) and \(\alpha_1, \dots, \alpha_7\) produced by the code above]

Example: Capture-recapture

Now we run a longer simulation to estimate posterior means and 90% Bayesian credible intervals

sim <- gibbs.chain(10000)
N <- sim[,1]
alpha1 <- sim[,2]

Example: Capture-recapture

par(mfrow=c(1,2))
hist(N, freq=F, main="Estimated Marginal Posterior for N")
hist(alpha1, freq=F, main = "Estimated Marginal Posterior for Alpha 1")

par(mfrow=c(1,1))

Example: Capture-recapture

library(mcmcse)
## mcmcse: Monte Carlo Standard Errors for MCMC
## Version 1.2-1 created on 2016-03-24.
## copyright (c) 2012, James M. Flegal, University of California, Riverside
##                     John Hughes, University of Minnesota
##                     Dootika Vats, University of Minnesota
##  For citation information, type citation("mcmcse").
##  Type help("mcmcse-package") to get started.
ess(N)
##       se 
## 8421.642
ess(alpha1)
##       se 
## 8519.459

Example: Capture-recapture

par(mfrow=c(1,2))
estvssamp(N)
estvssamp(alpha1)

par(mfrow=c(1,1))

Example: Capture-recapture

mcse(N)
## $est
## [1] 89.5495
## 
## $se
## [1] 0.0307835
mcse.q(N, .05)
## $est
## [1] 86
## 
## $se
## [1] 0.02930293
mcse.q(N, .95)
## $est
## [1] 95
## 
## $se
## [1] 0.06565672

Example: Capture-recapture

mcse(alpha1)
## $est
## [1] 0.3374951
## 
## $se
## [1] 0.0005427247
mcse.q(alpha1, .05)
## $est
## [1] 0.2556669
## 
## $se
## [1] 0.0009743541
mcse.q(alpha1, .95)
## $est
## [1] 0.4221397
## 
## $se
## [1] 0.001015246

Example: Capture-recapture

current <- sim[10000,] # start from here if you need more simulations
sim <- rbind(sim, gibbs.chain(10000, N.start = current[1], alpha.start = current[2:8]))
N.big <- sim[,1]

Example: Capture-recapture

hist(N.big, freq=F, main="Estimated Marginal Posterior for N")

Example: Capture-recapture

ess(N)
##       se 
## 8421.642
ess(N.big)
##       se 
## 14446.78

Example: Capture-recapture

estvssamp(N.big)

Example: Capture-recapture

mcse(N)
## $est
## [1] 89.5495
## 
## $se
## [1] 0.0307835
mcse(N.big)
## $est
## [1] 89.50495
## 
## $se
## [1] 0.02318482

Example: Capture-recapture

mcse.q(N, .05)
## $est
## [1] 86
## 
## $se
## [1] 0.02930293
mcse.q(N.big, .05)
## $est
## [1] 86
## 
## $se
## [1] 0.01798403

Example: Capture-recapture

mcse.q(N, .95)
## $est
## [1] 95
## 
## $se
## [1] 0.06565672
mcse.q(N.big, .95)
## $est
## [1] 95
## 
## $se
## [1] 0.04391722

Convergence diagnostics

  • A more popular approach to MCMC output analysis than the tools above
  • There are many: the Gelman and Rubin diagnostic, Geweke's diagnostic, the Heidelberger and Welch diagnostic, the Raftery and Lewis diagnostic, and others in the coda package (a short sketch follows this list)
  • Useful for detecting problems with a sampler, but they offer no guarantee that you have converged
  • Think of these like hypothesis testing
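
A short sketch applying two of these coda diagnostics to the capture-recapture output; it assumes the sim matrix produced by gibbs.chain() above

library(coda)
sim.mcmc <- mcmc(sim)   # wrap the Gibbs output as a coda mcmc object
geweke.diag(sim.mcmc)   # Geweke: z-scores comparing means of the early and late parts of the chain
heidel.diag(sim.mcmc)   # Heidelberger and Welch: stationarity and halfwidth tests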

Gelman and Rubin diagnostic

The Gelman and Rubin diagnostic is also used as a stopping criterion

  • Most popular method for stopping the simulation, one of many convergence diagnostics
  • Simulates \(m\) independent parallel Markov chains
  • Considers a ratio of two different estimates of \(\text{Var}_{\pi} g\), not \(\sigma_{g}^{2}\) from the CLT
  • One argues the simulation should continue until the diagnostic (\(\widehat{R}_{0.975}\)) is close to 1 (a coda-based sketch follows)
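
A coda-based sketch for the capture-recapture example, using 4 parallel chains from gibbs.chain(); the starting values below are only illustrative choices meant to be overdispersed

library(coda)
starts <- c(0.2, 0.4, 0.6, 0.8)   # illustrative overdispersed starting values for alpha
chains <- mcmc.list(lapply(starts, function(a) mcmc(gibbs.chain(5000, alpha.start = rep(a, 7)))))
gelman.diag(chains)               # potential scale reduction factors; values near 1 suggest convergence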

Toy example

  • Let \(Y_1,\dots,Y_m\) be i.i.d. \(\mbox{N}(\mu,\lambda)\) and let the prior for \((\mu,\lambda)\) be proportional to \(1/\sqrt{\lambda}\)
  • The posterior density is characterized by \[ \pi(\mu,\lambda|y) \propto \lambda^{-\frac{m+1}{2}} \exp\left \{ -\frac{1}{2\lambda} \sum_{j=1}^m (y_j-\mu)^2 \right \} \] which is proper as long as \(m \ge 3\)

Toy example

  • A Gibbs sampler requires the full conditionals \[ \mu|\lambda,y \sim \mbox{N} ({\bar y},\lambda/m) \] and \[ \lambda | \mu, y \sim \mbox{IG}\left( \frac{m-1}{2}, \frac{s^{2} + m(\bar{y}-\mu)^{2}}{2} \right) \; , \] where \({\bar y}\) is the sample mean and \(s^{2}=\sum (y_i - \bar{y})^{2}\)

Toy example

  • Consider the Gibbs sampler that updates \(\lambda\) then \(\mu\) \[ (\lambda',\mu') \rightarrow (\lambda,\mu') \rightarrow (\lambda,\mu) \]
  • This sampler is geometrically ergodic
  • Suppose \(m=11\), \(\bar{y}=1\), and \(s^{2}=14\)
  • Then \(E(\mu | y)=1\) and \(E(\lambda | y)=2\)
  • Consider estimating \(E(\mu | y)\) and \(E(\lambda | y)\) with \(\bar{\mu}_{n}\) and \(\bar{\lambda}_{n}\) (a minimal R sketch of this sampler follows)
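
A minimal sketch of this Gibbs sampler (update \(\lambda\) then \(\mu\)), using \(m=11\), \(\bar{y}=1\), \(s^2=14\) from above and drawing the inverse gamma as 1/rgamma; the function name is just for illustration

toy.gibbs <- function(n, m = 11, ybar = 1, s2 = 14, mu.start = 1) {
    out <- matrix(0, nrow = n, ncol = 2, dimnames = list(NULL, c("mu", "lambda")))
    mu <- mu.start
    for (t in 1:n) {
        # lambda | mu, y ~ IG((m - 1)/2, (s2 + m (ybar - mu)^2)/2)
        lambda <- 1 / rgamma(1, shape = (m - 1) / 2, rate = (s2 + m * (ybar - mu)^2) / 2)
        # mu | lambda, y ~ N(ybar, lambda / m)
        mu <- rnorm(1, ybar, sqrt(lambda / m))
        out[t, ] <- c(mu, lambda)
    }
    return(out)
}

toy <- toy.gibbs(10000)
colMeans(toy)   # should be close to E(mu | y) = 1 and E(lambda | y) = 2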

Toy example

Stopped the simulation when \[ \begin{matrix} BM: & t_{.975, (a - 1)} \displaystyle \frac{ \hat{\sigma}_{BM} }{ \sqrt{n} } + I(n < 400) < 0.04 \\ & \\ GRD: & \hat{R}_{0.975} + I(n < 400) < 1.005 \end{matrix} \]

  • 1000 independent replications
    • Starting from \(\bar{y}\) for BM
    • Starting from draws from \(\pi\) for GRD
  • Used 4 chains for GRD
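
A sketch of the BM fixed-width rule for \(E(\mu | y)\), checking the cutoff every 400 iterations with mcmcse::mcse and batch means; a normal quantile replaces the \(t\) quantile for simplicity, and it assumes the toy.gibbs sketch above

library(mcmcse)
eps <- 0.04
chain <- toy.gibbs(400)
repeat {
    half.width <- qnorm(0.975) * mcse(chain[, "mu"], method = "bm")$se
    if (nrow(chain) >= 400 && half.width < eps) break
    chain <- rbind(chain, toy.gibbs(400, mu.start = chain[nrow(chain), "mu"]))
}
nrow(chain)
mean(chain[, "mu"])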

Toy example

Plots of \(\bar{\mu}_{n}\) vs. \(n\) for both stopping methods

Toy example

                                BM                    GRD
  MSE for \(E(\mu | y)\)        3.73e-05 (1.8e-06)    0.000134 (9.2e-06)
  MSE for \(E(\lambda | y)\)    0.000393 (1.8e-05)    0.00165 (0.00012)

Toy example

Histograms of \(\bar{\mu}_{n}\) for both stopping methods.

Summary

  • Bayesian inference usually requires an MCMC simulation
  • Metropolis-Hastings algorithm and Gibbs samplers
  • The basic idea is similar to ordinary Monte Carlo (OMC), but sampling from a Markov chain yields dependent draws
  • MCMC output analysis is often ignored or poorly understood