STAT 203: Lecture 9

Agenda

Importance Sampling
Metropolis Hastings Algorithm
Gibbs Sampler

Applied Measure Theory

If \(X\) is a random element of some probability space having measure \(P\), the we write \[ E\left\{ g(X) \right\} = \int g(x) P(dx) \] whenever \(g\) is a real-valued function such that the expectation exists.
Basically, this is a convenient shorthand that covers discrete, continuous, mixed, and multivariate random variates.
Use notation \(E _P \left\{ g(X) \right\}\) to add clarity when necessary.
Remember that a probability is a special case of expectation, i.e. \[ P(A) = E _P \left\{ I_A(X) \right\} = \int_A P(dx), \] where \(I_A\) denotes the indicator function of the set \(A\).

Intractable Integrals

Consider an integral that is an expectation, say \[ \mu = E\left\{ g(X) \right\} = \int g(x) P(dx) \] where \(X\) is a random variable with probability measure \(P\). We assume this expectation actually exists, i.e. \(\mu < \infty\).
Most integrals are impossible to calculate exactly, hence the best we can do is an approximation.
Only generally applicable tool for approximating integrals like this is so-called Monte Carlo integration.

SLLN

Suppose we can simulate an iid sequence \(X_1, X_2, \dots\) of random variables having the probability measure P. Then \[ Y_i = g(X_i), \quad i = 1,2,\dots \] is an iid sequence of random variables having mean \(\mu\).
SLLN says that if \(E|Y| < \infty\) then with probability 1 as \(n \to \infty\) \[ \bar{Y}_n = \frac{1}{n} \sum_{i=1}^n Y_i \to \mu . \]

Densities

We say a probability measure \(P\) has a density \(f\) with respect to another probability measure \(Q\) if our expectation can be rewritten \[ \begin{aligned} E \left\{ g(X) \right\} & = \int g(x) P(dx) \\ & = \int g(x) f(x) Q(dx) \end{aligned} \] and this holds for all functions \(g\) for which the expectation is defined.

Densities

For example, consider the case where both \(P\) and \(Q\) are the probability measures of continuous real-valued random variables.
Say \(X\) has measure \(P\) and \(Y\) has measure \(Q\) and \[ E \left\{ g(X) \right\} = \int_{-\infty}^{\infty} g(x) f_X (x) dx = \int g(x) P(dx) \] for any function \(g\) for which the expectation is defined and \[ E \left\{ g(Y) \right\} = \int_{-\infty}^{\infty} g(y) f_Y (y) dy = \int g(x) Q(dx) \] for any function \(g\) for which the expectation is defined, then in our first equation we have \[ f(w) = \frac{f_X(w)}{f_Y(w)}. \]

Importance Sampling

One of the most important techniques in Monte Carlo is the importance sampling trick.
Uses one distribution to give information about another.
Suppose \(P\) has a density \(f\) with respect to \(Q\) and we want to calculate the expectation \[ E \left\{ g(X) \right\} = \int g(x) P(dx) \] by Monte Carlo.
If direct simulation from \(P\) is difficult then the relationship \[ \begin{aligned} E \left\{ g(X) \right\} & = \int g(x) P(dx) \\ & = \int g(x) f(x) Q(dx) \end{aligned} \] gives us another way to do it.

Importance Sampling

That is, one Monte Carlo approximation of \(E_P \{ g(X) \}\) is \[ \frac{1}{n} \sum_{i=1}^n g(X_i) \quad \quad (A) \] where \(X_1, X_2, \dots\) form an identically \(P\)-distributed sequence obeying the SLLN.
Another is \[ \frac{1}{n} \sum_{i=1}^n g(Y_i) f(Y_i) \quad \quad (B) \] where \(Y_1, Y_2, \dots\) form an identically \(Q\)-distributed sequence obeying the SLLN.

Importance Sampling

This works because \[ E_P \{ g(X) \} = E_Q \{ g(Y) f(Y) \} \] when \(X\) has measure \(P\) and \(Y\) has measure \(Q\).
Just \(f(w)\) rewritten in different notation.
As long as we only discuss the SLLN, both estimates are equally good (that is, they both work).
But one may be better than the other if we consider the variance of the Monte Carlo estimators.

Importance Sampling

In the simplest case using independent sequences, the variance of (A) is \[ \frac{1}{n} \text{Var}_P \{ g(X) \} \quad \quad (VA) \] and the variance of (B) is \[ \frac{1}{n} \text{Var}_Q \{ g(Y) f(Y) \} \quad \quad (VB) \]
Two variances will typically not be the same.
In the extreme, we might have \[ f(y) \propto \frac{1}{g(y)} \] which would make the random variable \(g(Y )f(Y )\) constant and hence (VB) zero. Then there would be no Monte Carlo error, but only because we knew so much about the problem that we didn’t need to use Monte Carlo.

Importance Sampling

Variance reduction spin on importance sampling is perhaps the least interesting aspect of it.
Problem with classical importance sampling is that it’s often not worth the trouble.
Importance sampling is often used when efficiency considerations go the other way, when (VA) is smaller than (VB).
Sometimes the importance sampling estimator (B) is just easier to do, i.e. simulation from \(Q\) is easy.

Example

Consider the parametric family \[ \mathcal{P} = \left\{ P_{\theta} : \theta \in \Theta \right\} \] of probability measures (a statistical model) and we want to calculate \[ \mu (\theta) = E_{\theta} \{ g(X) \} = \int g(x) P_{\theta} (dx) \] for all \(\theta \in \Theta\).
Naive way to calculate this by Monte Carlo is by a different simulation \(X_1, X_2, \dots\) identically \(P_{\theta}\)-distributed for each \(\theta\) and use the estimator (A) to estimate \(\mu (\theta)\).
Trouble with the naive idea is that it takes an infinite amount of time to do an infinite number of simulations.
Anyone implementing the naive idea will only actually do a finite number of simulations at a grid of points in \(\Theta\), but still costly especially if \(\Theta\) is not one-dimensional.

Example

Suppose all of the measures in the model are dominated by some probability measure \(Q\), say.
Then \[ \mu (\theta) = \int g(x) P_{\theta} (dx) = \int g(y) f_{\theta} (y) Q (dy) \] for all functions \(g\) for which the expectation exists.
Then if \(Y_1, Y_2, \dots\) form an identically \(Q\)-distributed sequence obeying the SLLN \[ \bar{\mu}_n (\theta) = \frac{1}{n} \sum_{i=1}^n g(Y_i) f_{\theta} (Y_i) \] is a Monte Carlo estimate of \(\mu (\theta)\).

Example

We get an estimate for all \(\theta \in \Theta\) with just one Monte Carlo sample!
Efficient! But not in terms of the the usual importance sampling notion.
The variance of \(\bar{\mu}_n (\theta)\) will vary with \(\theta\) and may be very bad for some \(\theta\).
Idea here is not to have the optimal scheme for calculating any one expectation, but a scheme that does an acceptable job of calculating many expectations.

Normalized Importance Sampling

Recall \[ f(w) = \frac{f_X(w)}{f_Y(w)}. \]
In practice, often know \(f\) only up to a ratio of normalizing constants. That is, if \(h_X\) and \(h_Y\) are nonnegative functions such that \[ f(w) = \frac{h_X(w) / c_X}{h_Y(w) / c_Y} \] where \[ c_X = \int h_X(w) dw \text{ and } c_Y = \int h_Y(w) dw . \]
We know \(h_X\) and \(h_Y\), but not \(c_X\) and \(c_Y\).
Unnormalized densities slightly complicate importance sampling. The previous formula doesn’t work. However, a slight variant does.

Normalized Importance Sampling

Notice that \[ E_P \{ g(X) \} = E_Q \left\{ \frac{g(Y) f_X (Y)}{f_Y (Y)} \right\} = \frac{c_Y}{c_X} E_Q \left\{ \frac{g(Y) h_X (Y)}{h_Y (Y)} \right\} = \frac{c_Y}{c_X} E_Q \left\{ g(Y) w(Y) \right\} \] where the unnormalized importance weights are \[ w(y) = \frac{h_X (y)}{h_Y (y)}. \]
Hence if \(Y_1, Y_2, \dots Y_n\) form an identically \(Q\)-distributed sequence obeying the SLLN \[ \frac{1}{n} \sum_{i=1}^n g(Y_i) w(Y_i) \to E_Q \left\{ g(Y) w(Y) \right\} \] with probability 1 as \(n \to \infty\).

Normalized Importance Sampling

Now \[ E_Q \left\{ w(Y) \right\} = E_Q \left\{ \frac{h_X (Y)}{h_Y (Y)} \right\} = \frac{c_X}{c_Y} E_Q \left\{ \frac{f_X (Y)}{f_Y (Y)} \right\} = \frac{c_X}{c_Y} . \]
Thus, if \(Y_1, Y_2, \dots Y_n\) form an identically \(Q\)-distributed sequence obeying the SLLN \[ \frac{1}{n} \sum_{i=1}^n w(Y_i) = \frac{1}{n} \sum_{i=1}^n \frac{h_X (Y_i)}{h_Y (Y_i)} \to \frac{c_X}{c_Y} \] with probability 1 as \(n \to \infty\). Let \[ w^* (y) = \frac{w(y)}{\sum_{i=1}^n w(y_i)} \] which are called the normalized importance weights.

Normalized Importance Sampling

Then if \(Y_1, Y_2, \dots Y_n\) form an identically \(Q\)-distributed sequence obeying the SLLN \[ \frac{1}{n} \sum_{i=1}^n g(Y_i) w^*(Y_i) = \frac{\frac{1}{n} \sum_{i=1}^n g(Y_i) w(Y_i)}{\frac{1}{n} \sum_{i=1}^n w(Y_i)} \to \frac{c_Y}{c_X} E_Q \left\{ g(Y) w(Y) \right\} = E_P \{ g(X) \} \] with probability 1 as \(n \to \infty\).
Note that \(w^*\) is a function having the properties of a probability distribution on the sample \(Y_1, \dots ,Y_n\) \[ w^*(Y_i) \ge 0, \; i = 1, \dots, n \text{ and } \sum_{i=1}^n w^*(Y_i) = 1 . \]

Normalized Importance Sampling

Suppose that the function \(g\) for which we are calculating a Monte Carlo expectation is an indicator function (so the expectation is a probability), hence we write \[ P(A) = E_P \left\{ I_A(X) \right\} = \int_A P(dx) \] and the Monte Carlo estimate is \[ \bar{P}_n (A) = \sum_{i=1}^n I_A (Y_i) w^* (Y_i). \]
Now our intuition about probabilities says these should satisfy the complement rule \[ \bar{P}_n (A^C) = 1 - \bar{P}_n (A) \] and they do, but only because we are using normalized importance sampling.

Example

Suppose, as is often the case, that the densities \(f_{\theta}\) defined previously are known only up to a constant of proportionality, that is, we know functions \(h_{\theta}\) such that \[ f_{\theta} (x) \propto h_{\theta} (x) \] but we do not know the constant of proportionality.
Of course, its value is determined by the requirement that \(f_{\theta}\) integrate to one. Hence \[ f_{\theta} (x) = \frac{h_{\theta} (x)}{c(\theta)} \] where \[ c(\theta) = \int h_{\theta} (x) Q(dx). \]
But we may not know how to do this integral, so we don’t know the “normalizing constant” \(c(\theta)\) in any practical sense.

Normalized Importance Sampling

Useful to have terminology to describe this situation. We say that \[ \mathcal{H} = \left\{ h_{\theta} : \theta \in \Theta \right\} \] is a family of unnormalized probability densities with respect to \(Q\).
Then \(c(\theta)\) defines the normalizing function of the family, and \(f_{\theta} (x)\) defines the corresponding normalized densities \(f_{\theta} (x)\).

Normalized Importance Sampling

The previous SLLN argument for the validity as a Monte Carlo estimate says with probability 1, as \(n \to \infty\) \[ \frac{1}{n} \sum_{i=1}^n g(Y_i) f_{\theta} (Y_i) \to \mu (\theta). \]

Normalized Importance Sampling

In the current setting we get, as \(n \to \infty\) \[ \frac{1}{n} \sum_{i=1}^n g(Y_i) \frac{h_{\theta} (Y_i)}{c(\theta)} \to \mu (\theta) \] with probability 1, or \[ \frac{1}{n} \sum_{i=1}^n g(Y_i) h_{\theta} (Y_i) \to c(\theta) \mu (\theta). \]
In the special case with \(g=1\), because \(E_{\theta} (1) = 1\), we have \[ \frac{1}{n} \sum_{i=1}^n h_{\theta} (Y_i) \to c(\theta) \] with probability 1 as \(n \to \infty\).

Normalized Importance Sampling

Combining the previous two equations gives \[ \frac{\frac{1}{n} \sum_{i=1}^n g(Y_i) h_{\theta} (Y_i)}{\frac{1}{n} \sum_{i=1}^n h_{\theta} (Y_i)} \to \mu (\theta) \]
The left hand side is the desired Monte Carlo estimator of \(\mu(\theta)\) in the case where densities are unnormalized.
The \(n\) numbers \(h_{\theta} (Y_i)\) are the unnormalized importance weights.
When divided by their sum, they are the normalized importance weights \[ w_{\theta}(y) = \frac{h_{\theta} (y)}{\sum_{i=1}^n h_{\theta} (Y_i)} . \]
Then the left hand side can be written as a weighted average \[ \bar{\mu}_n (\theta) = \frac{\frac{1}{n} \sum_{i=1}^n g(Y_i) h_{\theta} (Y_i)}{\frac{1}{n} \sum_{i=1}^n h_{\theta} (Y_i)} = \sum_{i=1}^n g(Y_i) w_{\theta}(Y_i). \]

Markov chain Monte Carlo

MCMC methods are used most often in Bayesian inference where the equilibrium (invariant, stationary) distribution is a posterior distribution.
Challenge lies in construction of a suitable Markov chain with \(f\) as its stationary distribution.
Two primary algorithms to construct Markov chains in MCMC, Metropolis Hastings and Gibbs.

Metropolis-Hastings algorithm

Setting \(X_0 = x_0\) (somehow), the Metropolis-Hastings algorithm generates \(X_{t+1}\) given \(X_t = x_t\) as follows:

Sample a candidate value \(X^* \sim g(\cdot | x_t)\) where \(g\) is the proposal distribution
Compute the MH ratio \(R(x_t, X^*)\), where \[ R(x_t, X^*) = \frac{f(x^*) g (x_t | x^*)}{f(x_t) g (x^* | x_t)} \]
Set \[ X_{t+1} = \begin{cases} x^* \mbox{ w.p.\ } \min\{ R(x_t, X^*), 1\} \\ x_t \mbox{ otherwise} \end{cases} \]

Metropolis-Hastings algorithm

Irreducibility and aperiodicity depend on the choice of \(g\), these must be checked
Performance (finite sample) depends on the choice of \(g\) also, be careful

Independence chains

Suppose \(g (x^* | x_t) = g (x^*)\), this yields an independence chain since the proposal does not depend on the current state
In this case, the MH ratio is
\[ R(x_t, X^*) = \frac{f(x^*) g (x_t)}{f(x_t) g (x^*)}, \] and the resulting Markov chain will be irreducible and aperiodic if \(g > 0\) where \(f>0\)
A good envelope function \(g\) should resemble \(f\), but should cover \(f\) in the tails

Random walk chains

Generate \(X^*\) such that \(\epsilon\sim h(\cdot)\) and set \(X^* = X_t + \epsilon\), then \(g(x^* | x_t) = h(x^* - x_t)\)
Common choices of \(h(\cdot)\) are symmetric zero mean random variables with a scale parameter, e.g. a Uniform(\(-a,a\)), Normal(\(0, \sigma^2\)), \(c*T_{\nu}, \dots\)
For symmetric zero mean random variables, the MH ratio is
\[ R(x_t, X^*) = \frac{f(x^*)}{f(x_t)} \]
If the support of \(f\) is connected and \(h\) is positive in a neighborhood of 0, then the chain is irreducible and aperiodic.

Example: Markov chain basics

Exercise: Suppose \(f \sim Exp(1)\)

Write an independence MH sampler with \(g \sim Exp(\theta)\)
Show \(R(x_t, X^*) = \exp \left\{ (x_t - x^*)(1-\theta) \right\}\)
Generate 1000 draws from \(f\) with \(\theta \in \{ 1/2, 1, 2 \}\)
Write a random walk MH sampler with \(h \sim N(0, \sigma^2)\)
Show \(R(x_t, X^*) = \exp \left\{ x_t - x^* \right \} I(x^* > 0)\)
Generate 1000 draws from \(f\) with \(\sigma \in \{ .2, 1, 5 \}\)
In general, do you prefer an independence chain or a random walk MH sampler? Why?
Implement the fixed-width stopping rule for you preferred chain

Example: Markov chain basics

Independence Metropolis sampler with Exp(\(\theta\)) proposal

ind.chain <- function(x, n, theta = 1) {
  ## if theta = 1, then this is an iid sampler
  m <- length(x)
  x <- append(x, double(n))
  for(i in (m+1):length(x)){
    x.prime <- rexp(1, rate=theta)
    u <- exp((x[(i-1)]-x.prime)*(1-theta))
    if(runif(1) < u)
      x[i] <- x.prime
    else
      x[i] <- x[(i-1)]
  }
  return(x)
}

Example: Markov chain basics

Random Walk Metropolis sampler with N(\(0,\sigma\)) proposal

rw.chain <- function(x, n, sigma = 1) {
  m <- length(x)
  x <- append(x, double(n))
  for(i in (m+1):length(x)){
    x.prime <- x[(i-1)] + rnorm(1, sd = sigma)
    u <- exp((x[(i-1)]-x.prime))
    u
    if(runif(1) < u && x.prime > 0)
      x[i] <- x.prime
    else
      x[i] <- x[(i-1)]
  }
  return(x)
}

Example: Markov chain basics

trial0 <- ind.chain(1, 500, 1)
trial1 <- ind.chain(1, 500, 2)
trial2 <- ind.chain(1, 500, 1/2)
rw1 <- rw.chain(1, 500, .2)
rw2 <- rw.chain(1, 500, 1)
rw3 <- rw.chain(1, 500, 5)

Example: Markov chain basics

Gibbs Sampling

Select starting values \(x_0\) and set \(t=0\)
Generate in turn (deterministic scan Gibbs sampler)
- \(x^{(1)}_{t+1} \sim f( x^{(1)} | x^{(-1)}_t)\)
- \(x^{(2)}_{t+1} \sim f( x^{(2)} | x^{(1)}_{t+1}, x^{(3)}_t, \dots, x^{(p)}_t)\)
- \(x^{(3)}_{t+1} \sim f( x^{(3)} | x^{(1)}_{t+1}, x^{(2)}_{t+1}, x^{(4)}_t, \dots, x^{(p)}_t)\)
- …
- \(x^{(p)}_{t+1} \sim f( x^{(p)} | x^{(-p)}_{t+1})\)
Increment \(t\) and go to Step 2

Gibbs Sampling

Common to have one or more components not available in closed form
Then one can just use a MH sampler for those components known as a Metropolis within Gibbs or Hybrid Gibbs sampling
Common to ``block'' groups of random variables

Example: Capture-recapture

Data from a fur seal pup capture-recapture study for \(i=7\) census attempts
Goal is to estimate the number of pups in a fur seal colony using a capture-recapture study

Count	Parameter	1	2	3	4	5	6	7
Captured	\(c_i\)	30	22	29	26	31	32	35
Newly Caught	\(m_i\)	30	8	17	7	9	8	5

Example: Capture-recapture

Let \(N\) be the population size, \(I\) be the number of census attempts where \(c_i\) were captured (\(I=7\) in our case), and \(r\) be the total number captured (\(r = \sum_{i=1}^I m_i = 84\))
We consider a separate unknown capture probability for each census \((\alpha_1, \dots, \alpha_I)\) where the animals are equally ``catchable''
Then \[ L(N,\alpha | c,r) \propto \frac{N!}{(N-r)!} \prod_{i=1}^{I} \alpha_i ^{c_i} (1-\alpha_i) ^ {N-c_i} \]

Example: Capture-recapture

Assume \(N\) and \(\alpha\) are apriori independent with \[ f(N) \propto 1 \mbox{ and } f(\alpha_i | \theta_1, \theta_2) \stackrel{i.i.d.}{\sim} \mbox{Beta} (\theta_1, \theta_2) \]
We use \(\theta_1 = \theta_2 = 1/2\), which is the Jeffrey's Prior
The resulting posterior is proper when \(I>2\) and recommended when \(I>5\)

Example: Capture-recapture

Then it is easy to show the posterior is \[ f(N,\alpha | c,r) \propto \frac{N!}{(N-r)!} \prod_{i=1}^{I} \alpha_i ^{c_i} (1-\alpha_i) ^ {N-c_i} \prod_{i=1}^{I} \alpha_i^{-1/2} (1-\alpha_i)^{-1/2} . \]
Further, one can show \[ \begin{aligned} N - 84 | \alpha & \sim \mbox{NB} \left( 85, 1 - \prod_{i=1}^{I} (1-\alpha_i) \right) \mbox{ and }\\ \alpha_i | N & \sim \mbox{Beta} \left( c_i + 1/2 , N -c_i + 1/2 \right) \mbox{ for all }i \end{aligned} \]

Example: Capture-recapture

Then we can consider the chain \[ \left(N , \alpha\right) \rightarrow \left(N' , \alpha\right) \rightarrow \left(N' , \alpha'\right) \] or \[ \left(N , \alpha\right) \rightarrow \left(N , \alpha'\right) \rightarrow \left(N' , \alpha'\right) , \] where both involve a ``block'' update of \(\alpha\)

Example: Capture-recapture

First, we can write the data into R

captured <- c(30, 22, 29, 26, 31, 32, 35)
new.captures <- c(30, 8, 17, 7, 9, 8, 5)
total.r <- sum(new.captures)

Example: Capture-recapture

The following R code implements the Gibbs sampler

gibbs.chain <- function(n, N.start = 94, alpha.start = rep(.5,7)) {
    output <- matrix(0, nrow=n, ncol=8)
    for(i in 1:n){
        neg.binom.prob <- 1 - prod(1-alpha.start)
        N.new <- rnbinom(1, 85, neg.binom.prob) + total.r
        beta1 <- captured + .5
        beta2 <- N.new - captured + .5
        alpha.new <- rbeta(7, beta1, beta2)
        output[i,] <- c(N.new, alpha.new)
        N.start <- N.new
        alpha.start <- alpha.new    
    }
    return(output)
}

MCMC output analysis

How can we tell if the chain is mixing well?

Trace plots or time-series plots
Autocorrelation plots
Plot of estimate versus Markov chain sample size
Effective sample size (ESS) \[ \text{ESS} (n) = \frac{n}{1+2 \sum_{k=1}^{\infty} \rho_{k}(g)}, \] where \(\rho_{k}(g)\) is the autocorrelation of lag \(k\) for \(g\)
Alternative, ESS can be written as \[ \text{ESS} (n) = \frac{n}{\sigma ^2 / \text{Var} g} \] where \(\sigma ^2\) is the asymptotic variance from a Markov chain CLT

Example: Capture-recapture

Then we consider some preliminary simulations to ensure the chain is mixing well

trial <- gibbs.chain(1000)
plot.ts(trial[,1], main = "Trace Plot for N")
for(i in 1:7){
    plot.ts(trial[,(i+1)], main = paste("Alpha", i))
    }

acf(trial[,1], main = "Lag Plot for N")
for(i in 1:7){
    acf(trial[,(i+1)], main = paste("Lag Alpha", i))
    }

Example: Capture-recapture

Now for a more complete simulation to estimate posterior means and a 90% Bayesian credible region

sim <- gibbs.chain(10000)
N <- sim[,1]
alpha1 <- sim[,2]

Example: Capture-recapture

par(mfrow=c(1,2))
hist(N, freq=F, main="Estimated Marginal Posterior for N")
hist(alpha1, freq=F, main ="Estimating Marginal Posterior for Alpha 1")

par(mfrow=c(1,1))

Example: Capture-recapture

library(mcmcse)

## mcmcse: Monte Carlo Standard Errors for MCMC
## Version 1.3-2 created on 2017-07-03.
## copyright (c) 2012, James M. Flegal, University of California, Riverside
##                     John Hughes, University of Colorado, Denver
##                     Dootika Vats, University of Warwick
##                     Ning Dai, University of Minnesota
##  For citation information, type citation("mcmcse").
##  Type help("mcmcse-package") to get started.

ess(N)

## [1] 8736.499

ess(alpha1)

## [1] 8702.875

Example: Capture-recapture

par(mfrow=c(1,2))
estvssamp(N)
estvssamp(alpha1)

par(mfrow=c(1,1))

Example: Capture-recapture

mcse(N)

## $est
## [1] 89.4245
## 
## $se
## [1] 0.02909663

mcse.q(N, .05)

## $est
## [1] 85
## 
## $se
## [1] 0.04338107

mcse.q(N, .95)

## $est
## [1] 94
## 
## $se
## [1] 0.06135594

Example: Capture-recapture

mcse(alpha1)

## $est
## [1] 0.3376739
## 
## $se
## [1] 0.0005387394

mcse.q(alpha1, .05)

## $est
## [1] 0.2564481
## 
## $se
## [1] 0.0009462308

mcse.q(alpha1, .95)

## $est
## [1] 0.4224168
## 
## $se
## [1] 0.001145234