- Overview of
mcmcse
Package - Diasorin Example
- Confidence Regions
- Effective Sample Size
- Graphical Diagnostics
- Homework
mcmcse
Packagemcmcse
provides estimates of Monte Carlo standard errors for MCMC.set.seed(1) n1 <- 19 n2 <- 15 low <- c(91, 46, 95, 60, 33, 410, 105, 43, 189,1097, 54,178, 114, 137, 233, 101, 25,70,357) normal <- c(370, 267, 99,157, 75,1281, 48, 298, 268, 62,804,430,171,694,404) diasorin <- c(low,normal) group <- c(rep(1,n1),rep(2,n2)) group1 <- factor(group, 1:2, c('Low','Normal'))
gibbs <- function(reps, mu, tau, y.bar, s2, n, a, b, c, d){ output <- matrix(0, nrow=reps, ncol=2) a.gamma <- c+n/2 for(i in 1:reps){ mu.hat <- n * tau / (n * tau + b) * y.bar + b / (n * tau + b) * a sd.norm <- 1/sqrt(n*tau + b) mu <- rnorm(1, mu.hat, sd.norm) b.gamma <- d + ((n-1)*s2 + (y.bar - mu)^2)/2 tau <- rgamma(1, a.gamma, b.gamma) output[i,] <- c(mu, tau) } return(output) }
N <- 50000 y.bar.l <- mean(log(low)) s2.l <- var(log(low)) y.bar.n <- mean(log(normal)) s2.n <- var(log(normal)) sample.l <- gibbs(N, mu=5, tau=1, y.bar = y.bar.l, s2 = s2.l, n = n1, a=4.87, b=0.00288, c=1.0376, d=0.001) sample.n <- gibbs(N, mu=5, tau=1, y.bar = y.bar.n, s2 = s2.n, n = n2, a=5.39, b=0.00280, c=1.04653, d=0.001)
mcmcse
package the rows of the MCMC output should store each iteration of the algorithm. Thus the output should have \(N\) rows and \(p\) columns.chain <- cbind(sample.l, sample.n) colnames(chain) <- c("mu1", "tau1", "mu2", "tau2") head(chain)
## mu1 tau1 mu2 tau2 ## [1,] 4.561362 1.3286643 5.209149 1.1472458 ## [2,] 4.538759 1.9654129 5.290550 1.3588665 ## [3,] 4.758976 0.9519949 5.562766 1.3602247 ## [4,] 4.635779 1.2538313 5.580864 1.6013328 ## [5,] 4.642499 1.9271081 5.333285 1.2905995 ## [6,] 4.769481 1.0219264 5.534604 0.9064198
R
can can obtain \(\nu_n\) using the usual colMeans
function. If \(p=1\), then we can use mean
instead of colMeans
.colMeans(chain)
## mu1 tau1 mu2 tau2 ## 4.704665 1.315990 5.488848 1.295256
R
by creating a function that takes a vector argument.g <- function(chain){ chain[4]/chain[2] }
gofy <- apply(chain, 1, g) mean(gofy)
## [1] 1.088283
mcse
, mcse.mat
, mcse.multi
, and mcse.initseq
estimate the Monte Carlo standard error of \(\nu_{n}\) or \(\nu_{g,n}\).
mcse
gives consistent estimates of the square root of \(\Sigma/n\) (standard error) when \(\Sigma\) is \(1 \times 1\).mcse.mat
gives consistent estimates of the square root of the diagonals of \(\Sigma/n\).mcse.multi
gives consistent estimates of \(\Sigma\).mcse.initseq
gives asymptotically conservative estimates of \(\Sigma\) using initial sequence estimators.mcmcse
Packagelibrary(mcmcse)
## mcmcse: Monte Carlo Standard Errors for MCMC ## Version 1.3-2 created on 2017-07-03. ## copyright (c) 2012, James M. Flegal, University of California, Riverside ## John Hughes, University of Colorado, Denver ## Dootika Vats, University of Warwick ## Ning Dai, University of Minnesota ## For citation information, type citation("mcmcse"). ## Type help("mcmcse-package") to get started.
mcmcse
Packagemcerror_bm <- mcse.multi(x = chain, method = "bm", size = "sqroot", g = NULL, level = .95, large = FALSE) mcerror_bart <- mcse.multi(x = chain, method = "bartlett", size = "cuberoot", g = NULL, level = .95, large = FALSE) mcerror_tuk <- mcse.multi(x = chain, method = "tukey", size = "sqroot", g = NULL, level = .95, large = FALSE) mcerror_is <- mcse.initseq(x = chain, g = NULL, level = .95, adjust = FALSE) mcerror_isadj <- mcse.initseq(x = chain, g = NULL, level = .95, adjust = TRUE)
x
takes the \(n \times p\) MCMC data. x
can take only numeric entries in the form of a matrix or data frame. The rows of x
are the iterations of the MCMC.method = ‘‘bm’’, ‘‘bartlett’’, ‘‘tukey’’
calculates the estimate using the batch means method and spectral variance methods with the modified-Bartlett and Tukey-Hanning windows.size
is the batch size for the bm
method and the truncation point for tukey
and bartlett
methods. size = ‘‘sqroot’’
sets the size
as \(\lfloor \sqrt{n} \rfloor\) and size = ‘‘cuberoot’’
sets it at \(\lfloor n^{1/3} \rfloor\). An integer value of size less than \(n\) is also valid.g
is a function that is applied to each row of x
and represents the features of interest of the process. Since here we are interested in only means, g
is NULL
.level
is the confidence level of the resulting confidence region. This is required to calculate the volume of the confidence region.large
is a logical argument. If large
is TRUE
the volume of the confidence region is the large sample volume obtained using \(\chi^2\) critical values. By default, volume is calculated using \(F\) distribution critical values.adjust
is a logical argument only used for the mcse.initseq
function. If adjust
is TRUE
, the eigenvalues of the initial sequence estimator are increased slightly.mcse.multi
and mcse.initseq
return a list with multiple components.cov
stores the estimate of \(\Sigma\) obtained using the method chosen, vol
returns the volume to the \(p\)th root of the resulting confidence region, est
stores the estimate of g
applied on the Markov chain and nsim
stores the arguments used to calculate \(\Sigma\).mcse.multi
also returns size
which indicates the size of batches/truncation, method
used, and whether a large
sample volume is returned.mcse.initseq
also returns cov.adj
, vol.adj
, and whether an adjusted estimator was used (adjust
).mcerror_bm$cov
## [,1] [,2] [,3] [,4] ## [1,] 0.0394643598 0.001375493 0.002379704 -0.0008686777 ## [2,] 0.0013754926 0.185400330 -0.006510334 0.0272105676 ## [3,] 0.0023797040 -0.006510334 0.058530865 -0.0049212774 ## [4,] -0.0008686777 0.027210568 -0.004921277 0.1957994103
mcerror_bart$cov
## [,1] [,2] [,3] [,4] ## [1,] 0.0457547735 -0.002134084 0.001193809 -0.0003560083 ## [2,] -0.0021340836 0.162503174 -0.000597813 0.0028188956 ## [3,] 0.0011938092 -0.000597813 0.056796737 -0.0010611261 ## [4,] -0.0003560083 0.002818896 -0.001061126 0.1932724203
mcerror_tuk$cov
## [,1] [,2] [,3] [,4] ## [1,] 0.0421549392 0.002086938 0.0005128396 -0.003337617 ## [2,] 0.0020869382 0.176423621 -0.0035763980 0.012953344 ## [3,] 0.0005128396 -0.003576398 0.0532081937 -0.001401419 ## [4,] -0.0033376174 0.012953344 -0.0014014185 0.197610425
mcerror_is$cov
## [,1] [,2] [,3] [,4] ## [1,] 0.0445562278 -0.0006880102 0.0010955859 -0.0017663968 ## [2,] -0.0006880102 0.1653666228 -0.0008183728 0.0025448181 ## [3,] 0.0010955859 -0.0008183728 0.0564524841 0.0002125322 ## [4,] -0.0017663968 0.0025448181 0.0002125322 0.1947821982
mcerror_isadj$cov.adj
## [,1] [,2] [,3] [,4] ## [1,] 0.0448840031 -0.0009320214 0.0004870070 -0.0011130623 ## [2,] -0.0009320214 0.1656303983 -0.0005832360 0.0019029607 ## [3,] 0.0004870070 -0.0005832360 0.0581606915 -0.0005879199 ## [4,] -0.0011130623 0.0019029607 -0.0005879199 0.1963788341
rbind(mcerror_bm$est, mcerror_bart$est, mcerror_tuk$est, mcerror_is$est, mcerror_isadj$est)
## mu1 tau1 mu2 tau2 ## [1,] 4.704665 1.31599 5.488848 1.295256 ## [2,] 4.704665 1.31599 5.488848 1.295256 ## [3,] 4.704665 1.31599 5.488848 1.295256 ## [4,] 4.704665 1.31599 5.488848 1.295256 ## [5,] 4.704665 1.31599 5.488848 1.295256
c(mcerror_bm$vol, mcerror_bart$vol, mcerror_tuk$vol, mcerror_is$vol, mcerror_isadj$vol)
## [1] 0.006425778 0.006328959 0.006289714 0.006323004 0.006323004
mcse
and mcse.mat
return \(\sigma^2_{ii} / \sqrt{n}\).mcse
does it for one component and mcse.mat
does it for all diagonals.mcse(x = chain[,1], method = "bm", g = NULL)
## $est ## [1] 4.704665 ## ## $se ## [1] 0.0008884184
mcse.mat(x = chain, method = "bm", g = NULL)
## est se ## mu1 4.704665 0.0008884184 ## tau1 1.315990 0.0019256185 ## mu2 5.488848 0.0010819507 ## tau2 1.295256 0.0019788856
R
function g
we had defined before.g
should be a function that takes vector inputs.g
## function(chain){ ## chain[4]/chain[2] ## } ## <bytecode: 0x7fbe3ac4e3c8>
mcerror_g_bm <- mcse.multi(x = chain, g = g) mcerror_g_is <- mcse.initseq(x = chain, g = g)
c(mcerror_g_bm$cov, mcerror_g_is$cov)
## [1] 0.2822692 0.2912683
c(sqrt(mcerror_g_bm$cov/N), sqrt(mcerror_g_is$cov/N))
## [1] 0.002376002 0.002413580
confRegion
in the package, the user can create joint confidence regions for two parameters.mcse.multi
or mcse.initseq
function.cov
, est
, and nsim
from the output list.mcse.initseq
is input and adjust = TRUE
had been used, then cov.adj
is used instead of cov
.mcse.multi
also uses the attribute size
.plot(confRegion(mcerror_bm, which = c(1,2), level = .90), type = 'l', asp = 1, xlab="mu1", ylab="tau1") lines(confRegion(mcerror_bart, which = c(1,2), level = .90), col = "red")
mcse.initseq
this time.plot(confRegion(mcerror_is, which = c(1,2), level = .95), type = 'l', asp = 1, xlab="mu1", ylab="tau1") lines(confRegion(mcerror_is, which = c(1,2), level = .90), col = "red")
minESS
to ascertain what is the minimum effective sample size needed for stable analysis.p
is the dimension of the estimation problem.alpha
is the confidence level.eps
is the tolerance level. Default is .05. Reasonable levels are anywhere from .01 to .05. The smaller the tolerance, the larger the minimum effective samples. eps
represents a tolerance level relative to the variability in the target distribution. It is akin to the idea of margin-of-error.minESS(p = 4, alpha = .05, eps = .05)
## minESS ## 8431
minESS(p = 1, alpha = .05, eps = .05)
## minESS ## 6146
minESS
is independent of the Markov chain or process, and is only a function of the p
, alpha
, and eps
.minESS
and then sample their process until the required minimum samples are achieved.eps
tolerance level the number of estimated effective samples correspond to.minESS(p = 4, alpha = .05, ess = 1000)
## Epsilon ## 0.1451773
minESS(p = 1, alpha = .05, ess = 1000)
## Epsilon ## 0.123959
multiESS
and ess
are two functions that calculate the effective sample size of a correlated sample.ess
calculations are component-wise while multiESS
utilizes the multivariate nature of the problem.ess(chain)
## mu1 tau1 mu2 tau2 ## 55801.49 44545.19 49721.07 50076.02
multiESS(chain)
## [1] 50237.53
multiESS(chain, covmat = mcerror_bart$cov)
## [1] 50233.54
ess
produces a different estimate for each component, conservative practice dictates choosing the smallest of the values.multiESS
returns one estimate of the effective sample size based on the whole sample.mcse.multi
function to obtain a batch means estimate of \(\Sigma\). User can provide another estimate of \(\Sigma\) using the covmat
argument.multiESS
.estvssamp
plots the Monte Carlo estimates versus the sample size for a component of the MCMC output. This plot indicates whether the Monte Carlo estimate has stabilized.estvssamp(chain[,1])
mcse.multi
function, a QQ plot of the standardized estimates gives an idea of whether asymptopia has been achieved.mu1 <- sample.l[,1] tau1 <- sample.l[,2] mu2 <- sample.n[,1] tau2 <- sample.n[,2] mu.diff <- mu2-mu1 tau.ratio <- tau2/tau1 post.sum <- function(vec){ return(c(mean(vec), sd(vec), quantile(vec, probs=c(0.25, .5, .975)))) } sum.table <- rbind(post.sum(mu1), post.sum(mu2), post.sum(mu.diff), post.sum(tau1), post.sum(tau2), post.sum(tau.ratio)) rownames(sum.table) <- c("mu1", "mu2", "mu2-mu1", "tau1", "tau2", "tau2/tau1") colnames(sum.table) <- c("mean", "sd", "2.5%", "50%", "97.5%") round(sum.table, 3)
## mean sd 2.5% 50% 97.5% ## mu1 4.705 0.210 4.566 4.705 5.117 ## mu2 5.489 0.241 5.332 5.489 5.968 ## mu2-mu1 0.784 0.319 0.572 0.784 1.415 ## tau1 1.316 0.406 1.024 1.274 2.227 ## tau2 1.295 0.443 0.976 1.246 2.299 ## tau2/tau1 1.088 0.543 0.711 0.975 2.439
mcmcse
package to add MCSEs to the summary statistics in sum.table
.mcse.q
function for quantiles.method
and size
did you use? Why?mcmcse
package of your choosing. Explain carefully what the plot is illustrating.