Agenda

  • Trauma Data Example Again
  • Parameter Estimation with MCSE
  • Reference Priors for Binomial Regression
  • O-Ring Data Presentations

Trauma Data

Dr. Turner Osler, a trauma surgeon and former head of the Burn Unit at the University of New Mexico Trauma Center, provided data on survival of patients admitted to the University of New Mexico Trauma Center between 1991 and 1994.

  • The predictor variables are injury severity score (ISS), revised trauma score (RTS), AGE, and type of injury (TI). TI is either blunt (\(TI = 0\)), e.g., the result of a car crash, or penetrating (\(TI = 1\)), e.g., gunshot or stab wounds.
  • ISS is an overall index of a patient's injuries based on the approximately 1,300 injuries cataloged in the Abbreviated Injury Scale. Takes on values from 0 for a patient with no injuries to 75 for a patient with severe injuries in three or more body areas.
  • RTS is an index of physiologic status derived from the Glasgow Coma Scale. It is a weighted average of a number of measurements on an incoming patient such as systolic blood pressure and respiratory rate. Takes on values from 0 for a patient with no vital signs to 7.84 for a patient with normal vital signs.

Trauma Data

  • Data available here.
# trauma <- read.table("http://www.ics.uci.edu/~wjohnson/BIDA/Ch8/trauma300.txt",
#                      header=TRUE)
traumadata <- read.table("trauma300.txt", header=TRUE)
attach(traumadata)
head(traumadata)
##     ID death ISS TI    RTS AGE
## 1 2979     0   1  1 7.8408  25
## 2 1167     0   9  0 7.8408  63
## 3  116     0  29  0 2.9304  32
## 4 1118     0  10  0 7.8408  61
## 5 3126     0   2  1 7.8408  43
## 6 2173     0   9  0 7.8408  27

Trauma Data

Dr. Osler proposed a logistic regression model to estimate the probability of a patient's death using an intercept; the predictors ISS, RTS, AGE, and TI; and an interaction between AGE and TI. AGE is viewed as a surrogate for a patient's physiologic reserve, i.e., their ability to withstand trauma.

The model \[ \text{logit}(\theta_i) = \beta_1 + \beta_2 ISS_i + \beta_3 RTS_i + \beta_4 AGE_i + \beta_5 TI_i + \beta_6 (AGE*TI)_i \] is similar to logistic models used by trauma centers throughout the United States.

If we introduce \(x_{i6} = (AGE*TI)_i\) and center the continuous predictors and the interaction about their sample means (e.g., \(x_{i2} = ISS_i - \overline{ISS}\)), leaving the binary \(TI\) as \(x_{i5}\), the model becomes \[ \text{logit}(\theta_i) = \alpha + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \beta_5 x_{i5} + \beta_6 x_{i6}. \]

Trauma Data

x.check <- cbind(ISS-mean(ISS), RTS-mean(RTS), AGE-mean(AGE), TI, AGE*TI-mean(AGE*TI))
colnames(x.check) <- c("x2", "x3", "x4", "x5", "x6")
head(x.check)
##          x2        x3    x4 x5        x6
## [1,] -13.28  0.553934 -6.41  1 17.633333
## [2,]  -5.28  0.553934 31.59  0 -7.366667
## [3,]  14.72 -4.356466  0.59  0 -7.366667
## [4,]  -4.28  0.553934 29.59  0 -7.366667
## [5,] -12.28  0.553934 11.59  1 35.633333
## [6,]  -5.28  0.553934 -4.41  0 -7.366667
model2 <- glm(death ~ x.check, family = binomial(), x = TRUE)

Binomial Regression Analysis

  • For our Bayesian analysis we assume the same data model as in the frequentist analysis, i.e., we continue to use the logit link function.
  • We take the prior distribution of the parameters (the regression coefficients) to be independent and identically distributed normal with mean 0 and variance \(\sigma^2 = 25\).

Unnormalized Posterior

  • The log unnormalized posterior density is calculated by the following R function.
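  • With the logit link and the independent \(N(0, 25)\) priors, this is (up to an additive constant) \[ \sum_{i:\, y_i = 1} \log \theta_i + \sum_{i:\, y_i = 0} \log(1 - \theta_i) - \frac{1}{2(25)} \sum_{j} \beta_j^2, \] where \(\text{logit}(\theta_i) = x_i'\beta\) with \(x_i\) the \(i\)th row of the design matrix (intercept included); the prior term appears in the code as -sum(beta^2)/50.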
x <- model2$x
y <- model2$y
lupost <- function(beta, x, y) {
     eta <- as.numeric(x %*% beta)
     # numerically stable log(theta_i) and log(1 - theta_i) under the logit link
     logp <- ifelse(eta < 0, eta - log1p(exp(eta)), - log1p(exp(- eta)))
     logq <- ifelse(eta < 0, - log1p(exp(eta)), - eta - log1p(exp(- eta)))
     logl <- sum(logp[y == 1]) + sum(logq[y == 0])
     # log likelihood plus the log N(0, 25) prior, dropping additive constants
     return(logl - sum(beta^2) / 50)
}

Posterior Simulations

  • Use the mcmc package and its metrop function to obtain posterior simulations.
  • The following code runs a random walk Metropolis algorithm to simulate from the posterior.
library(mcmc)
set.seed(1)
beta.init <- as.numeric(coefficients(model2))
out <- metrop(lupost, beta.init, 5e4, scale=0.03, x = x, y = y)
out$accept
## [1] 0.19804
library(mcmcse)
## mcmcse: Monte Carlo Standard Errors for MCMC
## Version 1.3-2 created on 2017-07-03.
## copyright (c) 2012, James M. Flegal, University of California, Riverside
##                     John Hughes, University of Colorado, Denver
##                     Dootika Vats, University of Warwick
##                     Ning Dai, University of Minnesota
##  For citation information, type citation("mcmcse").
##  Type help("mcmcse-package") to get started.

Diagnostics

plot(ts(out$batch))

Diagnostics

ess(out$batch)
## [1] 228.1051 485.8939 269.3972 614.3843 225.2397 403.7840
multiESS(out$batch)
## [1] 968.6197
  • Effective sample sizes are small!
  • Components 1, 3, and 5 show the most autocorrelation and also have the smallest ESS.
  • Try to improve mixing by using a vector for scale.

Diagnostics

out <- metrop(out, scale=c(.1, .03, .1, .03, .1, .03), x = x, y = y)
out$accept
## [1] 0.17346
out$time
##    user  system elapsed 
##   5.736   1.297   7.062

Diagnostics

plot(ts(out$batch))

Diagnostics

ess(out$batch)
## [1] 265.4184 789.0774 650.0411 686.0127 232.3179 322.1590
multiESS(out$batch)
## [1] 1116.557
  • Getting better!

Autocorrelation Plots

acf(out$batch[,1:3])

Autocorrelation Plots

acf(out$batch[,4:6])

Effective Sample Size

  • Recall we can use the function minESS to ascertain the minimum effective sample size needed for a stable analysis.
  • p is the dimension of the estimation problem.
  • alpha is the confidence level.
  • eps is the tolerance level; the default is .05, and reasonable values range from .01 to .05. The smaller the tolerance, the larger the required minimum effective sample size. eps represents a tolerance relative to the variability in the target distribution, akin to a margin of error.
minESS(p = 6, alpha = .05, eps = .05)
## minESS 
##   8708
  • Basically, we want to run our chain at least 8 times longer to achieve this minimum ESS.
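  • A quick way to compute the required multiplier directly from the quantities above:
ceiling(minESS(p = 6, alpha = .05, eps = .05) / multiESS(out$batch))  # about 8 here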

Longer Run

  • We run it only 4 times longer (rather than 8) to keep the R Markdown compile time manageable.
out <- metrop(out, nbatch=2e5, x = x, y = y)
out$accept
## [1] 0.177515
out$time
##    user  system elapsed 
##  22.780   5.017  27.955

Estimate and MCSE

ess(out$batch)
## [1]  573.0223 2089.4155 2142.8925 1234.9522  472.3659  626.1789
multiESS(out$batch)
## [1] 3401.668
round(mcse.mat(out$batch), 4)
##          est     se
## [1,] -4.1892 0.0259
## [2,]  0.0859 0.0006
## [3,] -0.5821 0.0038
## [4,]  0.0571 0.0005
## [5,]  1.6485 0.0667
## [6,] -0.0142 0.0015
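  • These MCSEs can be used, for example, to form rough 95% confidence intervals for the posterior means. These intervals reflect Monte Carlo (simulation) error only, not posterior uncertainty.
est <- mcse.mat(out$batch)
round(cbind(lower = est[, "est"] - 1.96 * est[, "se"],
            upper = est[, "est"] + 1.96 * est[, "se"]), 4)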

Prior and Posterior

Reference Priors

  • General model has \(r\) predictor variables.
  • Can elicit expert knowledge about \(r\) probabilities \(\tilde{\theta}_h\) and induce a prior on \(\beta\).
  • But when \(r\) is large, this becomes difficult, e.g. for the trauma data with \(r = 6\).
  • Moreover, if the sample size \(n\) is large, it may not be important to develop informative priors since the data should dominate the prior.

Reference Priors

  • Two approaches to defining an informative prior: directly on \(\beta\), or by inducing it from the \(\tilde{\theta}_h\).
  • Also two approaches to defining a reference prior.
    • Can put independent proper reference priors on the \(\tilde{\theta}_h\) and induce a proper reference prior on the \(\beta_j\)s. Here the proper reference priors for probabilities are Jeffreys' and the uniform.
    • Historically, the improper flat prior \(p(\beta)=1\) has been used as a reference prior for regression coefficients. For any \(\tilde{\theta}_h\), this corresponds to essentially splitting the prior probability evenly between values near 0 and values near 1.

Proper Reference Priors

  • Traditional method for defining an informative prior directly on \(\beta\) uses a multivariate normal distribution.
  • Multivariate normals are now also used as proper reference priors.
  • Common practice is to use independent \(\beta_j \sim N(0,b)\) distributions where \(b\) is large, often \(b = 10^6\).
  • Induces priors on any \(\tilde{\theta}_h\) that put half the prior probability very near 0 and the other half very near 1.
  • 'As silly as these priors seem, they will usually not affect the posterior very much.'
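  • A small simulation sketch illustrates the point, using the trauma design matrix x from the fit above and \(b = 10^6\): nearly all of the induced prior mass on a patient's \(\theta\) lands near 0 or 1.
set.seed(2)
beta.sim  <- matrix(rnorm(1e4 * ncol(x), mean = 0, sd = 1e3), ncol = ncol(x))
theta.sim <- plogis(beta.sim %*% x[1, ])   # induced prior draws of theta for patient 1
mean(theta.sim < 0.01 | theta.sim > 0.99)  # essentially 1: almost all mass near 0 or 1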

Proper Reference Priors

  • The text suggests using independent normal priors but picking \(b\) so that the induced distributions on the \(\theta_i\)s are as close to uniform as we can make them.
  • Specifically, we suggest independent \(\beta_j \sim N(0, s^2_j / b)\) priors, where \(s_j\) is the sample standard deviation of the \(j\)th predictor.
  • Select \(b\) to make the prior distributions on the \(\theta_i\)s as uniform as possible.
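  • One way to carry this out is sketched below (an illustration only; check.induced and the placeholder prior standard deviations are not from the text).
check.induced <- function(prior.sd, nsim = 5000) {
     # draw coefficients from independent N(0, prior.sd^2) priors, push them
     # through the logit link at the observed covariates, and plot the induced
     # prior on the theta_i's; tune prior.sd until the shape is roughly uniform
     beta.sim  <- sapply(prior.sd, function(s) rnorm(nsim, mean = 0, sd = s))
     theta.sim <- plogis(beta.sim %*% t(x))
     hist(as.vector(theta.sim), breaks = 50, main = "Induced prior on the theta_i's")
}
check.induced(prior.sd = rep(1, ncol(x)))   # placeholder standard deviations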

O-Ring Data Presentations

  • Write a Metropolis normal random walk sampler using the metrop function with the O-Ring Data.
  • Find a reasonable scale and illustrate the correlation in your sampler.
  • How long did you run your sampler? Why?
  • Group 1: Find predictive probabilities of O-ring failure along with 90% probability intervals. What do you observe for temperatures near the freezing point of 32 degrees?
  • Group 2: Perform a prior sensitivity analysis. Be sure to include an informative prior similar to that suggested in the text and at least one non-informative prior. How much impact does your prior selection have on posterior means, posterior medians, and credible intervals for parameters in the model?
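  • A starter sketch is commented out below. It mirrors the trauma analysis above (intercept plus centered temperature, \(N(0,25)\) priors via lupost); the file name and the column names temp and failure are assumptions to be adjusted to the course's O-ring data.
# oring <- read.table("oring.txt", header=TRUE)    # assumed file and columns
# x.o <- cbind(1, oring$temp - mean(oring$temp))   # intercept + centered temperature
# y.o <- oring$failure                             # assumed 0/1 failure indicator
# fit.o <- glm(y.o ~ x.o - 1, family = binomial())
# out.o <- metrop(lupost, as.numeric(coef(fit.o)), 5e4,
#                 scale = 0.5, x = x.o, y = y.o)   # tune scale toward ~20% acceptance
# out.o$accept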