- Motivating Example
- Binomial Regression
- Trauma Data Example
- Posterior Sampling with
mcmc
Package - Exercise on O-Ring Data
mcmc
Package# Oring <- read.table("http://www.ics.uci.edu/~wjohnson/BIDA/Ch8/OringData.txt", # yheader=TRUE) Oring <- read.table("OringData.txt", header=TRUE) head(Oring)
## Flight Failure Temperature ## 1 14 1 53 ## 2 9 1 57 ## 3 23 1 58 ## 4 10 1 63 ## 5 1 0 66 ## 6 5 0 67
model1 <- glm(Failure ~ Temperature, data = Oring, family = binomial(), x = TRUE) summary(model1)
## ## Call: ## glm(formula = Failure ~ Temperature, family = binomial(), data = Oring, ## x = TRUE) ## ## Deviance Residuals: ## Min 1Q Median 3Q Max ## -1.0611 -0.7613 -0.3783 0.4524 2.2175 ## ## Coefficients: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 15.0429 7.3786 2.039 0.0415 * ## Temperature -0.2322 0.1082 -2.145 0.0320 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## (Dispersion parameter for binomial family taken to be 1) ## ## Null deviance: 28.267 on 22 degrees of freedom ## Residual deviance: 20.315 on 21 degrees of freedom ## AIC: 24.315 ## ## Number of Fisher Scoring iterations: 5
Dr. Turner Osler, a trauma surgeon and former head of the Burn Unit at the University of New Mexico Trauma Center, provided data on survival of patients admitted to the University of New Mexico Trauma Center between 1991 and 1994.
# trauma <- read.table("http://www.ics.uci.edu/~wjohnson/BIDA/Ch8/trauma300.txt, # yheader=TRUE) traumadata <- read.table("trauma300.txt", header=TRUE) attach(traumadata) head(traumadata)
## ID death ISS TI RTS AGE ## 1 2979 0 1 1 7.8408 25 ## 2 1167 0 9 0 7.8408 63 ## 3 116 0 29 0 2.9304 32 ## 4 1118 0 10 0 7.8408 61 ## 5 3126 0 2 1 7.8408 43 ## 6 2173 0 9 0 7.8408 27
Dr. Osler proposed a logistic regression model to estimate the probability of a patient's death using an intercept; the predictors ISS, RTS, AGE, and TI; and an interaction between AGE and TI. AGE is viewed as a surrogate for a patient's physiologic reserve, i.e., their ability to withstand trauma.
The model \[ \text{logit}(\theta_i) = \beta_1 + \beta_2 ISS_i + \beta_3 RTS_i + \beta_4 AGE_i + \beta_5 TI_i + \beta_6 (AGE*TI)_i \] is similar to logistic models used by trauma centers throughout the United States.
Introduce \(x_{i6} = (AGE*TI)_i\) and correct for the mean values of all the continuous predictor variables, then the model becomes \[ \text{logit}(\theta_i) = \alpha + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \beta_5 x_{i5} + \beta_6 x_{i6}. \]
x.check <- cbind(ISS-mean(ISS), RTS-mean(RTS), AGE-mean(AGE), TI, AGE*TI-mean(AGE*TI)) colnames(x.check) <- c("x2", "x3", "x4", "x5", "x6") head(x.check)
## x2 x3 x4 x5 x6 ## [1,] -13.28 0.553934 -6.41 1 17.633333 ## [2,] -5.28 0.553934 31.59 0 -7.366667 ## [3,] 14.72 -4.356466 0.59 0 -7.366667 ## [4,] -4.28 0.553934 29.59 0 -7.366667 ## [5,] -12.28 0.553934 11.59 1 35.633333 ## [6,] -5.28 0.553934 -4.41 0 -7.366667
model2 <- glm(death ~ x.check, family = binomial(), x = TRUE)
## ## Call: ## glm(formula = death ~ x.check, family = binomial(), x = TRUE) ## ## Deviance Residuals: ## Min 1Q Median 3Q Max ## -2.5388 -0.2921 -0.1912 -0.1135 2.9019 ## ## Coefficients: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) -3.934673 0.596240 -6.599 4.14e-11 *** ## x.checkx2 0.082666 0.027508 3.005 0.00265 ** ## x.checkx3 -0.552683 0.170824 -3.235 0.00121 ** ## x.checkx4 0.053456 0.016382 3.263 0.00110 ** ## x.checkx5 1.338003 1.333584 1.003 0.31571 ## x.checkx6 -0.005013 0.031696 -0.158 0.87433 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## (Dispersion parameter for binomial family taken to be 1) ## ## Null deviance: 157.31 on 299 degrees of freedom ## Residual deviance: 100.03 on 294 degrees of freedom ## AIC: 112.03 ## ## Number of Fisher Scoring iterations: 7
x <- model2$x y <- model2$y lupost <- function(beta, x, y) { eta <- as.numeric(x %*% beta) logp <- ifelse(eta < 0, eta - log1p(exp(eta)), - log1p(exp(- eta))) logq <- ifelse(eta < 0, - log1p(exp(eta)), - eta - log1p(exp(- eta))) logl <- sum(logp[y == 1]) + sum(logq[y == 0]) return(logl - sum(beta^2) / 50) }
R
function log1p
, which calculates the function correctly for small \(x\) avoids all problems.mcmc
package and metrop
function to obtain posterior simulations.library(mcmc) set.seed(1) beta.init <- as.numeric(coefficients(model2)) out <- metrop(lupost, beta.init, 1e3, x = x, y = y) names(out)
## [1] "accept" "batch" "initial" "final" ## [5] "accept.batch" "initial.seed" "final.seed" "time" ## [9] "lud" "nbatch" "blen" "nspac" ## [13] "scale" "debug"
out$accept
## [1] 0
scale
input.out <- metrop(out, scale=0.1, x = x, y = y) out$accept
## [1] 0.01
out <- metrop(out, scale=0.03, x = x, y = y) out$accept
## [1] 0.223
metrop
function is the output of a previous invocation. The Markov chain continues where the previous run stopped, doing just what it would have done if it had kept going, the initial state and random seed being the final state and random seed of the previous invocation. Everything stays the same except for the arguments supplied (here scale
).scale
controls the size of the Metropolis normal random walk proposal.scale = 1
.metrop
is the output of the previous invocation), each run serves as “burn-in” for its successor.out <- metrop(out, nbatch = 1e4, x = x, y = y) out$accept
## [1] 0.1994
out$time
## user system elapsed ## 1.224 0.058 1.295
plot(ts(out$batch))
acf(out$batch[,1:3])
acf(out$batch[,4:6])
acf(out$batch)
metrop
function with the O-Ring Data.scale
and illustrate the correlation in your sampler.