- Course overview
- Class objectives
- First 'presentations' on Thursday!
- Motivating example
- Review from undergraduate Mathematical Statistics
- Software tutorials
This class is an introduction to Bayesian statistics including "subjective probability, Renyi axiom system, Savage axioms, coherence, Bayes theorem, credibility intervals, Lindley paradox, empirical Bayes estimation, natural conjugate priors, de Finetti's theorem, approximation methods, Bayesian bootstrap, Bayesian computer programs".
The class will be taught in the R language.
There will be approximately eight participation/presentation exercises, five homework assignments, and a final exam. Grades will be calculated as follows:
Strongly recommended books:
Other books that may be helpful:
There are many online resources for learning R and working with it, in addition to the textbooks:
You are encouraged to discuss course material, including assignments, with your classmates. All work you turn in, however, must be your own. This includes both writing and code. Copying from other students, from books, or from websites (1) does nothing to help you learn, (2) is easy to detect, and (3) has serious negative consequences.
Presenters will be chosen using the `sample()` function in R, weighting away from participants recently chosen.

```r
names <- c("VJ", "Lauren", "Cangao", "Yuting", "Ying", "Huiling", "Samantha", "Jinhui", "Song", "Mi")
count <- rep(1, length(names))
chosen <- sample(names, 1, prob = 1/count)
chosen
```

```
## [1] "Lauren"
```

```r
count[chosen == names] <- count[chosen == names] + 1  # +1 subject to change
count
```

```
## [1] 1 2 1 1 1 1 1 1 1 1
```
What do you like or dislike about Bayesian statistics? Are you a Bayesian? What are some lurking complexities? What did you find interesting in reading Chapter 1? What was interesting in Efron's paper?

This new probability distribution, called the posterior distribution, describes knowledge about \(\theta\) and is the fundamental tool in Bayesian statistical analysis. Typically, we use computer simulations to approximate the posterior distribution. Occasionally, we can find it mathematically, as in this example.
A 95% equal-tailed probability interval for \(\theta\) is then (0.08,0.10).
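As a sketch of how such an interval is usually obtained in practice, we can take equal-tailed quantiles of simulated posterior draws. The posterior and its parameters below are invented for illustration; they are not the ones from the motivating example.

```r
# Hypothetical posterior draws (Beta parameters chosen arbitrarily for illustration)
set.seed(1)
theta <- rbeta(1e5, 44, 456)

# Equal-tailed 95% interval: cut off 2.5% in each tail
quantile(theta, probs = c(0.025, 0.975))
```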
The data changes your uncertainty, which is then described by a new probability distribution called your posterior distribution.
Most of Bayesian inference is about how to go from prior to posterior.
Suppose the prior distribution for \(p\) is Beta(\(\alpha_1, \alpha_2\)) and the conditional distribution of \(x\) given \(p\) is Bin(\(n\), \(p\)). Then \[ f(x|p) = {n \choose x} p^x (1-p)^{n-x} \] and \[ g(p) = \frac{\Gamma(\alpha_1 + \alpha_2)}{\Gamma(\alpha_1)\Gamma(\alpha_2)} p^{\alpha_1 -1} (1-p)^{\alpha_2 - 1}. \] Then \[ f(x|p) g(p) = {n \choose x} \frac{\Gamma(\alpha_1 + \alpha_2)}{\Gamma(\alpha_1)\Gamma(\alpha_2)} p^{x + \alpha_1 -1} (1-p)^{n - x + \alpha_2 - 1} \] and this, considered as a function of \(p\) for fixed \(x\), is, except for constants, the PDF of a Beta(\(x + \alpha_1, n - x + \alpha_2\)) distribution. So that is the posterior.
Why? \[ \begin{aligned} h (p | x) & = \frac{f( x | p) g(p)}{\int f( x | p) g(p) \, d p} \\ & \propto f( x | p) g(p) \\ & = {n \choose x} \frac{\Gamma(\alpha_1 + \alpha_2)}{\Gamma(\alpha_1)\Gamma(\alpha_2)} p^{x + \alpha_1 -1} (1-p)^{n - x + \alpha_2 - 1} \\ & \propto p^{x + \alpha_1 -1} (1-p)^{n - x + \alpha_2 - 1} \end{aligned} \] And there is only one PDF with support \([0,1]\) of that form, i.e., a Beta(\(x + \alpha_1, n - x + \alpha_2\)) distribution. So that is the posterior.
\[ \text{likelihood } × \text{ unnormalized prior } = \text{ unnormalized posterior} \]
In our example we could have multiplied likelihood \[ p^x (1-p)^{n-x} \] times unnormalized prior \[ p^{\alpha_1 -1} (1-p)^{\alpha_2 - 1} \] to get unnormalized posterior \[ p^{x + \alpha_1 -1} (1-p)^{n - x + \alpha_2 - 1} \] which, as before, can be recognized as an unnormalized beta PDF.
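As a quick numerical check of this conjugacy argument (a minimal sketch; the values of \(n\), \(x\), \(\alpha_1\), \(\alpha_2\) below are arbitrary, not taken from the text), we can normalize the unnormalized posterior by numerical integration and compare it with the Beta(\(x + \alpha_1, n - x + \alpha_2\)) density:

```r
# Arbitrary illustrative values (not from the text)
n <- 20; x <- 7; alpha1 <- 2; alpha2 <- 3

# Unnormalized posterior: likelihood times unnormalized prior
unnorm <- function(p) p^(x + alpha1 - 1) * (1 - p)^(n - x + alpha2 - 1)

# Normalize numerically and compare with the conjugate Beta posterior at a few points
const <- integrate(unnorm, 0, 1)$value
p <- c(0.1, 0.3, 0.5, 0.7)
cbind(numerical = unnorm(p) / const,
      conjugate = dbeta(p, x + alpha1, n - x + alpha2))
```

The two columns agree, which is the "unnormalized posterior is an unnormalized beta PDF" observation in numerical form.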
Suppose \(X_1 , \dots, X_n\) are i.i.d. \(N(\theta , \sigma^2)\) where \(\sigma^2\) is known. Suppose further we have a prior \(\theta \sim N(\mu, \tau^2)\). Then the posterior can be obtained as follows, \[ \begin{aligned} f (\theta | x) & \propto f(\theta) \prod_{i=1}^n f(x_i | \theta) \\ & \propto \exp \left \{ -\frac{1}{2} \left( \frac{(\theta-\mu)^2}{\tau^2} + \frac{\sum_{i=1}^{n} (x_i - \theta)^2}{\sigma^2} \right) \right\} \\ & \propto \exp \left \{ -\frac{1}{2} \frac{\left( \theta - \displaystyle \frac{\mu / \tau^2 + n\bar{x} / \sigma^2}{1/\tau^2 + n/\sigma^2} \right)^2}{\displaystyle \frac{1}{1/\tau^2 + n/\sigma^2}} \right\}. \end{aligned} \]
That is, \(\theta \mid x \sim N(\mu_n, \tau_n^2)\), where \[ \mu_n = \left( \frac{\mu}{\tau^2} + \frac{n \bar{x}}{\sigma^2} \right) \tau_n^2 \quad \mbox{and} \quad \tau_n^2 = \frac{1}{1/\tau^2 + n/\sigma^2} . \] We will call this a conjugate Bayes model. Also note that a 95% credible interval for \(\theta\) (which is also the HPD, highest posterior density, interval) is given by \[ \left( \mu_n - 1.96 \tau_n, \ \mu_n + 1.96 \tau_n \right) . \] For large \(n\), the data will overwhelm the prior.
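A small numerical sketch of this update (the prior parameters and simulated data below are invented for illustration, not from the text):

```r
# Invented prior and data, purely for illustration
mu <- 0; tau2 <- 4            # prior mean and variance for theta
sigma2 <- 1                   # known sampling variance
set.seed(2)
x <- rnorm(25, mean = 1.5, sd = sqrt(sigma2))
n <- length(x); xbar <- mean(x)

# Posterior variance and mean from the conjugate formulas
tau2_n <- 1 / (1/tau2 + n/sigma2)
mu_n   <- (mu/tau2 + n*xbar/sigma2) * tau2_n

# 95% credible interval (also the HPD, since the posterior is normal)
c(mu_n - 1.96*sqrt(tau2_n), mu_n + 1.96*sqrt(tau2_n))
```

Here the posterior mean puts weight \((n/\sigma^2)/(1/\tau^2 + n/\sigma^2) \approx 0.99\) on \(\bar{x}\), so even with \(n = 25\) the data already dominate the prior.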
Software tutorials may cover R or a package in R, JAGS, MATLAB, OpenBUGS, or Stan. The `mcmc` and `mcmcse` R packages will be presented during Week 5 (potentially Week 4). My tutorial will be made using Markdown.