Agenda

  • The Monte Hall Problem
  • Science, Priors, and Prediction
  • Statistical Models
  • Binomial Data (again)

Bayes' Theorem

  • For two events A and B the conditional probability of \(A\) given \(B\) is defined as \[ P(A|B) = \frac{P(A\cap B)}{P(B)}, \] where \(A\cap B\) denotes the intersection of \(A\) and \(B\). Let \(A^c\) denote the complement of \(A\).

  • Then Bayes' Theorem allows us to compute \(P(A|B)\) from \(P(B|A)\), \(P(B|A^c)\), and \(P(A)\) via \[ P(A|B) = \frac{P(B|A) P(A)}{P(B|A)P(A) + P(B|A^c)P(A^c)}. \]

  • Direct result of the definition of conditional probability (numerator) and the Law of Total Probability (denominator).
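
A quick numerical check of the theorem in R (the probabilities below are made up purely to illustrate the formula):

    ## Made-up probabilities, chosen only to illustrate Bayes' Theorem
    p_A    <- 0.3    # P(A)
    p_B_A  <- 0.9    # P(B|A)
    p_B_Ac <- 0.2    # P(B|A^c)

    ## Law of Total Probability gives the denominator
    p_B <- p_B_A * p_A + p_B_Ac * (1 - p_A)

    ## Bayes' Theorem
    p_A_B <- p_B_A * p_A / p_B
    p_A_B    # 0.27 / 0.41, about 0.66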

The Monte Hall Problem

On the television show Let's Make a Deal, hosted by Monte Hall, the grand prize was awarded in the following manner. The prize was placed behind one of three doors. The contestant selected a door. Monte then showed the contestant what was behind one of the other two doors but it was never the grand prize. Finally, the contestant was allowed either to keep their initial choice or switch to the remaining unopened door.

Some people's intuition is that there is a 50/50 chance that the prize is behind either of the two remaining unopened doors, so it would not matter if you switch. In fact, the probability is 2/3 that the prize is behind the other door that Monte did not open. One intuitive way to arrive at this conclusion argues that you already know the prize is not behind at least one of the two doors you did not select, and the fact that Monte showed you it was not behind one of them gives you no additional information. However, by switching from your initial choice, essentially you are being allowed to get both of the other two doors, and thus have a 2/3 chance of getting the prize. This argument is rather inexact, so we now give a careful argument using Bayes' Theorem.

The Monte Hall Problem

  • There are three variables involved: let \(P\) denote the door that contains the prize, \(C\) the door that you initially chose, and \(S\) the door that Monte shows you. We assume that the prize is randomly placed so \[ P(P = p) = 1/3 \text{ for } p = 1,2,3. \]
  • The prize is placed prior to your choice of door, so it is independent of \(C\) and \[ P(P = p) = P(P=p | C=c) = 1/3 \text{ for all } p \text{ and } c. \]
  • What Monte shows you depends on where the prize is and what door you have chosen, so the door he shows you is selected according to a conditional probability \(P(S = s|P = p,C = c)\). Assume that you initially chose door number 1, so \(C = 1\).

The Monte Hall Problem

  • Monte never shows us the prize. If the prize is behind door number 1, Monte randomly picks either door 2 or door 3 and shows it to us. If the prize is behind door number 2, Monte shows us door 3. If the prize is behind door number 3, Monte shows us door 2.
  • Write \(f(s|p) = P(S=s | P=p, C=1)\); the resulting probabilities are summarized in the table below.

Summary of probabilities.

              s = 1   s = 2   s = 3
     f(s|1)     0      0.5     0.5
     f(s|2)     0      0.0     1.0
     f(s|3)     0      1.0     0.0

The Monte Hall Problem

  • Using Bayes' Theorem \[ \begin{aligned} P(P=1|S=2) & = \frac{f(2|1) P(P=1)}{f(2|1) P(P=1)+f(2|2) P(P=2)+f(2|3) P(P=3)} \\ & = \frac{0.5 (1/3)}{0.5 (1/3) + 0 (1/3) + 1 (1/3)} \\ & = 1/3 \end{aligned} \] and \[ \begin{aligned} P(P=3|S=2) & = \frac{f(2|3) P(P=3)}{f(2|1) P(P=1)+f(2|2) P(P=2)+f(2|3) P(P=3)} \\ & = \frac{1 (1/3)}{0.5 (1/3) + 0 (1/3) + 1 (1/3)} \\ & = 2/3 . \end{aligned} \]
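
The same arithmetic can be checked in R; a minimal sketch using the prior and the \(f(2|p)\) column from the table above:

    prior <- rep(1/3, 3)          # P(P = p) for p = 1, 2, 3
    f2    <- c(0.5, 0.0, 1.0)     # f(2|p) = P(S = 2 | P = p, C = 1)

    posterior <- f2 * prior / sum(f2 * prior)   # Bayes' Theorem
    posterior                                   # 1/3, 0, 2/3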

The Monte Hall Problem

  • Thus, if Monte shows us door number 2, the probability of getting the prize is 2/3 if we switch doors.
  • Similarly, if Monte shows us door number 3 the probability of getting the prize is 2/3 if we switch doors.
  • Similar results hold regardless of the initial choice \(C\).
  • The problem is featured in the movie 21 and the television show Numb3rs.
  • There is also an \(n\)-stage version of the Monte Hall problem.
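
As a sanity check, the 2/3 result can also be reproduced by simulation. The following R sketch (not from the text) plays the game many times with the contestant always choosing door 1 and always switching:

    set.seed(1)
    n_games <- 100000
    prize  <- sample(1:3, n_games, replace = TRUE)   # random prize placement
    choice <- 1                                      # initial choice is door 1

    ## Monte opens a door that is neither the contestant's choice nor the prize
    shown <- sapply(prize, function(p) {
      doors <- setdiff(1:3, c(choice, p))
      if (length(doors) == 1) doors else sample(doors, 1)
    })

    switch_to <- 6 - choice - shown    # the remaining unopened door (door numbers sum to 6)
    mean(switch_to == prize)           # close to 2/3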

Science, Priors, and Prediction

  • Bayesian statistics starts by using (prior) probabilities to describe your current state of knowledge. It then incorporates information through the collection of data. This results in new (posterior) probabilities to describe your state of knowledge after combining the prior probabilities with the data.
  • All uncertainty and information are incorporated through probability distributions, and all conclusions obey the laws of probability theory.
  • Some form of prior 'information' is always available, but may not translate to a probability distribution. The 'information' also may not directly relate to the parameters of statistical models.
  • Typically, we obtain 'characteristics' of the population under study from an expert and then identify a mathematically convenient prior that agrees with those characteristics.
  • After statisticians develop such a prior distribution, they should always return to the expert to validate that the prior is a reasonable approximation to the expert's actual information.

Science, Priors, and Prediction

  • Although parameters are often mere conveniences, frequently the parameter \(\theta\) has some basis in physical reality.
  • Rather than describing where \(\theta\) really is, the prior describes beliefs about where \(\theta\) is.
  • Say \(\theta\) is the mean global temperature change in the last 50 years.
  • Different climate scientists will have different knowledge bases and therefore different probabilities for \(\theta\).
  • If they analyze the same data, they will continue to have different opinions about \(\theta\) until sufficient data are collected so that their beliefs converge and a consensus is reached.
  • This should occur unless one or more of them is unrealistically dogmatic.

Statistical Models

  • Statistical models are useful tools for scientific prediction.
  • Parameters \(\theta\) are often selected for convenience in building models that predict well.
  • Use of parameters is not a fundamental aspect of Bayesian analysis.
  • Relationship between parameters and observations may not be obvious.
  • Instead we can focus on observables (prediction) and parameters that are closely related to observables.
  • Before discussing prediction, we discuss posterior distributions for the parameters of a statistical model.

Statistical Models

  • Statistical models typically involve multiple observations (random variables), say, \(y_1, \dots, y_n\).
  • Observations are modeled as independent given the parameters of the model, which we denote \(\theta = (\theta_1, \dots, \theta_r)\).
  • Bayesian statistics begins with prior information about the state of nature \(\theta\) embodied in the prior density \(p(\theta)\).
  • Use Bayes' Theorem and the random data \(y\), with sampling density \(f(y|\theta)\), to update this information into a posterior density \(p(\theta|y)\) that incorporates both the prior information and the data.
  • Bayes' Theorem tells us that \[ p(\theta|y) = \frac{f(y|\theta)p(\theta)}{\int f(y|\theta)p(\theta) d \theta}. \]
  • The text illustrates these ideas with Binomial, Bernoulli, and normal data. In these cases the mathematics is relatively simple but by no means trivial.
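
When \(\theta\) is one-dimensional, the integral in the denominator of the posterior formula above can be approximated on a grid, which makes the formula concrete. A minimal R sketch, using a binomial likelihood with made-up data and a uniform prior purely as an illustration:

    theta  <- seq(0.0005, 0.9995, length.out = 2000)
    dtheta <- theta[2] - theta[1]

    lik   <- dbinom(3, size = 12, prob = theta)   # f(y | theta), made-up data y = 3, n = 12
    prior <- dbeta(theta, 1, 1)                   # uniform prior p(theta) on (0, 1)

    post <- lik * prior / sum(lik * prior * dtheta)   # approximate p(theta | y)
    sum(post * dtheta)                                # integrates to about 1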

Binomial Data

  • Suppose we are interested in assessing the proportion of U.S. transportation industry workers who use drugs on the job.
  • Let \(\theta\) denote this proportion and assume that a random sample of \(n\) workers is to be taken.
  • The number of positive tests is denoted \(y\). In particular, we took \(n = 10\) samples and obtained \(y = 2\) positive test results.
  • Data obtained like this follow a Binomial distribution, where we write \[ y | \theta \sim \text{Bin}(n, \theta). \]
  • Recall, the discrete density function for \(y = 0, \dots, n\) is \[ f(y|\theta) = {n \choose y} \theta^y (1 - \theta)^{n-y}. \]
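
In R the binomial density is available as dbinom(); for the observed data (\(n = 10\), \(y = 2\)) it can be evaluated at any value of \(\theta\), for example:

    theta <- c(0.1, 0.2, 0.3)
    dbinom(2, size = 10, prob = theta)        # f(2 | theta) for n = 10
    choose(10, 2) * theta^2 * (1 - theta)^8   # the same numbers from the formula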

Beta Prior

  • It is convenient (but not necessary) to use a Beta distribution as a model for the prior.
  • We will later see the Beta distribution is conjugate to the Binomial distribution.
  • We will say \[ \theta \sim \text{Beta}(a,b), \] with density \[ p(\theta) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \theta^{a-1} (1 - \theta)^{b-1} I(0 \le \theta \le 1). \]
  • Hyperparameters \(a\) and \(b\) are selected to reflect the researcher's beliefs and uncertainty.
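
One way to examine a candidate Beta prior is to look at its mean, a central interval, and its density. A short R sketch, using the hyperparameter values adopted in the example that follows:

    a <- 3.44; b <- 22.99
    a / (a + b)                        # prior mean, about 0.13
    qbeta(c(0.025, 0.975), a, b)       # central 95% prior interval
    curve(dbeta(x, a, b), from = 0, to = 1,
          xlab = expression(theta), ylab = "prior density")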

Posterior Distribution

  • The posterior distribution turns out to be another Beta distribution, specifically \[ \theta | y \sim \text{Beta}(y+a, n-y+b). \]
  • For the transportation example, we observed \(n = 10\) and \(y = 2\). Then if \(a = 3.44\) and \(b = 22.99\) (justified later) we have the posterior \[ \theta | y \sim \text{Beta}(2+3.44,10-2+22.99) = \text{Beta}(5.44,30.99). \]
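  • A sketch of why the Beta form arises: the posterior is proportional to the likelihood times the prior, and the normalizing constant does not involve \(\theta\), so \[ p(\theta|y) \propto \theta^y (1-\theta)^{n-y} \, \theta^{a-1} (1-\theta)^{b-1} = \theta^{y+a-1} (1-\theta)^{n-y+b-1}, \] which is the kernel of a \(\text{Beta}(y+a, n-y+b)\) density.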

Posterior Distribution

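A minimal R sketch (not the original slide's code) that plots the prior and posterior densities from the example and computes a few posterior summaries:

    theta <- seq(0, 1, length.out = 500)
    plot(theta, dbeta(theta, 5.44, 30.99), type = "l", lwd = 2,
         xlab = expression(theta), ylab = "density")
    lines(theta, dbeta(theta, 3.44, 22.99), lty = 2)
    legend("topright",
           legend = c("posterior Beta(5.44, 30.99)", "prior Beta(3.44, 22.99)"),
           lwd = c(2, 1), lty = c(1, 2))

    5.44 / (5.44 + 30.99)                 # posterior mean, about 0.15
    qbeta(c(0.025, 0.975), 5.44, 30.99)   # central 95% posterior interval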