---
title: 'Simulation'
author: "James M. Flegal"
output:
  ioslides_presentation:
    smaller: yes
---

## Agenda

- Simulating from distributions
- Quantile transform method
- Rejection sampling

## Simulation

Why simulate?

- We want to see what a probability model actually does
- We want to understand how our procedure works on a test case
- We want to use a partly-random procedure

All of these require drawing random variables from distributions

## Simulation

We have seen R has built-in distributions: `beta`, `binom`, `cauchy`, `chisq`, `exp`, `f`, `gamma`, `geom`, `hyper`, `logis`, `lnorm`, `nbinom`, `norm`, `pois`, `t`, `tukey`, `unif`, `weibull`, `wilcox`, `signrank`

Every distribution that R handles has four functions.

- `p` for "probability", the cumulative distribution function (c.d.f.)
- `q` for "quantile", the inverse c.d.f.
- `d` for "density", the density function (p.f. or p.d.f.)
- `r` for "random", a random variable having the specified distribution

## Simulation

- Usually, R gets Uniform$(0,1)$ random variates via a pseudorandom generator, e.g. a linear congruential generator
- A sequence of Uniform$(0,1)$ random variates is then used to generate other distributions
- How?

## Example: Binomial

- Suppose we want to generate a Binomial$(1,1/3)$ using a $U \sim \text{Uniform}(0,1)$
- Consider the function $X^* = I(0 < U < 1/3)$; then $X^* \sim \text{Binomial}(1,1/3)$

## Example: Gamma

- Suppose we want to simulate $X \sim \Gamma(3/2, 1)$, which has density
\[
f_x(x) = \frac{2}{\sqrt{\pi}} \sqrt{x} e^{-x} I(0 < x < \infty)
\]
- A natural accept-reject proposal is $Y \sim \Gamma(n, 1)$ with integer $n$, since we can simulate it as a sum of $n$ independent Exp$(1)$ random variables
- However, this fails because
\[
\begin{aligned}
M & = \sup_{x>0} \frac{f_x(x)}{f_y(x)}\\
& = \sup_{x>0} \frac{\frac{2}{\sqrt{\pi}} \sqrt{x} e^{-x}}{\frac{1}{(n-1)!} x^{n-1} e^{-x}}\\
& = c \sup_{x>0} x^{-n+3/2} = \infty
\end{aligned}
\]
since
\[
n < 3/2 \text{ implies } x^{-n+3/2} \to \infty \text{ as } x \to \infty
\]
and
\[
n > 3/2 \text{ implies } x^{-n+3/2} \to \infty \text{ as } x \to 0
\]

## Example: Gamma

- Hence, we need to be a little more creative with our proposal distribution
- We could consider a mixture distribution. That is, if $f_1(z)$ and $f_2(z)$ are both densities and $p \in [0,1]$, then
\[
p f_1(z) + (1-p) f_2(z)
\]
is also a density
- Consider a proposal that is a mixture of a $\Gamma(1,1) = \text{Exp}(1)$ and a $\Gamma(2,1)$, i.e.
\[
f_y(y) = \left[ p e^{-y} + (1-p) y e^{-y} \right] I(0 < y < \infty)
\]

## Example: Gamma

Now, we have
\[
\begin{aligned}
M & = \sup_{x>0} \frac{f_x(x)}{f_y(x)}\\
& = \sup_{x>0} \frac{\frac{2}{\sqrt{\pi}} \sqrt{x} e^{-x}}{p e^{-x} + (1-p) x e^{-x}}\\
& = \frac{2}{\sqrt{\pi}} \sup_{x>0} \frac{\sqrt{x}}{p + (1-p) x} \\
& = \frac{2}{\sqrt{\pi}} \frac{1}{2 \sqrt{p(1-p)}}
\end{aligned}
\]

Exercise: Prove the last line, i.e. maximize $h(x) = \frac{\sqrt{x}}{p + (1-p) x}$ (or $\log h(x)$) for $x > 0$.

## Example: Gamma

- Note that $M$ is minimized when $p = 1/2$, so that $M_{1/2} = 2 / \sqrt{\pi} \approx 1.1284$
- Then the accept-reject algorithm to simulate $X \sim \Gamma(3/2, 1)$ is as follows

1. Draw $Y \sim f_y$, taking $p = 1/2$ in
\[
f_y(y) = \left[ p e^{-y} + (1-p) y e^{-y} \right] I(0 < y < \infty)
\]
and independently draw $U \sim \text{Uniform}(0,1)$
2. If
\[
u < \frac{f_x(y)}{M f_y(y)} = \frac{\sqrt{\pi}}{2} \frac{f_x(y)}{f_y(y)} = \frac{2 \sqrt{y}}{1+y}
\]
set $X = Y$; otherwise return to 1

## Simulating from mixtures

- Write $f(z) = p f_1(z) + (1-p) f_2(z)$ as the marginal of the joint given by
\[
f(z \mid w) = f_1(z) I(w=1) + f_2(z) I(w=0)
\]
where $W \sim \text{Binomial}(1,p)$
- Thus to simulate from $f(z)$
    1. Draw $U \sim \text{Uniform}(0,1)$
    2. If $u < p$, take $Z \sim f_1(z)$; otherwise take $Z \sim f_2(z)$
- Exercise: Show $Z \sim f(z)$

## Example: Gamma

```{r}
ar.gamma <- function(n = 100) {
  x <- double(n)
  i <- 1
  while (i < (n + 1)) {
    # draw y from the mixture proposal with p = 1/2
    u <- runif(1)
    if (u < .5) {
      y <- -1 * log(1 - runif(1))       # Exp(1), i.e. Gamma(1,1)
    } else {
      y <- sum(-1 * log(1 - runif(2)))  # sum of two Exp(1), i.e. Gamma(2,1)
    }
    # accept y with probability 2 * sqrt(y) / (1 + y)
    u <- runif(1)
    temp <- 2 * sqrt(y) / (1 + y)
    if (u < temp) {
      x[i] <- y
      i <- i + 1
    }
  }
  return(x)
}
```

## Example: Gamma

```{r}
x <- ar.gamma(10000)
hist(x)
mean(x)  # should be near the true mean, 3/2
```

## Example: Gamma

```{r}
true.x <- seq(0, 10, .1)
true.y <- dgamma(true.x, 3/2, 1)
hist(x, freq = F, breaks = 30, xlab = "x", ylab = "f(x)",
     main = "Histogram and Theoretical")
points(true.x, true.y, type = "l", col = "red", lwd = 2)
```

## Example: Beta

Suppose the pdf $f$ is zero outside an interval $[c,d]$, and $\leq M$ on the interval.

```{r echo=FALSE}
plot(c(0,1), c(0,3), type = "n", main = "A Sample Distribution",
     ylab = "Density f(x)", xlab = "x")
curve(dbeta(x, 3, 6), add = TRUE)
lines(c(0,0,1,1), c(0,3,3,0))
```

## Example: Beta

We know how to draw from uniform distributions in any dimension. Do it in two:

```{r}
x1 <- runif(300, 0, 1)
y1 <- runif(300, 0, 2.6)
selected <- y1 < dbeta(x1, 3, 6)
```

```{r echo=FALSE}
plot(c(0,1), c(0,3), type = "n", main = "A Sample Distribution",
     ylab = "Density f(x)", xlab = "x")
curve(dbeta(x, 3, 6), add = TRUE)
lines(c(0,0,1,1), c(0,3,3,0))
points(x1, y1, col = 1 + 1 * selected, cex = 0.1)
```

## Example: Beta

```{r}
mean(selected)
accepted.points <- x1[selected]
mean(accepted.points < 0.5)
pbeta(0.5, 3, 6)
```

## Example: Beta

For this to work efficiently, we have to cover the target distribution with one that sits close to it.
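Since the target density integrates to one, the expected acceptance rate of this box scheme is the ratio of the two areas, $1/(M(d-c))$ for a box of height $M$ over $[c,d]$. A quick sketch for the two box heights used in this example (the helper name `box.accept` is ours, not base R):

```{r}
# expected fraction of points accepted = area under f / area of the box
box.accept <- function(height, width = 1) 1 / (height * width)
box.accept(2.6)  # snug box over [0,1]: about 0.38 accepted
box.accept(10)   # loose box: only 0.10 accepted, most draws wasted
```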
```{r}
x2 <- runif(100000, 0, 1)
y2 <- runif(100000, 0, 10)
selected <- y2 < dbeta(x2, 3, 6)
mean(selected)
```

## Example: Beta

```{r echo=FALSE}
plot(c(0,1), c(0,6), type = "n", main = "A Sample Distribution",
     ylab = "Density f(x)", xlab = "x")
curve(dbeta(x, 3, 6), add = TRUE)
lines(c(0,0,1,1), c(0,3,3,0))
points(x2, y2, col = 1 + 1 * selected, cex = 0.1)
```

## Alternatives

- Squeezed rejection sampling may help if evaluating $f$ is expensive
- Adaptive rejection sampling may help generate an envelope
- ...

## Box-Muller

- The __Box-Muller transformation__ generates pairs of independent, standard normally distributed random numbers, given a source of uniformly distributed random numbers
- Let $U \sim \text{Uniform}(0,1)$ and $V \sim \text{Uniform}(0,1)$ and set
$$
R = \sqrt{-2\log U} \hspace{10mm} \textrm{ and } \hspace{10mm} \theta = 2\pi V
$$
- Then the following transformation yields two independent normal random variates
$$
X = R\cos(\theta) \hspace{10mm} \textrm{ and } \hspace{10mm} Y = R\sin(\theta)
$$

## Summary

- Can transform uniform draws into other distributions when we can compute the distribution function
    + Quantile method when we can invert the CDF
    + The rejection method if all we have is the density
- Basic R commands encapsulate a lot of this for us
- Optimized algorithms based on distribution and parameter values

## Exercise: Box-Muller

1. Write a function named `bmnormal` that simulates `n` draws from a Normal random variable with mean `mu` and standard deviation `sd` using the Box-Muller transformation.
2. Inputs to your function should be `n`, `mu`, and `sd`.
3. Simulate 2000 draws from a Normal with mean 10 and standard deviation 3.
4. Convince yourself with a plot that your sampler is working correctly. Is there a test you could consider also?
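## Exercise: Box-Muller

One possible sketch of `bmnormal` for checking your answer (the defaults `mu = 0`, `sd = 1` and the trimming to exactly `n` draws are just one set of choices):

```{r}
bmnormal <- function(n, mu = 0, sd = 1) {
  m <- ceiling(n / 2)       # each (U, V) pair yields two independent draws
  u <- runif(m)
  v <- runif(m)
  r <- sqrt(-2 * log(u))
  theta <- 2 * pi * v
  z <- c(r * cos(theta), r * sin(theta))[1:n]
  mu + sd * z               # rescale N(0,1) draws to N(mu, sd^2)
}
x <- bmnormal(2000, mu = 10, sd = 3)
c(mean(x), sd(x))
qqnorm(x); qqline(x)
```

A formal check could be a Kolmogorov-Smirnov test against the target distribution, e.g. `ks.test(x, "pnorm", 10, 3)`.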