---
title: 'Simulation'
author: "James M. Flegal"
output:
  ioslides_presentation:
    smaller: yes
---

## Agenda

- Simulating from distributions
- Quantile transform method
- Rejection sampling

## Simulation

Why simulate?

- We want to see what a probability model actually does
- We want to understand how our procedure works on a test case
- We want to use a partly-random procedure

All of these require drawing random variables from distributions

## Simulation

We have seen R has built-in distributions: beta, binom, cauchy, chisq, exp, f, gamma, geom, hyper, logis, lnorm, nbinom, norm, pois, t, tukey, unif, weibull, wilcox, signrank

Every distribution that R handles has four functions.

- p for "probability", the cumulative distribution function (c.d.f.)
- q for "quantile", the inverse c.d.f.
- d for "density", the density function (p.f. or p.d.f.)
- r for "random", a random variable having the specified distribution

## Simulation

- Usually, R gets Uniform$(0,1)$ random variates via a pseudorandom generator, e.g. the linear congruential generator
- It uses a sequence of Uniform$(0,1)$ random variates to generate other distributions
- How?

## Example: Binomial

- Suppose we want to generate a Binomial$(1,1/3)$ using a $U \sim \text{Uniform}(0,1)$
- Consider the function $X^* = I(0 < U < 1/3)$
- Then $P(X^* = 1) = P(U < 1/3) = 1/3$, so $X^* \sim \text{Binomial}(1,1/3)$

## Quantile transform method

- More generally, let $F$ be a c.d.f. with quantile function $F^{-1}$ and let $U \sim \text{Uniform}(0,1)$
- Then $X = F^{-1}(U)$ has c.d.f. $F$, since $P(X \leq x) = P(F^{-1}(U) \leq x) = P(U \leq F(x)) = F(x)$
- So whenever we can compute $F^{-1}$, we can transform uniform draws into draws from $F$; e.g. for Exp$(1)$, $F^{-1}(u) = -\log(1-u)$

## Rejection sampling

- Suppose we want to simulate $X$ with density $f_x$ and we can simulate $Y$ with density $f_y$ such that $$M = \sup_{x} \frac{f_x(x)}{f_y(x)} < \infty$$
- Accept-reject algorithm:
    1. Draw $Y \sim f_y$ and independently draw $U \sim \text{Uniform}(0,1)$
    2. If $u < \frac{f_x(y)}{M f_y(y)}$, set $X = Y$; otherwise return to 1
- On average, $M$ candidate draws are needed per accepted draw, so a smaller $M$ is better

## Example: Gamma

- Suppose we want to simulate $X \sim \Gamma(3/2, 1)$, which has density $$f_x(x) = \frac{2}{\sqrt{\pi}} \sqrt{x} e^{-x} I(0 < x < \infty)$$
- A natural candidate is $Y \sim \Gamma(n, 1)$ with $n$ a positive integer, since $Y$ is then a sum of $n$ Exp$(1)$ variates, but

$$\begin{aligned} M & = \sup_{x>0} \frac{f_x(x)}{f_y(x)}\\ & = \sup_{x>0} \frac{\frac{2}{\sqrt{\pi}} \sqrt{x} e^{-x}}{\frac{1}{(n-1)!} x^{n-1} e^{-x}}\\ & = c \sup_{x>0} x^{-n+3/2} = \infty \end{aligned}$$

since $n < 3/2$ implies $x^{-n+3/2} \to \infty$ as $x \to \infty$ and $n > 3/2$ implies $x^{-n+3/2} \to \infty$ as $x \to 0$

## Example: Gamma

- Hence, we need to be a little more creative with our proposal distribution
- We could consider a mixture distribution. That is, if $f_1(z)$ and $f_2(z)$ are both densities and $p \in [0,1]$, then $p f_1(z) + (1-p) f_2(z)$ is also a density
- Consider a proposal that is a mixture of a $\Gamma(1,1) = \text{Exp}(1)$ and a $\Gamma(2,1)$, i.e.
$$f_y(y) = \left[ p e^{-y} + (1-p) y e^{-y} \right] I(0 < y < \infty)$$

## Example: Gamma

Now, we have

$$\begin{aligned} M & = \sup_{x>0} \frac{f_x(x)}{f_y(x)}\\ & = \sup_{x>0} \frac{\frac{2}{\sqrt{\pi}} \sqrt{x} e^{-x}}{p e^{-x} + (1-p) x e^{-x}}\\ & = \frac{2}{\sqrt{\pi}} \sup_{x>0} \frac{ \sqrt{x}}{p + (1-p) x} \\ & = \frac{2}{\sqrt{\pi}} \frac{1}{2 \sqrt{p(1-p)}} \end{aligned}$$

Exercise: Prove the last line, i.e. maximize $h(x) = \frac{ \sqrt{x}}{p + (1-p) x}$ for $x>0$, or equivalently maximize $\log h(x)$.

## Example: Gamma

- Note that $M$ is minimized when $p=1/2$, so that $M_{1/2} = 2 / \sqrt{\pi} \approx 1.1284$
- Then the accept-reject algorithm to simulate $X \sim \Gamma(3/2, 1)$ is as follows
    1. Draw $Y \sim f_y$ with $p = 1/2$, i.e. $$f_y(y) = \frac{1}{2} \left[ e^{-y} + y e^{-y} \right] I(0 < y < \infty)$$ and independently draw $U \sim \text{Uniform}(0,1)$
    2. If $$u < \frac{f_x(y)}{M_{1/2} f_y(y)} = \frac{\sqrt{\pi}}{2} \frac{f_x(y)}{f_y(y)} = \frac{2 \sqrt{y}}{1+y}$$ set $X=Y$; otherwise return to 1

## Simulating from mixtures

- Write $f(z) = p f_1(z) + (1-p) f_2(z)$ as the marginal of the joint given by $$f(z \mid w) = f_1(z) I(w=1) + f_2(z) I(w=0)$$ where $W \sim \text{Binomial}(1,p)$
- Thus to simulate from $f(z)$
    1. Draw $U \sim \text{Uniform}(0,1)$
    2. If $u < p$, take $Z \sim f_1(z)$; otherwise take $Z \sim f_2(z)$
- Exercise: Show $Z \sim f(z)$

## Example: Gamma

```{r}
ar.gamma <- function(n = 100) {
  x <- double(n)
  i <- 1
  while (i < (n + 1)) {
    # Draw y from the mixture proposal with p = 1/2:
    # Exp(1) with prob. 1/2, Gamma(2,1) (sum of two Exp(1)) with prob. 1/2
    u <- runif(1)
    if (u < .5) {
      y <- -1 * log(1 - runif(1))
    } else {
      y <- sum(-1 * log(1 - runif(2)))
    }
    # Accept y with probability 2 * sqrt(y) / (1 + y)
    u <- runif(1)
    temp <- 2 * sqrt(y) / (1 + y)
    if (u < temp) {
      x[i] <- y
      i <- i + 1
    }
  }
  return(x)
}
```

## Example: Gamma

```{r}
x <- ar.gamma(10000)
hist(x)
mean(x)
```

## Example: Gamma

```{r}
true.x <- seq(0, 10, .1)
true.y <- dgamma(true.x, 3/2, 1)
hist(x, freq = F, breaks = 30, xlab = "x", ylab = "f(x)",
     main = "Histogram and Theoretical")
points(true.x, true.y, type = "l", col = "red", lwd = 2)
```

## Example: Beta

Suppose the pdf $f$ is zero outside an interval $[c,d]$, and $\leq M$ on the interval.
```{r echo=FALSE}
plot(c(0,1), c(0,3), type = "n", main = "A Sample Distribution",
     ylab = "Density f(x)", xlab = "x")
curve(dbeta(x, 3, 6), add = TRUE)
lines(c(0,0,1,1), c(0,3,3,0))
```

## Example: Beta

We know how to draw from uniform distributions in any dimension. Do it in two:

```{r}
x1 <- runif(300, 0, 1)
y1 <- runif(300, 0, 2.6)
selected <- y1 < dbeta(x1, 3, 6)
```

```{r echo=FALSE}
plot(c(0,1), c(0,3), type = "n", main = "A Sample Distribution",
     ylab = "Density f(x)", xlab = "x")
curve(dbeta(x, 3, 6), add = TRUE)
lines(c(0,0,1,1), c(0,3,3,0))
points(x1, y1, col = 1 + 1 * selected, cex = 0.1)
```

## Example: Beta

```{r}
mean(selected)
accepted.points <- x1[selected]
mean(accepted.points < 0.5)
pbeta(0.5, 3, 6)
```

## Example: Beta

For this to work efficiently, we have to cover the target distribution with one that sits close to it.

```{r}
x2 <- runif(100000, 0, 1)
y2 <- runif(100000, 0, 10)
selected <- y2 < dbeta(x2, 3, 6)
mean(selected)
```

## Example: Beta

```{r echo=FALSE}
plot(c(0,1), c(0,6), type = "n", main = "A Sample Distribution",
     ylab = "Density f(x)", xlab = "x")
curve(dbeta(x, 3, 6), add = TRUE)
lines(c(0,0,1,1), c(0,3,3,0))
points(x2, y2, col = 1 + 1 * selected, cex = 0.1)
```

## Alternatives

- Squeezed rejection sampling may help if evaluating $f$ is expensive
- Adaptive rejection sampling may help generate an envelope
- ...
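## Example: Quantile transform

The quantile method can be illustrated directly in R. Below is a minimal sketch for the Exp$(1)$ case, where $F^{-1}(u) = -\log(1-u)$; the helper name `rexp.qt` is just illustrative, and the comparisons with the true mean and median are sanity checks.

```{r}
# Quantile transform: if U ~ Uniform(0,1) and F is a c.d.f.,
# then X = F^{-1}(U) has c.d.f. F.
# For Exp(1), F(x) = 1 - exp(-x), so F^{-1}(u) = -log(1 - u).
rexp.qt <- function(n) {
  u <- runif(n)
  -log(1 - u)
}
x <- rexp.qt(10000)
mean(x)      # should be close to 1, the Exp(1) mean
median(x)    # should be close to qexp(0.5) = log(2)
```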
## Box-Muller

- The __Box-Muller transformation__ generates pairs of independent, standard normally distributed random numbers, given a source of uniformly distributed random numbers
- Let $U \sim \text{Uniform}(0,1)$ and $V \sim \text{Uniform}(0,1)$ and set
$$R=\sqrt{-2\log U} \hspace{10mm} \textrm{ and } \hspace{10mm} \theta = 2\pi V$$
- Then the following transformation yields two independent standard normal random variates
$$X=R\cos(\theta) \hspace{10mm} \textrm{ and } \hspace{10mm} Y=R\sin(\theta)$$

## Summary

- Can transform uniform draws into other distributions when we can compute the distribution function
    + Quantile method when we can invert the CDF
    + The rejection method if all we have is the density
- Basic R commands encapsulate a lot of this for us
- Optimized algorithms based on distribution and parameter values

## Exercise: Box-Muller

1. Write a function named `bmnormal` that simulates `n` draws from a Normal random variable with mean `mu` and standard deviation `sd` using the Box-Muller transformation.
2. Inputs to your function should be `n`, `mu`, and `sd`.
3. Simulate 2000 draws from a Normal with mean 10 and standard deviation 3.
4. Convince yourself with a plot that your sampler is working correctly. Is there also a formal test you could consider?
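## Exercise: Box-Muller, one sketch

One possible outline for the exercise; the `bmnormal` signature follows the exercise statement, and a Kolmogorov-Smirnov test (`ks.test`) is one formal check to consider for the final question.

```{r}
bmnormal <- function(n, mu, sd) {
  m <- ceiling(n / 2)    # each (U, V) pair yields two standard normals
  u <- runif(m)
  v <- runif(m)
  r <- sqrt(-2 * log(u))
  theta <- 2 * pi * v
  z <- c(r * cos(theta), r * sin(theta))[1:n]
  mu + sd * z            # shift and scale to Normal(mu, sd^2)
}
x <- bmnormal(2000, 10, 3)
hist(x, freq = FALSE, breaks = 30)
curve(dnorm(x, 10, 3), add = TRUE, col = "red")
ks.test(x, "pnorm", 10, 3)
```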