- Simulating from distributions
- Quantile transform method
- Rejection sampling
Why simulate?
All of these require drawing random variables from distributions
We have seen R has built in distributions: beta
, binom
, cauchy
, chisq
, exp
, f
, gamma
, geom
, hyper
, logis
, lnorm
, nbinom
, norm
, pois
, t
, tukey
, unif
, weibull
, wilcox
, signrank
Every distribution that R handles has four functions.
p
for "probability", the cumulative distribution function (c. d. f.)q
for "quantile", the inverse c. d. f.d
for "density", the density function (p. f. or p. d. f.)r
for "random", a random variable having the specified distributionmy.binom.1 <- function(n=1, p=1/3){ u <- runif(n) binom <- sum(u<p) return(binom) } my.binom.1(1000)
## [1] 354
my.binom.1(1000, .5)
## [1] 525
my.binom.2 <- function(n=1, p=1/3){ u <- runif(1) binom <- qbinom(u, size=n, prob=p) return(binom) } my.binom.2(1000)
## [1] 349
my.binom.2(1000, .5)
## [1] 493
Given \(U \sim \text{Uniform}(0,1)\) and CDF \(F\) from a continuous distribution. Then \(X = F^{-1}(U)\) is a random variable with CDF \(F\).
Proof: \[ P(X\le a) = P (F^{-1}(U) \le a) = P ( U \le F(a)) = F(a) \]
Suppose \(X \sim \text{Exp}(\beta)\). Then we have density \[ f(x) = \beta^{-1} e^{-x/\beta} I(0<x<\infty) \] and CDF \[ F(x) = 1 - e^{-x/\beta} \] Also \[ y = 1 - e^{-x/\beta} \text{ iff } -x/\beta = \log (1-y) \text{ iff } x = -\beta \log (1-y). \] Thus, \(F^{-1} (y) = -\beta \log(1-y)\).
So if \(U \sim \text{Uniform}(0,1)\), then \(F^{-1} (u) = -\beta \log(1-u) \sim \text{Exp}(\beta)\).
x <- runif(10000) y <- - 3 * log(1-x) hist(y)
mean(y)
## [1] 2.98428
true.x <- seq(0,30, .5) true.y <- dexp(true.x, 1/3) hist(y, freq=F, breaks=30) points(true.x, true.y, type="l", col="red", lw=2)
Theorem: Let \(X \sim f_x\) and \(Y \sim f_y\) where the two densities have common support. Define \[ M = \sup_{x} \frac{f_x(x)}{f_y(x)}. \] If \(M< \infty\) then we can generate \(X \sim f_x\) as follows,
Exercise: Why is \(M\ge 1\)?
Proof: \[ \begin{aligned} P(X \le x) & = P \left( Y \le x \; \middle| \; \text{STOP} \right)\\ & = P \left( Y \le x \; \middle| \; u \le \frac{f_x(y)}{M f_y(y)} \right) \\ & = \frac{P \left( Y \le x , u \le \frac{f_x(y)}{M f_y(y)} \right)}{P \left( u \le \frac{f_x(y)}{M f_y(y)} \right)} \\ & = \frac{A}{B} \end{aligned} \]
Now, we have \[ \begin{aligned} A & = P \left( Y \le x , u \le \frac{f_x(y)}{M f_y(y)} \right) \\ & = E \left[ P \left( Y \le x , u \le \frac{f_x(y)}{M f_y(y)} \right) \; \middle| \; y \right] \\ & = E \left[ I (y \le x) \frac{f_x(y)}{M f_y(y)} \right] \\ & = \int_{-\infty}^{\infty} I (y \le x) \frac{f_x(y)}{M f_y(y)} f_y(y) dy \\ & = \frac{1}{M} \int_{-\infty}^{x} f_x(y) dy = \frac{F_x(x)}{M} \end{aligned} \]
Similarly, we have \[ \begin{aligned} B & = P \left( u \le \frac{f_x(y)}{M f_y(y)} \right) \\ & = E \left[ P \left( u \le \frac{f_x(y)}{M f_y(y)} \right) \; \middle| \; y \right] \\ & = E \left[ \frac{f_x(y)}{M f_y(y)} \right] \\ & = \int_{-\infty}^{\infty} \frac{f_x(y)}{M f_y(y)} f_y(y) dy \\ & = \frac{1}{M} \int_{-\infty}^{\infty} f_x(y) dy = \frac{1}{M} \end{aligned} \]
Hence, \[ \begin{aligned} P(X \le x) & = \frac{A}{B} \\ & = \frac{\frac{F_x(x)}{M}}{\frac{1}{M}} = F_x(x) \end{aligned} \] And the proof is complete. That is, \(X \sim f_x\).
Now, we have \[ \begin{aligned} M & = \sup_{x>0} \frac{f_x(x)}{f_y(x)}\\ & = \sup_{x>0} \frac{\frac{2}{\pi} \sqrt{x} e^{-x}}{p e^{-x} + (1-p) x e^{-x}}\\ & = \frac{2}{\pi} \sup_{x>0} \frac{ \sqrt{x}}{p + (1-p) x} \\ & = \frac{2}{\pi} \frac{1}{2 \sqrt{p(1-p)}} \end{aligned} \] Exercise: Prove the last line, i.e. maximize \(h(x) = \frac{ \sqrt{x}}{p + (1-p) x}\) for \(x>0\) or \(\log h(x)\).
ar.gamma <- function(n=100){ x <- double(n) i <- 1 while(i < (n+1)) { u <- runif(1) if(u < .5){ y <- -1 * log(1-runif(1)) } else { y <- sum(-1 * log(1-runif(2))) } u <- runif(1) temp <- 2 * sqrt(y) / (1+y) if(u < temp){ x[i] <- y i <- i+1 } } return(x) }
x <- ar.gamma(10000) hist(x)
mean(x)
## [1] 1.482398
true.x <- seq(0,10, .1) true.y <- dgamma(true.x, 3/2, 1) hist(x, freq=F, breaks=30, xlab="x", ylab="f(x)", main="Histogram and Theoretical") points(true.x, true.y, type="l", col="red", lw=2)
Suppose the pdf \(f\) is zero outside an interval \([c,d]\), and \(\leq M\) on the interval.
We know how to draw from uniform distributions in any dimension. Do it in two:
x1 <- runif(300, 0, 1); y1 <- runif(300, 0, 2.6); selected <- y1 < dbeta(x1, 3, 6)
mean(selected)
## [1] 0.3833333
accepted.points <- x1[selected] mean(accepted.points < 0.5)
## [1] 0.8521739
pbeta(0.5, 3, 6)
## [1] 0.8554688
For this to work efficiently, we have to cover the target distribution with one that sits close to it.
x2 <- runif(100000, 0, 1); y2 <- runif(100000, 0, 10); selected <- y2 < dbeta(x2, 3, 6) mean(selected)
## [1] 0.09907
bmnormal
that simulates n
draws from Normal random variable with mean mu
and standard deviation sd
using the Box-Muller transformation.n
, mu
, and sd
.