--- title: 'Distributions' author: "James M. Flegal" output: ioslides_presentation: smaller: yes --- ## Agenda - Random number generation - Distributions in R - Distributional Models - Moments, generalized moments, likelihood - Visual comparisons and basic testing ## Random number generation How does R get “random” numbers? - It doesn’t, instead it uses pseudorandom numbers that we hope are indistinguishable from random numbers - Pseudorandom generators produce a deterministic sequence that is indistinguishable from a true random sequence if you don’t know how it started - Why? Truly random numbers are expensive ## Random number generation - Pseudorandom generators produce a sequence of Uniform$(0,1)$ random variates - [Linear congruential generator](https://en.wikipedia.org/wiki/Linear_congruential_generator) is one of the oldest and best-known pseudorandom number generator algorithms - Other distributions usually based such a sequence - Example: If $U \sim \text{Unif}(0,1)$ then $-\beta \ln (1-U) \sim \text{Exp}(\beta)$ - More on this later ... ## Example: `runif()` ```{r} runif(1:10) set.seed(10) runif(1:10) set.seed(10) runif(1:10) ``` ## Linear congruential generator - Easily implemented and fast - Modulo arithmetic via storage-bit truncation - Defined as \[ X_{n+1}=\left(aX_{n}+c\right)~~{\bmod {~}}~m \] where $X$ is the sequence of pseudorandom values ## Linear congruential generator ```{r} seed <- 10 new.random <- function (a=5, c=12, m=16) { out <- (a*seed + c) %% m seed <<- out return(out) } out.length <- 20 variates <- rep (NA, out.length) for (kk in 1:out.length) variates[kk] <- new.random() variates ``` Unfortunately, this sequence has period 8 ## Linear congruential generator ```{r} variates <- rep (NA, out.length) for (kk in 1:out.length) variates[kk] <- new.random(a=131, c=7, m=16) variates ``` ```{r} variates <- rep (NA, out.length) for (kk in 1:out.length) variates[kk] <- new.random(a=129, c=7, m=16) variates ``` ## Linear congruential generator ```{r} variates <- rep (NA, out.length) for (kk in 1:out.length) variates[kk] <- new.random(a=1664545, c=1013904223, m=2^32) variates ``` ## How to distinguish non-randomness? - Look at period - Missing some values - Proper distribution in the limit - Autocorrelation - Gets harder for higher dimensions ## Distributions - Beta, Binomial, Cauchy, Chi-Square, Exponential, F, Gamma, Geometric, Hypergeometric, Logistic, Log Normal, Negative Binomial, Normal, Poisson, Student t, Studentized Range, Uniform, Weibull, Wilcoxon Rank Sum Statistic, Wilcoxon Signed Rank Statistic - Parameters of these distributions may not agree with textbooks - Lots of distributions, but they all work the same way ## Distributions Every distribution that R handles has four functions. There is a root name, for example, the root name for the normal distribution is `norm`. This root is prefixed by one of the letters - `p` for "probability", the cumulative distribution function (c. d. f.) - `q` for "quantile", the inverse c. d. f. - `d` for "density", the density function (p. f. or p. d. f.) - `r` for "random", a random variable having the specified distribution ## Distributions - Beta, Binomial, Cauchy, Chi-Square, Exponential, F, Gamma, Geometric, Hypergeometric, Logistic, Log Normal, Negative Binomial, Normal, Poisson, Student t, Studentized Range, Uniform, Weibull, Wilcoxon Rank Sum Statistic, Wilcoxon Signed Rank Statistic - `beta`, `binom`, `cauchy`, `chisq`, `exp`, `f`, `gamma`, `geom`, `hyper`, `logis`, `lnorm`, `nbinom`, `norm`, `pois`, `t`, `tukey`, `unif`, `weibull`, `wilcox`, `signrank` - Exponential distribution has `pexp`, `qexp`, `dexp`, and `rexp` ## Exponential distribution ```{r} this.range <- seq(0, 8, .05) plot (this.range, dexp(this.range), ty="l", main="Exponential Distributions", xlab="x", ylab="f(x)") lines (this.range, dexp(this.range, rate=0.5), col="red") lines (this.range, dexp(this.range, rate=0.2), col="blue") ``` ## Exponential distribution ```{r} this.range <- seq(0, 8, .05) plot (this.range, pexp(this.range), ty="l", main="Exponential Distributions", xlab="x", ylab="P(X