--- title: 'Distributions' author: "James M. Flegal" output: ioslides_presentation: smaller: yes --- ## Agenda - Random number generation - Distributions in R - Distributional Models - Moments, generalized moments, likelihood - Visual comparisons and basic testing ## Random number generation How does R get “random” numbers? - It doesn’t, instead it uses pseudorandom numbers that we hope are indistinguishable from random numbers - Pseudorandom generators produce a deterministic sequence that is indistinguishable from a true random sequence if you don’t know how it started - Why? Truly random numbers are expensive ## Random number generation - Pseudorandom generators produce a sequence of Uniform$(0,1)$ random variates - [Linear congruential generator](https://en.wikipedia.org/wiki/Linear_congruential_generator) is one of the oldest and best-known pseudorandom number generator algorithms - Other distributions usually based such a sequence - Example: If $U \sim \text{Unif}(0,1)$ then $-\beta \ln (1-U) \sim \text{Exp}(\beta)$ - More on this later ... ## Example: runif() {r} runif(1:10) set.seed(10) runif(1:10) set.seed(10) runif(1:10)  ## Linear congruential generator - Easily implemented and fast - Modulo arithmetic via storage-bit truncation - Defined as $X_{n+1}=\left(aX_{n}+c\right)~~{\bmod {~}}~m$ where $X$ is the sequence of pseudorandom values ## Linear congruential generator {r} seed <- 10 new.random <- function (a=5, c=12, m=16) { out <- (a*seed + c) %% m seed <<- out return(out) } out.length <- 20 variates <- rep (NA, out.length) for (kk in 1:out.length) variates[kk] <- new.random() variates  Unfortunately, this sequence has period 8 ## Linear congruential generator {r} variates <- rep (NA, out.length) for (kk in 1:out.length) variates[kk] <- new.random(a=131, c=7, m=16) variates  {r} variates <- rep (NA, out.length) for (kk in 1:out.length) variates[kk] <- new.random(a=129, c=7, m=16) variates  ## Linear congruential generator {r} variates <- rep (NA, out.length) for (kk in 1:out.length) variates[kk] <- new.random(a=1664545, c=1013904223, m=2^32) variates  ## How to distinguish non-randomness? - Look at period - Missing some values - Proper distribution in the limit - Autocorrelation - Gets harder for higher dimensions ## Distributions - Beta, Binomial, Cauchy, Chi-Square, Exponential, F, Gamma, Geometric, Hypergeometric, Logistic, Log Normal, Negative Binomial, Normal, Poisson, Student t, Studentized Range, Uniform, Weibull, Wilcoxon Rank Sum Statistic, Wilcoxon Signed Rank Statistic - Parameters of these distributions may not agree with textbooks - Lots of distributions, but they all work the same way ## Distributions Every distribution that R handles has four functions. There is a root name, for example, the root name for the normal distribution is norm. This root is prefixed by one of the letters - p for "probability", the cumulative distribution function (c. d. f.) - q for "quantile", the inverse c. d. f. - d for "density", the density function (p. f. or p. d. f.) - r for "random", a random variable having the specified distribution ## Distributions - Beta, Binomial, Cauchy, Chi-Square, Exponential, F, Gamma, Geometric, Hypergeometric, Logistic, Log Normal, Negative Binomial, Normal, Poisson, Student t, Studentized Range, Uniform, Weibull, Wilcoxon Rank Sum Statistic, Wilcoxon Signed Rank Statistic - beta, binom, cauchy, chisq, exp, f, gamma, geom, hyper, logis, lnorm, nbinom, norm, pois, t, tukey, unif, weibull, wilcox, signrank - Exponential distribution has pexp, qexp, dexp, and rexp ## Exponential distribution {r} this.range <- seq(0, 8, .05) plot (this.range, dexp(this.range), ty="l", main="Exponential Distributions", xlab="x", ylab="f(x)") lines (this.range, dexp(this.range, rate=0.5), col="red") lines (this.range, dexp(this.range, rate=0.2), col="blue")  ## Exponential distribution `{r} this.range <- seq(0, 8, .05) plot (this.range, pexp(this.range), ty="l", main="Exponential Distributions", xlab="x", ylab="P(X