---
title: "Stan Tutorial"
author: "Lauren Cappiello"
date: "2/8/2018"
output: pdf_document
fig_width: 5
fig_height: 3
---

## Installing Stan

**Prerequisites**: R (or RStudio)

**Toolchain**: Stan requires a toolchain to compile and run C++ code through R.

**Configuration**: get R ready to install Stan

**Installing Stan**: the RStan package

The following links will take you through the appropriate methods for your operating system:

- [\textcolor{blue}{Installing RStan on Mac or Linux}](https://github.com/stan-dev/rstan/wiki/Installing-RStan-on-Mac-or-Linux)

- [\textcolor{blue}{Installing RStan on Windows}](https://github.com/stan-dev/rstan/wiki/Installing-RStan-on-Windows)

[\textcolor{blue}{Not an R user?}](http://mc-stan.org/users/interfaces/index.html)

## Loading Stan in R

The loading message will suggest some parallel processing options. Be aware that the default is to use all of your computer's cores. These lines of code are included (commented out).

```{r results='hide', message=FALSE, warning=FALSE}
library("rstan")
#rstan_options(auto_write = TRUE)
#options(mc.cores = parallel::detectCores())
```

## Stan Code

**Data Types**

- Basic types: real, int, vector, row_vector, matrix
- Constrained types: simplex, unit_vector, ordered, positive_ordered, corr_matrix, cov_matrix

**Bounded Variables**

- Can put upper and lower bounds on variables
- Ex: Let $\sigma^2$ = sigsq
- real\<lower=0\> sigsq;

**Program Blocks**

- data
- transformed data
- parameters
- transformed parameters
- model
- generated quantities

The only required program block is the model block. 

## Limitations

- No discrete parameters
- Missing data must be coded as parameters
- C++ template code can be difficult to work with
- Limited data types and constraints
- Sampling is relatively slow
- Some models may not work in Stan or may require extensive reparameterization
- "Black box" modeling 

## So what is Stan doing? 

Stan uses gradient-based MCMC for Bayesian inference, gradient-based variational methods for approximate Bayesian inference, and gradient-based optimization for penalized MLE. 

...but it's difficult to determine the details of what Stan does within these frameworks. 

## Eight Schools Example

This example can also be found in the "getting started" section for RStan.

- SAT scores from eight high schools with some estimated treatment effect and estimated standard error for each treatment effect
- The model of interest is $y_j = \theta + error = \mu + \tau \eta_j + error$, $j = 1,...,8$. 
- $\eta \sim N(0,1)$ and $y \sim N(\theta,\sigma^2)$ where $\sigma$ is the standard error of the effect measurements. This is included in the data. 
- The following code will estimate $\mu$, $\tau$, and $\eta_j$ (as well as $\theta_j = \mu + \tau \eta_j$). 

We start by writing Stan code as a character string.
Note: You may instead want to create separate Stan files. As the Stan input gets more complex, this is preferable to character strings.

```{r results='hide', message=FALSE, warning=FALSE}
eightschools <- "
data {
    int<lower=0> J;           // number of schools
    real y[J];                // estimated treatment effects
    real<lower=0> sigma[J];   // s.e. of effect estimates
}
parameters {
    real mu;                  // mean effect for schools
    real<lower=0> tau;        // variance
    real eta[J];              // individual school effect
}
transformed parameters {    // theta is a function of our parameters
    real theta[J];
    for (j in 1:J)
        theta[j] = mu + tau * eta[j];
}
model {
    target += normal_lpdf(eta | 0, 1);        // eta ~ N(0,1)
    target += normal_lpdf(y | theta, sigma);  // y ~ N(theta, sigma^2), theta(mu, tau, eta)
}"
```

\newpage First we input everything that appears in our data block as a list and then pass it to our `stan` function.

```{r results='hide', message=FALSE, warning=FALSE}
require("rstan")
set.seed(0)
schools_dat <- list(J = 8, 
                    y = c(28,  8, -3,  7, -1,  1, 18, 12),
                    sigma = c(15, 10, 16, 11,  9, 11, 10, 18))
fit <- stan(model_code = eightschools,
            data = schools_dat, iter = 10000, warmup = 100, chains = 4)
```

## The `stan` Function

`stan` function arguments

- `model_code`: the character string that contains your Stan code
- `file`: the filepath to a .stan file containing your Stan code
- `data`: the data to be passed to the `data` block
- `chains`: the number of Markov chains to run (default is 4)
- `iter`: the number of iterations for each chain (includes warmup)
- `warmup`: the number of warmup iterations per chain

The `stan` function can also handle previous fits, parameters of interest, instructions for thinning, initial values, and a seed. See the `stan` help file for details. 

## Results 

Printing the fit will automatically give the estimated mean, standard error of the mean, standard deviation, and a number of percentiles for each estimated paramter. Stan also reports an effective sample size and how well the chains are mixing ($\hat{R}$ close to $1$ indicate good mixing). 

```{r}
print(fit)
#summary(fit)
```

Plotting the fit gives default 80% and 95% confidence intervals for each parameter. These confidence levels can be adjusted based on user preference. I have set the main confidence interval to 95% and the outer interval to 99.9%. 

```{r}
plot(fit, ci_level = 0.95, outer_level = 0.999)
```

\newpage
## Diasorin Example

```{r}
diasorin <- "
data {
    int<lower=0> n1;  //n low
    int<lower=0> n2;  //n high
    real low[n1];     //low data of length n1
    real norm[n2];    //high data of length n2
}
parameters {
    real<lower=0> tau1;
    real<lower=0> tau2;
    real mu1;
    real mu2;
}
transformed parameters{
    real t1;
    real t2;
    t1 = 1.0/(tau1^(0.5));    //Stan uses standard deviation
    t2 = 1.0/(tau2^(0.5));
}
model { 
    mu1 ~ normal(4.87, 0.05366563);    //Priors
    mu2 ~ normal(5.39, 0.05291503);
    tau1 ~ gamma(1.0376, 0.001);
    tau2 ~ gamma(1.0465, 0.001);
    low ~ normal(mu1, t1);            //Target
    norm ~ normal(mu2, t2);
}"
```

## Pull Diasorin Stan Code Into R

```{r results='hide', message=FALSE, warning=FALSE}
set.seed(1)
dias.dat <- list(n1 = 19, n2 = 15,
                 low = 
                     log(c(91, 46,  95,  60, 33, 410, 105, 43, 189,1097, 54,178, 114, 137, 233, 101, 25,70,357)),
                 norm = log(c(370, 267, 99,157, 75,1281, 48, 298, 268, 62,804,430,171,694,404)))
fit2 <- stan(model_code = diasorin,
            data = dias.dat, iter = 10000, warmup = 100, chains = 4)
```

## Results

```{r}
print(fit2)
#summary(fit2)
```

`show_density` has the `plot` function give the density of the estimated parameters instead of just a confidence interval.

The density of $\mu_1$ based on the simulation:
```{r}
plot(fit2, show_density=TRUE, pars="mu1", ci_level=0.95, outer_level=0.999)
```

\newpage
The density of $\tau_1$ based on the simulation:
```{r}
plot(fit2, show_density=TRUE, pars="tau1", ci_level=0.95, outer_level=0.999)
```

\newpage
Trace plots of the fit show all four chains overlaid for each parameter. This should reflect our $\hat{R}$ values from our fit summary.
```{r}
traceplot(fit2)
```

Indeed, the chains appear to be mixing well for all parameters. 

## Additional Information About Stan

[\textcolor{blue}{More information about Stan}](http://mc-stan.org/users/interfaces/rstan)

[\textcolor{blue}{Stan Modeling Language User's Guide and Reference Manual}](https://github.com/stan-dev/stan/releases/download/v2.17.0/stan-reference-2.17.0.pdf)

[\textcolor{blue}{"Introduction to Bayesian Data Analysis and Stan with Andrew Gelman"}](https://www.youtube.com/watch?v=T1gYvX5c2sM) (and the [\textcolor{blue}{corresponding slides}](https://www.dropbox.com/s/sfi0pcf7hais91r/Gelman_Stan_talk.pdf)), a presentation by one of Stan's creators and core developers. Dr. Gelman discusses an example examining World Cup soccer rankings, another using geometry to model golf ball trajectories, and finally an example on relative birth rates per day of the year.