- Lecture: TR 9:30 a.m. - 10:50 a.m. MSE 011
- Discussion: R 8:00 a.m. - 8:50 a.m. MSE 011
- Instructor: James M. Flegal, jflegal@ucr.edu
  - Zoom Hours: F 9:00 a.m. - 10:00 a.m. (Meeting ID: 190-110-462)
- TA: Jinhui Yang, jyang065@ucr.edu
  - Zoom Hours: M 4:00 p.m. - 5:00 p.m. (Meeting ID: 531-006-2824)
- Internet: iLearn STAT 206

This class is an introduction to statistical computing, including statistical programming, simulation studies, smoothing and density estimation, generating random variables, optimization, Monte Carlo methods, the bootstrap, permutation methods, and cross-validation.

Computational data analysis is an essential part of modern statistics. Competent statisticians must be able not just to run existing programs but also to understand the principles on which they work. They must also be able to read, modify, and write code, so that they can assemble the computational tools needed to solve their data-analysis problems, rather than distorting problems to fit tools provided by others.

The class will be taught in the R language. Portions of the class may be *redundant* for those who know a lot about programming. The class will be *incomprehensible* for those who do not know statistics.

There will be a weekly in-class lab, homework nearly every week, and a final exam. Grades will be calculated as follows:

- Labs: 20%
- Homework: 40%
- Final exam: 40%
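
As a rough illustration of how these weights combine, the sketch below computes a weighted overall score in R. It assumes each component has already been converted to a fraction between 0 and 1; the helper name `course_score` and the example scores are hypothetical, and the syllabus does not say how the overall score maps to a letter grade.

```r
# Component weights from the syllabus: labs 20%, homework 40%, final exam 40%.
weights <- c(labs = 0.20, homework = 0.40, final = 0.40)

# Hypothetical helper: combine component scores (each a fraction in [0, 1])
# into one overall score using the weights above.
course_score <- function(scores, w = weights) {
  stopifnot(all(names(w) %in% names(scores)))
  sum(w * scores[names(w)])
}

# Made-up example scores, for illustration only.
example_scores <- c(labs = 0.90, homework = 0.80, final = 0.75)
overall <- course_score(example_scores)
print(overall)
```

Indexing `scores` by `names(w)` keeps each weight matched to the right component even if the scores are supplied in a different order.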

R is a free, open-source programming language for statistical computing. Almost all of our work in this class will be done using R. You will need regular, reliable access to a computer running an up-to-date version of R. If this is a problem, let me know right away.

RStudio is a free, open-source R programming environment. It includes a built-in code editor and many features that make working with R easier, and it works the same way across different operating systems. Use of RStudio is strongly recommended.

All assignments must be turned in electronically and will involve writing a combination of code and actual prose. You must submit your assignment using **R Markdown**. Exceptions may be made, with prior permission, for those who want to use Sweave or knitr. (If you don’t know what those are, plan to use R Markdown.) Work submitted in any other format will receive an automatic grade of 0, without exceptions.

Every file you submit should have a file name that includes your first and last name and clearly indicates the type of assignment (homework, lab, etc.) and its number. For example, **JamesFlegal_Lab1.pdf**.
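
One way to keep file names consistent is to build them programmatically. The sketch below is only an illustration of the naming pattern in the example above; the helper `submission_name` and its arguments are hypothetical and not part of any course tooling.

```r
# Hypothetical helper: build a submission file name such as
# "JamesFlegal_Lab1.pdf" from a name, assignment type, and number.
submission_name <- function(first, last, type, number, ext = "pdf") {
  sprintf("%s%s_%s%d.%s", first, last, type, as.integer(number), ext)
}

lab_file <- submission_name("James", "Flegal", "Lab", 1)
print(lab_file)
```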

There will be a homework assignment nearly every week. Each homework will be graded out of three points: one point for making a good-faith effort at every part of the assignment; one point for technically correct, working solutions to each part; and one point for clean, well-formatted, easily readable code.

There will be a 50-minute lab period every Thursday morning. The labs will be short exercises, generally related to that week’s homework. Attendance is mandatory for on-campus students. Lab assignments are due the following Monday.

The final exam is open book with internet access, but communicating with other people during the exam is absolutely not allowed.

Recommended books:

- Lawrence Leemis, Learning Base R with sample code

Other books that may be helpful:

- Geof Givens and Jennifer Hoeting, Computational Statistics, Second Edition
- Norman Matloff, The Art of R Programming: A Tour of Statistical Software Design
- Phil Spector, Data Manipulation with R
- Paul Teetor, The R Cookbook

There are many online resources for learning about R and working with it, in addition to the textbooks:

- The official intro, “An Introduction to R”, available online in HTML and PDF
- John Verzani, “simpleR”, in PDF
- Patrick Burns, The R Inferno. “If you are using R and you think you’re in hell, this is a map for you.”
- Cosma Shalizi and Andrew Thomas, Statistical Computing 36-350 at CMU

You are encouraged to discuss course material, including assignments, with your classmates. All work you turn in, however, must be your own. This includes both writing and code. Copying from other students, from books, or from websites (1) does nothing to help you learn how to program, (2) is easy for us to detect, and (3) has serious negative consequences.

Date | Lecture | Topic |
---|---|---|
9/26/19 | 1 | Introduction and Basics of Data, Slides and R Markdown |
10/1/19 | 2 | Bigger Data, Arrays, and Data Frames, Slides and R Markdown |
10/3/19 | 3 | Control Flow and Strings, Slides and R Markdown |
10/8/19 | 4 | Graphics, Slides and R Markdown |
10/10/19 | 5 | (Recorded Course) Writing functions, Slides and R Markdown, gmp.dat and gmp-2013.dat |
10/15/19 | 6 | Getting Data and Linear Models, Slides and R Markdown |
10/17/19 | 7 | Distributions, Slides and R Markdown |
10/22/19 | 8 | Optimization I, Slides and R Markdown |
10/24/19 | 9 | Optimization II, Slides and R Markdown |
10/30/19 | 10 | Simulations, Slides and R Markdown |
10/31/19 | 11 | Monte Carlo Methods, Slides and R Markdown |
11/5/19 | 12 | Bootstrap, Slides and R Markdown |
11/7/19 | 13 | Cross-Validation, Slides and R Markdown |
11/12/19 | 14 | Density Estimation, Slides and R Markdown |
11/14/19 | 15 | Bayesian Statistics, Slides and R Markdown |
11/19/19 | 16 | Markov Chain Monte Carlo I, Slides and R Markdown |
11/21/19 | 17 | Markov Chain Monte Carlo II, Slides and R Markdown |
11/26/19 | 18 | Permutation Tests, Slides and R Markdown |
11/28/19 | - | Thanksgiving Holiday |
12/3/19 | 19 | Databases, Slides and R Markdown; Final Exam Handout |
12/5/19 | 20 | TBD |
12/9/19 | - | Final Exam Due by 5:00 p.m. |