Description

This class is an introduction to statistical computing, including statistical programming, simulation studies, smoothing and density estimation, random variable generation, optimization, Monte Carlo methods, the bootstrap, permutation methods, and cross-validation.

Computational data analysis is an essential part of modern statistics. Competent statisticians must be able not just to run existing programs, but also to understand the principles on which they work. They must also be able to read, modify, and write code, so that they can assemble the computational tools needed to solve their data-analysis problems, rather than distorting problems to fit tools provided by others.

The class will be taught in the R language. Portions of the class may be redundant for those who know a lot about programming. The class will be incomprehensible for those who do not know statistics.

Course Mechanics and Grading

There will be a weekly in-class lab, homework nearly every week, and a final exam. Grades will be calculated as follows:

R and RStudio

R is a free, open-source programming language for statistical computing. Almost all of our work in this class will be done using R. You will need regular, reliable access to a computer running an up-to-date version of R. If this is a problem, let me know right away.

RStudio is a free, open-source R programming environment. It contains a built-in code editor and many features that make working with R easier, and it works the same way across different operating systems. Use of RStudio is strongly recommended.

Assignment Formatting

All assignments must be turned in electronically and will involve writing a combination of code and actual prose. You must submit your assignment using R Markdown. Exceptions may be made, with prior permission, for those who want to use Sweave or knitr. (If you don’t know what those are, plan to use R Markdown.) Work submitted in any other format will receive an automatic grade of 0, without exceptions.

Every file you submit should have a file name that includes your first and last name and clearly indicates the type of assignment (homework, lab, etc.) and its number. For example, JamesFlegal_Lab1.pdf.
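The naming convention above can be sketched in R. This is only an illustration: the helper `submission_name` is hypothetical and not part of the course materials, and the "Homework" label is an assumption, since the only example given here uses "Lab".

```r
# Hypothetical helper (not provided by the course): builds a file name of the
# form FirstLast_TypeNumber.ext, following the JamesFlegal_Lab1.pdf example.
submission_name <- function(first, last, type, number, ext = "pdf") {
  # "Homework" is an assumed label; only "Lab" appears in the syllabus example.
  type <- match.arg(type, c("Lab", "Homework"))
  sprintf("%s%s_%s%d.%s", first, last, type, as.integer(number), ext)
}

submission_name("James", "Flegal", "Lab", 1)
# [1] "JamesFlegal_Lab1.pdf"
```

Generating the name in code rather than typing it by hand keeps the format consistent from one submission to the next.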

Homework

There will be a homework assignment nearly every week. Each homework will be graded out of three points: one point for making a good-faith effort at every part of the assignment; one point for technically correct, working solutions to each part; and one point for clean, well-formatted, easily readable code.

Labs

There will be a 50-minute lab period every Thursday morning. The labs will be short exercises, generally related to that week’s homework. Attendance is mandatory for on-campus students. Lab assignments are due the following Monday.

Final Exam

The final exam is open book and open internet, but absolutely no communication with other humans is allowed.

Textbooks

Recommended books:

Other books that may be helpful:

Some R Resources

There are many online resources for learning about R and working with it, in addition to the textbooks:

  • The official intro, “An Introduction to R”, available online in HTML and PDF
  • John Verzani, “simpleR”, in PDF
  • Patrick Burns, The R Inferno. “If you are using R and you think you’re in hell, this is a map for you.”
  • Cosma Shalizi and Andrew Thomas, Statistical Computing 36-350 at CMU

Collaboration, Copying and Plagiarism

You are encouraged to discuss course material, including assignments, with your classmates. All work you turn in, however, must be your own. This includes both writing and code. Copying from other students, from books, or from websites (1) does nothing to help you learn how to program, (2) is easy for us to detect, and (3) has serious negative consequences.

Calendar and Topics

Date Lecture Topic
9/26/19 1 Introduction and Basics of Data, Slides and R Markdown
10/1/19 2 Bigger Data, Arrays, and Data Frames, Slides and R Markdown
10/3/19 3 Control Flow and Strings, Slides and R Markdown
10/8/19 4 Graphics, Slides and R Markdown
10/10/19 5 (Recorded Course) Writing functions, Slides and R Markdown, gmp.dat and gmp-2013.dat
10/15/19 6 Getting Data and Linear Models, Slides and R Markdown
10/17/19 7 Distributions, Slides and R Markdown
10/22/19 8 Optimization I, Slides and R Markdown
10/24/19 9 Optimization II, Slides and R Markdown
10/30/19 10 Simulations, Slides and R Markdown
10/31/19 11 Monte Carlo Methods, Slides and R Markdown
11/5/19 12 Bootstrap, Slides and R Markdown
11/7/19 13 Cross-Validation, Slides and R Markdown
11/12/19 14 Density Estimation, Slides and R Markdown
11/14/19 15 Bayesian Statistics, Slides and R Markdown
11/19/19 16 Markov Chain Monte Carlo I, Slides and R Markdown
11/21/19 17 Markov Chain Monte Carlo II, Slides and R Markdown
11/26/19 18 Permutation Tests, Slides and R Markdown
11/28/19 - Thanksgiving Holiday
12/3/19 19 Databases, Slides and R Markdown; Final Exam Handout
12/5/19 20 TBD
12/9/19 - Final Exam due by 5:00 p.m.