This class is an introduction to statistical computing, including statistical programming, simulation studies, smoothing and density estimation, random variable generation, optimization, Monte Carlo methods, the bootstrap, permutation methods, and cross-validation.
Computational data analysis is an essential part of modern statistics. Competent statisticians must be able not just to run existing programs, but to understand the principles on which they work. They must also be able to read, modify, and write code, so that they can assemble the computational tools needed to solve their data-analysis problems, rather than distorting problems to fit tools provided by others.
The class will be taught in the R language. Portions of the class may be redundant for those who know a lot about programming. The class will be incomprehensible for those who do not know statistics.
There will be a weekly in-class lab, homework nearly every week, and a final exam. Grades will be calculated as follows:
R is a free, open-source programming language for statistical computing. Almost all of our work in this class will be done using R. You will need regular, reliable access to a computer running an up-to-date version of R. If this is a problem, let me know right away.
RStudio is a free, open-source R programming environment. It includes a built-in code editor and many features that make working with R easier, and it works the same way across different operating systems. Use of RStudio is strongly recommended.
All assignments must be turned in electronically and will involve writing a combination of code and actual prose. You must submit your assignment using R Markdown. Exceptions may be made, with prior permission, for those who want to use Sweave or knitr. (If you don’t know what those are, plan to use R Markdown.) Work submitted in any other format will receive an automatic grade of 0, without exceptions.
Every file you submit should have a file name that includes your first and last name and clearly indicates the type of assignment (homework, lab, etc.) and its number. For example: JamesFlegal_Lab1.pdf.
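As an illustration of the naming convention, here is a minimal sketch in R. The helper `make_submission_name` and its arguments are hypothetical and not part of the course materials; the name pattern is inferred from the single example given, so treat it as one reasonable reading of the rule rather than an official specification.

```r
# Hypothetical helper (an assumption for illustration, not a course tool):
# builds a file name of the form FirstLast_TypeNumber.ext, mirroring the
# example JamesFlegal_Lab1.pdf.
make_submission_name <- function(first, last, type, number, ext = "pdf") {
  paste0(first, last, "_", type, number, ".", ext)
}

# Reproduces the example file name from the syllabus.
submission <- make_submission_name("James", "Flegal", "Lab", 1)
print(submission)
```

Generating the name in code is optional; what matters is that the name, assignment type, and number are all visible in the file name.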
There will be a homework assignment nearly every week. Each homework will be graded out of three points: one point for making a good-faith effort at every part of the assignment; one point for technically-correct, working solutions to each part; and one point for clean, well-formatted, easily readable code.
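The three-point rubric amounts to a simple sum, sketched below in R. The function `homework_score` is hypothetical, and the sketch assumes each point is awarded all-or-nothing; the syllabus does not say whether partial points are possible.

```r
# Hypothetical helper: each argument is TRUE if the grader awards that point.
# Assumption: each criterion is all-or-nothing (no partial credit).
homework_score <- function(good_faith_effort, correct_solutions, clean_code) {
  criteria <- c(good_faith_effort, correct_solutions, clean_code)
  stopifnot(is.logical(criteria), length(criteria) == 3, !anyNA(criteria))
  sum(criteria)  # total points out of 3
}

# A submission that attempts everything and works, but has messy code.
score <- homework_score(TRUE, TRUE, FALSE)
print(score)
```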
There will be a 50-minute lab period every Thursday morning. The labs will be short exercises, generally related to that week’s homework. Attendance is mandatory for on-campus students. Lab assignments are due the following Monday.
The final exam is open book and open internet, but absolutely no communication with other humans is allowed.
Recommended books:
Other books that may be helpful:
There are many online resources for learning about R and working with it, in addition to the textbooks:
You are encouraged to discuss course material, including assignments, with your classmates. All work you turn in, however, must be your own. This includes both writing and code. Copying from other students, from books, or from websites (1) does nothing to help you learn how to program, (2) is easy for us to detect, and (3) has serious negative consequences.
Date | Lecture | Topic |
---|---|---|
9/26/19 | 1 | Introduction and Basics of Data, Slides and R Markdown |
10/1/19 | 2 | Bigger Data, Arrays, and Data Frames, Slides and R Markdown |
10/3/19 | 3 | Control Flow and Strings, Slides and R Markdown |
10/8/19 | 4 | Graphics, Slides and R Markdown |
10/10/19 | 5 | (Recorded Course) Writing Functions, Slides and R Markdown, gmp.dat and gmp-2013.dat |
10/15/19 | 6 | Getting Data and Linear Models, Slides and R Markdown |
10/17/19 | 7 | Distributions, Slides and R Markdown |
10/22/19 | 8 | Optimization I, Slides and R Markdown |
10/24/19 | 9 | Optimization II, Slides and R Markdown |
10/30/19 | 10 | Simulations, Slides and R Markdown |
10/31/19 | 11 | Monte Carlo Methods, Slides and R Markdown |
11/5/19 | 12 | Bootstrap, Slides and R Markdown |
11/7/19 | 13 | Cross-Validation, Slides and R Markdown |
11/12/19 | 14 | Density Estimation, Slides and R Markdown |
11/14/19 | 15 | Bayesian Statistics, Slides and R Markdown |
11/19/19 | 16 | Markov Chain Monte Carlo I, Slides and R Markdown |
11/21/19 | 17 | Markov Chain Monte Carlo II, Slides and R Markdown |
11/26/19 | 18 | Permutation Tests, Slides and R Markdown |
11/28/19 | - | Thanksgiving Holiday |
12/3/19 | 19 | Databases, Slides and R Markdown; Final Exam Handout |
12/5/19 | 20 | TBD |
12/9/19 | - | Final Exam due by 5:00 p.m. |