---
title: 'Control Flow'
author: "James M. Flegal"
output:
  ioslides_presentation:
    smaller: yes
  beamer_presentation: default
---

## Agenda

- Control flow (or alternatively, flow of control)
- if(), for(), and while()
- Avoiding iteration
- Introduction to strings and string operations

## Control flow

*Control flow* is the order in which individual statements, instructions or function calls of an imperative program are executed or evaluated.

A *control flow statement* is a statement whose execution results in a choice being made as to which of two or more paths should be followed.

## Conditionals

Have the computer decide what to do next

- Mathematically:
\[
|x| = \left\{ \begin{array}{cl} x & \mathrm{if}~x\geq 0 \\
-x &\mathrm{if}~ x < 0\end{array}\right. ~,~
\psi(x) = \left\{ \begin{array}{cl} x^2 & \mathrm{if}~|x|\leq 1\\
2|x|-1 &\mathrm{if}~ |x| > 1\end{array}\right.
\]
Exercise: plot $\psi$ in R
- Computationally:
```
if the country code is not "US",
  multiply prices by current exchange rate
```

## if()

Simplest conditional:

```
if (x >= 0) {
  x
} else {
  -x
}
```

The condition in `if` needs to give _one_ `TRUE` or `FALSE` value

The `else` clause is optional

One-line actions don't need braces

```
if (x >= 0) x else -x
```

## if()

`if` can *nest* arbitrarily deeply:

```
if (x^2 < 1) {
  x^2
} else {
  if (x >= 0) {
    2*x-1
  } else {
    -2*x-1
  }
}
```

Can get ugly though

## Combining Booleans

`&` and `|` work like `+` or `*`: they combine terms element-wise

Flow control wants *one* Boolean value, and to skip calculating what's not needed

`&&` and `||` give _one_ Boolean, lazily:

```{r}
(0 > 0) && (all.equal(42%%6, 169%%13))
```

This *never* evaluates the complex expression on the right

Use `&&` and `||` for control, `&` and `|` for subsetting

## Iteration

Repeat similar actions multiple times:

```{r}
table.of.logarithms <- vector(length=7,mode="numeric")
table.of.logarithms
for (i in 1:length(table.of.logarithms)) {
  table.of.logarithms[i] <- log(i)
}
table.of.logarithms
```

## for()

```
for (i in 1:length(table.of.logarithms)) {
  table.of.logarithms[i] <- log(i)
}
```

`for` increments a **counter** (here `i`) along a vector (here `1:length(table.of.logarithms)`) and **loops through** the **body** until it runs through the vector

In other words, `for` "**iterates over** the vector"

Note, there is a better way to do this job!

The body can contain just about anything, including:

- if() clauses
- other for() loops (nested iteration)

## Nested iteration example

```
c <- matrix(0, nrow=nrow(a), ncol=ncol(b))
if (ncol(a) == nrow(b)) {
  for (i in 1:nrow(c)) {
    for (j in 1:ncol(c)) {
      for (k in 1:ncol(a)) {
        c[i,j] <- c[i,j] + a[i,k]*b[k,j]
      }
    }
  }
} else {
  stop("matrices a and b non-conformable")
}
```

## while()

```
while (max(x) - 1 > 1e-06) {
  x <- sqrt(x)
}
```

The condition in the argument to `while` must be a single Boolean value (like `if`)

The body is looped over until the condition is `FALSE`, so it can loop forever

The loop never begins unless the condition starts `TRUE`

## for() vs. while()

for() is better when the number of times to repeat (values to iterate over) is clear in advance

while() is better when you can recognize when to stop once you're there, even if you can't guess it to begin with

Every for() could be replaced with a while()

Exercise: show this

## Avoiding iteration

R has many ways of _avoiding_ iteration, by acting on whole objects

- It's conceptually clearer
- It leads to simpler code
- It's faster (sometimes a little, sometimes drastically)

## Vectorized arithmetic

How many languages add 2 vectors:

```
c <- vector(mode="numeric", length=length(a))
for (i in 1:length(a)) {
  c[i] <- a[i] + b[i]
}
```

How R adds 2 vectors:

```
a+b
```

Or compare a triple `for()` loop for matrix multiplication vs. `a %*% b`

## Advantages of vectorizing

- Clarity: the syntax is about _what_ we're doing
- Concision: we write less
- Abstraction: the syntax hides _how the computer does it_
- Generality: same syntax works for numbers, vectors, arrays, ...
- Speed: modifying big vectors over and over is slow in R; vectorized work gets done by optimized low-level code

## Vectorized calculations

Many functions are set up to vectorize automatically

```{r}
abs(-3:3)
log(1:7)
```

See also `apply()`

## Vectorized conditions: ifelse()

```
ifelse(x^2 > 1, 2*abs(x)-1, x^2)
```

The 1st argument is a Boolean vector; each result is then picked from the 2nd or 3rd vector argument according to whether that element is `TRUE` or `FALSE`

## What Is Truth?

0 counts as `FALSE`; other numeric values count as `TRUE`; the strings "TRUE" and "FALSE" count as you'd hope; most everything else gives an error

Advice: Don't play games here; try to make sure control expressions are getting Boolean values

Conversely, in arithmetic, `FALSE` is 0 and `TRUE` is 1

```{r}
library(datasets)
states <- data.frame(state.x77, abb=state.abb, region=state.region, division=state.division)
mean(states$Murder > 7)
```

## switch()

Simplify nested `if` with `switch()`: give a variable to select on, then a value for each option

```
switch(type.of.summary,
       mean=mean(states$Murder),
       median=median(states$Murder),
       histogram=hist(states$Murder),
       "I don't understand")
```

## Exercise (off-line)

Set `type.of.summary` to, successively, "mean", "median", "histogram", and "mode", and explain what happens

## Unconditional iteration

```
repeat {
  print("Help! I am Dr. Morris Culpepper, trapped in an endless loop!")
}
```

## "Manual" control over iteration

```
repeat {
  if (watched) {
    next
  }
  print("Help! I am Dr. Morris Culpepper, trapped in an endless loop!")
  if (rescued) {
    break
  }
}
```

`break` exits the loop; `next` skips the rest of the body and goes back to the top of the loop

Both work with `for()` and `while()` as well

Exercise: how would you replace `while()` with `repeat`?

## Strings and string operations

Most data we deal with is in character form!
- web pages can be scraped
- email can be analyzed for network properties
- survey responses must be processed and compared

Even if you only care about numbers, it helps to be able to extract them from text and manipulate them easily.

## Characters vs. Strings

- ***Character***: a symbol in a written language, specifically what you can enter at a keyboard: letters, numerals, punctuation, space, newlines, etc.
```
'L', 'i', 'n', 'c', 'o', 'l', 'n'
```
- ***String***: a sequence of characters bound together
```
Lincoln
```

Note: R does not have a separate type for characters and strings

```{r}
mode("L")
mode("Lincoln")
class("Lincoln")
```

## Making Strings

Use single or double quotes to construct a string; use `nchar()` to get the length of a single string. Why do we prefer double quotes?

```{r}
"Lincoln"
"Abraham Lincoln"
"Abraham Lincoln's Hat"
"As Lincoln never said, \"Four score and seven beers ago\""
```

## Whitespace

The space, `" "`, is a character; so are multiple spaces `"   "` and the empty string, `""`.

Some characters are special, so we have "escape characters" to specify them in strings.

- quotes within strings: `\"`
- tab: `\t`
- new line `\n` and carriage return `\r` -- use the former rather than the latter when possible

## Character data type

One of the atomic data types, like `numeric` or `logical`

Can go into scalars, vectors, arrays, lists, or be the type of a column in a data frame.

```{r}
length("Abraham Lincoln's beard")
length(c("Abraham", "Lincoln's", "beard"))
nchar("Abraham Lincoln's beard")
nchar(c("Abraham", "Lincoln's", "beard"))
```

## Character-valued variables

They work just like others, e.g., with vectors:

```{r}
president <- "Lincoln"
nchar(president) # NOT 9
presidents <- c("Fillmore","Pierce","Buchanan","Davis","Johnson")
presidents[3]
presidents[-(1:3)]
```

## Displaying characters

We know `print()`, of course; `cat()` writes the string directly to the console. If you're debugging, `message()` is R's preferred syntax.
```{r}
print("Abraham Lincoln")
cat("Abraham Lincoln")
cat(presidents)
message(presidents)
```

## Substring operations

***Substring***: a smaller string from the big string, but still a string in its own right.

A string is not a vector or a list, so we ***cannot*** use subscripts like `[[ ]]` or `[ ]` to extract substrings; we use `substr()` instead.

```{r}
phrase <- "Christmas Bonus"
substr(phrase, start=8, stop=12)
```

We can also use `substr` to replace elements:

```{r}
substr(phrase, 13, 13) <- "g"
phrase
```

## substr() for string vectors

`substr()` vectorizes over all its arguments:

```{r}
presidents
substr(presidents,1,2) # First two characters
substr(presidents,nchar(presidents)-1,nchar(presidents)) # Last two
substr(presidents,20,21) # No such substrings so return the empty string
substr(presidents,7,7) # Explain!
```

## Dividing strings into vectors

`strsplit()` divides a string according to key characters, by splitting each element of the character vector `x` at appearances of the pattern `split`.

```{r}
scarborough.fair <- "parsley, sage, rosemary, thyme"
strsplit(scarborough.fair, ",")
strsplit(scarborough.fair, ", ")
```

The pattern is recycled over elements of the input vector:

```{r}
strsplit(c(scarborough.fair, "Garfunkel, Oates", "Clement, McKenzie"), ", ")
```

Note that it outputs a `list` of character vectors -- why should this be the default?

## Combining vectors into strings

Converting one variable type to another is called ***casting***:

```{r}
as.character(7.2) # Obvious
as.character(7.2e12) # Obvious
as.character(c(7.2,7.2e12)) # Obvious
as.character(7.2e5) # Not quite so obvious
```

## Building strings from multiple parts

The `paste()` function is very flexible!

With one vector argument, it works like `as.character()`:

```{r}
paste(41:45)
```

## Building strings from multiple parts

With 2 or more vector arguments, it combines them with recycling:

```{r}
paste(presidents,41:45)
paste(presidents,c("R","D")) # Not historically accurate!
paste(presidents,"(",c("R","D"),41:45,")")
```

## Building strings from multiple parts

Changing the separator between pasted-together terms:

```{r}
paste(presidents, " (", 41:45, ")", sep="_")
paste(presidents, " (", 41:45, ")", sep="")
```

Exercise: what happens if you give `sep` a vector?

## More complicated example of recycling

Exercise: Convince yourself of why this works as it does

```{r}
paste(c("HW","Lab"),rep(1:11,times=rep(2,11)))
```

## Condensing multiple strings

Producing one big string:

```{r}
paste(presidents, " (", 41:45, ")", sep="", collapse="; ")
```

Default value of `collapse` is `NULL` -- that is, it won't use it

## Function for writing regression formulas

R has a standard syntax for models: outcome and predictors.

```{r}
my.formula <- function(dep,indeps,df) {
  rhs <- paste(colnames(df)[indeps], collapse="+")
  return(paste(colnames(df)[dep], " ~ ", rhs, collapse=""))
}
my.formula(2,c(3,5,7),df=state.x77)
```

## General search

- Use `grep()` to find which strings have a matching search term
- Reconstituting: make one long string, then split the words
- Counting words with `table()`
- Need to learn how to work with text patterns and not just constants
- Searching for text patterns using regular expressions

## Summary

- `if`, nested `if`, `switch`
- Iteration: `for`, `while`
- Avoiding iteration with whole-object ("vectorized") operations
- Text is data, just like everything else
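
## General search: a sketch

The steps listed under General search can be sketched roughly as follows. This is only an illustration: the character vector `verses` is made up for the example and is not part of the original slides. The sketch sticks to fixed search terms (`fixed=TRUE`) and does not use regular expressions.

```r
# Hypothetical example data, invented for illustration
verses <- c("parsley, sage, rosemary, thyme",
            "remember me to one who lives there",
            "parsley and sage")

# Which elements contain the fixed search term "sage"?
hits <- grep("sage", verses, fixed=TRUE)

# Reconstitute: make one long string, drop the commas, then split into words
one.string <- paste(verses, collapse=" ")
one.string <- gsub(",", "", one.string, fixed=TRUE)
words <- strsplit(one.string, " ", fixed=TRUE)[[1]]

# Count how often each word appears
word.counts <- table(words)
word.counts[c("parsley", "sage")]
```

`grep()` returns the positions of the matching elements, and `strsplit()` returns a list, hence the `[[1]]`. Dropping `fixed=TRUE` would make the search terms behave as text patterns, which is where regular expressions come in.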