--- title: 'Control Flow' author: "James M. Flegal" output: ioslides_presentation: smaller: yes beamer_presentation: default --- ## Agenda - Control flow (or alternatively, flow of control) - if(), for(), and while() - Avoiding iteration - Introduction to strings and string operations ## Control flow *Control flow* is the order in which individual statements, instructions or function calls of an imperative program are executed or evaluated A *control flow statement* is a statement whose execution results in a choice being made as to which of two or more paths should be followed ## Conditionals Have the computer decide what to do next - Mathematically: $|x| = \left\{ \begin{array}{cl} x & \mathrm{if}~x\geq 0 \\ -x &\mathrm{if}~ x < 0\end{array}\right. ~,~ \psi(x) = \left\{ \begin{array}{cl} x^2 & \mathrm{if}~|x|\leq 1\\ 2|x|-1 &\mathrm{if}~ |x| > 1\end{array}\right.$ Exercise: plot $\psi$ in R - Computationally:  if the country code is not "US", multiply prices by current exchange rate  ## if() Simplest conditional:  if (x >= 0) { x } else { -x }  Condition in if needs to give _one_ TRUE or FALSE value else clause is optional one-line actions don't need braces  if (x >= 0) x else -x  ## if() if can *nest* arbitrarily deeply:  if (x^2 < 1) { x^2 } else { if (x >= 0) { 2*x-1 } else { -2*x-1 } }  Can get ugly though ## Combining Booleans & work | like + or *: combine terms element-wise Flow control wants *one* Boolean value, and to skip calculating what's not needed && and || give _one_ Boolean, lazily: {r} (0 > 0) && (all.equal(42%%6, 169%%13))  This *never* evaluates the complex expression on the right Use && and || for control, & and | for subsetting ## Iteration Repeat similar actions multiple times: {r} table.of.logarithms <- vector(length=7,mode="numeric") table.of.logarithms for (i in 1:length(table.of.logarithms)) { table.of.logarithms[i] <- log(i) } table.of.logarithms  ## for()  for (i in 1:length(table.of.logarithms)) { table.of.logarithms[i] <- log(i) }  for increments a **counter** (here i) along a vector (here 1:length(table.of.logarithms)) and **loops through** the **body* until it runs through the vector "**iterates over** the vector" Note, there is a better way to do this job! Can contain just about anything, including: - if() clauses - other for() loops (nested iteration) ## Nested iteration example  c <- matrix(0, nrow=nrow(a), ncol=ncol(b)) if (ncol(a) == nrow(b)) { for (i in 1:nrow(c)) { for (j in 1:ncol(c)) { for (k in 1:ncol(a)) { c[i,j] <- c[i,j] + a[i,k]*b[k,j] } } } } else { stop("matrices a and b non-conformable") }  ## while()  while (max(x) - 1 > 1e-06) { x <- sqrt(x) }  Condition in the argument to while must be a single Boolean value (like if) Body is looped over until the condition is FALSE so can loop forever Loop never begins unless the condition starts TRUE ## for() vs. while() for() is better when the number of times to repeat (values to iterate over) is clear in advance while() is better when you can recognize when to stop once you're there, even if you can't guess it to begin with Every for() could be replaced with a while() Exercise: show this ## Avoiding iteration R has many ways of _avoiding_ iteration, by acting on whole objects - It's conceptually clearer - It leads to simpler code - It's faster (sometimes a little, sometimes drastically) ## Vectorized arithmetic How many languages add 2 vectors:  c <- vector(length(a)) for (i in 1:length(a)) { c[i] <- a[i] + b[i] }  How R adds 2 vectors:  a+b  or a triple for() loop for matrix multiplication vs. a %*% b ## Advantages of vectorizing - Clarity: the syntax is about _what_ we're doing - Concision: we write less - Abstraction: the syntax hides _how the computer does it_ - Generality: same syntax works for numbers, vectors, arrays, ... - Speed: modifying big vectors over and over is slow in R; work gets done by optimized low-level code ## Vectorized calculations Many functions are set up to vectorize automatically {r} abs(-3:3) log(1:7)  See also apply() ## Vectorized conditions: ifelse()  ifelse(x^2 > 1, 2*abs(x)-1, x^2)  1st argument is a Boolean vector, then pick from the 2nd or 3rd vector arguments as TRUE or FALSE ## What Is Truth? 0 counts as FALSE; other numeric values count as TRUE; the strings "TRUE" and "FALSE" count as you'd hope; most everything else gives an error Advice: Don't play games here; try to make sure control expressions are getting Boolean values Conversely, in arithmetic, FALSE is 0 and TRUE is 1 {r} library(datasets) states <- data.frame(state.x77, abb=state.abb, region=state.region, division=state.division) mean(states$Murder > 7)  ## switch() Simplify nested if with switch(): give a variable to select on, then a value for each option  switch(type.of.summary, mean=mean(states$Murder), median=median(states$Murder), histogram=hist(states$Murder), "I don't understand")  ## Exercise (off-line) Set type.of.summary to, succesively, "mean", "median", "histogram", and "mode", and explain what happens ## Unconditional iteration  repeat { print("Help! I am Dr. Morris Culpepper, trapped in an endless loop!") }  ## "Manual" control over iteration  repeat { if (watched) { next() } print("Help! I am Dr. Morris Culpepper, trapped in an endless loop!") if (rescued) { break() } }  break() exits the loop; next() skips the rest of the body and goes back into the loop both work with for() and while() as well Exercise: how would you replace while() with repeat()? ## Strings and string operations Most data we deal with is in character form! - web pages can be scraped - email can be analyzed for network properties - survey responses must be processed and compared Even if you only care about numbers, it helps to be able to extract them from text and manipulate them easily. ## Characters vs. Strings - ***Character***: a symbol in a written language, specifically what you can enter at a keyboard: letters, numerals, punctuation, space, newlines, etc.  'L', 'i', 'n', 'c', 'o', 'l'  - ***String***: a sequence of characters bound together  Lincoln  Note: R does not have a separate type for characters and strings {r} mode("L") mode("Lincoln") class("Lincoln")  ## Making Strings Use single or double quotes to construct a string; use nchar() to get the length of a single string. Why do we prefer double quotes? {r} "Lincoln" "Abraham Lincoln" "Abraham Lincoln's Hat" "As Lincoln never said, \"Four score and seven beers ago\""  ## Whitespace The space, " " is a character; so are multiple spaces " " and the empty string, "". Some characters are special, so we have "escape characters" to specify them in strings. - quotes within strings: \" - tab: \t - new line \n and carriage return \r -- use the former rather than the latter when possible ## Character data type One of the atomic data types, like numeric or logical Can go into scalars, vectors, arrays, lists, or be the type of a column in a data frame. {r} length("Abraham Lincoln's beard") length(c("Abraham", "Lincoln's", "beard")) nchar("Abraham Lincoln's beard") nchar(c("Abraham", "Lincoln's", "beard"))  ## Character-valued variables They work just like others, e.g., with vectors: {r} president <- "Lincoln" nchar(president) # NOT 9 presidents <- c("Fillmore","Pierce","Buchanan","Davis","Johnson") presidents[3] presidents[-(1:3)]  ## Displaying characters We know print(), of course; cat() writes the string directly to the console. If you're debugging, message() is R's preferred syntax. {r} print("Abraham Lincoln") cat("Abraham Lincoln") cat(presidents) message(presidents)  ## Substring operations ***Substring***: a smaller string from the big string, but still a string in its own right. A string is not a vector or a list, so we ***cannot*** use subscripts like [[ ]] or [ ] to extract substrings; we use substr() instead. {r} phrase <- "Christmas Bonus" substr (phrase, start=8, stop=12)  We can also use substr to replace elements: {r} substr(phrase, 13, 13) <- "g" phrase  ## substr() for string vectors substr() vectorizes over all its arguments: {r} presidents substr(presidents,1,2) # First two characters substr(presidents,nchar(presidents)-1,nchar(presidents)) # Last two substr(presidents,20,21) # No such substrings so return the null string substr(presidents,7,7) # Explain!  ## Dividing strings into vectors strsplit() divides a string according to key characters, by splitting each element of the character vector x at appearances of the pattern split. {r} scarborough.fair <- "parsley, sage, rosemary, thyme" strsplit (scarborough.fair, ",") strsplit (scarborough.fair, ", ")  Pattern is recycled over elements of the input vector: {r} strsplit (c(scarborough.fair, "Garfunkel, Oates", "Clement, McKenzie"), ", ")  Note that it outputs a list of character vectors -- why should this be the default? ## Combining vectors into strings Converting one variable type to another is called ***casting***: {r} as.character(7.2) # Obvious as.character(7.2e12) # Obvious as.character(c(7.2,7.2e12)) # Obvious as.character(7.2e5) # Not quite so obvious  ## Building strings from multiple parts The paste() function is very flexible! With one vector argument, works like as.character(): {r} paste(41:45)  ## Building strings from multiple parts With 2 or more vector arguments, combines them with recycling: {r} paste(presidents,41:45) paste(presidents,c("R","D")) # Not historically accurate! paste(presidents,"(",c("R","D"),41:45,")")  ## Building strings from multiple parts Changing the separator between pasted-together terms: {r} paste(presidents, " (", 41:45, ")", sep="_") paste(presidents, " (", 41:45, ")", sep="")  Exercise: what happens if you give sep a vector? ## More complicated example of recycling Exercise: Convince yourself of why this works as it does {r} paste(c("HW","Lab"),rep(1:11,times=rep(2,11)))  ## Condensing multiple strings Producing one big string: {r} paste(presidents, " (", 41:45, ")", sep="", collapse="; ")  Default value of collapse is NULL -- that is, it won't use it ## Function for writing regression formulas R has a standard syntax for models: outcome and predictors. {r} my.formula <- function(dep,indeps,df) { rhs <- paste(colnames(df)[indeps], collapse="+") return(paste(colnames(df)[dep], " ~ ", rhs, collapse="")) } my.formula(2,c(3,5,7),df=state.x77)  ## General search - Use grep() to find which strings have a matching search term - Reconstituting, make one long string, then split the words - Counting words with table() - Need to learn how to work with text patterns and not just constants - Searching for text patterns using regular expressions ## Summary - if, nested if, switch - Iteration: for, while - Avoiding iteration with whole-object ("vectorized") operations - Text is data, just like everything else