- Control flow (or alternatively, flow of control)
- if(), for(), and while()
- Avoiding iteration
- Introduction to strings and string operations

*Control flow* is the order in which individual statements, instructions or function calls of an imperative program are executed or evaluated

A *control flow statement* is a statement whose execution results in a choice being made as to which of two or more paths should be followed

Have the computer decide what to do next

Mathematically: \[ |x| = \left\{ \begin{array}{cl} x & \mathrm{if}~x\geq 0 \\ -x &\mathrm{if}~ x < 0\end{array}\right. ~,~ \psi(x) = \left\{ \begin{array}{cl} x^2 & \mathrm{if}~|x|\leq 1\\ 2|x|-1 &\mathrm{if}~ |x| > 1\end{array}\right. \]

Exercise: plot \(\psi\) in R

Computationally:

if the country code is not "US", multiply prices by current exchange rate

Simplest conditional:

if (x >= 0) { x } else { -x }

Condition in `if` needs to give *one* `TRUE` or `FALSE` value

`else` clause is optional

one-line actions don't need braces

if (x >= 0) x else -x

`if` can *nest* arbitrarily deeply:

if (x^2 < 1) { x^2 } else { if (x >= 0) { 2*x-1 } else { -2*x-1 } }

Can get ugly though
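One way to keep nested conditionals readable is to chain the branches with `else if`. The sketch below does this for \(\psi\); the function name `psi.scalar` is an assumption introduced here for illustration, and it expects a single number, not a vector.

```r
# Scalar psi written as a flat if / else if / else chain
# (psi.scalar is a hypothetical name, not from the slides)
psi.scalar <- function(x) {
  if (x^2 < 1) {
    x^2
  } else if (x >= 0) {
    2*x - 1
  } else {
    -2*x - 1
  }
}

psi.scalar(0.5)
psi.scalar(2)
psi.scalar(-2)
```

The logic is the same as the nested version; only the layout changes.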

`&` and `|` work like `+` or `*`: combine terms element-wise

Flow control wants *one* Boolean value, and to skip calculating what's not needed

`&&` and `||` give *one* Boolean, lazily:

(0 > 0) && (all.equal(42%%6, 169%%13))

## [1] FALSE

This *never* evaluates the complex expression on the right

Use `&&` and `||` for control, `&` and `|` for subsetting
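A small sketch of the difference, using a made-up vector `x`: the element-wise operator produces one Boolean per element (useful for subsetting), while the lazy operator produces a single Boolean and skips the right-hand side when the left already decides the answer.

```r
x <- c(-2, 0.5, 3)   # illustrative data, not from the slides

# `&` compares element by element, giving one Boolean per element
inside <- (x > -1) & (x < 1)
inside
x[inside]            # subsetting with the Boolean vector

# `&&` stops at the first FALSE, so the right-hand side is never run;
# if it were evaluated, stop() would raise an error
safe <- is.numeric("a") && stop("never reached")
safe
```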

Repeat similar actions multiple times:

table.of.logarithms <- vector(length=7,mode="numeric")
table.of.logarithms

## [1] 0 0 0 0 0 0 0

for (i in 1:length(table.of.logarithms)) {
  table.of.logarithms[i] <- log(i)
}
table.of.logarithms

## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101

for (i in 1:length(table.of.logarithms)) { table.of.logarithms[i] <- log(i) }

`for` increments a **counter** (here `i`) along a vector (here `1:length(table.of.logarithms)`) and **loops through** the **body** until it runs through the vector

"**iterates over** the vector"

Note, there is a better way to do this job!
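The better way is presumably vectorization, which comes up later in these notes; that reading is an assumption. The sketch below builds the same table without a loop, and also shows `seq_along()` as a safer loop index for cases where a loop is still wanted.

```r
# Vectorized: log() acts on the whole vector at once, no loop needed
table.of.logarithms <- log(1:7)

# 1:length(x) counts down (1, 0) when x is empty, so a loop over it
# would run twice; seq_along(x) gives an empty index instead
empty <- numeric(0)
1:length(empty)
seq_along(empty)

# The loop from above, rewritten with seq_along()
looped <- numeric(7)
for (i in seq_along(looped)) {
  looped[i] <- log(i)
}
all.equal(looped, table.of.logarithms)
```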

The body of a `for()` loop can contain just about anything, including:

- if() clauses
- other for() loops (nested iteration)

c <- matrix(0, nrow=nrow(a), ncol=ncol(b))
if (ncol(a) == nrow(b)) {
  for (i in 1:nrow(c)) {
    for (j in 1:ncol(c)) {
      for (k in 1:ncol(a)) {
        c[i,j] <- c[i,j] + a[i,k]*b[k,j]
      }
    }
  }
} else {
  stop("matrices a and b non-conformable")
}

while (max(x) - 1 > 1e-06) { x <- sqrt(x) }

Condition in the argument to `while` must be a single Boolean value (like `if`)

Body is looped over until the condition is `FALSE`, so it can loop forever

Loop never begins unless the condition starts `TRUE`
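As a sketch of guarding against an endless loop, the square-root example can carry an iteration counter and a cap. The starting values, the counter `n.iter`, and the cap `max.iter` are all assumptions added here for illustration.

```r
x <- c(4, 100)       # illustrative starting values
n.iter <- 0          # counts passes through the body
max.iter <- 1000     # hypothetical safety cap

# Stop when every element is within 1e-06 of 1, or when the cap is hit
while (max(x) - 1 > 1e-06 && n.iter < max.iter) {
  x <- sqrt(x)
  n.iter <- n.iter + 1
}

x
n.iter
```

Because the condition uses `&&`, it still yields a single Boolean, as `while` requires.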

for() is better when the number of times to repeat (values to iterate over) is clear in advance

while() is better when you can recognize when to stop once you're there, even if you can't guess it to begin with

Every for() could be replaced with a while()

Exercise: show this

R has many ways of *avoiding* iteration, by acting on whole objects

- It's conceptually clearer
- It leads to simpler code
- It's faster (sometimes a little, sometimes drastically)

How many languages add 2 vectors:

c <- vector(mode="numeric", length=length(a))
for (i in 1:length(a)) {
  c[i] <- a[i] + b[i]
}

How R adds 2 vectors:

a+b

or a triple `for()` loop for matrix multiplication vs. `a %*% b`

- Clarity: the syntax is about *what* we're doing
- Concision: we write less
- Abstraction: the syntax hides *how the computer does it*
- Generality: same syntax works for numbers, vectors, arrays, …
- Speed: modifying big vectors over and over is slow in R; work gets done by optimized low-level code
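To check that the whole-object form does the same job as the explicit loops, the sketch below multiplies two small matrices both ways. The matrices and the names `c.loop` and `c.vec` are made up for illustration.

```r
a <- matrix(1:6, nrow=2)     # 2 x 3
b <- matrix(1:12, nrow=3)    # 3 x 4

# Triple loop, as in the nested-iteration example
c.loop <- matrix(0, nrow=nrow(a), ncol=ncol(b))
for (i in 1:nrow(c.loop)) {
  for (j in 1:ncol(c.loop)) {
    for (k in 1:ncol(a)) {
      c.loop[i,j] <- c.loop[i,j] + a[i,k]*b[k,j]
    }
  }
}

# Whole-object version: one line
c.vec <- a %*% b

all.equal(c.loop, c.vec)
```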

Many functions are set up to vectorize automatically

abs(-3:3)

## [1] 3 2 1 0 1 2 3

log(1:7)

## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101

See also `apply()`

ifelse(x^2 > 1, 2*abs(x)-1, x^2)

1st argument is a Boolean vector, then pick from the 2nd or 3rd vector arguments as `TRUE` or `FALSE`
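Wrapping the `ifelse()` call in a function gives a vectorized \(\psi\). This is a minimal sketch; the function name `psi` and the test values are choices made here.

```r
# Vectorized psi: works on a whole vector at once
psi <- function(x) {
  ifelse(x^2 > 1, 2*abs(x) - 1, x^2)
}

psi(c(-2, -0.5, 0, 0.5, 2))
```

Since `psi` accepts vectors, something like `curve(psi, from=-3, to=3)` is one possible starting point for the plotting exercise.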

0 counts as `FALSE`; other numeric values count as `TRUE`; the strings "TRUE" and "FALSE" count as you'd hope; most everything else gives an error

Advice: Don't play games here; try to make sure control expressions are getting Boolean values

Conversely, in arithmetic, `FALSE` is 0 and `TRUE` is 1

library(datasets)
states <- data.frame(state.x77, abb=state.abb, region=state.region, division=state.division)
mean(states$Murder > 7)

## [1] 0.48
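The same arithmetic makes counting easy: `sum()` of a Boolean vector counts the `TRUE` values, and `mean()` gives their proportion. A small sketch, reading the `Murder` column straight from `state.x77` (the names `murder` and `high` are introduced here):

```r
library(datasets)

murder <- state.x77[, "Murder"]
high <- murder > 7     # one Boolean per state

sum(high)              # how many states exceed 7
mean(high)             # what fraction of states exceed 7
TRUE + TRUE            # Booleans behave as 1 and 0 in arithmetic
```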

Simplify nested `if` with `switch()`: give a variable to select on, then a value for each option

switch(type.of.summary, mean=mean(states$Murder), median=median(states$Murder), histogram=hist(states$Murder), "I don't understand")

Set `type.of.summary` to, successively, "mean", "median", "histogram", and "mode", and explain what happens
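For experimenting, it can help to wrap the `switch()` in a function so different selectors are easy to try. This is a sketch under assumptions: the function name `summarize`, its argument `x`, and the toy data are made up here, and the histogram option is left out to keep the example text-only.

```r
# Hypothetical wrapper around switch(); the unnamed last argument
# is the fallback used when no option name matches
summarize <- function(x, type.of.summary) {
  switch(type.of.summary,
         mean=mean(x),
         median=median(x),
         "I don't understand")
}

x <- c(1, 2, 6)
summarize(x, "mean")
summarize(x, "median")
summarize(x, "mode")
```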

repeat { print("Help! I am Dr. Morris Culpepper, trapped in an endless loop!") }

repeat {
  if (watched) { next }
  print("Help! I am Dr. Morris Culpepper, trapped in an endless loop!")
  if (rescued) { break }
}

`break` exits the loop; `next` skips the rest of the body and goes back into the loop

both work with `for()` and `while()` as well
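A terminating sketch that uses both keywords: it adds up the odd numbers from 1 to 9, skipping even values and leaving the loop once 9 has been added. The variable names are made up for illustration.

```r
total <- 0
i <- 0

repeat {
  i <- i + 1
  if (i %% 2 == 0) next    # even: skip the rest of the body
  total <- total + i
  if (i >= 9) break        # done: leave the loop
}

total   # 1 + 3 + 5 + 7 + 9 = 25
```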

Exercise: how would you replace `while()` with `repeat`?

Most data we deal with is in character form!

- web pages can be scraped
- email can be analyzed for network properties
- survey responses must be processed and compared

Even if you only care about numbers, it helps to be able to extract them from text and manipulate them easily.

*Character*: a symbol in a written language, specifically what you can enter at a keyboard: letters, numerals, punctuation, space, newlines, etc.

'L', 'i', 'n', 'c', 'o', 'l', 'n'

*String*: a sequence of characters bound together

Lincoln

Note: R does not have a separate type for characters and strings

mode("L")

## [1] "character"

mode("Lincoln")

## [1] "character"

class("Lincoln")

## [1] "character"

Use single or double quotes to construct a string; use `nchar()` to get the length of a single string. Why do we prefer double quotes?

"Lincoln"

## [1] "Lincoln"

"Abraham Lincoln"

## [1] "Abraham Lincoln"

"Abraham Lincoln's Hat"

## [1] "Abraham Lincoln's Hat"

"As Lincoln never said, \"Four score and seven beers ago\""

## [1] "As Lincoln never said, \"Four score and seven beers ago\""

The space, `" "`, is a character; so are multiple spaces `"   "` and the empty string, `""`.

Some characters are special, so we have "escape characters" to specify them in strings.

- quotes within strings: `\"`
- tab: `\t`
- new line `\n` and carriage return `\r` (use the former rather than the latter when possible)
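A short sketch of how escapes behave: each escape sequence is typed as two keystrokes but stored as one character, `print()` shows the escaped form, and `cat()` writes the characters they stand for. The example string is made up.

```r
s <- "Name:\tLincoln\nSaid: \"hello\""

print(s)        # shows the escape sequences as typed
cat(s, "\n")    # writes an actual tab, line break, and quote marks

# Each escape counts as a single character
nchar("\t")
nchar("\n")
nchar("\"")
```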

One of the atomic data types, like `numeric` or `logical`

Can go into scalars, vectors, arrays, lists, or be the type of a column in a data frame.

length("Abraham Lincoln's beard")

## [1] 1

length(c("Abraham", "Lincoln's", "beard"))

## [1] 3

nchar("Abraham Lincoln's beard")

## [1] 23

nchar(c("Abraham", "Lincoln's", "beard"))

## [1] 7 9 5

They work just like others, e.g., with vectors:

president <- "Lincoln"
nchar(president) # NOT 9

## [1] 7

presidents <- c("Fillmore","Pierce","Buchanan","Davis","Johnson")
presidents[3]

## [1] "Buchanan"

presidents[-(1:3)]

## [1] "Davis" "Johnson"

We know `print()`, of course; `cat()` writes the string directly to the console. If you're debugging, `message()` is R's preferred syntax.

print("Abraham Lincoln")

## [1] "Abraham Lincoln"

cat("Abraham Lincoln")

## Abraham Lincoln

cat(presidents)

## Fillmore Pierce Buchanan Davis Johnson

message(presidents)

## FillmorePierceBuchananDavisJohnson
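Two details worth trying out: `cat()` takes a `sep` argument and does not add a trailing newline, while `message()` pastes its arguments together with no separator, adds a newline, and writes to the error stream instead of standard output. A minimal sketch (the name `out` and the message text are made up):

```r
presidents <- c("Fillmore","Pierce","Buchanan","Davis","Johnson")

# Choose the separator; add the newline yourself
cat(presidents, sep=", ")
cat("\n")

# capture.output() collects what cat() wrote, as a character vector
out <- capture.output(cat(presidents, sep=", "))
out

# message() joins its arguments with no separator
message("Checked ", length(presidents), " names")
```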

**Substring**: a smaller string from the big string, but still a string in its own right.

A string is not a vector or a list, so we **cannot** use subscripts like `[[ ]]` or `[ ]` to extract substrings; we use `substr()` instead.

phrase <- "Christmas Bonus"
substr(phrase, start=8, stop=12)

## [1] "as Bo"

We can also use `substr` to replace elements:

substr(phrase, 13, 13) <- "g"
phrase

## [1] "Christmas Bogus"

`substr()` vectorizes over all its arguments:

presidents

## [1] "Fillmore" "Pierce" "Buchanan" "Davis" "Johnson"

substr(presidents,1,2) # First two characters

## [1] "Fi" "Pi" "Bu" "Da" "Jo"

substr(presidents,nchar(presidents)-1,nchar(presidents)) # Last two

## [1] "re" "ce" "an" "is" "on"

substr(presidents,20,21) # No such substrings so return the null string

## [1] "" "" "" "" ""

substr(presidents,7,7) # Explain!

## [1] "r" "" "a" "" "n"
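One way to approach the last result is to compare it with the string lengths. The sketch below checks that the empty strings line up with the names shorter than seven characters; the variable `seventh` is introduced here.

```r
presidents <- c("Fillmore","Pierce","Buchanan","Davis","Johnson")

nchar(presidents)                      # length of each name
seventh <- substr(presidents, 7, 7)    # 7th character, if there is one

# Empty string exactly where the name has fewer than 7 characters
seventh == ""
nchar(presidents) < 7
```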

`strsplit()` divides a string according to key characters, by splitting each element of the character vector `x` at appearances of the pattern `split`.

scarborough.fair <- "parsley, sage, rosemary, thyme"
strsplit(scarborough.fair, ",")

## [[1]]
## [1] "parsley" " sage" " rosemary" " thyme"

strsplit (scarborough.fair, ", ")

## [[1]]
## [1] "parsley" "sage" "rosemary" "thyme"

Pattern is recycled over elements of the input vector:

strsplit (c(scarborough.fair, "Garfunkel, Oates", "Clement, McKenzie"), ", ")

## [[1]]
## [1] "parsley" "sage" "rosemary" "thyme"
##
## [[2]]
## [1] "Garfunkel" "Oates"
##
## [[3]]
## [1] "Clement" "McKenzie"

Note that it outputs a `list` of character vectors. Why should this be the default?
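A sketch of working with that list output: each input string can split into a different number of pieces, which a list can hold without padding. The names `acts` and `pieces` are made up for illustration.

```r
acts <- c("parsley, sage, rosemary, thyme", "Garfunkel, Oates")
pieces <- strsplit(acts, ", ")

sapply(pieces, length)   # number of pieces per input string
pieces[[2]][1]           # first piece of the second string
unlist(pieces)           # flatten to one character vector when grouping no longer matters
```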

Converting one variable type to another is called **casting**:

as.character(7.2) # Obvious

## [1] "7.2"

as.character(7.2e12) # Obvious

## [1] "7.2e+12"

as.character(c(7.2,7.2e12)) # Obvious

## [1] "7.2" "7.2e+12"

as.character(7.2e5) # Not quite so obvious

## [1] "720000"
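Casting also goes the other way, from text back to numbers, which is what makes extracting numbers from text useful. A minimal sketch with `as.numeric()`; a string that does not look like a number becomes `NA` with a warning, suppressed here to keep the output clean.

```r
as.numeric("7.2")
as.numeric("7.2e+12") == 7.2e12        # scientific notation is understood
as.numeric(as.character(7.2e5))        # round trip back to a number

# Text that is not a number cannot be cast: the result is NA
suppressWarnings(as.numeric("seven"))
```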

The `paste()` function is very flexible!

With one vector argument, works like `as.character()`:

paste(41:45)

## [1] "41" "42" "43" "44" "45"

With 2 or more vector arguments, combines them with recycling:

paste(presidents,41:45)

## [1] "Fillmore 41" "Pierce 42" "Buchanan 43" "Davis 44" "Johnson 45"

paste(presidents,c("R","D")) # Not historically accurate!

## [1] "Fillmore R" "Pierce D" "Buchanan R" "Davis D" "Johnson R"

paste(presidents,"(",c("R","D"),41:45,")")

## [1] "Fillmore ( R 41 )" "Pierce ( D 42 )" "Buchanan ( R 43 )"
## [4] "Davis ( D 44 )" "Johnson ( R 45 )"

Changing the separator between pasted-together terms:

paste(presidents, " (", 41:45, ")", sep="_")

## [1] "Fillmore_ (_41_)" "Pierce_ (_42_)" "Buchanan_ (_43_)"
## [4] "Davis_ (_44_)" "Johnson_ (_45_)"

paste(presidents, " (", 41:45, ")", sep="")

## [1] "Fillmore (41)" "Pierce (42)" "Buchanan (43)" "Davis (44)"
## [5] "Johnson (45)"

Exercise: what happens if you give `sep` a vector?

Exercise: Convince yourself of why this works as it does

paste(c("HW","Lab"),rep(1:11,times=rep(2,11)))

## [1] "HW 1" "Lab 1" "HW 2" "Lab 2" "HW 3" "Lab 3" "HW 4"
## [8] "Lab 4" "HW 5" "Lab 5" "HW 6" "Lab 6" "HW 7" "Lab 7"
## [15] "HW 8" "Lab 8" "HW 9" "Lab 9" "HW 10" "Lab 10" "HW 11"
## [22] "Lab 11"

Producing one big string:

paste(presidents, " (", 41:45, ")", sep="", collapse="; ")

## [1] "Fillmore (41); Pierce (42); Buchanan (43); Davis (44); Johnson (45)"

Default value of `collapse` is `NULL`, that is, it won't use it
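Base R also provides `paste0()`, which behaves like `paste()` with `sep=""`. A short sketch comparing the two; the variable names are made up here.

```r
presidents <- c("Fillmore","Pierce","Buchanan","Davis","Johnson")

with.sep <- paste(presidents, " (", 41:45, ")", sep="")
with.paste0 <- paste0(presidents, " (", 41:45, ")")
identical(with.sep, with.paste0)

# collapse works the same way in paste0()
paste0(presidents, collapse="; ")
```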

R has a standard syntax for models: outcome and predictors.

my.formula <- function(dep,indeps,df) {
  rhs <- paste(colnames(df)[indeps], collapse="+")
  return(paste(colnames(df)[dep], " ~ ", rhs, sep=""))
}
my.formula(2,c(3,5,7),df=state.x77)

## [1] "Income ~ Illiteracy+Murder+Frost"
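The string becomes usable by modeling functions once it is converted with `as.formula()`. The sketch below fits a linear model with it; `my.formula` is re-defined here (using `sep=""`) so the block is self-contained, and the names `formula.string`, `model.formula`, and `fit` are assumptions made for illustration.

```r
library(datasets)

# Build "outcome ~ predictor1+predictor2+..." from column positions
my.formula <- function(dep, indeps, df) {
  rhs <- paste(colnames(df)[indeps], collapse="+")
  paste(colnames(df)[dep], " ~ ", rhs, sep="")
}

formula.string <- my.formula(2, c(3,5,7), df=state.x77)
model.formula <- as.formula(formula.string)   # text -> formula object

# state.x77 is a matrix, so convert it to a data frame for lm()
fit <- lm(model.formula, data=as.data.frame(state.x77))
coef(fit)
```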

- Use `grep()` to find which strings have a matching search term
- Reconstituting, make one long string, then split the words
- Counting words with `table()`
- Need to learn how to work with text patterns and not just constants
- Searching for text patterns using regular expressions
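The first three steps can be sketched on a toy example. The text in `lines` is made up, and splitting on a single space is a simplification that ignores punctuation.

```r
lines <- c("the quick brown fox",
           "jumps over the lazy dog",
           "a cat sat")

# Which elements contain the search term?
grep("the", lines)

# One long string, then split into words
one.string <- paste(lines, collapse=" ")
words <- unlist(strsplit(one.string, " "))

# Count how often each word appears
counts <- table(words)
counts[["the"]]
```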

- Conditionals: `if`, nested `if`, `switch`
- Iteration: `for`, `while`
- Avoiding iteration with whole-object ("vectorized") operations
- Text is data, just like everything else