## Agenda

• Defining functions: Tying related commands into bundles
• Interfaces: Controlling what the function can see and do
• Example: Parameter estimation code
• Multiple functions
• Recursion: Making hard problems simpler

## Why Functions?

• Data structures tie related values into one object

• Functions tie related commands into one object

• In both cases: easier to understand, easier to work with, easier to build into larger things

## Example cubic function

cube <- function(x) x ^ 3
cube
## function(x) x ^ 3
cube(3)
## [1] 27
cube(1:10)
##  [1]    1    8   27   64  125  216  343  512  729 1000
cube(matrix(1:8, 2, 4))
##      [,1] [,2] [,3] [,4]
## [1,]    1   27  125  343
## [2,]    8   64  216  512
matrix(cube(1:8), 2, 4)
##      [,1] [,2] [,3] [,4]
## [1,]    1   27  125  343
## [2,]    8   64  216  512
# cube(array(1:24, c(2, 3, 4))) # cube each element in an array
mode(cube)
## [1] "function"

## Example

# "Robust" loss function, for outlier-resistant regression
# Inputs: vector of numbers (x)
# Outputs: vector with x^2 for small entries, 2|x|-1 for large ones
psi.1 <- function(x) {
  psi <- ifelse(x^2 > 1, 2*abs(x)-1, x^2)
  return(psi)
}

Our functions get used just like the built-in ones:

z <- c(-0.5,-5,0.9,9)
psi.1(z)
## [1]  0.25  9.00  0.81 17.00

Go back to the declaration and look at the parts:

# "Robust" loss function, for outlier-resistant regression
# Inputs: vector of numbers (x)
# Outputs: vector with x^2 for small entries, 2|x|-1 for large ones
psi.1 <- function(x) {
  psi <- ifelse(x^2 > 1, 2*abs(x)-1, x^2)
  return(psi)
}

• Interfaces: the inputs or arguments; the outputs or return value
• Calls other functions (ifelse(), abs(), the operators ^ and >), and could also call other functions we've written
• return() says what the output is; alternatively, the value of the last expression evaluated is returned
• Comments: not required by R, but a good idea
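As a small sketch of implicit return (the name `psi.1a` is invented here), the value of the last expression is the output when `return()` is omitted:

```r
# Same robust loss as psi.1, but relying on implicit return:
# the value of the last evaluated expression is the function's output
psi.1a <- function(x) {
  ifelse(x^2 > 1, 2*abs(x) - 1, x^2)
}
identical(psi.1(z), psi.1a(z))
## [1] TRUE
```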

## What should be a function?

• Things you're going to re-run, especially if it will be re-run with changes
• Chunks of code you keep highlighting and hitting return on
• Chunks of code which are small parts of bigger analyses
• Chunks which are very similar to other chunks

## Named and default arguments

# "Robust" loss function, for outlier-resistant regression
# Inputs: vector of numbers (x), scale for crossover (c)
# Outputs: vector with x^2 for small entries, 2c|x|-c^2 for large ones
psi.2 <- function(x, c=1) {
  psi <- ifelse(x^2 > c^2, 2*c*abs(x)-c^2, x^2)
  return(psi)
}
identical(psi.1(z), psi.2(z,c=1))
## [1] TRUE

Default values are used when an argument is omitted:

identical(psi.2(z,c=1), psi.2(z))
## [1] TRUE

Named arguments can go in any order when explicitly tagged:

identical(psi.2(x=z,c=2), psi.2(c=2,x=z))
## [1] TRUE

## Checking Arguments

Problem: Odd behavior when arguments aren't as we expect

psi.2(x=z,c=c(1,1,1,10))
## [1]  0.25  9.00  0.81 81.00
psi.2(x=z,c=-1)
## [1]   0.25 -11.00   0.81 -19.00

Solution: Put little sanity checks into the code

# "Robust" loss function, for outlier-resistant regression
# Inputs: vector of numbers (x), scale for crossover (c)
# Outputs: vector with x^2 for small entries, 2c|x|-c^2 for large ones
psi.3 <- function(x, c=1) {
  # Scale should be a single positive number
  stopifnot(length(c) == 1, c > 0)
  psi <- ifelse(x^2 > c^2, 2*c*abs(x)-c^2, x^2)
  return(psi)
}

Arguments to stopifnot() are a series of expressions which should all be TRUE; execution halts, with an error message, at the first FALSE (try it!)
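For example (the exact error text may vary slightly by R version):

```r
psi.3(z, c=c(1,1,1,10))
## Error in psi.3(z, c = c(1, 1, 1, 10)) : length(c) == 1 is not TRUE
psi.3(z, c=-1)
## Error in psi.3(z, c = -1) : c > 0 is not TRUE
```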

## What the function can see and do

• Each function has its own environment
• Names here override names in the global environment
• Internal environment starts with the named arguments
• Assignments inside the function only change the internal environment (There are ways around this, but they are difficult and best avoided)
• Names undefined in the function are looked up in the environment where the function was defined (R's lexical scoping), not the environment it is called from; for functions defined at the command line, that is the global environment

## Internal environment examples

x <- 7
y <- c("A","C","G","T","U")
adder <- function(y) { x <- x + y; return(x) }
adder(1)
## [1] 8
x
## [1] 7
y
## [1] "A" "C" "G" "T" "U"
circle.area <- function(r) { return(pi*r^2) }
circle.area(c(1,2,3))
## [1]  3.141593 12.566371 28.274334
truepi <- pi
pi <- 3
circle.area(c(1,2,3))
## [1]  3 12 27
pi <- truepi      # Restore sanity
circle.area(c(1,2,3))
## [1]  3.141593 12.566371 28.274334

## Respect the interfaces

• Interfaces mark out a controlled inner environment for our code

• Interact with the rest of the system only at the interface

• Advice: arguments explicitly give the function all the information
  • Reduces risk of confusion and error
  • Exception: true universals like $$\pi$$

• Likewise, output should only be through the return value
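A small sketch of the advice (both function names are invented here):

```r
# Fragile: silently depends on a global variable the caller must set
normalize.bad <- function(x) { x/total }
# Better: everything the function needs comes in through its arguments
normalize.good <- function(x, total) { x/total }
normalize.good(c(1, 2, 5), total=8)
## [1] 0.125 0.250 0.625
```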

## Fitting a Model

Fact: bigger cities tend to produce more economically per capita

A proposed statistical model (Geoffrey West et al.):

$Y = y_0 N^{a} + \mathrm{noise}$

where $$Y$$ is the per-capita "gross metropolitan product" of a city, $$N$$ is its population, and $$y_0$$ and $$a$$ are parameters

gmp <- read.table("gmp.dat")
gmp$pop <- gmp$gmp/gmp$pcgmp
plot(pcgmp~pop, data=gmp, log="x", xlab="Population",
     ylab="Per-Capita Economic Output ($/person-year)",
     main="US Metropolitan Areas, 2006")
curve(6611*x^(1/8), add=TRUE, col="blue")

## Fitting a Model

$Y = y_0 N^{a} + \mathrm{noise}$

Take $$y_0=6611$$ for now and estimate $$a$$ by minimizing the mean squared error

Approximate the derivative of the error with respect to $$a$$ and move against it:

$\begin{eqnarray*}
MSE(a) & \equiv & \frac{1}{n}\sum_{i=1}^{n}{(Y_i - y_0 N_i^a)^2}\\
MSE^{\prime}(a) & \approx & \frac{MSE(a+h) - MSE(a)}{h}\\
a_{t+1} - a_{t} & \propto & -MSE^{\prime}(a)
\end{eqnarray*}$

An actual first attempt at code:

maximum.iterations <- 100
deriv.step <- 1/1000
step.scale <- 1e-12
stopping.deriv <- 1/100
iteration <- 0
deriv <- Inf
a <- 0.15
while ((iteration < maximum.iterations) && (deriv > stopping.deriv)) {
  iteration <- iteration + 1
  mse.1 <- mean((gmp$pcgmp - 6611*gmp$pop^a)^2)
  mse.2 <- mean((gmp$pcgmp - 6611*gmp$pop^(a+deriv.step))^2)
  deriv <- (mse.2 - mse.1)/deriv.step
  a <- a - step.scale*deriv
}
list(a=a,iterations=iteration,converged=(iteration < maximum.iterations))
## $a
## [1] 0.1258166
##
## $iterations
## [1] 58
##
## $converged
## [1] TRUE
plot.plm.1 <- function(plm, from, to) {
  y0 <- plm$y0
  a <- plm$a
  f <- function(x) { return(y0*x^a) }
  curve(f(x), from=from, to=to)
  # Return with no visible value on the terminal
  invisible(TRUE)
}

## Predicting from a Fitted Model

When one function calls another, use ... as a meta-argument, to pass along unspecified inputs to the called function:

plot.plm.2 <- function(plm, ...) {
  y0 <- plm$y0
a <- plm\$a
f <- function(x) { return(y0*x^a) }
# from and to are possible arguments to curve()
curve(f(x), ...)
invisible(TRUE)
}
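A minimal sketch of the same mechanism with a built-in function (the wrapper name is invented here): extra arguments to the wrapper are handed on, unexamined, to mean():

```r
my.mean <- function(x, ...) {
  mean(x, ...)   # anything in ... is passed straight through
}
my.mean(c(1, NA, 3))
## [1] NA
my.mean(c(1, NA, 3), na.rm=TRUE)
## [1] 2
```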

## Sub-Functions

Solve big problems by dividing them into a few sub-problems

• Easier to understand, get the big picture at a glance
• Easier to fix, improve and modify
• Easier to design
• Easier to re-use solutions to recurring sub-problems

Rule of thumb: A function longer than a page is probably too long

## Sub-Functions or Separate Functions?

Defining a function inside another function

• Pros: Simpler code, access to local variables, doesn't clutter workspace
• Cons: Gets re-declared each time, can't access in global environment (or in other functions)
• Alternative: Declare the function in the same file, source them together

Rule of thumb: If you find yourself writing the same code in multiple places, make it a separate function
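A small sketch of the trade-off (names invented here): the inner function can read the outer function's local variables, but does not exist outside it:

```r
standardize <- function(x) {
  m <- mean(x)
  s <- sd(x)
  # center.scale can see m and s without taking them as arguments...
  center.scale <- function(v) { (v - m)/s }
  center.scale(x)
}
standardize(c(1, 2, 3))
## [1] -1  0  1
# ...but center.scale() is undefined out here, and gets re-declared
# every time standardize() runs
```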

## Plotting a Power-Law Model

Our old plotting function calculated the fitted values

But so does our prediction function

plot.plm.3 <- function(plm, from, to, n=101, ...) {
  x <- seq(from=from, to=to, length.out=n)
  y <- predict.plm(object=plm, newdata=x)
  plot(x, y, ...)
  invisible(TRUE)
}

## Recursion

Reduce the problem to an easier one of the same form:

my.factorial <- function(n) {
  if (n == 1) {
    return(1)
  } else {
    return(n*my.factorial(n-1))
  }
}

## Recursion

Or multiple calls (Fibonacci numbers):

fib <- function(n) {
  if ((n == 1) || (n == 0)) {
    return(1)
  } else {
    return(fib(n-1) + fib(n-2))
  }
}

Exercise: Convince yourself that any loop can be replaced by recursion; can you always replace recursion with a loop?

## Summary

• Functions bundle related commands together into objects: easier to re-run, easier to re-use, easier to combine, easier to modify, less risk of error, easier to think about
• Interfaces control what the function can see (arguments, environment) and change (its internals, its return value)
• Calling functions we define works just like calling built-in functions: named arguments, defaults
• Multiple functions let us do multiple related jobs, either on the same object or on similar ones
• Sub-functions let us break big problems into smaller ones, and re-use the solutions to the smaller ones
• Recursion is a powerful way of making hard problems simpler