## Agenda

Suppose we have several different models for a particular data set. How should we choose among them? Naturally, we want to select the model that performs best. But what is performance, and how do we estimate it? With these questions in mind, we'll consider the following:

• Model assessment and selection
• Prediction error
• Cross-validation
• Smoothing example

## Example: Lidar data

• Consider the lidar data from SemiPar package in R
library(SemiPar)
data(lidar)
• Randomly split data into K subsets
• Each observation gets a foldid between 1 and K
K <- 10
n <- nrow(lidar)
foldid <- sample(rep(1:K, length = n))
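As a quick sanity check (a sketch using only base R; the sample size 221 is illustrative), `sample(rep(1:K, length = n))` gives folds whose sizes differ by at most one:

```r
# Sanity check (sketch): each fold gets floor(n/K) or ceiling(n/K) observations
K <- 10
n <- 221                                 # illustrative; use nrow(lidar) in practice
foldid <- sample(rep(1:K, length = n))
table(foldid)                            # counts per fold
max(table(foldid)) - min(table(foldid))  # differs by at most 1
```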
• Split data into training and test data
lidar.train <- subset(lidar, foldid != 1)
lidar.test <-  subset(lidar, foldid == 1)
attach(lidar.train)

## Example: Lidar data

plot(range, logratio, main="Lidar Data")

## loess (locally weighted smoothing)

• loess (locally weighted smoothing) is a popular tool that draws a smooth curve through a time plot or scatter plot, helping you see relationships between variables and anticipate trends
• loess is typically used for:
• Fitting a line to a scatter plot or time plot where noisy data values, sparse data points, or weak interrelationships interfere with your ability to see a line of best fit
• Linear regression settings where least squares fitting doesn't produce a line of good fit or is too labor-intensive to use
• Data exploration and analysis

## Nonparametric smoothing

• Benefits
• Provides a flexible approach to representing data
• Ease of use
• Computations are relatively easy (sometimes)
• Drawbacks
• No simple equation for a set of data
• Less understood than parametric smoothers
• Depends on a span parameter controlling the smoothness

## Example: Lidar data

• Can consider fitting a loess smooth to the data
obj0 <- loess(logratio ~ range, data = lidar.train,
control = loess.control(surface = 'direct'))

## Example: Lidar data

plot(obj0, xlab="range", ylab="logratio", main="Lidar Data with Loess Smooth")
points(obj0$x, obj0$fitted, type="l", col="red", lwd=2)

## Example: Lidar data

• How to choose the span parameter?
• Which model is the best? Why?
obj3 <- loess(logratio ~ range, data = lidar.train, span = 1,
control = loess.control(surface = 'direct'))
obj2 <- loess(logratio ~ range, data = lidar.train, span = .3,
control = loess.control(surface = 'direct'))
obj1 <- loess(logratio ~ range, data = lidar.train, span = .02,
control = loess.control(surface = 'direct'))
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : span too small. fewer data values than degrees of freedom.
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at 390
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 3
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 4
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : Chernobyl! trL>n 198

## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : Chernobyl! trL>n 198
## Warning in sqrt(sum.squares/one.delta): NaNs produced
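Rather than eyeballing these fits, we can estimate each span's prediction error with K-fold cross-validation. The following is a sketch, assuming the `lidar` data and fold assignments from the earlier slides; the span grid starts at 0.1 to avoid the degenerate-span warnings above:

```r
# Sketch: K-fold cross-validation over a grid of span values
library(SemiPar)
data(lidar)
K <- 10
foldid <- sample(rep(1:K, length = nrow(lidar)))
spans <- c(0.1, 0.2, 0.3, 0.5, 0.75, 1)   # assumed grid, for illustration

cv.err <- sapply(spans, function(s) {
  fold.mse <- sapply(1:K, function(k) {
    train <- subset(lidar, foldid != k)
    test  <- subset(lidar, foldid == k)
    fit <- loess(logratio ~ range, data = train, span = s,
                 control = loess.control(surface = "direct"))
    mean((test$logratio - predict(fit, newdata = test))^2)
  })
  mean(fold.mse)                           # average held-out MSE for this span
})
spans[which.min(cv.err)]                   # span with smallest estimated prediction error
```

Note that `surface = "direct"` lets `predict()` extrapolate, so test points just outside a training fold's range still get predictions rather than `NA`.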

## Example: Lidar data

• Model 1 (span = 0.02): $$\sum_i \left| Y_i - \hat{f} (X_i) \right| ^2 \approx 0$$
• Model 2 (span = 0.3): $$\sum_i \left| Y_i - \hat{f} (X_i) \right| ^2 = 1.20$$
• Model 3 (span = 1): $$\sum_i \left| Y_i - \hat{f} (X_i) \right| ^2 = 1.88$$

Therefore, Model 1 has the smallest squared error over the training data

c(sum((logratio - obj1$fitted)^2), sum((logratio - obj2$fitted)^2),
  sum((logratio - obj3$fitted)^2))
## [1] 1.396971e-30 1.195109e+00 1.879064e+00
c(mean((logratio - obj1$fitted)^2), mean((logratio - obj2$fitted)^2),
  mean((logratio - obj3$fitted)^2))
## [1] 7.055410e-33 6.035906e-03 9.490224e-03

## Example: Lidar data

• Suppose now that we have new data from a second experiment, lidar.test
• How well do our three models predict the new data? Consider again $$\sum_i \left| Y_i^{(new)} - \hat{f} (X_i^{(new)}) \right| ^2$$
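One way to compute this test-set error is with `predict()` on the held-out data. The snippet below is a sketch that rebuilds the earlier objects so it stands alone; the seed is assumed, purely for reproducibility:

```r
# Sketch: test-set squared error for the three loess fits
library(SemiPar)
data(lidar)
set.seed(1)                               # assumed seed, for reproducibility
foldid <- sample(rep(1:10, length = nrow(lidar)))
lidar.train <- subset(lidar, foldid != 1)
lidar.test  <- subset(lidar, foldid == 1)
fits <- lapply(c(0.02, 0.3, 1), function(s)
  loess(logratio ~ range, data = lidar.train, span = s,
        control = loess.control(surface = "direct")))
test.sse <- sapply(fits, function(obj)
  sum((lidar.test$logratio - predict(obj, newdata = lidar.test))^2))
test.sse                                  # squared error for spans 0.02, 0.3, 1
```

The span-0.02 fit, despite its near-zero training error, is the one to watch here: interpolating the training noise tends to hurt it on data it has not seen.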

## Example: Lidar data

plot(lidar.test$range, lidar.test$logratio, xlab="range",
ylab="logratio", main="New Lidar Data")