- High-level graphics
- Custom graphics
- Layered graphics in ggplot2
hist(), boxplot(), plot(), points(), lines(), text(), mtext(), axis(), etc. form a suite of functions that plot graphs and add features to a graph
par() can be used to set or query graphical parameters
x = state.x77[ , 2] # per capita income for the 50 states (1974)
hist(x)
hist(x, breaks = 8, xlab="Income", main="Histogram of State Income in 1974")
y = quakes$depth # 1000 earthquake depths
hist(y, seq(0, 700, by = 70), xlab="Earthquake Depth", main="Histogram of Earthquake Depths")
Function ecdf() computes the empirical cdf of a sample
plot.ecdf(x)
Can add vertical lines and remove dots
plot.ecdf(x, verticals = TRUE, pch = "", xlab="Income", main="ECDF of State Income in 1974")
plot.ecdf(y, verticals = TRUE, pch = "", xlab="Earthquake Depth", main="ECDF of Earthquake Depths")
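The object returned by ecdf() is itself a function, so it can be evaluated at any value as well as plotted. A minimal sketch using the same income column (the names Fx and med are only illustrative):

```r
# Per capita income for the 50 states (the "Income" column of state.x77)
x <- state.x77[, "Income"]

# ecdf() returns a step function F(t) = proportion of observations <= t
Fx <- ecdf(x)

med <- median(x)
Fx(med)            # proportion of states at or below the median income
Fx(max(x))         # the ECDF reaches 1 at the largest observation
quantile(Fx, 0.9)  # quantile() also works on ecdf objects
```

Evaluating the function directly is handy when a single probability is needed rather than the whole curve.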
qqnorm() and qqplot()
qqnorm(x) # qq plot for the state incomes
qqline(x, col = "red") # red reference line
qqnorm(y) # qq plot for the earthquake depths
qqline(y, col = "red") # red reference line
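qqnorm() compares one sample with the normal distribution, while qqplot() compares two samples with each other. As a small illustration (the choice of the Puromycin reaction rates is an assumption made for this example, as are the variable names):

```r
# Reaction rates for treated and untreated cells in the built-in Puromycin data
treated   <- Puromycin$rate[Puromycin$state == "treated"]
untreated <- Puromycin$rate[Puromycin$state == "untreated"]

# Two-sample QQ plot: quantiles of one sample against quantiles of the other
qq <- qqplot(treated, untreated,
             xlab = "Treated rate quantiles",
             ylab = "Untreated rate quantiles",
             main = "QQ Plot of Treated versus Untreated Rates")
abline(0, 1, col = "red")  # points near this line suggest similar distributions
```

qqplot() invisibly returns the plotted quantiles, which is why the result can be stored in qq and inspected afterwards.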
boxplot(count ~ spray, data = InsectSprays)
plot(x, y)
plot(quakes$long, quakes$lat, xlab="Longitude", ylab="Latitude", main="Location of Earthquake Epicenters")
symbols(quakes$long, quakes$lat, circles = 10 ^ quakes$mag, xlab="Longitude", ylab="Latitude", main="Location of Earthquake Epicenters")
pairs(x)
pairs(trees)
contour(crimtab, main="Contour Plot of Criminal Data")
image(crimtab, main="Image Plot of Criminal Data")
persp(crimtab, theta=30, main="Perspective Plot of Criminal Data")
pie.sales = c(0.12, 0.30, 0.26, 0.16, 0.04, 0.12)
names(pie.sales) = c("Blueberry", "Cherry", "Apple", "Boston Creme", "Other", "Vanilla Creme")
pie(pie.sales, col = c("blue", "red", "green", "wheat", "orange", "white"))
dotchart() and barplot() also available
barplot(VADeaths, beside = TRUE, legend = TRUE, main = "Virginia Death Rates per 1000 in 1940")
ts.plot(AirPassengers, xlab="Date", ylab="Passengers (in thousands)" , main="International Airline Passengers")
ts.plot(presidents, xlab="Date", ylab="Approval Rating" , main="Presidential Approval Ratings")
par() can be used to set or query graphical parameters
- adj: text justification
- bg: background color
- col, col.axis, col.lab, …: color specification
- lty: line type, e.g. dashed, dotted, solid (default), longdash, …
- lwd: line width (helpful to increase for presentation plots)
- mfcol and mfrow: subsequent figures will be drawn in an nr-by-nc array on the device
- pch: point types
- xlog: the x axis uses a log scale if TRUE
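Because par() returns the previous values of the parameters it changes, a common pattern is to save them and restore them afterwards. A short sketch of that pattern, assuming the built-in precip data and an arbitrary choice of settings:

```r
# par() returns the old values of whatever it changes,
# so saving the result makes it easy to restore them later
op <- par(mfrow = c(1, 2), lwd = 2, col.lab = "blue")

hist(precip, main = "Histogram of Precipitation")
plot(density(precip), main = "Density of Precipitation")

par(op)       # restore the saved settings
par("mfrow")  # query a single parameter
```

Restoring the settings keeps later plots from silently inheriting the two-panel layout or the thicker lines.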
Plot of binomial distribution with \(n=5\) and \(p=.4\)
x = 0:5
y = dbinom(x, 5, 2 / 5)
plot(x, y, type = "h", main="Binomial Distribution", xlab="Value", ylab="Probability")
Probability density function for the standard Normal distribution from -3 to 3
x = seq(-3, 3, by = 0.01)
y = dnorm(x)
plot(x, y, type = "l", main="Normal Distribution", ylab="f(x)")
Puromycin dataset
x = Puromycin$rate[Puromycin$state == "treated"]
y = Puromycin$rate[Puromycin$state == "untreated"]
plot.ecdf(x, verticals = TRUE, pch = "", xlim = c(60, 200), main="Treated versus Untreated")
lines(ecdf(y), verticals = TRUE, pch = "", xlim = c(60, 200), col="blue")
legend("bottomright", c("Treated", "Untreated"), pch = "", col=c("black", "blue"), lwd = 1)
postscript(), pdf(), tiff(), jpeg(), … open a graphics device that writes the plot to a file
dev.off() closes the device and finishes writing the file
pdf("2cdfs.pdf", width=6, height=4)
plot.ecdf(x, verticals = TRUE, pch = "", xlim = c(60, 200), main="Treated versus Untreated")
lines(ecdf(y), verticals = TRUE, pch = "", xlim = c(60, 200), col="blue")
legend("bottomright", c("Treated", "Untreated"), pch = "", col=c("black", "blue"), lwd = 1)
dev.off()
## quartz_off_screen
##                 2
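The same open, draw, close sequence works for any file device. The sketch below writes to a temporary file so it does not clutter the working directory (the variable name out_file and the use of precip are assumptions for the example):

```r
# Write a plot to a temporary PDF file (the path is arbitrary)
out_file <- tempfile(fileext = ".pdf")

pdf(out_file, width = 6, height = 4)  # open the file device
hist(precip, main = "Annual Precipitation in US Cities")
dev.off()                             # close it so the file is written

file.exists(out_file)
```

Nothing appears on screen while the file device is open; the drawing commands go to the file until dev.off() is called.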
x = seq(0, 2 * pi, length = 100)
sine = sin(x)
cosine = cos(x)
matplot(x, cbind(sine, cosine), col = c(1, 1), type = "l")
par(mfrow = c(2, 2))
boxplot(precip)
hist(precip)
plot.ecdf(precip)
qqnorm(precip)
par(mfrow = c(1, 1))
plot(rate ~ conc, data = Puromycin, pch = 15 * (state == "treated") + 1)
legend("bottomright", legend = c("Untreated", "Treated"), pch = c(1, 16))
persp() for wire mesh
x = seq(-8, 8, length = 100)
y = x
f = function(x, y) sin(sqrt(x ^ 2 + y ^ 2)) / (sqrt(x ^ 2 + y ^ 2))
z = outer(x, y, f)
persp(x, y, z, xlab = "", ylab = "", zlab = "", axes = FALSE, box = FALSE)
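The key step in the wire mesh example is outer(), which evaluates a function at every combination of x and y and returns the results as a matrix. A tiny sketch on a coarse grid makes the layout easier to see (the grid sizes and the function f here are chosen only for illustration):

```r
# A coarse grid: 5 x values and 3 y values
x <- seq(-2, 2, length.out = 5)
y <- seq(-1, 1, length.out = 3)

# Any vectorized function of two arguments can be used
f <- function(x, y) x^2 + y^2

# z[i, j] holds f(x[i], y[j]); rows follow x, columns follow y
z <- outer(x, y, f)
dim(z)
z[2, 3] == f(x[2], y[3])

# persp() expects exactly this layout: length(x) rows, length(y) columns
persp(x, y, z, theta = 30, phi = 30)
```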
ggplot2
ggplot2 is a popular graphics package by Hadley Wickham
It takes the good parts of base and lattice graphics and none of the bad parts
Explore data on the relationship between smoking and pulmonary function from Rosner (1999) using layered graphics created with ggplot2. The data consist of a sample of 654 youths, aged 3 to 19, in the area of East Boston during the middle to late 1970s. Our main interest is in the relationship between smoking and FEV. Use the following commands to load the data and ggplot2.
load(url("http://www.faculty.ucr.edu/~jflegal/fev.RData"))
library(ggplot2)
str(fevdata)
## 'data.frame': 654 obs. of 5 variables:
## $ age : int 9 8 7 9 9 8 6 6 8 9 ...
## $ fev : num 1.71 1.72 1.72 1.56 1.9 ...
## $ height: num 57 67.5 54.5 53 57 61 58 56 58.5 60 ...
## $ sex : Factor w/ 2 levels "female","male": 1 1 1 2 2 1 1 1 1 1 ...
## $ smoke : Factor w/ 2 levels "nonsmoker","smoker": 1 1 1 1 1 1 1 1 1 1 ...
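ggplot2 allows you to construct multi-layered graphics. A plot in ggplot2 consists of several components, including a default dataset, a set of aesthetic mappings, and one or more layers.
Each layer consists of a geom, a stat, and a position adjustment, and can carry its own data and mappings.
ggplot2 uses the + operator to build up a plot from these components. The basic plot definition looks like this: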
ggplot(data, mapping) +
  layer(
    stat = "",
    geom = "",
    position = "",
    params = list()
  )
We usually won’t write out the full specification of layer, but use shortcuts like:
geom_point()
stat_summary()
Every geom has a default stat and every stat has a default geom.
Usually, data and mappings are the same for all layers and so they can be stored as defaults:
ggplot(data, mapping = aes(x = x, y = y))
All layers use the default values of data and mapping unless you explicitly override them. The aes() function describes the mapping that will be used for each layer. You usually specify a default, but you can also specify per-layer mappings and data:
ggplot(data, mapping = aes(x = x, y = y)) +
  geom_point(aes(color = z)) +
  geom_line(data = another_data)
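The template above uses placeholder names, so here is one way it might look with real data. This sketch assumes the built-in mtcars data rather than the FEV data, purely so that it runs without a download:

```r
library(ggplot2)

# Default data and mapping are set once in ggplot()
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  # this layer adds a color mapping on top of the defaults
  geom_point(aes(color = factor(cyl))) +
  # this layer keeps the default mapping but uses its own data
  geom_line(data = subset(mtcars, cyl == 4))
p
```

The points are drawn for all cars, while the line connects only the four-cylinder cars, which shows that each layer can draw from a different data frame.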
Scatterplot of age versus fev:
s <- ggplot(fevdata, aes(x = age, y = fev))
s + geom_point()
You can add additional layers to a plot with the + operator. Let’s try adding a line that shows the average value of fev for each age. One way to do this is to construct an additional data frame with columns corresponding to age and average value of fev and then add a layer with this data. We will do this with the dplyr package. Don’t worry about how it works for now. We will learn more about it later. First, you will need to install dplyr if it is not already installed:
install.packages("dplyr")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
##     filter, lag
## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union
fev_mean <- summarize(group_by(fevdata, age), fev = mean(fev))
s + geom_point() + geom_line(data = fev_mean)
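The same idea, a summary data frame added as its own layer, can also be sketched with base R's aggregate() in place of dplyr. The example below swaps in the built-in mtcars data so that it is self-contained; the name mpg_mean is hypothetical:

```r
library(ggplot2)

# One row per cylinder count, holding the mean mpg for that group
mpg_mean <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# The summary layer reuses the default mapping (x = cyl, y = mpg)
# because mpg_mean has columns with the same names
p <- ggplot(mtcars, aes(x = cyl, y = mpg)) +
  geom_point() +
  geom_line(data = mpg_mean)
p
```

The summary columns must carry the same names as the variables in the default mapping, which is why the dplyr version names its result fev.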
Similarly, we can add a smoother to the scatterplot by first computing the smooth and storing it in a data frame, then adding a layer with that data. Since smoothers are so useful, this operation is available in ggplot2 as a stat.
stat_smooth() provides a smoothing transformation. It creates a new data frame with the values of the smooth and by default uses geom="smooth" so that both the smooth curve and error bands are shown.
s + geom_point() + stat_smooth()
s + geom_point() + stat_smooth(span = 1)
s + geom_point() + stat_smooth(span = 1/2)
A linear fit with lm:
s + geom_point() + stat_smooth(method = 'lm')
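For comparison, the manual route described earlier, computing the smooth yourself and adding it as a layer, might look like the sketch below. It assumes the built-in cars data and a default loess() fit, and it draws only the curve, without the error band that stat_smooth() adds:

```r
library(ggplot2)

# Fit a loess smooth by hand and store the fitted values in a data frame
fit <- loess(dist ~ speed, data = cars)
smooth_df <- data.frame(speed = cars$speed, dist = predict(fit))

# Add the precomputed smooth as its own layer
p <- ggplot(cars, aes(x = speed, y = dist)) +
  geom_point() +
  geom_line(data = smooth_df, color = "blue")
p
```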
age and fev are highly correlated. What else is correlated with age and fev?
How can we compare the relationship between age and fev among smokers and non-smokers? One way is to use two separate smoothers: one for smokers and one for non-smokers.
p <- ggplot(fevdata, aes(x = age, y = fev, group = smoke))
p + geom_point() + stat_smooth()
A problem with this plot is that we can’t tell which group each curve corresponds to.
p + geom_point() + stat_smooth(aes(color = smoke))
Or color both the points and the smooths:
p <- ggplot(fevdata, aes(x = age, y = fev, group = smoke, color = smoke))
p + geom_point() + stat_smooth()
How is the following plot different from the others in terms of the conclusions you might draw about the relation between smoke and fev?
p + geom_point() + stat_smooth(method = 'lm')
Faceting is a technique for constructing multiple subplots that show different subsets of data. ggplot2 has two ways to facet: facet_wrap and facet_grid. Grid faceting allows you to specify up to two factor variables: one for rows and one for columns of the grid. Wrap faceting allows you to specify one factor variable. We can use faceting to examine the relationship between fev and height for different combinations of age and sex. This will allow us to view four variables simultaneously.
p <- ggplot(fevdata, aes(x = height, y = fev)) + facet_grid(sex ~ age)
p + geom_point()
One problem with the previous plot is that there are too many ages and relatively few observations at each age. We can instead try dividing age into a smaller number of groups using the cut() function. cut() creates a new factor variable by cutting its input. Here we cut age into 5 intervals of equal length:
fevdata <- transform(fevdata, age_group = cut(age, breaks = 5))
Then make new plots:
p <- ggplot(fevdata, aes(x = height, y = fev)) + facet_grid(sex ~ age_group)
p + geom_point(aes(color = smoke))
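Wrap faceting was described above but not shown. The sketch below combines cut() with facet_wrap() on the built-in mtcars data, an assumption made so that the example runs without the FEV download; the names cars2 and hp_group are hypothetical:

```r
library(ggplot2)

# cut() turns a numeric variable into a factor of equal-width intervals
cars2 <- transform(mtcars, hp_group = cut(hp, breaks = 3))
table(cars2$hp_group)  # how many cars fall into each interval

# facet_wrap() takes a single factor and draws one panel per level
p <- ggplot(cars2, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~ hp_group)
p
```

Checking the counts with table() before faceting is a quick way to spot intervals that are nearly empty, which was the problem with faceting on every individual age.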