equation with error term, along with assumptions about distributions, is the "statistical model"
equation with sample estimates of population parameters is the "estimating equation"
main SAS programs for doing regression are REG, GLM, and MIXED
2.2.3 Several independent variables
using the data set AUCTION, add the numbers of other
animals sold in the market to create a multiple regression model
proc reg data=auction;
model cost=cattle calves hogs sheep;
run;
understand and interpret:
the H0 and meaning of the F test and df
Rsq and adjustment to Rsq
parameter estimates, predicted values, estimated standard errors of
coefficients, t and p
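The quantities in the list above can be written out as follows. This is a sketch in standard textbook notation, assuming n cases and m independent variables (m = 4 for the model above) and a corrected total sum of squares; the symbols are illustrative and not taken from the notes.

```latex
% Overall F test: all slopes are simultaneously zero
H_0:\ \beta_1 = \beta_2 = \dots = \beta_m = 0, \qquad
F = \frac{SS_{\text{model}}/m}{SS_{\text{error}}/(n-m-1)}
\quad \text{with } (m,\ n-m-1) \text{ df}

% R-square and its adjustment for the number of variables
R^2 = \frac{SS_{\text{model}}}{SS_{\text{total}}}, \qquad
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n-1}{n-m-1}

% t test for a single coefficient b_j
t = \frac{b_j}{se(b_j)} \quad \text{with } n-m-1 \text{ df}
```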
2.2.4 Sequential (SS1) and Partial (SS2) sums of squares
model cost=cattle calves hogs sheep / ss1 ss2;
reduction notation for the two ss types: R(b1|b0) means the ss due to b1, controlling
for b0
F tests can be based on any type 1 or type 2 SS using MSE as the denominator
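For the four-variable model above, the reduction notation can be spelled out as below. This is a sketch that assumes b1 to b4 are the coefficients of cattle, calves, hogs, and sheep in the order they appear in the model statement.

```latex
% Sequential (type I): each term is adjusted only for the terms entered before it
R(b_1 \mid b_0),\quad R(b_2 \mid b_0, b_1),\quad
R(b_3 \mid b_0, b_1, b_2),\quad R(b_4 \mid b_0, b_1, b_2, b_3)

% Partial (type II): each term is adjusted for all other terms in the model
R(b_1 \mid b_0, b_2, b_3, b_4),\quad R(b_2 \mid b_0, b_1, b_3, b_4),\quad
R(b_3 \mid b_0, b_1, b_2, b_4),\quad R(b_4 \mid b_0, b_1, b_2, b_3)

% The sequential SS add up to the model SS; the partial SS generally do not
SS_{\text{model}} = R(b_1 \mid b_0) + R(b_2 \mid b_0, b_1) + R(b_3 \mid b_0, b_1, b_2) + R(b_4 \mid b_0, b_1, b_2, b_3)

% F test for any single-df reduction
F = \frac{R(\,\cdot \mid \cdot\,)}{MSE} \quad \text{with } (1,\ n-m-1) \text{ df}
```

The last sequential SS and the last partial SS are the same quantity, since both adjust sheep for every other term.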
2.2.5 Tests for subsets and linear combinations
The test statement in REG, placed after the model statement, allows testing of hypotheses about
single coefficients or combinations, for example
model cost=cattle calves hogs sheep;
label_for_this_test: test hogs=0; tests the hypothesis that the b for
hogs is zero
label2: test hogs=0, sheep=0; performs one joint test (2 df) that both coefficients are zero
label3: test intercept=0; is a test that b0 is 0
label4: test hogs=1; tests if hogs coefficient differs from 1
label5: test hogs-calves=.5; tests if the hogs coefficient differs from the calves coefficient plus .5
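The three kinds of hypothesis named in this section (single coefficient, subset, linear combination) can be put together in one step. This is a minimal sketch, assuming the AUCTION data set is available as WORK.AUCTION; the labels hogs0, subset, and equal are hypothetical names chosen for illustration.

```sas
/* Sketch only: assumes the AUCTION data set is available as WORK.AUCTION */
proc reg data=auction;
   model cost = cattle calves hogs sheep;
   /* single coefficient: is the hogs slope zero? */
   hogs0:  test hogs = 0;
   /* subset: one joint 2-df test that the hogs and sheep slopes are both zero */
   subset: test hogs = 0, sheep = 0;
   /* linear combination: do hogs and sheep have the same slope? */
   equal:  test hogs - sheep = 0;
run;
quit;
```

Each label is followed by a colon and is used to identify the corresponding test in the output.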
2.2.6 Restricted models
like the test statement, the restrict statement concerns hypotheses about parameters, but it fits the model with the restrictions imposed
model cost=cattle calves hogs sheep;
restrict intercept=0, hogs-sheep=0; estimates model with no intercept
and with two coefficients forced to be equal.
output provides, for each restriction, a single degree of freedom test for the significance of the
difference due to restriction, as opposed to freeing the parameter(s)
caution: when using the / noint option in PROC REG, R2 is computed from the uncorrected
total SS, so R2 and related statistics are not comparable with those from models that include an intercept.
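One way to see what an equality restriction does is to compare it with a model in which the constraint is built in by hand. This is a sketch, assuming WORK.AUCTION is available; the data set AUCTION2 and the variable HOGSHEEP are hypothetical helpers introduced only for this illustration.

```sas
/* Sketch only: AUCTION2 and HOGSHEEP are hypothetical names */
data auction2;
   set auction;
   hogsheep = hogs + sheep;   /* a common slope applies to the sum */
run;

/* (a) equality imposed with RESTRICT */
proc reg data=auction2;
   model cost = cattle calves hogs sheep;
   restrict hogs - sheep = 0;
run;
quit;

/* (b) the same constraint built into the model */
proc reg data=auction2;
   model cost = cattle calves hogsheep;
run;
quit;
```

Because b*hogs + b*sheep = b*(hogs + sheep), the two fits should agree on the error SS, and the common hogs and sheep coefficient in (a) should match the hogsheep coefficient in (b).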
2.2.7. Exact linear dependency
In the AUCTION dataset, volume is the sum of the four livestock types.
Entering it along with the four component parts creates an exact dependency. SAS
will print a message that the model is not full rank and set the coefficient of one of the
variables to zero. The remaining coefficients equal those of the model fitted without that variable.
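The dependency described above can be reproduced with a single step. This is a sketch, assuming WORK.AUCTION contains the variable volume as described in the notes.

```sas
/* Sketch only: volume = cattle + calves + hogs + sheep, so X'X is singular */
proc reg data=auction;
   model cost = cattle calves hogs sheep volume;
run;
quit;
```

Comparing this output with the four-variable fit from 2.2.3 is a quick way to check which coefficient was set to zero and that the error SS is unchanged.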
2.3.1. GLM for linear regression
proc glm;
model cost=cattle calves hogs sheep;
run;
produces the same parameter estimates and tests as REG
there are differences in the meaning of the SS types in some anova models. Type III ss in glm corresponds to type II (partial) in reg.
2.3.2 Using contrast statements to test regression parameters
Contrasts test hypotheses about linear combinations of regression parameters
model ...
contrast 'contrast_name' effect values; for
example:
contrast 'hogcost=0' intercept 0 cattle 0 calves 0 hogs 1
sheep 0; tests whether the coefficient for hogs is zero; the zero weights leave the other
coefficients out of the hypothesis, but the test is still adjusted for them
contrast 'hogcost=sheepcost' hogs 1 sheep -1; tests whether
the difference between coefficient for hogs and coefficient for sheep is zero.
contrast 'hogcost=sheepcost=0' hogs 1, sheep 1; performs one joint test (2 df) of two
contrasts: hogs is zero and sheep is zero.
output identifies each contrast by name, and provides a proper F test for it.
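The F test behind a contrast can be written in matrix form. This is a sketch in standard notation, assuming a full-rank model with coefficient order (intercept, cattle, calves, hogs, sheep); L, b, q, and C are symbols introduced here for illustration, with C the inverse of X'X.

```latex
% Each row of L holds the weights of one contrast; q is the number of rows
H_0:\ L\beta = 0, \qquad
F = \frac{(Lb)'\,\bigl(L\,C\,L'\bigr)^{-1}\,(Lb)}{q \cdot MSE}
\quad \text{with } (q,\ n-m-1) \text{ df}

% Example: the joint contrast 'hogcost=sheepcost=0' has q = 2 rows
L = \begin{pmatrix} 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}
```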
2.3.3. Using ESTIMATE to estimate linear combinations of parameters
ESTIMATE works the same as contrast, but creates linear parameter estimates and
tests them for significant difference from zero, for example...
model ....;
estimate 'hogcost=sheepcost' hogs 1 sheep -1;
essentially takes the coefficient for hogs, subtracts the coefficient for sheep,
prints the difference and tests whether this difference is significant.
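The relation between ESTIMATE and CONTRAST can be checked by giving both statements the same weights. This is a sketch, assuming WORK.AUCTION is available; the label text is arbitrary.

```sas
/* Sketch only: same single-row combination given to ESTIMATE and CONTRAST */
proc glm data=auction;
   model cost = cattle calves hogs sheep;
   estimate 'hogs minus sheep' hogs 1 sheep -1;   /* estimate, std error, t */
   contrast 'hogs minus sheep' hogs 1 sheep -1;   /* F test, 1 df */
run;
quit;
```

For a single-row combination like this one, the contrast F should equal the square of the t from ESTIMATE. ESTIMATE accepts only one row of weights, so joint tests with comma-separated rows belong in CONTRAST.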
2.4.1 the linear regression model
terminology for representing the equation in matrix form (Y = column
vector of Y scores for individuals; X = matrix of cases by x variables; e
= the vector of residuals, one per case). beta is the vector of
regression parameters. Part of the ols solution is the inverse of X'X,
often called C; MSE times C is the matrix of estimated variances and covariances of
regression parameter estimates.
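Written out, the matrix terminology above corresponds to the following standard results. This is a sketch that assumes X has full column rank and includes a leading column of ones for the intercept; b denotes the vector of estimates.

```latex
% Model and least squares solution
Y = X\beta + \varepsilon, \qquad
b = (X'X)^{-1}X'Y = C\,X'Y, \qquad C = (X'X)^{-1}

% Variances and covariances of the estimates
\operatorname{Var}(b) = \sigma^2 C, \qquad
\widehat{\operatorname{Var}}(b) = MSE \cdot C, \qquad
se(b_j) = \sqrt{MSE \cdot c_{jj}}
```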
2.4.2 partitioning sums of squares
total, model, and error SS. MSE = error SS / (n-m-1)
tests are often done with reductions in sums of squares of a model compared to
another, when nested, with df equal to the difference in the number of parameters
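The partition and the nested-model test can be stated compactly. This is a sketch in standard notation, assuming n cases, m independent variables in the full model, and a reduced model that drops q of them; the subscripts are illustrative.

```latex
% Partition of the (corrected) total sum of squares
SS_{\text{total}} = SS_{\text{model}} + SS_{\text{error}}, \qquad
MSE = \frac{SS_{\text{error}}}{n-m-1}

% Test of a reduced model nested in the full model
F = \frac{\bigl(SSE_{\text{reduced}} - SSE_{\text{full}}\bigr)/q}{MSE_{\text{full}}}
\quad \text{with } (q,\ n-m-1) \text{ df}
```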
2.4.3 hypothesis tests and confidence intervals
three most common inferences: all slopes are simultaneously zero (standard F
test); test that some subset of parameters are zero (done by comparing the full
model to a reduced model); confidence intervals for the mean of a particular
sub-population defined by simultaneous scores on the several X variables.
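The third item, a confidence interval for a sub-population mean, has a simple matrix form. This is a sketch in standard notation, assuming x_0 is the vector of X scores that defines the sub-population (with a leading 1 for the intercept) and C is the inverse of X'X; these symbols are introduced here for illustration.

```latex
% Estimated mean at x_0 and its confidence interval
\hat{y}_0 = x_0' b, \qquad
\hat{y}_0 \ \pm\ t_{\alpha/2,\ n-m-1}\,\sqrt{MSE \cdot x_0'\,C\,x_0}
```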
2.4.4. using the generalized inverse
The generalized inverse is used when the model is under-identified by using all
values of X as independent variables (X'X is then singular). There are any number of
parameterizations possible in this circumstance. But the error SS is not
affected by parameterization. The coefficients, and linear combinations of
them that are not estimable, are affected.
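The points above can be summarized in matrix terms. This is a sketch in standard notation; the superscript minus sign marks a generalized inverse, and L is an illustrative row vector of weights.

```latex
% A generalized inverse satisfies
(X'X)\,(X'X)^{-}\,(X'X) = X'X

% One (non-unique) solution of the normal equations
b = (X'X)^{-}X'Y

% The error SS is the same for every choice of generalized inverse
SS_{\text{error}} = Y'Y - b'X'Y

% L\beta is estimable, and Lb is then the same for every solution, when
L\,(X'X)^{-}\,(X'X) = L
```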
AUCTION.sav