ppr {modreg} | R Documentation |
Fit a projection pursuit regression model.
ppr(formula, data = sys.parent(), weights, subset, na.action,
    contrasts = NULL, ww = rep(1,q), nterms, max.terms = nterms,
    optlevel = 2, sm.method = c("supsmu", "spline", "gcvspline"),
    bass = 0, span = 0, df = 5, gcvpen = 1)

ppr(x, y, weights = rep(1,n), ww = rep(1,q), nterms, max.terms = nterms,
    optlevel = 2, sm.method = c("supsmu", "spline", "gcvspline"),
    bass = 0, span = 0, df = 5, gcvpen = 1)
formula |
a regression formula specifying one or more response variables and the explanatory variables. |
x |
matrix of explanatory variables. Rows represent observations, and columns represent variables. Missing values are not accepted. |
y |
matrix of response variables. Rows represent observations, and columns represent variables. |
nterms |
number of terms to include in the final model. |
data |
Data frame from which variables specified in formula are
preferentially to be taken.
|
weights |
a vector of weights for each case. |
ww |
a vector of weights for each response, so the fit criterion is
the sum over case i and responses j of
w_i ww_j (y_ij - fit_ij)^2 divided by the sum of w_i .
|
subset |
An index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.) |
na.action |
A function to specify the action to be taken if NAs are
found. The default action is for the procedure to fail. An
alternative is na.omit, which leads to rejection of cases
with missing values on any required variable. (NOTE: If given, this
argument must be named.)
|
contrasts |
the contrasts to be used when any factor explanatory variables are coded. |
max.terms |
maximum number of terms to choose from when building the model. |
optlevel |
integer from 0 to 3 which determines the thoroughness of an optimization routine in the SMART program. See the Details section. |
sm.method |
the method used for smoothing the ridge functions. The default is to
use Friedman's super smoother supsmu. The alternatives are to use
the smoothing spline code underlying smooth.spline, either with a
specified (equivalent) degrees of freedom for each ridge function
("spline"), or with the smoothness chosen by GCV ("gcvspline").
|
bass |
super smoother bass tone control used with automatic span selection
(see supsmu); the range of values is 0 to 10, with larger values
resulting in increased smoothing.
|
span |
super smoother span control (see supsmu). The default, 0,
results in automatic span selection by local cross validation. span
can also take a value in (0, 1].
|
df |
if sm.method is "spline", this specifies the smoothness of
each ridge term via the requested equivalent degrees of freedom.
|
gcvpen |
if sm.method is "gcvspline", this is the penalty used
in the GCV selection for each degree of freedom used.
|
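The fit criterion described under ww can be written out explicitly. This is only a transcription of that sentence into notation, with w_i the case weights, ww_j the response weights, and y_ij and fit_ij the observed and fitted values for case i and response j:

```latex
\[
  \frac{\sum_{i} \sum_{j} w_i \, ww_j \, \bigl(y_{ij} - \mathrm{fit}_{ij}\bigr)^2}
       {\sum_{i} w_i}
\]
```

With the default weights (all w_i and ww_j equal to 1) this reduces to the residual sum of squares over all cases and responses, divided by the number of cases.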
The basic method is given by Friedman (1984), and is essentially the
same code used by S-PLUS's ppreg. This code is extremely
sensitive to the compiler used.

The algorithm first adds up to max.terms ridge terms one at a
time; it will use fewer if it is unable to find a term to add that makes
sufficient difference. It then removes the least "important"
term at each step until nterms terms are left.

The levels of optimization (argument optlevel)
differ in how thoroughly the models are refitted during this process.
At level 0 the existing ridge terms are not refitted. At level 1
the projection directions are not refitted, but the ridge
functions and the regression coefficients are.
Levels 2 and 3 refit all the terms and are equivalent for one
response; level 3 is more careful to re-balance the contributions
from each regressor at each step and so is a little less likely to
converge to a saddle point of the sum of squares criterion.
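The effect of nterms, max.terms and optlevel can be explored with a small experiment. The following is a sketch only, assuming a current R installation in which ppr is provided by the stats package and the rock data by datasets (both attached by default). The names rock1 and gof_by_level are illustrative and not part of ppr.

```r
# Assumption: current R, where ppr() lives in 'stats' and 'rock' in 'datasets'.
# rock1 and gof_by_level are illustrative names only.
rock1 <- transform(rock, area1 = area / 10000, peri1 = peri / 10000)

# Forward stage may add up to max.terms = 5 ridge terms;
# backward stage then prunes down to nterms = 2.
fit <- ppr(log(perm) ~ area1 + peri1 + shape, data = rock1,
           nterms = 2, max.terms = 5)
print(fit$gofn)   # residual sum of squares against the number of terms

# Refit at each optimization level and collect the residual sum of
# squares of the selected (2-term) model.
gof_by_level <- sapply(0:3, function(level) {
  ppr(log(perm) ~ area1 + peri1 + shape, data = rock1,
      nterms = 2, max.terms = 5, optlevel = level)$gof
})
names(gof_by_level) <- paste("optlevel", 0:3)
print(gof_by_level)
```

Because the code is sensitive to the compiler and platform, no expected values are shown here; compare the numbers on your own system.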
call |
the matched call |
p |
the number of explanatory variables (after any coding) |
q |
the number of response variables |
ml |
the argument max.terms
|
gof |
the overall residual (weighted) sum of squares for the selected model |
gofn |
the overall residual (weighted) sum of squares against the number of
terms, up to max.terms. Entries for fewer than nterms terms are
invalid (and set to zero).
|
df |
the argument df
|
edf |
if sm.method is "spline" or "gcvspline", the equivalent number of
degrees of freedom for each ridge term used.
|
xnames |
the names of the explanatory variables |
ynames |
the names of the response variables |
alpha |
a matrix of the projection directions, with a column for each ridge term |
beta |
a matrix of the coefficients applied for each response to the ridge terms: the rows are the responses and the columns the ridge terms |
yb |
the weighted means of each response |
ys |
the overall scale factor used: internally the responses are divided by
ys to have unit total weighted sum of squares.
|
fitted.values |
the fitted values, as a matrix if q > 1
|
residuals |
the residuals, as a matrix if q > 1
|
smod |
internal work array, which includes the ridge functions evaluated at the training set points. |
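The components listed above can be read directly from the fitted object. The following is a minimal sketch, again assuming a current R installation with ppr in stats and rock in datasets; rock1 and resid_check are illustrative names. The last step assumes that the residuals are the response minus the fitted values, which the code checks rather than takes for granted.

```r
# Assumption: current R, where ppr() lives in 'stats' and 'rock' in 'datasets'.
rock1 <- transform(rock, area1 = area / 10000, peri1 = peri / 10000)
fit <- ppr(log(perm) ~ area1 + peri1 + shape, data = rock1,
           nterms = 2, max.terms = 5)

# Dimensions of the problem: explanatory variables, responses, max.terms.
c(p = fit$p, q = fit$q, ml = fit$ml)

# Projection directions, one column per ridge term, and their squared lengths.
print(fit$alpha)
print(colSums(fit$alpha^2))

# Coefficients applied to the ridge terms.
print(fit$beta)

# Assumed relation: residuals = response - fitted values (compared numerically).
resid_check <- log(rock1$perm) - fit$fitted.values
print(all.equal(as.numeric(resid_check), as.numeric(fit$residuals)))
```

With a single response (q = 1), fitted.values and residuals are vectors rather than matrices, so they can be compared element by element as above.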
Friedman, J. H. and Stuetzle, W. (1981) Projection pursuit regression. Journal of the American Statistical Association, 76, 817-823.
Friedman, J. H. (1984) SMART User's Guide. Laboratory for Computational Statistics, Stanford University Technical Report No. 1.
plot.ppr, supsmu, smooth.spline
# Note: your numerical values may differ
data(rock)
attach(rock)
area1 <- area/10000; peri1 <- peri/10000
rock.ppr <- ppr(log(perm) ~ area1 + peri1 + shape,
                data=rock, nterms=2, max.terms=5)
rock.ppr
# Call:
# ppr.formula(formula = log(perm) ~ area1 + peri1 + shape, data = rock,
#     nterms = 2, max.terms = 5)
#
# Goodness of fit:
#  2 terms  3 terms  4 terms  5 terms
# 8.737806 5.289517 4.745799 4.490378

summary(rock.ppr)
# .....  (same as above)
# .....
#
# Projection direction vectors:
#            term 1      term 2
# area1  0.34357179  0.37071027
# peri1 -0.93781471 -0.61923542
# shape  0.04961846  0.69218595
#
# Coefficients of ridge terms:
#    term 1    term 2
# 1.6079271 0.5460971

par(mfrow=c(3,2))# maybe: , pty="s")
plot(rock.ppr, main="ppr(log(perm)~ ., nterms=2, max.terms=5)")
plot(update(rock.ppr, bass=5), main = "update(..., bass = 5)")
plot(update(rock.ppr, sm.method="gcv", gcvpen=2),
     main = "update(..., sm.method=\"gcv\", gcvpen=2)")