Title: | Variable Selection in Partial Least Squares |
---|---|
Description: | Interfaces and methods for variable selection in Partial Least Squares. The methods include filter methods, wrapper methods and embedded methods. Both regression and classification are supported. |
Authors: | Kristian Hovde Liland [aut, cre] , Tahir Mehmood [ctb], Solve Sæbø [ctb] |
Maintainer: | Kristian Hovde Liland <[email protected]> |
License: | GPL (>=2) |
Version: | 0.9.12 |
Built: | 2024-11-22 04:21:33 UTC |
Source: | https://github.com/khliland/plsvarsel |
A backward variable elimination procedure for elimination of non-informative variables.
bve_pls(y, X, ncomp = 10, ratio = 0.75, VIP.threshold = 1)
y | vector of response values (numeric or factor). |
X | numeric predictor matrix. |
ncomp | integer number of components (default = 10). |
ratio | the proportion of the samples to use for calibration (default = 0.75). |
VIP.threshold | thresholding to remove non-important variables (default = 1). |
Variables are first sorted with respect to some importance measure, and usually one of the filter measures described above is used. Secondly, a threshold is used to eliminate a subset of the least informative variables. Then a model is fitted again to the remaining variables and performance is measured. The procedure is repeated until maximum model performance is achieved.
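The elimination loop can be sketched in R. This is a simplified illustration of the idea, not the package's internal implementation; ncomp = 5 and the VIP cut-off of 1 are illustrative, and bve_pls() additionally tracks prediction error on a calibration split.

```r
library(pls)
library(plsVarSel)
data(gasoline, package = "pls")

sel <- seq_len(ncol(gasoline$NIR))        # start from all wavelengths
repeat {
  fit  <- plsr(gasoline$octane ~ gasoline$NIR[, sel],
               ncomp = 5, validation = "LOO")
  comp <- which.min(fit$validation$PRESS) # best number of components
  keep <- VIP(fit, comp) >= 1             # retain variables above threshold
  if (all(keep) || sum(keep) < 2) break   # stop when nothing more to drop
  sel <- sel[keep]
}
sel                                       # indices of retained variables
```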
Returns a vector of variable numbers corresponding to the model having lowest prediction error.
Tahir Mehmood, Kristian Hovde Liland, Solve Sæbø.
I. Frank, Intermediate least squares regression method, Chemometrics and Intelligent Laboratory Systems 1 (3) (1987) 233-242.
VIP (SR/sMC/LW/RC), filterPLSR, shaving, stpls, truncation, bve_pls, ga_pls, ipw_pls, mcuve_pls, rep_pls, spa_pls, lda_from_pls, lda_from_pls_cv, setDA.
data(gasoline, package = "pls")
with(gasoline, bve_pls(octane, NIR))
Sequential selection of variables based on squared covariance with response and intermediate deflation (as in Partial Least Squares).
covSel(X, Y, nvar)
X | numeric predictor matrix. |
Y | response vector or matrix. |
nvar | maximum number of variables to select. |
selected | an integer vector of selected variables. |
scores | a matrix of score vectors. |
loadings | a matrix of loading vectors. |
Yloadings | a matrix of Y loadings. |
J.M. Roger, B. Palagos, D. Bertrand, E. Fernandez-Ahumada. CovSel: Variable selection for highly multivariate and multi-response calibration: Application to IR spectroscopy. Chemom Intel Lab Syst. 2011;106(2):216-223. P. Mishra, A brief note on a new faster covariate's selection (fCovSel) algorithm, Journal of Chemometrics 36(5) 2022.
data(gasoline, package = "pls")
sels <- with(gasoline, covSel(NIR, octane, 5))
matplot(t(gasoline$NIR), type = "l")
abline(v = sels$selected, col = 2)
Extract the index of influential variables based on thresholds defined for LW (loading weights), RC (regression coefficients), JT (jackknife testing) and VIP (variable importance on projection).
filterPLSR(y, X, ncomp = 10, ncomp.opt = c("minimum", "same"),
  validation = "LOO", LW.threshold = NULL, RC.threshold = NULL,
  URC.threshold = NULL, FRC.threshold = NULL, JT.threshold = NULL,
  VIP.threshold = NULL, SR.threshold = NULL, sMC.threshold = NULL,
  mRMR.threshold = NULL, WVC.threshold = NULL, ...)
y | vector of response values (numeric or factor). |
X | numeric predictor matrix. |
ncomp | integer number of components (default = 10). |
ncomp.opt | use the number of components corresponding to minimum error ("minimum") or the same number of components for all filters ("same"). |
validation | type of validation in the PLS modelling (default = "LOO"). |
LW.threshold | threshold for Loading Weights if applied (default = NULL). |
RC.threshold | threshold for Regression Coefficients if applied (default = NULL). |
URC.threshold | threshold for Unit normalized Regression Coefficients if applied (default = NULL). |
FRC.threshold | threshold for Fitness normalized Regression Coefficients if applied (default = NULL). |
JT.threshold | threshold for Jackknife Testing if applied (default = NULL). |
VIP.threshold | threshold for Variable Importance on Projections if applied (default = NULL). |
SR.threshold | threshold for Selectivity Ratio if applied (default = NULL). |
sMC.threshold | threshold for Significance Multivariate Correlation if applied (default = NULL). |
mRMR.threshold | threshold for minimum Redundancy Maximum Relevance if applied (default = NULL). |
WVC.threshold | threshold for Weighted Variable Contribution if applied (default = NULL). |
... | additional parameters passed to the underlying PLS modelling. |
Filter methods are applied for variable selection with PLSR. This function can return selected variables and Root Mean Squared Error of Cross-Validation for various filter methods and determine optimum numbers of components.
Returns a list of lists containing filters (outer list), their selected variables, optimal numbers of components and prediction accuracies.
Tahir Mehmood, Kristian Hovde Liland, Solve Sæbø.
T. Mehmood, K.H. Liland, L. Snipen, S. Sæbø, A review of variable selection methods in Partial Least Squares Regression, Chemometrics and Intelligent Laboratory Systems 118 (2012) 62-69.
VIP (SR/sMC/LW/RC/URC/FRC/mRMR), filterPLSR, spa_pls, stpls, truncation, bve_pls, mcuve_pls, ipw_pls, ga_pls, rep_pls, WVC_pls, T2_pls.
data(gasoline, package = "pls")
## Not run:
with(gasoline, filterPLSR(octane, NIR, ncomp = 10, "minimum",
  validation = "LOO", RC.threshold = c(0.1, 0.5), SR.threshold = 0.5))
## End(Not run)
A subset search algorithm inspired by biological evolution theory and natural selection.
ga_pls(y, X, GA.threshold = 10, iters = 5, popSize = 100)
y | vector of response values (numeric or factor). |
X | numeric predictor matrix. |
GA.threshold | the chance of a zero for mutations and initialization (default = 10); the ratio of non-selected variables for each chromosome. |
iters | the number of iterations (default = 5). |
popSize | the population size (default = 100). |
1. Building an initial population of variable sets by setting bits for each variable randomly, where bit '1' represents selection of the corresponding variable while '0' represents non-selection. The approximate size of the variable sets must be set in advance.
2. Fitting a PLSR model to each variable set and computing the performance by, for instance, a leave-one-out cross-validation procedure.
3. A collection of variable sets with higher performance are selected to survive until the next "generation".
4. Crossover and mutation: new variable sets are formed 1) by crossover of selected variables between the surviving variable sets, and 2) by changing (mutating) the bit value for each variable by small probability.
5. The surviving and modified variable sets form the population serving as input to point 2.
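The bit encoding in steps 1 and 4 can be illustrated in R. The helper functions below are hypothetical and only sketch the encoding; they are not part of plsVarSel.

```r
set.seed(42)
p <- 50                                  # number of candidate variables
GA.threshold <- 10                       # ratio of non-selected variables
chrom <- rbinom(p, 1, 1 / GA.threshold)  # step 1: ~1/10 of the bits set to 1

mutate <- function(chrom, prob = 0.01) { # step 4: flip bits with small probability
  flip <- runif(length(chrom)) < prob
  chrom[flip] <- 1 - chrom[flip]
  chrom
}
crossover <- function(a, b) {            # step 4: single-point crossover
  cut <- sample(length(a) - 1, 1)
  c(a[seq_len(cut)], b[-seq_len(cut)])
}
child <- mutate(crossover(chrom, rbinom(p, 1, 1 / GA.threshold)))
```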
Returns a vector of variable numbers corresponding to the model having lowest prediction error.
Tahir Mehmood, Kristian Hovde Liland, Solve Sæbø.
K. Hasegawa, Y. Miyashita, K. Funatsu, GA strategy for variable selection in QSAR studies: GA-based PLS analysis of calcium channel antagonists, Journal of Chemical Information and Computer Sciences 37 (1997) 306-310.
VIP (SR/sMC/LW/RC), filterPLSR, shaving, stpls, truncation, bve_pls, ga_pls, ipw_pls, mcuve_pls, rep_pls, spa_pls, lda_from_pls, lda_from_pls_cv, setDA.
data(gasoline, package = "pls")
# with(gasoline, ga_pls(octane, NIR, GA.threshold = 10)) # Time-consuming
An iterative procedure for variable elimination.
ipw_pls(y, X, ncomp = 10, no.iter = 10, IPW.threshold = 0.01,
  filter = "RC", scale = TRUE)
ipw_pls_legacy(y, X, ncomp = 10, no.iter = 10, IPW.threshold = 0.1)
y | vector of response values (numeric or factor). |
X | numeric predictor matrix. |
ncomp | integer number of components (default = 10). |
no.iter | the number of iterations (default = 10). |
IPW.threshold | threshold for regression coefficients (default = 0.01 for ipw_pls, 0.1 for ipw_pls_legacy). |
filter | which filtering method to use (among "RC", "SR", "LW", "VIP", "sMC"). |
scale | standardize data (default = TRUE, as in reference). |
This is an iterative elimination procedure where a measure of predictor importance is computed after fitting a PLSR model (with complexity chosen based on predictive performance). The importance measure is used both to re-scale the original X-variables and to eliminate the least important variables before subsequent model re-fitting.
The IPW implementation was corrected in plsVarSel version 0.9.5. For backward compatibility the old implementation is included as ipw_pls_legacy.
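The iterative re-scaling can be sketched in R. This is a simplified illustration of the IPW idea; the threshold, filter and iteration count mirror the function arguments, but the code is not the package implementation.

```r
library(pls)
library(plsVarSel)
data(gasoline, package = "pls")

X <- scale(unclass(gasoline$NIR))
y <- gasoline$octane
keep <- seq_len(ncol(X))
for (i in 1:5) {
  fit  <- plsr(y ~ X[, keep], ncomp = 5, validation = "LOO")
  comp <- which.min(fit$validation$PRESS)
  w    <- abs(RC(fit, comp))
  w    <- w / max(w)                       # importance as predictor weights
  X[, keep] <- sweep(X[, keep], 2, w, `*`) # re-scale the predictors
  keep <- keep[w > 0.01]                   # eliminate the least important
}
keep
```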
Returns a vector of variable numbers corresponding to the model having lowest prediction error.
Kristian Hovde Liland
M. Forina, C. Casolino, C. Pizarro Millan, Iterative predictor weighting (IPW) PLS: a technique for the elimination of useless predictors in regression problems, Journal of Chemometrics 13 (1999) 165-184.
VIP (SR/sMC/LW/RC), filterPLSR, shaving, stpls, truncation, bve_pls, ga_pls, ipw_pls, mcuve_pls, rep_pls, spa_pls, lda_from_pls, setDA.
data(gasoline, package = "pls")
with(gasoline, ipw_pls(octane, NIR))
For each number of components LDA/QDA models are created from the scores of the supplied PLS model and classifications are performed.
lda_from_pls(model, grouping, newdata, ncomp)
model | a fitted PLS model. |
grouping | vector of grouping labels. |
newdata | predictors in the same format as in the model. |
ncomp | maximum number of PLS components. |
matrix of classifications
VIP (SR/sMC/LW/RC), filterPLSR, shaving, stpls, truncation, bve_pls, ga_pls, ipw_pls, mcuve_pls, rep_pls, spa_pls, lda_from_pls, lda_from_pls_cv, setDA.
data(mayonnaise, package = "pls")
mayonnaise <- within(mayonnaise,
  {dummy <- model.matrix(~y-1, data.frame(y = factor(oil.type)))})
pls <- plsr(dummy ~ NIR, ncomp = 10, data = mayonnaise, subset = train)
with(mayonnaise, {
  classes <- lda_from_pls(pls, oil.type[train], NIR[!train,], 10)
  colSums(oil.type[!train] == classes) # Number of correctly classified out of 42
})
For each number of components LDA/QDA models are created from the scores of the supplied PLS model and classifications are performed. This use of cross-validation has limitations. Handle with care!
lda_from_pls_cv(model, X, y, ncomp, Y.add = NULL)
model | a fitted PLS model. |
X | predictors in the same format as in the model. |
y | vector of grouping labels. |
ncomp | maximum number of PLS components. |
Y.add | additional responses. |
matrix of classifications
VIP (SR/sMC/LW/RC), filterPLSR, shaving, stpls, truncation, bve_pls, ga_pls, ipw_pls, mcuve_pls, rep_pls, spa_pls, lda_from_pls, lda_from_pls_cv, setDA.
data(mayonnaise, package = "pls")
mayonnaise <- within(mayonnaise,
  {dummy <- model.matrix(~y-1, data.frame(y = factor(oil.type)))})
pls <- plsr(dummy ~ NIR, ncomp = 8, data = mayonnaise, subset = train,
  validation = "CV", segments = 40, segment.type = "consecutive")
with(mayonnaise, {
  classes <- lda_from_pls_cv(pls, NIR[train,], oil.type[train], 8)
  colSums(oil.type[train] == classes) # Number of correctly classified out of 120
})
Artificial noise variables are added to the predictor set before the PLSR model is fitted. All the original variables having lower "importance" than the artificial noise variables are eliminated before the procedure is repeated until a stop criterion is reached.
mcuve_pls(y, X, ncomp = 10, N = 3, ratio = 0.75, MCUVE.threshold = NA)
y | vector of response values (numeric or factor). |
X | numeric predictor matrix. |
ncomp | integer number of components (default = 10). |
N | number of Monte Carlo simulations (default = 3). |
ratio | the proportion of the samples to use for calibration (default = 0.75). |
MCUVE.threshold | thresholding to separate signal from noise (default = NA creates automatic threshold from data). |
Returns a vector of variable numbers corresponding to the model having lowest prediction error.
Tahir Mehmood, Kristian Hovde Liland, Solve Sæbø.
V. Centner, D. Massart, O. de Noord, S. de Jong, B. Vandeginste, C. Sterna, Elimination of uninformative variables for multivariate calibration, Analytical Chemistry 68 (1996) 3851-3858.
VIP (SR/sMC/LW/RC), filterPLSR, shaving, stpls, truncation, bve_pls, ga_pls, ipw_pls, mcuve_pls, rep_pls, spa_pls, lda_from_pls, lda_from_pls_cv, setDA.
data(gasoline, package = "pls")
with(gasoline, mcuve_pls(octane, NIR))
Adaptation of mvr from package pls v 2.4.3.
mvrV(formula, ncomp, Y.add, data, subset, na.action, shrink,
  method = c("truncation", "stpls", "model.frame"), scale = FALSE,
  validation = c("none", "CV", "LOO"), model = TRUE, x = FALSE, y = FALSE, ...)
formula | a model formula. Most of the lm formula constructs are supported. See below. |
ncomp | the number of components to include in the model (see below). |
Y.add | a vector or matrix of additional responses containing relevant information about the observations. Only used for cppls. |
data | an optional data frame with the data to fit the model from. |
subset | an optional vector specifying a subset of observations to be used in the fitting process. |
na.action | a function which indicates what should happen when the data contain missing values. The default is set by the na.action setting of options, and is na.fail if that is unset. The 'factory-fresh' default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful. See na.omit for other alternatives. |
shrink | optional shrinkage parameter for the "stpls" method. |
method | the multivariate regression method to be used. If "model.frame", the model frame is returned. |
scale | numeric vector, or logical. If numeric vector, X is scaled by dividing each variable with the corresponding element of scale. If scale is TRUE, X is scaled by dividing each variable by its sample standard deviation. If cross-validation is selected, scaling by the standard deviation is done for every segment. |
validation | character. What kind of (internal) validation to use. See below. |
model | a logical. If TRUE, the model frame is returned. |
x | a logical. If TRUE, the model matrix is returned. |
y | a logical. If TRUE, the response is returned. |
... | additional arguments, passed to the underlying fit functions, and mvrCv. |
Plot a heatmap with colorbar.
myImagePlot(x, main, ...)
x | a numeric matrix to be plotted. |
main | header text for the plot. |
... | additional arguments (not implemented). |
Tahir Mehmood, Kristian Hovde Liland, Solve Sæbø.
T. Mehmood, K.H. Liland, L. Snipen, S. Sæbø, A review of variable selection methods in Partial Least Squares Regression, Chemometrics and Intelligent Laboratory Systems 118 (2012) 62-69.
VIP (SR/sMC/LW/RC), filterPLSR, shaving, stpls, truncation, bve_pls, ga_pls, ipw_pls, mcuve_pls, rep_pls, spa_pls, lda_from_pls, lda_from_pls_cv, setDA.
myImagePlot(matrix(1:12,3,4), 'A header')
A large collection of variable selection methods for use with Partial Least Squares. These include all methods in Mehmood et al. 2012 and more. All functions treat numeric responses as regression and factor responses as classification. Default classification is PLS + LDA, but setDA() can be used to choose PLS + QDA or PLS with response column maximization.
T. Mehmood, K.H. Liland, L. Snipen, S. Sæbø, A review of variable selection methods in Partial Least Squares Regression, Chemometrics and Intelligent Laboratory Systems 118 (2012) 62-69. T. Mehmood, S. Sæbø, K.H. Liland, Comparison of variable selection methods in partial least squares regression, Journal of Chemometrics 34 (2020) e3226.
VIP (SR/sMC/LW/RC), filterPLSR, shaving, stpls, truncation, bve_pls, ga_pls, ipw_pls, mcuve_pls, rep_pls, spa_pls, lda_from_pls, lda_from_pls_cv, setDA.
A regularized variable elimination procedure for parsimonious variable selection, where also a stepwise elimination is carried out.
rep_pls(y, X, ncomp = 5, ratio = 0.75, VIP.threshold = 0.5, N = 3)
y | vector of response values (numeric or factor). |
X | numeric predictor matrix. |
ncomp | integer number of components (default = 5). |
ratio | the proportion of the samples to use for calibration (default = 0.75). |
VIP.threshold | thresholding to remove non-important variables (default = 0.5). |
N | number of samples in the selection matrix (default = 3). |
A stability based variable selection procedure is adopted, where the samples have been split randomly into a predefined number of training and test sets. For each split a stepwise procedure is adopted to select the variables.
Returns a vector of variable numbers corresponding to the model having lowest prediction error.
Tahir Mehmood, Kristian Hovde Liland, Solve Sæbø.
T. Mehmood, H. Martens, S. Sæbø, J. Warringer, L. Snipen, A partial least squares based algorithm for parsimonious variable selection, Algorithms for Molecular Biology 6 (2011).
VIP (SR/sMC/LW/RC), filterPLSR, shaving, stpls, truncation, bve_pls, ga_pls, ipw_pls, mcuve_pls, rep_pls, spa_pls, lda_from_pls, lda_from_pls_cv, setDA.
data(gasoline, package = "pls")
## Not run:
with(gasoline, rep_pls(octane, NIR))
## End(Not run)
The default method is LDA, but QDA or the column of maximum prediction can be chosen.
setDA(LQ = NULL)
LQ | character argument: 'lda', 'qda', 'max' or NULL. |
Returns the default set method.
VIP (SR/sMC/LW/RC), filterPLSR, shaving, stpls, truncation, bve_pls, ga_pls, ipw_pls, mcuve_pls, rep_pls, spa_pls, lda_from_pls, lda_from_pls_cv, setDA.
## Not run:
setDA()      # Query 'lda', 'qda' or 'max'
setDA('qda') # Set default method to QDA
## End(Not run)
One of five filter methods can be chosen for repeated shaving of a certain percentage of the worst performing variables. Performance of the reduced models is stored and viewable through print and plot methods.
shaving(y, X, ncomp = 10, method = c("SR", "VIP", "sMC", "LW", "RC"),
  prop = 0.2, min.left = 2, comp.type = c("CV", "max"),
  validation = c("CV", 1), fixed = integer(0), newy = NULL, newX = NULL,
  segments = 10, plsType = "plsr", Y.add = NULL, ...)
## S3 method for class 'shaved'
plot(x, y, what = c("error", "spectra"), index = "min", log = "x", ...)
## S3 method for class 'shaved'
print(x, ...)
y | vector of response values (numeric or factor). |
X | numeric predictor matrix. |
ncomp | integer number of components (default = 10). |
method | filter method, i.e. SR, VIP, sMC, LW or RC, given as character. |
prop | proportion of variables to be removed in each iteration (default = 0.2). |
min.left | minimum number of remaining variables. |
comp.type | use the number of components chosen by cross-validation ("CV") or the maximum number ("max"). |
validation | type of validation for the PLS modelling. |
fixed | vector of indices for compulsory/fixed variables that should always be included in the modelling. |
newy | validation response for RMSEP/error computations. |
newX | validation predictors for RMSEP/error computations. |
segments | number of cross-validation segments (see mvrCv in the pls package). |
plsType | type of PLS model, "plsr" or "cppls". |
Y.add | additional response for CPPLS (see cppls in the pls package). |
... | additional arguments for plsr or cppls. |
x | object of class shaved. |
what | plot type. Default = "error". Alternative = "spectra". |
index | which iteration to plot. Default = "min", corresponding to minimum RMSEP. |
log | logarithmic x (default) or y scale. |
Variables are first sorted with respect to some importance measure, and usually one of the filter measures described above is used. Secondly, a threshold is used to eliminate a subset of the least informative variables. Then a model is fitted again to the remaining variables and performance is measured. The procedure is repeated until maximum model performance is achieved.
Returns a list object of class shaved containing the method type, the error, number of components, and number of variables per reduced model. It also contains a list of all sets of reduced variable sets plus the original data.
Kristian Hovde Liland
VIP (SR/sMC/LW/RC), filterPLSR, shaving, stpls, truncation, bve_pls, ga_pls, ipw_pls, mcuve_pls, rep_pls, spa_pls, lda_from_pls, lda_from_pls_cv, setDA.
data(mayonnaise, package = "pls")
sh <- shaving(mayonnaise$design[,1], pls::msc(mayonnaise$NIR), type = "interleaved")
pars <- par(mfrow = c(2,1), mar = c(4,4,1,1))
plot(sh)
plot(sh, what = "spectra")
par(pars)
print(sh)
Simulate multivariate normal data.
simulate_classes(p, n1, n2) simulate_data(dims, n1 = 150, n2 = 50)
p | integer number of variables. |
n1 | integer number of samples in each of two classes in training/calibration data. |
n2 | integer number of samples in each of two classes in test/validation data. |
dims | a 10 element vector of group sizes. |
The class simulation is a straightforward simulation of multivariate normal data into two classes for training and test data, respectively. The data simulation uses a strictly structured multivariate normal simulation with continuous response data.
Returns a list of predictor and response data for training and testing.
Tahir Mehmood, Kristian Hovde Liland, Solve Sæbø.
T. Mehmood, K.H. Liland, L. Snipen, S. Sæbø, A review of variable selection methods in Partial Least Squares Regression, Chemometrics and Intelligent Laboratory Systems 118 (2012) 62-69. T. Mehmood, S. Sæbø, K.H. Liland, Comparison of variable selection methods in partial least squares regression, Journal of Chemometrics 34 (2020) e3226.
VIP (SR/sMC/LW/RC), filterPLSR, shaving, stpls, truncation, bve_pls, ga_pls, ipw_pls, mcuve_pls, rep_pls, spa_pls, lda_from_pls, lda_from_pls_cv, setDA.
str(simulate_classes(5,4,4))
SwPA-PLS provides the influence of each variable without considering the influence of the rest of the variables through sub-sampling of samples and variables.
spa_pls(y, X, ncomp = 10, N = 3, ratio = 0.8, Qv = 10, SPA.threshold = 0.05)
y | vector of response values (numeric or factor). |
X | numeric predictor matrix. |
ncomp | integer number of components (default = 10). |
N | number of Monte Carlo simulations (default = 3). |
ratio | the proportion of the samples to use for calibration (default = 0.8). |
Qv | integer number of variables to be sampled in each iteration (default = 10). |
SPA.threshold | thresholding to remove non-important variables (default = 0.05). |
Returns a vector of variable numbers corresponding to the model having lowest prediction error.
Tahir Mehmood, Kristian Hovde Liland, Solve Sæbø.
H. Li, M. Zeng, B. Tan, Y. Liang, Q. Xu, D. Cao, Recipe for revealing informative metabolites based on model population analysis, Metabolomics 6 (2010) 353-361. http://code.google.com/p/spa2010/downloads/list.
VIP (SR/sMC/LW/RC), filterPLSR, shaving, stpls, truncation, bve_pls, ga_pls, ipw_pls, mcuve_pls, rep_pls, spa_pls, lda_from_pls, lda_from_pls_cv, setDA.
data(gasoline, package = "pls")
with(gasoline, spa_pls(octane, NIR))
A soft-thresholding step in PLS algorithm (ST-PLS) based on ideas from the nearest shrunken centroid method.
stpls(..., method = c("stpls", "model.frame"))
... | arguments passed on to mvrV. |
method | choice between the default "stpls" and "model.frame" (the latter returns the model frame). |
The ST-PLS approach is more or less identical to the Sparse-PLS presented independently by Lê Cao et al. This implementation is an expansion of code from the pls package.
Returns an object of class mvrV, similar to the mvr object of the pls package.
Solve Sæbø, Tahir Mehmood, Kristian Hovde Liland.
S. Sæbø, T. Almøy, J. Aarøe, A.H. Aastveit, ST-PLS: a multi-dimensional nearest shrunken centroid type classifier via pls, Journal of Chemometrics 20 (2007) 54-62.
VIP (SR/sMC/LW/RC), filterPLSR, shaving, stpls, truncation, bve_pls, ga_pls, ipw_pls, mcuve_pls, rep_pls, spa_pls, lda_from_pls, lda_from_pls_cv, setDA.
data(yarn, package = "pls")
st <- stpls(density ~ NIR, ncomp = 5, shrink = c(0.1, 0.2), validation = "CV", data = yarn)
summary(st)
Adaptation of summary.mvr from the pls package v 2.4.3.
## S3 method for class 'mvrV'
summary(object, what = c("all", "validation", "training"),
  digits = 4, print.gap = 2, ...)
object | an mvrV object. |
what | one of "all", "validation" or "training". |
digits | integer. Minimum number of significant digits in the output. Default is 4. |
print.gap | integer. Gap between columns of the printed tables. |
... | other arguments sent to underlying methods. |
Variable selection based on the T^2 statistic. A side effect of running the selection is printing of tables and production of plots.
T2_pls(ytr, Xtr, yts, Xts, ncomp = 10, alpha = c(0.2, 0.15, 0.1, 0.05, 0.01))
ytr | vector of responses for model training. |
Xtr | matrix of predictors for model training. |
yts | vector of responses for model testing. |
Xts | matrix of predictors for model testing. |
ncomp | number of PLS components. |
alpha | Hotelling's T^2 significance levels. |
Parameters and variables corresponding to variable selections of minimum error and minimum variable set.
Tahir Mehmood, Hotelling T^2 based variable selection in partial least squares regression, Chemometrics and Intelligent Laboratory Systems 154 (2016), pp 23-28
data(gasoline, package = "pls")
library(pls)
if(interactive()){
  t2 <- T2_pls(gasoline$octane[1:40], gasoline$NIR[1:40,],
               gasoline$octane[-(1:40)], gasoline$NIR[-(1:40),],
               ncomp = 10, alpha = c(0.2, 0.15, 0.1, 0.05, 0.01))
  matplot(t(gasoline$NIR), type = 'l', col = 1, ylab = 'intensity')
  points(t2$mv[[1]], colMeans(gasoline$NIR)[t2$mv[[1]]], col = 2, pch = 'x')
  points(t2$mv[[2]], colMeans(gasoline$NIR)[t2$mv[[2]]], col = 3, pch = 'o')
}
Distribution based truncation for variable selection in subspace methods for multivariate regression.
truncation(..., Y.add, weights, method = "truncation")
... | arguments passed on to mvrV. |
Y.add | optional additional response vector/matrix found in the input data. |
weights | optional object weighting vector. |
method | choice (default = "truncation") between "truncation" and "model.frame". |
Loading weights are truncated around their median based on confidence intervals for modelling without replicates (Lenth et al.). The arguments passed to mvrV include all possible arguments to cppls and the following truncation parameters (with defaults): trunc.pow = FALSE, truncation = NULL, trunc.width = NULL, trunc.weight = 0, reorth = FALSE, symmetric = FALSE.
The default way of performing truncation involves the following parameter values: truncation = "Lenth" and trunc.width = 0.95, indicating Lenth's confidence intervals (asymmetric) with a confidence of 95%, while trunc.weight controls soft shrinkage instead of a hard threshold. An alternative truncation strategy can be used with truncation = "quantile", in which a quantile line is used for detecting outliers/inliers.
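The alternative quantile strategy can be requested analogously to the default; the parameter values below are illustrative, not recommended settings.

```r
data(yarn, package = "pls")
trq <- truncation(density ~ NIR, ncomp = 5, data = yarn, validation = "CV",
                  truncation = "quantile", trunc.width = 0.95)
summary(trq)
```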
Returns an object of class mvrV, similar to the mvr object of the pls package.
Kristian Hovde Liland.
K.H. Liland, M. Høy, H. Martens, S. Sæbø: Distribution based truncation for variable selection in subspace methods for multivariate regression, Chemometrics and Intelligent Laboratory Systems 122 (2013) 103-111.
VIP (SR/sMC/LW/RC), filterPLSR, shaving, stpls, truncation, bve_pls, ga_pls, ipw_pls, mcuve_pls, rep_pls, spa_pls, lda_from_pls, lda_from_pls_cv, setDA.
data(yarn, package = "pls")
tr <- truncation(density ~ NIR, ncomp = 5, data = yarn, validation = "CV",
  truncation = "Lenth", trunc.width = 0.95) # Default truncation
summary(tr)
Various filter methods extracting and using information from mvr objects to assign importance to all included variables. Available methods are Significance Multivariate Correlation (sMC), Selectivity Ratio (SR), Variable Importance in Projections (VIP), Loading Weights (LW), and Regression Coefficients (RC).
VIP(pls.object, opt.comp, p = dim(pls.object$coef)[1])
SR(pls.object, opt.comp, X)
sMC(pls.object, opt.comp, X, alpha_mc = 0.05)
LW(pls.object, opt.comp)
RC(pls.object, opt.comp)
URC(pls.object, opt.comp)
FRC(pls.object, opt.comp)
mRMR(pls.object, nsel, X)
pls.object |
fitted PLS model (mvr object). |
opt.comp |
optimal number of components of PLS model. |
p |
number of variables in PLS model. |
X |
data matrix used as predictors in PLS modelling. |
alpha_mc |
quantile significance for automatic selection of variables in sMC (default = 0.05). |
nsel |
number of variables to select. |
From plsVarSel 0.9.10, the VIP and LW methods handle multiple responses correctly. All other filter methods implemented in this package assume a single response and, in multi-response cases, give results based on the first response only.
A vector of the same length as the number of variables in the associated PLS model. High values are associated with high importance, explained variance or relevance to the model.
The sMC has an attribute "quantile", which is the associated quantile of the F-distribution, which can be used as a cut-off for significant variables, similar to the cut-off of 1 associated with the VIP.
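As an illustrative sketch (assuming the pls and plsVarSel packages are loaded and using the gasoline data), the cut-offs described above could be applied like this:

```r
library(pls)
library(plsVarSel)
data(gasoline, package = "pls")
pls.mod <- plsr(octane ~ NIR, ncomp = 10, validation = "LOO", data = gasoline)
comp <- which.min(pls.mod$validation$PRESS)
X <- unclass(gasoline$NIR)
# VIP: customary cut-off of 1
vip.sel <- which(VIP(pls.mod, comp) > 1)
# sMC: cut-off taken from the attached F-distribution quantile
smc <- sMC(pls.mod, comp, X)
smc.sel <- which(smc > attr(smc, "quantile"))
```

The two index vectors give the variables deemed important by each criterion and will generally differ, since the measures quantify different aspects of the model.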
Tahir Mehmood, Kristian Hovde Liland, Solve Sæbø.
T. Mehmood, K.H. Liland, L. Snipen, S. Sæbø, A review of variable selection methods in Partial Least Squares Regression, Chemometrics and Intelligent Laboratory Systems 118 (2012) 62-69. T. Mehmood, S. Sæbø, K.H. Liland, Comparison of variable selection methods in partial least squares regression, Journal of Chemometrics 34 (2020) e3226.
VIP (SR/sMC/LW/RC), filterPLSR, shaving, stpls, truncation, bve_pls, ga_pls, ipw_pls, mcuve_pls, rep_pls, spa_pls, lda_from_pls, lda_from_pls_cv, setDA.
data(gasoline, package = "pls")
library(pls)
pls <- plsr(octane ~ NIR, ncomp = 10, validation = "LOO", data = gasoline)
comp <- which.min(pls$validation$PRESS)
X <- unclass(gasoline$NIR)
vip <- VIP(pls, comp)
sr  <- SR (pls, comp, X)
smc <- sMC(pls, comp, X)
lw  <- LW (pls, comp)
rc  <- RC (pls, comp)
urc <- URC(pls, comp)
frc <- FRC(pls, comp)
mrm <- mRMR(pls, 401, X)$score
matplot(scale(cbind(vip, sr, smc, lw, rc, urc, frc, mrm)), type = 'l')
This implements PLS-WVC2, the component-dependent version of WVC from Lin et al., i.e., using Equations 14, 16 and 19. The implementation is used in T. Mehmood, S. Sæbø, K.H. Liland, Comparison of variable selection methods in partial least squares regression, Journal of Chemometrics 34 (2020) e3226. Note, however, a notational mistake in Mehmood et al., where the denominator of Equation 19, (w'X'Xw), was exchanged with (w'X'Yw).
WVC_pls(y, X, ncomp, normalize = FALSE, threshold = NULL)
y |
Vector of responses. |
X |
Matrix of predictors. |
ncomp |
Number of components. |
normalize |
Divide WVC vectors by maximum value. |
threshold |
Set loading weights smaller than threshold to 0 and recompute component. |
Returns the loading weights, loadings, regression coefficients, scores and Y-loadings, plus the WVC weights.
W. Lin, H. Hang, Y. Zhuang, S. Zhang, Variable selection in partial least squares with the weighted variable contribution to the first singular value of the covariance matrix, Chemometrics and Intelligent Laboratory Systems 183 (2018) 113-121.
library(pls)
data(mayonnaise, package = "pls")
wvc   <- WVC_pls(factor(mayonnaise$oil.type), mayonnaise$NIR, 10)
wvcNT <- WVC_pls(factor(mayonnaise$oil.type), mayonnaise$NIR, 10, TRUE, 0.5)
old.par <- par(mfrow=c(3,1), mar=c(2,4,1,1))
matplot(t(mayonnaise$NIR), type='l', col=1, ylab='intensity')
matplot(wvc$W[,1:3], type='l', ylab='W')
matplot(wvcNT$W[,1:3], type='l', ylab='W, thr.=0.5')
par(old.par)