Available regression methods#

The following linear regression methods are supported via the trainstation interface. They can be selected using the fit_method keyword of the available optimizers.

  • Ordinary Least-Squares (OLS)

  • Least-Squares with regularization matrix

  • Least Absolute Shrinkage and Selection Operator (LASSO)

  • Adaptive-LASSO

  • Ridge and Bayesian-ridge

  • Elasticnet

  • Recursive Feature Elimination (RFE)

  • Automatic Relevance Determination Regression (ARDR)

  • Orthogonal Matching Pursuit (OMP)

  • L1-regularization with split-Bregman

The most commonly used fit methods for constructing cluster and force constant expansions are automatic relevance determination regression (ARDR), recursive feature elimination with \(\ell_2\)-fitting (RFE-L2), LASSO, as well as ordinary least-squares optimization (OLS). Below follows a short summary of the main algorithms. More information about the available linear models can be found in the scikit-learn documentation.
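For orientation, the following minimal sketch shows how a fit method is selected through the fit_method keyword. It assumes the Optimizer class from trainstation together with its train() method and summary property, and uses synthetic data in place of a real sensing matrix.

```python
import numpy as np
from trainstation import Optimizer

# synthetic sensing matrix A and target vector y, purely for illustration
rng = np.random.default_rng(42)
A = rng.normal(size=(200, 50))
x_true = np.zeros(50)
x_true[:5] = [1.0, -2.0, 0.5, 3.0, -1.5]   # a few non-zero "true" parameters
y = A @ x_true + 0.01 * rng.normal(size=200)

# the regression method is selected via the fit_method keyword
opt = Optimizer((A, y), fit_method='ardr')
opt.train()
print(opt.summary)
```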

Least-squares#

Ordinary least-squares (OLS) optimization provides a solution to the linear problem

\[\boldsymbol{A}\boldsymbol{x} = \boldsymbol{y},\]

where \(\boldsymbol{A}\) is the sensing matrix, \(\boldsymbol{y}\) is the vector of target values, and \(\boldsymbol{x}\) is the solution (parameter vector) that one seeks to obtain. The objective is given by

\[\left\Vert\boldsymbol{A}\boldsymbol{x} - \boldsymbol{y}\right\Vert^2_2.\]

The OLS method is chosen by setting the fit_method keyword to least-squares.
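As a point of reference, the OLS objective above can be minimized directly with numpy. The snippet below is only an illustrative sketch with toy data, not the intended way of using the optimizers.

```python
import numpy as np

# toy sensing matrix A and target vector y
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 10))
y = rng.normal(size=100)

# minimize ||Ax - y||_2^2 with numpy's least-squares solver
x_ols, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
```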

Least-squares with regularization matrix#

Similar to OLS, least-squares with regularization matrix optimization solves the linear problem

\[\boldsymbol{A}\boldsymbol{x} = \boldsymbol{y},\]

with the difference that in this case, an explicit regularization matrix \(\boldsymbol{\Lambda}\) is present such that the objective becomes

\[\left(\boldsymbol{y}-\boldsymbol{A}\boldsymbol{x} \right)' \left(\boldsymbol{y}-\boldsymbol{A}\boldsymbol{x} \right) + \boldsymbol{x}' \boldsymbol{\Lambda} \boldsymbol{x}.\]

From a Bayesian perspective, the regularization matrix \(\boldsymbol{\Lambda}\) can be thought of as the inverse of the covariance matrix for the prior probability of the features [MueCed09]. It can be used to scale and couple features based on prior beliefs.

The least-squares with regularization matrix method is chosen by setting the fit_method keyword to least-squares-with-reg-matrix. The regularization matrix is set via the reg_matrix keyword. If no matrix is specified, all elements will be set to zero and the OLS result is recovered. Note that if standardization is used, the regularization matrix should still be designed with respect to the input sensing matrix and not the standardized one.

| Parameter | Type | Description | Default |
|---|---|---|---|
| reg_matrix | np.ndarray | regularization matrix | None |
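The objective above has the closed-form minimizer \(\boldsymbol{x} = (\boldsymbol{A}'\boldsymbol{A} + \boldsymbol{\Lambda})^{-1}\boldsymbol{A}'\boldsymbol{y}\). The following sketch evaluates it with numpy for a hypothetical diagonal regularization matrix; the commented line indicates how the same matrix would be handed over via the reg_matrix keyword, assuming keyword arguments are forwarded to the fit method.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 10))
y = rng.normal(size=100)

# hypothetical prior: an individual penalty per feature (diagonal matrix)
reg_matrix = np.diag(np.linspace(0.1, 1.0, A.shape[1]))

# closed-form minimizer of (y - Ax)'(y - Ax) + x' Lambda x
x_reg = np.linalg.solve(A.T @ A + reg_matrix, A.T @ y)

# assumed equivalent via the optimizers:
# Optimizer((A, y), fit_method='least-squares-with-reg-matrix', reg_matrix=reg_matrix)
```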

LASSO#

The least absolute shrinkage and selection operator (LASSO) is a method for performing variable selection and regularization in problems in statistics and machine learning. The optimization objective is given by

\[\frac{1}{2 n_\text{samples}} \left\Vert\boldsymbol{A}\boldsymbol{x} - \boldsymbol{y}\right\Vert^2_2 + \alpha \Vert\boldsymbol{x}\Vert_1.\]

While the first term ensures that \(\boldsymbol{x}\) is a solution to the linear problem at hand, the second term introduces regularization and guides the algorithm toward finding sparse solutions, in the spirit of compressive sensing. In general, LASSO is suited for solving strongly underdetermined problems.

The LASSO optimizer is chosen by setting the fit_method keyword to lasso. The \(\alpha\) parameter is set via the alpha keyword. If no value is specified, a line scan will be carried out automatically to determine the optimal value.

| Parameter | Type | Description | Default |
|---|---|---|---|
| alpha | float | controls the sparsity of the solution vector | None |
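The objective above is the same one minimized by scikit-learn's Lasso estimator. The sketch below illustrates it on a strongly underdetermined toy problem; the value of alpha used here is arbitrary and only serves to show its effect on sparsity.

```python
import numpy as np
from sklearn.linear_model import Lasso

# strongly underdetermined toy problem: more parameters than samples
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 200))
x_true = np.zeros(200)
x_true[:10] = rng.normal(size=10)
y = A @ x_true

# alpha controls the sparsity of the solution vector
lasso = Lasso(alpha=1e-3, fit_intercept=False, max_iter=100_000)
lasso.fit(A, y)
print('non-zero parameters:', np.count_nonzero(lasso.coef_))
```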

Automatic relevance determination regression (ARDR)#

Automatic relevance determination regression (ARDR) is an optimization algorithm provided by scikit-learn that, similar to Bayesian ridge regression, provides a probabilistic model of the regression problem at hand. The method is also known as Sparse Bayesian Learning and Relevance Vector Machine.

The ARDR optimizer is chosen by setting the fit_method keyword to ardr. The threshold lambda parameter, which controls the sparsity of the solution vector, is set via the threshold_lambda keyword (default: 1e4).

| Parameter | Type | Description | Default |
|---|---|---|---|
| threshold_lambda | float | controls the sparsity of the solution vector | 1e4 |
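The sketch below uses scikit-learn's ARDRegression directly to illustrate the role of threshold_lambda: features whose estimated precision exceeds the threshold are pruned, so smaller values yield sparser solutions. The toy data are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 50))
x_true = np.zeros(50)
x_true[:5] = [2.0, -1.0, 0.5, 1.5, -2.5]
y = A @ x_true + 0.01 * rng.normal(size=100)

# features whose estimated precision exceeds threshold_lambda are pruned
ardr = ARDRegression(threshold_lambda=1e4, fit_intercept=False)
ardr.fit(A, y)
print('non-zero parameters:', np.count_nonzero(ardr.coef_))
```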

Split-Bregman#

The split-Bregman method [GolOsh09] is designed to solve a broad class of \(\ell_1\)-regularized problems. The solution vector \(\boldsymbol{x}\) is given by

\[\boldsymbol{x} = \arg\min_{\boldsymbol{x}, \boldsymbol{d}} \left\Vert\boldsymbol{d}\right\Vert_1 + \frac{1}{2} \left\Vert\boldsymbol{A}\boldsymbol{x} - \boldsymbol{y}\right\Vert^2 + \frac{\lambda}{2} \left\Vert\boldsymbol{d} - \mu \boldsymbol{x} \right\Vert^2,\]

where \(\boldsymbol{d}\) is an auxiliary quantity, while \(\mu\) and \(\lambda\) are hyperparameters that control the sparseness of the solution and the efficiency of the algorithm.

The approach can be accelerated by the addition of a preconditioning step [ZhoSadAbe19]. This speed-up enables efficient optimization of the \(\mu\) hyperparameter. By default, the split-bregman fit method will scan a range of \(\mu\) values and choose the optimal one based on cross-validation.
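For illustration, the following sketch implements a plain split-Bregman iteration for the objective above. It is not the actual implementation: it solves the x-update exactly instead of using a conjugate-gradient step and omits the preconditioning and the \(\mu\) scan, but it makes the roles of \(\mu\) and \(\lambda\) concrete.

```python
import numpy as np

def shrink(z, t):
    """Soft-thresholding (shrinkage) operator."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def split_bregman_sketch(A, y, mu, lmbda, n_iters=1000, tol=1e-4):
    """Plain split-Bregman iteration (exact x-update, no preconditioning)."""
    n_params = A.shape[1]
    x = np.zeros(n_params)
    d = np.zeros(n_params)
    b = np.zeros(n_params)   # Bregman variable
    AtA, Aty = A.T @ A, A.T @ y
    lhs = AtA + lmbda * mu ** 2 * np.eye(n_params)
    for _ in range(n_iters):
        # x-update: minimize 1/2 ||Ax - y||^2 + lambda/2 ||d - mu x - b||^2
        x_new = np.linalg.solve(lhs, Aty + lmbda * mu * (d - b))
        # d-update: soft thresholding enforces the l1 term
        d = shrink(mu * x_new + b, 1.0 / lmbda)
        # Bregman update
        b = b + mu * x_new - d
        if np.linalg.norm(x_new - x) < tol:
            x = x_new
            break
        x = x_new
    return x
```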

The split-Bregman implementation supports the following keywords.

| Parameter | Type | Description | Default |
|---|---|---|---|
| mu | float | sparseness parameter | None |
| lmbda | float | weight of additional L2-norm in split-Bregman | 3 |
| n_iters | int | maximal number of split-Bregman iterations | 1000 |
| tol | float | convergence criterion of iterative minimization | 1e-4 |
| cg_tol | float | convergence criterion of conjugate gradient step | 1e-1 |
| cg_n_iters | int | maximal number of conjugate gradient iterations | None |
| cv_splits | int | number of CV splits for finding optimal mu value | 5 |
| iprint | int | how often to print fitting information to stdout | False |

Recursive feature elimination#

Recursive feature elimination (RFE) is a feature selection algorithm that obtains the optimal features by carrying out a series of fits, starting from the full set of parameters and iteratively eliminating the less important ones. RFE needs to be combined with a specific fit method. Since RFE may require many hundreds of individual fits, it is often advisable to use ordinary least-squares as the training method, which is the default behavior. The present implementation is based on the feature selection functionality of scikit-learn.

The RFE optimizer is chosen by setting the fit_method keyword to rfe. The n_features keyword allows one to specify the number of features to select. If this parameter is left unspecified, RFE with cross-validation will be used to determine the optimal number of features.

After the optimal number of features has been determined, the final model is trained. The fit method for the final fit can be controlled via final_estimator. Here, estimator and final_estimator can be set to any of the fit methods described in this section. For example, estimator='lasso' implies that a LASSO CV scan is carried out for each fit in the RFE algorithm.

| Parameter | Type | Description | Default |
|---|---|---|---|
| n_features | int | number of features to select | None |
| step | int or float | number of parameters (int) or percentage of parameters (float) to eliminate in each iteration | 0.04 |
| cv_splits | int | number of CV splits (90/10) used when optimizing n_features | 5 |
| estimator | str | fit method to be used in the RFE algorithm | 'least-squares' |
| final_estimator | str | fit method to be used in the final fit | = estimator |
| estimator_kwargs | dict | keyword arguments for fit method defined by estimator | {} |
| final_estimator_kwargs | dict | keyword arguments for fit method defined by final_estimator | {} |
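A minimal sketch of the intended usage is shown below. It assumes that the Optimizer class forwards additional keyword arguments such as n_features and estimator to the fit method, and uses synthetic data for illustration.

```python
import numpy as np
from trainstation import Optimizer

rng = np.random.default_rng(0)
A = rng.normal(size=(300, 100))
x_true = np.zeros(100)
x_true[:8] = rng.normal(size=8)
y = A @ x_true + 0.01 * rng.normal(size=300)

# fix the number of features; leaving n_features unset instead triggers
# the cross-validated scan described above
opt = Optimizer((A, y), fit_method='rfe', n_features=8,
                estimator='least-squares', final_estimator='least-squares')
opt.train()
print(opt.summary)
```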

Note

When running on multi-core systems, please be mindful of memory consumption. By default all CPUs will be used (n_jobs=-1), which will duplicate data and can require a lot of memory, potentially giving rise to errors. To prevent this behavior, you can set the n_jobs parameter explicitly; it is handed over directly to scikit-learn.

Other methods#

The optimizers furthermore support the ridge method (ridge), the elastic net method (elasticnet), as well as Bayesian ridge regression (bayesian-ridge).
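These methods are selected in the same way as the ones above. The sketch below assumes the Optimizer interface used in the earlier examples and simply loops over the corresponding fit_method strings with synthetic data.

```python
import numpy as np
from trainstation import Optimizer

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 50))
y = A @ rng.normal(size=50) + 0.01 * rng.normal(size=200)

# each string selects the corresponding linear model
for fit_method in ['ridge', 'elasticnet', 'bayesian-ridge']:
    opt = Optimizer((A, y), fit_method=fit_method)
    opt.train()
    print(fit_method, opt.summary)
```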