Available regression methods#

The following linear regression methods are supported via the trainstation interface. They can be selected using the fit_method keyword of the available optimizers.

  • Ordinary Least-Squares (OLS)

  • Least-Squares with regularization matrix

  • Least Absolute Shrinkage and Selection Operator (LASSO)

  • Adaptive-LASSO

  • Ridge and Bayesian-ridge

  • Elasticnet

  • Recursive Feature Elimination (RFE)

  • Automatic Relevance Determination Regression (ARDR)

  • Orthogonal Matching Pursuit (OMP)

  • L1-regularization with split-Bregman

The most commonly used fit methods for constructing cluster and force constant expansions are automatic relevance determination regression (ARDR), recursive feature elimination with \(\ell_2\)-fitting (RFE-L2), LASSO, as well as ordinary least-squares optimization (OLS). Below follows a short summary of the main algorithms. More information about the available linear models can be found in the scikit-learn documentation.
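For orientation, the following minimal sketch shows how a fit method is selected through the fit_method keyword. It assumes the Optimizer class from trainstation together with its train() method and summary property, and uses synthetic data in place of a real sensing matrix.

```python
import numpy as np
from trainstation import Optimizer

# synthetic sensing matrix A and target vector y, purely for illustration
rng = np.random.default_rng(42)
A = rng.normal(size=(200, 50))
x_true = np.zeros(50)
x_true[:5] = [1.0, -2.0, 0.5, 3.0, -1.5]   # a few non-zero "true" parameters
y = A @ x_true + 0.01 * rng.normal(size=200)

# the regression method is selected via the fit_method keyword
opt = Optimizer((A, y), fit_method='ardr')
opt.train()
print(opt.summary)
```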

Least-squares#

Ordinary least-squares (OLS) optimization provides a solution to the linear problem

\[\boldsymbol{A}\boldsymbol{x} = \boldsymbol{y},\]

where \(\boldsymbol{A}\) is the sensing matrix, \(\boldsymbol{y}\) is the vector of target values, and \(\boldsymbol{x}\) is the solution (parameter vector) that one seeks to obtain. The objective is given by

\[\left\Vert\boldsymbol{A}\boldsymbol{x} - \boldsymbol{y}\right\Vert^2_2.\]

The OLS method is chosen by setting the fit_method keyword to least-squares.
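As a point of reference, the OLS objective above can be minimized directly with numpy. The snippet below is only an illustrative sketch with toy data, not the intended way of using the optimizers.

```python
import numpy as np

# toy sensing matrix A and target vector y
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 10))
y = rng.normal(size=100)

# minimize ||Ax - y||_2^2 with numpy's least-squares solver
x_ols, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
```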

Least-squares with regularization matrix#

Similar to OLS, least-squares with regularization matrix optimization solves the linear problem

\[\boldsymbol{A}\boldsymbol{x} = \boldsymbol{y},\]

with the difference that in this case, an explicit regularization matrix \(\boldsymbol{\Lambda}\) is present such that the objective becomes

\[\left(\boldsymbol{y}-\boldsymbol{A}\boldsymbol{x} \right)' \left(\boldsymbol{y}-\boldsymbol{A}\boldsymbol{x} \right) + \boldsymbol{x}' \boldsymbol{\Lambda} \boldsymbol{x}.\]

From a Bayesian perspective, the regularization matrix \(\boldsymbol{\Lambda}\) can be thought of as the inverse of the covariance matrix for the prior probability of the features [MueCed09]. It can be used to scale and couple features based on prior beliefs.

The least-squares with regularization matrix method is chosen by setting the fit_method keyword to least-squares-with-reg-matrix. The regularization matrix is set via the reg_matrix keyword. If no matrix is specified, all elements will be set to zero and the OLS result is recovered. Note that if standardization is used, the regularization matrix should still be designed with respect to the input sensing matrix and not the standardized one.

| Parameter | Type | Description | Default |
|---|---|---|---|
| reg_matrix | np.ndarray | regularization matrix | None |
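The objective above has the closed-form minimizer \(\boldsymbol{x} = (\boldsymbol{A}'\boldsymbol{A} + \boldsymbol{\Lambda})^{-1}\boldsymbol{A}'\boldsymbol{y}\). The following sketch evaluates it with numpy for a hypothetical diagonal regularization matrix; the commented line indicates how the same matrix would be handed over via the reg_matrix keyword, assuming keyword arguments are forwarded to the fit method.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 10))
y = rng.normal(size=100)

# hypothetical prior: an individual penalty per feature (diagonal matrix)
reg_matrix = np.diag(np.linspace(0.1, 1.0, A.shape[1]))

# closed-form minimizer of (y - Ax)'(y - Ax) + x' Lambda x
x_reg = np.linalg.solve(A.T @ A + reg_matrix, A.T @ y)

# assumed equivalent via the optimizers:
# Optimizer((A, y), fit_method='least-squares-with-reg-matrix', reg_matrix=reg_matrix)
```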

LASSO#

The least absolute shrinkage and selection operator (LASSO) is a method for performing variable selection and regularization in problems in statistics and machine learning. The optimization objective is given by

\[\frac{1}{2 n_\text{samples}} \left\Vert\boldsymbol{A}\boldsymbol{x} - \boldsymbol{y}\right\Vert^2_2 + \alpha \Vert\boldsymbol{x}\Vert_1.\]

While the first term ensures that \(\boldsymbol{x}\) is a solution to the linear problem at hand, the second term introduces regularization and guides the algorithm toward finding sparse solutions, in the spirit of compressive sensing. In general, LASSO is suited for solving strongly underdetermined problems.

The LASSO optimizer is chosen by setting the fit_method keyword to lasso. The \(\alpha\) parameter is set via the alpha keyword. If no value is specified, a line scan will be carried out automatically to determine the optimal value.

| Parameter | Type | Description | Default |
|---|---|---|---|
| alpha | float | controls the sparsity of the solution vector | None |
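The objective above is the same one minimized by scikit-learn's Lasso estimator. The sketch below illustrates it on a strongly underdetermined toy problem; the value of alpha used here is arbitrary and only serves to show its effect on sparsity.

```python
import numpy as np
from sklearn.linear_model import Lasso

# strongly underdetermined toy problem: more parameters than samples
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 200))
x_true = np.zeros(200)
x_true[:10] = rng.normal(size=10)
y = A @ x_true

# alpha controls the sparsity of the solution vector
lasso = Lasso(alpha=1e-3, fit_intercept=False, max_iter=100_000)
lasso.fit(A, y)
print('non-zero parameters:', np.count_nonzero(lasso.coef_))
```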

Automatic relevance determination regression (ARDR)#

Automatic relevance determination regression (ARDR) is an optimization algorithm provided by scikit-learn that, similar to Bayesian ridge regression, provides a probabilistic model of the regression problem at hand. The method is also known as Sparse Bayesian Learning and Relevance Vector Machine.

The ARDR optimizer is chosen by setting the fit_method keyword to ardr. The threshold lambda parameter, which controls the sparsity of the solution vector, is set via the threshold_lambda keyword (default: 1e4).

| Parameter | Type | Description | Default |
|---|---|---|---|
| threshold_lambda | float | controls the sparsity of the solution vector | 1e4 |
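The sketch below uses scikit-learn's ARDRegression directly to illustrate the role of threshold_lambda: features whose estimated precision exceeds the threshold are pruned, so smaller values yield sparser solutions. The toy data are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 50))
x_true = np.zeros(50)
x_true[:5] = [2.0, -1.0, 0.5, 1.5, -2.5]
y = A @ x_true + 0.01 * rng.normal(size=100)

# features whose estimated precision exceeds threshold_lambda are pruned
ardr = ARDRegression(threshold_lambda=1e4, fit_intercept=False)
ardr.fit(A, y)
print('non-zero parameters:', np.count_nonzero(ardr.coef_))
```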

Split-Bregman#

The split-Bregman method [GolOsh09] is designed to solve a broad class of \(\ell_1\)-regularized problems. The solution vector \(\boldsymbol{x}\) is given by

\[\boldsymbol{x} = \arg\min_{\boldsymbol{x}, \boldsymbol{d}} \left\Vert\boldsymbol{d}\right\Vert_1 + \frac{1}{2} \left\Vert\boldsymbol{A}\boldsymbol{x} - \boldsymbol{y}\right\Vert^2 + \frac{\lambda}{2} \left\Vert\boldsymbol{d} - \mu \boldsymbol{x} \right\Vert^2,\]

where \(\boldsymbol{d}\) is an auxiliary quantity, while \(\mu\) and \(\lambda\) are hyperparameters that control the sparseness of the solution and the efficiency of the algorithm.

The approach can be accelerated by the addition of a preconditioning step [ZhoSadAbe19]. This speed-up enables efficient optimization of the \(\mu\) hyperparameter. By default, the split-bregman fit method will scan a range of \(\mu\) values and choose the optimal one based on cross-validation.
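For illustration, the following sketch implements a plain split-Bregman iteration for the objective above. It is not the actual implementation: it solves the x-update exactly instead of using a conjugate-gradient step and omits the preconditioning and the \(\mu\) scan, but it makes the roles of \(\mu\) and \(\lambda\) concrete.

```python
import numpy as np

def shrink(z, t):
    """Soft-thresholding (shrinkage) operator."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def split_bregman_sketch(A, y, mu, lmbda, n_iters=1000, tol=1e-4):
    """Plain split-Bregman iteration (exact x-update, no preconditioning)."""
    n_params = A.shape[1]
    x = np.zeros(n_params)
    d = np.zeros(n_params)
    b = np.zeros(n_params)   # Bregman variable
    AtA, Aty = A.T @ A, A.T @ y
    lhs = AtA + lmbda * mu ** 2 * np.eye(n_params)
    for _ in range(n_iters):
        # x-update: minimize 1/2 ||Ax - y||^2 + lambda/2 ||d - mu x - b||^2
        x_new = np.linalg.solve(lhs, Aty + lmbda * mu * (d - b))
        # d-update: soft thresholding enforces the l1 term
        d = shrink(mu * x_new + b, 1.0 / lmbda)
        # Bregman update
        b = b + mu * x_new - d
        if np.linalg.norm(x_new - x) < tol:
            x = x_new
            break
        x = x_new
    return x
```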

The split-Bregman implementation supports the following keywords.

| Parameter | Type | Description | Default |
|---|---|---|---|
| mu | float | sparseness parameter | None |
| lmbda | float | weight of additional L2-norm in split-Bregman | 3 |
| n_iters | int | maximal number of split-Bregman iterations | 1000 |
| tol | float | convergence criterion of iterative minimization | 1e-4 |
| cg_tol | float | convergence criterion of conjugate gradient step | 1e-1 |
| cg_n_iters | int | maximal number of conjugate gradient iterations | None |
| cv_splits | int | number of CV splits for finding optimal mu value | 5 |
| iprint | int | how often to print fitting information to stdout | False |

Recursive feature elimination#

Recursive feature elimination (RFE) is a feature selection algorithm that obtains the optimal features by carrying out a series of fits, starting from the full set of parameters and iteratively eliminating the less important ones. RFE needs to be combined with a specific fit method. Since RFE may require many hundreds of individual fits, it is often advisable to use ordinary least-squares as the training method, which is the default behavior. The present implementation is based on the feature selection functionality of scikit-learn.

The RFE optimizer is chosen by setting the fit_method keyword to rfe. The n_features keyword allows one to specify the number of features to select. If this parameter is left unspecified, RFE with cross-validation will be used to determine the optimal number of features.

After the optimal number of features has been determined, the final model is trained. The fit method for the final fit can be controlled via final_estimator. Here, estimator and final_estimator can be set to any of the fit methods described in this section. For example, estimator='lasso' implies that a LASSO CV scan is carried out for each fit in the RFE algorithm.

| Parameter | Type | Description | Default |
|---|---|---|---|
| n_features | int | number of features to select | None |
| step | int or float | number of parameters (int) or percentage of parameters (float) to eliminate in each iteration | 0.04 |
| cv_splits | int | number of CV splits (90/10) used when optimizing n_features | 5 |
| estimator | str | fit method to be used in the RFE algorithm | 'least-squares' |
| final_estimator | str | fit method to be used in the final fit | = estimator |
| estimator_kwargs | dict | keyword arguments for fit method defined by estimator | {} |
| final_estimator_kwargs | dict | keyword arguments for fit method defined by final_estimator | {} |
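A minimal sketch of the intended usage is shown below. It assumes that the Optimizer class forwards additional keyword arguments such as n_features and estimator to the fit method, and uses synthetic data for illustration.

```python
import numpy as np
from trainstation import Optimizer

rng = np.random.default_rng(0)
A = rng.normal(size=(300, 100))
x_true = np.zeros(100)
x_true[:8] = rng.normal(size=8)
y = A @ x_true + 0.01 * rng.normal(size=300)

# fix the number of features; leaving n_features unset instead triggers
# the cross-validated scan described above
opt = Optimizer((A, y), fit_method='rfe', n_features=8,
                estimator='least-squares', final_estimator='least-squares')
opt.train()
print(opt.summary)
```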

Note

When running on multi-core systems, please be mindful of memory consumption. By default all CPUs will be used (n_jobs=-1), which will duplicate data and can require a lot of memory, potentially giving rise to errors. To prevent this behavior, you can set the n_jobs parameter explicitly; it is handed over directly to scikit-learn.

Other methods#

The optimizers furthermore support the ridge method (ridge), the elastic net method (elasticnet), as well as Bayesian ridge regression (bayesian-ridge).
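These methods are selected in the same way as the ones above. The sketch below assumes the Optimizer interface used in the earlier examples and simply loops over the corresponding fit_method strings with synthetic data.

```python
import numpy as np
from trainstation import Optimizer

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 50))
y = A @ rng.normal(size=50) + 0.01 * rng.normal(size=200)

# each string selects the corresponding linear model
for fit_method in ['ridge', 'elasticnet', 'bayesian-ridge']:
    opt = Optimizer((A, y), fit_method=fit_method)
    opt.train()
    print(fit_method, opt.summary)
```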