EnsembleOptimizer#
- class trainstation.EnsembleOptimizer(fit_data, fit_method='least-squares', standardize=True, ensemble_size=50, train_size=1.0, bootstrap=True, check_condition=True, seed=42, **kwargs)[source]#
The ensemble optimizer carries out a series of single optimization runs using the Optimizer class in order to solve the linear \(\boldsymbol{A}\boldsymbol{x} = \boldsymbol{y}\) problem. Subsequently, it provides access to various ensemble-averaged quantities such as errors and parameters.
Warning
Repeatedly setting up an EnsembleOptimizer and training without changing the seed for the random number generator will yield identical or correlated results. To avoid this, specify a different seed when setting up multiple EnsembleOptimizer instances.
- Parameters:
fit_data (tuple(numpy.ndarray, numpy.ndarray)) – the first element of the tuple represents the fit matrix A (N, M array) while the second element represents the vector of target values y (N array); here N (=rows of A, elements of y) equals the number of target values and M (=columns of A) equals the number of parameters
fit_method (str) – method to be used for training; possible choices are “ardr”, “bayesian-ridge”, “elasticnet”, “lasso”, “least-squares”, “omp”, “rfe”, “ridge”, “split-bregman”
standardize (bool) – if True the fit matrix and target values are standardized before fitting, meaning that the columns of the fit matrix and the target values are rescaled to have a standard deviation of 1.0
ensemble_size (int) – number of fits in the ensemble
train_size (float or int) – if float, represents the fraction of fit_data (rows) to be used for training; if int, represents the absolute number of rows to be used for training
bootstrap (bool) – if True sampling will be carried out with replacement
check_condition (bool) – if True the condition number will be checked (this can be slightly more time consuming for larger matrices)
seed (int) – seed for pseudo random number generator
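The procedure described above can be sketched with plain NumPy. This is not the trainstation implementation: the helper `ensemble_fit` and its internals are hypothetical, and ordinary least squares via `numpy.linalg.lstsq` stands in for the chosen fit method. The sketch only illustrates how `ensemble_size`, `train_size`, `bootstrap`, and `seed` might interact, and why reusing a seed reproduces the same ensemble.

```python
import numpy as np


def ensemble_fit(A, y, ensemble_size=50, train_size=1.0, bootstrap=True, seed=42):
    """Hypothetical sketch: fit least-squares models on resampled rows."""
    rng = np.random.default_rng(seed)
    n_rows = A.shape[0]
    # A float train_size is read as a fraction of rows, an int as a row count.
    if isinstance(train_size, float):
        n_train = int(np.round(train_size * n_rows))
    else:
        n_train = train_size
    parameters = []
    for _ in range(ensemble_size):
        # With bootstrap=True, rows are drawn with replacement.
        rows = rng.choice(n_rows, size=n_train, replace=bootstrap)
        x, *_ = np.linalg.lstsq(A[rows], y[rows], rcond=None)
        parameters.append(x)
    return np.array(parameters)  # shape (ensemble_size, M)


# Synthetic data for illustration only.
data_rng = np.random.default_rng(0)
A = data_rng.normal(size=(200, 5))
x_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = A @ x_true + 0.1 * data_rng.normal(size=200)

splits_a = ensemble_fit(A, y, seed=42)
splits_b = ensemble_fit(A, y, seed=42)  # same seed: identical ensemble
splits_c = ensemble_fit(A, y, seed=7)   # different seed: different resampling

x_mean = splits_a.mean(axis=0)  # ensemble-averaged parameters
```

In this sketch two runs with the same seed draw exactly the same rows, so their results carry no independent information; this is the situation the warning above refers to.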
- property bootstrap: bool#
True if sampling is carried out with replacement
- property ensemble_size: int#
Number of train rounds
- property error_matrix: ndarray#
Matrix of fit errors where N is the number of target values and M is the number of fits (i.e., the size of the ensemble)
- property fit_method: str#
Fit method
- get_contributions(A)#
Returns the average contribution for each row of A to the predicted values from each element of the parameter vector.
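One plausible reading of this description is the mean, over the rows of A, of the absolute value of each term \(A_{ij} x_j\). That reading is an assumption and is not confirmed by the text above; the helper name `average_contributions` is hypothetical.

```python
import numpy as np


def average_contributions(A, parameters):
    """Hypothetical helper: mean of |A[i, j] * x[j]| over rows i, per parameter j."""
    return np.mean(np.abs(A * parameters), axis=0)


A = np.array([[1.0, 2.0],
              [3.0, -4.0]])
x = np.array([0.5, -1.0])
contributions = average_contributions(A, x)  # one value per parameter
```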
- property n_nonzero_parameters: int#
Number of non-zero parameters
- property n_parameters: int#
Number of parameters (=columns in A matrix)
- property n_target_values: int#
Number of target values (=rows in A matrix)
- property parameters_norm: float#
Norm of the parameter vector
- predict(A, return_std=False)[source]#
Predicts data given an input matrix \(\boldsymbol{A}\), i.e., \(\boldsymbol{A}\boldsymbol{x}\), where \(\boldsymbol{x}\) is the vector of the fitted parameters. The method returns the vector of predicted values and optionally also the vector of standard deviations.
By using all parameter vectors in the ensemble a standard deviation of the prediction can be obtained.
- Parameters:
A (ndarray) – fit matrix where N (=rows of A, elements of y) equals the number of target values and M (=columns of A) equals the number of parameters
return_std (bool) – whether or not to return the standard deviation of the prediction
- Return type:
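How an ensemble of parameter vectors yields both a prediction and its standard deviation can be sketched as follows. The array `parameters_splits` (one parameter vector per fit, shape `(ensemble_size, M)`) and this standalone `predict` function are hypothetical; the sketch assumes the prediction uses the ensemble-averaged parameters and the standard deviation is taken over the predictions of the individual ensemble members.

```python
import numpy as np


def predict(A, parameters_splits, return_std=False):
    """Hypothetical sketch: predict with averaged parameters, spread from the ensemble."""
    x_mean = parameters_splits.mean(axis=0)
    prediction = A @ x_mean
    if not return_std:
        return prediction
    # One column of predictions per ensemble member, then the spread per row.
    member_predictions = A @ parameters_splits.T
    return prediction, member_predictions.std(axis=1)


parameters_splits = np.array([[1.0, 2.0],
                              [3.0, 2.0]])
A_new = np.array([[1.0, 0.0],
                  [0.0, 1.0]])
y_pred, y_std = predict(A_new, parameters_splits, return_std=True)
```

Here the two ensemble members disagree only on the first parameter, so only the first row of `A_new` receives a non-zero standard deviation.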
- property rmse_test: float#
Ensemble average of root mean squared error over test sets
- property rmse_test_splits: ndarray#
Root mean squared test errors obtained for each fit in the ensemble
- property rmse_train: float#
Ensemble average of root mean squared error over train sets
- property rmse_train_splits: ndarray#
Root mean squared train errors obtained for each fit in the ensemble
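The relation between per-fit errors and an ensemble average can be illustrated with a few lines of standard-library Python. The data are made up, and the text above does not state how the per-fit values are combined, so two common choices are shown as assumptions: the arithmetic mean and the root of the mean squared value.

```python
import math


def rmse(predicted, target):
    """Root mean squared error between two equally long sequences."""
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical predictions of two ensemble members on their own test rows.
test_sets = [
    ([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]),  # (predicted, target)
    ([0.0, 0.0], [3.0, 4.0]),
]
rmse_splits = [rmse(p, t) for p, t in test_sets]

# Two possible ensemble averages (which one applies is an assumption).
rmse_arithmetic = sum(rmse_splits) / len(rmse_splits)
rmse_quadratic = math.sqrt(sum(r ** 2 for r in rmse_splits) / len(rmse_splits))
```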
- property seed: int#
Seed used to initialize pseudo random number generator
- property standardize: bool#
If True standardize the fit matrix before fitting
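A minimal sketch of what rescaling to unit standard deviation could look like, using NumPy only. It is not the library's code; the variable names and the synthetic data are illustrative. The columns of the fit matrix and the target values are divided by their standard deviations, the fit is done in the rescaled space, and the parameters are mapped back afterwards.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 3)) * np.array([1.0, 10.0, 100.0])  # badly scaled columns
y = A @ np.array([2.0, 0.3, -0.01]) + 0.01 * rng.normal(size=50)

# Rescale columns of A and the targets to unit standard deviation.
column_std = A.std(axis=0)
y_std = y.std()
A_scaled = A / column_std
y_scaled = y / y_std

# Fit in the standardized space, then map the parameters back.
x_scaled, *_ = np.linalg.lstsq(A_scaled, y_scaled, rcond=None)
x = x_scaled * y_std / column_std

# Reference fit without standardization.
x_direct, *_ = np.linalg.lstsq(A, y, rcond=None)
```

For plain least squares the two routes give the same parameters. The rescaling matters for regularized methods, where the penalty depends on the scale of each column.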
- property summary: Dict[str, Any]#
Comprehensive information about the optimizer
- train()[source]#
Carries out ensemble training and constructs the final model by averaging over all models in the ensemble.
- Return type:
None
- property train_size: int#
Number of rows included in train sets; note that this will differ from the number of unique rows if bootstrapping is used
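The difference between the number of rows in a train set and the number of unique rows can be seen in a small standard-library sketch. The helper `resolve_train_size` is hypothetical and only mirrors the float-versus-int convention described for the constructor argument.

```python
import random


def resolve_train_size(train_size, n_rows):
    """Hypothetical helper: turn a fraction (float) or a count (int) into a row count."""
    if isinstance(train_size, float):
        return int(round(train_size * n_rows))
    return train_size


n_rows = 100
n_train = resolve_train_size(0.8, n_rows)

rng = random.Random(42)
# Sampling with replacement: the same row may be drawn more than once.
with_replacement = [rng.randrange(n_rows) for _ in range(n_train)]
# Sampling without replacement: every drawn row is unique.
without_replacement = rng.sample(range(n_rows), n_train)

n_unique_bootstrap = len(set(with_replacement))
n_unique_plain = len(set(without_replacement))
```

Both train sets contain `n_train` rows, but only the one drawn without replacement is guaranteed to contain `n_train` distinct rows.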
- write_summary(fname)#
Writes summary dict to file.
- Return type:
None