EnsembleOptimizer

class trainstation.EnsembleOptimizer(fit_data, fit_method='least-squares', standardize=True, ensemble_size=50, train_size=1.0, bootstrap=True, check_condition=True, seed=42, **kwargs)

The ensemble optimizer carries out a series of single optimization runs using the Optimizer class to solve the linear problem \(\boldsymbol{A}\boldsymbol{x} = \boldsymbol{y}\). It then provides access to various ensemble-averaged quantities such as errors and parameters.
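The ensemble idea can be illustrated with a minimal NumPy sketch. It assumes ordinary least squares (via `np.linalg.lstsq`) as a stand-in for the selected fit_method and uses bootstrap training sets with as many rows as A. The function name `bootstrap_ensemble_lstsq` is hypothetical, and this is not trainstation's implementation.

```python
import numpy as np


def bootstrap_ensemble_lstsq(A, y, ensemble_size=50, seed=42):
    """Average of least-squares fits on bootstrap resamples of the rows of A.

    Hypothetical sketch: plain least squares stands in for fit_method and each
    training set has as many rows as A; this is not trainstation's code.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_params = A.shape
    parameters_splits = np.empty((ensemble_size, n_params))
    for k in range(ensemble_size):
        # draw training rows with replacement (bootstrap)
        rows = rng.choice(n_rows, size=n_rows, replace=True)
        # one single optimization run on the resampled problem
        x_k, *_ = np.linalg.lstsq(A[rows], y[rows], rcond=None)
        parameters_splits[k] = x_k
    # final model: average over all parameter vectors in the ensemble
    return parameters_splits.mean(axis=0), parameters_splits
```

With noise-free targets every member recovers the same parameters, so the average does too; with noisy targets the spread of `parameters_splits` indicates how sensitive the fit is to the choice of training rows.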

Warning

Repeatedly setting up an EnsembleOptimizer and training it without changing the seed for the random number generator will yield identical or correlated results. To avoid this, specify a different seed when setting up multiple EnsembleOptimizer instances.
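The reason for this warning can be seen in a small sketch. It assumes that training rows are drawn from a NumPy generator seeded with seed; whether trainstation uses `np.random.default_rng` or another generator is not stated here, and `draw_train_rows` is a hypothetical helper.

```python
import numpy as np


def draw_train_rows(n_rows, n_train, seed, bootstrap=True):
    """Draw training-row indices from a seeded generator (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return rng.choice(n_rows, size=n_train, replace=bootstrap)


same_a = draw_train_rows(100, 80, seed=42)
same_b = draw_train_rows(100, 80, seed=42)
other = draw_train_rows(100, 80, seed=7)

print(np.array_equal(same_a, same_b))  # True: same seed, same training rows
print(np.array_equal(same_a, other))
```

Two instances built with the same seed therefore see the same sequence of training sets, so their ensembles are not independent.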

Parameters:
  • fit_data (tuple(numpy.ndarray, numpy.ndarray)) – the first element of the tuple represents the fit matrix A (N, M array) while the second element represents the vector of target values y (N array); here N (=rows of A, elements of y) equals the number of target values and M (=columns of A) equals the number of parameters

  • fit_method (str) – method to be used for training; possible choices are “ardr”, “bayesian-ridge”, “elasticnet”, “lasso”, “least-squares”, “omp”, “rfe”, “ridge”, “split-bregman”

  • standardize (bool) – if True, the fit matrix and target values are standardized before fitting, meaning the columns of the fit matrix and the target values are rescaled to have a standard deviation of 1.0

  • ensemble_size (int) – number of fits in the ensemble

  • train_size (float or int) – if float, represents the fraction of fit_data (rows) to be used for training; if int, represents the absolute number of rows to be used for training

  • bootstrap (bool) – if True sampling will be carried out with replacement

  • check_condition (bool) – if True, the condition number will be checked (this can be slightly more time consuming for larger matrices)

  • seed (int) – seed for pseudo random number generator
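The train_size and check_condition parameters can be read as the two small operations sketched below. The helper names are hypothetical; the rounding rule for a float train_size and the condition-number threshold of 1e10 are assumptions, not documented trainstation behavior.

```python
import numpy as np


def resolve_train_size(train_size, n_rows):
    """Translate train_size into a number of training rows (hypothetical helper).

    A float is read as a fraction of the rows and an int as an absolute count;
    rounding to the nearest integer is an assumption.
    """
    if isinstance(train_size, float):
        return int(round(train_size * n_rows))
    return int(train_size)


def is_well_conditioned(A, threshold=1e10):
    """Return True if cond(A) is below a hypothetical threshold."""
    return np.linalg.cond(A) < threshold
```

Computing the condition number requires a singular value decomposition of A, which is why the check becomes more expensive for larger matrices.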

property bootstrap: bool

True if sampling is carried out with replacement

property ensemble_size: int

Number of fits in the ensemble

property error_matrix: ndarray

Matrix of fit errors with shape (N, ensemble_size), where N is the number of target values; each column holds the errors of one fit in the ensemble
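A matrix of this shape can be assembled from the ensemble's parameter vectors as sketched below. The helper `fit_error_matrix` is hypothetical, and both the sign convention (prediction minus target) and the choice to evaluate every row are assumptions.

```python
import numpy as np


def fit_error_matrix(A, y, parameters_splits):
    """Build an N x ensemble_size matrix of fit errors (hypothetical helper).

    Column k holds the errors of ensemble member k on every target value;
    the sign convention (prediction minus target) is an assumption.
    """
    predictions = A @ np.asarray(parameters_splits).T  # shape (N, n_fits)
    return predictions - y[:, np.newaxis]
```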

property fit_method: str

Fit method

get_contributions(A)

Returns the contribution of each element of the parameter vector to the predicted values, averaged over the rows of A.

Parameters:

A (ndarray) – fit matrix where N (=rows of A, elements of y) equals the number of target values and M (=columns of A) equals the number of parameters

Return type:

ndarray
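One way to compute such averaged contributions is sketched below: each term \(A_{ij} x_j\) is the contribution of parameter \(j\) to prediction \(i\), and the magnitudes are averaged over the rows. Taking absolute values before averaging is an assumption, and `average_contributions` is a hypothetical name.

```python
import numpy as np


def average_contributions(A, x):
    """Mean magnitude of A[i, j] * x[j] over the rows i (hypothetical helper).

    Averaging absolute values is an assumption about how contributions
    are aggregated; the result has one entry per parameter.
    """
    return np.mean(np.abs(A * x), axis=0)
```

Without the absolute value, positive and negative contributions across rows could cancel and hide an important parameter.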

property n_nonzero_parameters: int

Number of non-zero parameters

property n_parameters: int

Number of parameters (=columns in A matrix)

property n_target_values: int

Number of target values (=rows in A matrix)

property parameters: ndarray

Copy of the ensemble-averaged parameter vector

property parameters_norm: float

Norm of the parameter vector

property parameters_splits: List[ndarray]

All parameter vectors in the ensemble

property parameters_std: ndarray

Standard deviation of each parameter over the ensemble
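The parameter-related properties can all be derived from the per-fit parameter vectors, as in the sketch below. The three-member ensemble is made-up illustration data, and using the population standard deviation (NumPy's default ddof=0) is an assumption about the library's convention.

```python
import numpy as np

# Parameter vectors of a hypothetical three-member ensemble (one row per fit)
parameters_splits = np.array([[1.0, 0.0, 2.0],
                              [3.0, 0.0, 2.0],
                              [2.0, 0.0, 2.0]])

parameters = parameters_splits.mean(axis=0)     # ensemble-averaged vector: [2, 0, 2]
parameters_std = parameters_splits.std(axis=0)  # spread of each parameter (ddof=0)
parameters_norm = np.linalg.norm(parameters)    # Euclidean norm: sqrt(8)
n_nonzero_parameters = np.count_nonzero(parameters)  # 2
```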

predict(A, return_std=False)

Predicts data given an input matrix \(\boldsymbol{A}\), i.e., \(\boldsymbol{A}\boldsymbol{x}\), where \(\boldsymbol{x}\) is the vector of the fitted parameters. The method returns the vector of predicted values and optionally also the vector of standard deviations.

By using all parameter vectors in the ensemble, a standard deviation of the prediction can be obtained.

Parameters:
  • A (ndarray) – fit matrix where N (=rows of A, elements of y) equals the number of target values and M (=columns of A) equals the number of parameters

  • return_std (bool) – whether or not to return the standard deviation of the prediction

Return type:

Union[ndarray, Tuple[ndarray, ndarray]]
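A prediction with an ensemble spread can be sketched as follows. It assumes the returned prediction uses the averaged parameter vector and that the standard deviation is taken over the predictions of the individual ensemble members; `ensemble_predict` is a hypothetical name, not the library method.

```python
import numpy as np


def ensemble_predict(A, parameters_splits, return_std=False):
    """Predict A @ x with the averaged parameters; optionally return a spread.

    Hypothetical sketch: the standard deviation is computed over the
    predictions of the individual ensemble members.
    """
    splits = np.asarray(parameters_splits)   # shape (n_fits, M)
    prediction = A @ splits.mean(axis=0)     # shape (N,)
    if not return_std:
        return prediction
    member_predictions = A @ splits.T        # shape (N, n_fits)
    return prediction, member_predictions.std(axis=1)
```

Because prediction is linear in \(\boldsymbol{x}\), predicting with the averaged parameters gives the same result as averaging the members' predictions.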

property rmse_test: float

Ensemble average of the root mean squared error over the test sets

property rmse_test_splits: ndarray

Root mean squared test error for each fit in the ensemble

property rmse_train: float

Ensemble average of the root mean squared error over the training sets

property rmse_train_splits: ndarray

Root mean squared training error for each fit in the ensemble
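The relation between the per-fit errors and their ensemble average can be sketched as below. The split values are made-up illustration data, and reading "ensemble average" as the arithmetic mean of the per-fit RMSE values is an assumption.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two vectors."""
    diff = np.asarray(y_pred) - np.asarray(y_true)
    return float(np.sqrt(np.mean(diff ** 2)))


# Hypothetical per-fit test errors of a three-member ensemble
rmse_test_splits = np.array([0.10, 0.12, 0.08])
rmse_test = rmse_test_splits.mean()  # ensemble average, 0.10 here
```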

property seed: int

Seed used to initialize pseudo random number generator

property standardize: bool

If True, the fit matrix and target values are standardized before fitting
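Standardization followed by mapping the parameters back to the original scale can be sketched as follows. It assumes plain least squares and pure rescaling without centering, matching the description of the standardize parameter; `standardized_lstsq` is a hypothetical name and not trainstation's implementation.

```python
import numpy as np


def standardized_lstsq(A, y):
    """Least-squares fit on rescaled data, mapped back to A's scale.

    Hypothetical sketch: columns of A and the targets are divided by their
    standard deviations (no centering) and the fitted parameters are rescaled
    so that A @ x approximates y in the original units.
    """
    col_std = A.std(axis=0)
    col_std[col_std == 0] = 1.0   # leave columns without spread unscaled
    y_std = y.std() or 1.0
    x_scaled, *_ = np.linalg.lstsq(A / col_std, y / y_std, rcond=None)
    # y = y_std * (A / col_std) @ x_scaled = A @ (y_std * x_scaled / col_std)
    return y_std * x_scaled / col_std
```

Rescaling puts columns with very different magnitudes on an equal footing, which matters for regularized methods such as lasso or ridge whose penalties act on the parameter values.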

property summary: Dict[str, Any]

Comprehensive information about the optimizer

train()

Carries out ensemble training and constructs the final model by averaging over all models in the ensemble.

Return type:

None

property train_size: int

Number of rows included in the training sets; note that when bootstrapping, this generally differs from the number of unique rows
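The gap between train_size and the number of unique rows can be seen in a short NumPy sketch of a single bootstrap draw. It assumes rows are sampled uniformly with replacement; the generator and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n_rows = 1000
rows = rng.choice(n_rows, size=n_rows, replace=True)  # one bootstrap draw

print(len(rows))             # 1000: rows in the training set, duplicates included
print(len(np.unique(rows)))  # fewer unique rows
```

On average a bootstrap sample of size n contains about \(1 - (1 - 1/n)^n \approx 63\%\) unique rows, so train_size overstates the amount of distinct data used in each fit.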

write_summary(fname)

Writes the summary dict to the file fname.

Return type:

None