Hyperparameter tuning methods for univariate machine learning models


source

SplitTimeSeries


def SplitTimeSeries(
    n_splits:int, # Number of splits to generate.
    test_size:int, # The number of samples in each test set.
    step_size:Optional=None, # The number of samples to move the test set forward for each split. If None, it defaults to the test_size, meaning non-overlapping test sets.
)->None:

A time series cross-validator that generates train/test splits with a fixed test size and a configurable step size.

Type Default Details
n_splits int Number of splits to generate.
test_size int The number of samples in each test set.
step_size Union None The number of samples to move the test set forward for each split. If None, it defaults to the test_size, meaning non-overlapping test sets.
Returns None
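The split logic described above can be sketched in plain Python. This is a hypothetical reimplementation for illustration, not the library's code; it assumes splits use an expanding training window and that the last test window is anchored to the end of the series.

```python
# Minimal sketch of SplitTimeSeries-style splitting (assumed behaviour):
# expanding train window, fixed-size test window, test window advanced by
# step_size per split, with the final test window ending at the series end.
def split_time_series(n, n_splits, test_size, step_size=None):
    step_size = step_size or test_size  # None -> non-overlapping test sets
    splits = []
    for i in range(n_splits):
        test_end = n - (n_splits - 1 - i) * step_size
        test_start = test_end - test_size
        splits.append((list(range(test_start)), list(range(test_start, test_end))))
    return splits

for train_idx, test_idx in split_time_series(n=10, n_splits=3, test_size=2):
    print(len(train_idx), test_idx)
# 4 [4, 5]
# 6 [6, 7]
# 8 [8, 9]
```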

source

hyperopt_tune


def hyperopt_tune(
    model:object, # Forecasting model object with .fit and .forecast methods and relevant attributes.
    df:DataFrame, # Time series data with a datetime index and a target column and optionally exogenous features.
    cv_split:int, # Number of cross-validation splits.
    test_size:int, # Number of samples in each test set. For ml_direct_forecaster, this will be overridden to be the maximum horizon in model.H.
    eval_metric:Callable, # Evaluation metric function.
    param_space:dict, # Hyperparameter search space for the forecasting model.
    step_size:int=None, # Step size to move the test window forward in each split.
    eval_num:int=100, # Number of hyperparameter combinations to evaluate. Default is 100.
    verbose:bool=False, # Whether to print the evaluation metric for each hyperparameter combination. Default is False.
)->Tuple: # A tuple containing the best hyperparameters, selected lags, and selected transforms.

Tune forecasting model hyperparameters using time series cross-validation and hyperopt.

Type Default Details
model object Forecasting model object with .fit and .forecast methods and relevant attributes.
df DataFrame Time series data with a datetime index and a target column and optionally exogenous features.
cv_split int Number of cross-validation splits.
test_size int Number of samples in each test set. For ml_direct_forecaster, this will be overridden to be the maximum horizon in model.H.
eval_metric Callable Evaluation metric function.
param_space dict Hyperparameter search space for the forecasting model.
step_size int None Step size to move the test window forward in each split.
eval_num int 100 Number of hyperparameter combinations to evaluate. Default is 100.
verbose bool False Whether to print the evaluation metric for each hyperparameter combination. Default is False.
Returns Tuple A tuple containing the best hyperparameters, selected lags, and selected transforms.
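Internally, hyperopt_tune scores each hyperparameter candidate by averaging the evaluation metric across the CV folds. A rough stdlib-only sketch of that evaluate-over-folds pattern follows; `fit_forecast`, the toy series, and plain random search are stand-ins (the real function uses hyperopt's TPE sampler and the model's own fit/forecast methods).

```python
import random

def cv_score(params, folds, fit_forecast, eval_metric):
    """Average eval_metric over the CV folds for one hyperparameter set."""
    scores = []
    for train, test in folds:
        preds = fit_forecast(train, len(test), params)  # fit on train, forecast test window
        scores.append(eval_metric(test, preds))
    return sum(scores) / len(scores)

# Toy stand-ins: a short series, a "model" that forecasts a constant level, and MAE.
series = [10, 12, 11, 13, 12, 14, 13, 15]
folds = [(series[:4], series[4:6]), (series[:6], series[6:])]
fit_forecast = lambda train, h, p: [p["level"]] * h
mae = lambda y, yhat: sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

random.seed(0)
trials = [{"level": random.uniform(10, 16)} for _ in range(50)]
best = min(trials, key=lambda p: cv_score(p, folds, fit_forecast, mae))
```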

source

optuna_tune


def optuna_tune(
    model:object, # Forecasting model with .fit and .forecast methods.
    df:DataFrame, # Time series data (datetime index, target column, optional exogenous features).
    cv_split:int, # Number of cross-validation splits.
    test_size:int, # Number of samples in each test fold. For ml_direct_forecaster, this will be overridden to be the maximum horizon in model.H.
    eval_metric:Callable, # Metric function to minimise.
    param_space:Dict, # Each value must be a callable that accepts an Optuna `trial` and returns a value.
    step_size:int=None, # Step size between CV folds.
    eval_num:int=100, # Number of Optuna trials. Default 100.
    verbose:bool=False, # Print score for every trial. Default False.
)->Tuple: # Best hyperparameters and best lags (if 'lags' is in param_space).

Tune forecasting model hyperparameters using time series cross-validation and Optuna.

Type Default Details
model object Forecasting model with .fit and .forecast methods.
df DataFrame Time series data (datetime index, target column, optional exogenous features).
cv_split int Number of cross-validation splits.
test_size int Number of samples in each test fold. For ml_direct_forecaster, this will be overridden to be the maximum horizon in model.H.
eval_metric Callable Metric function to minimise.
param_space Dict Each value must be a callable that accepts an Optuna trial and returns a value.
step_size int None Step size between CV folds.
eval_num int 100 Number of Optuna trials. Default 100.
verbose bool False Print score for every trial. Default False.
Returns Tuple Best hyperparameters and best lags (if 'lags' is in param_space).
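The param_space contract above ("each value must be a callable that accepts an Optuna trial") can be illustrated as follows. The parameter names and ranges are made up for the example, and `StubTrial` is a hypothetical stand-in so the snippet runs without Optuna installed; in real use each callable receives an `optuna.trial.Trial` and the `suggest_int` / `suggest_float` / `suggest_categorical` calls are Optuna's actual API.

```python
# Each param_space value is a callable: trial -> sampled value.
param_space = {
    "n_estimators": lambda t: t.suggest_int("n_estimators", 100, 500),
    "learning_rate": lambda t: t.suggest_float("learning_rate", 0.01, 0.3, log=True),
    "lags": lambda t: t.suggest_categorical("lags", (12, 24, 36)),
}

class StubTrial:
    """Deterministic stand-in: always returns the lower bound / first choice."""
    def suggest_int(self, name, low, high):
        return low
    def suggest_float(self, name, low, high, log=False):
        return low
    def suggest_categorical(self, name, choices):
        return choices[0]

sampled = {name: fn(StubTrial()) for name, fn in param_space.items()}
print(sampled)  # {'n_estimators': 100, 'learning_rate': 0.01, 'lags': 12}
```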

Hyperparameter tuning methods for multivariate machine learning models


source

mv_hyperopt_tune


def mv_hyperopt_tune(
    model:object, # Forecasting model object with .fit and .forecast methods and relevant attributes.
    df:DataFrame, # Time series data with a datetime index and a target column and optionally exogenous features.
    target_col:str, # Name of the target column to minimize the evaluation metric on.
    cv_split:int, # Number of cross-validation splits.
    test_size:int, # Number of samples in each test set.
    eval_metric:Callable, # Evaluation metric function.
    param_space:dict, # Hyperparameter search space for the forecasting model.
    step_size:int=None, # Step size to move the test window forward in each split.
    eval_num:int=100, # Number of hyperparameter combinations to evaluate. Default is 100.
    verbose:bool=False, # Whether to print the evaluation metric for each hyperparameter combination. Default is False.
)->Tuple: # A tuple containing the best hyperparameters, selected lags, and selected transforms.

Tune forecasting model hyperparameters using time series cross-validation and hyperopt for multivariate models.

Type Default Details
model object Forecasting model object with .fit and .forecast methods and relevant attributes.
df DataFrame Time series data with a datetime index and a target column and optionally exogenous features.
target_col str Name of the target column to minimize the evaluation metric on.
cv_split int Number of cross-validation splits.
test_size int Number of samples in each test set.
eval_metric Callable Evaluation metric function.
param_space dict Hyperparameter search space for the forecasting model.
step_size int None Step size to move the test window forward in each split.
eval_num int 100 Number of hyperparameter combinations to evaluate. Default is 100.
verbose bool False Whether to print the evaluation metric for each hyperparameter combination. Default is False.
Returns Tuple A tuple containing the best hyperparameters, selected lags, and selected transforms.

source

mv_optuna_tune


def mv_optuna_tune(
    model:object, # Forecasting model with .fit and .forecast methods.
    df:DataFrame, # Time series data (datetime index, target column, optional exogenous features).
    target_col:str, # Name of the target column to minimize the evaluation metric on.
    cv_split:int, # Number of cross-validation splits.
    test_size:int, # Number of samples in each test fold.
    eval_metric:Callable, # Metric function to minimise.
    param_space:Dict, # Each value must be a callable that accepts an Optuna `trial` and returns a value.
    step_size:int=None, # Step size between CV folds.
    eval_num:int=100, # Number of Optuna trials. Default 100.
    verbose:bool=False, # Print score for every trial. Default False.
)->Tuple: # Best hyperparameters and best lags (if 'lags' is in param_space).

Tune forecasting model hyperparameters using time series cross-validation and Optuna.

Type Default Details
model object Forecasting model with .fit and .forecast methods.
df DataFrame Time series data (datetime index, target column, optional exogenous features).
target_col str Name of the target column to minimize the evaluation metric on.
cv_split int Number of cross-validation splits.
test_size int Number of samples in each test fold.
eval_metric Callable Metric function to minimise.
param_space Dict Each value must be a callable that accepts an Optuna trial and returns a value.
step_size int None Step size between CV folds.
eval_num int 100 Number of Optuna trials. Default 100.
verbose bool False Print score for every trial. Default False.
Returns Tuple Best hyperparameters and best lags (if 'lags' is in param_space).

Feature selection methods for univariate time series models


source

forward_feature_selection


def forward_feature_selection(
    model:object, # A *configured but unfitted* [`ml_forecaster`](https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ml_forecast.html#ml_forecaster) instance.  The function works exclusively on deep copies and never mutates the object passed in.
    df:DataFrame, # Full training DataFrame. Must contain the target column and any candidate exogenous columns.
    cv_split:int, # Number of time-series cross-validation folds.
    H:int, # Forecast horizon (test window size for each fold).
    step_size:Optional=None, # Step size between consecutive CV folds.  If `None` (default) the step equals `H`, producing non-overlapping folds — consistent with the default behaviour of [`ml_forecaster.cross_validate`](https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ml_forecast.html#ml_forecaster.cross_validate).
    metrics:Union=None, # One or more metric functions accepted by [`ml_forecaster.cross_validate`](https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ml_forecast.html#ml_forecaster.cross_validate) (e.g. `[MAE, RMSE]`). Selection is driven by the **first** metric in the list; a candidate is only accepted when it improves **all** metrics simultaneously.
    lags_to_consider:Optional=None, # Consider lags `1, 2, ..., lags_to_consider` as candidates.  If `None`, lag selection is skipped.
    candidate_features:Optional=None, # Column names in `df` that are exogenous feature candidates.  The function never modifies this list.  If `None`, exogenous feature selection is skipped.
    transformations:Optional=None, # Lag-transform objects to test as candidates (e.g. `[rolling_mean(3, 1), expanding_std(1)]`).  The function never modifies this list.  If `None`, transform selection is skipped.
    starting_lags:Optional=None, # Lags to include in the initial feature set before the search begins. These are *not* candidates — they are always included.  Must be a list (e.g. `[1]` or `[1, 2, 3]`).
    starting_transforms:Optional=None, # Lag-transform objects to include in the initial feature set before the search begins.  Must be a list.
    best_start_score:Optional=None, # Initial best scores for each metric. If not provided, the function will compute the baseline score using the model with the starting features (if any) before beginning the search.
    verbose:bool=False, # Print a message each time a candidate is accepted.
): # A dictionary with keys `best_lags`, `best_exogs`, and `best_transforms` containing the selected features.

Forward stepwise feature selection for ml_forecaster models.

At each iteration every remaining candidate (lag, exogenous column, or lag-transform) is tested individually by adding it to the current best feature set. The candidate that produces the largest cross-validation improvement is permanently added. The loop continues until no remaining candidate improves any of the evaluation metrics.

Type Default Details
model object A configured but unfitted ml_forecaster instance. The function works exclusively on deep copies and never mutates the object passed in.
df DataFrame Full training DataFrame. Must contain the target column and any candidate exogenous columns.
cv_split int Number of time-series cross-validation folds.
H int Forecast horizon (test window size for each fold).
step_size Union None Step size between consecutive CV folds. If None (default) the step equals H, producing non-overlapping folds — consistent with the default behaviour of ml_forecaster.cross_validate.
metrics Union None One or more metric functions accepted by ml_forecaster.cross_validate (e.g. [MAE, RMSE]). Selection is driven by the first metric in the list; a candidate is only accepted when it improves all metrics simultaneously.
lags_to_consider Union None Consider lags 1, 2, ..., lags_to_consider as candidates. If None, lag selection is skipped.
candidate_features Union None Column names in df that are exogenous feature candidates. The function never modifies this list. If None, exogenous feature selection is skipped.
transformations Union None Lag-transform objects to test as candidates (e.g. [rolling_mean(3, 1), expanding_std(1)]). The function never modifies this list. If None, transform selection is skipped.
starting_lags Union None Lags to include in the initial feature set before the search begins. These are not candidates — they are always included. Must be a list (e.g. [1] or [1, 2, 3]).
starting_transforms Union None Lag-transform objects to include in the initial feature set before the search begins. Must be a list.
best_start_score Union None Initial best scores for each metric. If not provided, the function will compute the baseline score using the model with the starting features (if any) before beginning the search.
verbose bool False Print a message each time a candidate is accepted.
Returns A dictionary with keys best_lags, best_exogs, and best_transforms containing the selected features.
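The greedy loop described above can be reduced to a small sketch. The scoring function here is a toy stand-in, not ml_forecaster.cross_validate, and it tracks a single metric rather than the all-metrics acceptance rule; it only shows the accept-the-best-improver-or-stop structure.

```python
# Forward stepwise selection: each round, score every remaining candidate added
# to the current set; permanently accept the best one only if it improves the
# score, otherwise stop.
def forward_select(candidates, score):
    selected, remaining = [], list(candidates)
    best_score = score(selected)  # baseline with no candidates added
    while remaining:
        trial_scores = {c: score(selected + [c]) for c in remaining}
        best_c = min(trial_scores, key=trial_scores.get)
        if trial_scores[best_c] >= best_score:  # no candidate improves: stop
            break
        best_score = trial_scores[best_c]
        selected.append(best_c)
        remaining.remove(best_c)
    return selected

# Toy score: features 1 and 2 each cut the error; feature 9 only adds noise.
useful = {1: 0.4, 2: 0.3}
score = lambda feats: 1.0 - sum(useful.get(f, -0.05) for f in feats)
print(forward_select([1, 2, 9], score))  # [1, 2]
```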

source

backward_feature_selection


def backward_feature_selection(
    model:object, # A *configured but unfitted* [`ml_forecaster`](https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ml_forecast.html#ml_forecaster) instance.  The function works exclusively on deep copies and never mutates the object passed in.
    df:DataFrame, # Full training DataFrame. Must contain the target column and any candidate exogenous columns.
    cv_split:int, # Number of time-series cross-validation folds.
    H:int, # Forecast horizon (test window size for each fold).
    step_size:Optional=None, # Step size between consecutive CV folds.  If `None` (default) the step equals `H`, producing non-overlapping folds.
    metrics:Union=None, # One or more metric functions accepted by [`ml_forecaster.cross_validate`](https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ml_forecast.html#ml_forecaster.cross_validate) (e.g. `[MAE, RMSE]`). Selection is driven by the **first** metric in the list; a feature is only removed when doing so improves **all** metrics simultaneously.
    lags_to_consider:Optional=None, # Lags to include in the initial feature set and test for removal (e.g. `[1, 2, 3, 4]`).  If `None`, no lag removal is attempted.
    candidate_features:Optional=None, # Column names in `df` that start in the model and are tested for removal.  If `None`, exogenous feature removal is skipped.
    transformations:Optional=None, # Lag-transform objects that start in the model and are tested for removal (e.g. `[rolling_mean(3, 1), expanding_std(1)]`).  If `None`, transform removal is skipped.
    verbose:bool=False, # Print a message each time a feature is removed.
): # A dictionary with keys `best_lags`, `best_exogs`, and `best_transforms` containing the surviving features after backward selection.

Backward stepwise feature selection for ml_forecaster models.

Starts with the full feature set (all provided lags, exogenous columns, and lag-transforms) and at each iteration tries removing each current feature individually. The feature whose removal produces the largest cross-validation improvement is permanently dropped. The loop continues until no remaining feature can be removed without hurting any of the evaluation metrics.

Type Default Details
model object A configured but unfitted ml_forecaster instance. The function works exclusively on deep copies and never mutates the object passed in.
df DataFrame Full training DataFrame. Must contain the target column and any candidate exogenous columns.
cv_split int Number of time-series cross-validation folds.
H int Forecast horizon (test window size for each fold).
step_size Union None Step size between consecutive CV folds. If None (default) the step equals H, producing non-overlapping folds.
metrics Union None One or more metric functions accepted by ml_forecaster.cross_validate (e.g. [MAE, RMSE]). Selection is driven by the first metric in the list; a feature is only removed when doing so improves all metrics simultaneously.
lags_to_consider Union None Lags to include in the initial feature set and test for removal (e.g. [1, 2, 3, 4]). If None, no lag removal is attempted.
candidate_features Union None Column names in df that start in the model and are tested for removal. If None, exogenous feature removal is skipped.
transformations Union None Lag-transform objects that start in the model and are tested for removal (e.g. [rolling_mean(3, 1), expanding_std(1)]). If None, transform removal is skipped.
verbose bool False Print a message each time a feature is removed.
Returns A dictionary with keys best_lags, best_exogs, and best_transforms containing the surviving features after backward selection.
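The backward loop is the mirror image of the forward one. Again the scoring function is a toy stand-in for cross-validation, tracking a single metric instead of the all-metrics removal rule.

```python
# Backward stepwise selection: start with every feature, drop the one whose
# removal most improves the score, stop once every removal hurts.
def backward_select(features, score):
    selected = list(features)
    best_score = score(selected)  # baseline with the full feature set
    while selected:
        trial_scores = {f: score([g for g in selected if g != f]) for f in selected}
        drop = min(trial_scores, key=trial_scores.get)
        if trial_scores[drop] >= best_score:  # every removal hurts: stop
            break
        best_score = trial_scores[drop]
        selected.remove(drop)
    return selected

useful = {1: 0.4, 2: 0.3}          # features 1 and 2 reduce the error
score = lambda feats: 1.0 - sum(useful.get(f, -0.05) for f in feats)
print(backward_select([1, 2, 9], score))  # [1, 2]
```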

Feature selection methods for multivariate time series models


source

mv_forward_feature_selection


def mv_forward_feature_selection(
    model:object, # Template model — never mutated.
    df:DataFrame, # DataFrame containing the target variable and any candidate features.
    target_col:str, # Target variable used to evaluate cross-validation score.
    cv_split:int, # Number of time-series cross-validation folds.
    H:int, # Forecast horizon / test size per fold.
    step_size:NoneType=None, # Rolling-window step size (defaults to H).
    metrics:NoneType=None, # One or more metric functions (e.g. `[MAE, RMSE]`). Selection is driven by the **first** metric in the list; a candidate is only accepted when it improves **all** metrics simultaneously.
    lags_to_consider:NoneType=None, # ``{col: max_lag}`` — lags 1..max_lag are candidates.
    candidate_features:NoneType=None, # Exogenous columns to consider adding.
    transformations:NoneType=None, # ``{col: [transform_objects]}`` — transform candidates per target.
    starting_lags:NoneType=None, # Lags already included before search begins.
    starting_transforms:NoneType=None, # Transforms already included before search begins.
    verbose:bool=False
): # `{"best_lags": {col: [...]}, "best_exogs": [...], "best_transforms": {col: [name_str, ...]}}`

Forward stepwise feature selection for ml_mv_forecaster.

Type Default Details
model object Template model — never mutated.
df DataFrame DataFrame containing the target variable and any candidate features.
target_col str Target variable used to evaluate cross-validation score.
cv_split int Number of time-series cross-validation folds.
H int Forecast horizon / test size per fold.
step_size NoneType None Rolling-window step size (defaults to H).
metrics NoneType None One or more metric functions (e.g. [MAE, RMSE]). Selection is driven by the first metric in the list; a candidate is only accepted when it improves all metrics simultaneously.
lags_to_consider NoneType None {col: max_lag} — lags 1..max_lag are candidates.
candidate_features NoneType None Exogenous columns to consider adding.
transformations NoneType None {col: [transform_objects]} — transform candidates per target.
starting_lags NoneType None Lags already included before search begins.
starting_transforms NoneType None Transforms already included before search begins.
verbose bool False
Returns {"best_lags": {col: [...]}, "best_exogs": [...], "best_transforms": {col: [name_str, ...]}}

source

mv_backward_feature_selection


def mv_backward_feature_selection(
    model:object, # Template model — never mutated.
    df:DataFrame, # All candidate exog columns must already be present.
    target_col:str, # Target variable used to evaluate cross-validation score.
    cv_split:int, # Number of time-series cross-validation folds.
    H:int, # Forecast horizon / test size per fold.
    step_size:NoneType=None, # Rolling-window step size (defaults to H).
    metrics:NoneType=None, # One or more metric functions (e.g. `[MAE, RMSE]`). A feature is only removed when its removal improves **all** metrics simultaneously.
    lags_to_consider:NoneType=None, # ``{col: max_lag}`` — all lags 1..max_lag start as selected.
    candidate_features:NoneType=None, # Exogenous columns that start as selected.
    transformations:NoneType=None, # ``{col: [transform_objects]}`` — all transforms start as selected.
    verbose:bool=False
): # ``{"best_lags": {col: [...]}, "best_exogs": [...], "best_transforms": {col: [name_str, ...]}}``

Backward stepwise feature selection for [ml_mv_forecaster](https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ml_mv_forecast.html#ml_mv_forecaster).

Starts with all candidate features included and iteratively removes the one whose removal most improves cross-validation score.

Type Default Details
model object Template model — never mutated.
df DataFrame All candidate exog columns must already be present.
target_col str Target variable used to evaluate cross-validation score.
cv_split int Number of time-series cross-validation folds.
H int Forecast horizon / test size per fold.
step_size NoneType None Rolling-window step size (defaults to H).
metrics NoneType None One or more metric functions (e.g. [MAE, RMSE]). A feature is only removed when its removal improves all metrics simultaneously.
lags_to_consider NoneType None {col: max_lag} — all lags 1..max_lag start as selected.
candidate_features NoneType None Exogenous columns that start as selected.
transformations NoneType None {col: [transform_objects]} — all transforms start as selected.
verbose bool False
Returns {"best_lags": {col: [...]}, "best_exogs": [...], "best_transforms": {col: [name_str, ...]}}

Feature selection methods for Markov Switching Autoregressive Regression


source

ms_arr_forward_feature_selection


def ms_arr_forward_feature_selection(
    model:object, # A configured [`ms_arr`](https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ms_arr.html#ms_arr) instance with `fit_em()` already called (recommended to use few EM iterations for this initial fit, e.g. `iterations=10`) or a template model with the same configuration but not yet fitted.  The model is copied internally and never mutated, so the caller's instance remains unchanged.
    df:DataFrame, # Full training DataFrame. Must contain the target column and any candidate exogenous columns.
    cv_split:int, # Number of time-series cross-validation folds.
    H:int, # Forecast horizon (test window size for each fold).
    step_size:Optional=None, # Step size between consecutive CV folds. Defaults to H.
    metrics:Union=None, # Required when validation_type='cv'. Selection driven by first metric; a candidate is accepted only when it improves all metrics.
    lags_to_consider:Union=None, # Candidate lags. Int → 1..n; list → specific lags.
    candidate_features:Optional=None, # Exogenous column names to test as candidates.
    transformations:Optional=None, # Lag-transform objects to test as candidates.
    starting_lags:Optional=None, # Lags always included in the initial set (not candidates).
    starting_transforms:Optional=None, # Transforms always included in the initial set (not candidates).
    validation_type:str='cv', # Criterion for selection: 'cv', 'AIC', 'BIC', or 'AIC_BIC'. When 'cv', metrics must be provided and drive selection. When 'AIC' or 'BIC', the respective information criterion is used. When 'AIC_BIC', a candidate is accepted only if it improves both AIC and BIC.
    iterations:int=10, # EM iterations used inside fit_em() for each candidate evaluation.
    verbose:bool=False, # Print a message each time a candidate is accepted.
): # `{"best_lags": [...], "best_exogs": [...], "best_transforms": [...]}`

Forward stepwise feature selection for ms_arr models.

At each iteration every remaining candidate (lag, exogenous column, or lag-transform) is tested individually by adding it to the current best feature set. The candidate that produces the largest improvement is permanently added. The loop continues until no remaining candidate improves the evaluation criterion.

The HMM state (A, pi, stds, coeffs) is warm-started from the round winner and propagated to subsequent rounds for consistent initialisation.
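The 'AIC'/'BIC'/'AIC_BIC' criteria follow the textbook formulas AIC = 2k − 2·ln L and BIC = k·ln n − 2·ln L; the library's exact computation may differ (e.g. in how k counts HMM transition parameters), so treat this as an illustrative sketch of the acceptance rule only.

```python
import math

def aic(log_likelihood, k):
    # AIC = 2k - 2 ln L; lower is better
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    # BIC = k ln n - 2 ln L; penalises extra parameters more for large n
    return k * math.log(n) - 2 * log_likelihood

# Under 'AIC_BIC', a candidate is accepted only if BOTH criteria decrease.
# Here the extra parameter lifts the log-likelihood enough for AIC but not BIC.
old = (aic(-120.0, 5), bic(-120.0, 5, 200))
new = (aic(-117.5, 6), bic(-117.5, 6, 200))
accept = new[0] < old[0] and new[1] < old[1]
print(accept)  # False: AIC improves, BIC does not
```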



source

ms_arr_backward_feature_selection


def ms_arr_backward_feature_selection(
    df:DataFrame, # Full training DataFrame. All candidate exogenous columns must be present.
    cv_split:int, # Number of time-series cross-validation folds.
    H:int, # Forecast horizon (test window size for each fold).
    step_size:Optional=None, # Step size between consecutive CV folds. Defaults to H.
    model:object=None, # A configured but unfitted ms_arr instance. Never mutated.
    metrics:Union=None, # Required when validation_type='cv'. A feature is only removed when doing so improves all metrics simultaneously.
    lags_to_consider:Union=None, # Initial lag set. Int → 1..n; list → specific lags.
    candidate_features:Optional=None, # Exogenous columns that start in the model and are tested for removal.
    transformations:Optional=None, # Lag-transform objects that start in the model and are tested for removal.
    validation_type:str='cv', # Criterion for selection: 'cv', 'AIC', 'BIC', or 'AIC_BIC'.
    iterations:int=100, # EM iterations used inside fit_em() for each candidate evaluation.
    verbose:bool=False, # Print a message each time a feature is removed.
): # `{"best_lags": [...], "best_exogs": [...], "best_transforms": [...]}`

Backward stepwise feature selection for ms_arr models.

Starts with the full feature set and at each iteration tries removing each current feature individually. The feature whose removal produces the largest improvement is permanently dropped. The loop continues until no removal improves the evaluation criterion.

The HMM state (A, pi, stds, coeffs) is warm-started from the round winner and propagated to subsequent rounds.

