Hyperparameter tuning methods for univariate machine learning models
SplitTimeSeries
def SplitTimeSeries(
n_splits:int, # Number of splits to generate.
test_size:int, # The number of samples in each test set.
step_size:Optional=None, # The number of samples to move the test set forward for each split. If None, it defaults to test_size, producing non-overlapping test sets.
)->None:
A time series cross-validator that generates train/test splits with a fixed test size and a configurable step size.
| | Type | Default | Details |
|---|---|---|---|
| n_splits | int | | Number of splits to generate. |
| test_size | int | | The number of samples in each test set. |
| step_size | Optional | None | The number of samples to move the test set forward for each split. If None, it defaults to test_size, producing non-overlapping test sets. |
| Returns | None | | |
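The split semantics can be illustrated with a small pure-Python sketch. This is not the library's implementation; it assumes the common convention that the last test window ends at the final observation and that the training set is everything before each test window:

```python
def split_time_series_sketch(n_samples, n_splits, test_size, step_size=None):
    """Return (train_indices, test_indices) pairs in chronological order,
    mimicking the documented semantics: fixed-size test windows that move
    forward by step_size (default: test_size, i.e. non-overlapping), with
    an expanding training window before each."""
    if step_size is None:
        step_size = test_size  # non-overlapping test sets
    splits = []
    for i in range(n_splits):
        test_end = n_samples - i * step_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough samples for the requested splits")
        splits.append((list(range(test_start)),
                       list(range(test_start, test_end))))
    return splits[::-1]  # earliest fold first
```

With `n_samples=10`, `n_splits=2`, `test_size=3`, the two test windows are `[4, 5, 6]` and `[7, 8, 9]`; passing `step_size=2` instead makes consecutive test windows overlap by one sample.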
hyperopt_tune
def hyperopt_tune(
model:object, # Forecasting model object with .fit and .forecast methods and relevant attributes.
df:DataFrame, # Time series data with a datetime index, a target column, and optionally exogenous features.
cv_split:int, # Number of cross-validation splits.
test_size:int, # Number of samples in each test set. For ml_direct_forecaster, this will be overridden to be the maximum horizon in model.H.
eval_metric:Callable, # Evaluation metric function.
param_space:dict, # Hyperparameter search space for the forecasting model.
step_size:int=None, # Step size to move the test window forward in each split.
eval_num:int=100, # Number of hyperparameter combinations to evaluate. Default is 100.
verbose:bool=False, # Whether to print the evaluation metric for each hyperparameter combination. Default is False.
)->Tuple: # A tuple containing the best hyperparameters, selected lags, and selected transforms.
Tune forecasting model hyperparameters using time series cross-validation and hyperopt.
| | Type | Default | Details |
|---|---|---|---|
| model | object | | Forecasting model object with .fit and .forecast methods and relevant attributes. |
| df | DataFrame | | Time series data with a datetime index, a target column, and optionally exogenous features. |
| cv_split | int | | Number of cross-validation splits. |
| test_size | int | | Number of samples in each test set. For ml_direct_forecaster, this is overridden to the maximum horizon in model.H. |
| eval_metric | Callable | | Evaluation metric function. |
| param_space | dict | | Hyperparameter search space for the forecasting model. |
| step_size | int | None | Step size to move the test window forward in each split. |
| eval_num | int | 100 | Number of hyperparameter combinations to evaluate. |
| verbose | bool | False | Whether to print the evaluation metric for each hyperparameter combination. |
| Returns | Tuple | | A tuple containing the best hyperparameters, selected lags, and selected transforms. |
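Conceptually, the tuner evaluates candidate parameter sets under time series cross-validation and keeps the best one. A minimal pure-Python sketch of that loop (this is not peshbeen's implementation; it uses a plain dict of candidate lists in place of a hyperopt search space, and `fit_score` stands in for fitting the model and averaging the evaluation metric over the CV splits):

```python
import random

def tune_sketch(fit_score, param_space, eval_num=20, seed=0):
    """Schematic of a hyperparameter tuner: sample eval_num parameter
    combinations from param_space (each value here is a list of
    candidates), score each with the user-supplied cross-validation
    function, and return the best combination with its score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(eval_num):
        params = {name: rng.choice(values) for name, values in param_space.items()}
        score = fit_score(params)  # e.g. mean CV error across splits
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

hyperopt replaces the random sampling above with Tree-structured Parzen Estimator search, but the fit/score/compare loop is the same shape.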
optuna_tune
def optuna_tune(
model:object, # Forecasting model with .fit and .forecast methods.
df:DataFrame, # Time series data (datetime index, target column, optional exogenous features).
cv_split:int, # Number of cross-validation splits.
test_size:int, # Number of samples in each test fold. For ml_direct_forecaster, this will be overridden to be the maximum horizon in model.H.
eval_metric:Callable, # Metric function to minimise.
param_space:Dict, # Each value must be a callable that accepts an Optuna `trial` and returns a value.
step_size:int=None, # Step size between CV folds.
eval_num:int=100, # Number of Optuna trials. Default 100.
verbose:bool=False, # Print score for every trial. Default False.
)->Tuple: # Best hyperparameters and best lags (if 'lags' is in param_space).
Tune forecasting model hyperparameters using time series cross-validation and Optuna.
| | Type | Default | Details |
|---|---|---|---|
| model | object | | Forecasting model with .fit and .forecast methods. |
| df | DataFrame | | Time series data (datetime index, target column, optional exogenous features). |
| cv_split | int | | Number of cross-validation splits. |
| test_size | int | | Number of samples in each test fold. For ml_direct_forecaster, this is overridden to the maximum horizon in model.H. |
| eval_metric | Callable | | Metric function to minimise. |
| param_space | Dict | | Each value must be a callable that accepts an Optuna trial and returns a value. |
| step_size | int | None | Step size between CV folds. |
| eval_num | int | 100 | Number of Optuna trials. |
| verbose | bool | False | Print score for every trial. |
| Returns | Tuple | | Best hyperparameters and best lags (if 'lags' is in param_space). |
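A sketch of a valid param_space for optuna_tune, where every value is a callable taking an Optuna trial. The parameter names are illustrative; the lambdas use Optuna's standard trial.suggest_* API, exercised below with a stand-in trial object so the example is self-contained:

```python
# Each value receives a trial and returns a sampled value.
param_space = {
    "n_estimators": lambda trial: trial.suggest_int("n_estimators", 50, 500),
    "learning_rate": lambda trial: trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
    "max_depth": lambda trial: trial.suggest_int("max_depth", 2, 10),
}

class _StandInTrial:
    """Mimics the two Optuna trial methods used above, always returning
    the lower bound; for illustration only."""
    def suggest_int(self, name, low, high):
        return low
    def suggest_float(self, name, low, high, log=False):
        return low

sampled = {name: fn(_StandInTrial()) for name, fn in param_space.items()}
```

In a real run, optuna_tune would pass each genuine Optuna trial through these callables to draw one candidate configuration per trial.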
Hyperparameter tuning methods for multivariate machine learning models
mv_hyperopt_tune
def mv_hyperopt_tune(
model:object, # Forecasting model object with .fit and .forecast methods and relevant attributes.
df:DataFrame, # Time series data with a datetime index, a target column, and optionally exogenous features.
target_col:str, # Name of the target column to minimize the evaluation metric on.
cv_split:int, # Number of cross-validation splits.
test_size:int, # Number of samples in each test set.
eval_metric:Callable, # Evaluation metric function.
param_space:dict, # Hyperparameter search space for the forecasting model.
step_size:int=None, # Step size to move the test window forward in each split.
eval_num:int=100, # Number of hyperparameter combinations to evaluate. Default is 100.
verbose:bool=False, # Whether to print the evaluation metric for each hyperparameter combination. Default is False.
)->Tuple: # A tuple containing the best hyperparameters, selected lags, and selected transforms.
Tune forecasting model hyperparameters using time series cross-validation and hyperopt for multivariate models.
| | Type | Default | Details |
|---|---|---|---|
| model | object | | Forecasting model object with .fit and .forecast methods and relevant attributes. |
| df | DataFrame | | Time series data with a datetime index, a target column, and optionally exogenous features. |
| target_col | str | | Name of the target column to minimize the evaluation metric on. |
| cv_split | int | | Number of cross-validation splits. |
| test_size | int | | Number of samples in each test set. |
| eval_metric | Callable | | Evaluation metric function. |
| param_space | dict | | Hyperparameter search space for the forecasting model. |
| step_size | int | None | Step size to move the test window forward in each split. |
| eval_num | int | 100 | Number of hyperparameter combinations to evaluate. |
| verbose | bool | False | Whether to print the evaluation metric for each hyperparameter combination. |
| Returns | Tuple | | A tuple containing the best hyperparameters, selected lags, and selected transforms. |
mv_optuna_tune
def mv_optuna_tune(
model:object, # Forecasting model with .fit and .forecast methods.
df:DataFrame, # Time series data (datetime index, target column, optional exogenous features).
target_col:str, # Name of the target column to minimize the evaluation metric on.
cv_split:int, # Number of cross-validation splits.
test_size:int, # Number of samples in each test fold.
eval_metric:Callable, # Metric function to minimise.
param_space:Dict, # Each value must be a callable that accepts an Optuna `trial` and returns a value.
step_size:int=None, # Step size between CV folds.
eval_num:int=100, # Number of Optuna trials. Default 100.
verbose:bool=False, # Print score for every trial. Default False.
)->Tuple: # Best hyperparameters and best lags (if 'lags' is in param_space).
Tune forecasting model hyperparameters using time series cross-validation and Optuna.
| | Type | Default | Details |
|---|---|---|---|
| model | object | | Forecasting model with .fit and .forecast methods. |
| df | DataFrame | | Time series data (datetime index, target column, optional exogenous features). |
| target_col | str | | Name of the target column to minimize the evaluation metric on. |
| cv_split | int | | Number of cross-validation splits. |
| test_size | int | | Number of samples in each test fold. |
| eval_metric | Callable | | Metric function to minimise. |
| param_space | Dict | | Each value must be a callable that accepts an Optuna trial and returns a value. |
| step_size | int | None | Step size between CV folds. |
| eval_num | int | 100 | Number of Optuna trials. |
| verbose | bool | False | Print score for every trial. |
| Returns | Tuple | | Best hyperparameters and best lags (if 'lags' is in param_space). |
Feature selection methods for univariate time series models
forward_feature_selection
def forward_feature_selection(
model:object, # A *configured but unfitted* [`ml_forecaster`](https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ml_forecast.html#ml_forecaster) instance. The function works exclusively on deep copies and never mutates the object passed in.
df:DataFrame, # Full training DataFrame. Must contain the target column and any candidate exogenous columns.
cv_split:int, # Number of time-series cross-validation folds.
H:int, # Forecast horizon (test window size for each fold).
step_size:Optional=None, # Step size between consecutive CV folds. If `None` (default) the step equals `H`, producing non-overlapping folds — consistent with the default behaviour of [`ml_forecaster.cross_validate`](https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ml_forecast.html#ml_forecaster.cross_validate).
metrics:Union=None, # One or more metric functions accepted by [`ml_forecaster.cross_validate`](https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ml_forecast.html#ml_forecaster.cross_validate) (e.g. `[MAE, RMSE]`). Selection is driven by the **first** metric in the list; a candidate is only accepted when it improves **all** metrics simultaneously.
lags_to_consider:Optional=None, # Consider lags `1, 2, ..., lags_to_consider` as candidates. If `None`, lag selection is skipped.
candidate_features:Optional=None, # Column names in `df` that are exogenous feature candidates. The function never modifies this list. If `None`, exogenous feature selection is skipped.
transformations:Optional=None, # Lag-transform objects to test as candidates (e.g. `[rolling_mean(3, 1), expanding_std(1)]`). The function never modifies this list. If `None`, transform selection is skipped.
starting_lags:Optional=None, # Lags to include in the initial feature set before the search begins. These are *not* candidates — they are always included. Must be a list (e.g. `[1]` or `[1, 2, 3]`).
starting_transforms:Optional=None, # Lag-transform objects to include in the initial feature set before the search begins. Must be a list.
best_start_score:Optional=None, # Initial best scores for each metric. If not provided, the function will compute the baseline score using the model with the starting features (if any) before beginning the search.
verbose:bool=False, # Print a message each time a candidate is accepted.
): # A dictionary with keys `best_lags`, `best_exogs`, and `best_transforms` containing the selected features.
Forward stepwise feature selection for ml_forecaster models.
At each iteration every remaining candidate (lag, exogenous column, or lag-transform) is tested individually by adding it to the current best feature set. The candidate that produces the largest cross-validation improvement is permanently added. The loop continues until no remaining candidate improves any of the evaluation metrics.
| | Type | Default | Details |
|---|---|---|---|
| model | object | | A configured but unfitted ml_forecaster instance. The function works exclusively on deep copies and never mutates the object passed in. |
| df | DataFrame | | Full training DataFrame. Must contain the target column and any candidate exogenous columns. |
| cv_split | int | | Number of time-series cross-validation folds. |
| H | int | | Forecast horizon (test window size for each fold). |
| step_size | Optional | None | Step size between consecutive CV folds. If None (default) the step equals H, producing non-overlapping folds, consistent with the default behaviour of ml_forecaster.cross_validate. |
| metrics | Optional | None | One or more metric functions accepted by ml_forecaster.cross_validate (e.g. [MAE, RMSE]). Selection is driven by the first metric in the list; a candidate is only accepted when it improves all metrics simultaneously. |
| lags_to_consider | Optional | None | Consider lags 1, 2, ..., lags_to_consider as candidates. If None, lag selection is skipped. |
| candidate_features | Optional | None | Column names in df that are exogenous feature candidates. The function never modifies this list. If None, exogenous feature selection is skipped. |
| transformations | Optional | None | Lag-transform objects to test as candidates (e.g. [rolling_mean(3, 1), expanding_std(1)]). The function never modifies this list. If None, transform selection is skipped. |
| starting_lags | Optional | None | Lags to include in the initial feature set before the search begins. These are not candidates; they are always included. Must be a list (e.g. [1] or [1, 2, 3]). |
| starting_transforms | Optional | None | Lag-transform objects to include in the initial feature set before the search begins. Must be a list. |
| best_start_score | Optional | None | Initial best scores for each metric. If not provided, the baseline score is computed using the model with the starting features (if any) before the search begins. |
| verbose | bool | False | Print a message each time a candidate is accepted. |
| Returns | dict | | A dictionary with keys best_lags, best_exogs, and best_transforms containing the selected features. |
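The search loop described above can be sketched in a few lines of pure Python. This is a schematic, not the library's code: `cv_score` stands in for running ml_forecaster.cross_validate with a candidate feature set, and a single metric replaces the all-metrics acceptance test:

```python
def forward_select(candidates, cv_score):
    """Forward stepwise selection: repeatedly try adding each remaining
    candidate to the selected set, permanently accept the one with the
    biggest improvement, and stop when nothing improves."""
    selected = []
    best = cv_score(selected)  # baseline score with starting features only
    improved = True
    while improved and candidates:
        improved = False
        trial_scores = {c: cv_score(selected + [c]) for c in candidates}
        winner = min(trial_scores, key=trial_scores.get)
        if trial_scores[winner] < best:
            best = trial_scores[winner]
            selected.append(winner)
            candidates = [c for c in candidates if c != winner]
            improved = True
    return selected, best
```

With a toy score where features "a" and "b" help and "c" hurts, the loop accepts "a", then "b", then stops because adding "c" worsens the score.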
backward_feature_selection
def backward_feature_selection(
model:object, # A *configured but unfitted* [`ml_forecaster`](https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ml_forecast.html#ml_forecaster) instance. The function works exclusively on deep copies and never mutates the object passed in.
df:DataFrame, # Full training DataFrame. Must contain the target column and any candidate exogenous columns.
cv_split:int, # Number of time-series cross-validation folds.
H:int, # Forecast horizon (test window size for each fold).
step_size:Optional=None, # Step size between consecutive CV folds. If `None` (default) the step equals `H`, producing non-overlapping folds.
metrics:Union=None, # One or more metric functions accepted by [`ml_forecaster.cross_validate`](https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ml_forecast.html#ml_forecaster.cross_validate) (e.g. `[MAE, RMSE]`). Selection is driven by the **first** metric in the list; a feature is only removed when doing so improves **all** metrics simultaneously.
lags_to_consider:Optional=None, # Lags to include in the initial feature set and test for removal (e.g. `[1, 2, 3, 4]`). If `None`, no lag removal is attempted.
candidate_features:Optional=None, # Column names in `df` that start in the model and are tested for removal. If `None`, exogenous feature removal is skipped.
transformations:Optional=None, # Lag-transform objects that start in the model and are tested for removal (e.g. `[rolling_mean(3, 1), expanding_std(1)]`). If `None`, transform removal is skipped.
verbose:bool=False, # Print a message each time a feature is removed.
): # A dictionary with keys `best_lags`, `best_exogs`, and `best_transforms` containing the surviving features after backward selection.
Backward stepwise feature selection for ml_forecaster models.
Starts with the full feature set (all provided lags, exogenous columns, and lag-transforms) and at each iteration tries removing each current feature individually. The feature whose removal produces the largest cross-validation improvement is permanently dropped. The loop continues until no remaining feature can be removed without hurting any of the evaluation metrics.
| | Type | Default | Details |
|---|---|---|---|
| model | object | | A configured but unfitted ml_forecaster instance. The function works exclusively on deep copies and never mutates the object passed in. |
| df | DataFrame | | Full training DataFrame. Must contain the target column and any candidate exogenous columns. |
| cv_split | int | | Number of time-series cross-validation folds. |
| H | int | | Forecast horizon (test window size for each fold). |
| step_size | Optional | None | Step size between consecutive CV folds. If None (default) the step equals H, producing non-overlapping folds. |
| metrics | Optional | None | One or more metric functions accepted by ml_forecaster.cross_validate (e.g. [MAE, RMSE]). Selection is driven by the first metric in the list; a feature is only removed when doing so improves all metrics simultaneously. |
| lags_to_consider | Optional | None | Lags to include in the initial feature set and test for removal (e.g. [1, 2, 3, 4]). If None, no lag removal is attempted. |
| candidate_features | Optional | None | Column names in df that start in the model and are tested for removal. If None, exogenous feature removal is skipped. |
| transformations | Optional | None | Lag-transform objects that start in the model and are tested for removal (e.g. [rolling_mean(3, 1), expanding_std(1)]). If None, transform removal is skipped. |
| verbose | bool | False | Print a message each time a feature is removed. |
| Returns | dict | | A dictionary with keys best_lags, best_exogs, and best_transforms containing the surviving features after backward selection. |
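The backward loop is the mirror image of forward selection. A schematic sketch (again, `cv_score` stands in for cross-validating the model with a given feature set, and a single metric replaces the all-metrics removal test):

```python
def backward_select(features, cv_score):
    """Backward stepwise selection: start from the full feature set,
    repeatedly drop the feature whose removal most improves the score,
    and stop when every removal hurts."""
    selected = list(features)
    best = cv_score(selected)  # baseline with the full set
    improved = True
    while improved and selected:
        improved = False
        trials = {f: cv_score([g for g in selected if g != f]) for f in selected}
        loser = min(trials, key=trials.get)
        if trials[loser] < best:
            best = trials[loser]
            selected.remove(loser)
            improved = True
    return selected, best
```

With the same toy score used for forward selection, the loop first drops the harmful feature "c" and then stops, since removing "a" or "b" would worsen the score.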
Feature selection methods for multivariate time series models
mv_forward_feature_selection
def mv_forward_feature_selection(
model:object, # Template model — never mutated.
df:DataFrame, # DataFrame containing the target variable and any candidate features.
target_col:str, # Target variable used to evaluate cross-validation score.
cv_split:int, # Number of time-series cross-validation folds.
H:int, # Forecast horizon / test size per fold.
step_size:NoneType=None, # Rolling-window step size (defaults to H).
metrics:NoneType=None, # One or more metric functions (e.g. `[MAE, RMSE]`). Selection is driven by the **first** metric in the list; a candidate is only accepted when it improves **all** metrics simultaneously.
lags_to_consider:NoneType=None, # ``{col: max_lag}`` — lags 1..max_lag are candidates.
candidate_features:NoneType=None, # Exogenous columns to consider adding.
transformations:NoneType=None, # ``{col: [transform_objects]}`` — transform candidates per target.
starting_lags:NoneType=None, # Lags already included before search begins.
starting_transforms:NoneType=None, # Transforms already included before search begins.
verbose:bool=False, # Print a message each time a candidate is accepted.
): # {"best_lags": {col: [...]}, "best_exogs": [...], "best_transforms": {col: [name_str, ...]}}
Forward stepwise feature selection for ml_mv_forecaster.
| | Type | Default | Details |
|---|---|---|---|
| model | object | | Template model; never mutated. |
| df | DataFrame | | DataFrame containing the target variable and any candidate features. |
| target_col | str | | Target variable used to evaluate the cross-validation score. |
| cv_split | int | | Number of time-series cross-validation folds. |
| H | int | | Forecast horizon / test size per fold. |
| step_size | NoneType | None | Rolling-window step size (defaults to H). |
| metrics | NoneType | None | One or more metric functions (e.g. [MAE, RMSE]). Selection is driven by the first metric in the list; a candidate is only accepted when it improves all metrics simultaneously. |
| lags_to_consider | NoneType | None | {col: max_lag}; lags 1..max_lag are candidates. |
| candidate_features | NoneType | None | Exogenous columns to consider adding. |
| transformations | NoneType | None | {col: [transform_objects]}; transform candidates per target. |
| starting_lags | NoneType | None | Lags already included before the search begins. |
| starting_transforms | NoneType | None | Transforms already included before the search begins. |
| verbose | bool | False | Print a message each time a candidate is accepted. |
| Returns | dict | | {"best_lags": {col: [...]}, "best_exogs": [...], "best_transforms": {col: [name_str, ...]}} |
mv_backward_feature_selection
def mv_backward_feature_selection(
model:object, # Template model — never mutated.
df:DataFrame, # All candidate exog columns must already be present.
target_col:str, # Target variable used to evaluate cross-validation score.
cv_split:int, # Number of time-series cross-validation folds.
H:int, # Forecast horizon / test size per fold.
step_size:NoneType=None, # Rolling-window step size (defaults to H).
metrics:NoneType=None, # One or more metric functions (e.g. `[MAE, RMSE]`). A feature is only removed when its removal improves **all** metrics simultaneously.
lags_to_consider:NoneType=None, # ``{col: max_lag}`` — all lags 1..max_lag start as selected.
candidate_features:NoneType=None, # Exogenous columns that start as selected.
transformations:NoneType=None, # ``{col: [transform_objects]}`` — all transforms start as selected.
verbose:bool=False, # Print a message each time a feature is removed.
): # {"best_lags": {col: [...]}, "best_exogs": [...], "best_transforms": {col: [name_str, ...]}}
Backward stepwise feature selection for [ml_mv_forecaster](https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ml_mv_forecast.html#ml_mv_forecaster).
Starts with all candidate features included and iteratively removes the one whose removal most improves cross-validation score.
| | Type | Default | Details |
|---|---|---|---|
| model | object | | Template model; never mutated. |
| df | DataFrame | | All candidate exog columns must already be present. |
| target_col | str | | Target variable used to evaluate the cross-validation score. |
| cv_split | int | | Number of time-series cross-validation folds. |
| H | int | | Forecast horizon / test size per fold. |
| step_size | NoneType | None | Rolling-window step size (defaults to H). |
| metrics | NoneType | None | One or more metric functions (e.g. [MAE, RMSE]). A feature is only removed when its removal improves all metrics simultaneously. |
| lags_to_consider | NoneType | None | {col: max_lag}; all lags 1..max_lag start as selected. |
| candidate_features | NoneType | None | Exogenous columns that start as selected. |
| transformations | NoneType | None | {col: [transform_objects]}; all transforms start as selected. |
| verbose | bool | False | Print a message each time a feature is removed. |
| Returns | dict | | {"best_lags": {col: [...]}, "best_exogs": [...], "best_transforms": {col: [name_str, ...]}} |
Feature selection methods for Markov Switching Autoregressive Regression
ms_arr_forward_feature_selection
def ms_arr_forward_feature_selection(
model:object, # A configured [`ms_arr`](https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ms_arr.html#ms_arr) instance with `fit_em()` already called (recommended to use few EM iterations for this initial fit, e.g. `iterations=10`) or a template model with the same configuration but not yet fitted. The model is copied internally and never mutated, so the caller's instance remains unchanged.
df:DataFrame, # Full training DataFrame. Must contain the target column and any candidate exogenous columns.
cv_split:int, # Number of time-series cross-validation folds.
H:int, # Forecast horizon (test window size for each fold).
step_size:Optional=None, # Step size between consecutive CV folds. Defaults to H.
metrics:Union=None, # Required when validation_type='cv'. Selection driven by first metric; a candidate is accepted only when it improves all metrics.
lags_to_consider:Union=None, # Candidate lags. Int → 1..n; list → specific lags.
candidate_features:Optional=None, # Exogenous column names to test as candidates.
transformations:Optional=None, # Lag-transform objects to test as candidates.
starting_lags:Optional=None, # Lags always included in the initial set (not candidates).
starting_transforms:Optional=None, # Transforms always included in the initial set (not candidates).
validation_type:str='cv', # Criterion for selection: 'cv', 'AIC', 'BIC', or 'AIC_BIC'. When 'cv', metrics must be provided and drive selection. When 'AIC' or 'BIC', the respective information criterion is used. When 'AIC_BIC', a candidate is accepted only if it improves both AIC and BIC.
iterations:int=10, # EM iterations used inside fit_em() for each candidate evaluation.
verbose:bool=False, # Print a message each time a candidate is accepted.
): # `{"best_lags": [...], "best_exogs": [...], "best_transforms": [...]}`
Forward stepwise feature selection for ms_arr models.
At each iteration every remaining candidate (lag, exogenous column, or lag-transform) is tested individually by adding it to the current best feature set. The candidate that produces the largest improvement is permanently added. The loop continues until no remaining candidate improves the evaluation criterion.
The HMM state (A, pi, stds, coeffs) is warm-started from the round winner and propagated to subsequent rounds for consistent initialisation.
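The validation_type options encode different acceptance rules for a candidate. A sketch of the documented logic (not the library's code; scores are plain dicts here, with the "cv" entry holding per-metric CV scores):

```python
def accepts_candidate(new, best, validation_type="cv"):
    """Return True when a candidate's scores justify accepting it,
    following the documented validation_type rules."""
    if validation_type == "cv":
        # All CV metrics must improve simultaneously.
        return all(new["cv"][m] < best["cv"][m] for m in best["cv"])
    if validation_type == "AIC":
        return new["AIC"] < best["AIC"]
    if validation_type == "BIC":
        return new["BIC"] < best["BIC"]
    if validation_type == "AIC_BIC":
        # Both information criteria must improve.
        return new["AIC"] < best["AIC"] and new["BIC"] < best["BIC"]
    raise ValueError(f"unknown validation_type: {validation_type!r}")
```

So a candidate that lowers AIC but raises BIC is accepted under 'AIC' but rejected under 'AIC_BIC'.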
ms_arr_backward_feature_selection
def ms_arr_backward_feature_selection(
df:DataFrame, # Full training DataFrame. All candidate exogenous columns must be present.
cv_split:int, # Number of time-series cross-validation folds.
H:int, # Forecast horizon (test window size for each fold).
step_size:Optional=None, # Step size between consecutive CV folds. Defaults to H.
model:object=None, # A configured but unfitted ms_arr instance. Never mutated.
metrics:Union=None, # Required when validation_type='cv'. A feature is only removed when doing so improves all metrics simultaneously.
lags_to_consider:Union=None, # Initial lag set. Int → 1..n; list → specific lags.
candidate_features:Optional=None, # Exogenous columns that start in the model and are tested for removal.
transformations:Optional=None, # Lag-transform objects that start in the model and are tested for removal.
validation_type:str='cv', # Criterion for selection: 'cv', 'AIC', 'BIC', or 'AIC_BIC'.
iterations:int=100, # EM iterations used inside fit_em() for each candidate evaluation.
verbose:bool=False, # Print a message each time a feature is removed.
): # `{"best_lags": [...], "best_exogs": [...], "best_transforms": [...]}`
Backward stepwise feature selection for ms_arr models.
Starts with the full feature set and at each iteration tries removing each current feature individually. The feature whose removal produces the largest improvement is permanently dropped. The loop continues until no removal improves the evaluation criterion.
The HMM state (A, pi, stds, coeffs) is warm-started from the round winner and propagated to subsequent rounds.