| | Type | Default | Details |
|---|---|---|---|
| n_components | int | | Number of hidden states (regimes). |
| target_col | str | | Name of the target variable. |
| lags | Optional[Union[int, List[int]]] | None | Lags for the autoregressive model. |
| lag_transform | Optional[list] | None | List of lag-transform function objects applied to the target. |
| difference | Optional[int] | None | Order of ordinary differencing (e.g. 1 for first difference). |
| seasonal_diff | Optional[int] | None | Seasonal period for seasonal differencing. |
| trend | Optional[str] | None | Trend strategy: ‘linear’ or ‘ets’. |
| pol_degree | int | 1 | Degree of polynomial trend (default: 1). Used when trend=‘linear’. |
| ets_params | Optional[Dict[str, Any]] | None | Dictionary of parameters for the ExponentialSmoothing model when using ‘ets’ trend strategy. The keys should be the parameter names and the values should be the parameter values. Default is None (use default ETS parameters). |
| change_points | Optional[List[int]] | None | Change points for piecewise linear trend. List of indices where the trend slope can change. |
| box_cox | Union[bool, float, int] | False | Whether to apply Box-Cox transformation to the target variable. If a float or int value is provided, it will be used as the lambda parameter for the Box-Cox transformation. If True, the lambda parameter will be estimated from the data. |
| box_cox_biasadj | bool | False | Whether to apply bias adjustment when inverting Box-Cox transformation (default: False). |
| add_constant | bool | True | If True, prepend a constant column to the regressor matrix (default: True). |
| cat_variables | Optional[List[str]] | None | List of categorical feature column names. If provided, these columns will be treated as categorical variables and encoded accordingly. Default is None (no categorical variables). |
| categorical_encoder | Optional[Any] | None | Categorical encoder object (e.g. OneHotEncoder(), MeanEncoder(), etc.) to apply to the categorical variables specified in cat_variables. The encoder should have fit() and transform() methods that can be applied to the input DataFrame. Default is None (no categorical encoding) and if None, categorical variables can only be used if the model can handle them natively (e.g. LGBM or CatBoost). |
| method | str | posterior | State assignment method: ‘posterior’ (soft) or ‘viterbi’ (hard). Default: ‘posterior’. |
| switching_var | bool | True | If True, each regime has its own variance. If False, uses pooled variance. Default: True. |
| startprob_prior | float | 1000.0 | Dirichlet concentration for initial state distribution. Default: 1e3. |
| transmat_prior | float | 1000.0 | Dirichlet concentration for transition matrix rows. Default: 1e3. |
| n_iter | int | 100 | Maximum EM iterations. Default: 100. |
| tol | float | 0.001 | Convergence tolerance on log-likelihood. Default: 1e-3. |
| ridge | float | 1e-05 | Ridge regularisation parameter for coefficient estimation. Default: 1e-5. |
| coefficients | Optional[np.ndarray] | None | Initial regression coefficients (shape: n_states x n_features). |
| stds | Optional[np.ndarray] | None | Initial state standard deviations (shape: n_states,). |
| init_state | Optional[np.ndarray] | None | Initial state probability vector (shape: n_states,). |
| trans_matrix | Optional[np.ndarray] | None | Initial transition matrix (shape: n_states x n_states). |
| random_state | int | 42 | Random seed for reproducibility. Default: 42. |
| verbose | bool | False | If True, print EM progress. Default: False. |
| Returns | None | | |
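
To make the `lags` and `add_constant` parameters concrete, here is a minimal sketch of how a lagged regressor matrix could be assembled. The helper `build_lag_matrix` is hypothetical and not part of the library, and reading an integer `lags=p` as "lags 1 through p" is an assumption; the library may also add lag transforms, trend terms, and exogenous columns.

```python
from typing import List, Sequence, Tuple, Union


def build_lag_matrix(
    y: Sequence[float],
    lags: Union[int, List[int]],
    add_constant: bool = True,
) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical helper: return (X, target) built from lagged values of y."""
    # Assumption: an integer p means "use lags 1..p"; a list is used as given.
    lag_list = list(range(1, lags + 1)) if isinstance(lags, int) else sorted(lags)
    max_lag = max(lag_list)
    X: List[List[float]] = []
    target: List[float] = []
    # The first max_lag observations have no complete lag history, so skip them.
    for t in range(max_lag, len(y)):
        row = [y[t - lag] for lag in lag_list]
        if add_constant:
            row = [1.0] + row  # prepend the constant (intercept) column
        X.append(row)
        target.append(y[t])
    return X, target


y = [10.0, 12.0, 11.0, 13.0, 14.0]
X, target = build_lag_matrix(y, lags=[1, 2])
# X      -> [[1.0, 12.0, 10.0], [1.0, 11.0, 12.0], [1.0, 13.0, 11.0]]
# target -> [11.0, 13.0, 14.0]
```

Each regime then has its own coefficient vector over these columns, which matches the documented `coefficients` shape of n_states x n_features.
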
ms_arr
def ms_arr(
n_components:int, # Number of hidden states (regimes).
target_col:str, # Name of the target variable.
lags:Optional[Union[int, List[int]]]=None, # Lags for the autoregressive model.
lag_transform:Optional[list]=None, # List of lag-transform function objects applied to the target.
difference:Optional[int]=None, # Order of ordinary differencing (e.g. 1 for first difference).
seasonal_diff:Optional[int]=None, # Seasonal period for seasonal differencing.
trend:Optional[str]=None, # Trend strategy: 'linear' or 'ets'.
pol_degree:int=1, # Degree of polynomial trend (default: 1). Used when trend='linear'.
ets_params:Optional[Dict[str, Any]]=None, # Dictionary of parameters for the ExponentialSmoothing model when using 'ets' trend strategy. The keys should be the parameter names and the values should be the parameter values. Default is None (use default ETS parameters).
change_points:Optional[List[int]]=None, # Change points for piecewise linear trend. List of indices where the trend slope can change.
box_cox:Union[bool, float, int]=False, # Whether to apply Box-Cox transformation to the target variable. If a float or int value is provided, it will be used as the lambda parameter for the Box-Cox transformation. If True, the lambda parameter will be estimated from the data.
box_cox_biasadj:bool=False, # Whether to apply bias adjustment when inverting Box-Cox transformation (default: False).
add_constant:bool=True, # If True, prepend a constant column to the regressor matrix (default: True).
cat_variables:Optional[List[str]]=None, # List of categorical feature column names. If provided, these columns will be treated as categorical variables and encoded accordingly. Default is None (no categorical variables).
categorical_encoder:Optional[Any]=None, # Categorical encoder object (e.g. OneHotEncoder(), MeanEncoder(), etc.) to apply to the categorical variables specified in cat_variables. The encoder should have fit() and transform() methods that can be applied to the input DataFrame. Default is None (no categorical encoding) and if None, categorical variables can only be used if the model can handle them natively (e.g. LGBM or CatBoost).
method:str='posterior', # State assignment method: 'posterior' (soft) or 'viterbi' (hard). Default: 'posterior'.
switching_var:bool=True, # If True, each regime has its own variance. If False, uses pooled variance. Default: True.
startprob_prior:float=1000.0, # Dirichlet concentration for initial state distribution. Default: 1e3.
transmat_prior:float=1000.0, # Dirichlet concentration for transition matrix rows. Default: 1e3.
n_iter:int=100, # Maximum EM iterations. Default: 100.
tol:float=0.001, # Convergence tolerance on log-likelihood. Default: 1e-3.
ridge:float=1e-05, # Ridge regularisation parameter for coefficient estimation. Default: 1e-5.
coefficients:Optional[np.ndarray]=None, # Initial regression coefficients (shape: n_states x n_features).
stds:Optional[np.ndarray]=None, # Initial state standard deviations (shape: n_states,).
init_state:Optional[np.ndarray]=None, # Initial state probability vector (shape: n_states,).
trans_matrix:Optional[np.ndarray]=None, # Initial transition matrix (shape: n_states x n_states).
random_state:int=42, # Random seed for reproducibility. Default: 42.
verbose:bool=False, # If True, print EM progress. Default: False.
)->None:
Initialize the MS-ARR model with the specified parameters.
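
The `box_cox` parameter accepts either a fixed lambda or `True` for an estimated one. The sketch below shows only the fixed-lambda case, using the standard Box-Cox formula and its inverse. The function names are hypothetical, and neither lambda estimation nor the `box_cox_biasadj` correction is covered.

```python
import math
from typing import List, Sequence


def box_cox(y: Sequence[float], lam: float) -> List[float]:
    """Standard Box-Cox transform for strictly positive data."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in y]  # limiting case as lambda -> 0
    return [(v ** lam - 1.0) / lam for v in y]


def inv_box_cox(z: Sequence[float], lam: float) -> List[float]:
    """Invert the transform (no bias adjustment applied)."""
    if lam == 0:
        return [math.exp(v) for v in z]
    return [(lam * v + 1.0) ** (1.0 / lam) for v in z]


y = [1.0, 4.0, 9.0]
z = box_cox(y, lam=0.5)          # approximately [0.0, 2.0, 4.0]
y_back = inv_box_cox(z, lam=0.5)  # recovers y up to floating-point error
```

Forecasts made on the transformed scale are mapped back with the inverse, which is where an optional bias adjustment would enter.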
ms_arr.fit
def fit(
df:pd.DataFrame, # Training DataFrame containing the target and any feature columns.
)->float: # Final log-likelihood after EM convergence.
Fit the model using the EM algorithm.
| | Type | Details |
|---|---|---|
| df | pd.DataFrame | Training DataFrame containing the target and any feature columns. |
| Returns | float | Final log-likelihood after EM convergence. |
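
`fit` returns the final log-likelihood, and `tol` is described as a convergence tolerance on that quantity. The sketch below illustrates both ideas with a scaled forward algorithm for a Gaussian hidden Markov model. It is a simplification under stated assumptions: each regime has a constant mean rather than a regression on lagged values, the stopping rule is taken to be the absolute change in log-likelihood, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def forward_loglik(
    obs: Sequence[float],
    init_state: Sequence[float],
    trans_matrix: Sequence[Sequence[float]],
    means: Sequence[float],
    stds: Sequence[float],
) -> float:
    """Log-likelihood of obs via the forward recursion with per-step rescaling."""
    n = len(init_state)
    # Unnormalised filtered probabilities for the first observation.
    alpha = [init_state[k] * gaussian_pdf(obs[0], means[k], stds[k]) for k in range(n)]
    scale = sum(alpha)
    loglik = math.log(scale)
    alpha = [a / scale for a in alpha]
    for x in obs[1:]:
        # Predict the next state distribution, then weight by the emission density.
        pred = [sum(alpha[i] * trans_matrix[i][j] for i in range(n)) for j in range(n)]
        alpha = [pred[j] * gaussian_pdf(x, means[j], stds[j]) for j in range(n)]
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik


def has_converged(prev_ll: float, new_ll: float, tol: float) -> bool:
    """Assumed stopping rule: absolute log-likelihood change below tol."""
    return abs(new_ll - prev_ll) < tol


ll = forward_loglik(
    obs=[0.0, 1.0, 5.2, 4.8],
    init_state=[0.5, 0.5],
    trans_matrix=[[0.9, 0.1], [0.2, 0.8]],
    means=[0.0, 5.0],
    stds=[1.0, 1.0],
)
```

In an EM loop, a quantity like `ll` would be recomputed after every M-step and compared with the previous value until `has_converged` holds or `n_iter` is reached.
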
ms_arr.forecast
def forecast(
H:int, # Forecast horizon (number of steps to forecast ahead).
exog:Optional[pd.DataFrame]=None, # Future exogenous regressors (must contain at least H rows). Should have the same columns as the training data (excluding the target variable).
)->np.ndarray:
Generate forecasts for H future time steps.
| | Type | Default | Details |
|---|---|---|---|
| H | int | | Forecast horizon (number of steps to forecast ahead). |
| exog | Optional[pd.DataFrame] | None | Future exogenous regressors (must contain at least H rows). Should have the same columns as the training data (excluding the target variable). |
| Returns | np.ndarray | | |
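
One common way to forecast from a regime-switching model is to push the last state probabilities through the transition matrix and average the per-regime predictions with those probabilities. The sketch below shows that idea only. Whether `forecast` combines regimes exactly this way is an assumption, the per-regime prediction is reduced to a fixed level instead of a recursive autoregression, and the function names are hypothetical.

```python
from typing import List, Sequence


def propagate(probs: Sequence[float], trans_matrix: Sequence[Sequence[float]]) -> List[float]:
    """One-step-ahead state distribution: probs multiplied by the transition matrix."""
    n = len(probs)
    return [sum(probs[i] * trans_matrix[i][j] for i in range(n)) for j in range(n)]


def regime_weighted_forecast(
    last_probs: Sequence[float],
    trans_matrix: Sequence[Sequence[float]],
    regime_levels: Sequence[float],
    H: int,
) -> List[float]:
    """Probability-weighted average of per-regime predictions for H steps."""
    probs = list(last_probs)
    forecasts: List[float] = []
    for _ in range(H):
        probs = propagate(probs, trans_matrix)
        forecasts.append(sum(p * level for p, level in zip(probs, regime_levels)))
    return forecasts


fc = regime_weighted_forecast(
    last_probs=[1.0, 0.0],
    trans_matrix=[[0.9, 0.1], [0.2, 0.8]],
    regime_levels=[10.0, 20.0],
    H=2,
)
# fc is approximately [11.0, 11.7]
```

As the horizon grows, the state probabilities drift toward the chain's stationary distribution, so long-horizon forecasts blend the regimes more evenly.
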
ms_arr.cross_validate
def cross_validate(
df:pd.DataFrame, # DataFrame containing the target and any feature columns.
cv_split:int, # Number of cross-validation splits.
test_size:int, # Number of time steps in the test set for each split.
metrics:List[Callable], # Metric functions (e.g. ``[MAE, RMSE]``) used to evaluate forecast accuracy across folds. Call ``.cv_summary()`` after cross-validation to retrieve the aggregated scores.
step_size:int=1, # Step size between the start of each test set in the splits.
n_iter:int=1, # Number of EM iterations to run for each training fold.
h_split_point:Optional[int]=None, # If provided, split the test set into two parts at this index and evaluate metrics separately on each part.
)->Union[pd.DataFrame, Tuple[pd.DataFrame, pd.DataFrame]]: # DataFrame containing the average score for each metric across all splits. If h_split_point is provided, also includes separate scores for the two parts of the test set. If cv_df is True, returns a tuple of (overall_performance_df, cv_predictions_df)
Run cross-validation.
| | Type | Default | Details |
|---|---|---|---|
| df | pd.DataFrame | | DataFrame containing the target and any feature columns. |
| cv_split | int | | Number of cross-validation splits. |
| test_size | int | | Number of time steps in the test set for each split. |
| metrics | List[Callable] | | Metric functions (e.g. `[MAE, RMSE]`) used to evaluate forecast accuracy across folds. Call `.cv_summary()` after cross-validation to retrieve the aggregated scores. |
| step_size | int | 1 | Step size between the start of each test set in the splits. |
| n_iter | int | 1 | Number of EM iterations to run for each training fold. |
| h_split_point | Optional[int] | None | If provided, split the test set into two parts at this index and evaluate metrics separately on each part. |
| Returns | Union[pd.DataFrame, Tuple[pd.DataFrame, pd.DataFrame]] | | DataFrame containing the average score for each metric across all splits. If h_split_point is provided, also includes separate scores for the two parts of the test set. If cv_df is True, returns a tuple of (overall_performance_df, cv_predictions_df) |
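
The interaction of `cv_split`, `test_size`, `step_size`, and `h_split_point` is easier to see with indices. The sketch below builds expanding-window splits and scores the two horizon segments separately. The fold layout (last test window ending at the final observation, earlier windows shifted back by `step_size`) is an assumption about the library, and `rolling_origin_splits`, `mae`, and `split_horizon_scores` are hypothetical helpers.

```python
from typing import Callable, List, Sequence, Tuple


def rolling_origin_splits(
    n_obs: int, cv_split: int, test_size: int, step_size: int = 1
) -> List[Tuple[range, range]]:
    """Return (train_indices, test_indices) for each fold, oldest fold first."""
    splits = []
    for i in range(cv_split):
        # Assumed layout: the final fold's test window ends at the last observation.
        test_end = n_obs - (cv_split - 1 - i) * step_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough observations for the requested splits")
        splits.append((range(0, test_start), range(test_start, test_end)))
    return splits


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def split_horizon_scores(
    actual: Sequence[float],
    predicted: Sequence[float],
    h_split_point: int,
    metric: Callable[[Sequence[float], Sequence[float]], float],
) -> Tuple[float, float]:
    """Score the first h_split_point steps and the remaining steps separately."""
    early = metric(actual[:h_split_point], predicted[:h_split_point])
    late = metric(actual[h_split_point:], predicted[h_split_point:])
    return early, late


folds = rolling_origin_splits(n_obs=10, cv_split=3, test_size=2, step_size=1)
summary = [(len(train), list(test)) for train, test in folds]
# summary -> [(6, [6, 7]), (7, [7, 8]), (8, [8, 9])]

scores = split_horizon_scores([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 5.0, 8.0], 2, mae)
# scores -> (0.0, 3.0)
```

Splitting the horizon this way helps separate short-range accuracy from long-range accuracy, which often degrade at different rates.
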