
ms_arr


def ms_arr(
    n_components:int, # Number of hidden states (regimes).
    target_col:str, # Name of the target variable.
    lags:Optional[Union[int, List[int]]]=None, # Lags for the autoregressive model.
    lag_transform:Optional[list]=None, # List of lag-transform function objects applied to the target.
    difference:Optional[int]=None, # Order of ordinary differencing (e.g. 1 for first difference).
    seasonal_diff:Optional[int]=None, # Seasonal period for seasonal differencing.
    trend:Optional[str]=None, # Trend strategy: 'linear' or 'ets'.
    pol_degree:int=1, # Degree of polynomial trend (default: 1). Used when trend='linear'.
    ets_params:Optional[Dict[str, Any]]=None, # Keyword arguments for the ExponentialSmoothing model used when trend='ets', given as a {parameter_name: value} dictionary. Default is None (use default ETS parameters).
    change_points:Optional[List[int]]=None, # Change points for piecewise linear trend. List of indices where the trend slope can change.
    box_cox:Union[bool, float, int]=False, # Whether to apply Box-Cox transformation to the target variable. If a float or int value is provided, it will be used as the lambda parameter for the Box-Cox transformation. If True, the lambda parameter will be estimated from the data.
    box_cox_biasadj:bool=False, # Whether to apply bias adjustment when inverting Box-Cox transformation (default: False).
    add_constant:bool=True, # If True, prepend a constant column to the regressor matrix (default: True).
    cat_variables:Optional[List[str]]=None, # List of categorical feature column names. If provided, these columns are treated as categorical and encoded with categorical_encoder. Default is None (no categorical variables).
    categorical_encoder:Optional[Any]=None, # Categorical encoder object (e.g. OneHotEncoder(), MeanEncoder()) applied to the columns listed in cat_variables. It must provide fit() and transform() methods that operate on the input DataFrame. Default is None (no encoding), in which case categorical variables can only be used by models that handle them natively (e.g. LightGBM or CatBoost).
    method:str='posterior', # State assignment method: 'posterior' (soft) or 'viterbi' (hard). Default: 'posterior'.
    switching_var:bool=True, # If True, each regime has its own variance. If False, uses pooled variance. Default: True.
    startprob_prior:float=1000.0, # Dirichlet concentration for initial state distribution. Default: 1e3.
    transmat_prior:float=1000.0, # Dirichlet concentration for transition matrix rows. Default: 1e3.
    n_iter:int=100, # Maximum EM iterations. Default: 100.
    tol:float=0.001, # Convergence tolerance on log-likelihood. Default: 1e-3.
    ridge:float=1e-05, # Ridge regularisation parameter for coefficient estimation. Default: 1e-5.
    coefficients:Optional[np.ndarray]=None, # Initial regression coefficients (shape: n_states x n_features).
    stds:Optional[np.ndarray]=None, # Initial state standard deviations (shape: n_states,).
    init_state:Optional[np.ndarray]=None, # Initial state probability vector (shape: n_states,).
    trans_matrix:Optional[np.ndarray]=None, # Initial transition matrix (shape: n_states x n_states).
    random_state:int=42, # Random seed for reproducibility. Default: 42.
    verbose:bool=False, # If True, print EM progress. Default: False.
)->None:

Initialize the MS-ARR model with the specified parameters.

Type Default Details
n_components int Number of hidden states (regimes).
target_col str Name of the target variable.
lags Optional[Union[int, List[int]]] None Lags for the autoregressive model.
lag_transform Optional[list] None List of lag-transform function objects applied to the target.
difference Optional[int] None Order of ordinary differencing (e.g. 1 for first difference).
seasonal_diff Optional[int] None Seasonal period for seasonal differencing.
trend Optional[str] None Trend strategy: ‘linear’ or ‘ets’.
pol_degree int 1 Degree of polynomial trend (default: 1). Used when trend=‘linear’.
ets_params Optional[Dict[str, Any]] None Keyword arguments for the ExponentialSmoothing model used when trend=‘ets’, given as a {parameter_name: value} dictionary. Default is None (use default ETS parameters).
change_points Optional[List[int]] None Change points for piecewise linear trend. List of indices where the trend slope can change.
box_cox Union[bool, float, int] False Whether to apply Box-Cox transformation to the target variable. If a float or int value is provided, it will be used as the lambda parameter for the Box-Cox transformation. If True, the lambda parameter will be estimated from the data.
box_cox_biasadj bool False Whether to apply bias adjustment when inverting Box-Cox transformation (default: False).
add_constant bool True If True, prepend a constant column to the regressor matrix (default: True).
cat_variables Optional[List[str]] None List of categorical feature column names. If provided, these columns are treated as categorical and encoded with categorical_encoder. Default is None (no categorical variables).
categorical_encoder Optional[Any] None Categorical encoder object (e.g. OneHotEncoder(), MeanEncoder()) applied to the columns listed in cat_variables. It must provide fit() and transform() methods that operate on the input DataFrame. Default is None (no encoding), in which case categorical variables can only be used by models that handle them natively (e.g. LightGBM or CatBoost).
method str posterior State assignment method: ‘posterior’ (soft) or ‘viterbi’ (hard). Default: ‘posterior’.
switching_var bool True If True, each regime has its own variance. If False, uses pooled variance. Default: True.
startprob_prior float 1000.0 Dirichlet concentration for initial state distribution. Default: 1e3.
transmat_prior float 1000.0 Dirichlet concentration for transition matrix rows. Default: 1e3.
n_iter int 100 Maximum EM iterations. Default: 100.
tol float 0.001 Convergence tolerance on log-likelihood. Default: 1e-3.
ridge float 1e-05 Ridge regularisation parameter for coefficient estimation. Default: 1e-5.
coefficients Optional[np.ndarray] None Initial regression coefficients (shape: n_states x n_features).
stds Optional[np.ndarray] None Initial state standard deviations (shape: n_states,).
init_state Optional[np.ndarray] None Initial state probability vector (shape: n_states,).
trans_matrix Optional[np.ndarray] None Initial transition matrix (shape: n_states x n_states).
random_state int 42 Random seed for reproducibility. Default: 42.
verbose bool False If True, print EM progress. Default: False.
Returns None
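The `box_cox` and `box_cox_biasadj` options refer to the Box-Cox family of power transforms. As an illustration, assuming the textbook definition and the usual bias-adjusted inverse (which returns the forecast mean rather than the median on the original scale), the hypothetical helpers `box_cox` and `inv_box_cox` below transform a series and invert it. `fvar`, the forecast variance on the transformed scale, is an assumed input; this is a sketch, not this class's code.

```python
import numpy as np


def box_cox(y, lam):
    """Forward Box-Cox transform; y must be strictly positive."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam


def inv_box_cox(w, lam, biasadj=False, fvar=None):
    """Invert Box-Cox. With biasadj=True, apply a second-order correction so the
    result approximates the mean (not the median) on the original scale."""
    w = np.asarray(w, dtype=float)
    out = np.exp(w) if lam == 0 else (lam * w + 1.0) ** (1.0 / lam)
    if biasadj:
        if fvar is None:
            raise ValueError("fvar (transformed-scale variance) is required when biasadj=True")
        out = out * (1.0 + 0.5 * np.asarray(fvar) * (1.0 - lam) / out ** (2.0 * lam))
    return out


y = np.array([1.0, 4.0, 9.0])
w = box_cox(y, 0.5)                                  # (sqrt(y) - 1) / 0.5 -> 0, 2, 4
print(inv_box_cox(w, 0.5))                           # recovers y
print(inv_box_cox(w, 0.5, biasadj=True, fvar=0.1))   # slightly above y
```

Without bias adjustment the back-transformed point forecast is a median, which is why the adjustment is optional rather than always applied.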

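The `difference` and `seasonal_diff` options imply that forecasts of the differenced series presumably have to be mapped back to the original scale. A minimal sketch of both directions follows, assuming seasonal differencing is applied first; the differenced series is the same in either order, because (1 - B)(1 - B^m) = (1 - B^m)(1 - B). The helpers `apply_differencing` and `invert_differencing` are hypothetical, not methods of this class.

```python
import numpy as np


def apply_differencing(y, difference=0, seasonal_diff=None):
    """Seasonal differencing (lag seasonal_diff), then `difference` ordinary
    differences. Returns the differenced series and the state needed to invert it."""
    y = np.asarray(y, dtype=float)
    history = []                                  # (lag, series before this step)
    if seasonal_diff:
        history.append((seasonal_diff, y))
        y = y[seasonal_diff:] - y[:-seasonal_diff]
    for _ in range(difference or 0):
        history.append((1, y))
        y = y[1:] - y[:-1]
    return y, history


def invert_differencing(forecast, history):
    """Map forecasts of the differenced series back to the original scale."""
    f = np.asarray(forecast, dtype=float)
    for lag, series in reversed(history):
        # Undo x_t = s_t - s_{t-lag} recursively: s_t = x_t + s_{t-lag},
        # seeded with the last `lag` observed values of s.
        levels = list(series[-lag:])
        for x in f:
            levels.append(x + levels[-lag])
        f = np.array(levels[lag:])
    return f


y = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
z, hist = apply_differencing(y, difference=1)     # z = 1, 2, 3, 4
print(invert_differencing([5.0, 6.0], hist))      # -> 16.0, 22.0
```

Each inversion step needs only the last `lag` levels of the series it undoes, so storing the pre-differencing series is enough to reconstruct any horizon.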

ms_arr.fit


def fit(
    df:pd.DataFrame, # Training DataFrame containing the target and any feature columns.
)->float: # Final log-likelihood after EM convergence.

Fit the model using the EM algorithm.

Type Details
df pd.DataFrame Training DataFrame containing the target and any feature columns.
Returns float Final log-likelihood after EM convergence.
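`fit` is documented only as running EM. The sketch below shows one common EM scheme for a Markov-switching regression: a scaled forward-backward pass gives the smoothed regime probabilities (the soft, 'posterior' assignment), each regime's coefficients come from ridge-penalised weighted least squares, and the transition matrix gets a Dirichlet-MAP update in the style of hmmlearn's `transmat_prior`. It is not this class's implementation. `forward_backward` and `fit_ms_regression` are hypothetical names, the design matrix `X` is assumed to already contain the lags, constant, and encoded features, and initialisation uses a random soft assignment instead of the `coefficients`/`stds`/`trans_matrix` arguments.

```python
import numpy as np


def forward_backward(lik, pi, A):
    """Scaled forward-backward. lik: (T, K) emission likelihoods, pi: (K,), A: (K, K).
    Returns smoothed probabilities (T, K), expected transition counts (K, K), log-likelihood."""
    T, K = lik.shape
    alpha = np.zeros((T, K))
    c = np.zeros(T)
    alpha[0] = pi * lik[0]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * lik[t]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    beta = np.ones((T, K))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (lik[t + 1] * beta[t + 1]) / c[t + 1]
    gamma = alpha * beta
    trans = np.zeros((K, K))
    for t in range(T - 1):
        trans += np.outer(alpha[t], lik[t + 1] * beta[t + 1]) * A / c[t + 1]
    return gamma, trans, np.log(c).sum()


def fit_ms_regression(X, y, n_states=2, n_iter=100, tol=1e-3, ridge=1e-5,
                      transmat_prior=1.0, seed=42):
    """EM for y_t = X_t @ coefs[s_t] + N(0, stds[s_t]^2), with s_t a Markov chain."""
    rng = np.random.default_rng(seed)
    T, p = X.shape
    gamma = rng.dirichlet(np.ones(n_states), size=T)   # random soft start
    A = np.full((n_states, n_states), 0.1 / max(n_states - 1, 1))
    np.fill_diagonal(A, 0.9)                            # start with persistent regimes
    pi = np.full(n_states, 1.0 / n_states)
    coefs, stds = np.zeros((n_states, p)), np.ones(n_states)
    ll, prev_ll = -np.inf, -np.inf
    for _ in range(n_iter):
        # M-step, regression part: ridge-penalised weighted least squares per regime.
        for k in range(n_states):
            w = gamma[:, k]
            XtW = X.T * w
            coefs[k] = np.linalg.solve(XtW @ X + ridge * np.eye(p), XtW @ y)
            resid = y - X @ coefs[k]
            stds[k] = np.sqrt(max((w * resid ** 2).sum() / (w.sum() + 1e-12), 1e-12))
        # E-step: Gaussian emission likelihoods (floored against underflow), then smoothing.
        z = (y[:, None] - X @ coefs.T) / stds
        lik = np.maximum(np.exp(-0.5 * z ** 2) / (np.sqrt(2 * np.pi) * stds), 1e-300)
        gamma, trans, ll = forward_backward(lik, pi, A)
        # M-step, Markov-chain part: Dirichlet-MAP update (prior 1.0 means plain MLE);
        # the tiny floor keeps every row normalisable.
        A = np.maximum(trans + transmat_prior - 1.0, 0.0) + 1e-12
        A /= A.sum(axis=1, keepdims=True)
        pi = gamma[0]
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return coefs, stds, A, gamma, ll
```

Under the same assumptions, `switching_var=False` would correspond to replacing the per-regime `stds[k]` with one pooled estimate, and a large `transmat_prior` (such as the default 1000.0) pulls each row of `A` strongly towards uniform.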

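For `method='viterbi'`, a hard state path takes the place of the smoothed probabilities. A log-space Viterbi decoder is sketched here as an illustration of that option. It assumes, as is typical, that the one-hot path then serves as the regime weights in the regression step; `viterbi` and `hard_weights` are hypothetical names.

```python
import numpy as np


def viterbi(log_lik, log_pi, log_A):
    """Most likely regime path given (T, K) log emission likelihoods."""
    T, K = log_lik.shape
    delta = log_pi + log_lik[0]             # best log-score of paths ending in each state
    back = np.zeros((T, K), dtype=int)      # best predecessor, used for backtracking
    for t in range(1, T):
        scores = delta[:, None] + log_A     # scores[i, j]: in state i at t-1, then i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_lik[t]
    path = np.zeros(T, dtype=int)
    path[-1] = int(delta.argmax())
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


def hard_weights(path, n_states):
    """One-hot regime weights: the hard counterpart of posterior probabilities."""
    return np.eye(n_states)[path]


log_lik = np.array([[0.0, -10.0], [0.0, -10.0], [-10.0, 0.0], [-10.0, 0.0]])
log_A = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
path = viterbi(log_lik, np.log([0.5, 0.5]), log_A)
print(path)                 # [0 0 1 1]
print(hard_weights(path, 2))
```

Soft weights let every observation inform every regime in proportion to its probability, whereas hard weights assign each observation to exactly one regime.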

ms_arr.forecast


def forecast(
    H:int, # Forecast horizon (number of steps to forecast ahead).
    exog:Optional[pd.DataFrame]=None, # Future exogenous regressors (must contain at least H rows). Should have the same columns as the training data (excluding the target variable).
)->np.ndarray:

Generate forecasts for H future time steps.

Type Default Details
H int Forecast horizon (number of steps to forecast ahead).
exog Optional[pd.DataFrame] None Future exogenous regressors (must contain at least H rows). Should have the same columns as the training data (excluding the target variable).
Returns np.ndarray Forecasts for the next H time steps.
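The documentation does not say how the H-step path is produced. One common approach for Markov-switching autoregressions, sketched here purely as an assumption, propagates the last filtered regime probabilities through the transition matrix and feeds each probability-weighted forecast back in as a lag. Feeding back the mixed mean is an approximation, since future regimes and future values are not independent. `ms_ar_forecast` and its coefficient layout are hypothetical; exogenous columns from `exog` and the inverse transforms (differencing, Box-Cox, trend) are omitted.

```python
import numpy as np


def ms_ar_forecast(y_hist, coefs, last_probs, trans_matrix, lags, H, add_constant=True):
    """Approximate recursive H-step forecast for a Markov-switching AR model.

    coefs[k] holds regime k's coefficients as [const, lag_1, lag_2, ...]
    (hypothetical layout); last_probs are the regime probabilities at the
    final training observation.
    """
    history = [float(v) for v in y_hist]
    probs = np.asarray(last_probs, dtype=float)
    coefs = np.asarray(coefs, dtype=float)
    out = np.zeros(H)
    for h in range(H):
        probs = probs @ trans_matrix             # regime probabilities one step ahead
        x = [history[-lag] for lag in lags]      # most recent value at each lag
        if add_constant:
            x = [1.0] + x
        regime_preds = coefs @ np.array(x)       # one conditional mean per regime
        out[h] = probs @ regime_preds            # probability-weighted forecast
        history.append(out[h])                   # feed the forecast back as a lag
    return out


coefs = [[1.0, 0.5], [0.0, 0.0]]                 # regime 0: AR(1) with constant; regime 1: zero
print(ms_ar_forecast([0.0], coefs, [1.0, 0.0], np.eye(2), lags=[1], H=3))  # -> 1.0, 1.5, 1.75
```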


ms_arr.cross_validate


def cross_validate(
    df:pd.DataFrame, # DataFrame containing the target and any feature columns.
    cv_split:int, # Number of cross-validation splits.
    test_size:int, # Number of time steps in the test set for each split.
    metrics:List[Callable], # Metric functions (e.g. ``[MAE, RMSE]``) used to evaluate forecast accuracy across folds. Call ``.cv_summary()`` after cross-validation to retrieve the aggregated scores.
    step_size:int=1, # Number of time steps between the starts of consecutive test sets.
    n_iter:int=1, # Number of EM iterations to run for each training fold.
    h_split_point:Optional[int]=None, # If provided, split the test set into two parts at this index and evaluate metrics separately on each part.
)->Union[pd.DataFrame, Tuple[pd.DataFrame, pd.DataFrame]]: # DataFrame with the average score of each metric across all splits. If h_split_point is provided, it also includes separate scores for the two parts of the test set. If cv_df is True, a tuple (overall_performance_df, cv_predictions_df) is returned instead.

Run cross-validation.

Type Default Details
df pd.DataFrame DataFrame containing the target and any feature columns.
cv_split int Number of cross-validation splits.
test_size int Number of time steps in the test set for each split.
metrics List[Callable] Metric functions (e.g. [MAE, RMSE]) used to evaluate forecast accuracy across folds. Call .cv_summary() after cross-validation to retrieve the aggregated scores.
step_size int 1 Number of time steps between the starts of consecutive test sets.
n_iter int 1 Number of EM iterations to run for each training fold.
h_split_point Optional[int] None If provided, split the test set into two parts at this index and evaluate metrics separately on each part.
Returns Union[pd.DataFrame, Tuple[pd.DataFrame, pd.DataFrame]] DataFrame with the average score of each metric across all splits. If h_split_point is provided, it also includes separate scores for the two parts of the test set. If cv_df is True, a tuple (overall_performance_df, cv_predictions_df) is returned instead.
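To make `cv_split`, `test_size`, `step_size`, and `h_split_point` concrete, here is a sketch of one plausible fold layout and per-part scoring. It assumes the last fold ends at the final observation, each earlier fold is shifted back by `step_size`, and training uses everything before the test window (an expanding window). None of this is confirmed by the documentation, and `cv_test_windows`, `split_scores`, and `mae` are hypothetical helpers.

```python
def cv_test_windows(n_obs, cv_split, test_size, step_size=1):
    """(test_start, test_end) index pairs; training data is everything before test_start."""
    windows = []
    for i in range(cv_split):
        test_end = n_obs - (cv_split - 1 - i) * step_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough observations for this cv_split/test_size/step_size")
        windows.append((test_start, test_end))
    return windows


def split_scores(actual, pred, metric, h_split_point=None):
    """Score a fold as a whole, or separately before and after h_split_point."""
    if h_split_point is None:
        return {"all": metric(actual, pred)}
    return {
        "first": metric(actual[:h_split_point], pred[:h_split_point]),
        "second": metric(actual[h_split_point:], pred[h_split_point:]),
    }


def mae(actual, pred):
    """Mean absolute error over paired sequences."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)


print(cv_test_windows(100, cv_split=3, test_size=10, step_size=5))
# -> [(80, 90), (85, 95), (90, 100)]
print(split_scores([1, 2, 3, 4], [1, 2, 5, 8], mae, h_split_point=2))
# -> {'first': 0.0, 'second': 3.0}
```

With `step_size` smaller than `test_size`, consecutive test windows overlap, so the same observation can be scored in several folds.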