ms_arr


def ms_arr(
    n_components:int, # Number of hidden states (regimes).
    target_col:str, # Name of the target variable.
    lags:Optional[Union[int, List[int]]]=None, # Lags for the autoregressive model.
    lag_transform:Optional[list]=None, # List of lag-transform function objects applied to the target.
    difference:Optional[int]=None, # Order of ordinary differencing (e.g. 1 for first difference).
    seasonal_diff:Optional[int]=None, # Seasonal period for seasonal differencing.
    trend:Optional[str]=None, # Trend strategy: 'linear' or 'ets'.
    pol_degree:int=1, # Degree of polynomial trend (default: 1). Used when trend='linear'.
    ets_params:Optional[Dict[str, Any]]=None, # Dictionary of parameters for the ExponentialSmoothing model when using 'ets' trend strategy. The keys should be the parameter names and the values should be the parameter values. Default is None (use default ETS parameters).
    change_points:Optional[List[int]]=None, # Change points for piecewise linear trend. List of indices where the trend slope can change.
    box_cox:Union[bool, float, int]=False, # Whether to apply Box-Cox transformation to the target variable. If a float or int value is provided, it will be used as the lambda parameter for the Box-Cox transformation. If True, the lambda parameter will be estimated from the data.
    box_cox_biasadj:bool=False, # Whether to apply bias adjustment when inverting Box-Cox transformation (default: False).
    add_constant:bool=True, # If True, prepend a constant column to the regressor matrix (default: True).
    cat_variables:Optional[List[str]]=None, # List of categorical feature column names. If provided, these columns will be treated as categorical variables and encoded accordingly. Default is None (no categorical variables).
    categorical_encoder:Optional[Any]=None, # Categorical encoder object (e.g. OneHotEncoder(), MeanEncoder(), etc.) applied to the columns listed in cat_variables. The encoder should have fit() and transform() methods that accept the input DataFrame. Default is None (no categorical encoding); in that case categorical variables can be used only if the model handles them natively (e.g. LGBM or CatBoost).
    method:str='posterior', # State assignment method: 'posterior' (soft) or 'viterbi' (hard). Default: 'posterior'.
    switching_var:bool=True, # If True, each regime has its own variance. If False, uses pooled variance. Default: True.
    startprob_prior:float=1000.0, # Dirichlet concentration for initial state distribution. Default: 1e3.
    transmat_prior:float=1000.0, # Dirichlet concentration for transition matrix rows. Default: 1e3.
    n_iter:int=100, # Maximum EM iterations. Default: 100.
    tol:float=0.001, # Convergence tolerance on log-likelihood. Default: 1e-3.
    ridge:float=1e-05, # Ridge regularisation parameter for coefficient estimation. Default: 1e-5.
    coefficients:Optional[np.ndarray]=None, # Initial regression coefficients (shape: n_components x n_features).
    stds:Optional[np.ndarray]=None, # Initial state standard deviations (shape: n_components,).
    init_state:Optional[np.ndarray]=None, # Initial state probability vector (shape: n_components,).
    trans_matrix:Optional[np.ndarray]=None, # Initial transition matrix (shape: n_components x n_components).
    random_state:int=42, # Random seed for reproducibility. Default: 42.
    verbose:bool=False, # If True, print EM progress. Default: False.
)->None:

Initialize the MS-ARR model with the specified parameters.

	Type	Default	Details
n_components	int		Number of hidden states (regimes).
target_col	str		Name of the target variable.
lags	Optional[Union[int, List[int]]]	None	Lags for the autoregressive model.
lag_transform	Optional[list]	None	List of lag-transform function objects applied to the target.
difference	Optional[int]	None	Order of ordinary differencing (e.g. 1 for first difference).
seasonal_diff	Optional[int]	None	Seasonal period for seasonal differencing.
trend	Optional[str]	None	Trend strategy: ‘linear’ or ‘ets’.
pol_degree	int	1	Degree of polynomial trend (default: 1). Used when trend=‘linear’.
ets_params	Optional[Dict[str, Any]]	None	Dictionary of parameters for the ExponentialSmoothing model when using ‘ets’ trend strategy. The keys should be the parameter names and the values should be the parameter values. Default is None (use default ETS parameters).
change_points	Optional[List[int]]	None	Change points for piecewise linear trend. List of indices where the trend slope can change.
box_cox	Union[bool, float, int]	False	Whether to apply Box-Cox transformation to the target variable. If a float or int value is provided, it will be used as the lambda parameter for the Box-Cox transformation. If True, the lambda parameter will be estimated from the data.
box_cox_biasadj	bool	False	Whether to apply bias adjustment when inverting Box-Cox transformation (default: False).
add_constant	bool	True	If True, prepend a constant column to the regressor matrix (default: True).
cat_variables	Optional[List[str]]	None	List of categorical feature column names. If provided, these columns will be treated as categorical variables and encoded accordingly. Default is None (no categorical variables).
categorical_encoder	Optional[Any]	None	Categorical encoder object (e.g. OneHotEncoder(), MeanEncoder(), etc.) applied to the columns listed in cat_variables. The encoder should have fit() and transform() methods that accept the input DataFrame. Default is None (no categorical encoding); in that case categorical variables can be used only if the model handles them natively (e.g. LGBM or CatBoost).
method	str	posterior	State assignment method: ‘posterior’ (soft) or ‘viterbi’ (hard). Default: ‘posterior’.
switching_var	bool	True	If True, each regime has its own variance. If False, uses pooled variance. Default: True.
startprob_prior	float	1000.0	Dirichlet concentration for initial state distribution. Default: 1e3.
transmat_prior	float	1000.0	Dirichlet concentration for transition matrix rows. Default: 1e3.
n_iter	int	100	Maximum EM iterations. Default: 100.
tol	float	0.001	Convergence tolerance on log-likelihood. Default: 1e-3.
ridge	float	1e-05	Ridge regularisation parameter for coefficient estimation. Default: 1e-5.
coefficients	Optional[np.ndarray]	None	Initial regression coefficients (shape: n_components x n_features).
stds	Optional[np.ndarray]	None	Initial state standard deviations (shape: n_components,).
init_state	Optional[np.ndarray]	None	Initial state probability vector (shape: n_components,).
trans_matrix	Optional[np.ndarray]	None	Initial transition matrix (shape: n_components x n_components).
random_state	int	42	Random seed for reproducibility. Default: 42.
verbose	bool	False	If True, print EM progress. Default: False.
Returns	None
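
The `lags` and `add_constant` parameters together determine the regressor matrix used in each regime. The sketch below shows one plausible layout, assuming that an integer `lags=p` means lags 1 through p and that the constant is prepended as the first column; `build_lag_matrix` is a hypothetical helper for illustration, not part of the library.

```python
import numpy as np

def build_lag_matrix(y, lags, add_constant=True):
    """Hypothetical helper: return (X, target) for an autoregression on y.

    Assumes an int `lags` means lags 1..lags, and a list means exactly those lags.
    """
    y = np.asarray(y, dtype=float)
    lag_list = list(range(1, lags + 1)) if isinstance(lags, int) else list(lags)
    max_lag = max(lag_list)
    # One column per lag; rows start at max_lag so every lagged value exists.
    cols = [y[max_lag - lag : len(y) - lag] for lag in lag_list]
    X = np.column_stack(cols)
    if add_constant:
        # Prepend a column of ones, mirroring the add_constant description.
        X = np.column_stack([np.ones(len(X)), X])
    return X, y[max_lag:]

y = [1.0, 2.0, 3.0, 4.0, 5.0]
X, target = build_lag_matrix(y, lags=[1, 2])
# Each row of X is [1, y_{t-1}, y_{t-2}] and the matching entry of target is y_t.
```

With two lags, the first two observations are consumed as history, so five observations yield three usable rows.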


ms_arr.fit


def fit(
    df:pd.DataFrame, # Training DataFrame containing the target and any feature columns.
)->float: # Final log-likelihood after EM convergence.

Fit the model using the EM algorithm.

	Type	Details
df	pd.DataFrame	Training DataFrame containing the target and any feature columns.
Returns	float	Final log-likelihood after EM convergence.
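
`fit` returns the final log-likelihood, and the constructor's `n_iter` and `tol` bound the EM loop. The sketch below illustrates that stopping rule on a deliberately simpler model, a two-component Gaussian mixture rather than the MS-ARR likelihood. `fit_mixture_em` and every name in it are hypothetical, and the exact convergence test used by the library is an assumption.

```python
import numpy as np

def fit_mixture_em(y, n_iter=100, tol=1e-3):
    """Hypothetical EM loop for a 2-component 1-D Gaussian mixture.

    Stops when the log-likelihood changes by less than `tol`
    or after `n_iter` iterations, whichever comes first.
    """
    y = np.asarray(y, dtype=float)
    med = np.median(y)
    # Crude initialisation: split the sample around its median.
    means = np.array([y[y <= med].mean(), y[y > med].mean()])
    stds = np.array([y.std(), y.std()])
    weights = np.array([0.5, 0.5])
    history = []
    for _ in range(n_iter):
        # E-step: weighted component densities and their row sums.
        z = (y[:, None] - means) / stds
        dens = weights * np.exp(-0.5 * z ** 2) / (stds * np.sqrt(2 * np.pi))
        total = dens.sum(axis=1)
        loglik = float(np.log(total).sum())
        converged = bool(history) and abs(loglik - history[-1]) < tol
        history.append(loglik)
        if converged:
            break
        resp = dens / total[:, None]
        # M-step: re-estimate weights, means and standard deviations.
        nk = resp.sum(axis=0)
        weights = nk / len(y)
        means = (resp * y[:, None]).sum(axis=0) / nk
        var = (resp * (y[:, None] - means) ** 2).sum(axis=0) / nk
        stds = np.maximum(np.sqrt(var), 1e-6)  # guard against collapse
    # history[-1] is the log-likelihood evaluated at the last E-step.
    return history[-1], history

rng = np.random.default_rng(42)
sample = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 0.5, 200)])
final_ll, history = fit_mixture_em(sample, n_iter=100, tol=1e-3)
```

A larger `tol` stops earlier with a slightly lower log-likelihood; a smaller one runs more iterations, up to `n_iter`.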


ms_arr.forecast


def forecast(
    H:int, # Forecast horizon (number of steps to forecast ahead).
    exog:Optional[pd.DataFrame]=None, # Future exogenous regressors (must contain at least H rows). Should have the same columns as the training data (excluding the target variable).
)->np.ndarray:

Generate forecasts for H future time steps.

	Type	Default	Details
H	int		Forecast horizon (number of steps to forecast ahead).
exog	Optional[pd.DataFrame]	None	Future exogenous regressors (must contain at least H rows). Should have the same columns as the training data (excluding the target variable).
Returns	np.ndarray
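
Multi-step forecasts from an autoregressive model are typically produced recursively, feeding each prediction back in as a lag. The sketch below shows one common way to do this for a regime-switching model: propagate the regime probabilities through the transition matrix and average the per-regime predictions. It is an illustration under stated assumptions (intercept first, lags 1..p with p >= 1, no exogenous regressors), not the library's implementation; `recursive_forecast` is a hypothetical name.

```python
import numpy as np

def recursive_forecast(history, coefs, trans_matrix, state_probs, H):
    """Hypothetical H-step recursive forecast for a regime-switching AR(p).

    coefs has shape (n_states, 1 + p): intercept, then lags 1..p.
    """
    coefs = np.asarray(coefs, dtype=float)
    trans_matrix = np.asarray(trans_matrix, dtype=float)
    probs = np.asarray(state_probs, dtype=float)
    p = coefs.shape[1] - 1
    window = [float(v) for v in history[-p:]]  # oldest first, newest last
    forecasts = []
    for _ in range(H):
        # Advance the regime probabilities by one step.
        probs = probs @ trans_matrix
        # Regressor row: [1, y_{t-1}, ..., y_{t-p}].
        x = np.concatenate([[1.0], window[::-1]])
        per_state = coefs @ x
        # Probability-weighted average of the per-regime predictions.
        y_hat = float(probs @ per_state)
        forecasts.append(y_hat)
        # Feed the prediction back in as the newest lag.
        window = window[1:] + [y_hat]
    return np.array(forecasts)

fc = recursive_forecast(
    history=[4.0],
    coefs=[[1.0, 0.5], [0.0, 0.0]],  # regime 0: y_t = 1 + 0.5 * y_{t-1}
    trans_matrix=np.eye(2),          # regimes never switch in this toy case
    state_probs=[1.0, 0.0],
    H=3,
)
```

In this toy case the process stays in regime 0, so the forecasts follow 1 + 0.5 * y and decay towards the fixed point 2.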


ms_arr.cross_validate


def cross_validate(
    df:pd.DataFrame, # DataFrame containing the target and any feature columns.
    cv_split:int, # Number of cross-validation splits.
    test_size:int, # Number of time steps in the test set for each split.
    metrics:List[Callable], # Metric functions (e.g. ``[MAE, RMSE]``) used to evaluate forecast accuracy across folds. Call ``.cv_summary()`` after cross-validation to retrieve the aggregated scores.
    step_size:int=1, # Step size between the start of each test set in the splits.
    n_iter:int=1, # Number of EM iterations to run for each training fold.
    h_split_point:Optional[int]=None, # If provided, split the test set into two parts at this index and evaluate metrics separately on each part.
)->Union[pd.DataFrame, Tuple[pd.DataFrame, pd.DataFrame]]: # DataFrame with the average score for each metric across all splits. If h_split_point is provided, it also includes separate scores for the two parts of the test set. If cv_df is True, a tuple (overall_performance_df, cv_predictions_df) is returned instead.

Run cross-validation.

	Type	Default	Details
df	pd.DataFrame		DataFrame containing the target and any feature columns.
cv_split	int		Number of cross-validation splits.
test_size	int		Number of time steps in the test set for each split.
metrics	List[Callable]		Metric functions (e.g. `[MAE, RMSE]`) used to evaluate forecast accuracy across folds. Call `.cv_summary()` after cross-validation to retrieve the aggregated scores.
step_size	int	1	Step size between the start of each test set in the splits.
n_iter	int	1	Number of EM iterations to run for each training fold.
h_split_point	Optional[int]	None	If provided, split the test set into two parts at this index and evaluate metrics separately on each part.
Returns	Union[pd.DataFrame, Tuple[pd.DataFrame, pd.DataFrame]]		DataFrame with the average score for each metric across all splits. If h_split_point is provided, it also includes separate scores for the two parts of the test set. If cv_df is True, a tuple (overall_performance_df, cv_predictions_df) is returned instead.
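
To make the roles of `cv_split`, `test_size` and `step_size` concrete, the sketch below generates rolling-origin train/test index ranges. It assumes an expanding training window and that the last fold tests on the most recent observations; the library's actual split logic may differ, and `rolling_origin_splits` is a hypothetical helper.

```python
def rolling_origin_splits(n_obs, cv_split, test_size, step_size=1):
    """Hypothetical splitter: return a list of (train_range, test_range) pairs.

    Consecutive test sets start `step_size` observations apart, and the
    final test set ends at the last observation.
    """
    splits = []
    for i in range(cv_split):
        # Count back from the end so the final fold uses the latest data.
        test_end = n_obs - (cv_split - 1 - i) * step_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough observations for the requested splits")
        # Expanding window: train on everything before the test set.
        splits.append((range(0, test_start), range(test_start, test_end)))
    return splits

splits = rolling_origin_splits(n_obs=10, cv_split=3, test_size=2, step_size=2)
for train_idx, test_idx in splits:
    print(len(train_idx), list(test_idx))
```

With `step_size` equal to `test_size`, as here, the test sets do not overlap; a smaller `step_size` produces overlapping test windows.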