| | Type | Default | Details |
|---|---|---|---|
| n_components | int | | Number of hidden states (regimes) in the model. |
| target_cols | List[str] | | List of column names corresponding to the target variables in the input DataFrame. |
| lags | Dict[str, Union[int, List[int]]] | | Dictionary specifying the lags to include for each target variable. The keys are target column names, and the values can be either an integer (which expands to a list of lags from 1 to that integer) or a list of specific lag integers. |
| lag_transform | Optional[Dict[str, list]] | None | Dictionary specifying lag-transform functions to apply to each target variable. The keys are target column names, and the values are lists of transformation functions (e.g., rolling mean, expanding std). |
| difference | Optional[Dict[str, int]] | None | Dictionary specifying the order of ordinary differencing to apply to each target variable. The keys are target column names, and the values are integers indicating the number of differences (e.g., 1 for first difference). |
| seasonal_diff | Optional[Dict[str, int]] | None | Dictionary specifying the order of seasonal differencing to apply to each target variable. The keys are target column names, and the values are integers indicating the seasonal period (e.g., 12 for monthly data with yearly seasonality). |
| trend | Optional[Dict[str, str]] | None | Dictionary specifying the trend component to include for each target variable. The keys are target column names, and the values are strings indicating the type of trend (e.g., 'linear', 'ets'). |
| pol_degree | Optional[Union[int, Dict[str, int]]] | 1 | Degree of polynomial for trend component. Can be a single integer applied to all targets or a dictionary keyed by target column name. |
| ets_params | Optional[Dict[str, tuple]] | None | Dictionary specifying ETS model parameters for trend removal. The keys are target column names, and the values are tuples containing the arguments for ExponentialSmoothing (first element) and its fit method (second element). |
| change_points | Optional[Dict[str, List[int]]] | None | Dictionary specifying change points for piecewise linear trend removal. The keys are target column names, and the values are lists of integer indices where the trend slope changes. |
| box_cox | Optional[Dict[str, Union[bool, float, int]]] | None | Dictionary specifying whether to apply Box-Cox transformation to each target variable. Values can be a boolean (True to apply, False to skip), a float (lambda parameter for Box-Cox transformation), or an integer (seasonal period for seasonal Box-Cox transformation). If True, lambda will be estimated from the data. |
| box_cox_biasadj | Optional[Dict[str, bool]] | None | Dictionary specifying whether to apply bias adjustment when inverting the Box-Cox transformation for each target variable. The keys are target column names, and the values are booleans indicating whether to apply bias adjustment. |
| add_constant | bool | True | If True, a constant column will be added to the regressor matrix for each state. |
| cat_variables | Optional[List[str]] | None | List of categorical feature column names to encode. These will be shared across all target variables. |
| categorical_encoder | Optional[Union[Dict[str, Any], Any]] | None | A categorical encoder instance, or a single-entry dictionary mapping the target column to the encoder when the encoder requires access to the target variable during fitting (e.g. {target_col: MeanEncoder()}). If an encoder that requires target access is provided directly without the dict format, the first target column in target_cols will be used for fitting the encoder. For encoders that do not require target access, pass the encoder instance directly (e.g. OneHotEncoder()). |
| method | str | posterior | Method for state assignment during the E-step. Options are 'posterior' for soft assignments based on posterior probabilities or 'viterbi' for hard assignments using the Viterbi algorithm. |
| covariance_type | str | full | Structure of the emission covariance matrices. Options are 'full' for full covariance matrices or 'diag' for diagonal covariance matrices. |
| switching_cov | bool | True | If True, allow the covariance matrices to switch between states. If False, a single covariance matrix will be shared across all states. |
| startprob_prior | float | 1000.0 | Concentration parameter for the Dirichlet prior on the initial state distribution. Higher values lead to more uniform priors. |
| transmat_prior | float | 100000.0 | Concentration parameter for the Dirichlet prior on the transition matrix rows. Higher values lead to more uniform priors. |
| n_iter | int | 100 | Maximum number of EM iterations to perform during training. |
| tol | float | 0.001 | Convergence threshold for EM. Training will stop if the change in log-likelihood between iterations is less than this value. |
| coefficients | Optional[List[np.ndarray]] | None | Optional initial state-wise VAR coefficient matrices. If None, they will be initialized using ordinary least squares on the entire dataset. |
| init_state | Optional[np.ndarray] | None | Optional initial state probability vector. If None, it will be initialized from a Dirichlet distribution with concentration parameter startprob_prior. |
| trans_matrix | Optional[np.ndarray] | None | Optional initial transition matrix. If None, it will be initialized from a Dirichlet distribution with concentration parameter transmat_prior. |
| random_state | Optional[int] | None | Random seed for reproducibility of the initial state distribution and transition matrix. |
| verbose | bool | False | If True, print log-likelihood at each EM iteration and convergence message. |
| Returns | None | | |
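The expansion rule for `lags` described above can be sketched as follows (the helper name `expand_lags` is illustrative, not part of the model's API):

```python
from typing import Dict, List, Union

def expand_lags(lags: Dict[str, Union[int, List[int]]]) -> Dict[str, List[int]]:
    """Expand each entry: an integer n becomes [1, ..., n]; a list is kept as given."""
    expanded = {}
    for col, spec in lags.items():
        expanded[col] = list(range(1, spec + 1)) if isinstance(spec, int) else list(spec)
    return expanded
```

So `{"y1": 3}` yields lags 1 through 3 of `y1`, while `{"y2": [1, 12]}` uses only lags 1 and 12.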
ms_var
def ms_var(
n_components:int, # Number of hidden states (regimes) in the model.
target_cols:List[str], # List of column names corresponding to the target variables in the input DataFrame.
lags:Dict[str, Union[int, List[int]]], # Dictionary specifying the lags to include for each target variable. The keys are target column names, and the values can be either an integer (which expands to a list of lags from 1 to that integer) or a list of specific lag integers.
lag_transform:Optional[Dict[str, list]]=None, # Dictionary specifying lag-transform functions to apply to each target variable. The keys are target column names, and the values are lists of transformation functions (e.g., rolling mean, expanding std).
difference:Optional[Dict[str, int]]=None, # Dictionary specifying the order of ordinary differencing to apply to each target variable. The keys are target column names, and the values are integers indicating the number of differences (e.g., 1 for first difference).
seasonal_diff:Optional[Dict[str, int]]=None, # Dictionary specifying the order of seasonal differencing to apply to each target variable. The keys are target column names, and the values are integers indicating the seasonal period (e.g., 12 for monthly data with yearly seasonality).
trend:Optional[Dict[str, str]]=None, # Dictionary specifying the trend component to include for each target variable. The keys are target column names, and the values are strings indicating the type of trend (e.g., 'linear', 'ets').
pol_degree:Optional[Union[int, Dict[str, int]]]=1, # Degree of polynomial for trend component. Can be a single integer applied to all targets or a dictionary keyed by target column name.
ets_params:Optional[Dict[str, tuple]]=None, # Dictionary specifying ETS model parameters for trend removal. The keys are target column names, and the values are tuples containing the arguments for ExponentialSmoothing (first element) and its fit method (second element).
change_points:Optional[Dict[str, List[int]]]=None, # Dictionary specifying change points for piecewise linear trend removal. The keys are target column names, and the values are lists of integer indices where the trend slope changes.
box_cox:Optional[Dict[str, Union[bool, float, int]]]=None, # Dictionary specifying whether to apply Box-Cox transformation to each target variable. Values can be a boolean (True to apply, False to skip), a float (lambda parameter for Box-Cox transformation), or an integer (seasonal period for seasonal Box-Cox transformation). If True, lambda will be estimated from the data.
box_cox_biasadj:Optional[Dict[str, bool]]=None, # Dictionary specifying whether to apply bias adjustment when inverting the Box-Cox transformation for each target variable. The keys are target column names, and the values are booleans indicating whether to apply bias adjustment.
add_constant:bool=True, # If True, a constant column will be added to the regressor matrix for each state.
cat_variables:Optional[List[str]]=None, # List of categorical feature column names to encode. These will be shared across all target variables.
    categorical_encoder:Optional[Union[Dict[str, Any], Any]]=None, # A categorical encoder instance, or a single-entry dictionary mapping the target column to the encoder when the encoder requires access to the target variable during fitting (e.g. {target_col: MeanEncoder()}). If an encoder that requires target access is provided directly without the dict format, the first target column in target_cols will be used for fitting the encoder. For encoders that do not require target access, pass the encoder instance directly (e.g. OneHotEncoder()).
method:str='posterior', # Method for state assignment during the E-step. Options are 'posterior' for soft assignments based on posterior probabilities or 'viterbi' for hard assignments using the Viterbi algorithm.
covariance_type:str='full', # Structure of the emission covariance matrices. Options are 'full' for full covariance matrices or 'diag' for diagonal covariance matrices.
switching_cov:bool=True, # If True, allow the covariance matrices to switch between states. If False, a single covariance matrix will be shared across all states.
startprob_prior:float=1000.0, # Concentration parameter for the Dirichlet prior on the initial state distribution. Higher values lead to more uniform priors.
transmat_prior:float=100000.0, # Concentration parameter for the Dirichlet prior on the transition matrix rows. Higher values lead to more uniform priors.
n_iter:int=100, # Maximum number of EM iterations to perform during training.
tol:float=0.001, # Convergence threshold for EM. Training will stop if the change in log-likelihood between iterations is less than this value.
coefficients:Optional[List[np.ndarray]]=None, # Optional initial state-wise VAR coefficient matrices. If None, they will be initialized using ordinary least squares on the entire dataset.
init_state:Optional[np.ndarray]=None, # Optional initial state probability vector. If None, it will be initialized from a Dirichlet distribution with concentration parameter `startprob_prior`.
trans_matrix:Optional[np.ndarray]=None, # Optional initial transition matrix. If None, it will be initialized from a Dirichlet distribution with concentration parameter `transmat_prior`.
random_state:Optional[int]=None, # Random seed for reproducibility of the initial state distribution and transition matrix.
verbose:bool=False, # If True, print log-likelihood at each EM iteration and convergence message.
)->None:
Initialize the Markov-Switching Vector Autoregression (MS-VAR) model with specified parameters and preprocessing options.
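When `init_state` and `trans_matrix` are left as `None`, the description above says they are drawn from Dirichlet distributions governed by `startprob_prior` and `transmat_prior`. A minimal NumPy sketch of that initialization (function name illustrative; large concentration values push the draws toward uniform):

```python
import numpy as np

def init_regime_params(n_components, startprob_prior=1000.0,
                       transmat_prior=100000.0, random_state=None):
    rng = np.random.default_rng(random_state)
    # Symmetric Dirichlet draws: each result is a valid probability vector.
    startprob = rng.dirichlet(np.full(n_components, startprob_prior))
    # One Dirichlet draw per row, so every row of the transition matrix sums to 1.
    transmat = rng.dirichlet(np.full(n_components, transmat_prior), size=n_components)
    return startprob, transmat
```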
ms_var.fit
def fit(
df:pd.DataFrame, # Training DataFrame containing all target and feature columns. Must include the columns specified in `target_cols` and any columns needed as regressors (e.g., categorical variables).
)->float: # Final log-likelihood after the EM iterations.
Fit the model using EM iterations until convergence on training data.
| | Type | Details |
|---|---|---|
| df | pd.DataFrame | Training DataFrame containing all target and feature columns. Must include the columns specified in target_cols and any columns needed as regressors (e.g., categorical variables). |
| Returns | float | Final log-likelihood after the EM iterations. |
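The stopping rule implied by `n_iter`, `tol`, and `verbose` can be sketched generically; here `em_step` is a stand-in callback for one E-step/M-step pass, not the model's internals:

```python
def run_em(em_step, n_iter=100, tol=1e-3, verbose=False):
    """Call em_step() repeatedly until the log-likelihood gain drops below tol."""
    prev_ll = float("-inf")
    ll = prev_ll
    for i in range(n_iter):
        ll = em_step()
        if verbose:
            print(f"iter {i}: log-likelihood = {ll:.6f}")
        if ll - prev_ll < tol:  # improvement below threshold: converged
            break
        prev_ll = ll
    return ll
```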
ms_var.forecast
def forecast(
H:int, # Forecast horizon (number of future time steps to predict).
exog:Optional[pd.DataFrame]=None, # Future exogenous regressors. If provided, must contain at least H rows and the same columns as the regressors used during training (excluding target columns).
)->Dict[str, np.ndarray]: # Forecasted values for each target variable, returned as a dictionary keyed by target column name.
Generate forecasts for H future time steps using the fitted MS-VAR model.
| | Type | Default | Details |
|---|---|---|---|
| H | int | | Forecast horizon (number of future time steps to predict). |
| exog | Optional[pd.DataFrame] | None | Future exogenous regressors. If provided, must contain at least H rows and the same columns as the regressors used during training (excluding target columns). |
| Returns | Dict[str, np.ndarray] | | Forecasted values for each target variable, returned as a dictionary keyed by target column name. |
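Multi-step VAR forecasts are produced recursively, feeding each prediction back in as a lagged regressor. The sketch below shows the single-regime recursion only (the fitted MS-VAR additionally mixes regime-specific coefficients by predicted state probabilities; names here are illustrative):

```python
import numpy as np

def var_recursive_forecast(history, lag_coefs, intercept, H):
    """history: (T, k) past observations; lag_coefs: list of p (k, k) matrices."""
    hist = [np.asarray(row, dtype=float) for row in history]
    p = len(lag_coefs)
    out = []
    for _ in range(H):
        y = np.asarray(intercept, dtype=float).copy()
        for lag in range(1, p + 1):
            y = y + lag_coefs[lag - 1] @ hist[-lag]  # use the p most recent values
        hist.append(y)  # feed the forecast back in for the next step
        out.append(y)
    return np.array(out)
```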
ms_var.cross_validate
def cross_validate(
df:pd.DataFrame, # Input dataframe.
target_col:str, # Target variable for evaluation.
cv_split:int, # Number of cross-validation folds.
test_size:int, # Test size per fold.
metrics:List[Callable], # Metric functions (e.g. ``[MAE, RMSE]``) used to evaluate forecast accuracy across folds. Call ``.cv_summary()`` after cross-validation to retrieve the aggregated scores.
step_size:int=1, # Step size for rolling window. Default is 1.
n_iter:int=1, # Number of iterations for each fold. Default is 1.
h_split_point:Optional[int]=None, # Point to split the test set for separate evaluation. Default is None.
)->Union[pd.DataFrame, Tuple[pd.DataFrame, pd.DataFrame]]: # DataFrame with averaged cross-validation metric scores; a tuple of two such DataFrames when ``h_split_point`` is provided.
Perform cross-validation.
| | Type | Default | Details |
|---|---|---|---|
| df | pd.DataFrame | | Input dataframe. |
| target_col | str | | Target variable for evaluation. |
| cv_split | int | | Number of cross-validation folds. |
| test_size | int | | Test size per fold. |
| metrics | List[Callable] | | Metric functions (e.g. [MAE, RMSE]) used to evaluate forecast accuracy across folds. Call .cv_summary() after cross-validation to retrieve the aggregated scores. |
| step_size | int | 1 | Step size for rolling window. Default is 1. |
| n_iter | int | 1 | Number of iterations for each fold. Default is 1. |
| h_split_point | Optional[int] | None | Point to split the test set for separate evaluation. Default is None. |
| Returns | Union[pd.DataFrame, Tuple[pd.DataFrame, pd.DataFrame]] | | DataFrame with averaged cross-validation metric scores; a tuple of two such DataFrames when h_split_point is provided. |
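The exact fold construction is not documented above; one rolling-origin layout consistent with the `cv_split`, `test_size`, and `step_size` parameters (an assumption for illustration, not the library's verified logic) would be:

```python
def rolling_splits(n_obs, cv_split, test_size, step_size=1):
    """Return (test_start, test_end) index windows, oldest fold first.
    For each fold, the training data is rows [0, test_start)."""
    splits = []
    for i in range(cv_split):
        test_end = n_obs - i * step_size  # the last fold ends at the series end
        splits.append((test_end - test_size, test_end))
    return splits[::-1]
```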