| | Type | Default | Details |
|---|---|---|---|
| n_components | int | | Number of hidden states (regimes) in the model. |
| target_cols | List[str] | | List of column names corresponding to the target variables in the input DataFrame. |
| lags | Dict[str, Union[int, List[int]]] | | Dictionary specifying the lags to include for each target variable. The keys are target column names, and the values can be either an integer (which expands to a list of lags from 1 to that integer) or a list of specific lag integers. |
| lag_transform | Optional[Dict[str, list]] | None | Dictionary specifying lag-transform functions to apply to each target variable. The keys are target column names, and the values are lists of transformation functions (e.g., rolling mean, expanding std). |
| difference | Optional[Dict[str, int]] | None | Dictionary specifying the order of ordinary differencing to apply to each target variable. The keys are target column names, and the values are integers indicating the number of differences (e.g., 1 for first difference). |
| seasonal_diff | Optional[Dict[str, int]] | None | Dictionary specifying the order of seasonal differencing to apply to each target variable. The keys are target column names, and the values are integers indicating the seasonal period (e.g., 12 for monthly data with yearly seasonality). |
| trend | Optional[Dict[str, str]] | None | Dictionary specifying the trend component to include for each target variable. The keys are target column names, and the values are strings indicating the type of trend (e.g., 'linear', 'ets'). |
| pol_degree | Optional[Union[int, Dict[str, int]]] | 1 | Degree of polynomial for trend component. Can be a single integer applied to all targets or a dictionary keyed by target column name. |
| ets_params | Optional[Dict[str, tuple]] | None | Dictionary specifying ETS model parameters for trend removal. The keys are target column names, and the values are tuples containing the arguments for ExponentialSmoothing (first element) and its fit method (second element). |
| change_points | Optional[Dict[str, List[int]]] | None | Dictionary specifying change points for piecewise linear trend removal. The keys are target column names, and the values are lists of integer indices where the trend slope changes. |
| box_cox | Optional[Dict[str, Union[bool, float, int]]] | None | Dictionary specifying whether to apply Box-Cox transformation to each target variable. Values can be a boolean (True to apply, False to skip), a float (lambda parameter for Box-Cox transformation), or an integer (seasonal period for seasonal Box-Cox transformation). If True, lambda will be estimated from the data. |
| box_cox_biasadj | Optional[Dict[str, bool]] | None | Dictionary specifying whether to apply bias adjustment when inverting the Box-Cox transformation for each target variable. The keys are target column names, and the values are booleans indicating whether to apply bias adjustment. |
| add_constant | bool | True | If True, a constant column will be added to the regressor matrix for each state. |
| cat_variables | Optional[List[str]] | None | List of categorical feature column names to encode. These will be shared across all target variables. |
| categorical_encoder | Optional[Union[Dict[str, Any], Any]] | None | A categorical encoder instance, or a single-entry dictionary mapping the target column to the encoder when the encoder requires access to the target variable during fitting (e.g. {target_col: MeanEncoder()}). If an encoder that requires target access is provided directly without the dict format, the first target column in target_cols will be used for fitting the encoder. For encoders that do not require target access, pass the encoder instance directly (e.g. OneHotEncoder()). |
| method | str | posterior | Method for state assignment during the E-step. Options are 'posterior' for soft assignments based on posterior probabilities or 'viterbi' for hard assignments using the Viterbi algorithm. |
| covariance_type | str | full | Structure of the emission covariance matrices. Options are 'full' for full covariance matrices or 'diag' for diagonal covariance matrices. |
| switching_cov | bool | True | If True, allow the covariance matrices to switch between states. If False, a single covariance matrix will be shared across all states. |
| startprob_prior | float | 1000.0 | Concentration parameter for the Dirichlet prior on the initial state distribution. Higher values lead to more uniform priors. |
| transmat_prior | float | 100000.0 | Concentration parameter for the Dirichlet prior on the transition matrix rows. Higher values lead to more uniform priors. |
| n_iter | int | 100 | Maximum number of EM iterations to perform during training. |
| tol | float | 0.001 | Convergence threshold for EM. Training will stop if the change in log-likelihood between iterations is less than this value. |
| coefficients | Optional[List[np.ndarray]] | None | Optional initial state-wise VAR coefficient matrices. If None, they will be initialized using ordinary least squares on the entire dataset. |
| init_state | Optional[np.ndarray] | None | Optional initial state probability vector. If None, it will be initialized from a Dirichlet distribution with concentration parameter startprob_prior. |
| trans_matrix | Optional[np.ndarray] | None | Optional initial transition matrix. If None, it will be initialized from a Dirichlet distribution with concentration parameter transmat_prior. |
| random_state | Optional[int] | None | Random seed for reproducibility of the initial state distribution and transition matrix. |
| verbose | bool | False | If True, print log-likelihood at each EM iteration and convergence message. |
| Returns | None | | |
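The expansion rule for `lags` described above can be sketched as follows (the helper name `expand_lags` is illustrative, not part of the model's API):

```python
from typing import Dict, List, Union

def expand_lags(lags: Dict[str, Union[int, List[int]]]) -> Dict[str, List[int]]:
    """Expand each entry: an integer n becomes [1, ..., n]; a list is kept as given."""
    expanded = {}
    for col, spec in lags.items():
        expanded[col] = list(range(1, spec + 1)) if isinstance(spec, int) else list(spec)
    return expanded
```

So `{"y1": 3}` yields lags 1 through 3 of `y1`, while `{"y2": [1, 12]}` uses only lags 1 and 12.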
ms_var
def ms_var(
n_components:int, # Number of hidden states (regimes) in the model.
target_cols:List[str], # List of column names corresponding to the target variables in the input DataFrame.
lags:Dict[str, Union[int, List[int]]], # Dictionary specifying the lags to include for each target variable. The keys are target column names, and the values can be either an integer (which expands to a list of lags from 1 to that integer) or a list of specific lag integers.
lag_transform:Optional[Dict[str, list]]=None, # Dictionary specifying lag-transform functions to apply to each target variable. The keys are target column names, and the values are lists of transformation functions (e.g., rolling mean, expanding std).
difference:Optional[Dict[str, int]]=None, # Dictionary specifying the order of ordinary differencing to apply to each target variable. The keys are target column names, and the values are integers indicating the number of differences (e.g., 1 for first difference).
seasonal_diff:Optional[Dict[str, int]]=None, # Dictionary specifying the order of seasonal differencing to apply to each target variable. The keys are target column names, and the values are integers indicating the seasonal period (e.g., 12 for monthly data with yearly seasonality).
trend:Optional[Dict[str, str]]=None, # Dictionary specifying the trend component to include for each target variable. The keys are target column names, and the values are strings indicating the type of trend (e.g., 'linear', 'ets').
pol_degree:Optional[Union[int, Dict[str, int]]]=1, # Degree of polynomial for trend component. Can be a single integer applied to all targets or a dictionary keyed by target column name.
ets_params:Optional[Dict[str, tuple]]=None, # Dictionary specifying ETS model parameters for trend removal. The keys are target column names, and the values are tuples containing the arguments for ExponentialSmoothing (first element) and its fit method (second element).
change_points:Optional[Dict[str, List[int]]]=None, # Dictionary specifying change points for piecewise linear trend removal. The keys are target column names, and the values are lists of integer indices where the trend slope changes.
box_cox:Optional[Dict[str, Union[bool, float, int]]]=None, # Dictionary specifying whether to apply Box-Cox transformation to each target variable. Values can be a boolean (True to apply, False to skip), a float (lambda parameter for Box-Cox transformation), or an integer (seasonal period for seasonal Box-Cox transformation). If True, lambda will be estimated from the data.
box_cox_biasadj:Optional[Dict[str, bool]]=None, # Dictionary specifying whether to apply bias adjustment when inverting the Box-Cox transformation for each target variable. The keys are target column names, and the values are booleans indicating whether to apply bias adjustment.
add_constant:bool=True, # If True, a constant column will be added to the regressor matrix for each state.
cat_variables:Optional[List[str]]=None, # List of categorical feature column names to encode. These will be shared across all target variables.
    categorical_encoder:Optional[Union[Dict[str, Any], Any]]=None, # A categorical encoder instance, or a single-entry dictionary mapping the target column to the encoder when the encoder requires access to the target variable during fitting (e.g. {target_col: MeanEncoder()}). If an encoder that requires target access is provided directly without the dict format, the first target column in target_cols will be used for fitting the encoder. For encoders that do not require target access, pass the encoder instance directly (e.g. OneHotEncoder()).
method:str='posterior', # Method for state assignment during the E-step. Options are 'posterior' for soft assignments based on posterior probabilities or 'viterbi' for hard assignments using the Viterbi algorithm.
covariance_type:str='full', # Structure of the emission covariance matrices. Options are 'full' for full covariance matrices or 'diag' for diagonal covariance matrices.
switching_cov:bool=True, # If True, allow the covariance matrices to switch between states. If False, a single covariance matrix will be shared across all states.
startprob_prior:float=1000.0, # Concentration parameter for the Dirichlet prior on the initial state distribution. Higher values lead to more uniform priors.
transmat_prior:float=100000.0, # Concentration parameter for the Dirichlet prior on the transition matrix rows. Higher values lead to more uniform priors.
n_iter:int=100, # Maximum number of EM iterations to perform during training.
tol:float=0.001, # Convergence threshold for EM. Training will stop if the change in log-likelihood between iterations is less than this value.
coefficients:Optional[List[np.ndarray]]=None, # Optional initial state-wise VAR coefficient matrices. If None, they will be initialized using ordinary least squares on the entire dataset.
init_state:Optional[np.ndarray]=None, # Optional initial state probability vector. If None, it will be initialized from a Dirichlet distribution with concentration parameter `startprob_prior`.
trans_matrix:Optional[np.ndarray]=None, # Optional initial transition matrix. If None, it will be initialized from a Dirichlet distribution with concentration parameter `transmat_prior`.
random_state:Optional[int]=None, # Random seed for reproducibility of the initial state distribution and transition matrix.
verbose:bool=False, # If True, print log-likelihood at each EM iteration and convergence message.
)->None:
Initialize the Markov-Switching Vector Autoregression (MS-VAR) model with specified parameters and preprocessing options.
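When `init_state` and `trans_matrix` are left as `None`, the description above says they are drawn from Dirichlet distributions governed by `startprob_prior` and `transmat_prior`. A minimal NumPy sketch of that initialization (function name illustrative; large concentration values push the draws toward uniform):

```python
import numpy as np

def init_regime_params(n_components, startprob_prior=1000.0,
                       transmat_prior=100000.0, random_state=None):
    rng = np.random.default_rng(random_state)
    # Symmetric Dirichlet draws: each result is a valid probability vector.
    startprob = rng.dirichlet(np.full(n_components, startprob_prior))
    # One Dirichlet draw per row, so every row of the transition matrix sums to 1.
    transmat = rng.dirichlet(np.full(n_components, transmat_prior), size=n_components)
    return startprob, transmat
```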
ms_var.fit
def fit(
df:pd.DataFrame, # Training DataFrame containing all target and feature columns. Must include the columns specified in `target_cols` and any columns needed as regressors (e.g., categorical variables).
)->float: # Final log-likelihood after the EM iterations.
Fit the model using EM iterations until convergence on training data.
| | Type | Details |
|---|---|---|
| df | pd.DataFrame | Training DataFrame containing all target and feature columns. Must include the columns specified in target_cols and any columns needed as regressors (e.g., categorical variables). |
| Returns | float | Final log-likelihood after the EM iterations. |
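The stopping rule implied by `n_iter`, `tol`, and `verbose` can be sketched generically; here `em_step` is a stand-in callback for one E-step/M-step pass, not the model's internals:

```python
def run_em(em_step, n_iter=100, tol=1e-3, verbose=False):
    """Call em_step() repeatedly until the log-likelihood gain drops below tol."""
    prev_ll = float("-inf")
    ll = prev_ll
    for i in range(n_iter):
        ll = em_step()
        if verbose:
            print(f"iter {i}: log-likelihood = {ll:.6f}")
        if ll - prev_ll < tol:  # improvement below threshold: converged
            break
        prev_ll = ll
    return ll
```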
ms_var.forecast
def forecast(
H:int, # Forecast horizon (number of future time steps to predict).
exog:Optional[pd.DataFrame]=None, # Future exogenous regressors. If provided, must contain at least H rows and the same columns as the regressors used during training (excluding target columns).
)->Dict[str, np.ndarray]: # Forecasted values for each target variable, returned as a dictionary keyed by target column name.
Generate forecasts for H future time steps using the fitted MS-VAR model.
| | Type | Default | Details |
|---|---|---|---|
| H | int | | Forecast horizon (number of future time steps to predict). |
| exog | Optional[pd.DataFrame] | None | Future exogenous regressors. If provided, must contain at least H rows and the same columns as the regressors used during training (excluding target columns). |
| Returns | Dict[str, np.ndarray] | | Forecasted values for each target variable, returned as a dictionary keyed by target column name. |
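Multi-step VAR forecasts are produced recursively, feeding each prediction back in as a lagged regressor. The sketch below shows the single-regime recursion only (the fitted MS-VAR additionally mixes regime-specific coefficients by predicted state probabilities; names here are illustrative):

```python
import numpy as np

def var_recursive_forecast(history, lag_coefs, intercept, H):
    """history: (T, k) past observations; lag_coefs: list of p (k, k) matrices."""
    hist = [np.asarray(row, dtype=float) for row in history]
    p = len(lag_coefs)
    out = []
    for _ in range(H):
        y = np.asarray(intercept, dtype=float).copy()
        for lag in range(1, p + 1):
            y = y + lag_coefs[lag - 1] @ hist[-lag]  # use the p most recent values
        hist.append(y)  # feed the forecast back in for the next step
        out.append(y)
    return np.array(out)
```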
ms_var.cross_validate
def cross_validate(
df:pd.DataFrame, # Input dataframe.
target_col:str, # Target variable for evaluation.
cv_split:int, # Number of cross-validation folds.
test_size:int, # Test size per fold.
metrics:List[Callable], # Metric functions (e.g. ``[MAE, RMSE]``) used to evaluate forecast accuracy across folds. Call ``.cv_summary()`` after cross-validation to retrieve the aggregated scores.
step_size:int=1, # Step size for rolling window. Default is 1.
n_iter:int=1, # Number of iterations for each fold. Default is 1.
h_split_point:Optional[int]=None, # Point to split the test set for separate evaluation. Default is None.
)->Union[pd.DataFrame, Tuple[pd.DataFrame, pd.DataFrame]]: # DataFrame with averaged cross-validation metric scores; a tuple of two such DataFrames when ``h_split_point`` is provided.
Perform cross-validation.
| | Type | Default | Details |
|---|---|---|---|
| df | pd.DataFrame | | Input dataframe. |
| target_col | str | | Target variable for evaluation. |
| cv_split | int | | Number of cross-validation folds. |
| test_size | int | | Test size per fold. |
| metrics | List[Callable] | | Metric functions (e.g. [MAE, RMSE]) used to evaluate forecast accuracy across folds. Call .cv_summary() after cross-validation to retrieve the aggregated scores. |
| step_size | int | 1 | Step size for rolling window. Default is 1. |
| n_iter | int | 1 | Number of iterations for each fold. Default is 1. |
| h_split_point | Optional[int] | None | Point to split the test set for separate evaluation. Default is None. |
| Returns | Union[pd.DataFrame, Tuple[pd.DataFrame, pd.DataFrame]] | | DataFrame with averaged cross-validation metric scores; a tuple of two such DataFrames when h_split_point is provided. |
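The exact fold construction is not documented above; one rolling-origin layout consistent with the `cv_split`, `test_size`, and `step_size` parameters (an assumption for illustration, not the library's verified logic) would be:

```python
def rolling_splits(n_obs, cv_split, test_size, step_size=1):
    """Return (test_start, test_end) index windows, oldest fold first.
    For each fold, the training data is rows [0, test_start)."""
    splits = []
    for i in range(cv_split):
        test_end = n_obs - i * step_size  # the last fold ends at the series end
        splits.append((test_end - test_size, test_end))
    return splits[::-1]
```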