ms_var


def ms_var(
    n_components:int, # Number of hidden states (regimes) in the model.
    target_cols:List[str], # List of column names corresponding to the target variables in the input DataFrame.
    lags:Dict[str, Union[int, List[int]]], # Dictionary specifying the lags to include for each target variable. The keys are target column names, and the values can be either an integer (which expands to a list of lags from 1 to that integer) or a list of specific lag integers.
    lag_transform:Optional[Dict[str, list]]=None, # Dictionary specifying lag-transform functions to apply to each target variable. The keys are target column names, and the values are lists of transformation functions (e.g., rolling mean, expanding std).
    difference:Optional[Dict[str, int]]=None, # Dictionary specifying the order of ordinary differencing to apply to each target variable. The keys are target column names, and the values are integers indicating the number of differences (e.g., 1 for first difference).
    seasonal_diff:Optional[Dict[str, int]]=None, # Dictionary specifying the seasonal differencing to apply to each target variable. The keys are target column names, and the values are integers giving the seasonal period to difference at (e.g., 12 for monthly data with yearly seasonality).
    trend:Optional[Dict[str, str]]=None, # Dictionary specifying the trend component to include for each target variable. The keys are target column names, and the values are strings indicating the type of trend (e.g., 'linear', 'ets').
    pol_degree:Optional[Union[int, Dict[str, int]]]=1, # Degree of polynomial for trend component. Can be a single integer applied to all targets or a dictionary keyed by target column name.
    ets_params:Optional[Dict[str, tuple]]=None, # Dictionary specifying ETS model parameters for trend removal. The keys are target column names, and the values are tuples containing the arguments for ExponentialSmoothing (first element) and its fit method (second element).
    change_points:Optional[Dict[str, List[int]]]=None, # Dictionary specifying change points for piecewise linear trend removal. The keys are target column names, and the values are lists of integer indices where the trend slope changes.
    box_cox:Optional[Dict[str, Union[bool, float, int]]]=None, # Dictionary specifying whether to apply Box-Cox transformation to each target variable. Values can be a boolean (True to apply, False to skip), a float (lambda parameter for Box-Cox transformation), or an integer (seasonal period for seasonal Box-Cox transformation). If True, lambda will be estimated from the data.
    box_cox_biasadj:Optional[Dict[str, bool]]=None, # Dictionary specifying whether to apply bias adjustment when inverting the Box-Cox transformation for each target variable. The keys are target column names, and the values are booleans indicating whether to apply bias adjustment.
    add_constant:bool=True, # If True, a constant column will be added to the regressor matrix for each state.
    cat_variables:Optional[List[str]]=None, # List of categorical feature column names to encode. These will be shared across all target variables.
    categorical_encoder:Optional[Union[Dict[str, Any], Any]]=None, # A categorical encoder instance, or a single-entry dictionary mapping the target column to the encoder when the encoder requires access to the target variable during fitting (e.g. {target_col: MeanEncoder()}). If an encoder that requires target access is passed directly, without the dict format, the first target column in target_cols is used to fit it. For encoders that do not require target access, pass the encoder instance directly (e.g. OneHotEncoder()).
    method:str='posterior', # Method for state assignment during the E-step. Options are 'posterior' for soft assignments based on posterior probabilities or 'viterbi' for hard assignments using the Viterbi algorithm.
    covariance_type:str='full', # Structure of the emission covariance matrices. Options are 'full' for full covariance matrices or 'diag' for diagonal covariance matrices.
    switching_cov:bool=True, # If True, allow the covariance matrices to switch between states. If False, a single covariance matrix will be shared across all states.
    startprob_prior:float=1000.0, # Concentration parameter for the Dirichlet prior on the initial state distribution. Higher values lead to more uniform priors.
    transmat_prior:float=100000.0, # Concentration parameter for the Dirichlet prior on the transition matrix rows. Higher values lead to more uniform priors.
    n_iter:int=100, # Maximum number of EM iterations to perform during training.
    tol:float=0.001, # Convergence threshold for EM. Training will stop if the change in log-likelihood between iterations is less than this value.
    coefficients:Optional[List[np.ndarray]]=None, # Optional initial state-wise VAR coefficient matrices. If None, they will be initialized using ordinary least squares on the entire dataset.
    init_state:Optional[np.ndarray]=None, # Optional initial state probability vector. If None, it will be initialized from a Dirichlet distribution with concentration parameter `startprob_prior`.
    trans_matrix:Optional[np.ndarray]=None, # Optional initial transition matrix. If None, it will be initialized from a Dirichlet distribution with concentration parameter `transmat_prior`.
    random_state:Optional[int]=None, # Random seed for reproducibility of the initial state distribution and transition matrix.
    verbose:bool=False, # If True, print log-likelihood at each EM iteration and convergence message.
)->None:

Initialize the Markov-Switching Vector Autoregression (MS-VAR) model with specified parameters and preprocessing options.

|  | Type | Default | Details |
|---|---|---|---|
| n_components | int |  | Number of hidden states (regimes) in the model. |
| target_cols | List[str] |  | List of column names corresponding to the target variables in the input DataFrame. |
| lags | Dict[str, Union[int, List[int]]] |  | Dictionary specifying the lags to include for each target variable. The keys are target column names, and the values can be either an integer (which expands to a list of lags from 1 to that integer) or a list of specific lag integers. |
| lag_transform | Optional[Dict[str, list]] | None | Dictionary specifying lag-transform functions to apply to each target variable. The keys are target column names, and the values are lists of transformation functions (e.g., rolling mean, expanding std). |
| difference | Optional[Dict[str, int]] | None | Dictionary specifying the order of ordinary differencing to apply to each target variable. The keys are target column names, and the values are integers indicating the number of differences (e.g., 1 for first difference). |
| seasonal_diff | Optional[Dict[str, int]] | None | Dictionary specifying the seasonal differencing to apply to each target variable. The keys are target column names, and the values are integers giving the seasonal period to difference at (e.g., 12 for monthly data with yearly seasonality). |
| trend | Optional[Dict[str, str]] | None | Dictionary specifying the trend component to include for each target variable. The keys are target column names, and the values are strings indicating the type of trend (e.g., ‘linear’, ‘ets’). |
| pol_degree | Optional[Union[int, Dict[str, int]]] | 1 | Degree of polynomial for trend component. Can be a single integer applied to all targets or a dictionary keyed by target column name. |
| ets_params | Optional[Dict[str, tuple]] | None | Dictionary specifying ETS model parameters for trend removal. The keys are target column names, and the values are tuples containing the arguments for ExponentialSmoothing (first element) and its fit method (second element). |
| change_points | Optional[Dict[str, List[int]]] | None | Dictionary specifying change points for piecewise linear trend removal. The keys are target column names, and the values are lists of integer indices where the trend slope changes. |
| box_cox | Optional[Dict[str, Union[bool, float, int]]] | None | Dictionary specifying whether to apply Box-Cox transformation to each target variable. Values can be a boolean (True to apply, False to skip), a float (lambda parameter for Box-Cox transformation), or an integer (seasonal period for seasonal Box-Cox transformation). If True, lambda will be estimated from the data. |
| box_cox_biasadj | Optional[Dict[str, bool]] | None | Dictionary specifying whether to apply bias adjustment when inverting the Box-Cox transformation for each target variable. The keys are target column names, and the values are booleans indicating whether to apply bias adjustment. |
| add_constant | bool | True | If True, a constant column will be added to the regressor matrix for each state. |
| cat_variables | Optional[List[str]] | None | List of categorical feature column names to encode. These will be shared across all target variables. |
| categorical_encoder | Optional[Union[Dict[str, Any], Any]] | None | A categorical encoder instance, or a single-entry dictionary mapping the target column to the encoder when the encoder requires access to the target variable during fitting (e.g. {target_col: MeanEncoder()}). If an encoder that requires target access is passed directly, without the dict format, the first target column in target_cols is used to fit it. For encoders that do not require target access, pass the encoder instance directly (e.g. OneHotEncoder()). |
| method | str | posterior | Method for state assignment during the E-step. Options are ‘posterior’ for soft assignments based on posterior probabilities or ‘viterbi’ for hard assignments using the Viterbi algorithm. |
| covariance_type | str | full | Structure of the emission covariance matrices. Options are ‘full’ for full covariance matrices or ‘diag’ for diagonal covariance matrices. |
| switching_cov | bool | True | If True, allow the covariance matrices to switch between states. If False, a single covariance matrix will be shared across all states. |
| startprob_prior | float | 1000.0 | Concentration parameter for the Dirichlet prior on the initial state distribution. Higher values lead to more uniform priors. |
| transmat_prior | float | 100000.0 | Concentration parameter for the Dirichlet prior on the transition matrix rows. Higher values lead to more uniform priors. |
| n_iter | int | 100 | Maximum number of EM iterations to perform during training. |
| tol | float | 0.001 | Convergence threshold for EM. Training will stop if the change in log-likelihood between iterations is less than this value. |
| coefficients | Optional[List[np.ndarray]] | None | Optional initial state-wise VAR coefficient matrices. If None, they will be initialized using ordinary least squares on the entire dataset. |
| init_state | Optional[np.ndarray] | None | Optional initial state probability vector. If None, it will be initialized from a Dirichlet distribution with concentration parameter startprob_prior. |
| trans_matrix | Optional[np.ndarray] | None | Optional initial transition matrix. If None, it will be initialized from a Dirichlet distribution with concentration parameter transmat_prior. |
| random_state | Optional[int] | None | Random seed for reproducibility of the initial state distribution and transition matrix. |
| verbose | bool | False | If True, print log-likelihood at each EM iteration and convergence message. |
| Returns | None |  |  |
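
The lags argument accepts either an integer or an explicit list per target. The sketch below illustrates the expansion rule described above (an integer n stands for lags 1 through n). The helper name expand_lags and the column names are hypothetical and are not part of ms_var; the library's internal handling may differ.

```python
from typing import Dict, List, Union


def expand_lags(lags: Dict[str, Union[int, List[int]]]) -> Dict[str, List[int]]:
    """Hypothetical helper: turn a `lags` mapping into explicit lag lists."""
    expanded: Dict[str, List[int]] = {}
    for col, spec in lags.items():
        if isinstance(spec, int):
            # An integer n is shorthand for the lags 1, 2, ..., n.
            expanded[col] = list(range(1, spec + 1))
        else:
            # A list is taken as the exact set of lags to use.
            expanded[col] = list(spec)
    return expanded


# Illustrative column names, not taken from the library.
explicit = expand_lags({"gdp": 3, "inflation": [1, 4]})
print(explicit)
```

Passing a list is useful when only a few specific lags matter, for example lag 1 and a seasonal lag, without including everything in between.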

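To see why higher values of startprob_prior and transmat_prior lead to more uniform starting points, the following sketch draws transition matrices from a symmetric Dirichlet with NumPy. It assumes every entry of the concentration vector equals the prior value; this is an illustration of the distribution, not a reproduction of how ms_var initializes its parameters, and draw_transition_matrix is a hypothetical name.

```python
import numpy as np


def draw_transition_matrix(concentration: float, k: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: draw each of the k rows from a symmetric Dirichlet."""
    # Assumption: all k concentration entries share the same value.
    return rng.dirichlet(np.full(k, concentration), size=k)


rng = np.random.default_rng(0)
n_components = 3

loose = draw_transition_matrix(1.0, n_components, rng)       # rows vary widely
tight = draw_transition_matrix(100000.0, n_components, rng)  # rows stay near 1/3

print(np.round(loose, 3))
print(np.round(tight, 3))
```

With a concentration of 100000.0, each entry deviates from 1/3 by roughly 0.001 or less, so the initial transition matrix is close to uniform before EM updates it.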
ms_var.fit


def fit(
    df:pd.DataFrame, # Training DataFrame containing all target and feature columns. Must include the columns specified in `target_cols` and any columns needed as regressors (e.g., categorical variables).
)->float: # Final log-likelihood reached when the EM iterations stop.

Fit the model to the training data by running EM iterations until the change in log-likelihood falls below tol or n_iter iterations have been performed.

|  | Type | Details |
|---|---|---|
| df | pd.DataFrame | Training DataFrame containing all target and feature columns. Must include the columns specified in target_cols and any columns needed as regressors (e.g., categorical variables). |
| Returns | float | Final log-likelihood reached when the EM iterations stop. |
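
The interaction between n_iter and tol can be sketched as a generic stopping loop. This is a minimal illustration under stated assumptions: em_step is a hypothetical callable that performs one EM iteration and returns the log-likelihood, and convergence is judged on the absolute change between consecutive iterations. The actual criterion inside ms_var.fit may differ.

```python
from typing import Callable, Tuple


def run_until_converged(
    em_step: Callable[[], float], n_iter: int = 100, tol: float = 1e-3
) -> Tuple[float, int]:
    """Hypothetical driver: stop on a small log-likelihood change or at n_iter."""
    prev = float("-inf")
    for i in range(1, n_iter + 1):
        ll = em_step()
        # Assumption: convergence means |ll - prev| < tol.
        if abs(ll - prev) < tol:
            return ll, i
        prev = ll
    # Iteration budget exhausted without meeting the tolerance.
    return prev, n_iter


# Toy log-likelihood trace standing in for real EM iterations.
trace = iter([-120.0, -101.5, -100.2, -100.1995, -100.1994])
final_ll, n_done = run_until_converged(lambda: next(trace), n_iter=100, tol=1e-3)
print(final_ll, n_done)
```

In this toy trace the loop stops at the fourth iteration, where the improvement first drops below the tolerance.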

ms_var.forecast


def forecast(
    H:int, # Forecast horizon (number of future time steps to predict).
    exog:Optional[pd.DataFrame]=None, # Future exogenous regressors. If provided, must contain at least H rows and the same columns as the regressors used during training (excluding target columns).
)->Dict[str, np.ndarray]: # Forecasted values for each target variable, returned as a dictionary keyed by target column name.

Generate forecasts for H future time steps using the fitted MS-VAR model.

|  | Type | Default | Details |
|---|---|---|---|
| H | int |  | Forecast horizon (number of future time steps to predict). |
| exog | Optional[pd.DataFrame] | None | Future exogenous regressors. If provided, must contain at least H rows and the same columns as the regressors used during training (excluding target columns). |
| Returns | Dict[str, np.ndarray] |  | Forecasted values for each target variable, returned as a dictionary keyed by target column name. |
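
The requirements on exog (at least H rows, same regressor columns as in training) can be checked before calling forecast. The sketch below is one way to do that with pandas. The helper check_exog and the column names are hypothetical, and taking only the first H rows is an assumption about what is relevant to an H-step forecast, not documented behavior of ms_var.

```python
from typing import List

import pandas as pd


def check_exog(exog: pd.DataFrame, H: int, regressor_cols: List[str]) -> pd.DataFrame:
    """Hypothetical pre-flight check for future regressors; not part of ms_var."""
    missing = [c for c in regressor_cols if c not in exog.columns]
    if missing:
        raise ValueError(f"exog is missing regressor columns: {missing}")
    if len(exog) < H:
        raise ValueError(f"exog has {len(exog)} rows but H={H}")
    # Assumption: only the first H rows matter for an H-step forecast.
    return exog.iloc[:H][regressor_cols]


# Illustrative future regressors with one unrelated extra column.
future = pd.DataFrame(
    {
        "promo": [0, 1, 0, 0, 1, 0],
        "holiday": [0, 0, 0, 1, 0, 0],
        "note": list("abcdef"),
    }
)
exog = check_exog(future, H=4, regressor_cols=["promo", "holiday"])
print(exog.shape)
```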

ms_var.cross_validate


def cross_validate(
    df:pd.DataFrame, # Input dataframe.
    target_col:str, # Target variable for evaluation.
    cv_split:int, # Number of cross-validation folds.
    test_size:int, # Test size per fold.
    metrics:List[Callable], # Metric functions (e.g. ``[MAE, RMSE]``) used to evaluate forecast accuracy across folds. Call ``.cv_summary()`` after cross-validation to retrieve the aggregated scores.
    step_size:int=1, # Step size for rolling window. Default is 1.
    n_iter:int=1, # Number of iterations for each fold. Default is 1.
    h_split_point:Optional[int]=None, # Point to split the test set for separate evaluation. Default is None.
)->Union[pd.DataFrame, Tuple[pd.DataFrame, pd.DataFrame]]: # DataFrame with averaged cross-validation metric scores.

Perform rolling-window cross-validation and score the forecasts for target_col with the given metrics.

|  | Type | Default | Details |
|---|---|---|---|
| df | pd.DataFrame |  | Input dataframe. |
| target_col | str |  | Target variable for evaluation. |
| cv_split | int |  | Number of cross-validation folds. |
| test_size | int |  | Test size per fold. |
| metrics | List[Callable] |  | Metric functions (e.g. [MAE, RMSE]) used to evaluate forecast accuracy across folds. Call .cv_summary() after cross-validation to retrieve the aggregated scores. |
| step_size | int | 1 | Step size for rolling window. Default is 1. |
| n_iter | int | 1 | Number of iterations for each fold. Default is 1. |
| h_split_point | Optional[int] | None | Point to split the test set for separate evaluation. Default is None. |
| Returns | Union[pd.DataFrame, Tuple[pd.DataFrame, pd.DataFrame]] |  | DataFrame with averaged cross-validation metric scores. |
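
How cv_split, test_size, and step_size could combine into folds is sketched below. The scheme is an assumption made for illustration: the last fold's test window ends at the final observation, each earlier fold is shifted back by step_size, and the training set is everything before the test window. The splitting logic inside ms_var.cross_validate is not shown in this reference and may differ. The function name rolling_splits is hypothetical.

```python
from typing import List, Tuple


def rolling_splits(
    n_obs: int, cv_split: int, test_size: int, step_size: int = 1
) -> List[Tuple[range, range]]:
    """Hypothetical rolling-window splitter returning (train, test) index ranges."""
    splits = []
    for fold in range(cv_split):
        # Assumption: the last fold ends at n_obs; earlier folds end step_size sooner each.
        test_end = n_obs - (cv_split - 1 - fold) * step_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough observations for the requested folds")
        splits.append((range(0, test_start), range(test_start, test_end)))
    return splits


folds = rolling_splits(n_obs=20, cv_split=3, test_size=4, step_size=2)
for train_idx, test_idx in folds:
    print(len(train_idx), list(test_idx))
```

Under these assumptions, 20 observations with three folds of four test points and a step of two give test windows covering indices 12 to 15, 14 to 17, and 16 to 19.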