# Probabilistic forecasting


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## Forecasters

## Naive forecaster

### peshbeen.models.naive

``` python
naive(
    target_col: 'str',
    season_period: 'Optional[int]' = None,
    box_cox: 'Union[bool, float]' = False,
    box_cox_biasadj: 'bool' = False
)
```

Naïve forecaster.

Two modes controlled by `season_period`:

- **Non-seasonal** (`season_period=None`): every forecast step repeats
  the last observed value in the training series.
- **Seasonal** (`season_period=m`): forecast values are taken from the
  last complete season and cycled forward, i.e. step `h` is predicted
  by `y[T - m + ((h-1) % m)]`, where `T` is the number of training
  observations and `y` is indexed from 0.
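
The two rules can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the training series is an ordinary sequence `y` and taking `T = len(y)`, so the first seasonal step lands on the first value of the last complete season. `naive_forecast` is a hypothetical helper, not the `peshbeen` implementation.

``` python
import math
from typing import List, Optional, Sequence


def naive_forecast(y: Sequence[float], H: int, season_period: Optional[int] = None) -> List[float]:
    """Hypothetical sketch of the naive / seasonal naive rules (not peshbeen's code)."""
    T = len(y)  # number of training observations, 0-based indexing
    if season_period is None:
        # Non-seasonal: repeat the last observed value for every step.
        return [float(y[-1])] * H
    m = season_period
    if T < m:
        # Not even one complete season: return NaN, mirroring the documented behaviour.
        return [math.nan] * H
    # Seasonal: cycle through the last complete season, y[T - m + ((h - 1) % m)].
    return [float(y[T - m + ((h - 1) % m)]) for h in range(1, H + 1)]


print(naive_forecast([1, 2, 3, 4, 5, 6], H=4))                   # [6.0, 6.0, 6.0, 6.0]
print(naive_forecast([1, 2, 3, 4, 5, 6], H=4, season_period=3))  # [4.0, 5.0, 6.0, 4.0]
```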

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>target_col</td>
<td>str</td>
<td></td>
<td>Name of the target variable column.</td>
</tr>
<tr>
<td>season_period</td>
<td>Optional[int]</td>
<td>None</td>
<td>Seasonal period <code>m</code>. <code>None</code> selects the
non-seasonal naïve method. When provided and the training series is
shorter than <code>m</code>, <code>forecast</code> returns an array of
<code>NaN</code>.</td>
</tr>
<tr>
<td>box_cox</td>
<td>Union[bool, float]</td>
<td>False</td>
<td>Whether to apply Box-Cox transformation to the target variable. If a
float value is provided, it will be used as the lambda parameter for the
Box-Cox transformation. If True, the lambda parameter will be estimated
from the data.</td>
</tr>
<tr>
<td>box_cox_biasadj</td>
<td>bool</td>
<td>False</td>
<td>Whether to apply bias adjustment when inverting the Box-Cox
transformation on forecasts.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>None</strong></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
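
The table does not say which bias adjustment `box_cox_biasadj` applies. The sketch below pairs the standard Box-Cox transform with the textbook bias adjustment from Hyndman & Athanasopoulos (*Forecasting: Principles and Practice*), which rescales the back-transformed point forecast using the forecast variance `fvar` on the transformed scale. Treat it as an assumption about the method; `box_cox` and `inv_box_cox` are hypothetical helper names.

``` python
import math


def box_cox(y: float, lam: float) -> float:
    """Box-Cox transform of one positive value (hypothetical helper)."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


def inv_box_cox(w: float, lam: float, biasadj: bool = False, fvar: float = 0.0) -> float:
    """Invert Box-Cox, optionally applying the textbook bias adjustment.

    fvar is the forecast variance on the transformed scale (an assumed input).
    """
    if lam == 0:
        point = math.exp(w)
        if biasadj:
            point *= 1 + fvar / 2
        return point
    base = lam * w + 1
    point = base ** (1 / lam)
    if biasadj:
        point *= 1 + fvar * (1 - lam) / (2 * base ** 2)
    return point


w = box_cox(5.0, 0.5)
print(inv_box_cox(w, 0.5))                            # back to ~5.0
print(inv_box_cox(0.0, 0.0, biasadj=True, fvar=0.5))  # 1.25
```

Without the adjustment, the back-transformed forecast corresponds to a median rather than a mean on the original scale, which is why the correction depends on the forecast variance.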

### peshbeen.models.naive.fit

``` python
fit(
    df: 'pd.DataFrame'
)
```

Store the values needed for naïve forecasting.

No statistical model is estimated. `fit` simply applies `data_prep` and
records the training series so that `forecast` can replicate the correct
naïve pattern.

<table>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>df</td>
<td>pd.DataFrame</td>
<td>Training DataFrame containing the target column.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>None</strong></td>
<td></td>
</tr>
</tbody>
</table>

### peshbeen.models.naive.forecast

``` python
forecast(
    H: 'int',
    exog: 'Optional[pd.DataFrame]' = None
)
```

Generate naïve forecasts.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>H</td>
<td>int</td>
<td></td>
<td>Forecast horizon.</td>
</tr>
<tr>
<td>exog</td>
<td>Optional[pd.DataFrame]</td>
<td>None</td>
<td>Accepted for API consistency with other models but silently ignored
— naïve forecasts do not use exogenous variables.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>np.ndarray</strong></td>
<td></td>
<td><strong>Forecast values of length <code>H</code>.</strong></td>
</tr>
</tbody>
</table>

### peshbeen.models.naive.cross_validate

``` python
cross_validate(
    df: 'pd.DataFrame',
    cv_split: 'int',
    test_size: 'int',
    metrics: 'List[Callable]',
    step_size: 'int' = 1,
    h_split_point: 'Optional[int]' = None
)
```

Run time-series cross-validation.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>df</td>
<td>pd.DataFrame</td>
<td></td>
<td>Full dataset.</td>
</tr>
<tr>
<td>cv_split</td>
<td>int</td>
<td></td>
<td>Number of CV folds.</td>
</tr>
<tr>
<td>test_size</td>
<td>int</td>
<td></td>
<td>Test window size per fold.</td>
</tr>
<tr>
<td>metrics</td>
<td>List[Callable]</td>
<td></td>
<td>Metric functions (e.g. <code>[MAE, RMSE]</code>) used to evaluate
forecast accuracy across folds. Call <code>.cv_summary()</code> after
cross-validation to retrieve the aggregated scores.</td>
</tr>
<tr>
<td>step_size</td>
<td>int</td>
<td>1</td>
<td>Step size to advance the test window each fold.</td>
</tr>
<tr>
<td>h_split_point</td>
<td>Optional[int]</td>
<td>None</td>
<td>Split the test window into two sub-horizons for separate short- and
long-term evaluation.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>Tuple[pd.DataFrame, pd.DataFrame]</strong></td>
<td></td>
<td><strong>Summary DataFrame with mean metric scores across folds, and
(optionally) a fold-level DataFrame with true vs. predicted values for
each fold.</strong></td>
</tr>
</tbody>
</table>
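
The fold layout is not spelled out above. The sketch below assumes an expanding training window whose test window slides forward by `step_size`, with the last fold ending on the final observation, and shows how `h_split_point` could split each test window into a short-term part `[:k]` and a long-term part `[k:]`. `ts_cv_splits`, `cv_naive` and `mae` are hypothetical helpers that only mirror the parameter names; they are not the `peshbeen` API.

``` python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Metric = Callable[[Sequence[float], Sequence[float]], float]


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def ts_cv_splits(n: int, cv_split: int, test_size: int, step_size: int = 1) -> List[Tuple[range, range]]:
    """Assumed layout: expanding train window, last test window ends at index n - 1."""
    splits = []
    for i in range(cv_split):
        test_end = n - (cv_split - 1 - i) * step_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("Series too short for the requested folds.")
        splits.append((range(0, test_start), range(test_start, test_end)))
    return splits


def cv_naive(y: Sequence[float], cv_split: int, test_size: int, metrics: List[Metric],
             step_size: int = 1, h_split_point: Optional[int] = None) -> Dict[str, float]:
    fold_scores = []
    for train_idx, test_idx in ts_cv_splits(len(y), cv_split, test_size, step_size):
        train = [y[i] for i in train_idx]
        actual = [y[i] for i in test_idx]
        pred = [train[-1]] * test_size  # non-seasonal naive forecast
        if h_split_point is None:
            parts = {"": (actual, pred)}
        else:
            k = h_split_point
            parts = {"_short": (actual[:k], pred[:k]), "_long": (actual[k:], pred[k:])}
        scores = {}
        for suffix, (a, p) in parts.items():
            for metric in metrics:
                scores[metric.__name__ + suffix] = metric(a, p)
        fold_scores.append(scores)
    # Mean of each metric across folds.
    return {key: sum(s[key] for s in fold_scores) / len(fold_scores) for key in fold_scores[0]}


print(cv_naive(list(range(1, 11)), cv_split=3, test_size=2, metrics=[mae], h_split_point=1))
# {'mae_short': 1.0, 'mae_long': 2.0}
```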

## ETS (Error, Trend, Seasonality) forecaster

### peshbeen.models.ets

``` python
ets(
    target_col: 'str',
    trend: 'Optional[str]' = None,
    damped_trend: 'bool' = False,
    seasonal: 'Optional[str]' = None,
    seasonal_periods: 'Optional[int]' = None,
    initialization_method: 'Optional[str]' = 'estimated',
    initial_level: 'Optional[float]' = None,
    initial_trend: 'Optional[float]' = None,
    initial_seasonal: 'Optional[list]' = None,
    bounds: 'Optional[dict]' = None,
    dates=None,
    freq: 'Optional[str]' = None,
    missing: 'str' = 'none',
    optimized: 'bool' = True,
    smoothing_level: 'Optional[float]' = None,
    smoothing_trend: 'Optional[float]' = None,
    smoothing_seasonal: 'Optional[float]' = None,
    damping_trend: 'Optional[float]' = None,
    remove_bias: 'bool' = False,
    start_params=None,
    method: 'Optional[str]' = None,
    minimize_kwargs: 'Optional[dict]' = None,
    use_brute: 'bool' = True,
    box_cox: 'Union[bool, float]' = False,
    box_cox_biasadj: 'bool' = False,
    fit_kwargs: 'Optional[dict]' = None
)
```

Holt-Winters Exponential Smoothing forecaster.

A thin wrapper around
`statsmodels.tsa.holtwinters.ExponentialSmoothing`.
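
For intuition about `smoothing_level` (alpha): with `trend=None` and `seasonal=None`, the model reduces to simple exponential smoothing, where the level is updated as `l_t = alpha * y_t + (1 - alpha) * l_{t-1}` and every forecast equals the final level. This pure-Python sketch initialises the level with the first observation, a heuristic choice; statsmodels estimates initial values differently under the default `initialization_method="estimated"`. `ses_forecast` is a hypothetical helper.

``` python
from typing import List, Sequence


def ses_forecast(y: Sequence[float], alpha: float, H: int) -> List[float]:
    """Simple exponential smoothing sketch (hypothetical helper, heuristic initial level)."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must lie in [0, 1]")
    level = float(y[0])  # heuristic: start the level at the first observation
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    # A level-only model produces flat forecasts at the final level.
    return [level] * H


print(ses_forecast([10, 12, 11], alpha=0.5, H=2))  # [11.0, 11.0]
```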

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>target_col</td>
<td>str</td>
<td></td>
<td>Name of the target variable column.</td>
</tr>
<tr>
<td>trend</td>
<td>Optional[str]</td>
<td>None</td>
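<td>Trend component type: <code>"add"</code> (or
<code>"additive"</code>), <code>"mul"</code> (or
<code>"multiplicative"</code>), or <code>None</code> for no trend.</td>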
</tr>
<tr>
<td>damped_trend</td>
<td>bool</td>
<td>False</td>
<td>Whether to damp the trend. Only meaningful when <code>trend</code>
is not <code>None</code>.</td>
</tr>
<tr>
<td>seasonal</td>
<td>Optional[str]</td>
<td>None</td>
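<td>Seasonal component type: <code>"add"</code> (or
<code>"additive"</code>), <code>"mul"</code> (or
<code>"multiplicative"</code>), or <code>None</code> for no
seasonality.</td>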
</tr>
<tr>
<td>seasonal_periods</td>
<td>Optional[int]</td>
<td>None</td>
<td>Number of periods in a complete seasonal cycle — e.g. 12 for monthly
data with an annual cycle. Required when <code>seasonal</code> is not
<code>None</code>.</td>
</tr>
<tr>
<td>initialization_method</td>
<td>Optional[str]</td>
<td>estimated</td>
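<td>How to initialise the recursions: one of <code>"estimated"</code>,
<code>"heuristic"</code>, <code>"legacy-heuristic"</code>, or
<code>"known"</code>. When <code>"known"</code> is chosen,
<code>initial_level</code> (and <code>initial_trend</code> /
<code>initial_seasonal</code> where applicable) must also be
provided.</td>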
</tr>
<tr>
<td>initial_level</td>
<td>Optional[float]</td>
<td>None</td>
<td>Initial level value. Required when
<code>initialization_method="known"</code>.</td>
</tr>
<tr>
<td>initial_trend</td>
<td>Optional[float]</td>
<td>None</td>
<td>Initial trend value. Required when
<code>initialization_method="known"</code> and the model has a trend
component.</td>
</tr>
<tr>
<td>initial_seasonal</td>
<td>Optional[list]</td>
<td>None</td>
<td>Initial seasonal factors (length <code>seasonal_periods</code> or
<code>seasonal_periods - 1</code>). Required when
<code>initialization_method="known"</code> and the model is
seasonal.</td>
</tr>
<tr>
<td>bounds</td>
<td>Optional[dict]</td>
<td>None</td>
<td>Parameter bounds passed to <code>ExponentialSmoothing</code>,
e.g. <code>{"smoothing_level": (0, 1)}</code>.</td>
</tr>
<tr>
<td>dates</td>
<td>NoneType</td>
<td>None</td>
<td>Datetime index for the series. Inferred automatically when the
series passed to <code>ExponentialSmoothing</code> (its
<code>endog</code> argument) is a pandas object with a
<code>DatetimeIndex</code>.</td>
</tr>
<tr>
<td>freq</td>
<td>Optional[str]</td>
<td>None</td>
<td>Frequency of the time series (e.g. <code>"M"</code>,
<code>"D"</code>). Optional when <code>dates</code> is provided.</td>
</tr>
<tr>
<td>missing</td>
<td>str</td>
<td>none</td>
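<td>How to handle <code>NaN</code> values in the input series:
<code>"none"</code> performs no <code>NaN</code> checking,
<code>"drop"</code> drops observations containing <code>NaN</code>, and
<code>"raise"</code> raises an error.</td>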
</tr>
<tr>
<td>optimized</td>
<td>bool</td>
<td>True</td>
<td>Estimate smoothing parameters by maximising the log-likelihood.</td>
</tr>
<tr>
<td>smoothing_level</td>
<td>Optional[float]</td>
<td>None</td>
<td>Fixed alpha value. When set, this value is used directly and not
optimised.</td>
</tr>
<tr>
<td>smoothing_trend</td>
<td>Optional[float]</td>
<td>None</td>
<td>Fixed beta value. Only used when the model has a trend
component.</td>
</tr>
<tr>
<td>smoothing_seasonal</td>
<td>Optional[float]</td>
<td>None</td>
<td>Fixed gamma value. Only used when the model is seasonal.</td>
</tr>
<tr>
<td>damping_trend</td>
<td>Optional[float]</td>
<td>None</td>
<td>Fixed phi (damping) value. Only used when
<code>damped_trend=True</code>.</td>
</tr>
<tr>
<td>remove_bias</td>
<td>bool</td>
<td>False</td>
<td>Remove bias from forecast values by enforcing that the mean residual
is zero.</td>
</tr>
<tr>
<td>start_params</td>
<td>NoneType</td>
<td>None</td>
<td>Starting parameter values for the optimiser.</td>
</tr>
<tr>
<td>method</td>
<td>Optional[str]</td>
<td>None</td>
<td>Optimisation method — one of <code>"L-BFGS-B"</code> (default),
<code>"TNC"</code>, <code>"SLSQP"</code>, <code>"Powell"</code>,
<code>"trust-constr"</code>, <code>"basinhopping"</code> (alias
<code>"bh"</code>), or <code>"least_squares"</code> (alias
<code>"ls"</code>).</td>
</tr>
<tr>
<td>minimize_kwargs</td>
<td>Optional[dict]</td>
<td>None</td>
<td>Extra keyword arguments forwarded to the chosen SciPy
minimiser.</td>
</tr>
<tr>
<td>use_brute</td>
<td>bool</td>
<td>True</td>
<td>Search for good starting values with a brute-force grid search
before running the main optimiser.</td>
</tr>
<tr>
<td>box_cox</td>
<td>Union[bool, float]</td>
<td>False</td>
<td>Whether to apply Box-Cox transformation to the target variable. If a
float value is provided, it will be used as the lambda parameter for the
Box-Cox transformation. If True, the lambda parameter will be estimated
from the data.</td>
</tr>
<tr>
<td>box_cox_biasadj</td>
<td>bool</td>
<td>False</td>
<td>Whether to apply bias adjustment when inverting the Box-Cox
transformation on forecasts.</td>
</tr>
<tr>
<td>fit_kwargs</td>
<td>Optional[dict]</td>
<td>None</td>
<td>Any additional keyword arguments forwarded verbatim to
<code>ExponentialSmoothing.fit</code>.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>None</strong></td>
<td></td>
<td><strong>The constructor returns nothing. After <code>fit</code>,
the instance provides a <code>forecast</code> method for making
predictions and properties for information-criterion scores (AIC, BIC,
etc.).</strong></td>
</tr>
</tbody>
</table>
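
When `damped_trend=True`, `damping_trend` plays the role of phi in the additive damped-trend forecast function `l_T + (phi + phi^2 + ... + phi^h) * b_T`, so the trend contribution levels off as the horizon grows. The sketch below assumes an additive trend and takes the final level `l_T` and trend `b_T` as inputs supplied by hand, standing in for fitted states; `damped_trend_forecast` is a hypothetical helper.

``` python
from typing import List


def damped_trend_forecast(level: float, trend: float, phi: float, H: int) -> List[float]:
    """Additive damped-trend forecasts from final level/trend states (hypothetical helper)."""
    forecasts = []
    damp_sum = 0.0
    for h in range(1, H + 1):
        damp_sum += phi ** h  # phi + phi^2 + ... + phi^h
        forecasts.append(level + damp_sum * trend)
    return forecasts


print(damped_trend_forecast(level=10.0, trend=2.0, phi=0.5, H=3))  # [11.0, 11.5, 11.75]
```

With `phi = 1` the sum grows linearly in `h` and the function reduces to an undamped linear-trend forecast.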

### peshbeen.models.ets.fit

``` python
fit(
    df: 'pd.DataFrame'
)
```

Fit `ExponentialSmoothing` to the training data.

<table>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>df</td>
<td>pd.DataFrame</td>
<td>Training DataFrame containing the target column.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>None</strong></td>
<td></td>
</tr>
</tbody>
</table>

### peshbeen.models.ets.forecast

``` python
forecast(
    H: 'int',
    exog: 'Optional[pd.DataFrame]' = None
)
```

Multi-step forecast.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>H</td>
<td>int</td>
<td></td>
<td>Forecast horizon.</td>
</tr>
<tr>
<td>exog</td>
<td>Optional[pd.DataFrame]</td>
<td>None</td>
<td>Accepted for API consistency with other models but silently ignored
— ETS forecasts do not use exogenous variables.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>np.ndarray</strong></td>
<td></td>
<td><strong>Forecast values of length <code>H</code>.</strong></td>
</tr>
</tbody>
</table>

### peshbeen.models.ets.cross_validate

``` python
cross_validate(
    df: 'pd.DataFrame',
    cv_split: 'int',
    test_size: 'int',
    metrics: 'List[Callable]',
    step_size: 'int' = 1,
    h_split_point: 'Optional[int]' = None
)
```

Run time-series cross-validation.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>df</td>
<td>pd.DataFrame</td>
<td></td>
<td>Full dataset.</td>
</tr>
<tr>
<td>cv_split</td>
<td>int</td>
<td></td>
<td>Number of CV folds.</td>
</tr>
<tr>
<td>test_size</td>
<td>int</td>
<td></td>
<td>Test window size per fold.</td>
</tr>
<tr>
<td>metrics</td>
<td>List[Callable]</td>
<td></td>
<td>Metric functions (e.g. <code>[MAE, RMSE]</code>) used to evaluate
forecast accuracy across folds. Call <code>.cv_summary()</code> after
cross-validation to retrieve the aggregated scores.</td>
</tr>
<tr>
<td>step_size</td>
<td>int</td>
<td>1</td>
<td>Step size to advance the test window each fold.</td>
</tr>
<tr>
<td>h_split_point</td>
<td>Optional[int]</td>
<td>None</td>
<td>Split the test window into two sub-horizons for separate short- and
long-term evaluation.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>Tuple[pd.DataFrame, pd.DataFrame]</strong></td>
<td></td>
<td><strong>Summary DataFrame with mean metric scores across folds, and
(optionally) a fold-level DataFrame with true vs. predicted values for
each fold.</strong></td>
</tr>
</tbody>
</table>

## ARIMA forecaster

### peshbeen.models.arima

``` python
arima(
    target_col: 'str',
    order: 'Optional[Tuple[int, int, int]]' = (0, 0, 0),
    seasonal_order: 'Optional[Tuple[int, int, int]]' = (0, 0, 0),
    seasonal_length: 'Optional[int]' = 1,
    lag_transform: 'Optional[list]' = None,
    trend: 'Optional[str]' = None,
    pol_degree: 'int' = 1,
    ets_params: 'Optional[Dict[str, Any]]' = None,
    change_points: 'Optional[List[int]]' = None,
    box_cox: 'Union[bool, float, int]' = False,
    box_cox_biasadj: 'bool' = False,
    cat_variables: 'Optional[List[str]]' = None,
    categorical_encoder: 'Optional[Any]' = None,
    target_encode: 'bool' = False
)
```

Initialize the arima model with the specified parameters and
configurations.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>target_col</td>
<td>str</td>
<td></td>
<td>Name of the target variable column in the input DataFrame.</td>
</tr>
<tr>
<td>order</td>
<td>Optional[Tuple[int, int, int]]</td>
<td>(0, 0, 0)</td>
<td>The (p, d, q) order of the ARIMA model. Default is (0, 0, 0).</td>
</tr>
<tr>
<td>seasonal_order</td>
<td>Optional[Tuple[int, int, int]]</td>
<td>(0, 0, 0)</td>
<td>The (P, D, Q) order of the seasonal ARIMA model. Default is (0, 0,
0).</td>
</tr>
<tr>
<td>seasonal_length</td>
<td>Optional[int]</td>
<td>1</td>
<td>The seasonal period for the seasonal ARIMA model. Default is 1.</td>
</tr>
<tr>
<td>lag_transform</td>
<td>Optional[list]</td>
<td>None</td>
<td>List of lag-transform function objects to apply to the target
variable (e.g. [expanding_mean(shift=1), rolling_std(window_size=3,
shift=1)]). Each function should take a pandas Series as input and
return a Series of the same length. Default is None (no lag
transforms).</td>
</tr>
<tr>
<td>trend</td>
<td>Optional[str]</td>
<td>None</td>
<td>Trend strategy to use. Options are ‘linear’ for linear trend
removal, ‘ets’ for ETS-based trend removal, ‘feature_lr’ for using
linear trend components as features, and ‘feature_ets’ for using ETS
trend components as features. Default is None (no trend handling).</td>
</tr>
<tr>
<td>pol_degree</td>
<td>int</td>
<td>1</td>
<td>Degree of polynomial trend to fit when using ‘linear’ or
‘feature_lr’ trend strategy. Default is 1 (linear trend).</td>
</tr>
<tr>
<td>ets_params</td>
<td>Optional[Dict[str, Any]]</td>
<td>None</td>
<td>Dictionary of parameters for the ExponentialSmoothing model when
using ‘ets’ trend strategy. The keys should be the parameter names and
the values should be the parameter values. Default is None (use default
ETS parameters).</td>
</tr>
<tr>
<td>change_points</td>
<td>Optional[List[int]]</td>
<td>None</td>
<td>List of indices in the time series where change points occur for
piecewise linear trend fitting. Only used when trend strategy is
‘linear’ or ‘feature_lr’. Default is None (no change points, fit a
single linear trend).</td>
</tr>
<tr>
<td>box_cox</td>
<td>Union[bool, float, int]</td>
<td>False</td>
<td>Whether to apply Box-Cox transformation to the target variable. If a
float or int value is provided, it will be used as the lambda parameter
for the Box-Cox transformation. If True, the lambda parameter will be
estimated from the data.</td>
</tr>
<tr>
<td>box_cox_biasadj</td>
<td>bool</td>
<td>False</td>
<td>Whether to apply bias adjustment when inverting the Box-Cox
transformation on forecasts. Default is False.</td>
</tr>
<tr>
<td>cat_variables</td>
<td>Optional[List[str]]</td>
<td>None</td>
<td>List of categorical feature column names. If provided, these columns
will be treated as categorical variables and encoded accordingly.
Default is None (no categorical variables).</td>
</tr>
<tr>
<td>categorical_encoder</td>
<td>Optional[Any]</td>
<td>None</td>
<td>Categorical encoder object (e.g. OneHotEncoder(), MeanEncoder(),
etc.) to apply to the categorical variables specified in cat_variables.
The encoder should have fit() and transform() methods that can be
applied to the input DataFrame. Default is None (no categorical
encoding) and if None, categorical variables can only be used if the
model can handle them natively (e.g. LGBM or CatBoost).</td>
</tr>
<tr>
<td>target_encode</td>
<td>bool</td>
<td>False</td>
<td></td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>None</strong></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
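
The table does not say how the polynomial and piecewise trend is parameterised. One common construction, assumed here, uses powers of a time index up to `pol_degree` plus a hinge term `max(0, t - c)` for each change point `c`, which lets the slope change after that index. A trend fitted on such columns could then be subtracted from the target (`'linear'`) or supplied as features (`'feature_lr'`). `trend_features` is a hypothetical helper, not the library's code.

``` python
from typing import List, Optional, Sequence


def trend_features(n: int, pol_degree: int = 1,
                   change_points: Optional[Sequence[int]] = None) -> List[List[float]]:
    """Assumed design matrix: polynomial time terms plus one hinge column per change point."""
    rows = []
    for t in range(n):
        row = [float(t) ** d for d in range(1, pol_degree + 1)]
        # Hinge term max(0, t - c) is zero before the change point and grows linearly after it.
        row += [float(max(0, t - c)) for c in (change_points or [])]
        rows.append(row)
    return rows


for row in trend_features(5, pol_degree=1, change_points=[2]):
    print(row)
# [0.0, 0.0]
# [1.0, 0.0]
# [2.0, 0.0]
# [3.0, 1.0]
# [4.0, 2.0]
```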

### peshbeen.models.arima.fit

``` python
fit(
    df: 'pd.DataFrame'
)
```

Fit the model to the training data by applying the specified data
preparation steps and then fitting the ARIMA model.

<table>
<colgroup>
<col style="width: 9%" />
<col style="width: 38%" />
<col style="width: 52%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>df</td>
<td>pd.DataFrame</td>
<td>Training DataFrame containing the target and any feature
columns.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>None</strong></td>
<td></td>
</tr>
</tbody>
</table>

### peshbeen.models.arima.forecast

``` python
forecast(
    H: 'int',
    exog: 'Optional[pd.DataFrame]' = None
)
```

Recursive multi-step forecast.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>H</td>
<td>int</td>
<td></td>
<td>Forecast horizon.</td>
</tr>
<tr>
<td>exog</td>
<td>Optional[pd.DataFrame]</td>
<td>None</td>
<td>Optional DataFrame of future regressors.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>np.ndarray</strong></td>
<td></td>
<td><strong>Forecast values of length <code>H</code>.</strong></td>
</tr>
</tbody>
</table>

### peshbeen.models.arima.cross_validate

``` python
cross_validate(
    df: 'pd.DataFrame',
    cv_split: 'int',
    test_size: 'int',
    metrics: 'List[Callable]',
    step_size: 'int' = 1,
    h_split_point: 'Optional[int]' = None
)
```

Run cross-validation using time series splits.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>df</td>
<td>pd.DataFrame</td>
<td></td>
<td>DataFrame containing the target and any feature columns.</td>
</tr>
<tr>
<td>cv_split</td>
<td>int</td>
<td></td>
<td>Number of cross-validation splits.</td>
</tr>
<tr>
<td>test_size</td>
<td>int</td>
<td></td>
<td>Number of periods in each test set.</td>
</tr>
<tr>
<td>metrics</td>
<td>List[Callable]</td>
<td></td>
<td>Metric functions (e.g. <code>[MAE, RMSE]</code>) used to evaluate
forecast accuracy across folds. Call <code>.cv_summary()</code> after
cross-validation to retrieve the aggregated scores.</td>
</tr>
<tr>
<td>step_size</td>
<td>int</td>
<td>1</td>
<td>Step size to move the test window forward in each split.</td>
</tr>
<tr>
<td>h_split_point</td>
<td>Optional[int]</td>
<td>None</td>
<td>Optional index to split the test set into two parts for separate
evaluation (e.g. to evaluate short-term vs long-term performance). If
None, no split is done.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>Tuple[pd.DataFrame, pd.DataFrame]</strong></td>
<td></td>
<td><strong>DataFrame containing overall performance metrics averaged
across splits, and a DataFrame with predictions and true values for each
split.</strong></td>
</tr>
</tbody>
</table>

## Machine learning forecaster

### peshbeen.models.ml_forecaster

``` python
ml_forecaster(
    model: 'Any',
    target_col: 'str',
    lags: 'Optional[Union[int, List[int]]]' = None,
    lag_transform: 'Optional[list]' = None,
    difference: 'Optional[int]' = None,
    seasonal_diff: 'Optional[int]' = None,
    trend: 'Optional[str]' = None,
    pol_degree: 'int' = 1,
    ets_params: 'Optional[Dict[str, Any]]' = None,
    change_points: 'Optional[List[int]]' = None,
    box_cox: 'Union[bool, float, int]' = False,
    box_cox_biasadj: 'bool' = False,
    cat_variables: 'Optional[List[str]]' = None,
    categorical_encoder: 'Optional[Any]' = None
)
```

Initialize the ml_forecaster with the specified model and preprocessing
options.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>model</td>
<td>Any</td>
<td></td>
<td>A regression model object (e.g. LGBMRegressor(), XGBRegressor(),
CatBoostRegressor(), LinearRegression(), etc.)</td>
</tr>
<tr>
<td>target_col</td>
<td>str</td>
<td></td>
<td>Name of the target variable column in the input DataFrame.</td>
</tr>
<tr>
<td>lags</td>
<td>Optional[Union[int, List[int]]]</td>
<td>None</td>
<td>Lags to include as features. If an integer is provided, lags from 1
to that integer will be included. If a list of integers is provided,
those specific lags will be included. Default is None (no lag
features).</td>
</tr>
<tr>
<td>lag_transform</td>
<td>Optional[list]</td>
<td>None</td>
<td>List of lag-transform function objects to apply to the target
variable (e.g. [expanding_mean(shift=1), rolling_std(window_size=3,
shift=1)]). Each function should take a pandas Series as input and
return a Series of the same length. Default is None (no lag
transforms).</td>
</tr>
<tr>
<td>difference</td>
<td>Optional[int]</td>
<td>None</td>
<td>Order of ordinary differencing to apply to the target variable
(e.g. 1 for first difference). Default is None (no differencing).</td>
</tr>
<tr>
<td>seasonal_diff</td>
<td>Optional[int]</td>
<td>None</td>
<td>Seasonal period for seasonal differencing (e.g. 12 for monthly data
with yearly seasonality). Default is None (no seasonal
differencing).</td>
</tr>
<tr>
<td>trend</td>
<td>Optional[str]</td>
<td>None</td>
<td>Trend strategy to use. Options are ‘linear’ for linear trend
removal, ‘ets’ for ETS-based trend removal, ‘feature_lr’ for using
linear trend components as features, and ‘feature_ets’ for using ETS
trend components as features. Default is None (no trend handling).</td>
</tr>
<tr>
<td>pol_degree</td>
<td>int</td>
<td>1</td>
<td>Degree of polynomial trend to fit when using ‘linear’ or
‘feature_lr’ trend strategy. Default is 1 (linear trend).</td>
</tr>
<tr>
<td>ets_params</td>
<td>Optional[Dict[str, Any]]</td>
<td>None</td>
<td>Dictionary of parameters for the ExponentialSmoothing model when
using ‘ets’ trend strategy. The keys should be the parameter names and
the values should be the parameter values. Default is None (use default
ETS parameters).</td>
</tr>
<tr>
<td>change_points</td>
<td>Optional[List[int]]</td>
<td>None</td>
<td>List of indices in the time series where change points occur for
piecewise linear trend fitting. Only used when trend strategy is
‘linear’ or ‘feature_lr’. Default is None (no change points, fit a
single linear trend).</td>
</tr>
<tr>
<td>box_cox</td>
<td>Union[bool, float, int]</td>
<td>False</td>
<td>Whether to apply Box-Cox transformation to the target variable. If a
float or int value is provided, it will be used as the lambda parameter
for the Box-Cox transformation. If True, the lambda parameter will be
estimated from the data.</td>
</tr>
<tr>
<td>box_cox_biasadj</td>
<td>bool</td>
<td>False</td>
<td>Whether to apply bias adjustment when inverting the Box-Cox
transformation on forecasts. Default is False.</td>
</tr>
<tr>
<td>cat_variables</td>
<td>Optional[List[str]]</td>
<td>None</td>
<td>List of categorical feature column names. If provided, these columns
will be treated as categorical variables and encoded accordingly.
Default is None (no categorical variables).</td>
</tr>
<tr>
<td>categorical_encoder</td>
<td>Optional[Any]</td>
<td>None</td>
<td>Categorical encoder object (e.g. OneHotEncoder(), MeanEncoder(),
etc.) to apply to the categorical variables specified in cat_variables.
The encoder should have fit() and transform() methods that can be
applied to the input DataFrame. Default is None (no categorical
encoding) and if None, categorical variables can only be used if the
model can handle them natively (e.g. LGBM or CatBoost).</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>None</strong></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
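
To make `lags` and `lag_transform` concrete: an integer `lags=k` expands to lags `1..k`, while a list selects specific lags, and each training row pairs those past values with the current target. The sketch below also shows a shifted expanding mean, where the value at time `t` uses only observations strictly before `t`, which is the leakage-free behaviour a transform like `expanding_mean(shift=1)` suggests. Both helpers are hypothetical and work on plain lists, whereas the documented transforms take pandas Series.

``` python
from typing import List, Sequence, Tuple, Union


def lag_matrix(y: Sequence[float], lags: Union[int, Sequence[int]]) -> Tuple[List[List[float]], List[float]]:
    """Build (X, target) from lagged values of y (hypothetical helper)."""
    lag_list = list(range(1, lags + 1)) if isinstance(lags, int) else sorted(lags)
    max_lag = max(lag_list)
    X, target = [], []
    # Start at max_lag so every requested lag is available.
    for t in range(max_lag, len(y)):
        X.append([y[t - k] for k in lag_list])
        target.append(y[t])
    return X, target


def expanding_mean_shift1(y: Sequence[float]) -> List[float]:
    """Mean of all values strictly before t; NaN at t = 0 (hypothetical helper)."""
    out, running = [], 0.0
    for t, value in enumerate(y):
        out.append(running / t if t > 0 else float("nan"))
        running += value
    return out


print(lag_matrix([1, 2, 3, 4, 5], 2))     # ([[2, 1], [3, 2], [4, 3]], [3, 4, 5])
print(expanding_mean_shift1([2, 4, 6]))  # [nan, 2.0, 3.0]
```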

### peshbeen.models.ml_forecaster.fit

``` python
fit(
    df: 'pd.DataFrame'
)
```

Fit the model to the training data after applying the specified data
preparation steps.

<table>
<colgroup>
<col style="width: 9%" />
<col style="width: 38%" />
<col style="width: 52%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>df</td>
<td>pd.DataFrame</td>
<td>Training DataFrame containing the target and any feature
columns.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>None</strong></td>
<td></td>
</tr>
</tbody>
</table>
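
When `difference` or `seasonal_diff` is set, the model is trained on differenced values, so forecasts must be integrated back onto the original scale. The sketch below shows one differencing pass with lag 1 (ordinary) or lag `m` (seasonal) and its inverse; higher orders would apply `difference` repeatedly and undo the passes in reverse order. The order in which `peshbeen` combines ordinary and seasonal differencing is not documented here, and both helpers are hypothetical.

``` python
from typing import List, Sequence


def difference(y: Sequence[float], lag: int = 1) -> List[float]:
    """y_t - y_{t-lag}: lag=1 is ordinary, lag=m is seasonal differencing."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]


def undifference(history: Sequence[float], diff_forecasts: Sequence[float], lag: int = 1) -> List[float]:
    """Map forecasts of differences back to the original scale."""
    extended = list(history)
    for d in diff_forecasts:
        # Add the forecast difference to the value one lag earlier; after the
        # first `lag` steps that earlier value is itself a forecast.
        extended.append(extended[-lag] + d)
    return extended[len(history):]


y = [1, 2, 3, 11, 12, 13]
print(difference(y, lag=3))                      # [10, 10, 10]
print(undifference(y, [10, 10, 10, 10], lag=3))  # [21, 22, 23, 31]
```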

### peshbeen.models.ml_forecaster.forecast

``` python
forecast(
    H: 'int',
    exog: 'Optional[pd.DataFrame]' = None
)
```

Recursive multi-step forecast.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>H</td>
<td>int</td>
<td></td>
<td>Forecast horizon.</td>
</tr>
<tr>
<td>exog</td>
<td>Optional[pd.DataFrame]</td>
<td>None</td>
<td>Optional DataFrame of future regressors.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>np.ndarray</strong></td>
<td></td>
<td><strong>Forecast values of length <code>H</code>.</strong></td>
</tr>
</tbody>
</table>
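
"Recursive" multi-step forecasting means the model predicts one step ahead, appends that prediction to the history, and reuses it as a lag feature for the next step. In the sketch below, `predict_one` stands in for calling a fitted regressor's `predict` on a single feature row, and exogenous columns are omitted; `recursive_forecast` is a hypothetical helper, not the library's internals.

``` python
from typing import Callable, List, Sequence


def recursive_forecast(history: Sequence[float], lags: Sequence[int],
                       predict_one: Callable[[List[float]], float], H: int) -> List[float]:
    """Recursive strategy: each prediction is fed back as a lag for later steps."""
    series = list(history)
    forecasts = []
    for _ in range(H):
        features = [series[-k] for k in lags]  # lag k = value k steps back
        y_hat = predict_one(features)
        forecasts.append(y_hat)
        series.append(y_hat)  # the prediction becomes history for the next step
    return forecasts


# Toy "model" y_t = y_{t-1} + y_{t-2}, standing in for a fitted regressor.
print(recursive_forecast([1, 1], lags=[1, 2], predict_one=lambda x: x[0] + x[1], H=4))  # [2, 3, 5, 8]
```

Because later steps consume earlier predictions, errors can compound with the horizon, which is one motivation for evaluating short- and long-term accuracy separately via `h_split_point`.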

### peshbeen.models.ml_forecaster.cross_validate

``` python
cross_validate(
    df: 'pd.DataFrame',
    cv_split: 'int',
    test_size: 'int',
    metrics: 'List[Callable]',
    step_size: 'int' = 1,
    h_split_point: 'Optional[int]' = None
)
```

Run cross-validation using time series splits.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>df</td>
<td>pd.DataFrame</td>
<td></td>
<td>DataFrame containing the target and any feature columns.</td>
</tr>
<tr>
<td>cv_split</td>
<td>int</td>
<td></td>
<td>Number of cross-validation splits.</td>
</tr>
<tr>
<td>test_size</td>
<td>int</td>
<td></td>
<td>Number of periods in each test set.</td>
</tr>
<tr>
<td>metrics</td>
<td>List[Callable]</td>
<td></td>
<td>Metric functions (e.g. <code>[MAE, RMSE]</code>) used to evaluate
forecast accuracy across folds. Call <code>.cv_summary()</code> after
cross-validation to retrieve the aggregated scores.</td>
</tr>
<tr>
<td>step_size</td>
<td>int</td>
<td>1</td>
<td>Step size to move the test window forward in each split.</td>
</tr>
<tr>
<td>h_split_point</td>
<td>Optional[int]</td>
<td>None</td>
<td>Optional index to split the test set into two parts for separate
evaluation (e.g. to evaluate short-term vs long-term performance). If
None, no split is done.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>Tuple[pd.DataFrame, pd.DataFrame]</strong></td>
<td></td>
<td><strong>DataFrame containing overall performance metrics averaged
across splits, and a DataFrame with predictions and true values for each
split.</strong></td>
</tr>
</tbody>
</table>

## GLM (Generalized Linear Model) forecaster

### peshbeen.models.glm

``` python
glm(
    family: 'Any',
    target_col: 'str',
    lags: 'Optional[Union[int, List[int]]]' = None,
    lag_transform: 'Optional[list]' = None,
    difference: 'Optional[int]' = None,
    seasonal_diff: 'Optional[int]' = None,
    trend: 'Optional[str]' = None,
    pol_degree: 'int' = 1,
    ets_params: 'Optional[Dict[str, Any]]' = None,
    change_points: 'Optional[List[int]]' = None,
    box_cox: 'Union[bool, float, int]' = False,
    box_cox_biasadj: 'bool' = False,
    add_constant: 'bool' = True,
    cat_variables: 'Optional[List[str]]' = None,
    categorical_encoder: 'Optional[Any]' = None,
    offset: 'Optional[np.ndarray]' = None,
    exposure: 'Optional[np.ndarray]' = None,
    freq_weights: 'Optional[np.ndarray]' = None,
    var_weights: 'Optional[np.ndarray]' = None,
    missing: 'Optional[str]' = None
)
```

Initialize the glm forecaster with the specified model and data
preparation parameters.
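
With `family=sm.families.Poisson()`, statsmodels uses the log link by default, so a prediction is the exponentiated linear predictor. In statsmodels' `GLM`, `offset` is added to the linear predictor and `exposure` enters as `log(exposure)`, which is only meaningful with a log link. The sketch computes that mean for a single row; how `peshbeen` supplies `offset` and `exposure` at forecast time is not shown here, and `poisson_glm_mean` is a hypothetical helper.

``` python
import math
from typing import Optional, Sequence


def poisson_glm_mean(x: Sequence[float], beta: Sequence[float],
                     offset: float = 0.0, exposure: Optional[float] = None) -> float:
    """Log-link GLM mean: exp(x . beta + offset + log(exposure)) (hypothetical helper)."""
    eta = sum(xi * bi for xi, bi in zip(x, beta)) + offset
    if exposure is not None:
        eta += math.log(exposure)  # exposure enters the linear predictor on the log scale
    return math.exp(eta)


print(poisson_glm_mean([1.0, 0.0], [0.0, 1.0], exposure=2.0))  # ~2.0
```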

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>family</td>
<td>Any</td>
<td></td>
<td>A statsmodels family object specifying the error distribution and
link function for the GLM (e.g. family=sm.families.Poisson() for count
data, family=sm.families.Binomial() for binary data, etc.). After
<code>import statsmodels.api as sm</code>, the available families are
accessible via <code>sm.families</code>.</td>
</tr>
<tr>
<td>target_col</td>
<td>str</td>
<td></td>
<td>Name of the target variable column in the input DataFrame.</td>
</tr>
<tr>
<td>lags</td>
<td>Optional[Union[int, List[int]]]</td>
<td>None</td>
<td>Lags to include as features. If an integer is provided, lags from 1
to that integer will be included. If a list of integers is provided,
those specific lags will be included. Default is None (no lag
features).</td>
</tr>
<tr>
<td>lag_transform</td>
<td>Optional[list]</td>
<td>None</td>
<td>List of lag-transform function objects to apply to the target
variable (e.g. [expanding_mean(shift=1), rolling_std(window_size=3,
shift=1)]). Each function should take a pandas Series as input and
return a Series of the same length. Default is None (no lag
transforms).</td>
</tr>
<tr>
<td>difference</td>
<td>Optional[int]</td>
<td>None</td>
<td>Order of ordinary differencing to apply to the target variable
(e.g. 1 for first difference). Default is None (no differencing).</td>
</tr>
<tr>
<td>seasonal_diff</td>
<td>Optional[int]</td>
<td>None</td>
<td>Seasonal period for seasonal differencing (e.g. 12 for monthly data
with yearly seasonality). Default is None (no seasonal
differencing).</td>
</tr>
<tr>
<td>trend</td>
<td>Optional[str]</td>
<td>None</td>
<td>Trend strategy to use. Options are ‘linear’ for linear trend
removal, ‘ets’ for ETS-based trend removal, ‘feature_lr’ for using
linear trend components as features, and ‘feature_ets’ for using ETS
trend components as features. Default is None (no trend handling).</td>
</tr>
<tr>
<td>pol_degree</td>
<td>int</td>
<td>1</td>
<td>Degree of polynomial trend to fit when using ‘linear’ or
‘feature_lr’ trend strategy. Default is 1 (linear trend).</td>
</tr>
<tr>
<td>ets_params</td>
<td>Optional[Dict[str, Any]]</td>
<td>None</td>
<td>Dictionary of parameters for the ExponentialSmoothing model when
using ‘ets’ trend strategy. The keys should be the parameter names and
the values should be the parameter values. Default is None (use default
ETS parameters).</td>
</tr>
<tr>
<td>change_points</td>
<td>Optional[List[int]]</td>
<td>None</td>
<td>List of indices in the time series where change points occur for
piecewise linear trend fitting. Only used when trend strategy is
‘linear’ or ‘feature_lr’. Default is None (no change points, fit a
single linear trend).</td>
</tr>
<tr>
<td>box_cox</td>
<td>Union[bool, float, int]</td>
<td>False</td>
<td>Whether to apply Box-Cox transformation to the target variable. If a
float or int value is provided, it will be used as the lambda parameter
for the Box-Cox transformation. If True, the lambda parameter will be
estimated from the data.</td>
</tr>
<tr>
<td>box_cox_biasadj</td>
<td>bool</td>
<td>False</td>
<td>Whether to apply bias adjustment when inverting the Box-Cox
transformation on forecasts. Default is False.</td>
</tr>
<tr>
<td>add_constant</td>
<td>bool</td>
<td>True</td>
<td>If True, add a constant (intercept) column to the regressor matrix.
Default is True.</td>
</tr>
<tr>
<td>cat_variables</td>
<td>Optional[List[str]]</td>
<td>None</td>
<td>List of categorical feature column names. If provided, these columns
will be treated as categorical variables and encoded accordingly.
Default is None (no categorical variables).</td>
</tr>
<tr>
<td>categorical_encoder</td>
<td>Optional[Any]</td>
<td>None</td>
<td>Categorical encoder object (e.g. OneHotEncoder(), MeanEncoder()) to
apply to the categorical variables specified in cat_variables. The
encoder must provide fit() and transform() methods that operate on the
input DataFrame. Default is None (no categorical encoding); in that
case, categorical variables can only be used if the model handles them
natively (e.g. LGBM or CatBoost).</td>
</tr>
<tr>
<td>offset</td>
<td>Optional[np.ndarray]</td>
<td>None</td>
<td>An offset to be included in the model. If provided, must be an array
whose length is the number of rows in exog.</td>
</tr>
<tr>
<td>exposure</td>
<td>Optional[np.ndarray]</td>
<td>None</td>
<td>Log(exposure) will be added to the linear prediction in the model.
Exposure is only valid if the log link is used. If provided, it must be
an array with the same length as endog.</td>
</tr>
<tr>
<td>freq_weights</td>
<td>Optional[np.ndarray]</td>
<td>None</td>
<td>1d array of frequency weights. Default is None, in which case an
array of ones with the same length as endog is used. WARNING: weights
have not yet been verified for all possible options and results; see the
Notes in the statsmodels documentation.</td>
</tr>
<tr>
<td>var_weights</td>
<td>Optional[np.ndarray]</td>
<td>None</td>
<td>1d array of variance (analytic) weights. Default is None, in which
case an array of ones with the same length as endog is used. WARNING:
weights have not yet been verified for all possible options and results;
see the Notes in the statsmodels documentation.</td>
</tr>
<tr>
<td>missing</td>
<td>Optional[str]</td>
<td>None</td>
<td>Available options are ‘none’, ‘drop’, and ‘raise’. If ‘none’, no NaN
checking is done; if ‘drop’, any observations with NaNs are dropped; if
‘raise’, an error is raised. statsmodels’ own default is ‘none’.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>None</strong></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
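The `box_cox` and `box_cox_biasadj` parameters refer to the Box-Cox power transform and to the usual bias adjustment applied when forecasts are back-transformed. The sketch below assumes the adjustment formula used, for example, by R's forecast package (`InvBoxCox` with `biasadj=TRUE`): without adjustment the back-transformed forecast is a median, and with it the result approximates the mean. The helpers `box_cox`, `inv_box_cox` and the `fvar` argument (forecast variance on the transformed scale) are hypothetical illustrations, not peshbeen's API.

```python
import numpy as np

def box_cox(y, lam):
    """Forward Box-Cox transform for strictly positive y."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam

def inv_box_cox(w, lam, biasadj=False, fvar=None):
    """Invert Box-Cox; optionally bias-adjust towards the mean.

    fvar is the forecast variance on the transformed scale (hypothetical
    argument, required only when biasadj=True).
    """
    w = np.asarray(w, dtype=float)
    out = np.exp(w) if lam == 0 else (lam * w + 1.0) ** (1.0 / lam)
    if biasadj:
        if fvar is None:
            raise ValueError("fvar is required when biasadj=True")
        # Second-order correction; for lam == 0 this reduces to exp(w) * (1 + fvar / 2)
        out = out * (1.0 + 0.5 * fvar * (1.0 - lam) / out ** (2.0 * lam))
    return out

y = np.array([1.0, 2.0, 4.0, 8.0])
print(np.allclose(inv_box_cox(box_cox(y, 0.5), 0.5), y))    # round trip recovers y
print(inv_box_cox(np.log(2.0), 0.0, biasadj=True, fvar=0.2))  # adjusted mean above the median of 2
```

The adjusted value is always larger than the unadjusted one when `fvar > 0` and `lam < 1`, which is why bias adjustment matters mostly for aggregating forecasts (sums of medians are not medians of sums).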

### peshbeen.models.glm.fit

``` python
fit(
    df: 'pd.DataFrame'
)
```

Fit the model to the training data after applying the specified data
preparation steps.

<table>
<colgroup>
<col style="width: 9%" />
<col style="width: 38%" />
<col style="width: 52%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>df</td>
<td>pd.DataFrame</td>
<td>Training DataFrame containing the target and any feature
columns.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>None</strong></td>
<td></td>
</tr>
</tbody>
</table>
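One of the preparation steps applied before fitting is building lag features from the `lags` argument: an integer `k` means lags `1..k`, while a list selects exactly those lags. A minimal pandas sketch of that expansion follows; the helper `make_lag_features`, the `<target>_lag_<k>` column naming, and dropping the incomplete leading rows are assumptions for illustration, not peshbeen's internals.

```python
import pandas as pd

def make_lag_features(df, target_col, lags):
    """Hypothetical helper: add lagged copies of the target column.

    An int k means lags 1..k; a list means exactly those lags.
    Rows with missing lag values (the first max(lags) rows) are dropped.
    """
    lag_list = list(range(1, lags + 1)) if isinstance(lags, int) else list(lags)
    out = df.copy()
    for k in lag_list:
        out[f"{target_col}_lag_{k}"] = out[target_col].shift(k)
    return out.dropna()

df = pd.DataFrame({"y": [10.0, 11.0, 13.0, 12.0, 15.0, 16.0]})
print(make_lag_features(df, "y", 2))       # columns y, y_lag_1, y_lag_2
print(make_lag_features(df, "y", [1, 3]))  # columns y, y_lag_1, y_lag_3
```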

### peshbeen.models.glm.forecast

``` python
forecast(
    H: 'int',
    exog: 'Optional[pd.DataFrame]' = None
)
```

Recursive multi-step forecast.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>H</td>
<td>int</td>
<td></td>
<td>Forecast horizon.</td>
</tr>
<tr>
<td>exog</td>
<td>Optional[pd.DataFrame]</td>
<td>None</td>
<td>Optional DataFrame of future regressors.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>np.ndarray</strong></td>
<td></td>
<td><strong>Forecast values of length <code>H</code>.</strong></td>
</tr>
</tbody>
</table>
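"Recursive" means each step's prediction is fed back in as a lag feature for the next step, so no future target values are needed. The sketch below assumes a single lag-1 feature and a log link (the default link of `sm.families.Poisson()`); `recursive_forecast` and its arguments are hypothetical, and peshbeen's handling of lag transforms, differencing and exogenous columns is not reproduced.

```python
import numpy as np

def recursive_forecast(last_value, intercept, coef, H, inv_link=np.exp):
    """Recursive multi-step forecast for a model with one lag-1 feature.

    The prediction on the response scale at step h becomes the lag-1
    feature at step h + 1.
    """
    preds = np.empty(H)
    lag1 = last_value
    for h in range(H):
        eta = intercept + coef * lag1  # linear predictor
        preds[h] = inv_link(eta)       # exp for a log link
        lag1 = preds[h]                # feed the prediction back as the next lag
    return preds

print(recursive_forecast(last_value=2.0, intercept=0.5, coef=0.1, H=3))
```

Because later steps depend on earlier predictions rather than observations, forecast errors compound with the horizon; this is one reason to inspect short- and long-horizon accuracy separately.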

### peshbeen.models.glm.cross_validate

``` python
cross_validate(
    df: 'pd.DataFrame',
    cv_split: 'int',
    test_size: 'int',
    metrics: 'List[Callable]',
    step_size: 'int' = 1,
    h_split_point: 'Optional[int]' = None
)
```

Run cross-validation using time series splits.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>df</td>
<td>pd.DataFrame</td>
<td></td>
<td>DataFrame containing the target and any feature columns.</td>
</tr>
<tr>
<td>cv_split</td>
<td>int</td>
<td></td>
<td>Number of cross-validation splits.</td>
</tr>
<tr>
<td>test_size</td>
<td>int</td>
<td></td>
<td>Number of periods in each test set.</td>
</tr>
<tr>
<td>metrics</td>
<td>List[Callable]</td>
<td></td>
<td>Metric functions (e.g. <code>[MAE, RMSE]</code>) used to evaluate
forecast accuracy across folds. Call <code>.cv_summary()</code> after
cross-validation to retrieve the aggregated scores.</td>
</tr>
<tr>
<td>step_size</td>
<td>int</td>
<td>1</td>
<td>Step size to move the test window forward in each split.</td>
</tr>
<tr>
<td>h_split_point</td>
<td>Optional[int]</td>
<td>None</td>
<td>Optional index to split the test set into two parts for separate
evaluation (e.g. to evaluate short-term vs long-term performance). If
None, no split is done.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>Tuple[pd.DataFrame, pd.DataFrame]</strong></td>
<td></td>
<td><strong>DataFrame containing overall performance metrics averaged
across splits, and a DataFrame with predictions and true values for each
split.</strong></td>
</tr>
</tbody>
</table>
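The interplay of `cv_split`, `test_size` and `step_size` can be pictured as a set of rolling forecast origins. The sketch below shows one common layout, assuming an expanding training window and that the last fold's test window ends at the final observation, with earlier folds shifted back by `step_size`; peshbeen's exact indexing may differ, and `ts_cv_splits` is a hypothetical helper.

```python
def ts_cv_splits(n, cv_split, test_size, step_size=1):
    """Hypothetical expanding-window splits as (train_range, test_range) pairs."""
    splits = []
    for i in range(cv_split):
        # Last fold ends at n; each earlier fold ends step_size earlier
        test_end = n - (cv_split - 1 - i) * step_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough observations for this configuration")
        splits.append((range(0, test_start), range(test_start, test_end)))
    return splits

for train, test in ts_cv_splits(n=20, cv_split=3, test_size=4, step_size=2):
    print(f"train [0, {train.stop})  test [{test.start}, {test.stop})")
```

With `step_size` smaller than `test_size`, consecutive test windows overlap, so each observation can be scored more than once across folds.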

## MS-ARR (Markov Switching Autoregressive Regression) forecaster

### peshbeen.models.ms_arr

``` python
ms_arr(
    n_components: 'int',
    target_col: 'str',
    lags: 'Optional[Union[int, List[int]]]' = None,
    lag_transform: 'Optional[list]' = None,
    difference: 'Optional[int]' = None,
    seasonal_diff: 'Optional[int]' = None,
    trend: 'Optional[str]' = None,
    pol_degree: 'int' = 1,
    ets_params: 'Optional[Dict[str, Any]]' = None,
    change_points: 'Optional[List[int]]' = None,
    box_cox: 'Union[bool, float, int]' = False,
    box_cox_biasadj: 'bool' = False,
    add_constant: 'bool' = True,
    cat_variables: 'Optional[List[str]]' = None,
    categorical_encoder: 'Optional[Any]' = None,
    method: 'str' = 'posterior',
    switching_var: 'bool' = True,
    startprob_prior: 'float' = 1000.0,
    transmat_prior: 'float' = 1000.0,
    n_iter: 'int' = 100,
    tol: 'float' = 0.001,
    ridge: 'float' = 1e-05,
    coefficients: 'Optional[np.ndarray]' = None,
    stds: 'Optional[np.ndarray]' = None,
    init_state: 'Optional[np.ndarray]' = None,
    trans_matrix: 'Optional[np.ndarray]' = None,
    random_state: 'int' = 42,
    verbose: 'bool' = False
)
```

Initialize the MS-ARR model with the specified parameters.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>n_components</td>
<td>int</td>
<td></td>
<td>Number of hidden states (regimes).</td>
</tr>
<tr>
<td>target_col</td>
<td>str</td>
<td></td>
<td>Name of the target variable.</td>
</tr>
<tr>
<td>lags</td>
<td>Optional[Union[int, List[int]]]</td>
<td>None</td>
<td>Lags for the autoregressive model.</td>
</tr>
<tr>
<td>lag_transform</td>
<td>Optional[list]</td>
<td>None</td>
<td>List of lag-transform function objects applied to the target.</td>
</tr>
<tr>
<td>difference</td>
<td>Optional[int]</td>
<td>None</td>
<td>Order of ordinary differencing (e.g. 1 for first difference).</td>
</tr>
<tr>
<td>seasonal_diff</td>
<td>Optional[int]</td>
<td>None</td>
<td>Seasonal period for seasonal differencing.</td>
</tr>
<tr>
<td>trend</td>
<td>Optional[str]</td>
<td>None</td>
<td>Trend strategy: ‘linear’ or ‘ets’.</td>
</tr>
<tr>
<td>pol_degree</td>
<td>int</td>
<td>1</td>
<td>Degree of polynomial trend (default: 1). Used when
trend=‘linear’.</td>
</tr>
<tr>
<td>ets_params</td>
<td>Optional[Dict[str, Any]]</td>
<td>None</td>
<td>Dictionary of parameters for the ExponentialSmoothing model when
using ‘ets’ trend strategy. The keys should be the parameter names and
the values should be the parameter values. Default is None (use default
ETS parameters).</td>
</tr>
<tr>
<td>change_points</td>
<td>Optional[List[int]]</td>
<td>None</td>
<td>Change points for piecewise linear trend. List of indices where the
trend slope can change.</td>
</tr>
<tr>
<td>box_cox</td>
<td>Union[bool, float, int]</td>
<td>False</td>
<td>Whether to apply Box-Cox transformation to the target variable. If a
float or int value is provided, it will be used as the lambda parameter
for the Box-Cox transformation. If True, the lambda parameter will be
estimated from the data.</td>
</tr>
<tr>
<td>box_cox_biasadj</td>
<td>bool</td>
<td>False</td>
<td>Whether to apply bias adjustment when inverting Box-Cox
transformation (default: False).</td>
</tr>
<tr>
<td>add_constant</td>
<td>bool</td>
<td>True</td>
<td>If True, prepend a constant column to the regressor matrix (default:
True).</td>
</tr>
<tr>
<td>cat_variables</td>
<td>Optional[List[str]]</td>
<td>None</td>
<td>List of categorical feature column names. If provided, these columns
will be treated as categorical variables and encoded accordingly.
Default is None (no categorical variables).</td>
</tr>
<tr>
<td>categorical_encoder</td>
<td>Optional[Any]</td>
<td>None</td>
<td>Categorical encoder object (e.g. OneHotEncoder(), MeanEncoder()) to
apply to the categorical variables specified in cat_variables. The
encoder must provide fit() and transform() methods that operate on the
input DataFrame. Default is None (no categorical encoding); in that
case, categorical variables can only be used if the model handles them
natively (e.g. LGBM or CatBoost).</td>
</tr>
<tr>
<td>method</td>
<td>str</td>
<td>posterior</td>
<td>State assignment method: ‘posterior’ (soft) or ‘viterbi’ (hard).
Default: ‘posterior’.</td>
</tr>
<tr>
<td>switching_var</td>
<td>bool</td>
<td>True</td>
<td>If True, each regime has its own variance. If False, a single pooled
variance is shared across all regimes. Default: True.</td>
</tr>
<tr>
<td>startprob_prior</td>
<td>float</td>
<td>1000.0</td>
<td>Dirichlet concentration for initial state distribution. Default:
1e3.</td>
</tr>
<tr>
<td>transmat_prior</td>
<td>float</td>
<td>1000.0</td>
<td>Dirichlet concentration for transition matrix rows. Default:
1e3.</td>
</tr>
<tr>
<td>n_iter</td>
<td>int</td>
<td>100</td>
<td>Maximum EM iterations. Default: 100.</td>
</tr>
<tr>
<td>tol</td>
<td>float</td>
<td>0.001</td>
<td>Convergence tolerance on log-likelihood. Default: 1e-3.</td>
</tr>
<tr>
<td>ridge</td>
<td>float</td>
<td>1e-05</td>
<td>Ridge regularisation parameter for coefficient estimation. Default:
1e-5.</td>
</tr>
<tr>
<td>coefficients</td>
<td>Optional[np.ndarray]</td>
<td>None</td>
<td>Initial regression coefficients, shape
<code>(n_components, n_features)</code>.</td>
</tr>
<tr>
<td>stds</td>
<td>Optional[np.ndarray]</td>
<td>None</td>
<td>Initial per-state standard deviations, shape
<code>(n_components,)</code>.</td>
</tr>
<tr>
<td>init_state</td>
<td>Optional[np.ndarray]</td>
<td>None</td>
<td>Initial state probability vector, shape
<code>(n_components,)</code>.</td>
</tr>
<tr>
<td>trans_matrix</td>
<td>Optional[np.ndarray]</td>
<td>None</td>
<td>Initial transition matrix, shape
<code>(n_components, n_components)</code>.</td>
</tr>
<tr>
<td>random_state</td>
<td>int</td>
<td>42</td>
<td>Random seed for reproducibility. Default: 42.</td>
</tr>
<tr>
<td>verbose</td>
<td>bool</td>
<td>False</td>
<td>If True, print EM progress. Default: False.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>None</strong></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
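To make the roles and shapes of `coefficients`, `stds`, `init_state` and `trans_matrix` concrete, the sketch below simulates a two-regime Markov-switching AR(1) with an intercept. It assumes each row of `coefficients` is ordered `[constant, lag-1]` (as with `add_constant=True` and `lags=1`) and that regime-specific noise corresponds to `switching_var=True`; `simulate_ms_ar1` is a hypothetical helper, not part of peshbeen.

```python
import numpy as np

def simulate_ms_ar1(n, coefficients, stds, init_state, trans_matrix, seed=42):
    """Simulate a Markov-switching AR(1) series with an intercept.

    coefficients: (n_components, 2) rows of [constant, lag-1 coefficient]
    stds:         (n_components,) regime noise standard deviations
    init_state:   (n_components,) initial state probabilities
    trans_matrix: (n_components, n_components), each row sums to 1
    """
    rng = np.random.default_rng(seed)
    k = len(stds)
    states = np.empty(n, dtype=int)
    y = np.empty(n)
    states[0] = rng.choice(k, p=init_state)
    c0, phi0 = coefficients[states[0]]
    y[0] = c0 / (1.0 - phi0)  # start at the first regime's long-run mean
    for t in range(1, n):
        # Draw the next regime from the row of the current regime
        states[t] = rng.choice(k, p=trans_matrix[states[t - 1]])
        c, phi = coefficients[states[t]]
        y[t] = c + phi * y[t - 1] + stds[states[t]] * rng.standard_normal()
    return y, states

coefficients = np.array([[1.0, 0.5], [-1.0, 0.8]])
stds = np.array([0.5, 1.5])
init_state = np.array([0.5, 0.5])
trans_matrix = np.array([[0.95, 0.05], [0.10, 0.90]])
y, states = simulate_ms_ar1(200, coefficients, stds, init_state, trans_matrix)
print(y[:5].round(3), states[:20])
```

Large diagonal entries in `trans_matrix` produce persistent regimes; the Dirichlet priors `startprob_prior` and `transmat_prior` regularise these probabilities during EM.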

### peshbeen.models.ms_arr.fit

``` python
fit(
    df: 'pd.DataFrame'
)
```

Fit the model using the EM algorithm.

<table>
<colgroup>
<col style="width: 9%" />
<col style="width: 38%" />
<col style="width: 52%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>df</td>
<td>pd.DataFrame</td>
<td>Training DataFrame containing the target and any feature
columns.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>float</strong></td>
<td><strong>Final log-likelihood after EM terminates (on convergence or
after <code>n_iter</code> iterations).</strong></td>
</tr>
</tbody>
</table>
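The log-likelihood of a Markov-switching model is commonly computed with a forward (Hamilton) filter, which also yields the filtered state probabilities used inside EM. The sketch below is a minimal version for an AR(1) with intercept, assuming `init_state` is the state distribution at the first modelled observation and rescaling at every step to avoid underflow; it is not necessarily how peshbeen computes its value, and the backward pass needed for smoothed (posterior) probabilities is omitted.

```python
import numpy as np

def ms_ar1_loglik(y, coefficients, stds, init_state, trans_matrix):
    """Forward filter for a Markov-switching AR(1) with intercept.

    Returns the log-likelihood of y[1:] given y[0], and the filtered
    state probabilities P(s_t | y_1..y_t) for each modelled step.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])  # [constant, lag-1]
    means = X @ coefficients.T                           # (n - 1, k) regime means
    dens = np.exp(-0.5 * ((y[1:, None] - means) / stds) ** 2) / (np.sqrt(2 * np.pi) * stds)
    prob = np.asarray(init_state, dtype=float)
    filtered = np.empty_like(dens)
    loglik = 0.0
    for t in range(dens.shape[0]):
        pred = prob if t == 0 else prob @ trans_matrix  # one-step-ahead state probabilities
        joint = pred * dens[t]
        lik_t = joint.sum()                              # p(y_t | y_1..y_{t-1})
        loglik += np.log(lik_t)
        prob = joint / lik_t                             # rescaled filtered probabilities
        filtered[t] = prob
    return loglik, filtered

coefficients = np.array([[1.0, 0.5], [-1.0, 0.8]])
stds = np.array([0.5, 1.5])
y_obs = np.array([2.0, 2.1, 1.9, -0.5, -1.8, -2.5])
ll, filtered = ms_ar1_loglik(y_obs, coefficients, stds, np.array([0.5, 0.5]),
                             np.array([[0.95, 0.05], [0.10, 0.90]]))
print(ll)
print(filtered.round(3))  # each row sums to 1
```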

### peshbeen.models.ms_arr.forecast

``` python
forecast(
    H: 'int',
    exog: 'Optional[pd.DataFrame]' = None
)
```

Generate forecasts for H future time steps.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>H</td>
<td>int</td>
<td></td>
<td>Forecast horizon (number of steps to forecast ahead).</td>
</tr>
<tr>
<td>exog</td>
<td>Optional[pd.DataFrame]</td>
<td>None</td>
<td>Future exogenous regressors (must contain at least H rows). Should
have the same columns as the training data (excluding the target
variable).</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>np.ndarray</strong></td>
<td></td>
<td><strong>Forecast values of length <code>H</code>.</strong></td>
</tr>
</tbody>
</table>
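With `method='posterior'`, a natural way to produce multi-step forecasts is to push the last state probabilities through the transition matrix and average the regime-specific predictions. The sketch below does exactly that for an AR(1) with intercept and feeds the averaged prediction back as the next lag; this is a common approximation rather than the exact conditional mean, and it is an assumption, not peshbeen's documented algorithm. For a hard (`'viterbi'`) assignment one would instead use the single most likely state. `ms_ar1_forecast` is hypothetical.

```python
import numpy as np

def ms_ar1_forecast(last_y, last_probs, coefficients, trans_matrix, H):
    """Probability-weighted recursive forecast for an MS-AR(1) with intercept."""
    probs = np.asarray(last_probs, dtype=float)
    y_prev = float(last_y)
    preds = np.empty(H)
    for h in range(H):
        probs = probs @ trans_matrix                     # state distribution one step ahead
        regime_preds = coefficients[:, 0] + coefficients[:, 1] * y_prev
        preds[h] = probs @ regime_preds                  # weight regime predictions
        y_prev = preds[h]                                # feed back as the next lag
    return preds

coefficients = np.array([[1.0, 0.5], [-1.0, 0.8]])
trans_matrix = np.array([[0.95, 0.05], [0.10, 0.90]])
print(ms_ar1_forecast(2.0, [0.9, 0.1], coefficients, trans_matrix, H=5))
```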

### peshbeen.models.ms_arr.cross_validate

``` python
cross_validate(
    df: 'pd.DataFrame',
    cv_split: 'int',
    test_size: 'int',
    metrics: 'List[Callable]',
    step_size: 'int' = 1,
    n_iter: 'int' = 1,
    h_split_point: 'Optional[int]' = None
)
```

Run cross-validation using time series splits.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>df</td>
<td>pd.DataFrame</td>
<td></td>
<td>DataFrame containing the target and any feature columns.</td>
</tr>
<tr>
<td>cv_split</td>
<td>int</td>
<td></td>
<td>Number of cross-validation splits.</td>
</tr>
<tr>
<td>test_size</td>
<td>int</td>
<td></td>
<td>Number of time steps in the test set for each split.</td>
</tr>
<tr>
<td>metrics</td>
<td>List[Callable]</td>
<td></td>
<td>Metric functions (e.g. <code>[MAE, RMSE]</code>) used to evaluate
forecast accuracy across folds. Call <code>.cv_summary()</code> after
cross-validation to retrieve the aggregated scores.</td>
</tr>
<tr>
<td>step_size</td>
<td>int</td>
<td>1</td>
<td>Step size between the start of each test set in the splits.</td>
</tr>
<tr>
<td>n_iter</td>
<td>int</td>
<td>1</td>
<td>Number of EM iterations to run for each training fold.</td>
</tr>
<tr>
<td>h_split_point</td>
<td>Optional[int]</td>
<td>None</td>
<td>If provided, split the test set into two parts at this index and
evaluate metrics separately on each part.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>Union[pd.DataFrame, Tuple[pd.DataFrame,
pd.DataFrame]]</strong></td>
<td></td>
<td><strong>DataFrame containing the average score for each metric
across all splits. If h_split_point is provided, also includes separate
scores for the two parts of the test set. When a tuple is returned, it
has the form (overall_performance_df, cv_predictions_df).</strong></td>
</tr>
</tbody>
</table>
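`h_split_point` scores the early and late parts of each test window separately, which helps distinguish short-horizon from long-horizon accuracy. A minimal sketch for a single fold, assuming metrics are plain callables `metric(y_true, y_pred)`; the helper `split_scores` and the `<metric>_first_<h>` / `<metric>_after_<h>` key names are hypothetical.

```python
import numpy as np

def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def split_scores(y_true, y_pred, metrics, h_split_point=None):
    """Score one test window; optionally score [:h] and [h:] separately."""
    scores = {m.__name__: m(y_true, y_pred) for m in metrics}
    if h_split_point is not None:
        h = h_split_point
        for m in metrics:
            scores[f"{m.__name__}_first_{h}"] = m(y_true[:h], y_pred[:h])
            scores[f"{m.__name__}_after_{h}"] = m(y_true[h:], y_pred[h:])
    return scores

y_true = np.array([10.0, 12.0, 11.0, 13.0])
y_pred = np.array([10.5, 11.0, 12.0, 15.0])
print(split_scores(y_true, y_pred, [mae], h_split_point=2))
```

Averaging these per-fold dictionaries across folds (e.g. with `pd.DataFrame(list_of_dicts).mean()`) gives one row of overall scores.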

## Hybrid (pesh) forecaster

### peshbeen.models.pesh

``` python
pesh(
    models: 'dict',
    weighting_scheme: 'Optional[Dict[str, float]]' = None
)
```

Initialize the pesh model with the specified parameters for hybrid
forecasting that combines forecasts from multiple models.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>models</td>
<td>dict</td>
<td></td>
<td>A dictionary of model instances to be used for forecasting. The keys
should be string names for each model.</td>
</tr>
<tr>
<td>weighting_scheme</td>
<td>Optional[Dict[str, float]]</td>
<td>None</td>
<td>Optional dictionary specifying weights for each model’s forecast.
Default is None, which means equal weighting.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>None</strong></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
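Conceptually, the hybrid forecast is a weighted average of the forecasts of the models in `models`, with equal weights when `weighting_scheme` is None. A minimal numpy sketch of that combination; `combine_forecasts` is a hypothetical helper, the model names are placeholders, and whether peshbeen renormalises user weights is not assumed here.

```python
import numpy as np

def combine_forecasts(forecasts, weights=None):
    """Weighted average of per-model forecasts.

    forecasts: dict mapping model name -> array of length H
    weights:   dict mapping model name -> weight; None means equal weights
    """
    names = list(forecasts)
    F = np.column_stack([np.asarray(forecasts[n], dtype=float) for n in names])
    if weights is None:
        w = np.full(len(names), 1.0 / len(names))
    else:
        w = np.array([weights[n] for n in names], dtype=float)
    return F @ w

fc = {"glm": np.array([10.0, 11.0]), "ms_arr": np.array([12.0, 13.0])}
print(combine_forecasts(fc))                                # equal weights
print(combine_forecasts(fc, {"glm": 0.25, "ms_arr": 0.75}))  # user weights
```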

### peshbeen.models.pesh.fit

``` python
fit(
    df: 'pd.DataFrame'
)
```

Fit the specified models to the training data.

<table>
<colgroup>
<col style="width: 9%" />
<col style="width: 38%" />
<col style="width: 52%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>df</td>
<td>pd.DataFrame</td>
<td>Training DataFrame containing the target and any feature
columns.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>None</strong></td>
<td></td>
</tr>
</tbody>
</table>

### peshbeen.models.pesh.forecast

``` python
forecast(
    H: 'int',
    exog: 'Optional[pd.DataFrame]' = None
)
```

Recursive multi-step forecast.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>H</td>
<td>int</td>
<td></td>
<td>Forecast horizon.</td>
</tr>
<tr>
<td>exog</td>
<td>Optional[pd.DataFrame]</td>
<td>None</td>
<td>Optional DataFrame of future regressors. Must have the same columns
as the exogenous variables used during training and at least
<code>H</code> rows.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>np.ndarray</strong></td>
<td></td>
<td><strong>Forecast values of length <code>H</code>.</strong></td>
</tr>
</tbody>
</table>

### peshbeen.models.pesh.cross_validate

``` python
cross_validate(
    df: 'pd.DataFrame',
    cv_split: 'int',
    test_size: 'int',
    metrics: 'List[Callable]',
    step_size: 'int' = 1,
    metric_to_opt: 'Optional[Callable]' = None,
    weighting_scheme: 'Optional[Union[Dict[str, float], str]]' = None,
    optimizer: 'str' = 'SLSQP'
)
```

Perform cross-validation for the pesh model using a rolling forecasting
origin approach.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>df</td>
<td>pd.DataFrame</td>
<td></td>
<td>The input DataFrame containing the target and any feature
columns.</td>
</tr>
<tr>
<td>cv_split</td>
<td>int</td>
<td></td>
<td>The number of cross-validation splits.</td>
</tr>
<tr>
<td>test_size</td>
<td>int</td>
<td></td>
<td>The size of the test set for each split.</td>
</tr>
<tr>
<td>metrics</td>
<td>List[Callable]</td>
<td></td>
<td>Metric functions (e.g. <code>[MAE, RMSE]</code>) used to evaluate
forecast accuracy across folds. Call <code>.cv_summary()</code> after
cross-validation to retrieve the aggregated scores.</td>
</tr>
<tr>
<td>step_size</td>
<td>int</td>
<td>1</td>
<td>The step size for rolling the forecasting origin.</td>
</tr>
<tr>
<td>metric_to_opt</td>
<td>Optional[Callable]</td>
<td>None</td>
<td>An optional metric function to optimize when weighting_scheme is set
to “optimize”. If None, it defaults to the first metric in the metrics
list.</td>
</tr>
<tr>
<td>weighting_scheme</td>
<td>Optional[Union[Dict[str, float], str]]</td>
<td>None</td>
<td>None: equal weights across models. dict: user-provided weights (must
sum to 1). “optimize”: optimize the weights to minimize
<code>metric_to_opt</code> via <code>scipy.optimize.minimize</code>.</td>
</tr>
<tr>
<td>optimizer</td>
<td>str</td>
<td>SLSQP</td>
<td>Optimization method to use when weighting_scheme is set to
“optimize”. Passed to <code>scipy.optimize.minimize</code>. Refer to
SciPy documentation for available methods.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>pd.DataFrame</strong></td>
<td></td>
<td><strong>A DataFrame containing the performance metrics for each
model and the combined forecast across all cross-validation splits.
Also, optimized weights are stored in <code>self.optimal_weights_</code>
if <code>weighting_scheme</code> is “optimize”.</strong></td>
</tr>
</tbody>
</table>
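When `weighting_scheme="optimize"`, the weights are found with `scipy.optimize.minimize` (SLSQP by default). The sketch below shows one way to pose that problem, assuming non-negative weights constrained to sum to 1 and a metric evaluated on stacked out-of-fold predictions; the bounds, starting point and helper `optimize_weights` are assumptions, not peshbeen's exact setup.

```python
import numpy as np
from scipy.optimize import minimize

def optimize_weights(y_true, model_preds, metric, method="SLSQP"):
    """Find non-negative weights summing to 1 that minimise `metric`.

    model_preds: (n_obs, n_models) stacked out-of-fold predictions
    """
    n_models = model_preds.shape[1]
    x0 = np.full(n_models, 1.0 / n_models)  # start from equal weights

    def objective(w):
        return metric(y_true, model_preds @ w)

    constraints = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]
    bounds = [(0.0, 1.0)] * n_models
    res = minimize(objective, x0, method=method, bounds=bounds, constraints=constraints)
    return res.x

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

rng = np.random.default_rng(0)
f_a = rng.normal(10, 2, size=50)
f_b = rng.normal(10, 2, size=50)
y = 0.3 * f_a + 0.7 * f_b  # synthetic target that is an exact 30/70 blend
w = optimize_weights(y, np.column_stack([f_a, f_b]), mse)
print(w.round(3))
```

SLSQP is a natural default here because it supports both the box bounds and the equality constraint; some other `minimize` methods ignore constraints.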

## VAR (Vector Autoregression) forecaster

### peshbeen.models.var

``` python
var(
    target_cols: 'List[str]',
    lags: 'Dict[str, Union[int, List[int]]]',
    lag_transform: 'Optional[Dict[str, list]]' = None,
    difference: 'Optional[Dict[str, int]]' = None,
    seasonal_diff: 'Optional[Dict[str, int]]' = None,
    trend: 'Optional[Dict[str, str]]' = None,
    pol_degree: 'Optional[Union[int, Dict[str, int]]]' = 1,
    ets_params: 'Optional[Dict[str, Any]]' = None,
    change_points: 'Optional[Dict[str, List[int]]]' = None,
    box_cox: 'Optional[Dict[str, Union[bool, float, int]]]' = None,
    box_cox_biasadj: 'Union[bool, Dict[str, bool]]' = False,
    add_constant: 'bool' = True,
    cat_variables: 'Optional[List[str]]' = None,
    categorical_encoder: 'Optional[Union[Dict[str, Any], Any]]' = None,
    verbose: 'bool' = False
)
```

Initialize the VAR model with specified preprocessing and modeling
parameters.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>target_cols</td>
<td>List[str]</td>
<td></td>
<td>List of target column names to model.</td>
</tr>
<tr>
<td>lags</td>
<td>Dict[str, Union[int, List[int]]]</td>
<td></td>
<td>Dictionary specifying lags for each target variable. Values can be
an int (number of lags) or a list of specific lag indices.</td>
</tr>
<tr>
<td>lag_transform</td>
<td>Optional[Dict[str, list]]</td>
<td>None</td>
<td>Dictionary specifying lag-transform functions for each target
variable. Each value is a list of transformation functions (e.g.,
rolling_mean, expanding_std) to apply to the lagged features of that
target.</td>
</tr>
<tr>
<td>difference</td>
<td>Optional[Dict[str, int]]</td>
<td>None</td>
<td>Dictionary specifying the order of ordinary differencing to apply to
each target variable. Values are integers indicating how many times to
difference the series.</td>
</tr>
<tr>
<td>seasonal_diff</td>
<td>Optional[Dict[str, int]]</td>
<td>None</td>
<td>Dictionary specifying the seasonal period for seasonal differencing
for each target variable. Values are integers indicating the seasonal
lag (e.g., 12 for monthly data with yearly seasonality).</td>
</tr>
<tr>
<td>trend</td>
<td>Optional[Dict[str, str]]</td>
<td>None</td>
<td>Dictionary specifying the trend strategy for each target variable.
Values can be ‘linear’ for linear trend removal or ‘ets’ for ETS-based
trend removal.</td>
</tr>
<tr>
<td>pol_degree</td>
<td>Optional[Union[int, Dict[str, int]]]</td>
<td>1</td>
<td>Polynomial degree for linear trend removal. Can be a single integer
applied to all targets or a dictionary specifying the degree for each
target.</td>
</tr>
<tr>
<td>ets_params</td>
<td>Optional[Dict[str, Any]]</td>
<td>None</td>
<td>Dictionary specifying ETS model and fit parameters for each target
variable when using ‘ets’ trend removal. Each value is a dictionary of
parameters for the ExponentialSmoothing model and fitting process.</td>
</tr>
<tr>
<td>change_points</td>
<td>Optional[Dict[str, List[int]]]</td>
<td>None</td>
<td>Dictionary specifying change points for piecewise linear trend
removal for each target variable. Values are lists of integer indices
indicating where the trend should change. Only applicable when trend
strategy is ‘linear’.</td>
</tr>
<tr>
<td>box_cox</td>
<td>Optional[Dict[str, Union[bool, float, int]]]</td>
<td>None</td>
<td>Dictionary specifying whether to apply Box-Cox transformation to
each target variable. Values can be a boolean (True to apply, False to
skip) or a float (lambda parameter for Box-Cox transformation). If True,
lambda will be estimated from the data.</td>
</tr>
<tr>
<td>box_cox_biasadj</td>
<td>Union[bool, Dict[str, bool]]</td>
<td>False</td>
<td>Whether to apply bias adjustment when inverting the Box-Cox
transformation on forecasts. Can be a single boolean applied to all
targets or a dictionary specifying the bias adjustment for each
target.</td>
</tr>
<tr>
<td>add_constant</td>
<td>bool</td>
<td>True</td>
<td>If True, a constant column will be added to the regressor matrix for
the VAR model. This is typically used to allow for an intercept in the
model.</td>
</tr>
<tr>
<td>cat_variables</td>
<td>Optional[List[str]]</td>
<td>None</td>
<td>List of categorical feature column names to encode. These will be
shared across all target variables.</td>
</tr>
<tr>
<td>categorical_encoder</td>
<td>Optional[Union[Dict[str, Any], Any]]</td>
<td>None</td>
<td>A categorical encoder instance, or a single-entry dictionary mapping
a target column to the encoder when the encoder needs the target
variable during fitting (e.g. {target_col: MeanEncoder()}). If such an
encoder is passed directly rather than in a dictionary, the first column
in target_cols is used to fit it. Encoders that do not need the target
(e.g. OneHotEncoder()) can be passed directly.</td>
</tr>
<tr>
<td>verbose</td>
<td>bool</td>
<td>False</td>
<td>If True, the model will print verbose messages.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>None</strong></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
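In the VAR constructor, `difference` and `seasonal_diff` are given per target as dictionaries, and forecasts made on the differenced scale must be inverted afterwards. The numpy sketch below illustrates both operations; applying seasonal differencing before ordinary differencing, and the helper names `difference`, `invert_difference` and `invert_seasonal_difference`, are assumptions for illustration only.

```python
import numpy as np

def difference(y, d=0, seasonal_period=None):
    """Apply optional seasonal differencing, then d ordinary differences."""
    y = np.asarray(y, dtype=float)
    if seasonal_period:
        y = y[seasonal_period:] - y[:-seasonal_period]
    for _ in range(d):
        y = np.diff(y)
    return y

def invert_difference(last_value, diff_forecast):
    """Undo one ordinary difference: cumulative sum from the last observation."""
    return last_value + np.cumsum(diff_forecast)

def invert_seasonal_difference(history, sdiff_forecast, m):
    """Undo seasonal differencing: y[T+h] = z[T+h] + y[T+h-m], recursively."""
    out = list(np.asarray(history, dtype=float)[-m:])
    for z in sdiff_forecast:
        out.append(z + out[-m])
    return np.array(out[m:])

sales = np.array([10.0, 12.0, 15.0, 19.0])
traffic = np.array([5.0, 7.0, 6.0, 8.0, 6.0, 8.0, 7.0, 9.0])
difference_cfg = {"sales": 1}    # analogous to difference={"sales": 1}
seasonal_cfg = {"traffic": 4}    # analogous to seasonal_diff={"traffic": 4}
print(difference(sales, d=difference_cfg["sales"]))
print(difference(traffic, seasonal_period=seasonal_cfg["traffic"]))
print(invert_difference(sales[-1], [5.0, 6.0]))
print(invert_seasonal_difference(traffic, [1.0, 1.0], m=4))
```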

### peshbeen.models.var.fit

``` python
fit(
    df: 'pd.DataFrame'
)
```

Fit the VAR model to the provided DataFrame.

<table>
<colgroup>
<col style="width: 9%" />
<col style="width: 38%" />
<col style="width: 52%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>df</td>
<td>pd.DataFrame</td>
<td>Training DataFrame containing the target and any feature
columns.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>None</strong></td>
<td></td>
</tr>
</tbody>
</table>
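Fitting a VAR amounts to regressing each target on a shared set of lagged values of all targets, and because every equation uses the same regressors, equation-by-equation OLS is equivalent to joint least squares. The sketch below assumes lag 1 for every target and a constant (`add_constant=True`), whereas peshbeen accepts per-target lags; `fit_var1` is a hypothetical helper, and the data are noiseless so the coefficients are recovered exactly.

```python
import numpy as np

def fit_var1(Y, add_constant=True):
    """Least-squares VAR(1) fit.

    Y: (n_obs, n_targets). Returns B of shape (n_regressors, n_targets),
    with rows [constant, lag-1 of each target] when add_constant is True.
    """
    X = Y[:-1]
    if add_constant:
        X = np.column_stack([np.ones(len(X)), X])
    B, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)
    return B

c = np.array([0.5, -0.2])
A = np.array([[0.6, -0.3], [0.2, 0.7]])
Y = np.empty((30, 2))
Y[0] = [1.0, 2.0]
for t in range(1, 30):
    Y[t] = c + A @ Y[t - 1]  # noiseless simulation for an exact check
B = fit_var1(Y)
print(B.round(6))  # first row is c, remaining rows are A transposed
```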

### peshbeen.models.var.forecast

``` python
forecast(
    H: 'int',
    exog: 'Optional[pd.DataFrame]' = None
)
```

Generate forecasts for H future time steps.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>H</td>
<td>int</td>
<td></td>
<td>Forecast horizon (number of steps ahead to predict).</td>
</tr>
<tr>
<td>exog</td>
<td>Optional[pd.DataFrame]</td>
<td>None</td>
<td>Future exogenous regressors (must contain at least H rows).</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>Dict[str, np.ndarray]</strong></td>
<td></td>
<td><strong>Forecasted values for each target, keyed by column
name.</strong></td>
</tr>
</tbody>
</table>
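The forecast is returned as a dictionary keyed by target column. The sketch below produces such a dictionary with a recursive VAR(1) forecast, feeding each step's joint prediction back as the lags for the next step. It assumes a coefficient matrix whose rows are `[constant, lag-1 of each target]`; `forecast_var1` and the column names are hypothetical.

```python
import numpy as np

def forecast_var1(B, last_obs, target_cols, H):
    """Recursive VAR(1) forecast returned as {column name: array of length H}."""
    y = np.asarray(last_obs, dtype=float)
    preds = np.empty((H, len(target_cols)))
    for h in range(H):
        y = np.concatenate([[1.0], y]) @ B  # [constant, lags] @ coefficients
        preds[h] = y                        # becomes the lag vector for the next step
    return {col: preds[:, j] for j, col in enumerate(target_cols)}

B = np.array([[0.5, -0.2],    # constants
              [0.6, 0.2],     # coefficients on lag-1 of "sales"
              [-0.3, 0.7]])   # coefficients on lag-1 of "traffic"
fc = forecast_var1(B, last_obs=[1.0, 2.0], target_cols=["sales", "traffic"], H=3)
print(fc["sales"], fc["traffic"])
```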

### peshbeen.models.var.cross_validate

``` python
cross_validate(
    df: 'pd.DataFrame',
    target_col: 'str',
    cv_split: 'int',
    test_size: 'int',
    metrics: 'List[Callable]',
    step_size: 'int' = 1,
    h_split_point: 'Optional[int]' = None
)
```

Perform time series cross-validation, evaluating the forecasts for
<code>target_col</code>.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>df</td>
<td>pd.DataFrame</td>
<td></td>
<td>DataFrame containing the target and any feature columns.</td>
</tr>
<tr>
<td>target_col</td>
<td>str</td>
<td></td>
<td>Name of the target column whose forecasts are evaluated.</td>
</tr>
<tr>
<td>cv_split</td>
<td>int</td>
<td></td>
<td>Number of cross-validation folds.</td>
</tr>
<tr>
<td>test_size</td>
<td>int</td>
<td></td>
<td>Number of time steps in the test set for each fold.</td>
</tr>
<tr>
<td>metrics</td>
<td>List[Callable]</td>
<td></td>
<td>Metric functions (e.g. <code>[MAE, RMSE]</code>) used to evaluate
forecast accuracy across folds. Call <code>.cv_summary()</code> after
cross-validation to retrieve the aggregated scores.</td>
</tr>
<tr>
<td>step_size</td>
<td>int</td>
<td>1</td>
<td>Step size for rolling window. Default is 1.</td>
</tr>
<tr>
<td>h_split_point</td>
<td>Optional[int]</td>
<td>None</td>
<td>Index at which to split the test set for separate evaluation.
Default is None.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>Union[pd.DataFrame, Tuple[pd.DataFrame,
pd.DataFrame]]</strong></td>
<td></td>
<td><strong>DataFrame with averaged cross-validation metric
scores.</strong></td>
</tr>
</tbody>
</table>

## Multivariate machine learning forecaster

### peshbeen.models.ml_mv_forecaster

``` python
ml_mv_forecaster(
    model: 'Any',
    target_cols: 'List[str]',
    lags: 'Optional[Dict[str, Union[int, List[int]]]]' = None,
    lag_transform: 'Optional[Dict[str, list]]' = None,
    difference: 'Optional[Dict[str, int]]' = None,
    seasonal_diff: 'Optional[Dict[str, int]]' = None,
    trend: 'Optional[Dict[str, str]]' = None,
    pol_degree: 'Optional[Union[int, Dict[str, int]]]' = 1,
    ets_params: 'Optional[Dict[str, Any]]' = None,
    change_points: 'Optional[Dict[str, List[int]]]' = None,
    box_cox: 'Optional[Dict[str, Union[bool, float, int]]]' = None,
    box_cox_biasadj: 'Optional[Dict[str, bool]]' = None,
    cat_variables: 'Optional[List[str]]' = None,
    categorical_encoder: 'Optional[Union[Dict[str, Any], Any]]' = None
)
```

Initialize the multi-target machine learning forecaster with the specified
transformations and model.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>model</td>
<td>Any</td>
<td></td>
<td>A scikit-learn compatible regression model instance
(e.g. LGBMRegressor(), CatBoostRegressor(), LinearRegression(),
etc.).</td>
</tr>
<tr>
<td>target_cols</td>
<td>List[str]</td>
<td></td>
<td>List of target variable names to forecast.</td>
</tr>
<tr>
<td>lags</td>
<td>Optional[Dict[str, Union[int, List[int]]]]</td>
<td>None</td>
<td>Dictionary specifying lag features to create for each target
variable. The value can be an integer (number of lags) or a list of
specific lag periods.</td>
</tr>
<tr>
<td>lag_transform</td>
<td>Optional[Dict[str, list]]</td>
<td>None</td>
<td>Dictionary specifying lag-based transformations to apply for each
target variable. The value should be a list of transformation functions
(e.g. rolling_mean, expanding_std) with their parameters encapsulated in
the function instance.</td>
</tr>
<tr>
<td>difference</td>
<td>Optional[Dict[str, int]]</td>
<td>None</td>
<td>Dictionary specifying the order of ordinary differencing to apply
for each target variable.</td>
</tr>
<tr>
<td>seasonal_diff</td>
<td>Optional[Dict[str, int]]</td>
<td>None</td>
<td>Dictionary specifying the order of seasonal differencing to apply
for each target variable.</td>
</tr>
<tr>
<td>trend</td>
<td>Optional[Dict[str, str]]</td>
<td>None</td>
<td>Dictionary specifying the trend removal strategy for each target
variable. Supported values are ‘linear’, ‘ets’, ‘feature_lr’, and
‘feature_ets’.</td>
</tr>
<tr>
<td>pol_degree</td>
<td>Optional[Union[int, Dict[str, int]]]</td>
<td>1</td>
<td>Polynomial degree for linear trend removal. Can be a single integer
applied to all targets or a dictionary specifying the degree for each
target variable.</td>
</tr>
<tr>
<td>ets_params</td>
<td>Optional[Dict[str, Any]]</td>
<td>None</td>
<td>Dictionary specifying ETS model and fit parameters for each target
variable when using ‘ets’ trend removal. Each value is a dictionary of
parameters for the ExponentialSmoothing model and fitting process.</td>
</tr>
<tr>
<td>change_points</td>
<td>Optional[Dict[str, List[int]]]</td>
<td>None</td>
<td>Dictionary specifying change points for piecewise linear trend
removal for each target variable. The value should be a list of integer
indices where the trend slope can change.</td>
</tr>
<tr>
<td>box_cox</td>
<td>Optional[Dict[str, Union[bool, float, int]]]</td>
<td>None</td>
<td>Dictionary specifying whether to apply Box-Cox transformation for
each target variable. The value can be a boolean (True to apply with
lambda estimated from data, False to skip) or a float (specific lambda
value to use).</td>
</tr>
<tr>
<td>box_cox_biasadj</td>
<td>Optional[Dict[str, bool]]</td>
<td>None</td>
<td>Dictionary specifying whether to apply bias adjustment when
inverting Box-Cox transformation for each target variable.</td>
</tr>
<tr>
<td>cat_variables</td>
<td>Optional[List[str]]</td>
<td>None</td>
<td>List of categorical feature column names to encode. These will be
shared across all target variables.</td>
</tr>
<tr>
<td>categorical_encoder</td>
<td>Optional[Union[Dict[str, Any], Any]]</td>
<td>None</td>
<td>A categorical encoder instance, or a single-entry dictionary mapping
a target column to the encoder when the encoder needs the target
variable during fitting (e.g. {target_col: MeanEncoder()}). If an
encoder that needs target access is passed directly, without the dict
format, the first column in target_cols is used to fit it. Encoders that
do not need target access (e.g. OneHotEncoder()) can be passed
directly.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>None</strong></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
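The `lags` description allows either an integer or a list per target. The minimal sketch below shows how those two forms could be expanded into lag columns. Several details are assumptions made for illustration: an integer `k` is read as lags 1 to k, the `<col>_lag_<p>` column names are invented, and plain lists stand in for DataFrame columns. `make_lag_features` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Union


def make_lag_features(data: Dict[str, List[float]],
                      lags: Dict[str, Union[int, List[int]]]
                      ) -> Dict[str, List[Optional[float]]]:
    """Hypothetical helper: expand a per-target `lags` spec into lag columns.

    Assumptions: int k means lags 1..k; column names are illustrative only.
    """
    features = {}
    for col, spec in lags.items():
        # int k -> lags 1..k; list -> exactly the listed lag periods
        periods = range(1, spec + 1) if isinstance(spec, int) else spec
        series = data[col]
        for p in periods:
            # None marks positions without p earlier observations
            features[f"{col}_lag_{p}"] = [
                series[t - p] if t >= p else None for t in range(len(series))
            ]
    return features


data = {"sales": [10, 12, 13, 15], "visits": [100, 110, 120, 130]}
feats = make_lag_features(data, {"sales": 2, "visits": [1, 3]})
print(feats["sales_lag_2"])   # [None, None, 10, 12]
print(feats["visits_lag_3"])  # [None, None, None, 100]
```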

### peshbeen.models.ml_mv_forecaster.fit

``` python
fit(
    df: 'pd.DataFrame'
)
```

Fit the model to the data in `df`.

<table>
<colgroup>
<col style="width: 9%" />
<col style="width: 38%" />
<col style="width: 52%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>df</td>
<td>pd.DataFrame</td>
<td>Training DataFrame containing all target and feature columns.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>None</strong></td>
<td></td>
</tr>
</tbody>
</table>

### peshbeen.models.ml_mv_forecaster.forecast

``` python
forecast(
    H: 'int',
    exog: 'Optional[pd.DataFrame]' = None
)
```

Generate forecasts for H future time steps.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>H</td>
<td>int</td>
<td></td>
<td>Forecast horizon (number of steps to forecast ahead).</td>
</tr>
<tr>
<td>exog</td>
<td>Optional[pd.DataFrame]</td>
<td>None</td>
<td>Future exogenous regressors (must contain at least H rows).</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>Dict[str, np.ndarray]</strong></td>
<td></td>
<td><strong>A dictionary where keys are target column names and values
are arrays of H forecasted values for each target
variable.</strong></td>
</tr>
</tbody>
</table>
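The docs above do not say how multi-step forecasts are produced. One common strategy for lag-based models is recursion:

1. Predict one step ahead for every target.
2. Append those predictions to the history.
3. Repeat until `H` steps are produced.

The sketch below shows that strategy and the `{target: values}` return shape. It is not a description of the library: `recursive_forecast` and `toy_model` are hypothetical, lists stand in for NumPy arrays, and whether `ml_mv_forecaster` actually forecasts recursively is an assumption.

```python
from typing import Callable, Dict, List

OneStep = Callable[[Dict[str, List[float]]], Dict[str, float]]


def recursive_forecast(history: Dict[str, List[float]], H: int,
                       one_step: OneStep) -> Dict[str, List[float]]:
    """Hypothetical recursive multi-target forecast (strategy is an assumption)."""
    hist = {col: list(vals) for col, vals in history.items()}  # leave caller data untouched
    out: Dict[str, List[float]] = {col: [] for col in hist}
    for _ in range(H):
        step = one_step(hist)  # one-step-ahead prediction for every target
        for col, value in step.items():
            out[col].append(value)
            hist[col].append(value)  # prediction becomes a lag for the next step
    return out


def toy_model(hist: Dict[str, List[float]]) -> Dict[str, float]:
    # Toy cross-target rule: "a" depends on the last values of both targets.
    a, b = hist["a"][-1], hist["b"][-1]
    return {"a": 0.5 * a + 0.5 * b, "b": b + 1.0}


history = {"a": [1.0, 2.0], "b": [4.0, 6.0]}
fc = recursive_forecast(history, H=3, one_step=toy_model)
print(fc)  # {'a': [4.0, 5.5, 6.75], 'b': [7.0, 8.0, 9.0]}
```

Because each step reuses earlier predictions, errors can compound over the horizon. This is one reason horizon-specific uncertainty estimates are useful.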

### peshbeen.models.ml_mv_forecaster.cross_validate

``` python
cross_validate(
    df: 'pd.DataFrame',
    target_col: 'str',
    cv_split: 'int',
    test_size: 'int',
    metrics: 'List[Callable]',
    step_size: 'int' = 1,
    h_split_point: 'Optional[int]' = None
)
```

Perform cross-validation.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>df</td>
<td>pd.DataFrame</td>
<td></td>
<td>Input dataframe.</td>
</tr>
<tr>
<td>target_col</td>
<td>str</td>
<td></td>
<td>Target variable for evaluation.</td>
</tr>
<tr>
<td>cv_split</td>
<td>int</td>
<td></td>
<td>Number of cross-validation folds.</td>
</tr>
<tr>
<td>test_size</td>
<td>int</td>
<td></td>
<td>Test size per fold.</td>
</tr>
<tr>
<td>metrics</td>
<td>List[Callable]</td>
<td></td>
<td>Metric functions (e.g. <code>[MAE, RMSE]</code>) used to evaluate
forecast accuracy across folds. Call <code>.cv_summary()</code> after
cross-validation to retrieve the aggregated scores.</td>
</tr>
<tr>
<td>step_size</td>
<td>int</td>
<td>1</td>
<td>Step size for rolling window. Default is 1.</td>
</tr>
<tr>
<td>h_split_point</td>
<td>Optional[int]</td>
<td>None</td>
<td>Point to split the test set for separate evaluation. Default is
None.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>Union[pd.DataFrame, Tuple[pd.DataFrame,
pd.DataFrame]]</strong></td>
<td></td>
<td><strong>DataFrame with overall performance metrics averaged across
folds. If h_split_point is provided, also includes separate performance
before and after the split point.</strong></td>
</tr>
</tbody>
</table>
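To make the `h_split_point` behaviour concrete, here is a small sketch. It averages a metric over folds for the full horizon and, optionally, separately for the steps before and after the split point. Returning the split results as a second dictionary mirrors the documented `Tuple[pd.DataFrame, pd.DataFrame]` return type, but the exact layout is an assumption. `summarize_folds` and `mae` are hypothetical helpers.

```python
from typing import Callable, Dict, List, Optional, Sequence


def _avg(values) -> float:
    values = list(values)
    return sum(values) / len(values)


def mae(y: Sequence[float], yhat: Sequence[float]) -> float:
    return _avg(abs(a - b) for a, b in zip(y, yhat))


def summarize_folds(actuals: List[List[float]], preds: List[List[float]],
                    metrics: List[Callable], h_split_point: Optional[int] = None):
    """Hypothetical helper; the (overall, split) return layout is an assumption."""
    def score(lo: int, hi: Optional[int]) -> Dict[str, float]:
        # Compute each metric per fold on steps [lo:hi], then average over folds.
        return {m.__name__: _avg(m(y[lo:hi], p[lo:hi]) for y, p in zip(actuals, preds))
                for m in metrics}

    overall = score(0, None)
    if h_split_point is None:
        return overall
    split = {"before": score(0, h_split_point), "after": score(h_split_point, None)}
    return overall, split


actuals = [[1, 2, 3, 4], [2, 3, 4, 5]]  # one row per fold
preds = [[1, 2, 4, 6], [2, 4, 4, 5]]
overall, split = summarize_folds(actuals, preds, [mae], h_split_point=2)
print(overall)  # {'mae': 0.5}
print(split)    # {'before': {'mae': 0.25}, 'after': {'mae': 0.75}}
```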

## Probabilistic forecasting for univariate time series

### peshbeen.probabilistic_forecasting.prob_forecasts

``` python
prob_forecasts(
    model,
    H: 'int',
    n_calibration: 'Union[int, None]' = None,
    step_size: 'int' = 1,
    random_state: 'int' = 42,
    n_iter: 'Union[int, None]' = None,
    verbose: 'bool' = False
)
```

Probabilistic forecasting wrapper for any point-forecasting model.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>model</td>
<td></td>
<td></td>
<td>Any model with a <code>.target_col</code> attribute and
<code>.fit(df)</code> and <code>.forecast(H, exog)</code> methods.</td>
</tr>
<tr>
<td>H</td>
<td>int</td>
<td></td>
<td>Forecast horizon.</td>
</tr>
<tr>
<td>n_calibration</td>
<td>Union[int, None]</td>
<td>None</td>
<td>Number of calibration windows used for cross-validated residual
estimation. If None, in-sample residuals are used without
cross-validation; the resulting intervals are not calibrated per horizon
and may be too narrow. This option is recommended when the dataset is
small, because the model may not have enough data to fit well in each
calibration fold.</td>
</tr>
<tr>
<td>step_size</td>
<td>int</td>
<td>1</td>
<td>Step size between consecutive calibration windows.</td>
</tr>
<tr>
<td>random_state</td>
<td>int</td>
<td>42</td>
<td>Seed for all internal random-number generators.</td>
</tr>
<tr>
<td>n_iter</td>
<td>Union[int, None]</td>
<td>None</td>
<td>Number of EM iterations during each calibration window. Only
relevant for Markov-switching Autoregressive model (<a
href="https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ms_arr.html#ms_arr"><code>ms_arr</code></a>).
A smaller value than the model’s default speeds up calibration at the
cost of convergence quality per fold — typically a value of 3–10 is
sufficient for calibration windows where the model is already close to
the solution.</td>
</tr>
<tr>
<td>verbose</td>
<td>bool</td>
<td>False</td>
<td>Print progress during calibration.</td>
</tr>
</tbody>
</table>
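The calibration step described by `n_calibration` and `step_size` can be pictured as a set of rolling forecast origins. At each origin, the model is refitted on the data up to that point, forecasts `H` steps, and its residuals are recorded. In the resulting matrix, each row is one calibration window and each column is one horizon, which is what makes horizon-specific intervals possible. In this sketch a naive last-value forecaster stands in for the wrapped model. The window placement (ending at the last observation) is an assumption, and `calibration_residuals` is hypothetical.

```python
from typing import Callable, List, Optional, Sequence

ForecastFn = Callable[[Sequence[float], int], List[float]]


def naive_last_value(train: Sequence[float], h: int) -> List[float]:
    # Stand-in model: repeat the last training value h times.
    return [train[-1]] * h


def calibration_residuals(y: Sequence[float], H: int, n_calibration: int,
                          step_size: int = 1,
                          fit_forecast: Optional[ForecastFn] = None) -> List[List[float]]:
    """Hypothetical helper returning an (n_calibration, H) residual matrix."""
    fit_forecast = fit_forecast or naive_last_value
    rows = []
    for w in range(n_calibration):
        # Assumed layout: the last window's H-step test ends at the final observation.
        origin = len(y) - H - (n_calibration - 1 - w) * step_size
        if origin < 1:
            raise ValueError("Series too short for the requested windows.")
        train, actual = y[:origin], y[origin:origin + H]
        forecast = fit_forecast(train, H)  # "refit" on data up to the origin
        rows.append([a - f for a, f in zip(actual, forecast)])
    return rows


y = [1, 3, 2, 5, 4, 7, 6, 9]
res = calibration_residuals(y, H=2, n_calibration=3)
print(res)  # [[-1, 2], [3, 2], [-1, 2]]
```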

### peshbeen.probabilistic_forecasting.prob_forecasts.calibrate

``` python
calibrate(
    df: 'pd.DataFrame',
    delta: 'Union[float, List[float]]' = 0.5
)
```

Calibrate the conformal predictor.

Runs rolling-window cross-validation (if not already done) to collect
non-conformity scores, then computes the per-horizon conformal quantile
`q_hat` for each requested `delta` level.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>df</td>
<td>pd.DataFrame</td>
<td></td>
<td>Calibration dataset.</td>
</tr>
<tr>
<td>delta</td>
<td>Union[float, List[float]]</td>
<td>0.5</td>
<td>Coverage level(s). A single float produces one symmetric interval; a
list produces one interval per level. For example, <code>delta=0.9</code>
produces a 90 % prediction interval.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>‘prob_forecasts’</strong></td>
<td></td>
<td><strong>The fitted object, with <code>self.q_hat</code> set to the
calibrated per-horizon conformal quantiles.</strong></td>
</tr>
</tbody>
</table>
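Below is a minimal sketch of the per-horizon quantile `q_hat`. It assumes absolute residuals as non-conformity scores and the standard split-conformal rank `ceil((n + 1) * delta)`; the library's exact score and quantile rule may differ. Each horizon gets its own `q_hat`, and the symmetric interval is then `point ± q_hat`. `conformal_q_hat` and `intervals` are hypothetical helpers.

```python
import math
from typing import Dict, List, Tuple


def conformal_q_hat(residuals: List[List[float]],
                    deltas: List[float]) -> Dict[float, List[float]]:
    """Hypothetical helper: per-horizon conformal quantile of |residual|.

    Assumed: absolute-residual scores and the split-conformal rank rule.
    """
    n, H = len(residuals), len(residuals[0])
    q_hat = {}
    for delta in deltas:
        k = math.ceil((n + 1) * delta)  # finite-sample corrected rank
        per_h = []
        for h in range(H):
            scores = sorted(abs(row[h]) for row in residuals)  # scores for horizon h+1
            per_h.append(math.inf if k > n else scores[k - 1])
        q_hat[delta] = per_h
    return q_hat


def intervals(point: List[float], q: List[float]) -> List[Tuple[float, float]]:
    # Symmetric interval around each horizon's point forecast.
    return [(p - w, p + w) for p, w in zip(point, q)]


residuals = [[r, -2 * r] for r in range(1, 10)]  # 9 windows, H = 2
q = conformal_q_hat(residuals, [0.5, 0.95])
print(q[0.5])                             # [5, 10]
print(intervals([100.0, 100.0], q[0.5]))  # [(95.0, 105.0), (90.0, 110.0)]
```

With only nine windows, no finite `q_hat` exists for 95 % coverage: the required rank, 10, exceeds n = 9, so the value comes back as infinity. Higher coverage levels need more calibration windows.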

### peshbeen.probabilistic_forecasting.prob_forecasts.sample

``` python
sample(
    df: 'pd.DataFrame',
    n_samples: 'int' = 1000,
    method: 'str' = 'empirical',
    future_exog: 'Union[pd.DataFrame, None]' = None
)
```

Draw sample paths from the predictive distribution.

Three methods are available:

- `"empirical"` — residuals are resampled with replacement independently
  at each horizon.
- `"kde"` — a Gaussian KDE is fitted to each horizon’s residuals;
  samples are drawn from the smoothed distribution.
- `"correlated"` — a multivariate normal is fitted to the full
  `H`-dimensional residual vectors, preserving cross-horizon
  correlation. Samples are drawn jointly.
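The first two sampling methods can be sketched in a few lines, and both treat horizons independently:

- `empirical` resamples an observed residual separately for each horizon.
- `kde` adds Gaussian noise around a resampled residual, which is exactly how one draws from a Gaussian KDE.

The Silverman-style bandwidth below is an assumed choice; the library may use a different rule, such as SciPy's default. `sample_empirical` and `sample_kde` are hypothetical.

```python
import random
import statistics
from typing import List


def sample_empirical(residuals: List[List[float]], point: List[float],
                     n_samples: int, rng: random.Random) -> List[List[float]]:
    """Hypothetical sketch of the 'empirical' method."""
    H = len(point)
    cols = [[row[h] for row in residuals] for h in range(H)]  # residuals per horizon
    # Each horizon draws its own residual independently, with replacement.
    return [[point[h] + rng.choice(cols[h]) for h in range(H)]
            for _ in range(n_samples)]


def sample_kde(residuals: List[List[float]], point: List[float],
               n_samples: int, rng: random.Random) -> List[List[float]]:
    """Hypothetical sketch of the 'kde' method."""
    H = len(point)
    cols = [[row[h] for row in residuals] for h in range(H)]
    # Rule-of-thumb bandwidth per horizon (assumed; needs >= 2 residuals).
    bw = [1.06 * statistics.stdev(c) * len(c) ** (-1 / 5) for c in cols]
    # Sampling a Gaussian KDE: pick a data point, then add N(0, bw^2) noise.
    return [[point[h] + rng.choice(cols[h]) + rng.gauss(0.0, bw[h]) for h in range(H)]
            for _ in range(n_samples)]


residuals = [[-1, 2], [3, 2], [-1, 2]]
point = [10.0, 20.0]
rng = random.Random(0)
emp = sample_empirical(residuals, point, n_samples=2000, rng=rng)
kde = sample_kde(residuals, point, n_samples=2000, rng=rng)
```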

Results are stored on `self`:

- `self.sample_paths` — `(n_samples, H)` array of sampled trajectories
  centred on the point forecast.
- `self.point_forecast` — `(H,)` point forecast array.
- `self.sample_paths_df` — the same data as a DataFrame with columns
  `h_1, …, h_H`.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>df</td>
<td>pd.DataFrame</td>
<td></td>
<td>Training data. Residuals are computed via cross-validation if not
yet available.</td>
</tr>
<tr>
<td>n_samples</td>
<td>int</td>
<td>1000</td>
<td>Number of sample paths to draw.</td>
</tr>
<tr>
<td>method</td>
<td>str</td>
<td>empirical</td>
<td>Sampling strategy (see above).</td>
</tr>
<tr>
<td>future_exog</td>
<td>Union[pd.DataFrame, None]</td>
<td>None</td>
<td>Future exogenous variables passed to <code>forecast</code>.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>‘prob_forecasts’</strong></td>
<td></td>
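<td><strong>The object itself, with <code>sample_paths</code>,
<code>point_forecast</code>, and <code>sample_paths_df</code>
populated.</strong></td>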
</tr>
</tbody>
</table>
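Here is a minimal sketch of the `correlated` method:

1. Estimate the mean vector and covariance matrix of the `H`-dimensional residual vectors.
2. Factor the covariance with a Cholesky decomposition `L`.
3. Draw `mean + L z`, where `z` is standard normal, so the samples reproduce the cross-horizon covariance.

The docs do not say whether the fitted normal is centred on the residual mean or on zero, so using the residual mean here is an assumption. The small diagonal jitter guards against singular covariances. All helper names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def fit_mvn(residuals: Matrix) -> Tuple[List[float], Matrix]:
    """Sample mean and (n - 1)-normalised covariance of residual vectors."""
    n, H = len(residuals), len(residuals[0])
    mu = [sum(r[h] for r in residuals) / n for h in range(H)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in residuals) / (n - 1)
            for j in range(H)] for i in range(H)]
    return mu, cov


def cholesky(a: Matrix, jitter: float = 1e-9) -> Matrix:
    """Lower-triangular L with L L^T approximately a (jitter added to the diagonal)."""
    H = len(a)
    L = [[0.0] * H for _ in range(H)]
    for i in range(H):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(max(a[i][i] + jitter - s, 0.0))
            else:
                L[i][j] = (a[i][j] - s) / L[j][j] if L[j][j] > 0 else 0.0
    return L


def sample_correlated(residuals: Matrix, point: List[float],
                      n_samples: int, rng: random.Random) -> Matrix:
    """Hypothetical sketch; centring on the residual mean is an assumption."""
    mu, cov = fit_mvn(residuals)
    L = cholesky(cov)
    H = len(point)
    paths = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(H)]
        # mu + L z ~ N(mu, L L^T): all horizons are drawn jointly.
        paths.append([point[i] + mu[i] + sum(L[i][k] * z[k] for k in range(i + 1))
                      for i in range(H)])
    return paths


# Residuals at horizon 2 are exactly twice those at horizon 1.
residuals = [[t, 2.0 * t] for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
paths = sample_correlated(residuals, [10.0, 20.0], n_samples=500, rng=random.Random(0))
```

In this toy case the two horizons are perfectly correlated, so every sampled path keeps the deviation at horizon 2 close to twice the deviation at horizon 1. Independent per-horizon sampling would lose that structure.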

### peshbeen.probabilistic_forecasting.prob_forecasts.sample_quantiles

``` python
sample_quantiles(
    quantiles: 'Union[float, List[float]]'
)
```

Compute quantiles from the sample paths generated by `sample`.

Works identically regardless of which `method` was passed to `sample`.

<table>
<colgroup>
<col style="width: 9%" />
<col style="width: 38%" />
<col style="width: 52%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>quantiles</td>
<td>Union[float, List[float]]</td>
<td>Desired quantile levels (e.g. <code>[0.1, 0.5, 0.9]</code>).</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>pd.DataFrame</strong></td>
<td><strong>Columns: <code>point_forecast</code>,
<code>q_&lt;level&gt;</code> for each level.</strong></td>
</tr>
</tbody>
</table>
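Because the sample paths are stored as an `(n_samples, H)` array, quantiles can be read off column by column. The sketch below uses linear interpolation between order statistics, the same rule as the default method of `numpy.quantile`. The library's interpolation rule and the exact `q_<level>` label format are assumptions, and `quantiles_from_paths` is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Union


def empirical_quantile(values: Sequence[float], q: float) -> float:
    """Quantile by linear interpolation between sorted values."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def quantiles_from_paths(paths: List[List[float]], point: List[float],
                         quantiles: Union[float, List[float]]) -> Dict[str, List[float]]:
    """Hypothetical helper; 'q_<level>' naming and interpolation are assumed."""
    if isinstance(quantiles, float):
        quantiles = [quantiles]
    H = len(point)
    table = {"point_forecast": list(point)}
    for q in quantiles:
        # Column h of the paths holds every sampled value for horizon h + 1.
        table[f"q_{q}"] = [empirical_quantile([p[h] for p in paths], q) for h in range(H)]
    return table


paths = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]
table = quantiles_from_paths(paths, point=[3, 30], quantiles=[0.25, 0.5, 0.9])
print(table["q_0.5"])  # [3.0, 30.0]
```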

### peshbeen.probabilistic_forecasting.prob_forecasts.conformal_quantiles

``` python
conformal_quantiles(
    df: 'pd.DataFrame',
    quantiles: 'Union[float, List[float]]',
    future_exog: 'Union[pd.DataFrame, None]' = None
)
```

Generate conformal prediction quantiles.

Requires `calibrate` to have been called first.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>df</td>
<td>pd.DataFrame</td>
<td></td>
<td>Training data for the final model fit.</td>
</tr>
<tr>
<td>quantiles</td>
<td>Union[float, List[float]]</td>
<td></td>
<td>Desired quantile levels (e.g. <code>[0.1, 0.5, 0.9]</code>).</td>
</tr>
<tr>
<td>future_exog</td>
<td>Union[pd.DataFrame, None]</td>
<td>None</td>
<td>Future exogenous variables.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>pd.DataFrame</strong></td>
<td></td>
<td><strong>Columns: <code>point_forecast</code>,
<code>q_&lt;level&gt;</code> for each level.</strong></td>
</tr>
</tbody>
</table>
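One plausible way to turn symmetric conformal intervals into quantile columns is to treat quantile level `tau` as a bound of the interval with coverage `|2 * tau - 1|`:

- for `tau < 0.5`, take the lower bound;
- for `tau > 0.5`, take the upper bound;
- `tau = 0.5` maps to the point forecast.

This mapping is an assumption rather than documented behaviour, as is the requirement that `q_hat` contain the matching coverage level. `conformal_quantile_columns` is hypothetical.

```python
from typing import Dict, List


def conformal_quantile_columns(point: List[float], q_hat: Dict[float, List[float]],
                               quantiles: List[float]) -> Dict[str, List[float]]:
    """Hypothetical helper; the tau -> coverage mapping is an assumption."""
    table = {"point_forecast": list(point)}
    for tau in quantiles:
        if abs(tau - 0.5) < 1e-12:
            table[f"q_{tau}"] = list(point)  # median of a symmetric interval
            continue
        delta = round(abs(2 * tau - 1), 10)  # coverage of the matching interval
        sign = -1.0 if tau < 0.5 else 1.0   # lower bound below the median, upper above
        table[f"q_{tau}"] = [p + sign * w for p, w in zip(point, q_hat[delta])]
    return table


q_hat = {0.8: [2.0, 3.0]}  # per-horizon half-widths for an 80 % interval
table = conformal_quantile_columns([10.0, 20.0], q_hat, [0.1, 0.5, 0.9])
```

Under this mapping, requesting `[0.1, 0.9]` requires calibrating with `delta=0.8`.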

## Probabilistic forecasting for multivariate time series

### peshbeen.probabilistic_forecasting.mv_prob_forecasts

``` python
mv_prob_forecasts(
    model,
    target_col: 'str',
    H: 'int',
    n_calibration: 'Union[int, None]' = None,
    step_size: 'int' = 1,
    random_state: 'int' = 42,
    n_iter: 'Union[int, None]' = None,
    verbose: 'bool' = False
)
```

Probabilistic forecasting wrapper for multi-target point-forecasting
models. Prediction intervals and sample paths are produced for the target
given by `target_col`.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>model</td>
<td></td>
<td></td>
<td>Any model with a <code>.target_col</code> attribute and
<code>.fit(df)</code> and <code>.forecast(H, exog)</code> methods.</td>
</tr>
<tr>
<td>target_col</td>
<td>str</td>
<td></td>
<td>Name of the target variable column in the input DataFrames.</td>
</tr>
<tr>
<td>H</td>
<td>int</td>
<td></td>
<td>Forecast horizon.</td>
</tr>
<tr>
<td>n_calibration</td>
<td>Union[int, None]</td>
<td>None</td>
<td>Number of calibration windows used for cross-validated residual
estimation. If None, in-sample residuals are used without
cross-validation; the resulting intervals are not calibrated per horizon
and may be too narrow. This option is recommended when the dataset is
small, because the model may not have enough data to fit well in each
calibration fold.</td>
</tr>
<tr>
<td>step_size</td>
<td>int</td>
<td>1</td>
<td>Step size between consecutive calibration windows.</td>
</tr>
<tr>
<td>random_state</td>
<td>int</td>
<td>42</td>
<td>Seed for all internal random-number generators.</td>
</tr>
<tr>
<td>n_iter</td>
<td>Union[int, None]</td>
<td>None</td>
<td>Number of EM iterations during each calibration window. Only
relevant for Markov-switching Autoregressive model (<a
href="https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ms_arr.html#ms_arr"><code>ms_arr</code></a>).
A smaller value than the model’s default speeds up calibration at the
cost of convergence quality per fold — typically a value of 3–10 is
sufficient for calibration windows where the model is already close to
the solution.</td>
</tr>
<tr>
<td>verbose</td>
<td>bool</td>
<td>False</td>
<td>Print progress during calibration.</td>
</tr>
</tbody>
</table>
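`mv_prob_forecasts` takes a `target_col` in addition to the model, while multi-target forecasters such as `ml_mv_forecaster` return a dictionary keyed by column name from `forecast`. The hypothetical adapter below shows how a single target could be exposed through the univariate `.fit(df)` / `.forecast(H, exog)` interface. It illustrates the idea only and is not how the library is implemented. A dict of lists stands in for the DataFrame, and `ToyMultiModel` is a made-up stand-in model.

```python
from typing import Any, Dict, List, Optional


class SingleTargetView:
    """Hypothetical adapter exposing one target of a multi-target forecaster."""

    def __init__(self, model: Any, target_col: str):
        self.model = model
        self.target_col = target_col

    def fit(self, df: Any) -> "SingleTargetView":
        self.model.fit(df)  # all targets are still fitted jointly
        return self

    def forecast(self, H: int, exog: Optional[Any] = None):
        # A multi-target forecast returns {column: values}; keep the requested one.
        return self.model.forecast(H, exog)[self.target_col]


class ToyMultiModel:
    """Stand-in multi-target model: repeats each target's last value."""

    def fit(self, df: Dict[str, List[float]]) -> None:
        self.last = {col: values[-1] for col, values in df.items()}

    def forecast(self, H: int, exog: Optional[Any] = None) -> Dict[str, List[float]]:
        return {col: [value] * H for col, value in self.last.items()}


view = SingleTargetView(ToyMultiModel(), "b").fit({"a": [1.0, 2.0], "b": [5.0, 7.0]})
print(view.forecast(3))  # [7.0, 7.0, 7.0]
```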

### peshbeen.probabilistic_forecasting.mv_prob_forecasts.calibrate

``` python
calibrate(
    df: 'pd.DataFrame',
    delta: 'Union[float, List[float]]' = 0.5
)
```

Calibrate the conformal predictor.

Runs rolling-window cross-validation (if not already done) to collect
non-conformity scores, then computes the per-horizon conformal quantile
`q_hat` for each requested `delta` level.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>df</td>
<td>pd.DataFrame</td>
<td></td>
<td>Calibration dataset.</td>
</tr>
<tr>
<td>delta</td>
<td>Union[float, List[float]]</td>
<td>0.5</td>
<td>Coverage level(s). A single float produces one symmetric interval; a
list produces one interval per level. For example, <code>delta=0.9</code>
produces a 90 % prediction interval.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>‘prob_forecasts’</strong></td>
<td></td>
<td><strong>The fitted object, with <code>self.q_hat</code> set to the
calibrated per-horizon conformal quantiles.</strong></td>
</tr>
</tbody>
</table>

### peshbeen.probabilistic_forecasting.mv_prob_forecasts.sample

``` python
sample(
    df: 'pd.DataFrame',
    n_samples: 'int' = 1000,
    method: 'str' = 'empirical',
    future_exog: 'Union[pd.DataFrame, None]' = None
)
```

Draw sample paths from the predictive distribution.

Three methods are available:

- `"empirical"` — residuals are resampled with replacement independently
  at each horizon.
- `"kde"` — a Gaussian KDE is fitted to each horizon’s residuals;
  samples are drawn from the smoothed distribution.
- `"correlated"` — a multivariate normal is fitted to the full
  `H`-dimensional residual vectors, preserving cross-horizon
  correlation. Samples are drawn jointly.

Results are stored on `self`:

- `self.sample_paths` — `(n_samples, H)` array of sampled trajectories
  centred on the point forecast.
- `self.point_forecast` — `(H,)` point forecast array.
- `self.sample_paths_df` — the same data as a DataFrame with columns
  `h_1, …, h_H`.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>df</td>
<td>pd.DataFrame</td>
<td></td>
<td>Training data. Residuals are computed via cross-validation if not
yet available.</td>
</tr>
<tr>
<td>n_samples</td>
<td>int</td>
<td>1000</td>
<td>Number of sample paths to draw.</td>
</tr>
<tr>
<td>method</td>
<td>str</td>
<td>empirical</td>
<td>Sampling strategy (see above).</td>
</tr>
<tr>
<td>future_exog</td>
<td>Union[pd.DataFrame, None]</td>
<td>None</td>
<td>Future exogenous variables passed to <code>forecast</code>.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>‘prob_forecasts’</strong></td>
<td></td>
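<td><strong>The object itself, with <code>sample_paths</code>,
<code>point_forecast</code>, and <code>sample_paths_df</code>
populated.</strong></td>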
</tr>
</tbody>
</table>

### peshbeen.probabilistic_forecasting.mv_prob_forecasts.sample_quantiles

``` python
sample_quantiles(
    quantiles: 'Union[float, List[float]]'
)
```

Compute quantiles from the sample paths generated by `sample`.

Works identically regardless of which `method` was passed to `sample`.

<table>
<colgroup>
<col style="width: 9%" />
<col style="width: 38%" />
<col style="width: 52%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>quantiles</td>
<td>Union[float, List[float]]</td>
<td>Desired quantile levels (e.g. <code>[0.1, 0.5, 0.9]</code>).</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>pd.DataFrame</strong></td>
<td><strong>Columns: <code>point_forecast</code>,
<code>q_&lt;level&gt;</code> for each level.</strong></td>
</tr>
</tbody>
</table>

### peshbeen.probabilistic_forecasting.mv_prob_forecasts.conformal_quantiles

``` python
conformal_quantiles(
    df: 'pd.DataFrame',
    quantiles: 'Union[float, List[float]]',
    future_exog: 'Union[pd.DataFrame, None]' = None
)
```

Generate conformal prediction quantiles.

Requires `calibrate` to have been called first.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>df</td>
<td>pd.DataFrame</td>
<td></td>
<td>Training data for the final model fit.</td>
</tr>
<tr>
<td>quantiles</td>
<td>Union[float, List[float]]</td>
<td></td>
<td>Desired quantile levels (e.g. <code>[0.1, 0.5, 0.9]</code>).</td>
</tr>
<tr>
<td>future_exog</td>
<td>Union[pd.DataFrame, None]</td>
<td>None</td>
<td>Future exogenous variables.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>pd.DataFrame</strong></td>
<td></td>
<td><strong>Columns: <code>point_forecast</code>,
<code>q_&lt;level&gt;</code> for each level.</strong></td>
</tr>
</tbody>
</table>

# Model selection and evaluation

## Hyperparameter tuning methods for univariate machine learning models

### peshbeen.model_selection.hyperopt_tune

``` python
hyperopt_tune(
    model: Any,
    df: pandas.core.frame.DataFrame,
    cv_split: int,
    test_size: int,
    eval_metric: Callable,
    param_space: Dict[str, Any],
    step_size: int = None,
    eval_num: int = 100,
    candidate_exog: List[str] = None,
    pareto_bounds: float | Tuple[float, float] = (0.5, 0.999),
    verbose: bool = False
)
```

Tune forecasting model hyperparameters using time series
cross-validation and Hyperopt.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>model</td>
<td>Any</td>
<td></td>
<td>Forecasting model with .fit and .forecast methods.</td>
</tr>
<tr>
<td>df</td>
<td>DataFrame</td>
<td></td>
<td>Time series data (datetime index, target column, optional exogenous
features).</td>
</tr>
<tr>
<td>cv_split</td>
<td>int</td>
<td></td>
<td>Number of cross-validation splits.</td>
</tr>
<tr>
<td>test_size</td>
<td>int</td>
<td></td>
<td>Number of samples in each test fold. For ml_direct_forecaster, this
will be overridden to be the maximum horizon in model.H.</td>
</tr>
<tr>
<td>eval_metric</td>
<td>Callable</td>
<td></td>
<td>Metric function to minimise.</td>
</tr>
<tr>
<td>param_space</td>
<td>Dict</td>
<td></td>
<td>Each value must be a callable that accepts a Hyperopt
<code>trial</code> and returns a value.</td>
</tr>
<tr>
<td>step_size</td>
<td>int</td>
<td>None</td>
<td>Step size between CV folds.</td>
</tr>
<tr>
<td>eval_num</td>
<td>int</td>
<td>100</td>
<td>Number of Hyperopt trials. Default 100.</td>
</tr>
<tr>
<td>candidate_exog</td>
<td>List</td>
<td>None</td>
<td>List of exogenous feature names to consider for feature
importance-based selection. If None, no feature selection is
performed.</td>
</tr>
<tr>
<td>pareto_bounds</td>
<td>float | Tuple[float, float]</td>
<td>(0.5, 0.999)</td>
<td>If a float is provided, it is used as a fixed cutoff on cumulative
feature importance (e.g. 0.8 keeps the top-ranked features that together
account for 80% of total importance). If a tuple is provided, it defines
the lower and upper bounds for tuning the Pareto cutoff. The default
(0.5, 0.999) tunes the cutoff between 50% and 99.9% of cumulative
importance.</td>
</tr>
<tr>
<td>verbose</td>
<td>bool</td>
<td>False</td>
<td>Print score for every trial. Default False.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>Tuple</strong></td>
<td></td>
<td><strong>Best hyperparameters and best lags (if ‘lags’ is in
param_space).</strong></td>
</tr>
</tbody>
</table>
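The `pareto_bounds` option selects features by cumulative importance. With a fixed cutoff, the candidate features are ranked by importance and kept in order until their share of total importance reaches the cutoff, as in the minimal sketch below. With tuple bounds, the tuner would instead propose a cutoff within the bounds on each trial. The helper name, the stopping rule at the boundary, and the importance values are hypothetical.

```python
from typing import Dict, List


def pareto_select(importances: Dict[str, float], cutoff: float) -> List[str]:
    """Hypothetical helper: keep top features until cumulative share >= cutoff."""
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, importance in ranked:
        selected.append(name)
        cumulative += importance / total
        if cumulative >= cutoff:
            break  # the feature that crosses the cutoff is kept (assumed rule)
    return selected


importances = {"temp": 50.0, "promo": 30.0, "holiday": 15.0, "noise": 5.0}
print(pareto_select(importances, 0.75))  # ['temp', 'promo']
print(pareto_select(importances, 0.9))   # ['temp', 'promo', 'holiday']
```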

### peshbeen.model_selection.optuna_tune

``` python
optuna_tune(
    model: Any,
    df: pandas.core.frame.DataFrame,
    cv_split: int,
    test_size: int,
    eval_metric: Callable,
    param_space: Dict[str, Any],
    step_size: int = None,
    eval_num: int = 100,
    candidate_exog: List[str] = None,
    pareto_bounds: float | Tuple[float, float] = (0.5, 0.999),
    verbose: bool = False
)
```

Tune forecasting model hyperparameters using time series
cross-validation and Optuna.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>model</td>
<td>Any</td>
<td></td>
<td>Forecasting model with .fit and .forecast methods.</td>
</tr>
<tr>
<td>df</td>
<td>DataFrame</td>
<td></td>
<td>Time series data (datetime index, target column, optional exogenous
features).</td>
</tr>
<tr>
<td>cv_split</td>
<td>int</td>
<td></td>
<td>Number of cross-validation splits.</td>
</tr>
<tr>
<td>test_size</td>
<td>int</td>
<td></td>
<td>Number of samples in each test fold. For ml_direct_forecaster, this
will be overridden to be the maximum horizon in model.H.</td>
</tr>
<tr>
<td>eval_metric</td>
<td>Callable</td>
<td></td>
<td>Metric function to minimise.</td>
</tr>
<tr>
<td>param_space</td>
<td>Dict</td>
<td></td>
<td>Each value must be a callable that accepts an Optuna
<code>trial</code> and returns a value.</td>
</tr>
<tr>
<td>step_size</td>
<td>int</td>
<td>None</td>
<td>Step size between CV folds.</td>
</tr>
<tr>
<td>eval_num</td>
<td>int</td>
<td>100</td>
<td>Number of Optuna trials. Default 100.</td>
</tr>
<tr>
<td>candidate_exog</td>
<td>List</td>
<td>None</td>
<td>List of exogenous feature names to consider for feature
importance-based selection. If None, no feature selection is
performed.</td>
</tr>
<tr>
<td>pareto_bounds</td>
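<td>float | Tuple[float, float]</td>
<td>(0.5, 0.999)</td>
<td>Global Pareto bounds for feature importance. If a float is passed, it
is used as a fixed cutoff on cumulative importance; if a tuple is passed,
it gives the lower and upper bounds within which the cutoff is
tuned.</td>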
</tr>
<tr>
<td>verbose</td>
<td>bool</td>
<td>False</td>
<td>Print score for every trial. Default False.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>Tuple</strong></td>
<td></td>
<td><strong>Best hyperparameters and best lags (if ‘lags’ is in
param_space).</strong></td>
</tr>
</tbody>
</table>
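The `param_space` contract says each value is a callable that takes an Optuna `trial`. This can be illustrated with lambdas that call `trial.suggest_int` and `trial.suggest_float`, both real Optuna methods. To keep the example runnable without Optuna installed, `StubTrial` is a hypothetical stand-in that mimics just those two methods. The parameter names and ranges are illustrative, and passing a suggestion name equal to the dictionary key is a convention assumed here.

```python
import math
import random
from typing import Any, Callable, Dict


class StubTrial:
    """Hypothetical stand-in for optuna.trial.Trial (two suggest_* methods only)."""

    def __init__(self, seed: int = 0):
        self._rng = random.Random(seed)
        self.params: Dict[str, Any] = {}

    def suggest_int(self, name: str, low: int, high: int) -> int:
        self.params[name] = self._rng.randint(low, high)  # inclusive bounds, like Optuna
        return self.params[name]

    def suggest_float(self, name: str, low: float, high: float, log: bool = False) -> float:
        if log:  # uniform on the log scale
            value = math.exp(self._rng.uniform(math.log(low), math.log(high)))
        else:
            value = self._rng.uniform(low, high)
        self.params[name] = value
        return value


# Each value receives the trial object and returns one sampled setting.
param_space: Dict[str, Callable[[Any], Any]] = {
    "n_estimators": lambda trial: trial.suggest_int("n_estimators", 100, 500),
    "learning_rate": lambda trial: trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
    "lags": lambda trial: trial.suggest_int("lags", 1, 24),
}


def draw_params(space: Dict[str, Callable[[Any], Any]], trial: Any) -> Dict[str, Any]:
    return {name: suggest(trial) for name, suggest in space.items()}


params = draw_params(param_space, StubTrial(seed=1))
print(params)
```

Inside a real Optuna objective, the same lambdas would receive an actual `optuna.trial.Trial` instead of the stub.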

## Hyperparameter tuning methods for multivariate machine learning models

### peshbeen.model_selection.mv_hyperopt_tune

``` python
mv_hyperopt_tune(
    model: object,
    df: pandas.core.frame.DataFrame,
    target_col: str,
    cv_split: int,
    test_size: int,
    eval_metric: Callable,
    param_space: dict,
    step_size: int = None,
    eval_num=100,
    verbose=False
)
```

Tune forecasting model hyperparameters using time series
cross-validation and Hyperopt for multivariate models.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>model</td>
<td>object</td>
<td></td>
<td>Forecasting model object with .fit and .forecast methods and
relevant attributes.</td>
</tr>
<tr>
<td>df</td>
<td>DataFrame</td>
<td></td>
<td>Time series data with a datetime index, the target columns, and
optional exogenous features.</td>
</tr>
<tr>
<td>target_col</td>
<td>str</td>
<td></td>
<td>Name of the target column on which the evaluation metric is
minimised.</td>
</tr>
<tr>
<td>cv_split</td>
<td>int</td>
<td></td>
<td>Number of cross-validation splits.</td>
</tr>
<tr>
<td>test_size</td>
<td>int</td>
<td></td>
<td>Number of samples in each test set.</td>
</tr>
<tr>
<td>eval_metric</td>
<td>Callable</td>
<td></td>
<td>Evaluation metric function.</td>
</tr>
<tr>
<td>param_space</td>
<td>dict</td>
<td></td>
<td>Hyperparameter search space for the forecasting model.</td>
</tr>
<tr>
<td>step_size</td>
<td>int</td>
<td>None</td>
<td>Step size to move the test window forward in each split.</td>
</tr>
<tr>
<td>eval_num</td>
<td>int</td>
<td>100</td>
<td>Number of hyperparameter combinations to evaluate. Default is
100.</td>
</tr>
<tr>
<td>verbose</td>
<td>bool</td>
<td>False</td>
<td>Whether to print the evaluation metric for each hyperparameter
combination. Default is False.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>Tuple</strong></td>
<td></td>
<td><strong>A tuple containing the best hyperparameters, selected lags,
and selected transforms.</strong></td>
</tr>
</tbody>
</table>

### peshbeen.model_selection.mv_optuna_tune

``` python
mv_optuna_tune(
    model: object,
    df: pandas.core.frame.DataFrame,
    target_col: str,
    cv_split: int,
    test_size: int,
    eval_metric: Callable,
    param_space: Dict[str, Any],
    step_size: int = None,
    eval_num: int = 100,
    verbose: bool = False
)
```

Tune forecasting model hyperparameters using time series
cross-validation and Optuna for multivariate models.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>model</td>
<td>object</td>
<td></td>
<td>Forecasting model with .fit and .forecast methods.</td>
</tr>
<tr>
<td>df</td>
<td>DataFrame</td>
<td></td>
<td>Time series data (datetime index, target column, optional exogenous
features).</td>
</tr>
<tr>
<td>target_col</td>
<td>str</td>
<td></td>
<td>Name of the target column on which the evaluation metric is
minimised.</td>
</tr>
<tr>
<td>cv_split</td>
<td>int</td>
<td></td>
<td>Number of cross-validation splits.</td>
</tr>
<tr>
<td>test_size</td>
<td>int</td>
<td></td>
<td>Number of samples in each test fold.</td>
</tr>
<tr>
<td>eval_metric</td>
<td>Callable</td>
<td></td>
<td>Metric function to minimise.</td>
</tr>
<tr>
<td>param_space</td>
<td>Dict</td>
<td></td>
<td>Each value must be a callable that accepts an Optuna
<code>trial</code> and returns a value.</td>
</tr>
<tr>
<td>step_size</td>
<td>int</td>
<td>None</td>
<td>Step size between CV folds.</td>
</tr>
<tr>
<td>eval_num</td>
<td>int</td>
<td>100</td>
<td>Number of Optuna trials. Default 100.</td>
</tr>
<tr>
<td>verbose</td>
<td>bool</td>
<td>False</td>
<td>Print score for every trial. Default False.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>Tuple</strong></td>
<td></td>
<td><strong>Best hyperparameters and best lags (if ‘lags’ is in
param_space).</strong></td>
</tr>
</tbody>
</table>
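Because each `param_space` value is a callable that receives an Optuna trial, a search space is just a dictionary of small lambdas. The sketch below shows one way such a space might be written and resolved for a single trial. The hyperparameter names (`n_estimators`, `learning_rate`, `lags`) are hypothetical, and `DemoTrial` is a stand-in so the snippet runs without Optuna installed. Its `suggest_int`, `suggest_float` and `suggest_categorical` methods mirror the names on Optuna's `Trial`, but how `mv_optuna_tune` interprets a `lags` entry is not documented here.

```python
from typing import Any, Callable, Dict, List


class DemoTrial:
    """Stand-in for optuna.trial.Trial: returns the lower bound or first choice."""

    def suggest_int(self, name: str, low: int, high: int) -> int:
        return low

    def suggest_float(self, name: str, low: float, high: float, log: bool = False) -> float:
        return low

    def suggest_categorical(self, name: str, choices: List[Any]) -> Any:
        return choices[0]


# Hypothetical search space: every value is a callable that takes a trial.
param_space: Dict[str, Callable[[Any], Any]] = {
    "n_estimators": lambda trial: trial.suggest_int("n_estimators", 100, 500),
    "learning_rate": lambda trial: trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
    "lags": lambda trial: trial.suggest_categorical("lags", [7, 14, 28]),
}


def sample_params(space: Dict[str, Callable[[Any], Any]], trial: Any) -> Dict[str, Any]:
    """Resolve one concrete hyperparameter set by calling each entry with the trial."""
    return {name: suggest(trial) for name, suggest in space.items()}


params = sample_params(param_space, DemoTrial())
print(params)  # {'n_estimators': 100, 'learning_rate': 0.01, 'lags': 7}
```

With a real Optuna trial in place of `DemoTrial`, each call would draw a new value from the declared range instead of the lower bound.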

## Feature selection methods for univariate time series models

### peshbeen.model_selection.forward_feature_selection

``` python
forward_feature_selection(
    model: object,
    df: pandas.core.frame.DataFrame,
    cv_split: int,
    H: int,
    step_size: int | None = None,
    metrics: Callable | List[Callable] = None,
    lags_to_consider: int | None = None,
    candidate_features: List[str] | None = None,
    transformations: List | None = None,
    starting_lags: List[int] | None = None,
    starting_transforms: List | None = None,
    best_start_score: List[float] | None = None,
    verbose=False
)
```

Forward stepwise feature selection for
[`ml_forecaster`](https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ml_forecast.html#ml_forecaster)
models.

At each iteration every remaining candidate (lag, exogenous column, or
lag-transform) is tested individually by adding it to the current best
feature set. The candidate that produces the largest cross-validation
improvement is permanently added. The loop continues until no remaining
candidate improves any of the evaluation metrics.
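The greedy loop described above can be sketched generically. This is not the `peshbeen` implementation. `score_fn` is a hypothetical stand-in for a cross-validation run that returns one score per metric (lower is better), and features are plain strings. Following the `metrics` description in the table, this sketch keeps only candidates that beat the current best on every metric and, among those, accepts the one with the lowest first-metric score.

```python
from typing import Callable, Iterable, Sequence, Set, Tuple

Score = Tuple[float, ...]  # one value per metric, lower is better


def greedy_forward(
    candidates: Sequence[str],
    score_fn: Callable[[Set[str]], Score],
    start: Iterable[str] = (),
) -> Tuple[Set[str], Score]:
    selected = set(start)
    best = score_fn(selected)
    remaining = [c for c in candidates if c not in selected]
    while remaining:
        # Try each remaining candidate added on its own to the current set.
        trials = [(c, score_fn(selected | {c})) for c in remaining]
        # Keep only candidates that beat the current best on every metric.
        improving = [(c, s) for c, s in trials if all(n < b for n, b in zip(s, best))]
        if not improving:
            break  # no candidate improves all metrics: stop
        # Among those, accept the one with the lowest first-metric score.
        winner, best = min(improving, key=lambda cs: cs[1][0])
        selected.add(winner)
        remaining.remove(winner)
    return selected, best


# Toy scorer (hypothetical): "useful" features lower both metrics, others add noise.
useful = {"lag_1": 3.0, "lag_7": 2.0, "temp": 1.0}


def toy_score(features: Set[str]) -> Score:
    gain = sum(useful.get(f, -0.5) for f in features)
    return (10.0 - gain, 12.0 - gain)


chosen, score = greedy_forward(["noise", "temp", "lag_7", "lag_1"], toy_score)
print(sorted(chosen), score)  # ['lag_1', 'lag_7', 'temp'] (4.0, 6.0)
```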

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>model</td>
<td>object</td>
<td></td>
<td>A <em>configured but unfitted</em> <a
href="https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ml_forecast.html#ml_forecaster"><code>ml_forecaster</code></a>
instance. The function works exclusively on deep copies and never
mutates the object passed in.</td>
</tr>
<tr>
<td>df</td>
<td>DataFrame</td>
<td></td>
<td>Full training DataFrame. Must contain the target column and any
candidate exogenous columns.</td>
</tr>
<tr>
<td>cv_split</td>
<td>int</td>
<td></td>
<td>Number of time-series cross-validation folds.</td>
</tr>
<tr>
<td>H</td>
<td>int</td>
<td></td>
<td>Forecast horizon (test window size for each fold).</td>
</tr>
<tr>
<td>step_size</td>
<td>int | None</td>
<td>None</td>
<td>Step size between consecutive CV folds. If <code>None</code>
(default) the step equals <code>H</code>, producing non-overlapping
folds — consistent with the default behaviour of <a
href="https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ml_forecast.html#ml_forecaster.cross_validate"><code>ml_forecaster.cross_validate</code></a>.</td>
</tr>
<tr>
<td>metrics</td>
<td>Callable | List[Callable]</td>
<td>None</td>
<td>One or more metric functions accepted by <a
href="https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ml_forecast.html#ml_forecaster.cross_validate"><code>ml_forecaster.cross_validate</code></a>
(e.g. <code>[MAE, RMSE]</code>). Selection is driven by the
<strong>first</strong> metric in the list; a candidate is only accepted
when it improves <strong>all</strong> metrics simultaneously.</td>
</tr>
<tr>
<td>lags_to_consider</td>
<td>int | None</td>
<td>None</td>
<td>Consider lags <code>1, 2, ..., lags_to_consider</code> as
candidates. If <code>None</code>, lag selection is skipped.</td>
</tr>
<tr>
<td>candidate_features</td>
<td>List[str] | None</td>
<td>None</td>
<td>Column names in <code>df</code> that are exogenous feature
candidates. The function never modifies this list. If <code>None</code>,
exogenous feature selection is skipped.</td>
</tr>
<tr>
<td>transformations</td>
<td>List | None</td>
<td>None</td>
<td>Lag-transform objects to test as candidates
(e.g. <code>[rolling_mean(3, 1), expanding_std(1)]</code>). The function
never modifies this list. If <code>None</code>, transform selection is
skipped.</td>
</tr>
<tr>
<td>starting_lags</td>
<td>List[int] | None</td>
<td>None</td>
<td>Lags to include in the initial feature set before the search begins.
These are <em>not</em> candidates — they are always included. Must be a
list (e.g. <code>[1]</code> or <code>[1, 2, 3]</code>).</td>
</tr>
<tr>
<td>starting_transforms</td>
<td>List | None</td>
<td>None</td>
<td>Lag-transform objects to include in the initial feature set before
the search begins. Must be a list.</td>
</tr>
<tr>
<td>best_start_score</td>
<td>List[float] | None</td>
<td>None</td>
<td>Initial best scores for each metric. If not provided, the function
will compute the baseline score using the model with the starting
features (if any) before beginning the search.</td>
</tr>
<tr>
<td>verbose</td>
<td>bool</td>
<td>False</td>
<td>Print a message each time a candidate is accepted.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>dict</strong></td>
<td></td>
<td><strong>A dictionary with keys <code>best_lags</code>,
<code>best_exogs</code>, and <code>best_transforms</code> containing the
selected features.</strong></td>
</tr>
</tbody>
</table>
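To make `cv_split`, `H` and `step_size` concrete, here is a sketch of one common rolling-origin layout. It assumes the last test window ends at the final observation and that earlier windows step back by `step_size` (defaulting to `H`). The exact fold placement used by `ml_forecaster.cross_validate` is not stated here, so treat the indices as illustrative; `rolling_origin_folds` is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def rolling_origin_folds(
    n_obs: int, cv_split: int, H: int, step_size: Optional[int] = None
) -> List[Tuple[range, range]]:
    """Return (train_idx, test_idx) pairs, oldest fold first."""
    step = H if step_size is None else step_size
    folds = []
    for k in range(cv_split):
        # Fold k ends (cv_split - 1 - k) steps before the end of the series.
        test_end = n_obs - (cv_split - 1 - k) * step
        test_start = test_end - H
        if test_start <= 0:
            raise ValueError("series too short for the requested folds")
        folds.append((range(0, test_start), range(test_start, test_end)))
    return folds


for train, test in rolling_origin_folds(n_obs=20, cv_split=3, H=4):
    print(f"train 0..{train.stop - 1}, test {test.start}..{test.stop - 1}")
# train 0..7, test 8..11
# train 0..11, test 12..15
# train 0..15, test 16..19
```

With `step_size=None` the test windows tile the end of the series without overlap; a `step_size` smaller than `H` makes consecutive test windows overlap.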

### peshbeen.model_selection.backward_feature_selection

``` python
backward_feature_selection(
    model: object,
    df: pandas.core.frame.DataFrame,
    cv_split: int,
    H: int,
    step_size: int | None = None,
    metrics: Callable | List[Callable] = None,
    lags_to_consider: List[int] | None = None,
    candidate_features: List[str] | None = None,
    transformations: List | None = None,
    verbose=False
)
```

Backward stepwise feature selection for
[`ml_forecaster`](https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ml_forecast.html#ml_forecaster)
models.

Starts with the full feature set (all provided lags, exogenous columns,
and lag-transforms) and at each iteration tries removing each current
feature individually. The feature whose removal produces the largest
cross-validation improvement is permanently dropped. The loop continues
until no remaining feature can be removed without hurting any of the
evaluation metrics.
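The elimination loop can be sketched generically as well. This is not the `peshbeen` implementation. `score_fn` is a hypothetical stand-in for a cross-validation run returning one score per metric (lower is better). In this sketch a feature is dropped only when removing it beats the current best on every metric, and among such removals the one with the lowest first-metric score is applied.

```python
from typing import Callable, Iterable, Set, Tuple

Score = Tuple[float, ...]  # one value per metric, lower is better


def greedy_backward(
    features: Iterable[str], score_fn: Callable[[Set[str]], Score]
) -> Tuple[Set[str], Score]:
    selected = set(features)
    best = score_fn(selected)
    while selected:
        # Score the set with each current feature removed in turn.
        trials = [(f, score_fn(selected - {f})) for f in sorted(selected)]
        # Keep only removals that beat the current best on every metric.
        improving = [(f, s) for f, s in trials if all(n < b for n, b in zip(s, best))]
        if not improving:
            break  # every removal hurts at least one metric: stop
        # Drop the feature whose removal gives the lowest first-metric score.
        loser, best = min(improving, key=lambda fs: fs[1][0])
        selected.remove(loser)
    return selected, best


# Toy scorer (hypothetical): "useful" features lower both metrics, others add noise.
useful = {"lag_1": 3.0, "lag_7": 2.0, "temp": 1.0}


def toy_score(features: Set[str]) -> Score:
    gain = sum(useful.get(f, -0.5) for f in features)
    return (10.0 - gain, 12.0 - gain)


kept, score = greedy_backward(["lag_1", "lag_7", "temp", "noise_a", "noise_b"], toy_score)
print(sorted(kept), score)  # ['lag_1', 'lag_7', 'temp'] (4.0, 6.0)
```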

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>model</td>
<td>object</td>
<td></td>
<td>A <em>configured but unfitted</em> <a
href="https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ml_forecast.html#ml_forecaster"><code>ml_forecaster</code></a>
instance. The function works exclusively on deep copies and never
mutates the object passed in.</td>
</tr>
<tr>
<td>df</td>
<td>DataFrame</td>
<td></td>
<td>Full training DataFrame. Must contain the target column and any
candidate exogenous columns.</td>
</tr>
<tr>
<td>cv_split</td>
<td>int</td>
<td></td>
<td>Number of time-series cross-validation folds.</td>
</tr>
<tr>
<td>H</td>
<td>int</td>
<td></td>
<td>Forecast horizon (test window size for each fold).</td>
</tr>
<tr>
<td>step_size</td>
<td>int | None</td>
<td>None</td>
<td>Step size between consecutive CV folds. If <code>None</code>
(default) the step equals <code>H</code>, producing non-overlapping
folds.</td>
</tr>
<tr>
<td>metrics</td>
<td>Callable | List[Callable]</td>
<td>None</td>
<td>One or more metric functions accepted by <a
href="https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ml_forecast.html#ml_forecaster.cross_validate"><code>ml_forecaster.cross_validate</code></a>
(e.g. <code>[MAE, RMSE]</code>). Selection is driven by the
<strong>first</strong> metric in the list; a feature is only removed
when doing so improves <strong>all</strong> metrics simultaneously.</td>
</tr>
<tr>
<td>lags_to_consider</td>
<td>List[int] | None</td>
<td>None</td>
<td>Lags to include in the initial feature set and test for removal
(e.g. <code>[1, 2, 3, 4]</code>). If <code>None</code>, no lag removal
is attempted.</td>
</tr>
<tr>
<td>candidate_features</td>
<td>List[str] | None</td>
<td>None</td>
<td>Column names in <code>df</code> that start in the model and are
tested for removal. If <code>None</code>, exogenous feature removal is
skipped.</td>
</tr>
<tr>
<td>transformations</td>
<td>List | None</td>
<td>None</td>
<td>Lag-transform objects that start in the model and are tested for
removal (e.g. <code>[rolling_mean(3, 1), expanding_std(1)]</code>). If
<code>None</code>, transform removal is skipped.</td>
</tr>
<tr>
<td>verbose</td>
<td>bool</td>
<td>False</td>
<td>Print a message each time a feature is removed.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>dict</strong></td>
<td></td>
<td><strong>A dictionary with keys <code>best_lags</code>,
<code>best_exogs</code>, and <code>best_transforms</code> containing the
surviving features after backward selection.</strong></td>
</tr>
</tbody>
</table>

## Feature selection methods for multivariate time series models

### peshbeen.model_selection.mv_forward_feature_selection

``` python
mv_forward_feature_selection(
    model: object,
    df: pandas.core.frame.DataFrame,
    target_col: str,
    cv_split: int,
    H: int,
    step_size=None,
    metrics=None,
    lags_to_consider=None,
    candidate_features=None,
    transformations=None,
    starting_lags=None,
    starting_transforms=None,
    verbose=False
)
```

Forward stepwise feature selection for
[`ml_mv_forecaster`](https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ml_mv_forecast.html#ml_mv_forecaster).

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>model</td>
<td>object</td>
<td></td>
<td>Template model. The object passed in is never mutated.</td>
</tr>
<tr>
<td>df</td>
<td>DataFrame</td>
<td></td>
<td>DataFrame containing the target variable and any candidate
features.</td>
</tr>
<tr>
<td>target_col</td>
<td>str</td>
<td></td>
<td>Target variable used to evaluate cross-validation score.</td>
</tr>
<tr>
<td>cv_split</td>
<td>int</td>
<td></td>
<td>Number of time-series cross-validation folds.</td>
</tr>
<tr>
<td>H</td>
<td>int</td>
<td></td>
<td>Forecast horizon / test size per fold.</td>
</tr>
<tr>
<td>step_size</td>
<td>int | None</td>
<td>None</td>
<td>Step size between consecutive CV folds. If <code>None</code>
(default), the step equals <code>H</code>.</td>
</tr>
<tr>
<td>metrics</td>
<td>Callable | List[Callable]</td>
<td>None</td>
<td>One or more metric functions (e.g. <code>[MAE, RMSE]</code>).
Selection is driven by the <strong>first</strong> metric in the list; a
candidate is only accepted when it improves <strong>all</strong> metrics
simultaneously.</td>
</tr>
<tr>
<td>lags_to_consider</td>
<td>dict | None</td>
<td>None</td>
<td><code>{col: max_lag}</code>: for each column, lags 1 to max_lag are
tested as candidates.</td>
</tr>
<tr>
<td>candidate_features</td>
<td>List[str] | None</td>
<td>None</td>
<td>Exogenous columns to consider adding.</td>
</tr>
<tr>
<td>transformations</td>
<td>dict | None</td>
<td>None</td>
<td><code>{col: [transform_objects]}</code>: lag-transform objects to test
as candidates for each target column.</td>
</tr>
<tr>
<td>starting_lags</td>
<td>NoneType</td>
<td>None</td>
<td>Lags included in the initial feature set before the search begins;
these are not candidates.</td>
</tr>
<tr>
<td>starting_transforms</td>
<td>NoneType</td>
<td>None</td>
<td>Transforms included in the initial feature set before the search
begins; these are not candidates.</td>
</tr>
<tr>
<td>verbose</td>
<td>bool</td>
<td>False</td>
<td>Whether to print progress messages during the search.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>dict</strong></td>
<td></td>
<td><strong><code>{"best_lags": {col: [...]}, "best_exogs": [...], "best_transforms": {col: [name_str, ...]}}</code></strong></td>
</tr>
</tbody>
</table>
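In the multivariate variant, `lags_to_consider` and `transformations` are keyed by column, while `candidate_features` is a flat list. The sketch below shows how these inputs could be flattened into one candidate pool of `(kind, column, item)` tuples. The tuple layout, the helper name `expand_candidates`, the column names and the string stand-ins for transform objects are all hypothetical; they only illustrate the shape of the search space.

```python
from typing import Any, Dict, List, Optional, Tuple

Candidate = Tuple[str, str, Any]  # (kind, column, lag / transform / None)


def expand_candidates(
    lags_to_consider: Optional[Dict[str, int]] = None,
    candidate_features: Optional[List[str]] = None,
    transformations: Optional[Dict[str, List[Any]]] = None,
) -> List[Candidate]:
    pool: List[Candidate] = []
    # {col: max_lag} -> lags 1..max_lag of that column.
    for col, max_lag in (lags_to_consider or {}).items():
        pool.extend(("lag", col, lag) for lag in range(1, max_lag + 1))
    # Exogenous columns are a flat list, not tied to a target column.
    pool.extend(("exog", feat, None) for feat in (candidate_features or []))
    # {col: [transform, ...]} -> one candidate per transform of that column.
    for col, transforms in (transformations or {}).items():
        pool.extend(("transform", col, t) for t in transforms)
    return pool


pool = expand_candidates(
    lags_to_consider={"sales": 2, "visits": 1},
    candidate_features=["holiday"],
    transformations={"sales": ["rolling_mean(7, 1)"]},  # string stand-in for an object
)
for cand in pool:
    print(cand)
```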

### peshbeen.model_selection.mv_backward_feature_selection

``` python
mv_backward_feature_selection(
    model: object,
    df: pandas.core.frame.DataFrame,
    target_col: str,
    cv_split: int,
    H: int,
    step_size=None,
    metrics=None,
    lags_to_consider=None,
    candidate_features=None,
    transformations=None,
    verbose=False
)
```

Backward stepwise feature selection for
[`ml_mv_forecaster`](https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ml_mv_forecast.html#ml_mv_forecaster).

Starts with all candidate features included and iteratively removes the
one whose removal most improves the cross-validation score.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>model</td>
<td>object</td>
<td></td>
<td>Template model. The object passed in is never mutated.</td>
</tr>
<tr>
<td>df</td>
<td>DataFrame</td>
<td></td>
<td>DataFrame containing the target variable; all candidate exogenous
columns must already be present.</td>
</tr>
<tr>
<td>target_col</td>
<td>str</td>
<td></td>
<td>Target variable used to evaluate cross-validation score.</td>
</tr>
<tr>
<td>cv_split</td>
<td>int</td>
<td></td>
<td>Number of time-series cross-validation folds.</td>
</tr>
<tr>
<td>H</td>
<td>int</td>
<td></td>
<td>Forecast horizon / test size per fold.</td>
</tr>
<tr>
<td>step_size</td>
<td>int | None</td>
<td>None</td>
<td>Step size between consecutive CV folds. If <code>None</code>
(default), the step equals <code>H</code>.</td>
</tr>
<tr>
<td>metrics</td>
<td>Callable | List[Callable]</td>
<td>None</td>
<td>One or more metric functions (e.g. <code>[MAE, RMSE]</code>). A
feature is only removed when its removal improves <strong>all</strong>
metrics simultaneously.</td>
</tr>
<tr>
<td>lags_to_consider</td>
<td>dict | None</td>
<td>None</td>
<td><code>{col: max_lag}</code>: for each column, lags 1 to max_lag start
in the model and are tested for removal.</td>
</tr>
<tr>
<td>candidate_features</td>
<td>List[str] | None</td>
<td>None</td>
<td>Exogenous columns that start as selected.</td>
</tr>
<tr>
<td>transformations</td>
<td>dict | None</td>
<td>None</td>
<td><code>{col: [transform_objects]}</code>: lag-transform objects that
start in the model and are tested for removal.</td>
</tr>
<tr>
<td>verbose</td>
<td>bool</td>
<td>False</td>
<td>Whether to print progress messages during the search.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>dict</strong></td>
<td></td>
<td><strong><code>{"best_lags": {col: [...]}, "best_exogs": [...], "best_transforms": {col: [name_str, ...]}}</code></strong></td>
</tr>
</tbody>
</table>

## Feature selection methods for Markov Switching Autoregressive Regression

### peshbeen.model_selection.ms_arr_forward_feature_selection

``` python
ms_arr_forward_feature_selection(
    model: object,
    df: pandas.core.frame.DataFrame,
    cv_split: int,
    H: int,
    step_size: int | None = None,
    metrics: Callable | List[Callable] = None,
    lags_to_consider: int | List[int] | None = None,
    candidate_features: List[str] | None = None,
    transformations: List | None = None,
    starting_lags: List[int] | None = None,
    starting_transforms: List | None = None,
    validation_type: str = 'cv',
    iterations: int = 10,
    verbose: bool = False
)
```

Forward stepwise feature selection for
[`ms_arr`](https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ms_arr.html#ms_arr)
models.

At each iteration every remaining candidate (lag, exogenous column, or
lag-transform) is tested individually by adding it to the current best
feature set. The candidate that produces the largest improvement is
permanently added. The loop continues until no remaining candidate
improves the evaluation criterion.

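The HMM parameters of each round's winning candidate (transition matrix
`A`, initial state probabilities `pi`, regime standard deviations `stds`
and regression coefficients `coeffs`) are used to warm-start the fits in
subsequent rounds, giving consistent initialisation.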

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>model</td>
<td>object</td>
<td></td>
<td>Either a configured <a
href="https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ms_arr.html#ms_arr"><code>ms_arr</code></a>
instance on which <code>fit_em()</code> has already been called (a few EM
iterations, e.g. <code>iterations=10</code>, are recommended for this
initial fit), or an unfitted template with the same configuration. The
model is copied internally and never mutated, so the caller’s instance
remains unchanged.</td>
</tr>
<tr>
<td>df</td>
<td>DataFrame</td>
<td></td>
<td>Full training DataFrame. Must contain the target column and any
candidate exogenous columns.</td>
</tr>
<tr>
<td>cv_split</td>
<td>int</td>
<td></td>
<td>Number of time-series cross-validation folds.</td>
</tr>
<tr>
<td>H</td>
<td>int</td>
<td></td>
<td>Forecast horizon (test window size for each fold).</td>
</tr>
<tr>
<td>step_size</td>
<td>int | None</td>
<td>None</td>
<td>Step size between consecutive CV folds. Defaults to H.</td>
</tr>
<tr>
<td>metrics</td>
<td>Callable | List[Callable]</td>
<td>None</td>
<td>Required when validation_type=‘cv’. Selection is driven by the first
metric; a candidate is accepted only when it improves all metrics.</td>
</tr>
<tr>
<td>lags_to_consider</td>
<td>int | List[int] | None</td>
<td>None</td>
<td>Candidate lags. An int <code>n</code> tests lags 1, 2, ..., n; a list
tests exactly the given lags.</td>
</tr>
<tr>
<td>candidate_features</td>
<td>List[str] | None</td>
<td>None</td>
<td>Exogenous column names to test as candidates.</td>
</tr>
<tr>
<td>transformations</td>
<td>List | None</td>
<td>None</td>
<td>Lag-transform objects to test as candidates.</td>
</tr>
<tr>
<td>starting_lags</td>
<td>List[int] | None</td>
<td>None</td>
<td>Lags always included in the initial set (not candidates).</td>
</tr>
<tr>
<td>starting_transforms</td>
<td>List | None</td>
<td>None</td>
<td>Transforms always included in the initial set (not candidates).</td>
</tr>
<tr>
<td>validation_type</td>
<td>str</td>
<td>cv</td>
<td>Criterion for selection: ‘cv’, ‘AIC’, ‘BIC’, or ‘AIC_BIC’. When
‘cv’, metrics must be provided and drive selection. When ‘AIC’ or ‘BIC’,
the respective information criterion is used. When ‘AIC_BIC’, a
candidate is accepted only if it improves both AIC and BIC.</td>
</tr>
<tr>
<td>iterations</td>
<td>int</td>
<td>10</td>
<td>EM iterations used inside fit_em() for each candidate
evaluation.</td>
</tr>
<tr>
<td>verbose</td>
<td>bool</td>
<td>False</td>
<td>Print a message each time a candidate is accepted.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>dict</strong></td>
<td></td>
<td><strong><code>{"best_lags": [...], "best_exogs": [...], "best_transforms": [...]}</code></strong></td>
</tr>
</tbody>
</table>
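With `validation_type` set to `'AIC'`, `'BIC'` or `'AIC_BIC'`, candidates are compared on information criteria instead of CV error. The sketch below uses the textbook definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where lower is better. How `ms_arr` counts its free parameters k (transition probabilities, regime variances, coefficients) is not documented here, so the log-likelihoods and parameter counts in the example are made-up numbers, and `accepts` is a hypothetical helper.

```python
import math


def aic(log_lik: float, k: int) -> float:
    return 2 * k - 2 * log_lik


def bic(log_lik: float, k: int, n: int) -> float:
    return k * math.log(n) - 2 * log_lik


def accepts(validation_type: str, current: tuple, candidate: tuple, n: int) -> bool:
    """current/candidate are (log_likelihood, n_params); lower criterion is better."""
    d_aic = aic(*candidate) - aic(*current)
    d_bic = bic(*candidate, n) - bic(*current, n)
    if validation_type == "AIC":
        return d_aic < 0
    if validation_type == "BIC":
        return d_bic < 0
    if validation_type == "AIC_BIC":
        return d_aic < 0 and d_bic < 0  # must improve both criteria
    raise ValueError(f"unsupported validation_type: {validation_type!r}")


# Hypothetical fits: adding one lag raises ln L by 3 at the cost of 2 extra
# parameters (e.g. one coefficient per regime in a two-regime model).
current, candidate, n = (-500.0, 10), (-497.0, 12), 200
print(
    accepts("AIC", current, candidate, n),
    accepts("BIC", current, candidate, n),
    accepts("AIC_BIC", current, candidate, n),
)  # True False False
```

Because BIC penalises each parameter by ln n rather than 2, it rejects this candidate while AIC accepts it, so `'AIC_BIC'` is the more conservative choice.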

### peshbeen.model_selection.ms_arr_backward_feature_selection

``` python
ms_arr_backward_feature_selection(
    df: pandas.core.frame.DataFrame,
    cv_split: int,
    H: int,
    step_size: int | None = None,
    model: object = None,
    metrics: Callable | List[Callable] = None,
    lags_to_consider: int | List[int] | None = None,
    candidate_features: List[str] | None = None,
    transformations: List | None = None,
    validation_type: str = 'cv',
    iterations: int = 100,
    verbose: bool = False
)
```

Backward stepwise feature selection for
[`ms_arr`](https://mustafaslanCoto.github.io/peshbeen/modules/02_models/ms_arr.html#ms_arr)
models.

Starts with the full feature set and at each iteration tries removing
each current feature individually. The feature whose removal produces
the largest improvement is permanently dropped. The loop continues until
no removal improves the evaluation criterion.

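The HMM parameters of each round's winning removal (transition matrix
`A`, initial state probabilities `pi`, regime standard deviations `stds`
and regression coefficients `coeffs`) are used to warm-start the fits in
subsequent rounds.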

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>df</td>
<td>DataFrame</td>
<td></td>
<td>Full training DataFrame. All candidate exogenous columns must be
present.</td>
</tr>
<tr>
<td>cv_split</td>
<td>int</td>
<td></td>
<td>Number of time-series cross-validation folds.</td>
</tr>
<tr>
<td>H</td>
<td>int</td>
<td></td>
<td>Forecast horizon (test window size for each fold).</td>
</tr>
<tr>
<td>step_size</td>
<td>int | None</td>
<td>None</td>
<td>Step size between consecutive CV folds. Defaults to H.</td>
</tr>
<tr>
<td>model</td>
<td>object</td>
<td>None</td>
<td>A configured but unfitted ms_arr instance. Never mutated.</td>
</tr>
<tr>
<td>metrics</td>
<td>Callable | List[Callable]</td>
<td>None</td>
<td>Required when validation_type=‘cv’. A feature is only removed when
doing so improves all metrics simultaneously.</td>
</tr>
<tr>
<td>lags_to_consider</td>
<td>int | List[int] | None</td>
<td>None</td>
<td>Initial lag set. An int <code>n</code> starts with lags 1, 2, ..., n;
a list starts with exactly the given lags.</td>
</tr>
<tr>
<td>candidate_features</td>
<td>List[str] | None</td>
<td>None</td>
<td>Exogenous columns that start in the model and are tested for
removal.</td>
</tr>
<tr>
<td>transformations</td>
<td>List | None</td>
<td>None</td>
<td>Lag-transform objects that start in the model and are tested for
removal.</td>
</tr>
<tr>
<td>validation_type</td>
<td>str</td>
<td>cv</td>
<td>Criterion for selection: ‘cv’, ‘AIC’, ‘BIC’, or ‘AIC_BIC’.</td>
</tr>
<tr>
<td>iterations</td>
<td>int</td>
<td>100</td>
<td>EM iterations used inside fit_em() for each candidate
evaluation.</td>
</tr>
<tr>
<td>verbose</td>
<td>bool</td>
<td>False</td>
<td>Print a message each time a feature is removed.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>dict</strong></td>
<td></td>
<td><strong><code>{"best_lags": [...], "best_exogs": [...], "best_transforms": [...]}</code></strong></td>
</tr>
</tbody>
</table>
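Both `ms_arr` selectors accept `lags_to_consider` as either an int or a list. A minimal sketch of that normalisation, following the tables (an int `n` means lags 1 to n, a list means exactly those lags). The helper name `normalise_lags` is hypothetical, and the de-duplication, sorting and rejection of booleans are added assumptions rather than documented behaviour.

```python
from typing import List, Optional, Union


def normalise_lags(lags: Optional[Union[int, List[int]]]) -> List[int]:
    if lags is None:
        return []  # no lags to consider
    if isinstance(lags, bool):  # bool is a subclass of int; reject it explicitly
        raise TypeError("lags_to_consider must be an int or a list of ints")
    if isinstance(lags, int):
        return list(range(1, lags + 1))  # int n -> lags 1..n
    return sorted(set(lags))  # explicit list -> exactly those lags


print(normalise_lags(3), normalise_lags([12, 1, 7]), normalise_lags(None))
# [1, 2, 3] [1, 7, 12] []
```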

# Transformations

### peshbeen.transformations.fourier_terms

``` python
fourier_terms(
    index: 'Union[pd.Index, tuple]',
    period: 'Union[int, float]',
    num_terms: 'int',
    frequency: 'Optional[str]' = None,
    t_start: 'Optional[int]' = None
)
```

Generate Fourier terms for a given index or (start, end) tuple.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>index</td>
<td>Union[pd.Index, tuple]</td>
<td></td>
<td>Either a pandas Index directly (recommended), or a (start, end)
tuple of integers or datetime strings.</td>
</tr>
<tr>
<td>period</td>
<td>Union[int, float]</td>
<td></td>
<td>The period of the seasonality, in index steps (e.g., 365.25/7 ≈ 52.18
for yearly seasonality in weekly data).</td>
</tr>
<tr>
<td>num_terms</td>
<td>int</td>
<td></td>
<td>The number of Fourier term pairs (sin + cos) to generate.</td>
</tr>
<tr>
<td>frequency</td>
<td>Optional[str]</td>
<td>None</td>
<td>Frequency string (e.g., “W-SAT”, “D”, “M”, “W”). Only relevant when
index is a (start, end) tuple.</td>
</tr>
<tr>
<td>t_start</td>
<td>Optional[int]</td>
<td>None</td>
<td>Starting value of the time counter t. Only used when index is a
(start, end) tuple. When generating terms for a test period, pass
len(train_index) so that t continues from the end of the training
period.</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td><strong>pd.DataFrame</strong></td>
<td></td>
<td><strong>DataFrame of Fourier terms aligned to the provided
index.</strong></td>
</tr>
</tbody>
</table>
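For intuition, the k-th Fourier pair is commonly defined as sin(2πkt/period) and cos(2πkt/period), with t a running integer counter. The NumPy sketch below follows that textbook definition. The helper `fourier_block`, its column names and the choice of t = 0 for the first training row are assumptions, not the internals of `peshbeen.transformations.fourier_terms`. The sketch also shows why passing the training length as `t_start` keeps test-period terms continuous with the training ones.

```python
import numpy as np


def fourier_block(n_rows: int, period: float, num_terms: int, t_start: int = 0) -> dict:
    """Return {column_name: values} with num_terms sin/cos pairs."""
    t = np.arange(t_start, t_start + n_rows)
    cols = {}
    for k in range(1, num_terms + 1):
        angle = 2 * np.pi * k * t / period
        cols[f"sin_{k}"] = np.sin(angle)
        cols[f"cos_{k}"] = np.cos(angle)
    return cols


period = 365.25 / 7  # yearly seasonality in weekly data
train = fourier_block(104, period, num_terms=2)             # t = 0..103
test = fourier_block(8, period, num_terms=2, t_start=104)   # continues at t = 104

# Generating all 112 rows at once yields the same test-period values.
full = fourier_block(112, period, num_terms=2)
print(np.allclose(full["sin_1"][104:], test["sin_1"]))  # True
```

Restarting the test block at t = 0 instead would repeat the phase of the first training weeks, which is the discontinuity `t_start` is meant to avoid.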

### peshbeen.transformations.rolling_mean

``` python
rolling_mean(
    window_size: 'int',
    shift: 'int' = 1,
    min_samples: 'int' = 1
)
```

A class to compute the rolling mean of a time series with specified
window size and shift.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>window_size</td>
<td>int</td>
<td></td>
<td>The size of the rolling window.</td>
</tr>
<tr>
<td>shift</td>
<td>int</td>
<td>1</td>
<td>The number of periods to shift the data before applying the rolling
mean (default is 1).</td>
</tr>
<tr>
<td>min_samples</td>
<td>int</td>
<td>1</td>
<td>The minimum number of observations in the window required to have a
value (default is 1).</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
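One plausible reading of `shift` and `min_samples`, expressed with plain pandas: shift the series by `shift` periods, then aggregate a window of `window_size` values, producing a value once at least `min_samples` observations are present (pandas calls this `min_periods`). With the default `shift=1`, the feature at time t uses only values up to t - 1, so it does not leak the current target. This is an illustration, not the class's internal code.

```python
import pandas as pd

y = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
window_size, shift, min_samples = 3, 1, 1

# Shift first so the feature at time t only sees y[t - shift] and earlier,
# then average the last `window_size` shifted values.
rolled = y.shift(shift).rolling(window_size, min_periods=min_samples).mean()
print(rolled.tolist())  # [nan, 1.0, 1.5, 2.0, 3.0, 4.0]

# The other rolling transforms follow the same pattern with a different reducer.
rolled_max = y.shift(shift).rolling(window_size, min_periods=min_samples).max()
rolled_q90 = y.shift(shift).rolling(window_size, min_periods=min_samples).quantile(0.9)
```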

### peshbeen.transformations.rolling_std

``` python
rolling_std(
    window_size: 'int',
    shift: 'int' = 1,
    min_samples: 'int' = 1
)
```

A class to compute the rolling standard deviation of a time series with
specified window size and shift.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>window_size</td>
<td>int</td>
<td></td>
<td>The size of the rolling window.</td>
</tr>
<tr>
<td>shift</td>
<td>int</td>
<td>1</td>
<td>The number of periods to shift the data before applying the rolling
standard deviation (default is 1).</td>
</tr>
<tr>
<td>min_samples</td>
<td>int</td>
<td>1</td>
<td>The minimum number of observations in the window required to have a
value (default is 1).</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

### peshbeen.transformations.rolling_quantile

``` python
rolling_quantile(
    window_size: 'int',
    quantile: 'float',
    shift: 'int' = 1,
    min_samples: 'int' = 1
)
```

A class to compute the rolling quantile of a time series with specified
window size, quantile, and shift.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>window_size</td>
<td>int</td>
<td></td>
<td>The size of the rolling window.</td>
</tr>
<tr>
<td>quantile</td>
<td>float</td>
<td></td>
<td>The quantile to compute (between 0 and 1).</td>
</tr>
<tr>
<td>shift</td>
<td>int</td>
<td>1</td>
<td>The number of periods to shift the data before applying the rolling
quantile (default is 1).</td>
</tr>
<tr>
<td>min_samples</td>
<td>int</td>
<td>1</td>
<td>The minimum number of observations in the window required to have a
value (default is 1).</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

### peshbeen.transformations.rolling_min

``` python
rolling_min(
    window_size: 'int',
    shift: 'int' = 1,
    min_samples: 'int' = 1
)
```

A class to compute the rolling minimum of a time series with specified
window size and shift.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>window_size</td>
<td>int</td>
<td></td>
<td>The size of the rolling window.</td>
</tr>
<tr>
<td>shift</td>
<td>int</td>
<td>1</td>
<td>The number of periods to shift the data before applying the rolling
minimum (default is 1).</td>
</tr>
<tr>
<td>min_samples</td>
<td>int</td>
<td>1</td>
<td>The minimum number of observations in the window required to have a
value (default is 1).</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

### peshbeen.transformations.rolling_max

``` python
rolling_max(
    window_size: 'int',
    shift: 'int' = 1,
    min_samples: 'int' = 1
)
```

A class to compute the rolling maximum of a time series with specified
window size and shift.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>window_size</td>
<td>int</td>
<td></td>
<td>The size of the rolling window.</td>
</tr>
<tr>
<td>shift</td>
<td>int</td>
<td>1</td>
<td>The number of periods to shift the data before applying the rolling
maximum (default is 1).</td>
</tr>
<tr>
<td>min_samples</td>
<td>int</td>
<td>1</td>
<td>The minimum number of observations in the window required to have a
value (default is 1).</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

### peshbeen.transformations.expanding_mean

``` python
expanding_mean(
    shift: 'int' = 1
)
```

A class to compute the expanding mean of a time series with specified
shift.

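<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>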
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>shift</td>
<td>int</td>
<td>1</td>
<td>The number of periods to shift the data before applying the
expanding mean (default is 1).</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

### peshbeen.transformations.expanding_std

``` python
expanding_std(
    shift: 'int' = 1
)
```

A class to compute the expanding standard deviation of a time series
with specified shift.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>shift</td>
<td>int</td>
<td>1</td>
<td>The number of periods to shift the data before applying the
expanding standard deviation (default is 1).</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

### peshbeen.transformations.expanding_quantile

``` python
expanding_quantile(
    shift: 'int' = 1,
    quantile: 'float' = 0.5
)
```

A class to compute the expanding quantile of a time series with
specified shift and quantile.

<table>
<colgroup>
<col style="width: 6%" />
<col style="width: 25%" />
<col style="width: 34%" />
<col style="width: 34%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Default</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>shift</td>
<td>int</td>
<td>1</td>
<td>The number of periods to shift the data before applying the
expanding quantile (default is 1).</td>
</tr>
<tr>
<td>quantile</td>
<td>float</td>
<td>0.5</td>
<td>The quantile to compute, between 0 and 1 (default is 0.5).</td>
</tr>
<tr>
<td><strong>Returns</strong></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
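The expanding transforms above can be read the same way as a shifted aggregate, but over all earlier values instead of a fixed window. A pandas sketch, assuming a minimum of one observation (these classes expose no `min_samples` parameter) and pandas' default linear interpolation for quantiles; the library may compute these differently.

```python
import pandas as pd

y = pd.Series([4.0, 2.0, 9.0, 8.0])
shift = 1
shifted = y.shift(shift)  # the value at t only sees y[0..t - shift]

exp_mean = shifted.expanding(min_periods=1).mean()
exp_std = shifted.expanding(min_periods=1).std()  # pandas default: sample std, ddof=1
exp_median = shifted.expanding(min_periods=1).quantile(0.5)

print(exp_mean.tolist())    # [nan, 4.0, 3.0, 5.0]
print(exp_median.tolist())  # [nan, 4.0, 3.0, 4.0]
```

Note that the sample standard deviation is undefined for a single observation, so `exp_std` is also NaN at the second row.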
