ml_direct_forecaster


def ml_direct_forecaster(
    model:Any, # A regression model object (e.g. LGBMRegressor(), XGBRegressor(), LinearRegression(), etc.)
    target_col:str, # Name of the target variable column in the input DataFrame.
    H:Optional[Union[int, List[int]]], # Forecast horizon(s). If int, forecasts 1..H. If list, forecasts specified horizons.
    lags:Optional[Union[int, List[int]]]=None, # Lags to include as features. Default is None.
    lag_transform:Optional[list]=None, # Lag-transform functions to apply to the target variable. Default is None.
    difference:Optional[int]=None, # Order of ordinary differencing. Default is None.
    seasonal_diff:Optional[int]=None, # Seasonal period for seasonal differencing. Default is None.
    trend:Optional[str]=None, # Trend strategy: 'linear' or 'ets'. Default is None.
    pol_degree:int=1, # Polynomial degree for linear trend. Default is 1.
    ets_params:Optional[Dict[str, Any]]=None, # Parameters for ExponentialSmoothing when trend='ets'. Default is None.
    change_points:Optional[List[int]]=None, # Breakpoint indices for piecewise linear trend. Default is None.
    box_cox:Union[bool, float, int]=False, # Box-Cox transformation. If float/int, used as lambda. If True, lambda is estimated. Default is False.
    box_cox_biasadj:bool=False, # Bias adjustment when inverting Box-Cox. Default is False.
    cat_variables:Optional[List[str]]=None, # List of categorical feature column names. If provided, these columns will be treated as categorical variables and encoded accordingly. Default is None (no categorical variables).
    categorical_encoder:Optional[Any]=None, # Categorical encoder object (e.g. OneHotEncoder(), MeanEncoder(), etc.) applied to the columns listed in cat_variables. It must provide fit() and transform() methods that accept the input DataFrame. Default is None (no encoding), in which case categorical variables can be used only if the model handles them natively (e.g. LightGBM or CatBoost).
)->None:

Initialize the ml_direct_forecaster with the specified model and preprocessing options. Unlike ml_forecaster, this class uses a direct forecasting strategy: a separate model is trained for each horizon h.

| Parameter | Type | Default | Details |
|---|---|---|---|
| model | Any | | A regression model object (e.g. LGBMRegressor(), XGBRegressor(), LinearRegression(), etc.) |
| target_col | str | | Name of the target variable column in the input DataFrame. |
| H | Optional[Union[int, List[int]]] | | Forecast horizon(s). If int, forecasts 1..H. If list, forecasts specified horizons. |
| lags | Optional[Union[int, List[int]]] | None | Lags to include as features. |
| lag_transform | Optional[list] | None | Lag-transform functions to apply to the target variable. |
| difference | Optional[int] | None | Order of ordinary differencing. |
| seasonal_diff | Optional[int] | None | Seasonal period for seasonal differencing. |
| trend | Optional[str] | None | Trend strategy: 'linear' or 'ets'. |
| pol_degree | int | 1 | Polynomial degree for linear trend. |
| ets_params | Optional[Dict[str, Any]] | None | Parameters for ExponentialSmoothing when trend='ets'. |
| change_points | Optional[List[int]] | None | Breakpoint indices for piecewise linear trend. |
| box_cox | Union[bool, float, int] | False | Box-Cox transformation. If float/int, used as lambda. If True, lambda is estimated. |
| box_cox_biasadj | bool | False | Bias adjustment when inverting Box-Cox. |
| cat_variables | Optional[List[str]] | None | List of categorical feature column names. If provided, these columns are treated as categorical variables and encoded accordingly. None means no categorical variables. |
| categorical_encoder | Optional[Any] | None | Categorical encoder object (e.g. OneHotEncoder(), MeanEncoder(), etc.) applied to the columns listed in cat_variables. It must provide fit() and transform() methods that accept the input DataFrame. If None, no encoding is applied and categorical variables can be used only if the model handles them natively (e.g. LightGBM or CatBoost). |
| Returns | None | | |
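
To make the `box_cox` and `box_cox_biasadj` options more concrete, here is a minimal sketch of a Box-Cox transform with a fixed lambda and its inverse. The helper names `box_cox` and `inv_box_cox`, and the `sigma2` argument (forecast variance on the transformed scale), are assumptions made for illustration. The bias adjustment shown is one common form; the source does not say which formula the library uses internally.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive value y with a fixed lambda (hypothetical helper)."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def inv_box_cox(w, lam, sigma2=None):
    """Invert the transform. If sigma2 is given, apply a common bias
    adjustment so the back-transformed value approximates the mean
    rather than the median (assumed form, not taken from the library)."""
    if lam == 0:
        back = math.exp(w)
        adj = 1.0 + sigma2 / 2.0 if sigma2 is not None else 1.0
    else:
        base = lam * w + 1.0
        back = base ** (1.0 / lam)
        adj = 1.0 + sigma2 * (1.0 - lam) / (2.0 * base ** 2) if sigma2 is not None else 1.0
    return back * adj

series = [12.0, 15.0, 20.0, 26.0]
lam = 0.5  # a fixed lambda, as when box_cox is passed a float
transformed = [box_cox(v, lam) for v in series]
restored = [inv_box_cox(w, lam) for w in transformed]
```

Without `sigma2`, the round trip recovers the original series up to floating-point error. With a positive `sigma2` and a lambda below 1, the adjusted value is slightly larger than the plain inverse.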

ml_direct_forecaster.fit


def fit(
    df:pd.DataFrame, # Training DataFrame containing the target and any feature columns.
)->None:

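Fit a separate model for each forecast horizon h: every h in 1..H when H is an int, or each listed horizon when H is a list. For each h, the target is shifted h steps forward so that the model learns to predict the value h steps ahead directly from the current lag features. Because no prediction is ever used as an input, the error accumulation of recursive forecasting is avoided.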

| Parameter | Type | Details |
|---|---|---|
| df | pd.DataFrame | Training DataFrame containing the target and any feature columns. |
| Returns | None | |
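
The target shift described above can be sketched in plain Python. This is an illustration under assumptions, not the library's implementation: `resolve_horizons` and `make_direct_pairs` are hypothetical helpers, and the series is a plain list rather than a DataFrame column.

```python
def resolve_horizons(H):
    """Return 1..H for an int, or the sorted list of horizons otherwise (hypothetical helper)."""
    if isinstance(H, int):
        return list(range(1, H + 1))
    return sorted(H)

def make_direct_pairs(y, lags, h):
    """Build training pairs for horizon h (hypothetical helper).

    For each time t, the features are the lagged values available at t
    (lag 1 is y[t], lag 2 is y[t-1], ...) and the target is y[t + h].
    """
    max_lag = max(lags)
    X, targets = [], []
    for t in range(max_lag - 1, len(y) - h):
        X.append([y[t + 1 - k] for k in lags])
        targets.append(y[t + h])
    return X, targets

y = [float(t) for t in range(10)]
lags = [1, 2]

# One training set per horizon; a separate model would be fitted on each.
pairs = {h: make_direct_pairs(y, lags, h) for h in resolve_horizons(3)}
X3, t3 = pairs[3]
```

Each additional step of horizon costs one training row, because the last h observations have no target h steps ahead.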

ml_direct_forecaster.forecast


def forecast(
    H:int, # Forecast horizon. Must be <= self.H (models are only trained up to self.H).
    exog:Optional[pd.DataFrame]=None, # Optional future exogenous variables (H rows).
)->np.ndarray: # Forecast values of length H.

Generate direct multi-step forecasts. Each horizon h is predicted independently by its own model using the most recent lag features; no predictions are fed back as inputs.

| Parameter | Type | Default | Details |
|---|---|---|---|
| H | int | | Forecast horizon. Must be <= self.H (models are only trained up to self.H). |
| exog | Optional[pd.DataFrame] | None | Optional future exogenous variables (H rows). |
| Returns | np.ndarray | | Forecast values of length H. |
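
The following self-contained sketch illustrates the idea of predicting every horizon from the same, most recent feature row. Everything in it is an assumption made for illustration: `MeanDeltaModel` is a toy stand-in for a real regressor, and `fit_direct` and `forecast_direct` are hypothetical helpers rather than the class's actual methods.

```python
class MeanDeltaModel:
    """Toy stand-in regressor: predicts the most recent lag plus the
    average change observed between that lag and the target in training."""

    def fit(self, X, y):
        self.delta_ = sum(t - row[0] for row, t in zip(X, y)) / len(y)
        return self

    def predict(self, X):
        return [row[0] + self.delta_ for row in X]

def fit_direct(y, lags, horizons):
    """Fit one model per horizon on targets shifted h steps ahead."""
    models = {}
    for h in horizons:
        X, targets = [], []
        for t in range(max(lags) - 1, len(y) - h):
            X.append([y[t + 1 - k] for k in lags])
            targets.append(y[t + h])
        models[h] = MeanDeltaModel().fit(X, targets)
    return models

def forecast_direct(models, y, lags, H):
    """Predict horizons 1..H from the last observed lags only."""
    if H > max(models):
        raise ValueError("H exceeds the largest trained horizon")
    last = len(y) - 1
    x_last = [y[last + 1 - k] for k in lags]  # same features for every horizon
    return [models[h].predict([x_last])[0] for h in range(1, H + 1)]

y = [float(t) for t in range(10)]  # a perfectly linear series 0, 1, ..., 9
models = fit_direct(y, lags=[1, 2], horizons=[1, 2, 3])
preds = forecast_direct(models, y, lags=[1, 2], H=3)
```

Note that `x_last` is built once from observed data and reused for every horizon, which is what distinguishes the direct strategy from a recursive one.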

ml_direct_forecaster.cross_validate


def cross_validate(
    df:pd.DataFrame, # DataFrame containing the target and any feature columns.
    cv_split:int, # Number of cross-validation splits.
    metrics:List[Callable], # Metric functions (e.g. ``[MAE, RMSE]``) used to evaluate forecast accuracy across folds. Call ``.cv_summary()`` after cross-validation to retrieve the aggregated scores.
    step_size:int=1, # Step size to move the test window forward in each split.
    h_split_point:Optional[int]=None, # Optional index to split the test set into two parts for separate evaluation (e.g. to evaluate short-term vs long-term performance). If None, no split is done.
)->Tuple[pd.DataFrame, pd.DataFrame]: # DataFrame containing overall performance metrics averaged across splits, and a DataFrame with predictions and true values for each split.

Run cross-validation using time series splits.

| Parameter | Type | Default | Details |
|---|---|---|---|
| df | pd.DataFrame | | DataFrame containing the target and any feature columns. |
| cv_split | int | | Number of cross-validation splits. |
| metrics | List[Callable] | | Metric functions (e.g. [MAE, RMSE]) used to evaluate forecast accuracy across folds. Call .cv_summary() after cross-validation to retrieve the aggregated scores. |
| step_size | int | 1 | Step size to move the test window forward in each split. |
| h_split_point | Optional[int] | None | Optional index to split the test set into two parts for separate evaluation (e.g. to evaluate short-term vs long-term performance). If None, no split is done. |
| Returns | Tuple[pd.DataFrame, pd.DataFrame] | | DataFrame containing overall performance metrics averaged across splits, and a DataFrame with predictions and true values for each split. |
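
The source does not spell out how the folds are laid out, so the sketch below shows only one plausible reading: an expanding training window whose test window advances by `step_size` per fold, with `h_split_point` dividing each test window into a short-term and a long-term part. The helpers `rolling_origin_splits`, `mae` and `score_fold`, and the `test_size` argument, are hypothetical and not part of the documented API.

```python
def rolling_origin_splits(n, cv_split, test_size, step_size=1):
    """Yield (train_idx, test_idx) pairs in which training data always
    precedes the test window; the last test window ends at index n - 1."""
    for k in range(cv_split):
        test_start = n - test_size - (cv_split - 1 - k) * step_size
        if test_start <= 0:
            raise ValueError("not enough observations for the requested splits")
        yield list(range(test_start)), list(range(test_start, test_start + test_size))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def score_fold(actual, predicted, h_split_point=None):
    """Score a fold as a whole, or separately before and after h_split_point."""
    if h_split_point is None:
        return {"all": mae(actual, predicted)}
    p = h_split_point
    return {
        "short": mae(actual[:p], predicted[:p]),
        "long": mae(actual[p:], predicted[p:]),
    }

y = [float(t) for t in range(12)]
folds = list(rolling_origin_splits(len(y), cv_split=3, test_size=4, step_size=2))

scores = []
for train_idx, test_idx in folds:
    last_seen = y[train_idx[-1]]
    actual = [y[i] for i in test_idx]
    predicted = [last_seen] * len(test_idx)  # naive forecast as a placeholder model
    scores.append(score_fold(actual, predicted, h_split_point=2))
```

With a naive forecast on a trending series, the long-term part of each fold scores worse than the short-term part, which is the kind of contrast `h_split_point` is meant to expose.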