direct_ml – peshbeen

ml_direct_forecaster


def ml_direct_forecaster(
    model:Any, # A regression model object (e.g. LGBMRegressor(), XGBRegressor(), LinearRegression(), etc.)
    target_col:str, # Name of the target variable column in the input DataFrame.
    H:Optional[Union[int, List[int]]], # Forecast horizon(s). If int, forecasts 1..H. If list, forecasts specified horizons.
    lags:Optional[Union[int, List[int]]]=None, # Lags to include as features. Default is None.
    lag_transform:Optional[list]=None, # Lag-transform functions to apply to the target variable. Default is None.
    difference:Optional[int]=None, # Order of ordinary differencing. Default is None.
    seasonal_diff:Optional[int]=None, # Seasonal period for seasonal differencing. Default is None.
    trend:Optional[str]=None, # Trend strategy: 'linear' or 'ets'. Default is None.
    pol_degree:int=1, # Polynomial degree for linear trend. Default is 1.
    ets_params:Optional[Dict[str, Any]]=None, # Parameters for ExponentialSmoothing when trend='ets'. Default is None.
    change_points:Optional[List[int]]=None, # Breakpoint indices for piecewise linear trend. Default is None.
    box_cox:Union[bool, float, int]=False, # Box-Cox transformation. If float/int, used as lambda. If True, lambda is estimated. Default is False.
    box_cox_biasadj:bool=False, # Bias adjustment when inverting Box-Cox. Default is False.
    cat_variables:Optional[List[str]]=None, # Names of categorical feature columns. These columns are treated as categorical and encoded with categorical_encoder when one is given. Default is None (no categorical variables).
    categorical_encoder:Optional[Any]=None, # Encoder object (e.g. OneHotEncoder(), MeanEncoder()) applied to the columns listed in cat_variables. It must provide fit() and transform() methods that accept the input DataFrame. Default is None (no encoding), in which case categorical columns can only be used if the model handles them natively (e.g. LightGBM or CatBoost).
)->None:

Initialize the ml_direct_forecaster with the specified model and preprocessing options. Unlike ml_forecaster, this class uses a direct forecasting strategy: a separate model is trained for each horizon h.

	Type	Default	Details
model	Any		A regression model object (e.g. LGBMRegressor(), XGBRegressor(), LinearRegression(), etc.)
target_col	str		Name of the target variable column in the input DataFrame.
H	Optional[Union[int, List[int]]]		Forecast horizon(s). If int, forecasts 1..H. If list, forecasts specified horizons.
lags	Optional[Union[int, List[int]]]	None	Lags to include as features. Default is None.
lag_transform	Optional[list]	None	Lag-transform functions to apply to the target variable. Default is None.
difference	Optional[int]	None	Order of ordinary differencing. Default is None.
seasonal_diff	Optional[int]	None	Seasonal period for seasonal differencing. Default is None.
trend	Optional[str]	None	Trend strategy: 'linear' or 'ets'. Default is None.
pol_degree	int	1	Polynomial degree for linear trend. Default is 1.
ets_params	Optional[Dict[str, Any]]	None	Parameters for ExponentialSmoothing when trend='ets'. Default is None.
change_points	Optional[List[int]]	None	Breakpoint indices for piecewise linear trend. Default is None.
box_cox	Union[bool, float, int]	False	Box-Cox transformation. If float/int, used as lambda. If True, lambda is estimated. Default is False.
box_cox_biasadj	bool	False	Bias adjustment when inverting Box-Cox. Default is False.
cat_variables	Optional[List[str]]	None	Names of categorical feature columns. These columns are treated as categorical and encoded with categorical_encoder when one is given. Default is None (no categorical variables).
categorical_encoder	Optional[Any]	None	Encoder object (e.g. OneHotEncoder(), MeanEncoder()) applied to the columns listed in cat_variables. It must provide fit() and transform() methods that accept the input DataFrame. Default is None (no encoding), in which case categorical columns can only be used if the model handles them natively (e.g. LightGBM or CatBoost).
Returns	None
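
The "one model per horizon" idea can be sketched in plain Python. This is an illustration only, not the library's internals: `normalize_horizons`, `models_per_horizon`, and the stand-in `MeanModel` are hypothetical names, and sorting and de-duplicating a list-valued `H` is an assumption made here for tidiness.

```python
import copy


def normalize_horizons(H):
    """Hypothetical helper: int H -> [1, ..., H]; a list is sorted and de-duplicated."""
    if isinstance(H, int):
        if H < 1:
            raise ValueError("H must be a positive integer")
        return list(range(1, H + 1))
    return sorted(set(H))


class MeanModel:
    """Stand-in estimator with a scikit-learn-like fit/predict interface."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def models_per_horizon(model, H):
    # Each horizon gets its own independent copy of the unfitted estimator,
    # so fitting the model for h=1 never touches the model for h=2.
    return {h: copy.deepcopy(model) for h in normalize_horizons(H)}


dense = models_per_horizon(MeanModel(), H=3)            # horizons 1, 2, 3
sparse = models_per_horizon(MeanModel(), H=[1, 7, 14])  # only the listed horizons
print(sorted(dense), sorted(sparse))
```

With a list-valued `H`, only the listed horizons get a model, which keeps training cost down when intermediate steps are not needed.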


ml_direct_forecaster.fit


def fit(
    df:pd.DataFrame, # Training DataFrame containing the target and any feature columns.
)->None:

Fit a separate model for each horizon h (1..H when H is an int, or each listed horizon when H is a list). For each h, the target is shifted h steps forward, so the model learns to predict the value h steps ahead directly from the current lag features. This avoids the error accumulation of recursive forecasting.

	Type	Details
df	pd.DataFrame	Training DataFrame containing the target and any feature columns.
Returns	None
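
To make the target shift concrete, here is a minimal NumPy sketch of building the training pairs for one horizon and fitting a least-squares model per horizon. It is not the library's implementation: `make_direct_xy` is a hypothetical helper, the toy series is noise-free, and plain least squares without an intercept stands in for whatever regressor is passed as `model`.

```python
import numpy as np


def make_direct_xy(y, n_lags, h):
    """Hypothetical helper: pair the lags known at time t with the target y[t + h]."""
    y = np.asarray(y, dtype=float)
    # Times t that have n_lags past values available and a known value h steps ahead.
    t = np.arange(n_lags - 1, len(y) - h)
    # Feature columns are [y[t], y[t-1], ..., y[t-n_lags+1]].
    X = np.column_stack([y[t - k] for k in range(n_lags)])
    return X, y[t + h]


y = 2.0 * np.arange(30) + 1.0  # toy linear series: 1, 3, 5, ...

coefs = {}
for h in (1, 2, 3):
    X, target = make_direct_xy(y, n_lags=2, h=h)
    # One independent least-squares fit per horizon.
    coefs[h], *_ = np.linalg.lstsq(X, target, rcond=None)
    print(h, X.shape[0], np.round(coefs[h], 6))
```

For this series the exact relation is y[t+h] = (1+h)·y[t] − h·y[t−1], so each fit should recover those coefficients. Note that larger horizons leave fewer usable training rows, because the last h observations have no target.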


ml_direct_forecaster.forecast


def forecast(
    H:int, # Forecast horizon. Must be <= self.H (models are only trained up to self.H).
    exog:Optional[pd.DataFrame]=None, # Optional future exogenous variables (H rows).
)->np.ndarray: # Forecast values of length H.

Generate direct multi-step forecasts. Each horizon h is predicted independently by its own model from the most recent lag features; no predictions are fed back as inputs.

	Type	Default	Details
H	int		Forecast horizon. Must be <= self.H (models are only trained up to self.H).
exog	Optional[pd.DataFrame]	None	Optional future exogenous variables (H rows).
Returns	np.ndarray		Forecast values of length H.
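
The difference from a recursive forecaster can be sketched with hand-picked coefficients. Everything here is illustrative: `forecast_direct`, `forecast_recursive`, and the coefficient values are assumptions, and each "model" is reduced to a single multiplier on the last observed value.

```python
def forecast_direct(last_value, coef_by_h, H):
    """Predict horizons 1..H, each from the same last observed value."""
    if H > max(coef_by_h):
        # Mirrors the documented constraint that H must not exceed the trained horizon.
        raise ValueError("H exceeds the largest trained horizon")
    return [coef_by_h[h] * last_value for h in range(1, H + 1)]


def forecast_recursive(last_value, one_step_coef, H):
    """Contrast: a single one-step model whose output is fed back as the next input."""
    preds, current = [], last_value
    for _ in range(H):
        current = one_step_coef * current
        preds.append(current)
    return preds


coef_by_h = {1: 0.5, 2: 0.375, 3: 0.25}            # one made-up "model" per horizon
direct = forecast_direct(80.0, coef_by_h, H=3)     # [40.0, 30.0, 20.0]
recursive = forecast_recursive(80.0, 0.5, H=3)     # [40.0, 20.0, 10.0]
print(direct, recursive)
```

In the direct version every horizon reads only observed data, so an error in the h=1 model cannot propagate into the h=2 or h=3 predictions.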


ml_direct_forecaster.cross_validate


def cross_validate(
    df:pd.DataFrame, # DataFrame containing the target and any feature columns.
    cv_split:int, # Number of cross-validation splits.
    metrics:List[Callable], # Metric functions (e.g. ``[MAE, RMSE]``) used to evaluate forecast accuracy across folds. Call ``.cv_summary()`` after cross-validation to retrieve the aggregated scores.
    step_size:int=1, # Step size to move the test window forward in each split.
    h_split_point:Optional[int]=None, # Optional index to split the test set into two parts for separate evaluation (e.g. to evaluate short-term vs long-term performance). If None, no split is done.
)->Tuple[pd.DataFrame, pd.DataFrame]: # DataFrame containing overall performance metrics averaged across splits, and a DataFrame with predictions and true values for each split.

Run cross-validation using time series splits.

	Type	Default	Details
df	pd.DataFrame		DataFrame containing the target and any feature columns.
cv_split	int		Number of cross-validation splits.
metrics	List[Callable]		Metric functions (e.g. `[MAE, RMSE]`) used to evaluate forecast accuracy across folds. Call `.cv_summary()` after cross-validation to retrieve the aggregated scores.
step_size	int	1	Step size to move the test window forward in each split.
h_split_point	Optional[int]	None	Optional index to split the test set into two parts for separate evaluation (e.g. to evaluate short-term vs long-term performance). If None, no split is done.
Returns	Tuple[pd.DataFrame, pd.DataFrame]		DataFrame containing overall performance metrics averaged across splits, and a DataFrame with predictions and true values for each split.
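
One common way to build time series splits is an expanding-window, rolling-origin scheme, sketched below. This is an assumption about the general technique, not a description of how `cross_validate` places its folds: `rolling_origin_splits` and its `test_size` parameter are hypothetical, and the mapping of `cv_split` and `step_size` onto `n_splits` and `step_size` is only a guess.

```python
def rolling_origin_splits(n_obs, n_splits, test_size, step_size=1):
    """Yield (train_index, test_index) ranges; the last test window ends at n_obs."""
    first_test_start = n_obs - test_size - (n_splits - 1) * step_size
    if first_test_start < 1:
        raise ValueError("not enough observations for the requested splits")
    for i in range(n_splits):
        start = first_test_start + i * step_size
        # Training data is everything before the test window, so no future values leak in.
        yield range(0, start), range(start, start + test_size)


splits = list(rolling_origin_splits(n_obs=20, n_splits=3, test_size=4, step_size=2))
for train, test in splits:
    print(len(train), list(test))
```

Each fold trains strictly on observations that precede its test window, and the window advances by `step_size` between folds.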