Time Series Forecast (deprecated)

Forecast multivariate time series from year to minute frequency with Deep Learning and statistical models

⚠️ This plugin has been removed from the Dataiku public plugin store as of the Dataiku 14.0 release. The functionality it covered is now available through the native Dataiku Time Series Forecasting feature.

Plugin Information

Version	1.2.1
Author	Dataiku
Released	2021-02
Last updated	2023-04
License	Apache-2.0
Source code	Github
Reporting issues	Github

⚠️ Starting with DSS version 11, this plugin is considered deprecated; we recommend using the native time series forecasting features instead.

“Forecasting is required in many situations: deciding whether to build another power generation plant in the next five years requires forecasts of future demand; scheduling staff in a call centre next week requires forecasts of call volumes; stocking an inventory requires forecasts of stock requirements. Forecasts can be required several years in advance (for the case of capital investments), or only a few minutes beforehand (for telecommunication routing). Whatever the circumstances or time horizons involved, forecasting is an important aid to effective and efficient planning.”
— Hyndman, Rob J. and George Athanasopoulos

With this plugin, you will be able to forecast multivariate time series from year to minute frequency with Deep Learning and statistical models. It covers the full cycle of model training, evaluation, and prediction through the following two recipes:

Train and evaluate forecasting models: Train forecasting models and evaluate them on historical data
Forecast future values: Use trained forecasting models to predict values beyond the end of your historical dataset

How to set up
How to use
- Train and evaluate forecasting models recipe
- Forecast future values recipe
Models
- Statistical models
- Deep Learning models
Advanced topics

How to set up

Right after installing the plugin, you will need to build its code environment. If this is the first time you are installing this plugin, click Build new environment.

Note that Python 3.6 or 3.7 is required, along with system development tools and Python interpreter headers to build the packages. You can refer to this documentation if you need to install these beforehand.
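
To check an interpreter against this requirement before building the code environment, a minimal sketch such as the following may help. The function name `is_supported_python` is hypothetical and not part of the plugin; it only compares the major and minor version numbers with the 3.6 and 3.7 versions listed above.

```python
import sys

# Python versions listed as required for the plugin code environment.
SUPPORTED_VERSIONS = {(3, 6), (3, 7)}


def is_supported_python(version_info=None):
    """Return True if the major.minor version is 3.6 or 3.7."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS


current = ".".join(str(part) for part in sys.version_info[:2])
if is_supported_python():
    print("Python " + current + " meets the plugin requirement")
else:
    print("Python " + current + " is not supported: install Python 3.6 or 3.7")
```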

Warning: if you were previously using the former forecast plugin and now want to use this new forecast plugin instead, you will need to update your existing flows to use the new recipes.

How to use

In this section, we will use the example of forecasting retail sales across multiple stores and departments. You can find the underlying data and flow on this public gallery project.

1. Train and evaluate forecasting models

Use this recipe to train forecasting models and evaluate them on historical data.

Input

Historical dataset with time series data (one parsed date column, one or more numerical target columns, and optionally one or more time series identifier columns)

Output

Trained model folder to save trained forecasting models
Performance metrics dataset of forecasting models evaluated on a split of the historical dataset
Evaluation dataset with evaluation forecasts used to compute the performance metrics
- This dataset can be used to build charts and visualize your models’ performance

Example of an Evaluation dataset visualized in a chart

Settings

Input parameters

Time column: Column with parsed dates and no missing values
- To parse dates, you can use a Prepare recipe.
- To fill missing values, you can use the Time Series Preparation resampling recipe.
Frequency: Frequency of the time column, from year to minute
- For minute and hour frequency, you can select the number of minutes or hours.
- For week frequency, you can select the end-of-week day.
Target column(s): Time series columns you want to forecast (must be numeric)
- You can select one (univariate forecasting) or multiple columns (multivariate forecasting).
Long format: Select this option when the dataset contains multiple time series stacked on top of each other
- If selected, you must then select the columns that identify each time series with the Time series identifiers parameter.

For example, this long format dataset of weekly sales per store and department has Store and Dept as time series identifier columns
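
A long format dataset can be pictured as one table in which each combination of identifier values delimits a separate time series. The sketch below assumes pandas and a small made-up weekly sales table with `Date`, `Store`, `Dept`, and `Weekly_Sales` columns, and splits it into its individual series; `split_long_format` is a hypothetical helper, not a plugin function. It also uses `pd.infer_freq` to confirm that each series has a regular weekly frequency ending on Fridays (`W-FRI`), which mirrors the end-of-week day choice of the Frequency parameter.

```python
import pandas as pd


def split_long_format(df, time_column, identifiers):
    """Return {identifier values: one time series sorted by time}."""
    series_by_key = {}
    for key, group in df.groupby(identifiers):
        # Normalize keys to tuples across pandas versions.
        key = key if isinstance(key, tuple) else (key,)
        series_by_key[key] = group.sort_values(time_column).reset_index(drop=True)
    return series_by_key


# Made-up weekly sales for two (Store, Dept) combinations stacked vertically.
sales = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2021-01-01", "2021-01-08", "2021-01-15"] * 2),
        "Store": [1, 1, 1, 2, 2, 2],
        "Dept": [1, 1, 1, 3, 3, 3],
        "Weekly_Sales": [100.0, 120.0, 110.0, 50.0, 55.0, 60.0],
    }
)

for key, series in split_long_format(sales, "Date", ["Store", "Dept"]).items():
    # A regular frequency is expected; infer_freq needs at least 3 dates.
    print(key, len(series), pd.infer_freq(series["Date"]))
```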

Sampling

Sampling method: Choose between
- Last records (most recent): Use only the N most recent records of each time series during training
- No sampling (whole data): Use all records
Nb. records: Maximum number of records to extract per time series when Last records is selected
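
The Last records sampling method can be approximated with pandas by keeping the tail of each time series after sorting by time. This is a sketch under the assumption that the dataset is in long format with the identifier columns passed as a list; `sample_last_records` and the example data are hypothetical, and the plugin's own implementation may differ.

```python
import pandas as pd


def sample_last_records(df, time_column, identifiers, n_records):
    """Keep at most the `n_records` most recent rows of each time series."""
    # Stable sort so that "tail" means "most recent".
    df = df.sort_values(time_column, kind="mergesort")
    if not identifiers:
        # Single time series: keep the most recent rows overall.
        return df.tail(n_records)
    # Long format: keep the most recent rows within each series.
    return df.groupby(identifiers).tail(n_records)


# Hypothetical example: store A has 5 daily records, store B only 2.
history = pd.DataFrame(
    {
        "Date": list(pd.date_range("2021-01-01", periods=5, freq="D"))
        + list(pd.date_range("2021-01-01", periods=2, freq="D")),
        "Store": ["A"] * 5 + ["B"] * 2,
        "Sales": [10, 11, 12, 13, 14, 20, 21],
    }
)
sampled = sample_last_records(history, "Date", ["Store"], n_records=3)
print(sampled)
```

Series shorter than Nb. records, such as store B here, are kept whole.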

Modeling

Forecasting horizon: Number of future values to predict
- This number will be reused in the 2. Forecast future values recipe
- Be careful: high values increase training time.
Forecasting mode: Choose whether to let Dataiku create your models with AutoML modes or to keep full control over model creation with Expert modes. Check this section to see details on each model.
- You can choose between 4 different forecasting modes:
  - AutoML – Quick prototypes (default): Train baseline models quickly
    - Statistical models: Trivial identity and Seasonal naive are trained
    - Deep Learning models: a FeedForward neural network is trained for 10 epochs of 50 batches, each batch containing 32 samples
  - AutoML – High performance: Be patient and get even more accurate models
    - Statistical models: Trivial identity and Seasonal naive are trained
    - Deep Learning models: FeedForward, DeepAR and Transformer are trained for 10 epochs (30 for multivariate forecasting), each epoch using an automatically adjusted number of batches of 32 samples
  - Expert – Choose algorithms: Choose which models to train, set the seasonality of the statistical models and tune Deep Learning models training parameters
    - Season length: Length of the seasonal period, in units of the selected frequency, used by statistical models.
      - For example, season length is 7 for daily data with a weekly seasonality (season length is 4 for a 6H frequency with a daily seasonality).
    - Number of epochs: Number of times the Deep Learning models see the training data.
    - Batch size: Number of samples to include in each batch. A sample is a time window whose length is twice the forecasting horizon.
    - Scale number of batches: Automatically adjust the number of batches per epoch to the training data size so that each epoch statistically covers all the training data
      - Example: 10 time series of length 10000 will give 209 batches per epoch with a batch size of 32 and a forecasting horizon of 15.
    - Number of batches per epoch: Use this to set a fixed number of batches per epoch to ensure the training time does not increase with the dataset size.
  - Expert – Customize algorithms: Set additional keyword arguments for each algorithm
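
To make the Expert mode parameters more concrete, the sketch below computes a season length from a seasonal period and a frequency, implements the Trivial identity (repeat the last horizon values) and Seasonal naive (repeat the last observed season) baselines assuming their usual definitions, and estimates the scaled number of batches per epoch. All function names are hypothetical. The batch formula, total time steps divided by (forecasting horizon × batch size) and rounded up, is an assumption chosen because it reproduces the documented example of 209 batches; it is not taken from the plugin's source code. The season length helper only handles fixed-length frequencies such as minutes, hours, days, or weeks.

```python
import math

import numpy as np
import pandas as pd


def season_length(seasonal_period, frequency):
    """Number of time steps in one seasonal period (fixed-length frequencies only)."""
    ratio = pd.Timedelta(seasonal_period) / pd.Timedelta(frequency)
    if ratio != int(ratio):
        raise ValueError("The seasonal period is not a whole number of time steps")
    return int(ratio)


def trivial_identity_forecast(history, horizon):
    """Forecast by repeating the last `horizon` observed values."""
    history = np.asarray(history, dtype=float)
    if len(history) < horizon:
        raise ValueError("The history must contain at least `horizon` values")
    return history[-horizon:].copy()


def seasonal_naive_forecast(history, horizon, season):
    """Forecast each step with the value observed one season earlier."""
    history = np.asarray(history, dtype=float)
    if len(history) < season:
        raise ValueError("The history must contain at least one full season")
    last_season = history[-season:]
    # Tile the last season when the horizon is longer than one season.
    repeats = math.ceil(horizon / season)
    return np.tile(last_season, repeats)[:horizon]


def scaled_batches_per_epoch(series_lengths, horizon, batch_size):
    """Assumed formula: ceil(total time steps / (horizon * batch_size))."""
    total_steps = sum(series_lengths)
    return math.ceil(total_steps / (horizon * batch_size))


print(season_length("7D", "1D"))  # weekly seasonality, daily data: 7
print(season_length("1D", "6h"))  # daily seasonality, 6-hour data: 4
print(seasonal_naive_forecast([1, 2, 3, 4, 5, 6, 7, 8], horizon=6, season=4))
print(scaled_batches_per_epoch([10000] * 10, horizon=15, batch_size=32))  # 209
```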

Evaluation

Define the split of the historical data used to compute performance metrics. The final model is then retrained on the entire sample.

Splitting strategy: Choose between:
- Time-based Split: Evaluate on the last Forecasting horizon values
- Time series cross-validation: Evaluate the forecast predictions on rolling windows
  - Models are trained multiple times on expanding windows of the dataset, and metrics are computed on the last Forecasting horizon values of each window. The final metrics are averaged over all rolling windows. Two additional parameters must be set:
    - Number of rolling windows: Number of splits used in the training set. Higher values increase runtime.
    - Cutoff period: Number of time steps between consecutive splits. If set to -1, Horizon / 2 is used.
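
Expanding-window cross-validation can be sketched as follows, assuming the last window ends at the end of the data and each earlier window is shifted back by the cutoff period. `rolling_window_splits` and `cross_validated_mae` are hypothetical helpers, and mean absolute error is used only as an example metric; the exact window placement and metrics used by the plugin may differ. With a single window, the split reduces to evaluating on the last Forecasting horizon values, as in the Time-based Split strategy.

```python
import numpy as np


def rolling_window_splits(n_timesteps, horizon, n_windows, cutoff_period=-1):
    """Return (train_end, test_end) index pairs, one per expanding window.

    Window k trains on rows [0, train_end) and is evaluated on rows
    [train_end, test_end), i.e. the last `horizon` values of the window.
    """
    if cutoff_period == -1:
        # Assumption: "Horizon / 2" rounded down to a whole number of steps.
        cutoff_period = max(1, horizon // 2)
    splits = []
    for k in range(n_windows):
        # The last window ends with the data; earlier ones end
        # `cutoff_period` steps apart.
        test_end = n_timesteps - (n_windows - 1 - k) * cutoff_period
        train_end = test_end - horizon
        if train_end <= 0:
            raise ValueError("Not enough data for this many rolling windows")
        splits.append((train_end, test_end))
    return splits


def cross_validated_mae(values, horizon, n_windows, forecast_fn, cutoff_period=-1):
    """Average the mean absolute error of `forecast_fn` over all windows."""
    values = np.asarray(values, dtype=float)
    errors = []
    for train_end, test_end in rolling_window_splits(
        len(values), horizon, n_windows, cutoff_period
    ):
        # Refit (here: recompute) the forecast on the expanding training set.
        forecast = np.asarray(forecast_fn(values[:train_end], horizon), dtype=float)
        actual = values[train_end:test_end]
        errors.append(np.mean(np.abs(actual - forecast)))
    return float(np.mean(errors))


# Hypothetical usage: a "repeat the last value" forecaster on a linear trend.
def last_value(history, horizon):
    return [history[-1]] * horizon


print(rolling_window_splits(100, horizon=10, n_windows=3))
print(cross_validated_mae(list(range(100)), 10, 3, last_value))
```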

Advanced

Add external features: To add numeric features for exogenous time-dependent factors (e.g., holidays, special events).
- External feature columns:
  - Be aware that future values of the external features will be required to forecast.
  - You should only use this parameter for features that you know in advance, e.g., holidays, special events, promotions.
  - If you have features you would like to include in your models but do NOT know in advance, e.g., the weather, we recommend either:
    - Including these features as Target columns to forecast
    - Using external forecasting data providers, e.g., weather forecasting APIs
  - Note that external features are only usable by AutoARIMA, DeepAR, Transformer, and MQ-CNN algorithms.
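
Because future values of external features must be known at forecast time, they typically come from a calendar or plan rather than from observations. The sketch below assumes pandas, a daily frequency, and a hypothetical `is_holiday` feature built from a made-up holiday list, and generates the feature values for the forecasting horizon steps after the last historical date. It assumes that `last_date` is aligned with the chosen frequency; `future_external_features` is not a plugin function.

```python
import pandas as pd

# Hypothetical holiday calendar, known in advance (dates are illustrative).
KNOWN_HOLIDAYS = pd.to_datetime(["2021-12-24", "2021-12-31"])


def future_external_features(last_date, horizon, freq):
    """Build external feature values for the `horizon` steps after `last_date`.

    Assumes `last_date` is aligned with `freq`, so the first generated
    timestamp is `last_date` itself and can be dropped.
    """
    future_dates = pd.date_range(start=last_date, periods=horizon + 1, freq=freq)[1:]
    return pd.DataFrame(
        {
            "Date": future_dates,
            # 1 on known holidays, 0 otherwise.
            "is_holiday": future_dates.isin(KNOWN_HOLIDAYS).astype(int),
        }
    )


print(future_external_features("2021-12-22", horizon=3, freq="D"))
```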
If you have installed the GPU version of the plugin, please refer to this section on GPU-specific parameters.