Recipes to clean, train models on, and predict time series data.

The Forecast plugin provides visual recipes in Dataiku DSS to work on time series data to solve forecasting problems.

Forecasting is required in many situations: deciding whether to build another power generation plant in the next five years requires forecasts of future demand; scheduling staff in a call centre next week requires forecasts of call volumes; stocking an inventory requires forecasts of stock requirements. Forecasts can be required several years in advance (for the case of capital investments), or only a few minutes beforehand (for telecommunication routing). Whatever the circumstances or time horizons involved, forecasting is an important aid to effective and efficient planning.
— Hyndman, Rob J. and George Athanasopoulos

Dataiku Forecast plugin

Example of time series forecasting at the week level in Dataiku DSS.

Plugin Information


Version: 0.4.0
Author: Dataiku (Alexandre Combessie)
Released: 2019-01-22
Last updated: 2020-01-01
License: MIT License
Source code: GitHub
Reporting issues: GitHub


This plugin offers a set of 3 visual recipes to forecast yearly to hourly time series. It covers the full cycle of data cleaning, model training, evaluation and prediction.

  • Cleaning, aggregation, and resampling of time series data, i.e. data of one or several values measured over time (Recipe)
  • Training of forecasting models of time series data, and evaluation of these models (Recipe)
  • Predicting future values based on trained models (Recipe)

It follows classic forecasting methods, as described in Hyndman, Rob J., and George Athanasopoulos (Forecasting: principles and practice. OTexts, 2018) and in Taylor, Sean J., and Benjamin Letham (Forecasting at Scale, The American Statistician, 2018).

This plugin does NOT work on time series with a granularity finer than hourly (data must be at the hourly level or coarser), and it does not provide signal processing techniques (Fourier Transform…).

This plugin works well when:

  • The training data consists of one or multiple time series at the hour, day, week, month or year level and fits in the server’s RAM.
  • The object to predict is the future of one of these time series.

Forecasting is slightly different from “classic” Machine Learning (ML) as currently available visually in Dataiku, because:

  • Forecast models output multiple values whereas one Visual ML analysis is designed to predict a single output.
  • Open source implementations of forecast models are different from the Python/Scala models available in Visual ML.
  • Evaluation of forecast accuracy uses specific methods (errors across a forecast horizon, cross-validation) which are not currently available in Visual ML.

Installation Notes

The plugin can be installed from the Plugin Store or via the zip download (see above).

Note that the plugin uses an R code environment so R must be installed and integrated with Dataiku on your machine (version 3.5.0 or above). Anaconda R is not supported.

Support of Prophet models has been removed since version 0.3.0 of the plugin to avoid issues with the installation of the plugin code-environment. The underlying problem is that Prophet relies on the RStan package, which has dependencies that require additional setup at the operating system level.

It is still possible to use Prophet models by installing the plugin from this folder on GitHub. In this case, please refer to the RStan Getting Started wiki for setup.

How To Use

Clean recipe

Use this recipe to aggregate, resample, and clean missing values and outliers from the time series.
Input: Dataset containing time series.
Output: Dataset containing cleaned time series.




Input Data

  • Time column: The column with time information in Dataiku date format (may need parsing beforehand in a Prepare recipe)
  • Series columns: The columns with time series numeric values.

Resampling and Aggregation

  • Time granularity: This determines the amount of time between data points in the cleaned dataset.
  • Aggregation method: When multiple rows fall within the same time period, they are aggregated into the cleaned dataset either by summing (default) or averaging their values.
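To make the two parameters above concrete, here is a minimal Python sketch of aggregating raw rows to a weekly granularity by summing or averaging. It is an illustration only, not the plugin's implementation (the plugin runs in R); the names `week_start` and `resample` are hypothetical, and the sketch assumes weeks start on Monday.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def week_start(ts):
    """Truncate a timestamp to midnight on the Monday of its week (assumed convention)."""
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return day - timedelta(days=day.weekday())


def resample(rows, bucket_of, method="sum"):
    """Aggregate (timestamp, value) rows into one value per time bucket."""
    aggregators = {"sum": sum, "average": mean}
    if method not in aggregators:
        raise ValueError(f"unknown aggregation method: {method}")
    buckets = defaultdict(list)
    for ts, value in rows:
        buckets[bucket_of(ts)].append(value)
    aggregate = aggregators[method]
    return {bucket: aggregate(values) for bucket, values in sorted(buckets.items())}


rows = [
    (datetime(2019, 1, 7, 9), 10.0),   # Monday
    (datetime(2019, 1, 9, 14), 20.0),  # Wednesday of the same week
    (datetime(2019, 1, 15, 8), 5.0),   # Tuesday of the following week
]
weekly_sum = resample(rows, week_start, method="sum")
weekly_avg = resample(rows, week_start, method="average")
```

Note that this sketch only produces buckets that contain at least one row; periods with no data would still need to be created and imputed.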

Missing Values: Choose one of the following imputation strategies:

  • Interpolate (default) uses linear interpolation for non-seasonal series. For seasonal series, a robust STL decomposition is used. A linear interpolation is applied to the seasonally adjusted data, and then the seasonal component is added back.
  • Replace with average/median/fixed value.
  • Do nothing.
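As a rough illustration of these strategies, the Python sketch below fills missing values (represented as `None`) either by linear interpolation, as described for non-seasonal series, or by a constant. It does not reproduce the STL-based handling of seasonal series, and the function names are hypothetical rather than taken from the plugin.

```python
from statistics import mean, median


def interpolate_linear(values):
    """Fill interior None gaps with a straight line between the nearest known neighbours."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    # Leading and trailing gaps have no neighbour on one side; this sketch leaves them as None.
    return filled


def fill_constant(values, strategy="average", fixed=0.0):
    """Replace None with the average, the median, or a fixed value."""
    known = [v for v in values if v is not None]
    if strategy == "average":
        fill = mean(known)
    elif strategy == "median":
        fill = median(known)
    elif strategy == "fixed":
        fill = fixed
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


interpolated = interpolate_linear([1.0, None, None, 4.0, None])
with_median = fill_constant([1.0, None, 3.0, 8.0], strategy="median")
```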

Outliers: Choose one of the following imputation strategies:

  • Interpolate (default) uses the same technique as for missing values.
  • Replace with average/median/fixed value.
  • Do nothing.

Outliers are detected by fitting a simple seasonal trend decomposition model, using the tsclean function from the forecast R package.

Train and Evaluate recipe

Use this recipe to train forecasting models and evaluate them on cleaned historical data.
Input: Dataset with time series data (ideally the output of the Clean recipe).
Output: Folder containing the forecast R objects.
Output: Dataset containing the evaluation results.




Input Data

  • Time column: The column with time information in Dataiku date format
  • Target series column: The time series you want to predict.
  • Feature columns (optional): Columns with external numeric regressors, for instance indicators of holidays. Note that future values of these regressors are required when forecasting.


  • Automated mode. Select which models to train. By default, only two model types are tried, Baseline and Neural Network, as they converged for all the datasets used in our benchmarks. You may select more models, but be aware that some model types take more time to compute or may fail to converge on some datasets. In the latter case, you will get an error when running the recipe, telling you which model type to deactivate. The following models are available in the recipe:
    • Baseline
    • Neural Network (can use external regressors)
    • ARIMA (can use external regressors)
    • Seasonal Trend
    • Exponential Smoothing
    • State Space
  • Expert mode. Gives access to advanced parameters that are specific to each model type. For details, see the documentation of the forecast R package.

Error Evaluation

  • Train/Test Split (default): Train/test split where the test set consists of the last *H* values of the time series. You can change H with the Horizon parameter. The models will be retrained on the train split and evaluated on their errors on the test split, for the entire forecast horizon.
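The Train/Test Split strategy can be sketched in a few lines of Python. This is an illustration under stated assumptions: the forecaster is a naive "repeat the last value" model, which is not necessarily what the plugin's Baseline model does, and mean absolute error stands in for whichever error metrics the recipe reports. All function names are hypothetical.

```python
def train_test_split(series, horizon):
    """Hold out the last `horizon` values as the test set."""
    if not 0 < horizon < len(series):
        raise ValueError("horizon must be between 1 and len(series) - 1")
    return series[:-horizon], series[-horizon:]


def naive_forecast(train, horizon):
    """Assumed stand-in model: repeat the last observed value over the horizon."""
    return [train[-1]] * horizon


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


series = [10.0, 12.0, 11.0, 13.0, 14.0, 15.0, 13.0]
train, test = train_test_split(series, horizon=3)
forecast = naive_forecast(train, horizon=3)
mae = mean_absolute_error(test, forecast)
```

Here the model only ever sees `train`, and the error is measured over the entire forecast horizon rather than on a single step.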
  • Time Series Cross-validation: Splits your dataset into multiple rolling train/test splits. The models are retrained and evaluated on their errors for each split, and the error metrics are then averaged across all splits. Each split is defined by a cutoff date: the train split is all data before or at the cutoff date, and the test split is the H values after the cutoff. H is the Horizon parameter, the same as for the Train/Test Split strategy. Cutoffs are made at regular intervals according to the Cutoff period parameter, but cannot be earlier than allowed by the Initial training parameter. Both parameters are expressed in number of time steps. A large enough initial training set helps ensure that the models trained on the first splits have enough data to converge. You may want to increase that parameter if you encounter model convergence errors.
    The exact method used for cross-validation is described in the Prophet documentation and explained in a simpler version by Hyndman.
    Note that Cross-Validation takes more time to compute, since it involves retraining and evaluating the models multiple times. In contrast, the Train/Test Split strategy only requires one retraining and evaluation. To alleviate that problem, retraining is implemented so that models are refit to each training split but hyperparameters are not re-estimated.
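The way the Horizon, Cutoff period and Initial training parameters interact can be sketched as follows. This Python illustration assumes that cutoffs are placed by working backwards from the end of the series, in the style of the Prophet documentation referenced above; the plugin's exact placement may differ. It also uses a naive last-value forecaster and mean absolute error as stand-ins, and all names are hypothetical.

```python
def rolling_cutoffs(n_obs, horizon, cutoff_period, initial_training):
    """Return cutoff positions; a cutoff c means train = series[:c], test = series[c:c + horizon]."""
    if min(horizon, cutoff_period, initial_training) < 1:
        raise ValueError("all parameters must be at least 1 time step")
    cutoffs = []
    cutoff = n_obs - horizon  # last cutoff that still leaves a full test window
    while cutoff >= initial_training:
        cutoffs.append(cutoff)
        cutoff -= cutoff_period
    return sorted(cutoffs)


def cross_validate(series, horizon, cutoff_period, initial_training):
    """Average the mean absolute error of a naive forecast across all rolling splits."""
    errors = []
    for c in rolling_cutoffs(len(series), horizon, cutoff_period, initial_training):
        train, test = series[:c], series[c:c + horizon]
        forecast = [train[-1]] * horizon  # assumed stand-in model
        errors.append(sum(abs(a - f) for a, f in zip(test, forecast)) / horizon)
    if not errors:
        raise ValueError("series too short for these parameters")
    return sum(errors) / len(errors)


series = [float(v) for v in range(1, 13)]  # 12 observations: 1.0 .. 12.0
cutoffs = rolling_cutoffs(len(series), horizon=2, cutoff_period=3, initial_training=4)
cv_error = cross_validate(series, horizon=2, cutoff_period=3, initial_training=4)
```

Increasing `initial_training` removes the earliest cutoffs, which is why it helps when models fail to converge on the first, shortest training splits.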

Predict recipe

Use this recipe to predict future values or produce historic residuals using a previously trained model.
Input: Folder containing forecast R objects (from the Train and Evaluate recipe).
Input: Dataset containing evaluation results (from the Train and Evaluate recipe).
Input (optional): Dataset containing the future values of the external regressors, if you have used the “Feature columns” parameter of the Train and Evaluate recipe.
Output: Dataset containing forecasts and/or historical residuals.



Model Selection: Choose how to select the model used for prediction:

  • Automatic: if you want to select the best model according to an error metric computed in the evaluation dataset input.
  • Manual: if you want to select the model yourself.
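Conceptually, Automatic selection amounts to picking the row of the evaluation dataset with the lowest value of the chosen error metric. The Python sketch below shows that idea only; the column names and the numbers are made up for illustration and do not reflect the actual schema or results of the evaluation dataset.

```python
def select_best_model(evaluation_rows, metric):
    """Return the name of the model with the lowest value of `metric`."""
    return min(evaluation_rows, key=lambda row: row[metric])["model"]


# Made-up evaluation results; column names are hypothetical.
evaluation_rows = [
    {"model": "Baseline", "mean_absolute_error": 4.2, "root_mean_square_error": 5.9},
    {"model": "Neural Network", "mean_absolute_error": 3.1, "root_mean_square_error": 6.4},
    {"model": "ARIMA", "mean_absolute_error": 3.6, "root_mean_square_error": 4.8},
]
best_by_mae = select_best_model(evaluation_rows, "mean_absolute_error")
best_by_rmse = select_best_model(evaluation_rows, "root_mean_square_error")
```

As the example suggests, the selected model can change depending on which error metric is used.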

Prediction: Choose whether you want to include the history, the forecast, or both. If you are including the forecast, specify the Horizon and the probability percentage for the Confidence interval. If you are including the history, please note that residuals are equal to the historical value minus the one-step forecast.
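The definition of residuals given above can be illustrated with a short Python sketch. It assumes a naive one-step forecast (the forecast for each time step is the previous observed value), which is only a stand-in for whatever one-step forecast the trained model produces; the function name is hypothetical.

```python
def one_step_residuals(history):
    """Residual at step t = observed value at t minus the one-step forecast for t.

    The one-step forecast here is the naive one (the value at t - 1), so the
    first observation has no forecast and therefore no residual.
    """
    return [actual - previous for previous, actual in zip(history, history[1:])]


residuals = one_step_residuals([10.0, 12.0, 11.0, 15.0])
```

A positive residual means the model under-predicted that step, and a negative one means it over-predicted.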

The output dataset is well suited to building charts for visually inspecting the forecast results.

Advanced Usages

Forecasts by Entity a.k.a. Partitioned Forecasts

If you want to run the recipes to get one forecast for each entity (e.g. for each product or store), you will need partitioning. This requires all datasets to be partitioned by one dimension for the category, using the discrete dimension feature in Dataiku. If the input data is not partitioned, you can use a Sync recipe to repartition it, as explained in this article.
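The idea behind forecasting by entity is that each partition holds an independent time series that gets its own model. The Python sketch below mimics this outside of Dataiku by grouping rows by an entity key and forecasting each group separately. It is a conceptual analogy rather than a description of how Dataiku partitioning works internally, and it uses a naive last-value forecast as a stand-in model.

```python
from collections import defaultdict


def forecast_by_entity(rows, horizon):
    """Group (entity, value) rows, assumed sorted by time, and forecast each entity separately."""
    series_by_entity = defaultdict(list)
    for entity, value in rows:
        series_by_entity[entity].append(value)
    # Assumed stand-in model: repeat each entity's last observed value.
    return {
        entity: [values[-1]] * horizon
        for entity, values in series_by_entity.items()
    }


rows = [
    ("store_a", 5.0),
    ("store_b", 50.0),
    ("store_a", 7.0),
    ("store_b", 40.0),
]
forecasts = forecast_by_entity(rows, horizon=2)
```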
