The Forecast plugin provides visual recipes in Dataiku DSS for working with time series data and solving forecasting problems.
“Forecasting is required in many situations: deciding whether to build another power generation plant in the next five years requires forecasts of future demand; scheduling staff in a call centre next week requires forecasts of call volumes; stocking an inventory requires forecasts of stock requirements. Forecasts can be required several years in advance (for the case of capital investments), or only a few minutes beforehand (for telecommunication routing). Whatever the circumstances or time horizons involved, forecasting is an important aid to effective and efficient planning.”
-- Hyndman, Rob J. and George Athanasopoulos
Example of time series forecasting at the week level in Dataiku DSS.
Author: Dataiku (Alexandre Combessie)
This plugin offers a set of three visual recipes to forecast time series with a granularity ranging from yearly to hourly. It covers the full cycle of data cleaning, model training, evaluation and prediction.
It follows classic forecasting methods, as described in Hyndman, Rob J., and George Athanasopoulos (Forecasting: principles and practice. OTexts, 2018) and in Taylor, Sean J., and Benjamin Letham (Forecasting at Scale, The American Statistician, 2018).
This plugin does NOT work on time series with a granularity finer than one hour (data must be at the hourly level or coarser) and does not provide signal processing techniques (such as the Fourier transform).
This plugin works well when:
Forecasting is slightly different from "classic" Machine Learning (ML) as currently available visually in Dataiku. It is mainly different because:
The plugin can be installed from the Plugin Store or via the zip download (see above).
Note that the plugin uses an R code environment, so R (version 3.5.0 or above) must be installed and integrated with Dataiku on your machine.
You may encounter issues with the installation of the RStan package in the code environment on some operating systems. RStan has some system-level dependencies that may require additional setup. In this case, please see the RStan Getting Started wiki.
Use this recipe to aggregate and resample the time series, and to handle its missing values and outliers.
Dataset containing time series.
Dataset containing cleaned time series.
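To make the cleaning step more concrete, here is a minimal Python sketch of what aggregating, resampling and filling missing values can look like on a daily series. It is an illustration only and not the plugin's implementation (the plugin runs in an R code environment). The function names `resample_daily` and `interpolate_missing`, the choice to sum duplicate timestamps and the use of linear interpolation are all assumptions made for this example; outlier handling is not covered.

```python
from datetime import date, timedelta


def resample_daily(points, start, end):
    """Put observations on a regular daily grid; days without data become None."""
    by_day = {}
    for day, value in points:
        # Assumption: several observations on the same day are aggregated by summing.
        by_day[day] = by_day.get(day, 0.0) + value
    n_days = (end - start).days + 1
    grid = []
    for i in range(n_days):
        day = start + timedelta(days=i)
        grid.append((day, by_day.get(day)))
    return grid


def interpolate_missing(values):
    """Linearly interpolate interior gaps; leading and trailing gaps stay None."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


raw = [
    (date(2024, 1, 1), 10.0),
    (date(2024, 1, 2), 4.0),
    (date(2024, 1, 2), 8.0),   # duplicate timestamp, summed with the previous row
    (date(2024, 1, 5), 18.0),  # January 3 and 4 are missing
]
grid = resample_daily(raw, date(2024, 1, 1), date(2024, 1, 5))
cleaned = interpolate_missing([value for _, value in grid])
print(cleaned)  # [10.0, 12.0, 14.0, 16.0, 18.0]
```

The result is a regularly spaced series without gaps, which is the shape of data that forecasting models generally expect as input.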
Use this recipe to train forecasting models and evaluate them on cleaned historical data.
Dataset with time series data (ideally the output of the Clean recipe)
Folder containing the forecast R objects.
Dataset containing the evaluation results.
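The idea of training models and evaluating them on historical data can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the recipe's actual code: the two baseline models (`naive_forecast`, `mean_forecast`), the single holdout split and the mean absolute error metric are chosen here for brevity and are not taken from the plugin.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def naive_forecast(train, horizon):
    """Baseline: repeat the last observed value."""
    return [train[-1]] * horizon


def mean_forecast(train, horizon):
    """Baseline: repeat the mean of the training values."""
    return [sum(train) / len(train)] * horizon


def evaluate(series, horizon, models):
    """Hold out the last `horizon` points and score each model on them."""
    train, test = series[:-horizon], series[-horizon:]
    return {
        name: mean_absolute_error(test, fit(train, horizon))
        for name, fit in models.items()
    }


series = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0]
results = evaluate(
    series,
    horizon=2,
    models={"naive": naive_forecast, "mean": mean_forecast},
)
print(results)  # {'naive': 1.0, 'mean': 2.0}
```

The key point is that the held-out values come from the end of the series: unlike a random split, this respects the time ordering and mimics how the model would be used in practice.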
The following models are available in the recipe:
Use this recipe to predict future values or produce historic residuals using a previously trained model.
Folder containing forecast R objects (from the Train and Evaluate recipe).
Dataset containing evaluation results (from the Train and Evaluate recipe).
Dataset containing forecasts.
Model Selection. Choose how to select the model used for prediction: Automatic if you want to select the best model according to an error metric computed in the evaluation dataset input; Manual to select a model yourself.
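Conceptually, the two selection modes amount to the following Python sketch. The row layout (a `model` column plus one column per error metric), the model names and the metric values are hypothetical and made up for illustration; the actual schema of the evaluation dataset may differ.

```python
# Hypothetical evaluation results: one row per model, one column per error metric.
evaluation_rows = [
    {"model": "model_a", "mape": 12.4, "rmse": 30.1},
    {"model": "model_b", "mape": 9.8, "rmse": 33.5},
    {"model": "model_c", "mape": 15.0, "rmse": 28.7},
]


def select_model(rows, metric, manual_choice=None):
    """Return the model name to use for prediction.

    Manual mode: return `manual_choice` after checking that it was evaluated.
    Automatic mode: return the model with the lowest value of `metric`.
    """
    if manual_choice is not None:
        if manual_choice not in {row["model"] for row in rows}:
            raise ValueError(f"Unknown model: {manual_choice}")
        return manual_choice
    return min(rows, key=lambda row: row[metric])["model"]


print(select_model(evaluation_rows, metric="mape"))  # model_b
print(select_model(evaluation_rows, metric="rmse"))  # model_c
print(select_model(evaluation_rows, metric="mape", manual_choice="model_a"))  # model_a
```

Note that the automatic choice depends on the metric: a model that is best on one error metric is not necessarily best on another.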
Prediction. Choose whether you want to include the history, the forecast, or both. If you are including the forecast, specify the horizon and the probability percentage for the confidence interval.
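To illustrate how the horizon and the confidence percentage interact, here is a rough Python sketch for the simplest possible model, the naive forecast, using the textbook interval "forecast ± z · σ · √h" with normally distributed errors. This is an assumption-laden illustration, not the plugin's method: the models available in the plugin compute their own intervals, and the function name and output layout are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def naive_forecast_with_interval(history, horizon, confidence_pct):
    """Naive forecast with a normal-theory interval for each step ahead.

    Assumes one-step errors are independent and normally distributed, so the
    interval half-width grows with the square root of the step number.
    """
    # One-step residuals of the naive method are successive differences.
    residuals = [b - a for a, b in zip(history, history[1:])]
    sigma = sqrt(sum(e * e for e in residuals) / len(residuals))
    # Two-sided interval: e.g. 95 % uses the 97.5th percentile of the normal law.
    z = NormalDist().inv_cdf(0.5 + confidence_pct / 200.0)
    point = history[-1]
    rows = []
    for step in range(1, horizon + 1):
        margin = z * sigma * sqrt(step)
        rows.append({
            "step": step,
            "forecast": point,
            "lower": point - margin,
            "upper": point + margin,
        })
    return rows


history = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
for row in naive_forecast_with_interval(history, horizon=3, confidence_pct=95):
    print(row)
```

Two things are visible in the output: the interval widens as the horizon increases, and a higher confidence percentage gives a wider interval at every step.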
The output dataset is well suited to building charts for visually inspecting the forecast results. Please see examples of such charts below: