Time Series Forecast (legacy)

Forecast univariate time series from year to hour frequency with R models.
⚠️ This plugin is now "legacy" and will be maintained only to fix bugs. For the latest features, we recommend using the new Forecast plugin.

Plugin information

Version	0.5.2
Author	Dataiku (Alex COMBESSIE)
Released	2019-01
Last updated	2021-09
License	MIT License
Source code	Github
Reporting issues	Github

⚠️ Starting with DSS version 11, this plugin is deprecated; we recommend using the native time series forecasting features of DSS instead.

“Forecasting is required in many situations: deciding whether to build another power generation plant in the next five years requires forecasts of future demand; scheduling staff in a call centre next week requires forecasts of call volumes; stocking an inventory requires forecasts of stock requirements. Forecasts can be required several years in advance (for the case of capital investments), or only a few minutes beforehand (for telecommunication routing). Whatever the circumstances or time horizons involved, forecasting is an important aid to effective and efficient planning.”
— Hyndman, Rob J. and George Athanasopoulos

With this plugin, you will be able to forecast univariate time series from year to hour frequency with R models. It covers the full cycle of data cleaning, model training, evaluation, and prediction, through the following 3 recipes:

Clean time series: Resample and aggregate the time series, and clean it of missing values and outliers
Train and evaluate forecasting models: Train forecasting models and evaluate their performance on historical data
Forecast future values and get historical residuals: Use trained forecasting models to predict future values and/or get historical residuals

This plugin works well when:

The training data consists of a single time series at the hour, day, week, month, or year frequency and fits in the server’s RAM.
The object to predict is the future of this time series.

This plugin does NOT work on fine-grained temporal resolutions (the data frequency must be hourly or coarser) and does not provide signal processing techniques (Fourier Transform, …).
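Before running the recipes, you may want to confirm that your data is not sampled more finely than hourly. The plugin does not ship such a check; the following is a minimal sketch, assuming the timestamps are already parsed into Python `datetime` objects. `median_spacing` and `is_hourly_or_coarser` are hypothetical helper names, not part of the plugin.

```python
from datetime import datetime, timedelta
from statistics import median


def median_spacing(timestamps):
    """Median gap between consecutive timestamps (robust to a few missing points)."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps")
    return median(gaps)


def is_hourly_or_coarser(timestamps):
    """True if the typical sampling interval is one hour or longer."""
    return median_spacing(timestamps) >= timedelta(hours=1)
```

Using the median gap rather than the smallest one is a design choice: a handful of irregular or duplicated rows does not flip the result. A series sampled every 15 minutes fails this check and would need to be aggregated to an hourly or coarser frequency first.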

How to set up

As part of the installation process, the plugin creates a new R code environment, so R must be installed and integrated with Dataiku on your machine before you install the plugin. If that is not the case, follow the Dataiku documentation on R integration first.

Note that the plugin requires at least R 3.5 and that Anaconda R is not supported.

How to use

0. Clean time series (optional)

Resample and aggregate the time series, and clean it of missing values and outliers. This recipe is not required if your data is already resampled and has no missing values.

Input

Historical dataset

Output

Cleaned historical dataset

Figure: Example of a cleaned historical dataset

Settings

Input parameters

Time column: Column with date information in parsed date format. If your dates are not parsed, you can use the Parse date processor in a Prepare recipe.
Series columns: Columns containing the numeric values of the time series.

Resampling and aggregation

Frequency: The time interval between consecutive data points in the cleaned dataset.
Aggregation method: When multiple rows fall within the same time period, they are aggregated into the cleaned dataset either by summing (default) or averaging their values.
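To make the resampling step concrete, here is a minimal stdlib sketch of bucketing rows by frequency and aggregating them by sum or average. It is not the plugin's R implementation; the helper names are hypothetical, weeks are assumed to start on Monday, and representing empty periods as `None` (so that the missing-value step can fill them) is an assumption about how gaps are handed on.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def floor_timestamp(ts, frequency):
    """Truncate a datetime to the start of its hour/day/week/month/year period."""
    if frequency == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if frequency == "day":
        return day
    if frequency == "week":
        return day - timedelta(days=day.weekday())  # assumption: weeks start on Monday
    if frequency == "month":
        return day.replace(day=1)
    if frequency == "year":
        return day.replace(month=1, day=1)
    raise ValueError(f"unsupported frequency: {frequency}")


def next_period(start, frequency):
    """Return the start of the period that follows `start`."""
    if frequency == "hour":
        return start + timedelta(hours=1)
    if frequency == "day":
        return start + timedelta(days=1)
    if frequency == "week":
        return start + timedelta(weeks=1)
    if frequency == "month":
        # December (12) rolls over to January of the next year.
        return start.replace(year=start.year + start.month // 12, month=start.month % 12 + 1)
    if frequency == "year":
        return start.replace(year=start.year + 1)
    raise ValueError(f"unsupported frequency: {frequency}")


def resample(rows, frequency, method="sum"):
    """Aggregate (timestamp, value) rows into one value per period.

    method: "sum" (the recipe's default) or "mean". Periods without any row
    are emitted as None so that a later imputation step can fill them.
    """
    if method not in ("sum", "mean"):
        raise ValueError(f"unsupported method: {method}")
    if not rows:
        return []
    buckets = defaultdict(list)
    for ts, value in rows:
        buckets[floor_timestamp(ts, frequency)].append(value)
    aggregate = sum if method == "sum" else mean
    result, period, last = [], min(buckets), max(buckets)
    while period <= last:
        values = buckets.get(period)  # .get() does not create empty buckets
        result.append((period, aggregate(values) if values else None))
        period = next_period(period, frequency)
    return result
```

For example, with rows at 09:00 (3.0) and 17:00 (2.0) on 1 January 2021 and one row (5.0) on 3 January, `resample(rows, "day")` yields 5.0 for 1 January, `None` for 2 January and 5.0 for 3 January; with `method="mean"` the first value becomes 2.5.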

Missing values: Choose one of the following imputation strategies:

Interpolate (default) uses linear interpolation for non-seasonal series. For seasonal series, it uses a robust seasonal trend decomposition: linear interpolation is applied to the seasonally adjusted data, and the seasonal component is then added back.
Replace with average/median/fixed/previous value.
Do nothing.
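The imputation strategies above can be sketched in a few lines of stdlib Python. This is an illustration under stated assumptions, not the plugin's code: gaps at the start or end of the series are filled with the nearest observed value, and the seasonal variant estimates the seasonal component with plain per-position means instead of the robust seasonal trend decomposition the plugin uses. All function names are hypothetical.

```python
import bisect
from statistics import mean, median


def interpolate_linear(values):
    """Fill None entries by linear interpolation between the nearest observed neighbours.

    Assumption: gaps at the start or end are filled with the nearest observed value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        pos = bisect.bisect_left(known, i)  # index of the first observed point after i
        left = known[pos - 1] if pos > 0 else None
        right = known[pos] if pos < len(known) else None
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def interpolate_seasonal(values, period):
    """Interpolate the seasonally adjusted series, then add the seasonal component back.

    Simplification: the seasonal component is a per-position mean, NOT a robust
    seasonal trend decomposition.
    """
    overall = mean(v for v in values if v is not None)
    seasonal = []
    for p in range(period):
        at_p = [v for v in values[p::period] if v is not None]
        seasonal.append(mean(at_p) - overall if at_p else 0.0)
    adjusted = [None if v is None else v - seasonal[i % period] for i, v in enumerate(values)]
    filled = interpolate_linear(adjusted)
    # Observed values are kept as-is; only missing positions are filled.
    return [v if v is not None else filled[i] + seasonal[i % period] for i, v in enumerate(values)]


def replace_missing(values, strategy, fixed_value=0.0):
    """Replace None with the series average, median, a fixed value, or the previous value."""
    if strategy == "previous":
        out, last = [], None
        for v in values:
            last = v if v is not None else last
            out.append(last)  # a leading gap stays None: there is no previous value
        return out
    observed = [v for v in values if v is not None]
    fills = {"average": mean, "median": median}
    if strategy == "fixed":
        fill = fixed_value
    elif strategy in fills:
        fill = fills[strategy](observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]
```

The difference between the two interpolation modes shows up on a strongly seasonal series: for `[10.0, 0.0, 10.0, None, 10.0, 0.0]`, `interpolate_linear` fills the gap with 10.0 (the midpoint of its neighbours), whereas `interpolate_seasonal(values, period=2)` fills it with 0.0, which matches the alternating pattern.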

Outliers: Choose one of the following imputation strategies:

Interpolate (default) uses the same technique as for missing values.
Replace with average/median/fixed/previous value.
Do nothing.

Outliers are detected by fitting a simple seasonal trend decomposition model, using the tsclean function from the R forecast package.
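The general idea behind residual-based outlier detection (smooth the series, then flag points whose residual lies far outside the interquartile range) can be sketched as follows. This is a simplified stand-in, not the forecast package's tsclean: it swaps in a centred rolling median as the smoother, the 3 × IQR threshold is an assumption chosen for illustration, the input is assumed to have no missing values, and all names are hypothetical. Flagged points are set to `None` so that they can then be imputed exactly like missing values.

```python
from statistics import median, quantiles


def rolling_median(values, window=5):
    """Centred rolling median; the window is truncated at both ends of the series."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1]) for i in range(len(values))]


def flag_outliers(values, window=5, k=3.0):
    """Flag points whose residual falls outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumes a complete series (no None values).
    """
    smooth = rolling_median(values, window)
    residuals = [v - s for v, s in zip(values, smooth)]
    # method="inclusive" corresponds to R's default quantile() definition (type 7)
    q1, _, q3 = quantiles(residuals, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [not (low <= r <= high) for r in residuals]


def mask_outliers(values, window=5, k=3.0):
    """Replace flagged outliers with None so they can be imputed like missing values."""
    flags = flag_outliers(values, window, k)
    return [None if bad else v for v, bad in zip(values, flags)]
```

On the series `[10.0, 11.0, 10.0, 12.0, 11.0, 100.0, 10.0, 11.0, 12.0, 10.0]`, only the spike at index 5 is flagged; the ordinary one- or two-unit fluctuations stay well inside the limits.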