Events aggregator Plugin

The EventsAggregator plugin generates aggregated features from a dataset of events (i.e. a dataset with a date column and some additional features). The generated features can then be used to train machine learning algorithms.

For example, if you run an e-commerce website and want to predict churn, the plugin can generate features per customer from a dataset of past orders (with columns such as timestamp, customer id, product category and product price), for instance the frequency of past orders or the percentage of purchases per category. Another example is fraud detection, where the plugin can generate features per customer for each event, based on that customer's past events.
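To make the e-commerce example concrete, here is a minimal sketch of two such per-customer features, written in plain Python. It is not the plugin's implementation: the sample orders, the `customer_features` function and the output feature names are all hypothetical, and "frequency" is assumed here to mean orders per day over the observed span.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical past orders: (timestamp, customer_id, product_category, product_price)
orders = [
    (datetime(2018, 1, 1), "c1", "books", 12.0),
    (datetime(2018, 1, 11), "c1", "games", 40.0),
    (datetime(2018, 1, 21), "c1", "books", 8.0),
    (datetime(2018, 1, 5), "c2", "games", 60.0),
]

def customer_features(orders):
    """Return per-customer order count, orders per day, and category shares."""
    by_customer = defaultdict(list)
    for ts, customer_id, category, _price in orders:
        by_customer[customer_id].append((ts, category))

    features = {}
    for customer_id, events in by_customer.items():
        timestamps = [ts for ts, _ in events]
        span_days = (max(timestamps) - min(timestamps)).days
        counts = Counter(category for _, category in events)
        features[customer_id] = {
            "order_count": len(events),
            # Assumed definition: orders per day over the observed span.
            # Left as None when the span is zero days (e.g. a single order).
            "orders_per_day": len(events) / span_days if span_days > 0 else None,
            # Fraction of the customer's purchases falling in each category.
            "category_share": {c: n / len(events) for c, n in counts.items()},
        }
    return features

features = customer_features(orders)
```

Each customer ends up with one row of features, which is the shape a churn model would typically consume.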

The plugin can generate more than 20 different types of aggregated features, such as the frequency of the events, recency of the events, the distribution of the features, etc., over several time windows and populations that you can define.
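As an illustration of what "recency" and "time windows" can mean, the sketch below counts events in trailing windows before a reference date and measures the days since the latest event. The function `window_features`, its parameters and the feature names are assumptions made for this example; the plugin's own definitions and window settings may differ.

```python
from datetime import datetime, timedelta

def window_features(timestamps, reference_date, window_days):
    """Count events in each trailing window and compute recency in days."""
    # Only events at or before the reference date are considered "past".
    past = [ts for ts in timestamps if ts <= reference_date]
    features = {}
    for days in window_days:
        start = reference_date - timedelta(days=days)
        # Assumed convention: the window is (start, reference_date].
        features[f"count_last_{days}d"] = sum(1 for ts in past if ts > start)
    # Recency: whole days elapsed since the most recent past event.
    features["recency_days"] = (reference_date - max(past)).days if past else None
    return features

events = [datetime(2018, 7, 1), datetime(2018, 7, 20), datetime(2018, 7, 30)]
result = window_features(events, datetime(2018, 8, 1), window_days=[7, 30])
```

Using several windows at once lets a model compare short-term and long-term activity for the same group.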

Plugin information

Version: 0.2.0
Author: Dataiku (Du Phan, Joachim Zentici)
Released: 2018-08-01
Last updated: 2018-08-01
License: Apache Software License
Source code: GitHub
Reporting issues: GitHub

How To Use

Each row of your input dataset should correspond to an event. This means it:

  • has a timestamp (i.e. has a date field)
  • belongs to a group, defined by one or several aggregation keys (e.g. user_id, shop_id, ...)
  • has a list of input features

Let's take an example. If you have two aggregation keys, user_id and shop_id, a group is defined by one combination of their values: all events that share the same user_id and the same shop_id belong to the same group.
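The grouping described above can be sketched as follows. This is only an illustration under assumed data: the sample rows, the `amount` column and the `group_events` helper are hypothetical and not part of the plugin.

```python
from collections import defaultdict

# Hypothetical events; each row carries a timestamp, both aggregation keys
# and one input feature.
rows = [
    {"timestamp": "2018-07-01", "user_id": "u1", "shop_id": "s1", "amount": 10.0},
    {"timestamp": "2018-07-02", "user_id": "u1", "shop_id": "s2", "amount": 5.0},
    {"timestamp": "2018-07-03", "user_id": "u1", "shop_id": "s1", "amount": 7.5},
    {"timestamp": "2018-07-04", "user_id": "u2", "shop_id": "s1", "amount": 3.0},
]

def group_events(rows, aggregation_keys):
    """Group rows by the tuple of their aggregation key values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[k] for k in aggregation_keys)
        groups[key].append(row)
    return dict(groups)

groups = group_events(rows, ["user_id", "shop_id"])
```

Here the four events fall into three groups: ("u1", "s1") with two events, and ("u1", "s2") and ("u2", "s1") with one event each.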