Genetic algorithm for feature engineering

Automatically creates combined features and selects them with a genetic algorithm based on the DEAP framework.

Genetic algorithms are inspired by evolution through natural selection. They are often used in high-dimensional search spaces where grid or random search would be prohibitively expensive.

Genetic algorithms encode each candidate solution as a set of genes and evolve a population of candidates over successive generations. For each generation:

  • individuals forming the current population are evaluated (fitness)
  • the best individuals are chosen to mix their genes together (crossover)
  • independent random changes are performed (mutation)
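The three steps above can be sketched in plain Python. This is only an illustration of the general loop, not the plugin's code: it does not use DEAP, and the function name `evolve`, the tournament selection of size 3, the one-point crossover, the default rates, and the toy fitness function are all assumptions made for the example.

```python
import random


def evolve(fitness, n_genes, pop_size=20, n_generations=30,
           crossover_rate=0.7, mutation_rate=0.05, seed=0):
    """Toy genetic algorithm over boolean genomes (hypothetical helper).

    `fitness` maps a list of booleans to a number; higher is better.
    Requires n_genes >= 2 so that a crossover point exists.
    """
    rng = random.Random(seed)
    population = [[rng.random() < 0.5 for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    for _ in range(n_generations):
        # Evaluation: score every individual of the current population.
        scores = [fitness(ind) for ind in population]

        def select():
            # Tournament selection (assumed): best of 3 random individuals.
            contenders = rng.sample(range(pop_size), 3)
            return population[max(contenders, key=lambda i: scores[i])]

        offspring = []
        while len(offspring) < pop_size:
            a, b = list(select()), list(select())
            # Crossover: swap the tails of the two parents at a random point.
            if rng.random() < crossover_rate:
                point = rng.randrange(1, n_genes)
                a, b = a[:point] + b[point:], b[:point] + a[point:]
            # Mutation: flip each gene independently with a small probability.
            for child in (a, b):
                for i in range(n_genes):
                    if rng.random() < mutation_rate:
                        child[i] = not child[i]
                offspring.append(child)

        population = offspring[:pop_size]
        # Keep track of the best individual seen so far.
        best = max(population + [best], key=fitness)

    return best


def toy_fitness(individual):
    # Toy objective for the example: count the genes that are switched on.
    return sum(individual)


best_individual = evolve(toy_fitness, n_genes=10)
```

In a feature selection setting, the toy fitness would be replaced by a model score, for example a cross-validated metric computed on the features switched on in the individual.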

This plugin handles feature creation and selection, powered by genetic algorithms. Starting from a dataset with features and a target, it automatically selects among the original features and their combinations (products, sums and differences). In this setting, an individual is represented by a boolean array with one value per candidate feature (originals and combinations), indicating whether that feature is selected as an input for the model to train.
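As a rough sketch of this representation, the snippet below builds candidate features from a small table and applies a boolean individual to it. It is not the plugin's implementation: the helper names `build_candidates` and `apply_mask`, the restriction to pairwise combinations, the naming scheme, and the direction of the difference (first column minus second) are assumptions made for illustration.

```python
from itertools import combinations


def build_candidates(columns):
    """Return the original columns plus pairwise products, sums and differences.

    `columns` maps a feature name to a list of numbers (hypothetical layout).
    """
    candidates = dict(columns)
    for a, b in combinations(columns, 2):
        xa, xb = columns[a], columns[b]
        candidates[f"{a}*{b}"] = [x * y for x, y in zip(xa, xb)]
        candidates[f"{a}+{b}"] = [x + y for x, y in zip(xa, xb)]
        candidates[f"{a}-{b}"] = [x - y for x, y in zip(xa, xb)]
    return candidates


def apply_mask(candidates, mask):
    """Keep only the candidate features whose gene is True."""
    names = list(candidates)
    if len(mask) != len(names):
        raise ValueError("mask must have one boolean per candidate feature")
    return {name: candidates[name] for name, keep in zip(names, mask) if keep}


# Two original features give five candidates: x, y, x*y, x+y, x-y.
columns = {"x": [1, 2], "y": [3, 4]}
candidates = build_candidates(columns)

# One individual: keep the original "x" and the product "x*y".
individual = [True, False, True, False, False]
selected = apply_mask(candidates, individual)
```

The selected columns are what would be passed to the model whose score serves as the fitness of that individual.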

Plugin Information

Version: 0.0.1
Author: Dataiku
Released: 2018-08-01
Last updated: 2018-08-01
License: LGPL v3.0
Source code: GitHub
Reporting issues: GitHub
