Data Preparation

The visual recipes

Do you want to compute aggregations, join datasets, transfer data between sources, or filter, split, and merge data? All of this can be done with the visual recipes:

Sync recipe
Prepare recipe
Sample/Filter recipe
Group recipe
Window recipe
Join with... recipe
Split recipe
Stack recipe
Push to editable recipe

Data wrangling

Visual data preparation

Dataiku DSS preparation scripts enable advanced data wrangling and instant visualizations.

Many Dataiku DSS features help you prepare your data quickly and efficiently. Read this how-to about visual data preparation: it covers preprocessing, reshaping, and enriching your data.

Use cases

Web logs

Learn how to enrich datasets containing rich types by following our how-to guide on enriching web logs. It covers geographic enrichment of IP addresses as well as user agent and URL parsing.
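To give a rough idea of what URL parsing produces, here is a minimal sketch in plain Python using only the standard library. It is not how Dataiku DSS implements its processors; the function name `parse_url`, the output keys, and the sample URL are assumptions made for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def parse_url(url):
    """Split a URL into the parts typically useful for web log analysis.

    Hypothetical helper for illustration only; not a Dataiku DSS API.
    """
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "path": parts.path,
        # parse_qs maps each query key to a list of its values
        "query": parse_qs(parts.query),
    }


# Made-up sample URL
record = parse_url("https://www.example.com/products/list?category=books&page=2")
print(record)
```

Each key of the resulting dictionary would correspond to a new column in the enriched dataset, which is the general shape of the output you can expect from this kind of enrichment.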

Merging and joining in a prepare recipe

To understand how merging and joining work, watch this free training video, which covers both concepts and shows how to apply them from within a prepare recipe.

More content on wrangling

Dataset schema and column meaning

Data wrangling starts by understanding your columns’ properties, such as name, comments, storage type, and business meaning. Make sure you understand the difference between the storage type and the meaning of your data.

Writing formulas

Handling dates

Parsing dates is a very common preprocessing step that you can chain with powerful processors, such as extracting date components or enriching your data with holiday information.
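The idea of parsing a date and then extracting its components can be sketched in plain Python as follows. This only illustrates the concept with the standard library and is not the prepare recipe itself; the function name `date_components`, the default format string, and the sample value are assumptions.

```python
from datetime import datetime


def date_components(raw, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a date string, then extract a few components from it.

    Hypothetical helper for illustration only; not a Dataiku DSS API.
    """
    parsed = datetime.strptime(raw, fmt)  # step 1: parse the raw string
    return {                              # step 2: extract components
        "year": parsed.year,
        "month": parsed.month,
        "day": parsed.day,
        "hour": parsed.hour,
        "weekday": parsed.isoweekday(),   # 1 = Monday ... 7 = Sunday
    }


# Made-up sample value
components = date_components("2015-03-14 09:26:53")
print(components)
```

Parsing comes first because the components can only be read reliably once the string has been turned into a proper date value; a string that does not match the expected format raises a `ValueError` instead of silently producing wrong values.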

Reshaping data

Some processors of the prepare recipe can be used to reshape your data.
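As an illustration of what reshaping means, the sketch below turns a "wide" table (one column per year) into a "long" one (one row per country and year) in plain Python. It is not a Dataiku DSS processor; the function name `fold`, the output column names `variable` and `value`, and the sample data are assumptions.

```python
def fold(rows, id_column, value_columns):
    """Reshape wide rows into long rows: one output row per (id, column) pair.

    Hypothetical helper for illustration only; not a Dataiku DSS API.
    """
    long_rows = []
    for row in rows:
        for column in value_columns:
            long_rows.append({
                id_column: row[id_column],
                "variable": column,    # name of the original column
                "value": row[column],  # value held in that column
            })
    return long_rows


# Made-up sample data: one column per year
wide = [
    {"country": "FR", "2014": 10, "2015": 12},
    {"country": "DE", "2014": 7, "2015": 9},
]
long_rows = fold(wide, "country", ["2014", "2015"])
for row in long_rows:
    print(row)
```

The long layout is often easier to aggregate or chart, because the year becomes a regular value that can be grouped or filtered on rather than a part of the column name.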

Distributed execution

Visual preparation recipes can run distributed on Hadoop and Spark.