Data Preparation

The visual recipes

Visual recipes

Do you want to compute aggregations, join datasets, transfer data between sources, filter, split, or merge? This can all be achieved using the visual recipes:

Sync, Prepare, Sample/Filter, Group, Distinct, Window, Join with..., Split, Top N, Sort, Pivot, Stack, Push to editable, Export to folder

This series of videos covers our visual recipes, each in about 5 minutes!

Data wrangling

Visual data preparation

Dataiku DSS preparation scripts enable advanced data wrangling and instant visualizations.

Many Dataiku DSS features help you prepare your data quickly and efficiently. Read this how-to about visual data preparation, which covers preprocessing, reshaping, and enriching your data.

Use cases

Web logs

Learn how to enrich datasets containing rich types by following our how-to guide on enriching web logs. It covers geographic enrichment of IP addresses as well as user agent and URL parsing.
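To make the idea of URL parsing concrete, here is a minimal sketch in plain Python, outside of DSS. It uses only the standard library and is not the DSS processor itself; the function name `parse_url` and the sample URL are hypothetical and chosen for illustration.

```python
from urllib.parse import urlparse, parse_qs


def parse_url(url):
    """Split a URL into the kind of columns a URL-parsing step typically yields.

    Hypothetical helper for illustration; not part of Dataiku DSS.
    """
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "path": parts.path,
        # parse_qs maps each query key to a list of values
        "query": parse_qs(parts.query),
    }


# Illustrative web log URL (made up for this example)
row = parse_url("https://www.example.com/products/view?id=42&ref=newsletter")
print(row)
```

Each key of the returned dictionary would correspond to a new column in an enriched web log dataset.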

Merging and joining in a prepare recipe

To understand how merging and joining work, you can watch this free training video covering these concepts, and how they can be done from within a prepare recipe.

More content on wrangling

Dataset schema and column meanings

Data wrangling starts with understanding your columns’ properties, such as name, comments, storage type, and business meaning. Make sure you understand the difference between the storage type and the meaning of your data.
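The distinction can be illustrated with a small sketch in plain Python, outside of DSS: a column whose values are all stored as strings, most of which carry the meaning "IP address". The helper `looks_like_ip` and the sample values are assumptions made for this example, and the validity ratio only mimics the general idea of checking values against a meaning.

```python
import ipaddress


def looks_like_ip(value):
    """Return True if a string is a valid IPv4 or IPv6 address.

    Hypothetical helper for illustration; not part of Dataiku DSS.
    """
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


# Every value is stored as a string: that is the storage type.
column = ["192.168.0.1", "10.0.0.7", "not-an-ip", "8.8.8.8"]
storage_types = {type(v).__name__ for v in column}

# Most values are valid IP addresses: that is the meaning.
valid_ratio = sum(looks_like_ip(v) for v in column) / len(column)
print(storage_types, valid_ratio)
```

The storage type describes how the values are held, while the meaning describes what they represent and which values count as valid.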

Writing formulas

Handling dates

Parsing dates is a very common preprocessing step that you can chain with powerful processors, such as extracting date components or enriching your data with holiday information.
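As a rough illustration of that chain in plain Python, outside of DSS, the sketch below parses a raw string, extracts components, and adds a holiday flag. The input format, the function name `parse_and_extract`, and the one-entry holiday calendar are assumptions made for this example; real holiday enrichment relies on a proper calendar.

```python
from datetime import date, datetime

# Hypothetical, hard-coded holiday calendar used only for this example.
EXAMPLE_HOLIDAYS = {date(2023, 7, 14)}


def parse_and_extract(raw, fmt="%d/%m/%Y %H:%M"):
    """Parse a raw date string, then derive date components from it."""
    parsed = datetime.strptime(raw, fmt)
    return {
        "iso": parsed.isoformat(),
        "year": parsed.year,
        "month": parsed.month,
        "day": parsed.day,
        "day_of_week": parsed.isoweekday(),  # Monday = 1 ... Sunday = 7
        "is_holiday": parsed.date() in EXAMPLE_HOLIDAYS,
    }


components = parse_and_extract("14/07/2023 09:30")
print(components)
```

Parsing comes first because the later steps need a real date value rather than an ambiguous string.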

Reshaping data

Tabular data is typically stored in long or wide format. Reshaping data is the act of converting from one format to the other.

The Pivot recipe reshapes a dataset from long to wide format.
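The long-to-wide conversion can be sketched in a few lines of plain Python, outside of DSS. This is only an illustration of the reshaping idea, not how the Pivot recipe is implemented; the function `pivot_long_to_wide` and the sample sales rows are hypothetical.

```python
def pivot_long_to_wide(rows, key, column, value):
    """Reshape long-format rows into one wide row per distinct key.

    Hypothetical helper for illustration. If a (key, column) pair appears
    more than once, the last value wins; no aggregation is applied.
    """
    wide = {}
    for row in rows:
        # Create the wide record for this key on first sight
        record = wide.setdefault(row[key], {key: row[key]})
        # The value of `column` becomes a new column name
        record[row[column]] = row[value]
    return list(wide.values())


# Long format: one row per (store, quarter) pair
long_rows = [
    {"store": "A", "quarter": "Q1", "sales": 100},
    {"store": "A", "quarter": "Q2", "sales": 120},
    {"store": "B", "quarter": "Q1", "sales": 80},
    {"store": "B", "quarter": "Q2", "sales": 95},
]

# Wide format: one row per store, one column per quarter
wide_rows = pivot_long_to_wide(long_rows, "store", "quarter", "sales")
print(wide_rows)
```

In the long format each measurement has its own row; in the wide format the values of the `quarter` column have become column names.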

Some processors of the Prepare recipe can be used to reshape your data.

Distributed execution

Visual preparation recipes can run in a distributed fashion on Hadoop and Spark.