Do you want to compute aggregations, join datasets, transfer data between sources, filter, split, or merge? This can all be achieved using the visual recipes:
|Join with...|Split|Top N|
|Push to editable|Export to folder|
Dataiku DSS preparation scripts enable advanced data wrangling and instant visualizations.
Many Dataiku DSS features help you prepare your data quickly and efficiently. Read this how-to about visual data preparation, which covers preprocessing, reshaping, and enriching your data.
Learn how to enrich datasets that contain rich types by following our how-to guide on enriching weblogs. It covers geographic enrichment of IP addresses as well as user agent and URL parsing.
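To give an idea of what URL parsing produces, here is a minimal Python sketch using only the standard library. It does not show how DSS implements its processors; the helper name `split_url` and the sample URL are assumptions made for this example.

```python
from urllib.parse import urlsplit, parse_qs


def split_url(url):
    """Break a URL into the parts typically analyzed in weblogs (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "path": parts.path,
        # parse_qs maps each query parameter to a list of its values
        "query": parse_qs(parts.query),
    }


# Sample URL made up for this example
print(split_url("https://www.example.com/products/view?id=42&ref=home"))
```

Each extracted part can then become its own column, which is roughly what a URL-splitting step does to a weblog dataset.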
To understand how merging and joining work, you can watch this free training video, which covers these concepts and shows how to apply them from within a Prepare recipe.
Data wrangling starts with understanding your columns’ properties, such as name, comments, storage type, and business meaning. Make sure you understand the difference between the storage type and the meaning of your data.
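The distinction can be illustrated outside DSS with a small Python sketch: every value below is stored as a string, yet the column can carry a richer meaning. The helper `guess_meaning` and its labels are hypothetical and greatly simplified; they are not the detection logic DSS uses.

```python
import ipaddress


def guess_meaning(values):
    """Guess a semantic meaning for values that are all stored as strings.

    Hypothetical helper: the labels and the order of checks are assumptions.
    """
    def all_valid(check):
        # A check "passes" when it accepts every value without raising ValueError
        for value in values:
            try:
                check(value)
            except ValueError:
                return False
        return True

    if all_valid(int):
        return "integer"
    if all_valid(float):
        return "decimal"
    if all_valid(ipaddress.ip_address):
        return "ip_address"
    return "text"


# The storage type is string in every case; only the meaning differs
print(guess_meaning(["1", "2", "3"]))
print(guess_meaning(["1.5", "2"]))
print(guess_meaning(["192.168.0.1", "10.0.0.7"]))
print(guess_meaning(["hello", "world"]))
```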
Parsing dates is a very common preprocessing step. Once a date is parsed, you can chain it with powerful processors, such as extracting date components or enriching your data with holiday information.
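The parse-then-extract chain can be sketched in plain Python as follows. This is an illustration only, not the DSS processors themselves; the helper name `parse_and_extract`, the input format, and the sample timestamp are assumptions.

```python
from datetime import datetime


def parse_and_extract(raw, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a raw date string, then derive components from it (hypothetical helper)."""
    parsed = datetime.strptime(raw, fmt)
    return {
        "iso": parsed.isoformat(),
        "year": parsed.year,
        "month": parsed.month,
        "day": parsed.day,
        "hour": parsed.hour,
        # weekday() counts from Monday = 0 to Sunday = 6
        "weekday": parsed.weekday(),
    }


# Sample timestamp made up for this example
print(parse_and_extract("2015-03-14 09:26:53"))
```

Parsing first matters because components such as the weekday can only be computed reliably from a real date value, not from the raw string.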
Tabular data is typically stored in long or wide format. Reshaping data is the act of converting from one format to the other.
The Pivot recipe reshapes a dataset from long to wide format.
Some processors of the Prepare recipe can be used to reshape your data.
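To make the long-to-wide conversion concrete, here is a minimal sketch in plain Python. It only illustrates the idea of pivoting; the helper `pivot_long_to_wide`, the column names, and the sample rows are assumptions, and the Pivot recipe itself offers more options, such as aggregations.

```python
from collections import defaultdict


def pivot_long_to_wide(rows, index, column, value):
    """Turn one row per (index, column) pair into one row per index (hypothetical helper)."""
    wide = defaultdict(dict)
    for row in rows:
        # Each distinct value of `column` becomes a new column in the wide output
        wide[row[index]][row[column]] = row[value]
    return [{index: key, **cells} for key, cells in wide.items()]


# Long format: one row per store and month (sample data made up for this example)
long_rows = [
    {"store": "A", "month": "jan", "sales": 10},
    {"store": "A", "month": "feb", "sales": 12},
    {"store": "B", "month": "jan", "sales": 7},
]

# Wide format: one row per store, one column per month
wide_rows = pivot_long_to_wide(long_rows, "store", "month", "sales")
print(wide_rows)
```

Store B has no row for February in the long data, so its wide row simply lacks a `feb` key; a real pivot would typically leave that cell empty.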
Visual preparation recipes can run distributed on Hadoop and Spark.