Data Preparation
Concepts
Data Preparation Walkthrough
The Visual Recipes
Visual recipes
Do you want to compute aggregations, join datasets, transfer data between sources, filter, split, or merge? This can all be achieved using the visual recipes:
Sync, Prepare, Sample/Filter, Group, Distinct, Window, Join with…, Split, Top N, Sort, Pivot, Stack, Push to editable, and Export to folder.
This course covers our visual recipes in detail!
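To make two of these operations concrete, here is a minimal plain-Python sketch of what a grouped aggregation and a left join compute. It is not how DSS implements the Group or Join recipes; the sample rows and the helper names `group_sum` and `left_join` are hypothetical, and the join assumes each key appears at most once in the right-hand dataset.

```python
from collections import defaultdict

# Hypothetical sample data; column names and values are illustrative only.
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 2, "amount": 5.0},
    {"customer_id": 1, "amount": 12.5},
]
customers = [
    {"customer_id": 1, "country": "FR"},
    {"customer_id": 2, "country": "US"},
]


def group_sum(rows, key, value):
    """Sum `value` for each distinct `key` (one group key, one aggregation)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


def left_join(left, right, key):
    """Left join on `key`; unmatched left rows keep only their own columns.

    Assumes `key` is unique in `right` (a later duplicate would overwrite an earlier one).
    """
    index = {row[key]: row for row in right}
    return [{**row, **index.get(row[key], {})} for row in left]


totals = group_sum(orders, "customer_id", "amount")
joined = left_join(orders, customers, "customer_id")
```

Here `totals` maps each customer to the sum of their order amounts, and `joined` carries the `country` column onto every order row.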
Data Wrangling
Visual data preparation
Dataiku DSS preparation scripts enable advanced data wrangling and instant visualizations.
- Discover how this works by following our tutorial.
- Check out the full list of processors available in preparation scripts.
- See for standard wrangling tasks.
Use Cases
Web logs
Learn how to enrich datasets containing rich types by following our how-to guide on enriching web logs. It covers geographic enrichment of IP addresses as well as user agent and URL parsing.
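As an illustration of what URL parsing produces, the following sketch splits a URL into components using Python's standard `urllib.parse` module. It is only an approximation of the idea, not the DSS processor itself; the helper name `parse_url`, the output keys, and the example URL are assumptions made for this sketch.

```python
from urllib.parse import urlsplit, parse_qs


def parse_url(url):
    """Split a URL into scheme, host, path, and query parameters."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "path": parts.path,
        # parse_qs returns a list per parameter; keep the first value only.
        "query": {name: values[0] for name, values in parse_qs(parts.query).items()},
    }


# Hypothetical web log entry.
record = parse_url("https://www.example.com/products/list?category=books&page=2")
```

Each component can then become its own column, which makes it possible to filter or group web log rows by host, path, or a given query parameter.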
Merging and joining in a prepare recipe
To understand advanced joins in the Prepare recipe, read this tutorial, which covers these concepts.
More Content on Data Wrangling
Datasets schema and columns meaning
Data wrangling starts with understanding your columns’ properties, such as name, comments, storage type, and business meaning. Make sure you understand the difference between the storage type and the meaning of your data.
Writing formulas
You can find the description of all functions of the formula language in our reference documentation. If you are editing a formula in DSS, the same reference is available in the editor's reference tab.
Handling dates
Parsing dates is a very common preprocessing step. Once a date is parsed, you can chain other processors, for example to extract date components or to enrich your data with holiday information.
- Read the reference documentation of the Parse date processor.
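The parse-then-extract chain can be sketched in plain Python with the standard `datetime` module. This is an analogy, not the Parse date processor: the format string uses Python's `strptime` syntax, which is an assumption of this sketch and may differ from the syntax DSS expects, and the helper names are hypothetical.

```python
from datetime import datetime


def parse_date(text, fmt="%d/%m/%Y %H:%M"):
    """Parse a date string with an explicit format (Python strptime syntax)."""
    return datetime.strptime(text, fmt)


def date_components(dt):
    """Extract a few components from a parsed date."""
    return {
        "year": dt.year,
        "month": dt.month,
        "day": dt.day,
        "weekday": dt.isoweekday(),  # 1 = Monday ... 7 = Sunday
    }


parsed = parse_date("14/07/2021 09:30")
components = date_components(parsed)
```

Parsing first matters because the raw value is only text. The components become reliable once the string has been turned into a real date with an unambiguous format.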
Reshaping data
Tabular data is typically stored in long or wide format. Reshaping data is the act of converting from one format to the other.
The Pivot recipe reshapes a dataset from long to wide format.
- Read how to use the Pivot recipe to create pivot tables.
- Read how to use the Pivot recipe to pivot by values.
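To show what going from long to wide format means, here is a minimal plain-Python sketch. It is not the Pivot recipe: the helper `pivot_long_to_wide` and the sample rows are hypothetical, and no aggregation is applied, so if the same identifier and column pair appears twice, the last value wins.

```python
def pivot_long_to_wide(rows, index, column, value):
    """Turn one row per (index, column) pair into one row per index.

    Each distinct value found in `column` becomes a new output column.
    """
    wide = {}
    for row in rows:
        out = wide.setdefault(row[index], {index: row[index]})
        out[row[column]] = row[value]
    return list(wide.values())


# Hypothetical long-format data: one row per store and quarter.
long_rows = [
    {"store": "A", "quarter": "Q1", "sales": 10},
    {"store": "A", "quarter": "Q2", "sales": 15},
    {"store": "B", "quarter": "Q1", "sales": 7},
]
wide_rows = pivot_long_to_wide(long_rows, "store", "quarter", "sales")
```

In the wide result, store A has one row with `Q1` and `Q2` columns, while store B has no `Q2` entry because the long data contained no such row.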
Some processors of the Prepare recipe can be used to reshape your data.
- Reference documentation for these processors can be found here.
Distributed execution
Visual preparation recipes can run distributed on Hadoop and Spark.
- Read this documentation to see how to activate this option and run your data preparation using the processing power of your cluster.