Data preparation can be accessed in two places in DSS:
- In the Lab, through a Visual analysis. This is used to iterate between preparation, visualization and machine learning.
- In a Prepare recipe in the Flow, used to create a new dataset.
More information is available in The Lab and The Flow.
Both give you access to the same set of Interactive data preparation features. However, their goals differ:
- In a visual analysis, preparation is used for cleaning and enriching your data in order to explore it in detail, analyze the content of columns, make charts based on the prepared data, or train machine learning models.
- In a prepare recipe, preparation is used for cleaning and enriching your data in order to build a new "clean" dataset.
In addition, you can Deploy the script of a Visual analysis, which creates a new Prepare recipe. Once deployed, the new prepare recipe is independent from the analysis: they become separate objects that are no longer linked.
Which one should I choose?
When you have a dataset that you want to prepare, it can be difficult to choose between a visual analysis and a prepare recipe.
If your goal is to prepare data in order to create a machine learning model, use a visual analysis.
If your goal is to study your dataset, analyze the prepared data, or build charts on the prepared data, and you do not plan on keeping the prepared data, use a visual analysis.
If your main goal is to create a new dataset, both choices are possible:
- Either create a prepare recipe directly and build your script.
- Or create a visual analysis, build your script and then deploy your visual analysis to a new prepare recipe.
While the interactive data preparation features are the same, the main difference is charting. In a visual analysis, you can make charts directly on the prepared data. In a prepare recipe, you cannot: you have to build the target dataset first, and then create charts on it.
If you are confident about what you need to do on your dataset, and don't need interactive charting while you build your script, then choose a prepare recipe: this will save you one step.
If you are discovering your dataset, and need the interactive charting capabilities as a guide for building your preparation, then create a visual analysis, build your script, and then deploy your visual analysis as a new prepare recipe.
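The guidance above can be summarized as a small decision helper. This is only an illustrative sketch in Python: `Goal` and `choose_preparation_tool` are hypothetical names invented for this example, they are not part of any DSS API, and the returned strings simply restate the recommendations in this section.

```python
from enum import Enum


class Goal(Enum):
    """Hypothetical labels for the three goals discussed above."""
    TRAIN_MODEL = "train a machine learning model"
    EXPLORE = "study or chart the prepared data without keeping it"
    NEW_DATASET = "create a new dataset"


def choose_preparation_tool(goal: Goal, needs_interactive_charts: bool = False) -> str:
    """Return the recommended starting point for a given goal.

    This mirrors the rules of thumb in the text; it does not call DSS.
    """
    # Machine learning and exploration both happen in the Lab.
    if goal in (Goal.TRAIN_MODEL, Goal.EXPLORE):
        return "visual analysis"

    # Goal is a new dataset: both routes work, charting needs decide.
    if needs_interactive_charts:
        return "visual analysis, then deploy as a prepare recipe"

    # Confident about the steps and no charting needed: skip the Lab.
    return "prepare recipe"
```

For example, `choose_preparation_tool(Goal.NEW_DATASET, needs_interactive_charts=True)` returns the two-step route through a visual analysis, while the same goal without charting needs returns the direct prepare recipe.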