The Flow is the visual representation of a data pipeline, showing datasets and their transformations (through what we call Recipes). Thanks to the Flow, Dataiku knows the dependencies of each dataset and can optimally rebuild datasets whenever one of the parent datasets or recipes has been modified. Note that dependencies can be defined at a finer granularity than the whole dataset by using partitions, which minimizes computation time.
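To make the idea concrete, here is a minimal Python sketch of how a dependency graph lets a tool decide what must be rebuilt, and in what order, after a dataset changes. The recipe map and function names are illustrative assumptions, not Dataiku's actual internals:

```python
from collections import defaultdict, deque

# Hypothetical mini-flow: each recipe maps its output dataset to its inputs.
RECIPES = {
    "cleaned": ["raw"],
    "enriched": ["cleaned", "reference"],
    "report": ["enriched"],
}

def downstream_rebuild_order(modified):
    """Datasets to rebuild after `modified` changes, in dependency order."""
    children = defaultdict(list)
    for out, inputs in RECIPES.items():
        for i in inputs:
            children[i].append(out)
    # Collect everything reachable downstream of the modified dataset.
    to_rebuild, queue = set(), deque([modified])
    while queue:
        for child in children[queue.popleft()]:
            if child not in to_rebuild:
                to_rebuild.add(child)
                queue.append(child)
    # Order so each dataset is built after the rebuilt datasets it depends on.
    order, remaining = [], set(to_rebuild)
    while remaining:
        ready = sorted(d for d in remaining
                       if not any(i in remaining for i in RECIPES.get(d, [])))
        order.extend(ready)
        remaining -= set(ready)
    return order

print(downstream_rebuild_order("raw"))  # ['cleaned', 'enriched', 'report']
```

Modifying `raw` forces a rebuild of every dataset downstream of it, while modifying `report` forces nothing else; this is the dependency awareness the Flow provides.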
Click on the build icon in the Actions menu. A modal appears where you can set the build options:
The first option runs the recipe that outputs the dataset. Use it to recompute only the current dataset, for example after modifying the recipe or the dataset schema; it does not take any upstream changes into account.
The second option, “Recursive”, recomputes whatever is necessary upstream, depending on what has changed recently. Three sub-options are available:
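The core decision a recursive build makes can be sketched as follows: walk upstream from the target and mark for rebuild any dataset that is older than one of its inputs (a rebuilt input counts as freshly timestamped). The recipe map and timestamps below are illustrative assumptions, not Dataiku internals:

```python
# Hypothetical flow and fake build timestamps: "raw" (built at t=30) is newer
# than "cleaned" (t=10), so "cleaned" is stale, and therefore so is "report".
RECIPES = {"cleaned": ["raw"], "report": ["cleaned"]}   # output -> inputs
BUILT_AT = {"raw": 30, "cleaned": 10, "report": 20}

def stale_upstream(target):
    """Return the set of datasets (target included) needing a rebuild."""
    NOW = 100   # timestamp a rebuilt dataset would get

    def visit(ds):
        if ds not in RECIPES:                 # source dataset: never rebuilt
            return BUILT_AT[ds], set()
        stale = set()
        for inp in RECIPES[ds]:
            t, s = visit(inp)
            stale |= s
            if t > BUILT_AT[ds]:              # an input is (or will be) newer
                stale.add(ds)
        return (NOW if ds in stale else BUILT_AT[ds]), stale

    return visit(target)[1]

print(stale_upstream("report"))  # {'cleaned', 'report'}
```

A recursive build of `report` here would rebuild `cleaned` first and then `report`, while leaving `raw` untouched.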
To pick more precisely which upstream datasets to build, select one of the options above and click the Preview button.
The dependency graph is then analysed to compute the precise activities required. You can then disable unnecessary activities one by one, either in the list of activities or in the graph view.
To build all the datasets of a project, click the Build all button at the bottom right of the Flow, or right-click a dataset in the Flow and select the rebuild from here option in the Tools menu.
Note: when using the rebuild from here tool, the datasets required to build the final datasets may include datasets upstream of the original one.
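The note above can be illustrated with a small sketch: once the downstream targets of the starting dataset are selected, every input they transitively need may be scheduled too, including branches that are not below the starting dataset at all. All dataset names are hypothetical and the logic is a conceptual model, not Dataiku's implementation:

```python
# "enriched" joins the branch below "cleaned" with a separate reference branch,
# so rebuilding from "cleaned" can pull in that upstream branch as well.
RECIPES = {
    "cleaned": ["raw"],
    "reference_prepared": ["reference"],
    "enriched": ["cleaned", "reference_prepared"],
    "report": ["enriched"],
}

def rebuild_from_here(start):
    """Return (downstream targets, extra upstream datasets they require)."""
    # Downstream closure of `start`: the datasets the tool aims to rebuild.
    targets, frontier = set(), {start}
    while frontier:
        frontier = {out for out, ins in RECIPES.items()
                    if set(ins) & frontier} - targets
        targets |= frontier
    # Upstream closure of those targets: where the extra datasets appear.
    needed, stack = set(), list(targets)
    while stack:
        for inp in RECIPES.get(stack.pop(), []):
            if inp not in needed:
                needed.add(inp)
                stack.append(inp)
    return targets, needed - targets - {start}

print(rebuild_from_here("cleaned"))
# targets: {'enriched', 'report'}
# extras:  {'raw', 'reference', 'reference_prepared'}
```

Here rebuilding from `cleaned` targets `enriched` and `report`, but may also involve `reference_prepared` and its source `reference`, which sit upstream of the join rather than below `cleaned`.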
Note that you can schedule or automate rebuilds using complex triggers thanks to the scenario system. Please refer to the Automation portal for more details.