In DSS, data analysts work with datasets inside projects.
Within a project, there are two distinct parts: the Lab and the Flow.
The Lab is where you experiment, iterate, analyze, explore.
Work done in the Lab is “not in production”. It is iterative by nature and reflects the exploratory side of a data scientist's work.
DSS offers two kinds of environments to work in the Lab: visual analyses and code notebooks.
To work in the Lab, click on the “Lab” button. The Lab button is available:
The Lab window opens.
You can then either start a new visual analysis, start a new code notebook, or go back to an already-created element.
The Visual analysis is the main workspace for visual work in the Lab. It has three main functions:
Interactive data preparation, to prepare, clean, and enrich your data. For more information, see our Portal on data preparation.
Data visualization, based directly on the prepared data.
Training machine learning models, based directly on the prepared data. For more information, see our Portal on machine learning.
Code notebooks let you explore and analyze your datasets through interactive code environments.
Python and R support is provided via an integrated Jupyter Notebook within DSS.
In DSS, persistent data manipulations (cleansing, aggregation, joining, etc.) are performed within recipes, which take datasets as inputs and produce datasets as outputs.
The lineage of a dataset (or a model) is thus defined by the inputs and outputs of its ancestor recipes. The overall view of the dependency structure of a project is accessible in the Flow tab.
Knowing these dependencies helps the DSS engine minimize the amount of data processing it launches when (re)building a dataset.
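To make the idea concrete, here is a minimal sketch of how a dependency graph of recipes can be used to derive a dataset's lineage and a reduced build plan. It is not how DSS itself resolves dependencies (the text above does not describe that mechanism). The recipe and dataset names, the `RECIPES` structure, and the `up_to_date` set are all hypothetical, chosen only for illustration.

```python
# Hypothetical project: each recipe maps input datasets to output datasets.
RECIPES = {
    "prepare_orders": {"inputs": ["orders_raw"], "outputs": ["orders_clean"]},
    "prepare_customers": {"inputs": ["customers_raw"], "outputs": ["customers_clean"]},
    "join_orders_customers": {
        "inputs": ["orders_clean", "customers_clean"],
        "outputs": ["orders_enriched"],
    },
    "aggregate_by_country": {
        "inputs": ["orders_enriched"],
        "outputs": ["revenue_by_country"],
    },
}


def producer_of(dataset, recipes):
    """Return the recipe that outputs `dataset`, or None for a source dataset."""
    for name, recipe in recipes.items():
        if dataset in recipe["outputs"]:
            return name
    return None


def build_plan(target, recipes, up_to_date):
    """List the recipes to run to build `target`, upstream recipes first.

    Datasets in `up_to_date` are assumed to be current, so neither they
    nor anything upstream of them is rebuilt.
    """
    plan, seen = [], set()

    def visit(dataset):
        if dataset in up_to_date:
            return
        name = producer_of(dataset, recipes)
        if name is None or name in seen:
            return
        seen.add(name)
        for parent in recipes[name]["inputs"]:
            visit(parent)
        plan.append(name)  # appended only after all of its inputs are handled

    visit(target)
    return plan


def lineage(dataset, recipes):
    """Full lineage: every ancestor recipe, as if nothing were up to date."""
    return build_plan(dataset, recipes, up_to_date=set())


full = lineage("revenue_by_country", RECIPES)
partial = build_plan(
    "revenue_by_country",
    RECIPES,
    up_to_date={"orders_clean", "customers_clean"},
)
print(full)
print(partial)
```

In this sketch, the full lineage of `revenue_by_country` contains all four recipes, while the plan computed with the two cleaned datasets marked as up to date contains only the join and the aggregation. That difference is the kind of saving that knowledge of the dependencies makes possible.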
The Flow can be thought of as everything that is “in production” or “active” in DSS, as opposed to the “experimentation” done in the Lab.
The main building blocks of the Flow are thus: