In DSS, teams or users organize their datasets and associated tasks into separate projects:
Projects help manage:
- the functional separation of data assets and associated tasks
- the organisational security, thanks to per-project user access management.
Within a project, the DSS items that the user manipulates are accessed within six "universes" mapping the main concepts of DSS:
- Within each universe, related items are organized graphically or in lists.
- Each universe is accessed by clicking the corresponding top-level icon in the navigation bar, right next to the project title.
A Dataset (in the DSS sense) is a series of rows with the same data structure. The underlying data:
- can lie on various storage systems (file system, SQL database, Hadoop, etc.) to which DSS is connected,
- can have an associated file format (CSV, JSON, Hadoop file formats, etc.).
The DSS Dataset Abstraction Layer lets users access, visualize and write the data in a unified way, regardless of the storage system.
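The idea of an abstraction layer can be illustrated with a small sketch. The classes below are hypothetical stand-ins, not the actual DSS API: two storage backends expose the same row-iteration interface, so the consuming code is identical whatever the storage.

```python
import csv
import io

class MemoryDataset:
    """Hypothetical backend: rows already in memory (e.g. a SQL query result)."""
    def __init__(self, rows):
        self._rows = rows
    def iter_rows(self):
        return iter(self._rows)

class CsvDataset:
    """Hypothetical backend: rows stored as CSV text on a file system."""
    def __init__(self, text):
        self._text = text
    def iter_rows(self):
        return iter(list(csv.DictReader(io.StringIO(self._text))))

def first_names(dataset):
    # Consumer code does not care which backend it is reading from.
    return [row["name"] for row in dataset.iter_rows()]

mem = MemoryDataset([{"name": "Ada"}, {"name": "Grace"}])
fs = CsvDataset("name\nAda\nGrace\n")
print(first_names(mem))  # → ['Ada', 'Grace']
print(first_names(fs))   # → ['Ada', 'Grace']
```

In DSS itself, this unified access is what the Dataset object provides over the connected storage systems.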
Creating your first DSS dataset and learning how to cleanse it is the subject of the DSS 101 Tutorial.
Flow & Recipes
In DSS, data manipulations (cleansing, aggregation, joining, etc.) are performed within Recipes, which take datasets as inputs and produce datasets as outputs.
The lineage of a dataset (or a model) is thus defined by the inputs and outputs of its ancestor recipes. The overall view of the dependency structure of a project is accessible in the Flow tab:
Knowing these dependencies helps the DSS engine minimize the amount of data processing launched when (re)building a dataset.
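A minimal sketch of this idea (hypothetical flow and function names, not DSS's actual engine): given the input/output dependencies, only the ancestors of the requested dataset need to be rebuilt, in dependency order.

```python
# Flow as a mapping: output dataset -> inputs of its producing recipe
# (hypothetical example project).
FLOW = {
    "clean_orders": ["orders"],
    "clean_customers": ["customers"],
    "joined": ["clean_orders", "clean_customers"],
    "report": ["joined"],
}

def build_plan(target, flow):
    """Return the datasets to (re)build, inputs before outputs.

    Source datasets (no producing recipe) need no build; everything
    else is visited depth-first so dependencies come first.
    """
    plan, seen = [], set()
    def visit(ds):
        if ds in seen or ds not in flow:
            return
        seen.add(ds)
        for dep in flow[ds]:
            visit(dep)
        plan.append(ds)
    visit(target)
    return plan

print(build_plan("joined", FLOW))
# → ['clean_orders', 'clean_customers', 'joined']  ("report" is untouched)
```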
Lab - Visual Analysis
Data Science in real life is full of dirty data. Data workers' daily tasks include:
- cleansing the data,
- creating features,
- building visualizations,
- creating multiple ML models together with their assessments.
Within the Lab, DSS provides a dedicated module called Visual Analysis to iterate quickly over these processes. This module lets data workers visually experiment with various constructions before deploying them in the Flow, so they can build their data-driven applications efficiently.
We strongly encourage you to follow the Tutorial 101 to discover how fluidly data problems can be tackled within a DSS Visual Analysis.
Lab - Notebooks
Some users prefer to carry out this iterative process in code. DSS ships with interactive development environments called Notebooks.
These can be used either
- to draft code in Python, R or SQL (including Hive and Impala),
- to create advanced reports mixing text and complex visualizations using Python or R.
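Inside a DSS Python notebook, a dataset is typically read through the `dataiku` package (e.g. `dataiku.Dataset(...).get_dataframe()`). The sketch below stands in for such a notebook cell, using plain-Python sample rows (invented for illustration) so the drafted cleansing logic itself is visible and runnable anywhere.

```python
# Stand-in for a dataset read in a notebook cell (sample rows, not real data).
rows = [
    {"country": " france ", "amount": "12.5"},
    {"country": "FRANCE", "amount": ""},
    {"country": "Spain", "amount": "7"},
]

def cleanse(rows):
    """Draft of a cleansing step: normalize labels, drop incomplete rows."""
    out = []
    for row in rows:
        if not row["amount"]:
            continue  # drop rows with a missing amount
        out.append({
            "country": row["country"].strip().title(),
            "amount": float(row["amount"]),
        })
    return out

cleansed = cleanse(rows)
print(cleansed)
# → [{'country': 'France', 'amount': 12.5}, {'country': 'Spain', 'amount': 7.0}]
```

Once the draft behaves as expected, the same logic can be promoted to a Recipe in the Flow.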
Jobs & Scenarios
The secret to efficiently taking advantage of your data assets lies in the ability to (re)play the full pipeline of an analysis and to always have up-to-date predictive scoring.
The monitoring of the associated tasks is accessed in the Jobs tab. Every time the reconstruction of a dataset is requested, DSS creates a new Job with all the build dependency information defined in the Flow.
Scenarios help you automate these reconstruction tasks, for example running daily updates of your models. Reports on previously run scenarios and their results are shown in Monitoring.
Dashboard & Insights
The DSS Dashboard is a communication tool for organizing, sharing and delivering the Insights on your data (charts, datasets, static reports, etc.).
On the Dashboard, the team structures their findings and the final data consumers get their updated summary.
Users with Web coding skills can create advanced custom Web applications using our dedicated editor and REST API. Templates and code samples are provided to help you get started. Head to the dedicated howtos to learn more!