In Dataiku DSS, you organize datasets and associated tasks into separate projects:
Projects help you to manage:
Within a project, the items you work with are grouped into six “universes” that map to the main Dataiku concepts:
A Dataset (in the Dataiku sense) is a series of rows with the same data structure. The underlying data:
The Dataiku DSS Dataset Abstraction Layer allows users to access, visualize and write the data in a unified way whatever the storage system.
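To make the idea of a unified access layer more concrete, here is a minimal sketch in plain Python. It does not use the Dataiku API and is not how DSS is implemented. The names `Storage`, `InMemoryStorage`, `CsvTextStorage`, `Dataset` and `copy_dataset` are all hypothetical, chosen only for illustration. The point is that the calling code reads and writes rows the same way whatever backend holds them.

```python
import csv
import io
from typing import Iterable, Iterator, Protocol


class Storage(Protocol):
    """Hypothetical backend interface: anything that can read and write rows."""

    def read_rows(self) -> Iterator[dict]: ...

    def write_rows(self, rows: Iterable[dict]) -> None: ...


class InMemoryStorage:
    """Keeps rows in a Python list."""

    def __init__(self, rows=None):
        self.rows = list(rows or [])

    def read_rows(self):
        return iter(self.rows)

    def write_rows(self, rows):
        self.rows = list(rows)


class CsvTextStorage:
    """Keeps rows as CSV text in a string (a stand-in for a file on disk)."""

    def __init__(self, text=""):
        self.text = text

    def read_rows(self):
        # Note: CSV has no types, so every value comes back as a string.
        return iter(csv.DictReader(io.StringIO(self.text)))

    def write_rows(self, rows):
        rows = list(rows)
        buf = io.StringIO()
        if rows:
            writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
        self.text = buf.getvalue()


class Dataset:
    """A named series of rows; the caller never touches the backend directly."""

    def __init__(self, name, storage):
        self.name = name
        self.storage = storage

    def rows(self):
        return list(self.storage.read_rows())

    def write(self, rows):
        self.storage.write_rows(rows)


def copy_dataset(src, dst):
    """Works for any pair of backends because both expose the same interface."""
    dst.write(src.rows())
```

For example, `copy_dataset(Dataset("a", InMemoryStorage(rows)), Dataset("b", CsvTextStorage()))` moves rows from memory to CSV text without the caller knowing anything about either format.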
Creating your first DSS dataset and learning how to cleanse it is the subject of the Basics Tutorial.
In DSS, data manipulation (cleansing, aggregation, joining, etc.) is performed within Recipes, which take datasets as inputs and produce datasets as outputs.
The lineage of a dataset (or a model) is thus defined by the inputs and outputs of its ancestor recipes. The overall view of the dependency structure of a project is accessible in the Flow tab:
Knowing these dependencies lets the DSS engine minimize the number of data processes it launches when (re)building a dataset.
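As a rough illustration of how dependency information can reduce the work of a rebuild, the sketch below models a flow as a mapping from recipes to their inputs and output, then computes which recipes need to run when some source datasets have changed. This is not the DSS engine's actual algorithm. The recipe and dataset names, and the functions `upstream_recipes` and `recipes_to_run`, are hypothetical.

```python
# Hypothetical flow: recipe name -> (input datasets, output dataset).
RECIPES = {
    "prepare_orders": (["orders_raw"], "orders_clean"),
    "prepare_customers": (["customers_raw"], "customers_clean"),
    "join_orders_customers": (
        ["orders_clean", "customers_clean"],
        "orders_enriched",
    ),
}


def upstream_recipes(target, recipes):
    """Return the recipes in the lineage of `target`, in dependency order."""
    producer = {output: name for name, (_, output) in recipes.items()}
    order, seen = [], set()

    def visit(dataset):
        name = producer.get(dataset)
        if name is None or name in seen:
            return  # a source dataset, or a recipe already scheduled
        seen.add(name)
        for upstream in recipes[name][0]:
            visit(upstream)
        order.append(name)  # appended only after all its ancestors

    visit(target)
    return order


def recipes_to_run(target, recipes, changed):
    """Keep only the recipes affected, directly or not, by a changed dataset."""
    stale = set(changed)
    plan = []
    for name in upstream_recipes(target, recipes):
        inputs, output = recipes[name]
        if any(dataset in stale for dataset in inputs):
            plan.append(name)
            stale.add(output)  # the output is now stale for downstream recipes
    return plan
```

With this toy flow, `recipes_to_run("orders_enriched", RECIPES, {"customers_raw"})` returns `["prepare_customers", "join_orders_customers"]`: the `prepare_orders` recipe is skipped because nothing upstream of it changed.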
Data Science in real life is full of dirty data. Data tacklers’ daily tasks include:
Within the Lab, Dataiku DSS provides a dedicated module called Visual Analysis for iterating quickly over these tasks. Visual Analysis lets you experiment with your data in a code-free environment before deploying to the Flow, so that you can build your data-driven applications efficiently.
We strongly encourage you to follow the Basics Tutorial to discover how fluid tackling data problems can be in Visual Analysis.
Some people prefer to do their analyses using code. Dataiku DSS ships with interactive development environments called Notebooks.
These can be used either
Users with web development skills can create advanced custom Web Apps using our dedicated editor and REST API. Templates and code samples are provided to help you get started. Head to the dedicated how-tos to learn more!
The secret to efficiently taking advantage of your data assets lies in the ability to (re)play the full pipeline of an analysis and to always have up-to-date predictive scoring.
You can monitor the associated tasks in the Jobs tab. Every time you build a dataset, Dataiku DSS creates a new Job with all the build dependency information defined in the Flow.
Scenarios help you automate these reconstruction tasks; for example, running daily updates to your models. Reports on scenarios that ran previously and their results are shown in Monitoring.
The DSS Dashboard is a communication tool to organize, share or deliver the Insights on your data (charts, datasets, static reports, etc.).
On the Dashboard, the team structures their findings and the final data consumers get their updated summary.