Dataiku DSS is an on-premises/on-cloud software product (not SaaS) that operates as part of your data stack’s existing infrastructure.
A data stack typically includes development, production, and deployment environments; to work across them, a separate Dataiku DSS instance is installed in each environment.
A Dataiku DSS instance is an installation of the product that serves the needs of a particular environment.
Pipelines in the Design and Automation nodes are organized into projects, which can be accessed from the main page after logging in to the Dataiku DSS instance.
A Dataiku DSS project is a container for all your work on a particular activity. The project home acts as the command center from which you can see the overall status of a project, view recent activity, and collaborate through comments, tags, and a project to-do list.
A Dataiku DSS dataset is a tabular view into your data that allows you to access, visualize, and write data in the same way, regardless of the underlying storage system. You can connect to a variety of storage systems (file system, SQL database, Hadoop, etc.) and file formats (CSV, JSON, Hadoop file formats, etc.).
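As an analogy only (plain Python, not the Dataiku API), the idea of one tabular view over heterogeneous formats can be sketched like this: a single `read_table` helper (a hypothetical name) returns the same rows whether the raw bytes happen to be CSV or JSON.

```python
import csv
import io
import json

def read_table(raw, fmt):
    """Return rows as a list of dicts, regardless of the source format.

    Loosely mirrors the DSS idea that a dataset presents the same tabular
    interface whatever the underlying storage format is.
    """
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(raw)))
    if fmt == "json":
        return json.loads(raw)  # expects a JSON array of objects
    raise ValueError(f"unsupported format: {fmt}")

csv_raw = "name,age\nada,36\ngrace,45\n"
json_raw = '[{"name": "ada", "age": "36"}, {"name": "grace", "age": "45"}]'

# Both sources yield the same tabular rows.
assert read_table(csv_raw, "csv") == read_table(json_raw, "json")
```

In DSS itself, the format- and storage-specific details are handled by the dataset's connection, so downstream recipes never need branching of this kind.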
Creating your first DSS dataset and learning how to cleanse it is the subject of the Basics Tutorial.
A Dataiku DSS recipe is a set of actions to perform on one or more input datasets, resulting in one or more output datasets. Each time you prepare, join, or group your datasets, you do so through a recipe. A recipe can be either visual or code-based.
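Conceptually, a code recipe is just a transformation from input datasets to output datasets. A minimal stand-alone sketch (plain Python, not the Dataiku API; the function and field names are made up for illustration):

```python
from collections import Counter

def group_recipe(orders):
    """A 'group' step expressed as a recipe: one input dataset in,
    one aggregated output dataset out."""
    counts = Counter(row["country"] for row in orders)
    return [{"country": c, "order_count": n} for c, n in sorted(counts.items())]

# Input dataset: one row per order.
orders = [
    {"order_id": 1, "country": "FR"},
    {"order_id": 2, "country": "US"},
    {"order_id": 3, "country": "FR"},
]

# Output dataset: one row per country.
print(group_recipe(orders))
# -> [{'country': 'FR', 'order_count': 2}, {'country': 'US', 'order_count': 1}]
```

A visual Group recipe expresses the same transformation through the UI instead of code; either way, the inputs and outputs become edges in the Flow.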
Dataiku allows “coders” and “clickers” to seamlessly collaborate on the same project through code and visual recipes.
The lineage of a dataset (or a model) is defined by the inputs and outputs of its ancestor recipes. The Flow is a visual representation of your work as a set of dependencies between datasets and the recipes used to produce them.
Knowledge of these dependencies helps the Dataiku DSS engine minimize the number of data processes it must launch when (re)building a dataset.
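To make the idea concrete, here is a hypothetical sketch of dependency-driven rebuilds (not DSS's actual engine): the Flow is modeled as a mapping from each output dataset to its recipe's inputs, and a walk over that lineage skips any subtree that is already up to date.

```python
# Toy Flow: each produced dataset maps to its recipe's input datasets.
# "raw" and "reference" are source datasets with no recipe.
recipes = {
    "cleaned": ["raw"],
    "joined": ["cleaned", "reference"],
    "scored": ["joined"],
}

def datasets_to_build(target, up_to_date):
    """Walk the lineage of `target` and return, in build order, the
    datasets that must be (re)built, skipping up-to-date subtrees."""
    needed = []
    def visit(ds):
        if ds in up_to_date or ds in needed:
            return
        for parent in recipes.get(ds, []):  # sources have no parents
            visit(parent)
        if ds in recipes:  # only recipe outputs get built
            needed.append(ds)
    visit(target)
    return needed

# If "cleaned" and "reference" are fresh, only two builds are launched.
print(datasets_to_build("scored", up_to_date={"cleaned", "reference"}))
# -> ['joined', 'scored']
```

The real engine also accounts for partitions, build modes, and recipe engines, but the principle is the same: dependencies let it launch only the jobs that are actually needed.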
The Visual Analysis lab allows you to experiment with your data in a code-free environment, where you can interactively prepare data, create charts, and build machine learning models.
We strongly encourage you to follow the Basics Tutorial to discover how fluidly you can tackle data problems within a DSS visual analysis.
The code lab allows you to experiment with your data in Jupyter notebooks (for Python / R) or SQL notebooks when working with SQL databases, Hive, or Impala. You can perform interactive analysis in these notebooks and then deploy them to the Flow as code recipes.
The code lab also lets you create advanced R Markdown reports that mix text, code, and rich visualizations using Python and R. These reports can be shared on dashboards or distributed in various printable formats.
Jobs are created when you build a dataset. Dataiku DSS provides a full job log to let you monitor what works and what does not, along with the ability to debug potential errors.
Scenarios help you automate rebuild tasks; for example, running daily updates to your models so that predictive scores are always up to date. Reports on previously run scenarios and their results appear in Monitoring.
The Dashboard is a communication tool to organize, share, and deliver the Insights of your data project. Insights can include any Dataiku DSS object, such as charts, datasets, web apps, and reports.