The Dataiku DSS Visual Grammar

Applies to DSS 2.3 and above | February 05, 2016

Dataiku DSS covers a wide variety of data transformation and manipulation tasks.

It offers a map of the data science world made of simple, recognizable visual elements.

Let’s look at the everyday work of data workers:

  • They load their data into various databases
  • They create processing operations that chain one after another
  • They create visualizations and share them with business experts for validation
  • They create models and put them into production
  • They monitor their predictive applications

The entire visual grammar of DSS is designed to reflect these business operations through structural graphics.

For example, let’s look at the following application that predicts a churn score for telecom users. It is represented by a pipeline of datasets:

A complete Dataiku Flow for predicting churn scores for telecom users

  • In yellow, we have the visual transformation steps. The icons illustrate the nature of this transformation (join, group, preparation, etc.). In this case, they are all the same: preparation.

  • In blue, we have the datasets. The icons illustrate the nature of the storage in which the associated data lives (Hadoop, SQL database, filesystem, remote FTP, etc.).

  • In green, we have the machine learning elements. The icons represent the step (model training, prediction or scoring).

Note that transformations are drawn as circles, while persistable elements (datasets, models) are drawn as squares (or diamonds, like Diamond Shreddies?).

Here’s another pipeline for a log analysis application:

A complete Dataiku flow for analyzing logs

This time the dominant color is orange: the color of code. The icons of these transformations tell us that the languages used are Pig (the little pig’s head) and Hive (the beehive). The datasets live on Hadoop (the elephant).

Now let’s select one of the datasets in the pipeline. A bar populated with icons appears on the right.

The sidebar showing actions that can be taken on a selected dataset

Here, there are several colors:

  • In yellow are the data recipes known as visual recipes. They perform the most common transformations through a graphical user interface, without any coding.

  • In orange are the different languages that can be used in the studio. Python and R are dark orange, which means they are active. The selected dataset’s data lives on the file system, so we cannot use SQL, Hive or Pig unless we first copy the data into an appropriate database. Had the data been in Hadoop, all languages would have been active.

  • In red are the items that can be used for communication, such as charts that can be put on a dashboard.

  • At the top, the gray icons present the most standard actions related to this dataset: explore, download, (re)build, etc.

  • The lab icon gives access to visual or code-based exploration and analysis.

The bar on the right is available throughout the studio and guides users by listing the actions available for the item currently displayed. It makes everyday work easier by filtering out inappropriate choices, and it guides new users while they build their first predictive application.
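To make the filtering rule concrete, here is a minimal Python sketch of how the code languages in the sidebar could be enabled depending on where a dataset’s data lives. It encodes only the two cases described above (file system and Hadoop); the names `ACTIVE_LANGUAGES` and `sidebar_code_languages` are hypothetical and do not reflect how DSS is actually implemented.

```python
# Hypothetical illustration of the sidebar rule described above;
# this is NOT Dataiku DSS code or API.
ALL_CODE_LANGUAGES = ("Python", "R", "SQL", "Hive", "Pig")

# Languages reported as active for each storage type.
# Only the two cases given in the text are modeled here.
ACTIVE_LANGUAGES = {
    "filesystem": {"Python", "R"},
    "hadoop": set(ALL_CODE_LANGUAGES),
}


def sidebar_code_languages(storage: str) -> dict:
    """Map each code language to True (active) or False (grayed out).

    Raises KeyError for storage types not modeled above.
    """
    active = ACTIVE_LANGUAGES[storage]
    return {lang: lang in active for lang in ALL_CODE_LANGUAGES}


print(sidebar_code_languages("filesystem"))
```

Calling it with an unmodeled storage type such as `"sql"` raises a `KeyError`, because the article does not say which languages are active in that case.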

In addition to the data pipeline, the objects and tools of Dataiku DSS are organized into worlds that follow the same color conventions. Each world is represented by an icon in the top bar, next to the project name:

The Dataiku toolbar

Here, we find:

  • Flow: the world of the data pipeline with transformation recipes.
  • Datasets: the world of datasets.
  • Analysis: the world of visual analyses, in which predictive models are built.
  • Notebooks: the world of code notebooks (Python, R, SQL).
  • Job Monitoring: the world of application monitoring (a new color!).
  • Dashboard: the world where all items shared between team members are gathered on a dashboard.

And there you go. It’s as colorful as a subway map, yet everyone can find their way without having the list of stations in front of them!

Happy travels with Dataiku DSS! It is available as a free download.