What's new

Highlights of the latest Dataiku DSS releases.
More details in our release notes

Scala V 3.1 - July 2016

DSS goes Spark-native with the addition of Scala! Scala is the most native language for the Spark ecosystem. It is the only language in which Spark users can write very fast User-Defined functions to work on Dataframes.

DSS 3.1 includes a Spark-Scala recipe and an interactive notebook. The recipe features automatic code validation.

H2O (Sparkling Water) integration V 3.1 - July 2016

H2O is a distributed machine-learning library, with a wide range of algorithms and methods.
DSS now includes full support for H2O (in its "Sparkling Water" variant) in its visual machine learning interface.

You can now create H2O models with absolutely no code required:

  • Deep Learning models
  • Generalized Linear Models
  • Gradient boosting
  • Random forests
  • Naive Bayes models

DSS 3.1 also includes support for in-database machine learning on Vertica, using Vertica's Advanced Analytics package.

Navigator V 3.1 - July 2016

Boost your productivity! You can now very quickly navigate from one DSS object to another (from recipe to dataset to another recipe to model to analysis ...).

Hit Shift+A on any screen to enter the navigator.

New machine learning visualizations V 3.1 - July 2016

Get better insights into your models. DSS 3.1 now includes visualization of trees for decision tree, random forest and gradient boosting algorithms.
It also includes visualization of partial dependence plots for gradient boosting algorithms.
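
As a reminder of what such a plot shows: a partial dependence curve is the average prediction of a model when one feature is forced to a given value for every record. The sketch below illustrates that definition in plain Python. It is not how DSS computes its plots, and `toy_model`, the column names and the sample rows are hypothetical.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each value of `grid`."""
    curve = []
    for value in grid:
        # Overwrite the feature in every record, keep the other columns as-is.
        predictions = [predict({**row, feature: value}) for row in rows]
        curve.append((value, sum(predictions) / len(predictions)))
    return curve


def toy_model(row):
    # Hypothetical stand-in for a trained model.
    return 2.0 * row["age"] + row["income"]


rows = [{"age": 30, "income": 10.0}, {"age": 50, "income": 30.0}]
curve = partial_dependence(toy_model, rows, "age", [20, 40])
print(curve)
```

Each point of the curve is one (feature value, average prediction) pair, which is what gets drawn on the plot.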

And much more ... V 3.1 - July 2016

DSS 3.1 is a tremendous release for DSS. Among the other major new features:

  • an improved DSS home page with the ability to define custom project status and workflow
  • prebuilt notebook templates with support for PCA, correlations analysis, time series analytics
  • support for Netezza, SAP HANA and Google BigQuery
  • more support for custom algorithms in machine learning
  • better preprocessing options in machine learning
  • the ability to read unlimited Excel files
  • and many others!

Find all details in our release notes

Metrics & Checks V 3.0 - May 2016

You can now track various advanced metrics about datasets, recipes, models and managed folders (size of a dataset, average of a column in a dataset, model performance metrics, ...). You can also define custom metrics using Python or SQL.

Metrics are historized for deep insights into the evolution of your data flow and can be fully accessed through the DSS APIs.

Then, you can define automatic data checks based on these metrics, that act as automatic sanity tests of your data pipeline. For example, automatically fail a job if the average value of a column has drifted by more than 10% since the previous week.
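
To make the idea concrete, here is a minimal sketch in plain Python of a metric (the average of a column) and a check on its drift. It does not use the DSS API: `column_average`, `check_drift`, the "OK"/"ERROR" statuses and the sample rows are all assumptions made for illustration.

```python
from statistics import mean


def column_average(rows, column):
    """Metric: average of a numeric column, skipping missing values."""
    values = [row[column] for row in rows if row.get(column) is not None]
    return mean(values)


def check_drift(previous, current, max_relative_drift=0.10):
    """Check: "ERROR" if the metric moved by more than the allowed fraction."""
    if previous == 0:
        # Relative drift is undefined against a zero baseline.
        return "ERROR" if current != 0 else "OK"
    drift = abs(current - previous) / abs(previous)
    return "ERROR" if drift > max_relative_drift else "OK"


# Hypothetical snapshots of the same dataset, one week apart.
last_week = [{"amount": 100.0}, {"amount": 120.0}, {"amount": None}]
this_week = [{"amount": 130.0}, {"amount": 125.0}]

previous_avg = column_average(last_week, "amount")
current_avg = column_average(this_week, "amount")
status = check_drift(previous_avg, current_avg)
print(previous_avg, current_avg, status)
```

Here the average moves from 110.0 to 127.5, a drift of about 16%, so the check reports an error and a job relying on it would fail.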

Version control & activity V 3.0 - May 2016

Version control (based on git) is now integrated much more tightly in DSS:

  • View the history of your project, recipes, scenarios, ... from the UI
  • Write your own commit messages
  • Auto or explicit commit modes

Plus, we've added a lot of team activity dashboards to follow what's going on in your data projects.

And much more ... V 3.0 - May 2016

DSS 3.0 is one of our biggest releases ever. We did not even mention the new public APIs, administrator monitoring dashboards, the new security options, improvements to project export, ...

Find all details in our release notes

Visual preparation, reloaded V2.3 - February 2016

The visual interactive data preparation in DSS has received a huge overhaul. It is now even easier and more productive to clean and enrich your datasets.

  • Color the cells of the data table: locate co-occurrences easily
  • Group, color and comment steps
  • New Quick Columns View for immediate overview of all your data
  • Much improved Formula and Python processors, for more advanced transformations
  • Users can now define their own meanings
  • And much more, see our release notes

Flow Views V2.3 - February 2016

The Flow views system is an advanced productivity tool, especially designed for large projects. It provides additional "layers" onto the Flow:

  • Color by tag and propagate tags
  • Recursively check and propagate schema changes across a Flow
  • Check the consistency of datasets and recipes across a project

Data Catalog V2.3 - February 2016

Since the very first versions, DSS has let you search within your project.
Thanks to the new Data Catalog, you now have an extremely powerful instance-wide search. Even if you have dozens of projects, you'll be able to easily find all DSS objects, with deep search (search a dataset by column name, a recipe by comments in the code, …).

New SQL / Hive / Impala notebook V2.3 - February 2016

The SQL / Hive / Impala notebook now features a "multi-cells" mechanism that lets you work on several queries at once, without having to juggle several notebooks or search endlessly in the history.

You can also now add comments and descriptions, to transform your SQL notebooks into real data stories.

Prediction API server V2.2 - November 2015

DSS now features a real-time API server for predictions.

Using the REST API of the DSS API node, you can request predictions for new, previously unseen records in real time.
The DSS API node provides high availability and scalability for scoring of records.

Thanks to its advanced features, the DSS API node is at the heart of the feedback and improvement loop of your predictive models.
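
To give a feel for what requesting a prediction over REST can look like, the sketch below builds (without sending) a JSON POST request using only the Python standard library. Everything specific in it is an assumption made for illustration: the host, the URL layout, the `fraud` and `scoring` identifiers and the shape of the JSON body are hypothetical. Refer to the API node documentation for the actual contract.

```python
import json
from urllib.request import Request


def build_prediction_request(base_url, service_id, endpoint_id, features):
    """Build (but do not send) a POST request asking for one prediction."""
    # Hypothetical URL layout, not taken from the DSS documentation.
    url = f"{base_url}/{service_id}/{endpoint_id}/predict"
    # Hypothetical body: the features of the record to score.
    body = json.dumps({"features": features}).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


request = build_prediction_request(
    "http://apinode.example.com:12000",  # placeholder host
    "fraud", "scoring",                  # hypothetical service and endpoint ids
    {"amount": 42.5, "country": "FR"},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the resulting object to `urllib.request.urlopen` would send it. The client only needs the record's features, while the model stays on the server.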

Window analytics recipe V2.2 - November 2015

Window functions are one of the most powerful features of SQL (and SQL-like: Hive, Impala, Spark, ...) databases. They are also among the least known and the trickiest to write by hand.

The Window visual recipe in DSS makes it extremely easy to leverage these features. Thanks to this, you'll be able to do things like:

  • Filter rows by order of appearance within a group
  • Compute moving averages, cumulative sums, ...
  • Compute the number of events that occurred in the 7 days prior to another event

This recipe makes it easy to create your window functions without writing code, and also brings advanced productivity for advanced users (mass actions, multiple windows, pre and post filters, ...)
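
If you have never used window functions, the plain-Python sketch below shows what the three kinds of computations listed above produce for each row: a rank within a group, a cumulative sum, and a count of earlier events in a 7-day look-back window. It only illustrates the semantics and is not the code that the recipe generates. The `events` rows and the column names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

events = [
    # (customer, day, amount): hypothetical sample rows
    ("alice", date(2015, 11, 1), 10.0),
    ("alice", date(2015, 11, 3), 20.0),
    ("alice", date(2015, 11, 12), 5.0),
    ("bob", date(2015, 11, 2), 7.0),
    ("bob", date(2015, 11, 4), 3.0),
]


def window_columns(rows, lookback_days=7):
    """Add rank, cumulative sum and look-back count within each customer."""
    by_customer = defaultdict(list)  # PARTITION BY customer
    for customer, day, amount in rows:
        by_customer[customer].append((day, amount))

    result = []
    for customer, items in by_customer.items():
        items.sort()  # ORDER BY day within the partition
        running_total = 0.0
        for rank, (day, amount) in enumerate(items, start=1):
            running_total += amount
            window_start = day - timedelta(days=lookback_days)
            # Events strictly before this one, at most `lookback_days` earlier.
            recent = sum(1 for other_day, _ in items
                         if window_start <= other_day < day)
            result.append({
                "customer": customer, "day": day, "rank": rank,
                "cumulative_amount": running_total, "events_prior": recent,
            })
    return result


windowed = window_columns(events)
for row in windowed:
    print(row)
```

Keeping only the rows where `rank` equals 1 gives the first event of each customer, which is the "filter rows by order of appearance within a group" case.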

And much more in DSS 2.2 ! V2.2 - November 2015

In addition to these major features, DSS 2.2 improves virtually every module of DSS:

  • The plugins system has been vastly enhanced
  • A new system for long-running tasks, improving reliability
  • Better support for dates without timezones
  • New APIs let you automate even more parts of DSS

Read our Release notes for more information

Spark (SparkSQL, MLLib, PySpark and SparkR) V2.1 - September 2015

DSS now includes full integration with Apache Spark, the next-generation distributed analytic framework. Spark brings blazingly fast in-memory processing to all kinds of data.

The integration of Spark in DSS 2.1 is pervasive and extends to all of the following features, which are now Spark-enabled:

  • Visual data preparation
  • "VisualSQL" recipes (Grouping, Joining, Stacking)
  • Guided machine learning in analysis
  • Training and prediction in Flow
  • PySpark recipe
  • SparkR recipe
  • SparkSQL recipe
  • PySpark-enabled notebook
  • SparkR-enabled notebook

All DSS data sources can be handled using Spark. As always in DSS, you can mix technologies freely, both Spark-enabled and traditional.

Plugins V2.1 - September 2015

Plugins let you extend the features of DSS. You can add new kinds of datasets, recipes, visual preparation processors, custom formula functions, and more.

Plugins can be downloaded from the official Dataiku community site, or created by you and shared with your team.

Improved charts V2.1 - September 2015

The charts module has been completely redesigned with many new features and chart types added:

  • Scatterplots
  • Horizontal bar charts
  • Pie and donut charts
  • Boxplots
  • Pivot tables (text view)
  • Geo scatter map
  • Geo grid map
  • 2D distribution chart
  • A brand new user experience
  • New drill-down and exclude features

Managed Folder and editable dataset V2.1 - September 2015

DSS comes with a large number of supported formats, machine learning engines, ... But sometimes you want to do more. DSS code recipes (Python and R) can now read and write from "Managed Folders", handles on filesystem-hosted folders, where you can store any kind of data.

Editable datasets are a new kind of dataset in DSS, which you can directly create and modify in the DSS UI, à la Excel or Google Spreadsheets.

Shareable code snippets V2.1 - September 2015

In all modules of DSS where you can write code, you now have the ability to insert code snippets. DSS comes with lots of useful built-in snippets, and you can also write your own and share them with your team.

And much more in DSS 2.1 ! V2.1 - September 2015

DSS 2.1 is a truly amazing release, and we have only started to cover what's new:

  • A REST public API
  • New Jupyter notebook and R API
  • Impala recipe
  • Shell recipe

See our Release notes

New User Experience V2.0 - May 2015

The user experience of DSS has been redesigned and new helper tools added.

  • Thanks to the organization in universes, you’ll always find what you need at your fingertips.
  • The new sidebar gives you immediate access to all actions in context.
  • A redesigned search that gives you immediate access to your recent items and contextually-relevant objects
  • The streamlined Flow lets you focus on what matters most and reduces visual clutter
  • Use checklists to organize your collaborative work in projects

Vastly improved machine learning V2.0 - May 2015

DSS 2.0 brings a completely redesigned machine learning interface, that will let you create both highly performing supervised and unsupervised models. Major enhancements include:

  • tight integration with data preparation: the modeling process being highly iterative, DSS now lets you automatically take into account new variables created in a data preparation Script in your Models
  • more algorithms: Decision Trees and Gradient Boosted Trees for classification. You can keep on using your own custom models as long as they have a scikit-learn compatible API.
  • advanced cross validation: use several strategies to train your models on one dataset and test them on another one
  • enhanced features preprocessing: Feature hashing for high cardinality categorical features. Binarization and quantile-cut for numerical features. Hashing, vectorization and TF/IDF for text features
  • vastly improved results screens: DSS now offers more helpers to interpret the results of your models, including coefficients of linear models. A new Decision Chart view has been added (an “expanded” view of the confusion matrix for different probability cut-offs), as well as a Density Chart to explore the probability distribution of your model
  • easier deployment process: models can be easily used inside a workflow to score new records, and they can also be retrained on the fly, as your datasets are evolving. DSS will let you see the history of trained models, and select which version should be used in your workflow.
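
To illustrate what "the confusion matrix for different probability cut-offs" means, here is a minimal sketch in plain Python. It is not the DSS implementation: the probabilities, the labels and the function name are hypothetical, and a record is predicted positive when its probability is at or above the cut-off.

```python
def confusion_at_cutoff(probabilities, labels, cutoff):
    """Count TP, FP, TN, FN when predicting positive for probability >= cutoff."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for probability, label in zip(probabilities, labels):
        predicted = probability >= cutoff
        if predicted and label:
            counts["tp"] += 1
        elif predicted and not label:
            counts["fp"] += 1
        elif not predicted and label:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# Hypothetical predicted probabilities and true classes for six records.
probabilities = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [True, True, False, True, False, False]

for cutoff in (0.25, 0.50, 0.75):
    print(cutoff, confusion_at_cutoff(probabilities, labels, cutoff))
```

Raising the cut-off trades false positives for false negatives, and seeing that trade-off across cut-offs is the point of such a view.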

New visual recipes V2.0 - May 2015

Because writing long SQL code can be tedious and sometimes error prone, and also because not everybody wants to even write code, DSS comes with 5 new visual “recipes” that will help you prepare your data faster.

  • Group: quickly generate summarized statistics from the columns of your dataset according to one or several grouping keys (a.k.a. GROUP BY operations in SQL). No more need to write long and error-prone queries: define your grouping keys, your aggregations (including custom expressions), and DSS will generate the underlying code for you, pushing it down to your database or Hadoop cluster when possible.
  • Join: an exciting new feature of DSS that helps you merge two or more datasets together. DSS will take you through the steps of visually defining your joins: input data, joining keys, type of join (inner, left…), list of columns to fetch, and even complex conditions such as fuzzy lookups
  • Split: create several datasets from an initial dataset, splitting it depending on the value of a column. Need to create one dataset per year? Need to split a large log file storing different types of events? Think about this recipe!
  • Stack: the opposite operation to Split. It will let you vertically concatenate multiple datasets. Don’t worry if the columns are not in the same order in the datasets, or even if you have different columns: DSS will make both schemas consistent.
  • Sample/Filter: select records matching user-defined rules, from the simplest filters to the most complex patterns (including regular expressions). It also provides different ways to sample the records from the dataset, hence making the process of data exploration quicker and easier.
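
As an illustration of the Stack idea, the sketch below concatenates two small datasets whose columns differ in order and content, aligning them on the union of their columns. It is a plain-Python approximation, not what DSS runs: the `stack` function and the sample rows are hypothetical, and filling missing columns with `None` is an assumption about how the schemas could be made consistent.

```python
def stack(*datasets):
    """Concatenate datasets (lists of dicts), aligned on the union of columns."""
    columns = []
    for dataset in datasets:
        for row in dataset:
            for name in row:
                if name not in columns:
                    columns.append(name)  # keep first-seen column order
    stacked = [
        {name: row.get(name) for name in columns}  # missing columns become None
        for dataset in datasets
        for row in dataset
    ]
    return columns, stacked


# Hypothetical inputs: different column order, one extra column.
sales_2014 = [{"id": 1, "amount": 10.0}]
sales_2015 = [{"amount": 12.5, "id": 2, "channel": "web"}]

columns, rows = stack(sales_2014, sales_2015)
print(columns)
print(rows)
```

Every output row ends up with the same three columns, whatever the shape of the dataset it came from.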

Column-oriented view in preparation V2.0 - May 2015

Column-oriented view to understand and apply mass actions to your datasets more intuitively.

Diagnostics V2.0 - May 2015

DSS 2.0 lets you easily access the server logs, providing a way to quickly debug your workflows and identify potential issues. This is useful for your own troubleshooting, and also when you are working with Dataiku’s support team.

To find this tool, go to the Administration menu, and click on the Logs tab. You will easily be able to browse and download the last 1000 records of the backend log file for instance, or use the complete Diagnostic tool to include checks on the configuration of the server hosting DSS.

And much more in DSS 2.0 ! V2.0 - May 2015

See our Release notes

Geographic data visualization (BETA) V1.4 - January 2015

Automatically display your geographic data on beautiful maps.

Also includes many new geo-related features

New visual data transformations V1.4 - January 2015

Do more without code with DSS

  • Group and aggregate rows (in-memory, in-database or on-Hadoop)
  • Split datasets
  • Filter rows

Enhanced enterprise security V1.4 - January 2015

Improved integration with enterprise security architectures

  • Interact with Kerberos-secured Hadoop clusters
  • Connect DSS to your enterprise LDAP directory

Advanced clusters profiling V1.3 - October 2014

New visualizations and insights let you explore the results of your clustering models.

  • View and compare data distributions
  • DSS automatically generates the most prominent facts for each dataset
  • Give labels and descriptions to your clusters

New R support V1.3 - October 2014

Do you use the R language for your data analytics? DSS now has advanced R integration

  • Read and write datasets on any storage directly from R
  • R notebook for interactive work

Hadoop File Formats V1.3 - October 2014

DSS now has support for many new Hadoop file formats, including Pig, Hive, Impala and complex types support

  • Avro
  • Parquet
  • Sequence files
  • RC and ORC files

More flexibility for complex data flows V1.3 - October 2014

  • Preview your jobs and understand how your datasets interact
  • New features for advanced partitioning

And much more in DSS !

Read our release notes and our blog posts for more details.