What's new

Highlights of the latest Dataiku DSS releases.
More details in our release notes

Deep Learning V 5.0 - September 2018

Define a Deep Learning architecture using the Keras library to build a custom model in Dataiku’s Visual Machine Learning. You can then train, deploy, and score the model like any other model created and managed in Dataiku DSS.

Deep learning offers extremely flexible modeling of the relationships between a target and its input features, and is used in a variety of challenging applications, such as image processing, text analysis, and time series.

Learn more about deep learning in general in our Deep Learning Guidebook, and about the Dataiku implementation specifically in our tutorials and in the reference documentation.

Enhanced Collaboration V 5.0 - September 2018

Dataiku DSS is all about making teams more efficient, so we want you to promote your work to others! In 5.0, you can:

Docker and Kubernetes Containerized Execution V 5.0 - September 2018

Share the load! Some processing tasks can be spun off of a DSS Design or Automation node into hosts powered by Docker or Kubernetes. This is fully compatible with cloud managed serverless Kubernetes stacks!

  • Python and R recipes
  • Plugin recipes
  • In-memory machine-learning
Please see Running in containers for more information.

And much more ... V 5.0 - September 2018

Dataiku DSS 5.0 also brings the following:

  • Resource management with Cgroups
  • Revamped homepage and global navigation within Dataiku DSS
  • Organize projects into folders
  • Automatically stop idle Jupyter notebooks
  • Train XGBoost models on GPU

Find all details in our release notes.

Deploy models in production on the cloud with Kubernetes V 4.3 - June 2018

Deploy models as APIs in production in a few clicks. Deploy highly scalable models, on-premise or on the cloud, using native Kubernetes integration.
Learn more in our blog post, our product page and in the reference documentation.

Deploy dynamic and elastic EMR clusters V 4.3 - June 2018

Launch an Amazon EMR cluster from the Dataiku interface in minutes. You don’t need to worry about node provisioning, cluster setup, Hadoop configuration, or cluster tuning. Dataiku together with Amazon EMR take care of these tasks so you can focus on analysis.
Request power only when you need it and stop depending on your IT team. You can now provision one, hundreds, or thousands of compute instances to process data at any scale, and they become immediately available to your data team.
Learn more in our blog post and in the reference documentation.

And much more ... V 4.3 - June 2018

Dataiku DSS 4.3 also brings the following:

  • Fast scoring and more options for XGBoost algorithm
  • Reorder columns visually in the prepare recipe
  • New fast sync options for AWS and Azure
  • New productivity options for the Flow
  • and many others!

Find all details in our release notes.

New Homepage for Automation Monitoring and Scheduling V 4.2 - March 2018

Schedule and monitor analytics pipelines and follow the evolution of models and datasets using metrics.

Sample weights for training and optimization V 4.2 - March 2018

Use weights so that rows you consider more important account for more in the training and optimization of models.
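
As a rough illustration of what row weighting means (a generic sketch, not Dataiku's implementation), a weighted mean squared error makes errors on high-weight rows count proportionally more. The function name and the sample values below are hypothetical.

```python
def weighted_mse(y_true, y_pred, weights):
    """Mean squared error where each row's squared error is scaled by its weight."""
    if not (len(y_true) == len(y_pred) == len(weights)):
        raise ValueError("all inputs must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_errors = sum(
        w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights)
    )
    return weighted_errors / total_weight


# Hypothetical data: only the third row is mispredicted.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]

uniform = weighted_mse(y_true, y_pred, [1, 1, 1])     # every row counts equally
emphasized = weighted_mse(y_true, y_pred, [1, 1, 4])  # third row counts four times
```

With the third row weighted more heavily, the same prediction error produces a larger loss, so an optimizer minimizing this loss would work harder to fit that row.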

And much more ... V 4.2 - March 2018

Dataiku DSS 4.2 is not a small release! Some other major new features include:

  • support for writing in BigQuery, including in visual recipes
  • generation of models via a public API
  • SQL impersonation on Oracle and SQL Server
  • download plugins from Git repositories
  • and many others!

Find all details in our release notes.

Live Model Competition V 4.1 - November 2017

Watch in real time as different machine learning models compete, and save time and resources by picking winners and losers before training is complete.

New Point-and-Click Data Prep V 4.1 - November 2017

Dataiku DSS 4.1 introduces new visual recipes that bring powerful analytical functionalities to non-coders, including pivoting, sorting, and splitting datasets.

Reproducible Environments V 4.1 - November 2017

Dataiku DSS 4.1 now supports virtual code environments. Take a snapshot of the packages used for each project so you don't have to worry about upgrades impacting existing or deployed projects.

Expanded End-to-End Capabilities V 4.1 - November 2017

Dataiku DSS 4.1 significantly strengthens the API node. In addition to scoring Dataiku-created, Python and R models, it can also run any function coded in these languages. It also allows for parameterized SQL queries and database lookups.

New Capabilities for Coders V 4.1 - November 2017

The latest release brings advanced visualization libraries like RShiny and Bokeh for rapidly creating engaging interactive web applications within dashboards. Additionally, RMarkdown reports let users easily share their results outside of Dataiku. Other features include support for Python 3, as well as a brand new code editor.

More Powerful Charts V 4.1 - November 2017

Dataiku DSS 4.1 introduces additional dimensions in its chart engine. You can now use another property to split your chart into multiple subcharts, or even create an animated version of the chart!

And much more ... V 4.1 - November 2017

Dataiku DSS 4.1 is one of our largest releases ever. Some other major new features include:

  • a “magical flow” with UX improvements in our visual representation of project workflows
  • an expanded toolkit for creating plug-ins within Dataiku DSS
  • a wide assortment of color palettes for data visualization
  • a new type of chart to display geometries
  • the creation of an ensemble model using multiple trained models
  • the capability to gridsearch MLLib models
  • indexation of external databases into a central catalog
  • plugin editing for all users
  • and many others!

Find all details in our release notes.

Interactive dashboards V 4.0 - February 2017

Dataiku DSS 4.0 features new, user-built, interactive dashboards, which allow users to create and customize dashboards with charts, metrics, text and images -- and also with models and scenarios, so that dashboard users can dive deeper into the analysis than ever before.
Dataiku users can also now create multiple dashboards per project.

Spark pipelines V 4.0 - February 2017

With Dataiku 4.0, run consecutive Spark recipes in a single Spark job and avoid writing intermediate datasets, thus dramatically improving run-time performance. In a regular data pipeline, we would have to load the full dataset at the beginning of each new calculation, but in Dataiku 4.0, we are able to run all the calculations in-memory, meaning we skip a whole lot of re-loading data.

Dataiku 4.0 also features support for Spark 2.

Interactive hierarchical clustering V 4.0 - February 2017

With Dataiku 4.0, you can say goodbye to the often long and labor-intensive process involved in cluster analysis. With a new "hierarchical clustering" feature, you can interactively define your clusters by digging deeper into some clusters than others.

And much more ... V 4.0 - February 2017

Dataiku 4.0 is by far the largest Dataiku DSS release ever. Among the other major new features:

  • Hadoop multi-user setup for secure collaboration
  • Quick machine learning models according to user-specified requirements
  • Controllable notifications, including integration with Slack, Hipchat, and Github
  • New sampling methods (exact random, stratified, class rebalancing, last records)
  • The ability to sort and analyze the entire data set while exploring data
  • and many others!

Find all details in our release notes.

Scala V 3.1 - July 2016

DSS goes Spark-native with the addition of Scala! Scala is the most native language for the Spark ecosystem. It is the only language in which Spark users can write very fast User-Defined functions to work on Dataframes.

DSS 3.1 includes a Spark-Scala recipe and an interactive notebook. The recipe features automatic code validation.

H2O (Sparkling Water) integration V 3.1 - July 2016

H2O is a distributed machine-learning library, with a wide range of algorithms and methods.
DSS now includes full support for H2O (in its "Sparkling Water" variant) in its visual machine learning interface.

You can now create H2O models with absolutely no code required:

  • Deep Learning models
  • Generalized Linear Models
  • Gradient boosting
  • Random forests
  • Naive Bayes models

DSS 3.1 also includes support for in-database machine learning on Vertica, using Vertica's Advanced Analytics package.

Navigator V 3.1 - July 2016

Boost your productivity! You can now very quickly navigate from one DSS object to another (from recipe to dataset to another recipe to model to analysis ...).

Hit Shift+A on any screen to enter the navigator.

New machine learning visualizations V 3.1 - July 2016

Get better insights into your models. DSS 3.1 now includes visualization of trees for decision tree, random forest and gradient boosting algorithms.
It also includes visualization of partial dependency plots for gradient boosting algorithms.

And much more ... V 3.1 - July 2016

DSS 3.1 is a tremendous release for DSS. Among the other major new features:

  • an improved DSS home page with the ability to define custom project status and workflow
  • prebuilt notebook templates with support for PCA, correlations analysis, time series analytics
  • support for Netezza, SAP HANA and Google BigQuery
  • more support for custom algorithms in machine learning
  • better preprocessing options in machine learning
  • the ability to read unlimited Excel files
  • and many others!

Find all details in our release notes.

Metrics & Checks V 3.0 - May 2016

You can now track various advanced metrics about datasets, recipes, models and managed folders (size of a dataset, average of a column in a dataset, model performance metrics, ...). You can also define custom metrics using Python or SQL.

Metrics are historized for deep insights into the evolution of your data flow and can be fully accessed through the DSS APIs.

Then, you can define automatic data checks based on these metrics, that act as automatic sanity tests of your data pipeline. For example, automatically fail a job if the average value of a column has drifted by more than 10% since the previous week.
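
The drift example above can be sketched in plain Python. This is only an illustration of the logic of such a check, not the DSS checks API: the function names, the "OK"/"ERROR" return values and the sample averages are all hypothetical.

```python
def relative_drift(previous, current):
    """Relative change of `current` with respect to `previous`."""
    if previous == 0:
        raise ValueError("previous value must be non-zero to compute relative drift")
    return abs(current - previous) / abs(previous)


def check_average_drift(previous_avg, current_avg, max_drift=0.10):
    """Return "ERROR" when the drift exceeds `max_drift`, otherwise "OK"."""
    drift = relative_drift(previous_avg, current_avg)
    return "ERROR" if drift > max_drift else "OK"


# Hypothetical metric values: last week's and this week's column averages.
status_small = check_average_drift(100.0, 105.0)  # 5% drift, within tolerance
status_large = check_average_drift(100.0, 115.0)  # 15% drift, should fail the job
```

A job runner would then stop the pipeline whenever the check returns "ERROR".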

Version control & activity V 3.0 - May 2016

Version control (based on git) is now integrated much more tightly in DSS:

  • View the history of your project, recipes, scenarios, ... from the UI
  • Write your own commit messages
  • Auto or explicit commit modes

Plus, we've added a lot of team activity dashboards to follow what's going on in your data projects.

And much more ... V 3.0 - May 2016

DSS 3.0 is one of our biggest releases ever. We did not even mention the new public APIs, administrator monitoring dashboards, the new security options, improvements to project export, ...

Find all details in our release notes.

Visual preparation, reloaded V2.3 - February 2016

The visual interactive data preparation in DSS has received a huge overhaul. It is now even easier and more productive to clean and enrich your datasets.

  • Color the cells of the data table: locate co-occurrences easily
  • Group, color and comment steps
  • New Quick Columns View for immediate overview of all your data
  • Much improved Formula and Python processors, for more advanced transformations
  • Users can now define their own meanings
  • And much more, see our release notes.

Flow Views V2.3 - February 2016

The Flow views system is an advanced productivity tool, especially designed for large projects. It provides additional "layers" onto the Flow:

  • Color by tag and propagate tags
  • Recursively check and propagate schema changes across a Flow
  • Check the consistency of datasets and recipes across a project

Data Catalog V2.3 - February 2016

Since the very first versions, DSS let you search within your project.
Thanks to the new Data Catalog, you now have an extremely powerful instance-wide search. Even if you have dozens of projects, you can easily find any DSS object, with deep search (search a dataset by column name, a recipe by comments in the code, …).

New SQL / Hive / Impala notebook V2.3 - February 2016

The SQL / Hive / Impala notebook now features a "multi-cells" mechanism that lets you work on several queries at once, without having to juggle several notebooks or search endlessly in the history.

You can also now add comments and descriptions, to transform your SQL notebooks into real data stories.

Prediction API server V2.2 - November 2015

DSS now features a real-time API server for predictions.

Using the REST API of the DSS API node, you can request predictions for new previously-unseen records in real time.
The DSS API node provides high availability and scalability for scoring of records.

Thanks to its advanced features, the DSS API node is at the heart of the feedback and improvement loop of your predictive models.

Window analytics recipe V2.2 - November 2015

Window functions are one of the most powerful features of SQL (and SQL-like: Hive, Impala, Spark, ...) databases. They are also among the least known and the trickiest to write in SQL.

The Window visual recipe in DSS makes it extremely easy to leverage these features. Thanks to this, you'll be able to do things like:

  • Filter rows by order of appearance within a group
  • Compute moving averages, cumulative sums, ...
  • Compute the number of events that occurred in the 7 days prior to another event

This recipe makes it easy to create your window functions without writing code, and also brings advanced productivity for advanced users (mass actions, multiple windows, pre and post filters, ...)
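
To make the window computations listed above concrete, here is a minimal plain-Python sketch of a cumulative sum, a moving average, and a count of events in the 7 days before each event. It only illustrates what these window functions compute; it is not the code DSS generates, and the function names and sample data are hypothetical.

```python
from collections import deque
from datetime import date, timedelta
from itertools import accumulate


def moving_average(values, window):
    """Average of the current value and up to `window - 1` preceding values."""
    recent = deque(maxlen=window)
    result = []
    for value in values:
        recent.append(value)
        result.append(sum(recent) / len(recent))
    return result


def events_in_prior_days(event_dates, days=7):
    """For each event (in date order), count earlier events in the previous `days` days."""
    ordered = sorted(event_dates)
    counts = []
    for i, current in enumerate(ordered):
        start = current - timedelta(days=days)
        counts.append(sum(1 for d in ordered[:i] if start <= d < current))
    return counts


# Hypothetical daily sales and event dates.
sales = [10, 20, 30, 40]
cumulative = list(accumulate(sales))        # running total
smoothed = moving_average(sales, window=2)  # 2-row moving average

dates = [date(2015, 11, 1), date(2015, 11, 3), date(2015, 11, 9), date(2015, 11, 20)]
prior_counts = events_in_prior_days(dates)
```

In SQL, each of these would be expressed with an OVER clause that defines the ordering and the window frame.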

And much more in DSS 2.2! V2.2 - November 2015

In addition to these major features, DSS 2.2 improves virtually every module of DSS:

  • The plugins system has been vastly enhanced
  • A new system for long-running tasks, improving reliability
  • Better support for dates without timezones
  • New APIs let you automate even more parts of DSS

Read our Release notes for more information.

Spark (SparkSQL, MLLib, PySpark and SparkR) V2.1 - September 2015

DSS now includes full integration with Apache Spark, the next-generation distributed analytic framework. Spark brings blazingly fast in-memory processing to all kinds of data.

The integration of Spark in DSS 2.1 is pervasive and extends to all of the following features, which are now Spark-enabled:

  • Visual data preparation
  • "VisualSQL" recipes (Grouping, Joining, Stacking)
  • Guided machine learning in analysis
  • Training and prediction in Flow
  • PySpark recipe
  • SparkR recipe
  • SparkSQL recipe
  • PySpark-enabled notebook
  • SparkR-enabled notebook

All DSS data sources can be handled using Spark. As always in DSS, you can mix technologies freely, both Spark-enabled and traditional.

Plugins V2.1 - September 2015

Plugins let you extend the features of DSS. You can add new kinds of datasets, recipes, visual preparation processors, custom formula functions, and more.

Plugins can be downloaded from the official Dataiku community site, or created by you and shared with your team.

Improved charts V2.1 - September 2015

The charts module has been completely redesigned with many new features and chart types added:

  • Scatterplots
  • Horizontal bar charts
  • Pie and donut charts
  • Boxplots
  • Pivot tables (text view)
  • Geo scatter map
  • Geo grid map
  • 2D distribution chart
  • A brand new user experience
  • New drill-down and exclude features

Managed Folder and editable dataset V2.1 - September 2015

DSS comes with a large number of supported formats, machine learning engines, ... But sometimes you want to do more. DSS code recipes (Python and R) can now read and write from "Managed Folders", handles on filesystem-hosted folders, where you can store any kind of data.

Editable datasets are a new kind of dataset in DSS, which you can directly create and modify in the DSS UI, à la Excel or Google Spreadsheets.

Shareable code snippets V2.1 - September 2015

In all modules of DSS where you can write code, you now have the ability to insert code snippets. DSS comes with lots of useful built-in snippets, and you can also write your own and share them with your team.

And much more in DSS 2.1! V2.1 - September 2015

DSS 2.1 is a truly amazing release, and we have only started to cover what's new:

  • A REST public API
  • New Jupyter notebook and R API
  • Impala recipe
  • Shell recipe

See our Release notes.

New User Experience V2.0 - May 2015

The user experience of DSS has been redesigned and new helper tools added.

  • Thanks to the organization in universes, you’ll always find what you need at your fingertips.
  • The new sidebar gives you immediate access to all actions in context.
  • A redesigned search that gives you immediate access to your recent items and contextually-relevant objects
  • The streamlined Flow lets you focus on what matters most and reduces visual clutter
  • Use checklists to organize your collaborative work in projects

Vastly improved machine learning V2.0 - May 2015

DSS 2.0 brings a completely redesigned machine learning interface, that will let you create both highly performing supervised and unsupervised models. Major enhancements include:

  • tight integration with data preparation: the modeling process being highly iterative, DSS lets you now automatically take into account new variables created in a data preparation Script in your Models
  • more algorithms: Decision Trees and Gradient Boosted Trees for classification. You can keep on using your own custom models as long as they have a scikit-learn compatible API.
  • advanced cross validation: use several strategies to train your models on one dataset and test them on another one
  • enhanced features preprocessing : Feature hashing for high cardinality categorical features. Binarization and quantile-cut for numerical features. Hashing, vectorization and TF/IDF for text features
  • vastly improved results screens: DSS now offers more helpers to interpret the results of your models, including coefficients of linear models. A new Decision Chart view has been added (an “expanded” view of the confusion matrix for different probability cut-offs), as well as a Density Chart to explore the probability distribution of your model.
  • easier deployment process: models can be easily used inside a workflow to score new records, and they can also be retrained on the fly, as your datasets are evolving. DSS will let you see the history of trained models, and select which version should be used in your workflow.
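
The idea behind a view of the confusion matrix at different probability cut-offs can be sketched as follows. This is a generic illustration, not how DSS computes its Decision Chart; the function name, labels and probabilities are hypothetical.

```python
def confusion_at_cutoff(y_true, probabilities, cutoff):
    """Count TP, FP, TN, FN when predicting positive for probability >= cutoff."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, proba in zip(y_true, probabilities):
        predicted = proba >= cutoff
        if predicted and actual:
            counts["tp"] += 1
        elif predicted and not actual:
            counts["fp"] += 1
        elif not predicted and not actual:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


# Hypothetical labels (1 = positive) and predicted probabilities.
y_true = [1, 1, 0, 0, 1]
probabilities = [0.9, 0.6, 0.4, 0.7, 0.2]

# One confusion matrix per cut-off.
table = {
    cutoff: confusion_at_cutoff(y_true, probabilities, cutoff)
    for cutoff in (0.3, 0.5, 0.8)
}
```

Raising the cut-off trades false positives for false negatives, which is the trade-off such a chart lets you inspect across cut-offs.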

New visual recipes V2.0 - May 2015

Because writing long SQL code can be tedious and sometimes error prone, and also because not everybody wants to even write code, DSS comes with 5 new visual “recipes” that will help you prepare your data faster.

  • Group: quickly generate summary statistics from the columns of your dataset according to one or several grouping keys (a.k.a. GROUP BY operations in SQL). No more need to write long and error-prone queries: define your grouping keys and your aggregations (including custom expressions), and DSS will generate the underlying code for you, pushing it down to your database or Hadoop cluster where possible.
  • Join: an exciting new feature of DSS that helps you merge two or more datasets together. DSS will take you through the steps of visually defining your joins: input data, joining keys, type of join (inner, left…), list of columns to fetch, and even complex conditions such as fuzzy lookups
  • Split: create several datasets from an initial dataset, splitting it depending on the value of a column. Need to create one dataset per year? Need to split a large log file storing different types of events? Think about this recipe!
  • Stack: the opposite operation to Split. It will let you vertically concatenate multiple datasets. Don’t worry if the columns are not in the same order in the datasets, or even if you have different columns: DSS will make both schemas consistent.
  • Sample/Filter: select records matching user-defined rules, from the simplest filters to the most complex patterns (including regular expressions). It also provides different ways to sample the records from the dataset, hence making the process of data exploration quicker and easier.
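
As an illustration of what a grouping operation computes, here is a minimal plain-Python sketch of a group-by with count, sum and average. It is not the code DSS generates; the function name, column names and sample rows are hypothetical.

```python
from collections import defaultdict


def group_and_aggregate(rows, key, column):
    """Group rows by `key` and compute count, sum and average of `column`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[column])
    return {
        group: {
            "count": len(values),
            "sum": sum(values),
            "avg": sum(values) / len(values),
        }
        for group, values in groups.items()
    }


# Hypothetical orders dataset.
orders = [
    {"country": "FR", "amount": 10.0},
    {"country": "FR", "amount": 30.0},
    {"country": "US", "amount": 50.0},
]
summary = group_and_aggregate(orders, key="country", column="amount")
```

The SQL equivalent would be a query selecting COUNT(*), SUM(amount) and AVG(amount) with GROUP BY country.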

Column-oriented view in preparation V2.0 - May 2015

Column-oriented view to understand and apply mass actions to your datasets more intuitively.

Diagnostics V2.0 - May 2015

DSS 2.0 lets you easily access the server logs, providing a way to quickly debug your workflows and identify potential issues. This is useful for your own troubleshooting and also when you are working with Dataiku’s support team.

To find this tool, go to the Administration menu, and click on the Logs tab. You will easily be able to browse and download the last 1000 records of the backend log file for instance, or use the complete Diagnostic tool to include checks on the configuration of the server hosting DSS.

And much more in DSS 2.0! V2.0 - May 2015

See our Release notes.

Geographic data visualization (BETA) V1.4 - January 2015

Automatically display your geographic data on beautiful maps.

Also includes many new geo-related features.

New visual data transformations V1.4 - January 2015

Do more without code with DSS

  • Group and aggregate rows (in-memory, in-database or on-Hadoop)
  • Split datasets
  • Filter rows

Enhanced enterprise security V1.4 - January 2015

Improved integration with enterprise security architectures

  • Interact with Kerberos-secured Hadoop clusters
  • Connect DSS to your enterprise LDAP directory

Advanced clusters profiling V1.3 - October 2014

New visualizations and insights let you explore the results of your clustering models.

  • View and compare data distributions
  • DSS automatically generates the most prominent facts for each dataset
  • Give labels and descriptions to your clusters

New R support V1.3 - October 2014

Do you use the R language for your data analytics? DSS now has advanced R integration:

  • Read and write datasets on any storage directly from R
  • R notebook for interactive work

Hadoop File Formats V1.3 - October 2014

DSS now supports many new Hadoop file formats, including support for Pig, Hive, Impala and complex types:

  • Avro
  • Parquet
  • Sequence files
  • RC and ORC files

More flexibility for complex data flows V1.3 - October 2014

  • Preview your jobs and understand how your datasets interact
  • New features for advanced partitioning

And much more in DSS!

Read our release notes and our blog posts for more details.