Highlights of the latest Dataiku DSS releases.
More details in our release notes
DSS goes Spark-native with the addition of Scala! Scala is the most native language for the Spark ecosystem. It is the only language in which Spark users can write very fast user-defined functions to work on DataFrames.
DSS 3.1 includes a Spark-Scala recipe and an interactive notebook. The recipe features automatic code validation.
H2O is a distributed machine-learning library, with a wide range of algorithms and methods.
DSS now includes full support for H2O (in its "Sparkling Water" variant) in its visual machine learning interface.
You can now create H2O models with absolutely no code required:
DSS 3.1 also includes support for in-database machine learning on Vertica, using Vertica's Advanced Analytics package.
Boost your productivity! You can now very quickly navigate from one DSS object to another (from recipe to dataset to another recipe to model to analysis ...).
Hit Shift+A on any screen to enter the navigator.
Get better insights into your models. DSS 3.1 now includes visualization of trees for decision tree, random forest and gradient boosting algorithms.
It also includes visualization of partial dependence plots for gradient boosting algorithms.
DSS 3.1 is a tremendous release for DSS. Among the other major new features:
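To make the idea of a partial dependence plot concrete, here is a minimal Python sketch. It is not DSS code: `predict` is a hypothetical toy function standing in for a trained gradient boosting model, and `partial_dependence` is an illustrative helper that computes the points such a plot would display for one feature.

```python
from statistics import mean


def predict(row):
    """Hypothetical stand-in for a trained model (any row -> score function works)."""
    return 2.0 * row["age"] + 0.5 * row["income"]


def partial_dependence(predict_fn, rows, feature, grid):
    """For each grid value, force `feature` to that value in every row
    and average the model's predictions over the whole dataset."""
    curve = []
    for value in grid:
        preds = [predict_fn({**row, feature: value}) for row in rows]
        curve.append((value, mean(preds)))
    return curve


# Tiny illustrative dataset (made-up values).
rows = [
    {"age": 20, "income": 10.0},
    {"age": 40, "income": 30.0},
]

# Each pair is (feature value, average prediction): the x/y points of the plot.
pd_age = partial_dependence(predict, rows, "age", [10, 20, 30])
```

Plotting the resulting pairs shows how the average prediction moves as one feature changes while the others keep their observed values.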
Find all details in our release notes
You can now track various advanced metrics about datasets, recipes, models and managed folders (size of a dataset, average of a column in a dataset, model performance metrics, ...). You can also define custom metrics using Python or SQL.
Metrics are historized for deep insights into the evolution of your data flow and can be fully accessed through the DSS APIs.
You can then define automatic data checks based on these metrics, which act as automatic sanity tests of your data pipeline. For example, automatically fail a job if the average value of a column has drifted by more than 10% since the previous week.
Version control (based on git) is now integrated much more tightly in DSS.
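The metric-plus-check idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `column_average` and `drift_check` are hypothetical helper names, the `"OK"` / `"ERROR"` outcomes are illustrative labels, and none of this is the actual DSS metrics or checks API.

```python
from statistics import mean


def column_average(rows, column):
    """Metric: average of a numeric column, skipping missing values."""
    values = [row[column] for row in rows if row.get(column) is not None]
    return mean(values)


def drift_check(current, previous, max_relative_drift=0.10):
    """Check: 'ERROR' if the metric moved by more than the allowed
    relative drift since the previous value, 'OK' otherwise."""
    if previous == 0:
        # Relative drift is undefined; treat any change as a failure.
        return "ERROR" if current != 0 else "OK"
    drift = abs(current - previous) / abs(previous)
    return "ERROR" if drift > max_relative_drift else "OK"


# Made-up values: last week's stored metric and this week's data.
last_week_avg = 100.0
rows = [{"amount": 120.0}, {"amount": 110.0}, {"amount": None}]

this_week_avg = column_average(rows, "amount")
status = drift_check(this_week_avg, last_week_avg)  # a failing status would stop the job
```

Keeping the metric and the check as separate steps mirrors the description above: the metric value is stored over time, and the check only compares the stored values.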
Plus, we've added a lot of team activity dashboards to follow what's going on in your data projects.
DSS 3.0 is one of our biggest releases ever. We did not even mention the new public APIs, administrator monitoring dashboards, the new security options, improvements to project export, ...
Find all details in our release notes
Visual interactive data preparation in DSS has received a huge overhaul. It is now even easier and more productive to clean and enrich your datasets.
The Flow views system is an advanced productivity tool, especially designed for large projects. It provides additional "layers" on top of the Flow.
Since the very first versions, DSS has let you search within your project.
Thanks to the new Data Catalog, you now have an extremely powerful instance-wide search. Even if you have dozens of projects, you'll be able to easily find all DSS objects, with deep search (search a dataset by column name, a recipe by comments in the code, …).
The SQL / Hive / Impala notebook now features a "multi-cells" mechanism that lets you work on several queries at once, without having to juggle several notebooks or search endlessly in the history.
You can also now add comments and descriptions, to transform your SQL notebooks into real data stories.
DSS now features a real-time API server for predictions.
Using the REST API of the DSS API node, you can request predictions for new, previously unseen records in real time.
The DSS API node provides high availability and scalability for scoring of records.
Thanks to its advanced features, the DSS API node is at the heart of the feedback and improvement loop of your predictive models.
Window functions are one of the most powerful features of SQL (and SQL-like: Hive, Impala, Spark, ...) databases. They are also among the least known and the trickiest to use.
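As a rough illustration of what a real-time prediction request might look like from a client, the Python sketch below builds (but does not send) an HTTP POST carrying one record as JSON. Everything specific in it is an assumption made for the example: the host name, the service and endpoint identifiers, the URL layout and the `features` payload key are hypothetical, so check the API node documentation for the actual contract.

```python
import json
from urllib.request import Request


def build_prediction_request(base_url, service_id, endpoint_id, features):
    """Build a JSON POST request for one record to score.

    The URL layout and the payload shape are assumptions for illustration.
    """
    url = f"{base_url}/public/api/v1/{service_id}/{endpoint_id}/predict"
    body = json.dumps({"features": features}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, service and endpoint names, and made-up record values.
req = build_prediction_request(
    "http://apinode.example.com:12000",
    "fraud",
    "scoring",
    {"amount": 42.0, "country": "FR"},
)

# To send it for real: urllib.request.urlopen(req), then json.load() the response.
```

Building the request separately from sending it keeps the example runnable without a server and makes the payload easy to inspect.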
The Window visual recipe in DSS makes it extremely easy to leverage these features. Thanks to this, you'll be able to do things like:
This recipe makes it easy to create your window functions without writing code, and also brings productivity features for advanced users (mass actions, multiple windows, pre and post filters, ...).
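For readers unfamiliar with window functions, the following Python sketch emulates one common case, a per-customer running total, in plain code. It is not what the Window recipe generates; `running_total` and the sample `orders` rows are hypothetical, and the equivalent SQL is given in the docstring for comparison.

```python
from itertools import groupby
from operator import itemgetter


def running_total(rows, partition_key, order_key, value_key):
    """Emulates the SQL window expression:

        SUM(value) OVER (PARTITION BY partition_key ORDER BY order_key
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    """
    # Sort so that each partition is contiguous and ordered inside.
    ordered = sorted(rows, key=itemgetter(partition_key, order_key))
    out = []
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        total = 0
        for row in group:
            total += row[value_key]
            # Unlike GROUP BY, every input row is kept and gains a new column.
            out.append({**row, "running_total": total})
    return out


# Made-up sample data.
orders = [
    {"customer": "a", "day": 2, "amount": 5},
    {"customer": "b", "day": 1, "amount": 7},
    {"customer": "a", "day": 1, "amount": 10},
]

result = running_total(orders, "customer", "day", "amount")
```

The key difference from a plain aggregation is visible in the output: the number of rows is unchanged, and each row carries a value computed over its own window.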
In addition to these major features, DSS 2.2 improves virtually every module of DSS:
Read our Release notes for more information
DSS now includes full integration with Apache Spark, the next-generation distributed analytic framework. Spark brings blazingly fast in-memory processing to all kinds of data.
The integration of Spark in DSS 2.1 is pervasive and extends to all of the following features, which are now Spark-enabled:
All DSS data sources can be handled using Spark. As always in DSS, you can freely mix technologies, both Spark-enabled and traditional.
Plugins let you extend the features of DSS. You can add new kinds of datasets, recipes, visual preparation processors, custom formula functions, and more.
Plugins can be downloaded from the official Dataiku community site, or created by you and shared with your team.
The charts module has been completely redesigned with many new features and chart types added:
DSS comes with a large number of supported formats, machine learning engines, ... But sometimes you want to do more. DSS code recipes (Python and R) can now read from and write to "Managed Folders", which are handles on filesystem-hosted folders where you can store any kind of data.
Editable datasets are a new kind of dataset in DSS, which you can directly create and modify in the DSS UI, à la Excel or Google Spreadsheets.
In all modules of DSS where you can write code, you can now insert code snippets. DSS comes with lots of useful built-in snippets, and you can also write your own and share them with your team.
DSS 2.1 is a truly amazing release, and we have only started to cover what's new:
See our Release notes
The user experience of DSS has been redesigned and new helper tools added.
DSS 2.0 brings a completely redesigned machine learning interface that lets you create high-performing supervised and unsupervised models. Major enhancements include:
Because writing long SQL code can be tedious and error-prone, and because not everybody wants to write code at all, DSS comes with 5 new visual “recipes” that help you prepare your data faster.
Column-oriented view to understand and apply mass actions to your datasets more intuitively.
DSS 2.0 lets you easily access the server logs, providing a way to quickly debug your workflows and identify potential issues. This is useful for your own troubleshooting and also when you are working with Dataiku’s support team.
To find this tool, go to the Administration menu and click on the Logs tab. For instance, you can browse and download the last 1000 records of the backend log file, or use the complete Diagnostic tool to include checks on the configuration of the server hosting DSS.
See our Release notes
Automatically display your geographic data on beautiful maps.
Also includes many new geo-related features.
Do more without code with DSS.
Improved integration with enterprise security architectures.
New visualizations and insights let you explore the results of your clustering models.
Do you use the R language for your data analytics? DSS now has advanced R integration.
DSS now supports many new Hadoop file formats, including support for Pig, Hive, Impala and complex types.