Machine Learning

Guided Machine Learning

The interface of Dataiku DSS was designed to make creating potent machine learning models easy. Clicking through the interface is enough for most use cases, whether you are an expert Data Scientist or a beginner! Discover below what you can do in the visual interface:

Batch scoring

  • Learn how to save a model to the flow and use it to score another dataset.
  • A Dataiku model consists of a whole pipeline that combines data preparation, feature handling and an ML model. This means that you can score raw data directly with a Dataiku model, without reimplementing data cleaning or feature preprocessing.
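
To make the pipeline idea concrete, here is a minimal sketch in plain Python. It is not Dataiku code: `ScoringPipeline`, `prepare`, `make_featurizer`, the column names and the toy scoring rule are all hypothetical, and only illustrate why bundling preparation, feature handling and the model lets you score raw records in one call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

RawRecord = Dict[str, Optional[str]]


def prepare(record: RawRecord) -> Dict[str, Optional[float]]:
    """Data preparation: trim text and parse numbers, keeping blanks as None."""
    cleaned = {}
    for key, value in record.items():
        text = (value or "").strip()
        cleaned[key] = float(text) if text else None
    return cleaned


def make_featurizer(fill_values: Dict[str, float]) -> Callable[[dict], List[float]]:
    """Feature handling: impute missing values with values fixed at training time."""
    def featurize(cleaned: dict) -> List[float]:
        return [
            cleaned[name] if cleaned.get(name) is not None else fill
            for name, fill in fill_values.items()
        ]
    return featurize


@dataclass
class ScoringPipeline:
    """Hypothetical stand-in for a saved model: every step travels together."""
    prepare_step: Callable[[RawRecord], dict]
    feature_step: Callable[[dict], List[float]]
    model_step: Callable[[List[float]], float]

    def score(self, raw_records: List[RawRecord]) -> List[float]:
        # Raw data goes in; the same preparation used in training is re-applied.
        return [
            self.model_step(self.feature_step(self.prepare_step(r)))
            for r in raw_records
        ]


pipeline = ScoringPipeline(
    prepare_step=prepare,
    feature_step=make_featurizer({"age": 40.0, "income": 50000.0}),
    model_step=lambda x: x[0] + x[1] / 1000,  # toy scoring rule, not a real model
)

raw = [{"age": " 25 ", "income": "60000"}, {"age": "", "income": None}]
scores = pipeline.score(raw)
```

The second record has no usable values, yet it can still be scored because imputation is part of the pipeline rather than a separate step the caller must remember to run.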

The lifecycle of a model in production

The models saved in the flow can be retrained, versioned and monitored. These capabilities are critical to all predictive applications used in production. See here how to handle the whole lifecycle of a model.

Machine learning engines

Dataiku DSS lets you use multiple machine learning engines within its guided machine learning framework:

  • Scikit-learn
  • XGBoost
  • Spark MLlib (Spark Machine Learning Library)
  • H2O
  • Vertica Advanced Analytics

Custom models

The Python and MLlib machine learning engines allow you to define custom models by adding your own code while still taking advantage of the Dataiku DSS visual interface for machine learning.
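
As a rough illustration of what such custom code can look like with the Python engine, the sketch below defines a scikit-learn-style estimator using only the standard library. It assumes that the custom model editor expects a scikit-learn-compatible object exposed through a variable named `clf`; check that assumption against the DSS documentation for your version. The class `MajorityClassClassifier` is a hypothetical example, and a real custom model would normally subclass `sklearn.base.BaseEstimator` and, for classification, also provide probability estimates.

```python
from collections import Counter


class MajorityClassClassifier:
    """Duck-typed, scikit-learn-style estimator (hypothetical example model)."""

    def __init__(self, fallback=None):
        # Label returned if the model is fitted on an empty target.
        self.fallback = fallback

    def get_params(self, deep=True):
        # scikit-learn-style tools read hyperparameters through this method.
        return {"fallback": self.fallback}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        labels = list(y)
        self.classes_ = sorted(set(labels))
        counts = Counter(labels)
        self.majority_ = counts.most_common(1)[0][0] if labels else self.fallback
        return self

    def predict(self, X):
        # Always predicts the most frequent training label, one per input row.
        return [self.majority_ for _ in X]


# Assumption: the visual interface picks up the estimator from this variable.
clf = MajorityClassClassifier()
```

A trivial baseline like this is useful mainly as a template: replace the body of `fit` and `predict` with your own logic while keeping the same method names.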

Score in real time through a REST API

A saved model can be deployed to a Dataiku DSS API node, where client applications can request predictions on new data.
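
The sketch below shows what such a query might look like from a client, using only the Python standard library. The URL layout (`/public/api/v1/<service>/<endpoint>/predict`) and the `{"features": ...}` payload are assumptions to verify against the API node documentation, and the host, port, service name, endpoint name and feature names are all hypothetical placeholders. The helper `build_predict_request` only builds the request; nothing is sent until you pass it to `urlopen`.

```python
import json
from urllib.request import Request


def build_predict_request(base_url, service_id, endpoint_id, features):
    """Build (but do not send) a prediction request for one record."""
    # Assumed URL layout; confirm it for your API node version.
    url = f"{base_url.rstrip('/')}/public/api/v1/{service_id}/{endpoint_id}/predict"
    body = json.dumps({"features": features}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request(
    "http://localhost:12000/",   # hypothetical API node address
    "my_service",                # hypothetical service id
    "my_endpoint",               # hypothetical endpoint id
    {"age": 30, "country": "FR"},
)

# To actually query a running API node:
# from urllib.request import urlopen
# with urlopen(request) as response:
#     prediction = json.load(response)
```

Because the saved model carries its own preparation steps, the record sent here can contain raw feature values rather than preprocessed ones.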

The API node provides all the necessary features for scoring in production:

Prediction examples

Clustering examples

Full control with code

Dataiku DSS allows users to code everything by themselves in Python, R, Scala, SQL or Shell. See the portal on coding in DSS for more information.