
Custom Python Models

Applies to DSS 3.1 and above | January 05, 2018

Dataiku DSS provides useful and powerful built-in visual tools, but also always allows you to extend its functionality with code.

This article explains how to add custom Python code-based machine learning (ML) algorithms to the guided Machine Learning framework. Before starting, we recommend reading the tutorial Your first predictive model.

Specifying the Custom Model

In a Visual Analysis (Lab), go to the Models tab. In the Design stage of a model, go to the Algorithms panel. The list of algorithms begins with the built-in models, and you can add custom Python models at the bottom of the list. Click the edit button to rename the model; the name, which defaults to “Custom Python Model”, is the one displayed in the output. Dataiku DSS provides some code samples to get you started:

Algorithms settings for a prediction; code samples for a custom model

The code must follow some constraints depending on the backend you have chosen (in-memory or MLlib).

In this example, we use the Python in-memory backend: the custom code needs to implement a classifier with the same interface as a scikit-learn classifier, that is, it must provide the fit() and predict() methods (and predict_proba() when this makes sense).
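As an illustration of that contract, here is a minimal sketch of a hand-written classifier: a toy nearest-centroid model that turns distances into probabilities with a softmax. The class name and the temperature parameter are hypothetical. Inheriting from scikit-learn's ClassifierMixin and BaseEstimator is our choice here (it supplies get_params(), set_params() and score()). The final line assumes that the DSS code editor picks up the model from a variable named clf; check this against the code samples shipped with your DSS version.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin


class NearestCentroidClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical toy model: predicts the class whose training mean is closest."""

    def __init__(self, temperature=1.0):
        # scikit-learn convention: __init__ only stores hyperparameters
        self.temperature = temperature

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        # classes_ is the attribute scikit-learn classifiers expose after fitting
        self.classes_ = np.unique(y)
        # One centroid (mean feature vector) per class
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        # Squared Euclidean distance from each row to each centroid
        dist = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        # Softmax over negative distances, shifted for numerical stability
        logits = -dist / self.temperature
        logits -= logits.max(axis=1, keepdims=True)
        proba = np.exp(logits)
        return proba / proba.sum(axis=1, keepdims=True)

    def predict(self, X):
        # Most probable class for each row
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


# Assumption: DSS reads the model from a variable named clf (verify in your version)
clf = NearestCentroidClassifier(temperature=1.0)
```

Keeping __init__ limited to storing hyperparameters follows scikit-learn's convention, which lets tools such as sklearn.base.clone copy the model for repeated training.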

Dataiku DSS uses these methods in its machine learning pipeline to train the model, compute performance statistics, and generate the associated visual insights!
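To see why these few methods are enough, the sketch below evaluates a model using only fit(), predict() and, when available, predict_proba(). It is plain scikit-learn, not DSS's internal code: the evaluate() helper, the synthetic dataset and the choice of metrics are illustrative assumptions, and a built-in LogisticRegression stands in for a custom model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split


def evaluate(model, X, y, seed=0):
    """Hypothetical helper: train and score any estimator with fit/predict."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    model.fit(X_train, y_train)
    # Hard predictions are enough for accuracy
    metrics = {"accuracy": accuracy_score(y_test, model.predict(X_test))}
    # Probability-based metrics such as ROC AUC need predict_proba
    if hasattr(model, "predict_proba"):
        # Binary case: column 1 is the probability of model.classes_[1]
        proba = model.predict_proba(X_test)[:, 1]
        metrics["roc_auc"] = roc_auc_score(y_test, proba)
    return metrics


X, y = make_classification(n_samples=400, n_features=8, random_state=0)
clf = LogisticRegression(max_iter=1000)
print(evaluate(clf, X, y))
```

This is why predict_proba() is optional but useful: without it, only metrics based on hard predictions can be computed.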

Session Output With A Custom Model

Once trained, your custom model appears in the session output in the list of all models built during the session.

Summary output for a custom model along with the standard models

Assessing Custom Model Performance

Open the custom model to visualize its performance and all associated visual insights, just as you would with a built-in model.

Detailed output for custom model

The custom model can now be deployed to the Flow and used just like a standard built-in model!

To implement your own MLlib models in Scala while still using Dataiku DSS modeling in the Visual Analysis, please read the reference documentation.

What’s Next

If modeling in the Visual Analysis with custom code does not suit your needs, you can also take full control by coding the whole machine learning pipeline yourself (training, scoring, validation, etc.) in your preferred language (Python, R, Scala, or Shell), thus leveraging any external ML library.