Tutorial: Deploying to real-time scoring

Dataiku DSS allows you to deploy predictive models for real-time scoring using the Dataiku API node.

Let’s get started!

In this tutorial, you will learn how to:

  • package an API service that includes a model for deployment
  • deploy a service to the real-time scoring environment
  • version service packages

We will work with the fictional retailer Haiku T-Shirt’s data.

Prerequisites

This tutorial assumes that:

  • you have access to a Dataiku Design node
  • you have access to a Dataiku API node and can run commands from its data directory (DATA_DIR)

Create your project

If you have already completed Tutorial: Deployment to production, we will use the same project on the Dataiku Design node.

If not, then in the Dataiku Design node, click the Tutorials button in the left pane, select the Automation grouping, and then Deployment (Tutorial). For the purposes of this tutorial, the model is complete, and we simply need to package it and deploy it to the API node.

Creating an API service and packaging a model

The Dataiku API node is dedicated to handling API services in production. These services are designed on the Dataiku Design node (or on the Dataiku Automation node if the model is regularly retrained there). In this section, we are going to define a scoring service on the Design node. In the next section, we will deploy and activate it on the Dataiku API node.

API services

A Dataiku API service consists of one or more endpoints, i.e. URLs to which HTTP requests are sent and from which responses are returned. For example, a prediction score can be obtained from an endpoint of a specific service.

  • Dataiku makes it easy to create such endpoints from Dataiku models (i.e. models built in a Dataiku analysis) or from models generated with custom code.
  • A Dataiku model is more than a mere machine learning algorithm. It includes the entire pipeline: starting from raw data, through the cleansing steps of the visual preparation, to feature preprocessing, and finally the model scoring.
  • In order to be used in an endpoint, visual models have to be deployed to the Flow.

Let's start defining the prediction service. From the project home on the Design node, navigate to the API Service tab. Click Create your first API service and name it Tutorial_Deployment.

Open the newly created service and click +Create Endpoint. Name it High_Revenue_Customers. Select High revenue prediction as the prediction model used by this endpoint. Click Append.

The model for predicting whether a customer will become high-revenue is now part of the Tutorial_Deployment service and ready to be used. Before we package the service, let’s explore the endpoint a bit.

  • Features mapping is used for feature enrichment, using a lookup on an additional table. This is useful when the model includes features that might not be available to the client making an API request. For example, say our model incorporated demographic and economic indicators for the country a customer comes from. We would then want to do some real-time enrichment of the query (see the sketch after this list).

  • The Raw config is the JSON representation of the endpoint configuration.
  • Test queries are useful to check that everything is working as expected and to understand how to query the endpoint.
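
To make the feature enrichment idea concrete, here is a purely illustrative sketch, written as Python dictionaries, of a client query and the enriched record the model might actually score. The country key and the indicator columns are hypothetical; the actual keys depend on the lookup table you configure for the endpoint.

# Hypothetical illustration of feature enrichment (names are examples only).

# What the client sends: only the features it knows about,
# plus a lookup key (here, a country code).
client_query = {
    "features": {
        "pages_visited_avg": 8,
        "age_first_order": 35,
        "campaign": "True",
        "country": "FR",          # hypothetical lookup key
    }
}

# What the model would actually score after the API node performs the
# lookup on the additional table keyed by "country" (values invented).
enriched_features = {
    "pages_visited_avg": 8,
    "age_first_order": 35,
    "campaign": "True",
    "country": "FR",
    "gdp_per_capita": 42000,      # hypothetical enrichment column
    "median_age": 41.7,           # hypothetical enrichment column
}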

Add the following test queries on the Test queries panel, then click Run test queries:

{
   "features": {
      "pages_visited_avg": 4,
      "age_first_order": 20,
      "campaign": "False"
   }
}
{
   "features": {
      "pages_visited_avg": 8,
      "age_first_order": 35,
      "campaign": "True"
   }
}

Clicking Details, you can see the result of each test query and check that the predictions are what you expect from the model.

Navigate to the Packages tab and click Make Package. Leave the default name of v1. Typically, we want to use a short descriptive name for new versions of a service, but v1 is pretty clearly the initial, base version of the service.

Download the package to your local filesystem. It should be a file called Tutorial_Deployment_v1.zip.

Deploying a service

Log in to your Dataiku API node. Upload Tutorial_Deployment_v1.zip, say, to your home ~/ directory.

In the Dataiku DATA_DIR directory, run the following commands to create the service on the API node, import the package, and then activate the service:

[me:DATA_DIR]$ ./bin/apinode-admin service-create Tutorial_Deployment
[me:DATA_DIR]$ ./bin/apinode-admin service-import-generation Tutorial_Deployment ~/Tutorial_Deployment_v1.zip
[me:DATA_DIR]$ ./bin/apinode-admin service-switch-to-newest Tutorial_Deployment

You can now submit queries to this service using the API node API.

As a test, you can put the following URL into your browser, substituting APINODE_SERVER with the hostname and APINODE_PORT with the port of your API node.

http://APINODE_SERVER:APINODE_PORT/public/api/v1/Tutorial_Deployment/High_Revenue_Customers/predict-simple?campaign=True&pages_visited_avg=8&age_first_order=35
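
You can also send the same query from a script rather than the browser. Below is a minimal Python sketch, assuming the requests library is installed and that the placeholder hostname and port are replaced with your API node's values.

import requests

# Placeholders: replace with your API node's hostname and port.
APINODE_SERVER = "localhost"
APINODE_PORT = 12000

url = (
    f"http://{APINODE_SERVER}:{APINODE_PORT}"
    "/public/api/v1/Tutorial_Deployment/High_Revenue_Customers/predict-simple"
)
params = {
    "campaign": "True",
    "pages_visited_avg": 8,
    "age_first_order": 35,
}

# Send the same query as the browser test and print the raw JSON response.
response = requests.get(url, params=params)
response.raise_for_status()
print(response.json())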

Versioning a service

Now, let’s say that we want to make changes to the predictive model and put the new version into production. To do this, we:

  • update the service on the Design node to use the new version of the model
  • repackage the service into a new package version
  • deploy the new package to the API node

In the project on the Design node, navigate to the High_Revenue_Customers endpoint of the service, and click Model details. This takes you to a list of versions of the model that have been built. Select the logistic regression model to be the Active version.

Return to the Tutorial_Deployment service and navigate to the Packages tab. Click Make package. Let’s give the new package a more descriptive name, v2-logistic-model. Download it to your local filesystem.

On the API node, upload Tutorial_Deployment_v2-logistic-model.zip, say, to your home ~/ directory.

In the Dataiku DATA_DIR directory, run the following commands to import the new package and then activate the latest version of the service.

[me:DATA_DIR]$ ./bin/apinode-admin service-import-generation Tutorial_Deployment ~/Tutorial_Deployment_v2-logistic-model.zip
[me:DATA_DIR]$ ./bin/apinode-admin service-switch-to-newest Tutorial_Deployment
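
To verify that the newly activated version answers queries, you can re-run the browser test or the Python sketch from the previous section. Alternatively, if the dataikuapi Python client package is available on the machine sending queries, a sketch like the following could be used; treat the exact client calls as an assumption and check them against the API documentation for your Dataiku version.

import dataikuapi

# Assumed usage of the dataikuapi API node client (verify against your version).
# Replace APINODE_SERVER and APINODE_PORT with your API node's address.
client = dataikuapi.APINodeClient(
    "http://APINODE_SERVER:APINODE_PORT", "Tutorial_Deployment"
)

# Score a single record against the High_Revenue_Customers endpoint.
prediction = client.predict_record(
    "High_Revenue_Customers",
    {
        "pages_visited_avg": 8,
        "age_first_order": 35,
        "campaign": "True",
    },
)
print(prediction)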

Next steps

Congratulations! As you have seen, deploying a model to production for real-time scoring and managing versions of the model are easy to do in Dataiku DSS. See the related information links on the right for more on real-time scoring.