Tutorial: Deploying to real-time scoring

Dataiku DSS allows you to deploy predictive models as API services for real-time scoring.

Let’s get started!

In this tutorial, you will learn how to:

  • package an API service, which includes a model, for deployment
  • deploy a service to the real-time scoring environment
  • version service packages

We will work with the fictional retailer Haiku T-Shirt’s data.

Prerequisites

This tutorial assumes that you have access to a:

  • Dataiku Design node
  • Dataiku API Deployer node
  • Dataiku API node

In particular, we assume that the Design node is connected to the API Deployer node, and that the API Deployer node has a static infrastructure defined over the API node.

Create your project

If you have already completed the Tutorial: Deployment to production, we will reuse that same project on the Dataiku Design node.

If not, then in the Dataiku Design node, click the Tutorials button in the left pane, select the Automation grouping, then Deployment (Tutorial). For the purposes of this tutorial, the model is already complete; we simply need to package it and deploy it to the API node for scoring.

Creating an API service and packaging a model

API services are defined on one or more Design or Automation nodes and pushed to the Dataiku API Deployer, which in turn deploys the services to (possibly many) Dataiku API nodes: the individual servers that do the actual work of answering HTTP requests. In this section, we are going to define a scoring service on a Design node. In the next section, we will use the API Deployer node to deploy and activate it on an API node.

API services

A Dataiku API Service consists of one or more endpoints, i.e. URLs to which HTTP requests can be sent and from which responses are returned. For example, a prediction score is obtained by querying an endpoint of a specific service.

  • Dataiku makes it easy to create such endpoints on Dataiku models (i.e. models built in a Dataiku visual analysis), or on models generated with custom code.
  • A Dataiku Model is more than a mere machine learning algorithm. It includes the entire pipeline, starting from raw data, through the cleansing steps of the visual preparation and the feature preprocessing, and finally the model scoring itself.
  • In order to be used in an endpoint, a visual model has to be deployed to the Flow.
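
To make the endpoint idea concrete, here is a minimal Python sketch, using the requests package, of the kind of HTTP exchange a prediction endpoint handles. The service name, endpoint name, feature values, and the APINODE_SERVER / APINODE_PORT placeholders are all assumptions for illustration; a complete, real example query appears later in this tutorial.

import requests

# Purely illustrative: my_service and my_endpoint are hypothetical names, and
# APINODE_SERVER / APINODE_PORT stand for your API node's hostname and port.
# A prediction endpoint receives a JSON body with a "features" object and
# answers with a JSON document containing the prediction result.
url = "http://APINODE_SERVER:APINODE_PORT/public/api/v1/my_service/my_endpoint/predict"
reply = requests.post(url, json={"features": {"age_first_order": 30, "total_sum": 17.5}})
print(reply.json())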

Let’s define the prediction service. From the project Flow on the Design node, select the High revenue prediction model and click Create API. In the New scoring API from model dialog, name the API service Tutorial_Deployment, and name the endpoint High_Revenue_Customers. Click Append.

The model for predicting whether a customer will become high-revenue is now part of the Tutorial_Deployment service and ready to be used. Before we package the service, let’s explore the endpoint a bit.

  • Enrichments lets you enrich the query features with a lookup on an additional table. This is useful when the model includes features that might not be available to the client making an API request. For example, say our model incorporated demographic and economic indicators for the country a customer comes from. We would then want to do some real-time enrichment of the query (a conceptual sketch of this idea follows the list below).

  • Test queries are useful to check that everything is working as expected and to understand how to query the endpoint.
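
To make the enrichment idea above concrete, here is a purely illustrative Python sketch of the lookup logic. This is not how Enrichments are configured in Dataiku; the reference table, indicator values, and field names are invented for the example. The client sends only the fields it knows, and the scoring side completes the record from a reference table before the model is applied.

# Purely illustrative: the reference table and field names are invented.
COUNTRY_INDICATORS = {
    "France": {"gdp_per_capita": 40000, "internet_penetration": 0.85},
    "China":  {"gdp_per_capita": 12000, "internet_penetration": 0.73},
}

def enrich(query_features):
    """Complete an incoming query with country-level indicators that the
    client does not send, so the model sees all the features it was trained on."""
    enriched = dict(query_features)
    enriched.update(COUNTRY_INDICATORS.get(query_features.get("country"), {}))
    return enriched

# The client only sends what it knows...
query = {"customer_id": "000314", "country": "China", "total_sum": 17.5}
# ...and the scoring side looks up the rest before calling the model.
print(enrich(query))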

On the Test queries panel, click +Add Queries, then add 3 queries from the Orders_by_Customer dataset, and click Run test queries. You can see the results of the tests and whether they are correct according to your model.

Your service is now ready. Click Push to API Deployer.

Deploying a service

Log in to your Dataiku API Deployer node. Tutorial_Deployment is in the list of available services. Click Deploy.

In the New deployment dialog, we’ll keep the default target infrastructure and the default name for this deployment. Click Deploy. This deploys the service to the API node and brings us to the Summary tab of the High_Revenue_Customers prediction endpoint.

You can now submit queries to this service using the API node API. The Sample code tab provides snippets for calling the API in various languages.

As a test, you can run the following in a terminal window, replacing APINODE_SERVER with the hostname and APINODE_PORT with the port of your API node.

curl -X POST \
  http://APINODE_SERVER:APINODE_PORT/public/api/v1/Tutorial_Deployment/High_Revenue_Customers/predict \
  --data '{ "features" : {
    "customer_id": "000314",
    "order_date_year_distinct": 1,
    "order_date_month_distinct": 1,
    "order_day_of_week_distinct": 1,
    "pages_visited_avg": 7,
    "total_sum": 17.5,
    "gender": "F",
    "age_first_order": 30,
    "user_agent_brand": "Chrome",
    "user_agent_os": "Windows",
    "user_agent_osversion": "Windows 7",
    "user_agent_osflavor": "32 bits",
    "ip_address_city": "Guiyang",
    "ip_address_geopoint": "POINT(106.7167 26.5833)",
    "campaign": false,
    "count": 1
  }}'
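
If you prefer Python to curl, the same call can be made with the Dataiku API client package for Python (dataikuapi). The sketch below assumes the package is installed and reuses the placeholder hostname and port from the curl example; only a subset of the features is shown.

import dataikuapi

# Point the client at the API node and at the Tutorial_Deployment service.
client = dataikuapi.APINodeClient("http://APINODE_SERVER:APINODE_PORT",
                                  "Tutorial_Deployment")

features = {
    "customer_id": "000314",
    "pages_visited_avg": 7,
    "total_sum": 17.5,
    "gender": "F",
    "age_first_order": 30,
    "campaign": False,
    # add the remaining features from the curl example above as needed
}

# Query the High_Revenue_Customers endpoint and print the raw response,
# which contains the prediction result.
prediction = client.predict_record("High_Revenue_Customers", features)
print(prediction)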

You can also run the test queries defined in the Design node; now they are being run on the API node.

Versioning a service

Now, let’s say that we want to make changes to the predictive model and put the new version into production. To do this, we:

  • update the service on the Design node to use the new version of the model
  • push the new version of the service to the API Deployer
  • deploy the new version to the API node

In the project on the Design node, navigate to the Model panel of the endpoint, and click Go to model page. This takes you to a list of versions of the model that have been built. Select the logistic regression model to be the Active version.

Return to the Tutorial_Deployment service and click Push to API Deployer. Let’s give the new package a more descriptive name, v2-logistic-regression. Click Deploy.

In the API Deployer node, navigate to the Deployments section, where you’ll see the new version of the service. Click Deploy, choose to Update the service, and click OK. The service won’t actually be updated until you click Update and select the Light update option. Now the API node is running the latest version of the service.

Next steps

Congratulations! Deploying a model to production for real-time scoring and managing versions of the model is easy to do in Dataiku DSS. See the related information links for more on real-time scoring.