Tutorial: Deploying to production

Once you have designed a flow and automated updates to the flow, you can deploy it to a production environment.

Development and production environments

A development (or sandbox) environment is an environment where you experiment, testing new flows that transform your data and build models to be put into production. Failures in this environment are an expected part of its experimental nature.

A production environment is where serious operational jobs run. This environment must be runnable whenever necessary and may serve external consumers, whether humans or software, for their day-to-day decisions. Failure is not an option in production, and rolling back to a previous version is critical.

Dataiku provides two dedicated nodes to handle both development and production:

  • Dataiku Design Node is used for the development of data projects. It provides capabilities for creating data pipelines, building models, and defining how they are meant to be rebuilt. Projects developed in the Design node are packaged and handed off to the Automation node.

  • Dataiku Automation Node is used to import packaged projects defined in the Design Node and run them in the production environment. When you make updates to the project in the Design node, you can create an updated version of the project package, import the new package into the Automation node, and control which version of the project runs in production.

Development work flows from the Design node to the Automation node. While it is technically possible to make changes to a project in the Automation node, those changes don't flow back to the Design node, so it's best practice to do all development in the Design node.

Let’s get started!

In this tutorial, you will learn how the Design and Automation nodes work together:

  • packaging flows for deployment
  • versioning flows
  • deploying packages in a production environment

We will work with the fictional retailer Haiku T-Shirt’s data.

Prerequisites

This tutorial assumes that you have access to both a Dataiku Design node and a Dataiku Automation node.

Create your project

From the homepage of the Dataiku Design node, click +New Project, select DSS Tutorials from the list, select the Automation grouping, then Deployment (Tutorial). For the purposes of this tutorial, the flow and automation scenarios are complete, and we simply need to package the flow and deploy it to the Automation node.

Packaging a flow into a bundle

A video below goes through the content of this section.

To package the flow into a bundle, navigate from the project home to the Bundles area. Click Create your first bundle and name it automation_v1.

Key concept: Bundle

A bundle is a snapshot of a complete DSS project.

The bundle includes the project configuration so that it can be deployed to a Dataiku DSS Automation node. In addition, your Flow may contain datasets (such as enrichment data) or models (trained in the development environment rather than retrained in production) whose data needs to be transported to the production environment.
A bundle can contain data for an arbitrary number of datasets, managed folders and saved models.

A bundle thus acts as a consistent packaging of a complete flow. On the Automation node, you then activate bundles to atomically switch the project to a new version. Bundles are versioned, and you can revert to a previous bundle in case of a production issue with the new one.

You can set up multiple Automation nodes to create continuous delivery pipelines (for example, a pre-production Automation node, a performance-test node, and the production node).

For this tutorial, we will include the data for the Orders and Customers datasets, though in a real-life setting these primary data sources would differ between the development and production environments.

Click Create. Download the bundle to your local system.
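Conceptually, the bundle you just downloaded is a single archive containing the project configuration plus the data of whichever datasets you chose to include. Here is a minimal sketch of that idea; the file names and layout are hypothetical, not DSS's real archive format:

```python
import io
import json
import zipfile

# Hypothetical bundle layout: one archive = project config + included data.
def make_bundle(bundle_id, config, included_data):
    """Pack a project snapshot into a single in-memory zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("bundle_id", bundle_id)
        zf.writestr("project_config.json", json.dumps(config))
        for name, rows in included_data.items():
            # Only datasets explicitly selected for the bundle carry data.
            zf.writestr(f"data/{name}.csv", rows)
    return buf.getvalue()

archive = make_bundle(
    "automation_v1",
    {"project": "Deployment (Tutorial)",
     "scenarios": ["Rebuild data and retrain model"]},
    {"Orders": "order_id,amount\n1,20\n",
     "Customers": "customer_id\n1\n"},
)
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    print(zf.namelist())
```

Because the configuration and the selected data travel together, activating the archive on another node yields a consistent copy of the flow.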

The following video goes through what we just covered.

Deploying a bundle

A video below goes through the content of this section.

Log in to your Dataiku Automation node and create a New project. Import the bundle you just downloaded.

Connections mapping

Note that you may need to re-map connections from how data sources are defined in the Design node to how they exist in the Automation node. Dataiku DSS will prompt you if this is necessary.

The Automation node needs connections of the proper type, but their definitions can differ. A simple example is a SQL database: you'll have a production database separate from the development database, so when deploying the bundle, you'll need to reattach the SQL datasets to the production database.
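The remapping itself amounts to substituting connection names in the project's dataset definitions while the dataset types stay unchanged. A hedged sketch of the idea, with hypothetical connection and dataset names (DSS performs this for you through the prompt mentioned above):

```python
# Hypothetical mapping: design-node connection -> automation-node connection.
CONNECTION_MAP = {
    "dev_postgres": "prod_postgres",
    "dev_filesystem": "prod_filesystem",
}

def remap_connections(datasets, mapping):
    """Return dataset definitions re-pointed at production connections."""
    remapped = []
    for ds in datasets:
        conn = ds["connection"]
        if conn not in mapping:
            # Deployment cannot proceed with an unmapped connection.
            raise ValueError(f"no production connection mapped for {conn!r}")
        # Copy the definition; only the connection changes, not the type.
        remapped.append({**ds, "connection": mapping[conn]})
    return remapped

datasets = [
    {"name": "Orders", "type": "PostgreSQL", "connection": "dev_postgres"},
    {"name": "Customers", "type": "Filesystem", "connection": "dev_filesystem"},
]
for ds in remap_connections(datasets, CONNECTION_MAP):
    print(ds["name"], "->", ds["connection"])
```

Raising on an unmapped connection mirrors why DSS prompts you: a dataset with no valid production connection cannot run.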

Choose to activate a bundle from the list, select automation_v1, and click Activate.

When creating a new project and activating its first bundle, the Dataiku Automation node deactivates all of its scenarios, so that no data reconstruction happens unless explicitly requested. Navigate back to the main project page and click the Automation link. Activate the scenario by turning the Rebuild data and retrain model scenario auto-trigger to on.

The following video goes through what we just covered.

That’s it! The flow you set up in the Design node is now running in production.

Versioning a flow

A video below goes through the content of this section.

Now, let’s say that we want to change the scenario in the flow so that it runs on a monthly basis, rather than when the underlying data sources update. To do this, we will:

  • update the project on the Design node
  • repackage the project into a new bundle version
  • deploy the new bundle to the Automation node

Open the original project on the Design node. Navigate to the Scenarios tab and open the Rebuild data and retrain model scenario. Turn off the existing trigger (rather than deleting it, in case you want to switch back to it later), click Add trigger, and select Time-based trigger. Name the trigger Monthly rebuild and retrain, set its frequency to Monthly, and have it run on the 1st of each month. Save your scenario.
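A time-based trigger like this one boils down to a simple condition that a scheduler evaluates. As an illustration only (DSS's scheduler is built in; the function and parameter names here are invented):

```python
from datetime import datetime

def monthly_trigger_fires(now, last_run=None, day_of_month=1):
    """True when a monthly time-based trigger should fire."""
    if now.day != day_of_month:
        return False
    # Never fire twice within the same month.
    return last_run is None or (last_run.year, last_run.month) != (now.year, now.month)

print(monthly_trigger_fires(datetime(2024, 3, 1)))    # True: 1st of the month
print(monthly_trigger_fires(datetime(2024, 3, 15)))   # False: wrong day
print(monthly_trigger_fires(datetime(2024, 3, 1),
                            last_run=datetime(2024, 3, 1)))  # False: already ran
```

Contrast this with the original trigger, which fired whenever the underlying data sources changed rather than on a fixed schedule.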

Navigate to the Bundles area. Click Create bundle and name it automation_v2. Leave a descriptive release note for your colleague in charge of the production environment, like Changed the scenario trigger to be monthly. These changes are now visible in the commit log and diff tab, both here on the Design node, and when the bundle is redeployed on the Automation node. Click Create, then download this bundle to your local system.

The following video goes through what we just covered.

Activate the new bundle in the Automation node

A video below goes through the content of this section.

In the Automation node, navigate to the Bundles tab and click Import bundle. Find the bundle you just created and import it. Select the automation_v2 bundle and activate it.

If you check the scenarios, you will see that the flow in production has been updated to the most recent version. If you need to roll back to a previous version, simply select that version in the Bundles area and activate it.
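Activation and rollback behave like switching a pointer between the versions you have imported. A minimal sketch of that behavior, with invented names (this is not the DSS API, which manages activation internally):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ProjectBundles:
    """Toy model of a project's imported bundles on the Automation node."""
    bundles: dict = field(default_factory=dict)   # bundle_id -> settings
    active: Optional[str] = None

    def import_bundle(self, bundle_id, settings):
        # Imported bundles stay available even after newer ones arrive.
        self.bundles[bundle_id] = settings

    def activate(self, bundle_id):
        if bundle_id not in self.bundles:
            raise KeyError(f"bundle {bundle_id!r} was never imported")
        # Single switch: the project is never in a half-activated state.
        self.active = bundle_id

project = ProjectBundles()
project.import_bundle("automation_v1", {"trigger": "on data change"})
project.import_bundle("automation_v2", {"trigger": "monthly"})
project.activate("automation_v2")
project.activate("automation_v1")   # roll back after a production issue
print(project.active)  # automation_v1
```

Because every imported bundle is retained, rolling back is just re-activating an older version, exactly as in the Bundles area above.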

The following video goes through what we just covered.

Next steps

Congratulations! Deploying a flow to production and managing its versions are easy to do in Dataiku DSS. See the related information links on the right for more on automation and production.

See the next tutorial, on deploying to a scoring API, to learn how to deploy your models for real-time scoring.