MLOps with Dataiku
Deploy, monitor, and manage machine learning projects in production.
Deploying Projects to Production
The Dataiku unified deployer moves project files between Dataiku design nodes and production nodes for batch and real-time scoring. Project bundles package everything a project needs from the design environment to run on the production environment.
With Dataiku, data scientists can see all the deployed bundles, and data engineers and IT operations teams can quickly see when a new bundle requires testing and roll-out.
Batch Scoring with Automation Nodes
Dataiku automation nodes are production nodes with advanced automation capabilities to schedule everyday tasks for production projects like monitoring, updating data, and retraining models based on a schedule or triggers.
With automation nodes, AI projects run smoothly, and organizations can scale the number of AI projects in production.
Real-Time Scoring with API Nodes
The deployment of predictive insights for real-time applications requires a different set of characteristics than batch scoring, including dynamic scaling of resources to meet changing needs.
Dataiku API nodes make it easy to deploy API endpoint services on elastic, highly available infrastructure to support real-time scoring. With API nodes, organizations can deploy more projects and build downstream applications and processes powered by AI.
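To make the idea of a real-time scoring endpoint concrete, here is a minimal sketch of how a downstream application might prepare a prediction request. The URL layout, the `features` JSON body, and the names `build_predict_request`, `service_id`, and `endpoint_id` are assumptions for illustration; check the API node documentation for the actual contract of a deployed service.

```python
import json
from urllib.request import Request


def build_predict_request(base_url, service_id, endpoint_id, features):
    """Build (but do not send) an HTTP POST request for one prediction.

    The URL path and the {"features": ...} body are illustrative
    assumptions, not a confirmed API contract.
    """
    url = f"{base_url.rstrip('/')}/public/api/v1/{service_id}/{endpoint_id}/predict"
    body = json.dumps({"features": features}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(url, data=body, headers=headers, method="POST")


# Hypothetical host, service, and endpoint names.
request = build_predict_request(
    "http://localhost:12443", "fraud-service", "score", {"amount": 12.5}
)
```

Passing `request` to `urllib.request.urlopen` with a short timeout would send it; keeping construction separate from sending makes the request easy to inspect and test.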
Deployment with ONNX, Even on the Edge
For connectivity, speed, cost, and privacy reasons, more and more use cases require putting the model, the sensor, and the data on the same small device, such as a smartphone or an embedded processing unit.
ONNX is an open format created by Facebook and Microsoft to enable interoperability between common deep learning frameworks. Dataiku supports model deployment using ONNX for prediction in a variety of environments, including the edge.
Monitoring and Drift Detection
Once AI projects are up and running in production, the real work begins. AI projects in operation use pipelines to process data and score it in batch and in real time.
Dataiku monitors the pipeline to ensure all processes execute as planned and alerts operators if there are issues. For models, Dataiku provides data drift detection to check that scoring data and training data remain similar so that the model can deliver reliable results.
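As an illustration of what checking that scoring data and training data remain similar can mean in practice, the sketch below computes the Population Stability Index (PSI) for one numeric feature. PSI is a common drift measure chosen here for simplicity; it is not necessarily the method Dataiku uses, and the function name and binning scheme are assumptions.

```python
import math


def population_stability_index(train_values, scoring_values, n_bins=10):
    """PSI between two numeric samples, using equal-width bins over the training range."""
    lo, hi = min(train_values), max(train_values)
    # Fall back to a width of 1.0 when every training value is identical.
    width = (hi - lo) / n_bins or 1.0

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp out-of-range values into the first or last bin.
            idx = min(max(idx, 0), n_bins - 1)
            counts[idx] += 1
        # A small floor keeps the logarithm defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    expected = bin_shares(train_values)
    actual = bin_shares(scoring_values)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))
```

A widely quoted rule of thumb treats a PSI below 0.1 as stable and above 0.25 as significant drift, though suitable thresholds depend on the feature and the use case.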
Automatic Model Retraining
Production models periodically need to be updated based on newer data, detected data drift, or an appropriate schedule.
Dataiku AI projects include automated retraining based on a schedule or triggers, such as significant drift. With automatic retraining in place, operations teams can focus on other pressing issues like troubleshooting and new projects moving to production.
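The combination of schedule-based and trigger-based retraining can be sketched as a small decision function. This is a minimal illustration, not Dataiku's implementation: the name `should_retrain`, the 30-day maximum age, and the 0.25 drift threshold are all hypothetical defaults.

```python
from datetime import datetime, timedelta


def should_retrain(last_trained, now, drift_score,
                   max_age=timedelta(days=30), drift_threshold=0.25):
    """Return (decision, reason) for a deployed model.

    Drift is checked first so that a drifting model is retrained
    even if it was trained recently.
    """
    if drift_score >= drift_threshold:
        return True, "drift"
    if now - last_trained >= max_age:
        return True, "schedule"
    return False, "none"


# Example: a model trained ten days ago with low drift is left alone.
decision, reason = should_retrain(
    datetime(2024, 1, 1), datetime(2024, 1, 11), drift_score=0.05
)
```

Returning the reason alongside the decision lets an operations team log why each retraining run was started.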
Production Project and Model Updates
Updating projects manually in production can be challenging and risky, resulting in downtime for critical AI initiatives.
Dataiku makes it easy to update production artifacts, including models, with full Git integration and version management. Multiple Dataiku production nodes can also serve as separate test and production environments, allowing for a robust dev-test-prod approach to updates.
Automate CI/CD with APIs for DevOps
DevOps tools and processes are standard in enterprise software projects. While AI projects are different in some ways, they still involve code artifacts and can benefit from a continuous integration and deployment approach.
Dataiku provides a full API to perform programmatic operations from external management systems used by DevOps teams. Dataiku integrates with the tools that DevOps teams already use, such as Jenkins, GitLab CI, Travis CI, and Azure Pipelines, to name a few.
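A continuous integration and deployment flow typically runs a fixed sequence of stages and stops at the first failure, so that an untested bundle never reaches production. The sketch below shows that control flow only; the stage names and the placeholder steps are hypothetical, and in a real pipeline each step would call the platform API from the CI tool.

```python
def run_pipeline(stages):
    """Run (name, step) pairs in order and stop at the first step that returns False.

    Returns (succeeded, names_of_completed_stages).
    """
    completed = []
    for name, step in stages:
        if not step():
            return False, completed
        completed.append(name)
    return True, completed


# Placeholder steps standing in for real packaging, testing, and deployment calls.
stages = [
    ("package bundle", lambda: True),
    ("test on test node", lambda: False),  # simulate a failing test stage
    ("activate on production", lambda: True),
]
succeeded, completed = run_pipeline(stages)
```

Because the simulated test stage fails, the production stage is never reached, which is the behavior a dev-test-prod gate is meant to guarantee.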
Get Started with Dataiku
Start an online hosted trial, download the free edition, or compare the features of the Lite, Team, and Enterprise editions.