Operationalization: From 1 to 1000s of Models in Production
Once a data project has progressed through the stages of data cleaning and preparation, analysis and experimentation, modeling, testing, and evaluation, it reaches a critical stage, one that separates the average company from the truly data-powered one: operationalization.
In order to realize real business value from data projects, machine learning models must not sit on the shelf; they need to be operationalized. Operationalization simply means deploying a machine learning model for use across the organization.
operationalization /apəreʃənələzeʃən/ (n) | to operationalize (v)
The process of converting data insights into actual large-scale business and operational impact. This means bridging the huge gap between the exploratory work of designing machine learning models and the industrial effort (not to mention precision) required for deployment within actual production systems and processes. The process includes, but is not limited to: testing, IT performance tuning, setting up a data monitoring strategy, and monitoring operations.
For example, a recommendation engine on a website, a fraud detection system for customers, or a real-time churn prediction model that is at the heart of a company’s operations cannot just be APIs exposed from a data scientist’s notebook — they require full operationalization after their initial design.
Logistically speaking, operationalization (abbreviated o16n, for the 16 letters between the first "o" and the last "n") is often difficult for enterprises to execute because it requires coordination, collaboration, and change not just at the organizational level, but often at the system architecture and deployment/IT levels as well.
But it is the final and most important step of the process: a data project is incomplete until it is operationalized, that is, incorporated into the fabric of the business so that it produces measurable monetary results.
Before diving into the execution of o16n, it’s helpful to learn from the mistakes of others. Generally, operationalization efforts fail when:
The process is too slow. The reality around most data projects is that they don’t bring real value to the business until they’re in a production environment. Therefore, if this process isn’t happening quickly enough, o16n efforts will fall flat.
Lines of business are not involved in the process. Operationalization that happens in a vacuum, without any input from business teams, is doomed to failure, because the projects that get delivered tend not to address real needs, or address them only superficially.
There is a lack of follow-up and iteration. The point of rapid operationalization is to get models out of a sandbox and into production quickly in order to evaluate their impact on real business processes. If models are operationalized and then forgotten, they could, over time, have adverse effects on the business, for example as live data drifts away from the data they were trained on. Instead, constant monitoring, tweaking, and follow-up are key.
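To make "constant monitoring" concrete, here is a minimal sketch of one common drift check, the Population Stability Index (PSI), which compares the distribution of a feature at training time with the distribution seen in production. The function name, the bin count, and the 0.25 alert threshold are illustrative assumptions (0.25 is a widely used rule of thumb, not a value taken from this paper).

```python
import math
from typing import Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
) -> float:
    """Compare two samples of one numeric feature; larger values mean more drift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all training values are equal

    def bucket_shares(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range production values into the first or last bucket.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps the logarithm defined for empty buckets.
        return [max(c / len(values), 1e-6) for c in counts]

    e = bucket_shares(expected)
    a = bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    training = [i / 100 for i in range(100)]       # feature values seen at training time
    production = [x + 0.5 for x in training]       # the same feature, shifted in production
    psi = population_stability_index(training, production)
    # Assumed alerting convention: treat PSI above 0.25 as significant drift.
    print("drift alert" if psi > 0.25 else "stable")
```

A check like this would typically run on a schedule for each important input feature, with alerts routed to whoever owns the model.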
O16n: Keys for Execution
Operationalization is, without doubt, a challenge, but it’s one that companies can surmount by ensuring the right tools, communication, and processes are applied. Here are some of the steps for a smooth o16n execution:
The first step is simply learning from, listening to, and working with business teams to ensure that the solution that will be operationalized is a viable one. While this sounds obvious, it is often overlooked, and data teams then lose significant time and effort building a solution that doesn't provide any business value in the end.
From a methodology standpoint, o16n requires a consistent, efficient process for integration, testing, deployment, and then measuring impact and monitoring performance.
After releasing models, it is critical to implement an efficient strategy for their retraining and updating. Implementing a retrain-in-production methodology is a key to o16n success; without it, every retraining of a model becomes a full deploy-to-production task, which requires significant manpower and costs agility.
Additionally, a successful o16n strategy involves functional monitoring, which is used to convey the model's performance to the business sponsors, owners, or stakeholders. This provides an opportunity to demonstrate the end results of the model in production for evaluation.
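One way to read the retrain-in-production idea above is as a gate: a new model is trained on fresh data inside the production environment and replaces the active model only if it scores better on held-out data. The sketch below illustrates that pattern with a deliberately tiny threshold "model"; every name in it (Deployment, train_threshold_model, retrain_in_production, min_gain) is hypothetical and is not taken from any particular platform.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Row = tuple[float, int]          # (feature value, true label), a toy schema
Model = Callable[[float], int]   # any callable mapping a feature to a label


@dataclass
class Deployment:
    """Holds the model currently serving predictions and its version number."""
    active: Model
    version: int = 1


def accuracy(model: Model, rows: Sequence[Row]) -> float:
    """Share of rows for which the model predicts the true label."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def train_threshold_model(rows: Sequence[Row]) -> Model:
    """Toy trainer: pick the cut-off value that classifies the most rows correctly."""
    candidates = sorted({x for x, _ in rows})
    best = max(candidates, key=lambda t: sum((x >= t) == bool(y) for x, y in rows))
    return lambda x: int(x >= best)


def retrain_in_production(
    deployment: Deployment,
    fresh_rows: Sequence[Row],
    holdout_rows: Sequence[Row],
    min_gain: float = 0.0,
) -> bool:
    """Train a challenger and promote it only if it beats the active model."""
    challenger = train_threshold_model(fresh_rows)
    gain = accuracy(challenger, holdout_rows) - accuracy(deployment.active, holdout_rows)
    if gain > min_gain:
        deployment.active = challenger
        deployment.version += 1
        return True
    return False  # keep serving the current model


if __name__ == "__main__":
    stale = Deployment(active=lambda x: int(x >= 0.2))
    fresh = [(0.1, 0), (0.3, 0), (0.5, 0), (0.6, 1), (0.8, 1), (0.9, 1)]
    holdout = [(0.2, 0), (0.4, 0), (0.7, 1), (0.95, 1)]
    promoted = retrain_in_production(stale, fresh, holdout)
    print(promoted, stale.version)
```

The accuracy figures computed at the gate are also a natural input for functional monitoring, since they can be reported to business stakeholders each time a retraining runs.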
Getting Models Into Production With Dataiku
Dataiku is one of the world’s leading AI and machine learning platforms, supporting agility in organizations’ data efforts via collaborative, elastic, and responsible AI, all at enterprise scale. Data-powered businesses use Dataiku to power self-service analytics while also ensuring the operationalization of machine learning models in production.
Dataiku provides two dedicated nodes to handle both the development and production of ML models:
Dataiku Design Node is used for the development of data projects. It provides capabilities for the creation of data pipelines and models, plus the definition of how they are meant to be reconstructed. Projects developed in the Design Node are packaged and handed off to the Automation Node.
Dataiku Automation Node is used to import packaged projects defined in the Design Node and run them in the production environment. When you make updates to the project in the Design Node, you can create an updated version of the project package, import the new package into the Automation Node, and control which version of the project runs in production.
By splitting the design environment from the production environment, companies gain both an area where data scientists can collaborate and continue to iterate, and a stable environment where the best current model runs in production and can be safely queried by other clients.
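The design/production split described above boils down to three operations on the production side: import a packaged version, choose which version is active, and fall back to an earlier one if needed. The following is a generic sketch of that idea only; ProductionNode and its methods are hypothetical names invented for illustration and do not represent the Dataiku API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProductionNode:
    """Hypothetical stand-in for a production environment that stores packages."""
    packages: dict[str, bytes] = field(default_factory=dict)
    active_version: Optional[str] = None
    history: list[str] = field(default_factory=list)

    def import_package(self, version: str, payload: bytes) -> None:
        """Store a packaged project; importing does not change what is running."""
        if version in self.packages:
            raise ValueError(f"version {version!r} already imported")
        self.packages[version] = payload

    def activate(self, version: str) -> None:
        """Switch production to an imported version, remembering the previous one."""
        if version not in self.packages:
            raise KeyError(f"version {version!r} has not been imported")
        if self.active_version is not None:
            self.history.append(self.active_version)
        self.active_version = version

    def rollback(self) -> str:
        """Return production to the previously active version."""
        if not self.history:
            raise RuntimeError("no earlier version to roll back to")
        self.active_version = self.history.pop()
        return self.active_version


if __name__ == "__main__":
    node = ProductionNode()
    node.import_package("v1", b"packaged project, first release")
    node.activate("v1")
    node.import_package("v2", b"packaged project, updated in design")
    node.activate("v2")
    node.rollback()
    print(node.active_version)
```

Keeping import and activation as separate steps is the design choice that matters here: a new package can be staged and inspected in production without affecting the version that clients are currently querying.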
AI-Driven Services: The Invaluable Enterprise Asset
Creating real value from data means building and maintaining a spectrum of AI-driven applications and services that run as a core part of the business.