Watch our webinar with MandM Direct, UK's second-largest online fashion retailer, to find out how working with Dataiku and Google Cloud Platform has created value from data science in no amount of time.Learn More
Organizations are moving from experimenting with machine learning to scaling it in production environments, but one difficulty is maintenance. How can companies go from managing just one model to managing tens, hundreds, or even thousands?
“As the machine learning community continues to accumulate years of experience with live systems, a wide-spread and uncomfortable trend has emerged: developing and deploying ML systems is relatively fast and cheap, but maintaining them over time is difficult and expensive.”
This is exactly the challenge faced by MandM Direct, one of the largest online retailers in the United Kingdom with over 3.5 million active customers and seven dedicated local market websites across Europe. The company delivers more than 300 brands annually to 25+ countries worldwide, which means in 2020, they grew fast. Their accelerated growth meant more customers and, therefore more data, which magnified some of their challenges and pushed them to find more scalable solutions.
MandM Direct’s rapid growth resulted in two big challenges:
- Getting all the available data out of silos and into a unified, analytics-ready environment: The core data team is made up of four people (two data scientists, one senior analyst, and one data analyst), but they extend their reach by leveraging a hub and spoke model for their data center of excellence, meaning they work with analysts embedded across the business lines to scale their efforts. However, this requires an easy way to enable those teams to leverage data to answer business questions that doesn’t necessarily involve code.
- Scaling out AI deployment in a traceable, transparent, and collaborative manner: MandM’s first machine learning models were written in Python (.py files) and run on the data scientist’s local machine, and they needed a way to prevent interruptions or failure of the machine learning deployments.
In an attempt to tackle the second challenge, the team moved these .py files to Google Cloud Platform (GCP), and the outcome was well received by the business and technical teams in the organization. However, once the number of models in production went from one to three and more, the team quickly realized the burden involved in maintaining models. There were too many disconnected datasets and Python files running on the virtual machine, and the team had no way to check or stop the machine learning pipeline. They needed another solution.