Heetch + Dataiku: Developing an Elastic AI Strategy

Heetch uses Dataiku and Kubernetes to process large quantities of data while maintaining performance and controlling costs, ensuring a positive return on investment (ROI) and smooth execution of the hundreds of data projects conducted throughout the organization.

The past few years in AI have been marked by the great migration from on-premises data centers to public cloud infrastructure. Organizations are drawn to the cloud’s promise of pay-as-you-go, usage-based pricing and capacity that is easy to scale up and down as needs change. In practice, however, leveraging big data with good performance and at reasonable costs is easier said than done.

In reality, processing large amounts of data in pipelines that deliver real business value via data science, machine learning, and AI projects requires not only serious computational power, but also optimized resource consumption and isolated environments for development and production. On top of all of this, businesses need to put best practices in place that drive efficiency and cost monitoring — clearly, managing all of these moving parts can get complex quickly for organizations of any size, digital native or not.

Introduction & Challenges

Heetch, a French company founded in 2013, has grown quickly to 250 employees united around one goal: making mobility more accessible by offering a smooth user experience. The company has gathered troves of data from drivers, passengers, global operations, and more since its launch, yet they struggled to scale their ability to actually leverage that data.

Five years in, data warehouse costs were spiraling out of control, and performance was suffering as the amount of data grew. The company needed to find a solution that would allow anyone across the organization to work with large amounts of data while also ensuring optimized resource allocation. In 2019, Heetch chose Dataiku as their single platform for building data pipelines and processing raw data, paired with Looker for the seamless visualization and exploration of those flows.

The Solution

In addition to using Dataiku as a platform where Heetch could centralize knowledge and best practices, the team leveraged Dataiku and Kubernetes to address their primary pain point: leveraging data while maintaining good performance and reasonable costs.

Thanks to Dataiku’s native integration with major cloud vendors’ managed Kubernetes services, Heetch was able to integrate their AWS EKS cluster very quickly and saw a drastic increase in value from their data. Teams can now easily offload resource-intensive workloads, such as large Python and R jobs, and use the EKS cluster to distribute compute and run Spark jobs. Because Dataiku abstracts away the complexity, this functionality is accessible to any Heetch employee, regardless of their knowledge of distributed computing.

See it in action: Spark on Kubernetes in Dataiku

However, unlimited power does not mean the organization wanted unlimited spending and surprise AWS bills. Calculating ROI on data projects means including both hardware and software costs, so Heetch also wanted to use Dataiku to optimize resource consumption. Heetch therefore set up separate CPU-intensive and memory-intensive clusters, so that each job runs on the type of cluster best suited to it, improving both user experience and computation speed.
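As a rough illustration of routing jobs by resource profile, the sketch below picks a cluster type from a job's requested memory-to-CPU ratio. It is not Heetch's or Dataiku's actual logic: the cluster names, the `JobProfile` type, the `choose_cluster` function, and the 4 GiB-per-core threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobProfile:
    """Hypothetical description of a job's resource request."""
    name: str
    cpu_cores: float   # requested vCPUs
    memory_gib: float  # requested memory in GiB


# Hypothetical cluster names; a real setup would use its own identifiers.
CPU_CLUSTER = "cpu-optimized"
MEMORY_CLUSTER = "memory-optimized"

# Assumed cutoff: jobs asking for more than this much memory per core
# are treated as memory-bound.
MEMORY_PER_CORE_THRESHOLD_GIB = 4.0


def choose_cluster(job: JobProfile) -> str:
    """Return the cluster type that matches the job's memory-to-CPU ratio."""
    if job.cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    ratio = job.memory_gib / job.cpu_cores
    if ratio > MEMORY_PER_CORE_THRESHOLD_GIB:
        return MEMORY_CLUSTER
    return CPU_CLUSTER


if __name__ == "__main__":
    jobs = [
        JobProfile("model-training", cpu_cores=16, memory_gib=32),
        JobProfile("wide-join", cpu_cores=4, memory_gib=64),
    ]
    for job in jobs:
        print(f"{job.name} -> {choose_cluster(job)}")
```

A single ratio threshold is the simplest possible rule. In practice the choice would more likely be made per project or per recipe type, with cost monitoring used to adjust it over time.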

Long-Term Results

Since moving to Dataiku, Heetch has seen less frustration now that the data warehouse is no longer a bottleneck, along with a notably faster time to market for its data projects and a greater ROI thanks to cost control. The team has launched hundreds of data projects with Dataiku, ranging from feature stores and ETL projects to real-time fraud detection, churn prediction, route optimization, passenger/driver segmentation, pricing models, marketing attribution tracking, and more.

Ultimately, Dataiku has allowed Heetch not only to transform its ability to leverage elastic resources, but to uplevel its overall AI maturity:

  • Heetch now has a unified data platform on which different people work in parallel on data projects, from data engineers optimizing flow execution to data scientists working on advanced machine learning and deep learning models and operations staff gaining autonomy over data access and processing through self-service analytics.
  • Data is accessible thanks to Dataiku’s abstraction layer and leverageable with appropriate infrastructure (EKS) that is robust, elastic, and scalable.
  • Collaboration and knowledge sharing have been drastically improved, which was especially important for Heetch during remote working in 2020.
  • Operationalization has benefited the entire organization with more than 100 projects in production on the automation node, running on a regular basis and driving daily business processes.

Full Elasticity as the Future of AI

There is no question that elasticity, including on-demand compute resource management, is the future of Enterprise AI.

