Dataiku AI Lab

The gap between machine learning research and the way today's organizations actually use machine learning can be wide. Enter: the Dataiku AI Lab. Our team of experts builds bridges between machine learning research and practical business applications to accelerate our customers' journey to Everyday AI.

What Is the Dataiku AI Lab?

The Dataiku AI Lab is a team of researchers that develops robust, generic, state-of-the-art machine learning methods to enable trustworthy, real-life applications. The team's interests are broad, ranging from active learning and uncertainty estimation to data shift, causal ML, and more. Learn about a few of the team’s current projects below.

Current Research: Addressing Dataset Shift

Machine learning model lifecycles are complex and ever-evolving. When data changes or drifts, whether naturally or adversarially, a model’s performance can degrade. It is therefore necessary not only to detect data drift but also to estimate the resulting drop in model performance. Check out the Dataiku AI Lab’s latest papers on this topic:

Ensembling Shift Detectors: an Extensive Empirical Evaluation – ECML PKDD 2021
Performance Prediction Under Dataset Shift – ICPR 2022
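To make the detection half of this problem concrete, here is a minimal sketch of a per-feature drift check in Python. It is not the ensembling method from the papers above: it simply runs one two-sample Kolmogorov–Smirnov test per numeric feature, using the asymptotic critical value and a Bonferroni correction across features. The function names, the dict-of-columns input format, and the example data are illustrative assumptions.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        # Step past the smallest current value in both samples (handles ties).
        x = min(a[i], b[j])
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_critical_value(n, m, alpha):
    """Asymptotic critical value of the two-sample KS test at level alpha."""
    c = math.sqrt(-math.log(alpha / 2) / 2)
    return c * math.sqrt((n + m) / (n * m))


def detect_drift(reference, current, alpha=0.05):
    """Flag features whose distribution differs between two datasets.

    reference, current: dicts mapping a feature name to a list of numeric values
    (a hypothetical input format chosen for this sketch).
    The per-feature level is Bonferroni-corrected so the overall level stays near alpha.
    """
    corrected = alpha / len(reference)
    drifted = {}
    for name in reference:
        ref, cur = reference[name], current[name]
        stat = ks_statistic(ref, cur)
        drifted[name] = stat > ks_critical_value(len(ref), len(cur), corrected)
    return drifted


rng = random.Random(0)
reference = {"age": [rng.gauss(40, 10) for _ in range(500)],
             "income": [rng.gauss(50, 5) for _ in range(500)]}
current = {"age": [rng.gauss(40, 10) for _ in range(500)],
           "income": [rng.gauss(55, 5) for _ in range(500)]}  # mean shifted by one std
print(detect_drift(reference, current))
```

With a one-standard-deviation mean shift injected into `income`, the check should typically flag only that feature. Estimating how much such a shift hurts the model's performance, the second half of the problem, requires a separate method.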


Current Research: Exploring Active Learning

When unlabeled data is abundant but labeling resources are limited, active learning uses the model's current knowledge to select the samples whose labels are expected to improve its performance the most. The Dataiku AI Lab has published the following papers on active learning:

Rebuilding trust in active learning with actionable metrics – ICDM 2020
Sample Noise Impact on Active Learning – IAL workshop at ECML PKDD 2021
Cardinal, a metric-based Active learning framework – SIMPAC journal 2022
OpenAL: Evaluation and Interpretation of Active Learning Strategies – NeurIPS Human in the Loop Workshop 2022

We have also open-sourced our Python package, Cardinal.
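The selection step described above can be sketched with margin-based uncertainty sampling, one common query strategy among many. This is plain Python, not Cardinal's API; `margin` and `select_batch` are hypothetical names, and the hard-coded probabilities stand in for a classifier's predictions on the unlabeled pool.

```python
import heapq


def margin(probs):
    """Gap between the two most likely classes; a small gap means an uncertain model."""
    top2 = heapq.nlargest(2, probs)
    return top2[0] - top2[1]


def select_batch(unlabeled_probs, batch_size):
    """Return indices of the batch_size samples with the smallest margin.

    unlabeled_probs: one list of per-class probabilities per unlabeled sample,
    e.g. the output of a classifier's predict_proba on the unlabeled pool.
    """
    ranked = sorted(range(len(unlabeled_probs)), key=lambda i: margin(unlabeled_probs[i]))
    return ranked[:batch_size]


probs = [
    [0.95, 0.05],  # confident
    [0.55, 0.45],  # uncertain
    [0.70, 0.30],
    [0.51, 0.49],  # most uncertain
]
print(select_batch(probs, 2))  # → [3, 1]
```

In a full loop, the selected samples would be sent for labeling, added to the training set, and the model retrained before the next round of selection.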


Explore More from the Dataiku AI Lab

Data From the Trenches

Read more about recent projects from the Dataiku AI Lab on Data From the Trenches, the team's technical blog.

ML Research, in Practice

The Dataiku AI Lab team discusses the hottest topics in ML research with industry experts in this webinar series.

Dataiku AI Lab Contributions

AI in Product Development and R&D With Michelin

For the past five years, Michelin has been working on incorporating more machine learning into its tire design and testing processes. Léo Dreyfus-Schmidt, VP of Research at Dataiku and head of the AI Lab, sat down with François Deheeger, Senior Fellow AI and Data Science at Michelin, to discuss the details.

READ ON THE BLOG

Using Causal Inference

In an MIT article, computer science researcher Jeannette Wing says that “Causality … is the next frontier of AI and machine learning.” This technical ebook offers a thorough introduction to causal inference and its idiosyncrasies, and explains the dangers of using standard machine learning to infer causal effects.

GET THE TECHNICAL EBOOK
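As a small illustration of why standard machine learning can mislead when estimating causal effects, the toy simulation below (not taken from the ebook) lets a confounder `z` drive both a treatment `t` and an outcome `y`, with a true treatment effect of 2.0 by construction. The data-generating process and all numbers are assumptions chosen for the example. The adjusted estimate stratifies on `z`, a simple form of backdoor adjustment that is valid here only because `z` is the sole confounder and is observed.

```python
import random


def simulate(n, seed=0):
    """Toy data where confounder z raises both the chance of treatment t and outcome y.

    The true causal effect of t on y is 2.0 by construction (an assumption of this sketch).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5
        t = rng.random() < (0.8 if z else 0.2)
        y = 2.0 * t + 5.0 * z + rng.gauss(0, 1)
        rows.append((z, t, y))
    return rows


def mean(values):
    return sum(values) / len(values)


def naive_effect(rows):
    """Difference in mean outcome between treated and untreated rows, ignoring z."""
    return mean([y for _, t, y in rows if t]) - mean([y for _, t, y in rows if not t])


def adjusted_effect(rows):
    """Average the within-stratum treated/untreated differences, weighted by P(z)."""
    total = 0.0
    for z_val in (False, True):
        stratum = [(t, y) for z, t, y in rows if z == z_val]
        diff = mean([y for t, y in stratum if t]) - mean([y for t, y in stratum if not t])
        total += diff * len(stratum) / len(rows)
    return total


rows = simulate(20_000)
print(f"naive: {naive_effect(rows):.2f}, adjusted: {adjusted_effect(rows):.2f}")
```

The naive difference in means converges to about 5.0, that is 2 + 5 × (0.8 − 0.2), because treated units are more likely to have `z` set; the stratified estimate lands close to the true 2.0.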

How to Measure Dataset Similarity

Measuring the similarity between two datasets is critical in many ML applications, such as detecting dataset shift and evaluating its impact on a model’s performance. This article describes various dataset similarity measures and how they can be used to detect distribution shift and estimate the resulting drop in model performance.

READ ON MEDIUM
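One widely used dataset similarity measure is the Maximum Mean Discrepancy (MMD). The sketch below computes an unbiased estimate of squared MMD with a Gaussian (RBF) kernel in plain Python. It is not necessarily one of the measures covered in the article, and the kernel parameter `gamma=1.0` and the synthetic data are arbitrary assumptions. The pairwise loops are O(n²), so this version only suits small samples.

```python
import math
import random


def rbf(x, y, gamma):
    """Gaussian (RBF) kernel between two equal-length numeric vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2(X, Y, gamma=1.0):
    """Unbiased estimate of squared Maximum Mean Discrepancy with an RBF kernel.

    X, Y: lists of equal-length numeric vectors (the rows of two datasets).
    Close to 0 when both samples come from the same distribution; grows with shift.
    """
    n, m = len(X), len(Y)
    kxx = sum(rbf(X[i], X[j], gamma) for i in range(n) for j in range(n) if i != j)
    kyy = sum(rbf(Y[i], Y[j], gamma) for i in range(m) for j in range(m) if i != j)
    kxy = sum(rbf(x, y, gamma) for x in X for y in Y)
    return kxx / (n * (n - 1)) + kyy / (m * (m - 1)) - 2 * kxy / (n * m)


rng = random.Random(0)
reference = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
same = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
shifted = [[rng.gauss(1, 1), rng.gauss(1, 1)] for _ in range(200)]  # every feature shifted by 1
print(f"same: {mmd2(reference, same):.4f}, shifted: {mmd2(reference, shifted):.4f}")
```

The unshifted comparison should come out near zero, while shifting every feature by one unit gives a clearly positive value. In practice, such a score is usually paired with a permutation test to decide whether an observed difference is significant.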

Meet the Team

Alexandre – Senior Research Scientist

Simona – Senior Research Scientist