Western Digital: Smarter Email Categorization With NLP

Western Digital built an NLP-based system in Dataiku to categorize and better understand their emails. The results: greater efficiency (100 employee hours saved per month), shorter response times, and higher customer satisfaction.


10,000 emails per week auto extracted, analyzed, & labeled


80%+ accuracy in email categorization


17% reduction in email traffic thanks to actions learned from data insight


Before: Manual & Time-Intensive Processes

The logistics control tower team at Western Digital uses a global personal distribution list (PDL) email address for both internal and external communications. People send emails to this PDL on a range of topics, from shipping reports to shipment location queries, loss and damage, delivery and invoicing issues, and more. As a result, traffic averages around 8,000-10,000 emails per week, and it climbs even higher at the end of each quarter.

Unfortunately, such massive email traffic created issues for Western Digital, including:

  • Massive time expenditure required to go through all emails, which also meant low working efficiency.
  • Neglected or delayed emails, sometimes on critical messages.
  • Limited ability to assess email response time.
  • Limited ability to prioritize follow-up actions.

Previously, the logistics control tower team at Western Digital tried to analyze those thousands of emails manually, taking two to three employees over two weeks to sort, categorize, annotate, and evaluate.

With Dataiku, they found a better way: automatically sorting the emails by topic (category) with high accuracy. From there, once they understood the email categories and sender profiles, they could identify hot and critical issues faster and take corrective actions, ultimately reducing response time and raising customer satisfaction.

Today, the natural language processing (NLP)-based email categorization system Western Digital built using Dataiku is:

  • Auto extracting, analyzing, and labeling 10,000 emails per week.
  • Achieving categorization accuracy of more than 80%.
  • Reducing email traffic by 17% thanks to actions taken from data insight (that’s 100 employee hours saved per month).
  • Cutting email response time by 20+ hours.

Here’s a more detailed look at how they built this solution in Dataiku.

After: Automated Process for More Control & Happier Customers

The logistics control tower team at Western Digital built the solution to their email challenge in collaboration with data scientists in the advanced analytics and logistics teams. The all-in-one solution for text analysis and data visualization allows the team to sort emails by topics, quantify the average response time spent on each category, and identify major internal and external service requesters per customer profile. 
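
The two metrics named above (average response time per category and the most frequent requesters) can be illustrated with a minimal Python sketch. This is not Western Digital's implementation: the record fields (category, sender, received, replied), the sample emails, and the function names are all hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical labeled email records; every field name and value is illustrative.
emails = [
    {"category": "shipment location", "sender": "carrier-a@example.com",
     "received": "2023-01-02 08:00", "replied": "2023-01-02 20:00"},
    {"category": "shipment location", "sender": "carrier-a@example.com",
     "received": "2023-01-03 09:00", "replied": "2023-01-03 13:00"},
    {"category": "invoicing", "sender": "finance@example.com",
     "received": "2023-01-03 10:00", "replied": "2023-01-04 10:00"},
]

FMT = "%Y-%m-%d %H:%M"


def response_hours(email):
    """Hours elapsed between receiving an email and replying to it."""
    received = datetime.strptime(email["received"], FMT)
    replied = datetime.strptime(email["replied"], FMT)
    return (replied - received).total_seconds() / 3600


def average_response_by_category(records):
    """Map each category to its mean response time in hours."""
    hours = defaultdict(list)
    for record in records:
        hours[record["category"]].append(response_hours(record))
    return {category: mean(values) for category, values in hours.items()}


def top_requesters(records, n=3):
    """Return the n most frequent senders with their email counts."""
    return Counter(record["sender"] for record in records).most_common(n)


avg_hours = average_response_by_category(emails)
requesters = top_requesters(emails)
```

With these sample records, avg_hours works out to 8.0 hours for shipment location and 24.0 hours for invoicing, and the top requester is carrier-a@example.com with two emails.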

In practice, they leveraged several key Dataiku capabilities, as well as plugins and connectors, that enabled them to build and maintain their solution faster. For example, even though annotating large datasets is challenging and time-consuming, the ML-assisted labeling plugin in Dataiku made it seamless for multiple teams of subject matter experts to collaborate.

In addition, Western Digital used built-in NLP preprocessor library functions in Dataiku such as tokenize text, simplify text, and clear stop words. These functions normalized the text data in a few clicks. Other Dataiku plugins such as named entity recognition, which comes with pre-trained spaCy models, were helpful in extracting insights and understanding the data. These readily available features reduced overall development time and let data scientists focus on analyzing data.
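
For readers unfamiliar with these normalization steps, here is a rough Python sketch of what tokenizing, simplifying, and clearing stop words typically involve. It is not Dataiku's implementation: the function names, the tiny stop-word list, and the sample email text are assumptions made for illustration.

```python
import re
import unicodedata

# Tiny illustrative stop-word list (an assumption); real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "of", "for", "to", "please"}


def simplify(text):
    """Lowercase the text and strip accents from characters."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text):
    """Split simplified text into alphanumeric tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text)


def remove_stop_words(tokens):
    """Drop tokens that carry little topical meaning."""
    return [token for token in tokens if token not in STOP_WORDS]


def preprocess(text):
    """Run the three steps in sequence: simplify, tokenize, clear stop words."""
    return remove_stop_words(tokenize(simplify(text)))


tokens = preprocess("Please advise the LOCATION of shipment #48213, it is délayed!")
```

For the sample sentence, tokens is ['advise', 'location', 'shipment', '48213', 'delayed']: casing, punctuation, the accent, and the stop words are gone, which leaves the words a categorization model can learn from.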

Though data scientists worked on the project, the team also sped up development by leveraging Dataiku AutoML features to build and compare models quickly. Western Digital also used Dataiku MLOps features to configure scenarios that run the data extraction and model inference every week. This ultimately saved the several weeks of development time that building an inference pipeline would otherwise have required.
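
The idea of building several candidate models and comparing them on held-out data can be sketched in plain Python. This does not reproduce Dataiku AutoML or the models Western Digital used: the two candidates (a majority-class baseline and a hand-rolled multinomial Naive Bayes), the toy emails, and all names below are assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter, defaultdict

# Toy labeled emails, invented for illustration only.
TRAIN = [
    ("where is my shipment tracking location", "shipment location"),
    ("shipment location update tracking number", "shipment location"),
    ("invoice amount incorrect payment", "invoicing"),
    ("payment invoice overdue billing", "invoicing"),
    ("package damaged box broken on arrival", "loss and damage"),
    ("lost package damaged goods claim", "loss and damage"),
]
HOLDOUT = [
    ("tracking location of shipment", "shipment location"),
    ("billing invoice question", "invoicing"),
    ("claim for broken goods", "loss and damage"),
]


def train_majority_baseline(samples):
    """Candidate 1: always predict the most frequent training label."""
    top_label = Counter(label for _, label in samples).most_common(1)[0][0]
    return lambda text: top_label


def train_naive_bayes(samples):
    """Candidate 2: multinomial Naive Bayes with add-one smoothing."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in samples:
        class_counts[label] += 1
        for word in text.split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    total_docs = sum(class_counts.values())

    def predict(text):
        scores = {}
        for label, doc_count in class_counts.items():
            denominator = sum(word_counts[label].values()) + len(vocabulary)
            score = math.log(doc_count / total_docs)  # log prior
            for word in text.split():
                # Smoothed log likelihood of each word given the label.
                score += math.log((word_counts[label][word] + 1) / denominator)
            scores[label] = score
        return max(scores, key=scores.get)

    return predict


def accuracy(predict, samples):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(text) == label for text, label in samples) / len(samples)


candidates = {
    "majority_baseline": train_majority_baseline,
    "naive_bayes": train_naive_bayes,
}
scores = {name: accuracy(train(TRAIN), HOLDOUT) for name, train in candidates.items()}
best_model = max(scores, key=scores.get)
```

On this toy data the Naive Bayes candidate beats the baseline, so best_model is "naive_bayes". The scores say nothing about real-world performance. The point is the pattern: train each candidate the same way, score it on emails it has not seen, and keep the best one for the weekly inference run.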

From an end-user perspective, Dataiku allowed the team at Western Digital to easily build visualizations for the extracted metrics. Plugins such as the Tableau Hyper format, which allows for the easy export of data to Tableau Server, were a bonus and will bring continuous improvement and flexibility into the future.
