Whataburger has an existing high-visibility dashboard that reports trends in the most recent six months of online and internal customer reviews. To capitalize on the momentum of Generative AI, the team at Whataburger wanted to use Large Language Models (LLMs) to build the dataset underlying the dashboard.
We sat down with Ramon Avendano (Sr. Manager, Enterprise Information Solutions), Chase Thompson (Manager, Data Science and Business Insights), and Margarita Shultz (Sr. Cloud Services Information Architect) from Whataburger to hear how the team did so with the help of Dataiku. The result: the team can not only run sentiment analysis to track review trends over time, but also easily share the dashboard with business executives to show how restaurants are performing on service and to pinpoint areas for improvement.
Getting Into the Details
Before Dataiku, the Whataburger team was using a simple bag-of-words approach to sentiment analysis. The approach involves representing text by counting the frequency of words in a document, creating a numerical vector that captures the word distribution, and using those vectors as input for machine learning models to classify sentiment. Importantly, the term bag-of-words implies disregarding the order of words, looking only at their frequency or presence.
Whataburger used this approach to assign a category to each review, assign a polarity score (positive or negative), and extract keywords to populate the dashboard. This older, less sophisticated bag-of-words method was problematic for Whataburger because it looks at words in isolation. If a review contained the word “bad,” for example, the whole review was classified as negative, regardless of context. The bag-of-words approach also had to be maintained by hand, with analysts constantly revising word lists to keep customer reviews categorized correctly.
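To make the limitation concrete, here is a minimal sketch of a naive bag-of-words sentiment check. The word lists and review texts are purely illustrative (not Whataburger's actual model), but the failure mode is the real one described above: because word order and context are discarded, any review containing a "negative" word gets flagged.

```python
from collections import Counter

# Illustrative word lists only; a real system would learn weights
# from labeled reviews rather than use hand-picked sets.
NEGATIVE_WORDS = {"bad", "slow", "cold", "rude"}
POSITIVE_WORDS = {"great", "fresh", "friendly", "fast"}

def bag_of_words(text: str) -> Counter:
    """Represent text as word frequencies, discarding word order."""
    return Counter(text.lower().split())

def naive_sentiment(text: str) -> str:
    counts = bag_of_words(text)
    neg = sum(counts[w] for w in NEGATIVE_WORDS)
    pos = sum(counts[w] for w in POSITIVE_WORDS)
    return "negative" if neg > pos else "positive"

# Word order (and therefore context) is lost, so a review that
# actually praises the food is misclassified because it happens
# to contain the word "bad".
print(naive_sentiment("the burger was not bad at all"))  # negative
```

The negation in "not bad" is invisible to the model, since the bag-of-words vector only records that "not" and "bad" each occurred once.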
The team decided to partner with Dataiku to use LLMs to generate the same dataset that feeds into their existing dashboard. This way, they can actually capture the sentiment of the full sentence, and, notably, do so with no code required, making the workflow accessible to a wider group than just technical experts. Another benefit of the LLM approach is that Whataburger can add or remove review categories simply by adjusting the prompt, without changing the pipeline or any code.
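The idea that categories live in the prompt rather than in code can be sketched as follows. The category names, the three tasks, and the output format here are hypothetical placeholders, not Whataburger's actual prompt; the point is that adding or removing a category only means editing one list.

```python
# Hypothetical category list: editing it changes the classifier's
# behavior without touching the rest of the pipeline.
CATEGORIES = ["food quality", "service", "cleanliness", "wait time"]

def build_prompt(review: str) -> str:
    """Assemble a single prompt covering category, polarity, and keywords."""
    return (
        "Classify the customer review below.\n"
        f"Choose exactly one category from: {', '.join(CATEGORIES)}.\n"
        "Return JSON with keys: category, polarity (positive or negative), "
        "and keywords (a list of strings).\n\n"
        f"Review: {review}"
    )

prompt = build_prompt("The fries were cold but the cashier was friendly.")
print(prompt)
```

Because all three questions (category, polarity, keywords) ride in one prompt, one LLM call per review can produce the full dashboard record.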
Comparing Different LLMs: Which Reigned Triumphant?
A unique aspect of this use case is that it allowed Whataburger and Dataiku to compare different LLMs (both proprietary and open source) on Dataiku infrastructure to determine what worked best. Together, they tried a few open-source models (notably Dolly and Falcon) as well as GPT.
Without fine-tuning, Dolly and Falcon were not performant enough to be useful: they could not handle complex prompt instructions, often invented new categories, frequently ignored the required output format, and the categories they did assign were often incorrect. The winner was GPT, which was able to handle all three questions (category, polarity, and keywords) in a single query.
The comparison happened in Prompt Studios, Dataiku's full-featured development environment for testing and iterating on prompts, comparing prompts and LLMs against each other, and deploying prompts as recipes for large-scale batch generation. Prompt Studios lets Whataburger check whether the LLM is returning good responses as they iterate, whereas their previous approach offered no way of checking accuracy at all.
Business Impact: LLM-Powered Dashboard for Combing Thousands of Reviews
Whataburger has over 15 million reviews in total, with over 10,000 new online reviews coming in each week, predominantly from social media, so the sheer amount of data to review and analyze is significant. The good news is that, with Dataiku, the solution is entirely visual, code-free, and easy to manage, leaving the Whataburger team more time to focus on using the reviews to improve the customer experience.