While the masses of blogs, webinars, ebooks, and LinkedIn posts continue to center on GenAI and/or AI Governance, I realized I had neglected to celebrate and highlight two new Dataiku Solutions dedicated to life sciences R&D that we released earlier this year!
Don’t get me wrong, GenAI and AI Governance are incredibly important topics (and rest assured, I include some stellar GenAI clickbait in the latter half of this blog), but analytics and machine learning (ML) applications continue to accelerate rapidly and see mainstream use in life sciences.
About two years ago, I wrote a blog about ML in drug discovery, translational sciences, and clinical trials, enumerating how predictive analytics can:
Our AI solutions team has added two new Dataiku Solutions to our healthcare & life sciences catalog that exemplify these points, and I’m overdue (and excited) to share them!
The Dataiku Solution for Molecular Property Prediction allows computational chemists to query previously studied small molecules (from public chemical databases) with known bioactivity for a given protein target. These molecules are used to train predictive models of bioactivity based on their chemical structure. Novel compounds are then scored by the model for their predicted bioactivity.
This in-silico discovery process can improve the success and efficiency of the drug development cycle by prioritizing which lead candidates to test experimentally, helping bring a stable compound quickly to preclinical and clinical testing. The solution also serves as a template for extending predictive analytics to other molecular properties (such as ADMET), using key features of the molecules' structures to accelerate the discovery and development of new compounds in your pipeline.
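To make the train-then-score loop concrete, here is a minimal Python sketch. It is not the solution's actual pipeline: it swaps in a simple similarity-weighted k-nearest-neighbours regressor using Tanimoto similarity on fingerprint bit sets, and every compound, fingerprint bit, and pIC50-style activity value below is invented for illustration. In practice, fingerprints would be computed from structures with a cheminformatics toolkit (for example, Morgan fingerprints in RDKit), and the model would usually be more capable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """A molecule represented by the set of 'on' bits in a structural fingerprint."""
    name: str
    fingerprint: frozenset


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def predict_activity(query: Compound, training: list, k: int = 3):
    """Similarity-weighted average activity of the k most similar known compounds.

    Returns None when the query shares no fingerprint bits with its neighbours.
    """
    neighbours = sorted(
        ((tanimoto(query.fingerprint, c.fingerprint), activity) for c, activity in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return None
    return sum(sim * act for sim, act in neighbours) / total


# Toy training set: invented fingerprint bits and pIC50-style activities.
training = [
    (Compound("known_A", frozenset({1, 2, 3, 5, 8})), 7.2),
    (Compound("known_B", frozenset({1, 2, 3, 4})), 6.8),
    (Compound("known_C", frozenset({10, 11, 12})), 4.1),
    (Compound("known_D", frozenset({2, 3, 5, 9})), 6.5),
]

# Score hypothetical novel compounds, then rank those with a prediction.
novel = [
    Compound("novel_1", frozenset({1, 2, 3, 5})),
    Compound("novel_2", frozenset({10, 11, 13})),
]
scores = {c.name: predict_activity(c, training) for c in novel}
ranked = sorted((n for n in scores if scores[n] is not None), key=scores.get, reverse=True)
print(ranked, scores)
```

Ranking novel compounds by predicted activity corresponds to the prioritization step described above; under the same assumptions, replacing the activity label with another measured property would be one way to sketch the ADMET extension.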
Key Features:
The Dataiku Solution for Clinical Site Intelligence uses the ClinicalTrials.gov database of nearly 500,000 studies worldwide to uncover similar or competing studies and the clinical sites they used, guide site review and selection for new studies, and provide sponsor overviews, equipping clinical trial planning and operations teams with comprehensive analytics and modeling insights. The solution is also a powerful starting point: integrating further internal and external data sources opens up opportunities to improve protocol design, clinical feasibility, and clinical site selection strategies.
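One way to picture the "find similar studies, then look at the sites they used" idea is the Python sketch below. It assumes each study has already been reduced to a set of keywords (such as conditions and interventions) plus a list of sites; the `Study` fields, the Jaccard overlap score, and the weighting scheme are illustrative choices, not the ClinicalTrials.gov schema or the solution's own logic, and all records are invented.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Study:
    study_id: str                                # placeholder IDs, not real NCT numbers
    keywords: set = field(default_factory=set)   # e.g. conditions and interventions
    sites: list = field(default_factory=list)    # sites that ran the study


def jaccard(a: set, b: set) -> float:
    """Overlap between two keyword sets (0 = nothing shared, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_sites(planned_keywords: set, studies: list, top_k: int = 3) -> list:
    """Rank sites by how often, and how closely, they ran studies similar to the planned one."""
    similar = sorted(
        studies, key=lambda s: jaccard(planned_keywords, s.keywords), reverse=True
    )[:top_k]
    scores = Counter()
    for study in similar:
        weight = jaccard(planned_keywords, study.keywords)
        if weight == 0:
            continue  # ignore studies with no keyword overlap at all
        for site in study.sites:
            scores[site] += weight
    return scores.most_common()


# Invented toy records standing in for parsed registry data.
studies = [
    Study("STUDY-A", {"type 2 diabetes", "metformin"}, ["SITE-001", "SITE-002"]),
    Study("STUDY-B", {"type 2 diabetes", "insulin"}, ["SITE-001", "SITE-003"]),
    Study("STUDY-C", {"asthma", "inhaled corticosteroid"}, ["SITE-002"]),
]
ranking = rank_sites({"type 2 diabetes", "metformin"}, studies)
print(ranking)
```

Weighting each site by study similarity, rather than simply counting appearances, keeps a site that ran one closely matching study ahead of a site that only ran loosely related ones.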
Key Features:
If you have made it this far through the “traditional analytics and ML” section and are waiting for that GenAI cherry — congrats, you’ve arrived!!
Dataiku has worked hard to match the rapid technology development around GenAI with our LLM Mesh capabilities. LLMs can extend traditional analytics and ML pipelines quite beautifully, broadening their business reach and value. Common usage patterns we see include using LLMs to extract structured data points from documents and text (which can feed into and improve predictive models), generating content or self-service insights from both input data and analytic models, and building conversational AI-powered chatbots with retrieval-augmented generation (RAG) over dedicated structured data and/or document stores.
Naturally, we jumped right on the bandwagon by extending many of the Dataiku Solutions in our catalog with GenAI features! We’ve built extensions that use LLMs with Dataiku Answers for both our Molecular Property Prediction and Clinical Site Intelligence Solutions.
The clinical trials database naturally contained a large amount of text summarizing and describing trials, contacts and site information, and patient eligibility criteria. Using our Clinical Site Intelligence Solution, we combined the data extracted by its queries with analytics on predicted study enrollment rates to create a trial assistant chatbot.
These data points were embedded into a knowledge bank that powers Answers, our no-code visual chatbot web application, and voilà! We now have an extension, which we are evaluating for performance, that offers a conversational experience around trial design, competitive intelligence, and site planning or selection, sourced from both ClinicalTrials.gov and the analytic insights provided by the Dataiku Solution for Clinical Site Intelligence.
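The retrieval step behind a knowledge-bank chatbot can be sketched in a few lines of Python. This is a generic retrieval-augmented generation (RAG) pattern, not Dataiku Answers' implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the knowledge-bank chunks are invented, and the final LLM call is left out.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lower-cased token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, knowledge_bank: list, k: int = 2) -> list:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(knowledge_bank, key=lambda chunk: cosine(q, embed(chunk)), reverse=True)[:k]


def build_prompt(question: str, context: list) -> str:
    """Ground the LLM in retrieved context; the LLM call itself is out of scope here."""
    bullets = "\n".join(f"- {chunk}" for chunk in context)
    return f"Answer using only the context below.\n\nContext:\n{bullets}\n\nQuestion: {question}"


# Invented toy chunks standing in for embedded trial text and analytic outputs.
knowledge_bank = [
    "Study STUDY-A enrolls adults with type 2 diabetes at 12 sites.",
    "Eligibility for STUDY-B excludes patients with prior insulin use.",
    "Predicted enrollment rate for STUDY-A is 3 patients per site per month.",
]
question = "What is the predicted enrollment rate for STUDY-A?"
prompt = build_prompt(question, retrieve(question, knowledge_bank, k=1))
print(prompt)
```

Because analytic outputs such as enrollment predictions are stored as text chunks alongside the trial descriptions, the same retrieval step can surface either kind of insight for a given question.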
Moving on to Molecular Property Prediction, here we had only a well-structured dataset (but note all those column descriptions stored in the Dataiku metadata).
There’s no reason we can’t have a conversation with a well-described dataset, is there? New capabilities we’re releasing with Dataiku Answers now allow you to do exactly that!
Instead of a knowledge bank, let’s use an SQL table for retrieval.
Now, let’s ask some questions and discover key details around bioactivity with previously studied molecules for our protein target of interest!
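Retrieval from a well-described table typically works in a text-to-SQL style: the model sees the column descriptions, writes a query, and the query results ground the answer. The Python sketch below assumes that pattern; it is not how Dataiku Answers is implemented, the table, column descriptions, and values are hypothetical, and the hand-written `generated_sql` stands in for the LLM's output.

```python
import sqlite3

# Hypothetical column descriptions, playing the role of metadata stored with the dataset.
COLUMN_DESCRIPTIONS = {
    "molecule_id": "Identifier of a previously studied small molecule",
    "smiles": "SMILES string describing the compound structure",
    "target": "Protein target the bioactivity was measured against",
    "pic50": "Measured bioactivity as pIC50 (higher means more potent)",
}


def schema_context(table: str, descriptions: dict) -> str:
    """Render the table's columns and descriptions as text an LLM could use to write SQL."""
    lines = [f"Table {table} has columns:"]
    lines += [f"- {col}: {desc}" for col, desc in descriptions.items()]
    return "\n".join(lines)


def run_generated_query(conn: sqlite3.Connection, sql: str) -> list:
    """Run a model-generated query; this naive check only lets SELECT statements through."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    return conn.execute(sql).fetchall()


# Invented toy data; real values would come from public bioactivity databases.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bioactivity (molecule_id TEXT, smiles TEXT, target TEXT, pic50 REAL)")
conn.executemany(
    "INSERT INTO bioactivity VALUES (?, ?, ?, ?)",
    [
        ("MOL-1", "CCO", "TARGET_X", 5.1),
        ("MOL-2", "c1ccccc1O", "TARGET_X", 7.4),
        ("MOL-3", "CC(=O)O", "TARGET_Y", 6.0),
    ],
)

question = "Which molecules are most potent against TARGET_X?"
context = schema_context("bioactivity", COLUMN_DESCRIPTIONS)
# An LLM would receive `context` and `question`; here its SQL answer is written by hand.
generated_sql = (
    "SELECT molecule_id, pic50 FROM bioactivity "
    "WHERE target = 'TARGET_X' ORDER BY pic50 DESC"
)
rows = run_generated_query(conn, generated_sql)
print(rows)
```

Passing column descriptions rather than bare column names helps the model map a question about "potency" onto the `pic50` column, which is one reason well-maintained metadata matters for this kind of retrieval.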
In my experience working across many healthcare and life sciences organizations, building AI-powered chatbots can be one of the most pragmatic ways to begin delivering real business value in your GenAI initiatives and strategies. Examples like the applications above immediately expand the reach and utility of insights generated from valuable scientific research and clinical documents.
Of course, there are many further ways these solutions could be extended with GenAI. For example, with Clinical Site Intelligence, you could generate clinical site summary reports for site selection review. With Molecular Property Prediction, scientists and biotech firms are already pushing the boundaries of GenAI to create recommendation engines that propose new molecular structures as lead candidates. When we begin to holistically embed these new “shiny” tools into our existing toolbox of robust data rules, analytics, and ML, the possibilities truly become limitless.