Clinical Trial Explorer

Build Generative AI-driven insights from clinical trial data using natural language queries.

With limited bandwidth for analysis and potentially limited access to data sources, clinical trial teams can inadvertently overlook important insights. By making all data available in one place and leveraging Generative AI, in this case large language models (LLMs), healthcare and pharmaceutical professionals can easily query a robust collection of clinical trial data by asking simple questions like:

  • What are the companies sourcing the most HIV infection-related studies?
  • What are the sites with the highest enrollment rate in the United States?
  • What health conditions are related to studies conducted in California over the past year?

Feature Highlights

  • More Effective Trials: With real-time responses in natural language, teams can understand clinical trial patterns faster.
  • Proactively Identify Issues: Quickly identify patterns such as spikes in health conditions related to trials to mitigate risk.
  • Uncover Patterns: Analyze massive amounts of data with simple queries to gain insights that would previously have been out of reach.
  • Increase Collaboration and Efficiency: All members of clinical trial teams gain the ability to ask the questions most relevant to their roles, increasing team effectiveness.

How It Works: Architecture

A Dataiku project combines public and private clinical trial data and automates end-to-end data processing and metrics building. This project includes a dashboard that lets clinical trial teams focus on their most common questions.

With the addition of LLMs, these teams can now interact with the trial data project by entering questions in natural language into an application. The answer (and related visualizations) is generated in the user's own language.

Upon receiving a query, the model generates a set of Dataiku instructions that are executed locally to build the required dashboard, giving the user a response tailored to their question. Because the model produces instructions rather than reading the data itself, it stays relevant regardless of the size or complexity of the underlying data. This approach also preserves data privacy, as no actual data values are transmitted to the model during the process.
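The flow described above can be illustrated with a minimal Python sketch. It is not the Dataiku implementation: the sample records, the instruction format, and the functions build_prompt, fake_llm, and execute_locally are all hypothetical, and fake_llm is a stand-in that returns a canned instruction instead of calling a real model. The point it illustrates is that only column names reach the model, while the generated instruction runs against data that stays local.

```python
import json
from collections import Counter

# Hypothetical local trial records; in this sketch they never leave the machine.
TRIALS = [
    {"sponsor": "Acme Pharma", "condition": "HIV Infections", "state": "California"},
    {"sponsor": "Acme Pharma", "condition": "HIV Infections", "state": "Texas"},
    {"sponsor": "Beta Bio", "condition": "HIV Infections", "state": "California"},
    {"sponsor": "Beta Bio", "condition": "Asthma", "state": "California"},
]


def build_prompt(question, records):
    """Describe only the column names to the model, never the row values."""
    columns = sorted(records[0].keys())
    return f"Columns: {columns}\nQuestion: {question}\nReply with a JSON instruction."


def fake_llm(prompt):
    """Stand-in for a real LLM call (assumption); returns a canned instruction."""
    return json.dumps({
        "filter": {"column": "condition", "equals": "HIV Infections"},
        "group_by": "sponsor",
        "aggregate": "count",
    })


def execute_locally(instruction, records):
    """Run the generated instruction against the local data."""
    if instruction["aggregate"] != "count":
        raise ValueError("this sketch only supports the 'count' aggregate")
    flt = instruction["filter"]
    matching = [r for r in records if r[flt["column"]] == flt["equals"]]
    counts = Counter(r[instruction["group_by"]] for r in matching)
    return counts.most_common()  # sorted by count, highest first


prompt = build_prompt("Which companies run the most HIV studies?", TRIALS)
instruction = json.loads(fake_llm(prompt))
result = execute_locally(instruction, TRIALS)
print(result)
```

In a real deployment the instruction would be a set of Dataiku operations rather than this toy JSON structure, and it would typically be validated before execution, as the aggregate check here hints at.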

A containerized version of the LLM could offer stricter control over data and input.