The NHS: Scaling AI for Population Health

The NHS uses Dataiku for MLOps, model monitoring, and more.

Attendees of Dr. Joe Zhang’s talk at the London roadshow of the Dataiku 2023 Everyday AI Conferences got a comprehensive look at how he and his colleagues at the National Health Service (NHS) in England have been rapidly prototyping and implementing AI platforms and solutions. Zhang, an AI Platform Owner and Data Scientist with the NHS, is also a healthcare AI academic at Imperial College London.

Everything Starts With Data

Zhang began with the sheer scale of the data ecosystem that the NHS must pair with scaled AI solutions. “In the last year, there were 90,000 data flows across NHS England, originating from 7,000 healthcare providers flowing out to hundreds of entities who are extracting, maintaining, and using patient data,” he said.


This is a vast and incredible ecosystem, and a testament to the advances of digital maturity that has been achieved in the NHS in the last 10 years. This would be unimaginable 20 years ago.


Combined with advances in clinical algorithmic research, this data ecosystem brings the NHS closer to what Zhang called “the vision of personalized healthcare that we’ve been dreaming about for at least a decade.” The vision is clear: instead of merely describing the prevalence of disease in a population, the NHS can start to predict disease for individuals. “For any individual for any given point in time, we can identify what the optimum therapy and intervention for that patient might be.”

In practice, because healthcare is almost always resource-constrained, this means the NHS is in the early stages of forecasting demand and allocating resources to meet it, and of targeting preventative healthcare with greater precision.

Silos Within Silos: Working Through Fragmentation

These goals will not be met without difficulty, as Zhang went on to explain. “The data flows are rich in data generation,” but “if you’re trying to do something as big as population health AI and scaling it in the NHS, you’re dealing with thousands of different organizations with their own data control and competing hurdles of perfectly valid interests.” NHS data is plentiful but fragmented, which makes it hard to build unified infrastructure or comprehensive end-to-end processes.

A lot of the clinical AI work is focused in academia — that’s a far distance away from where the action usually happens: at the front lines of clinical care.

Working the Problem: Vertically Integrating AI

The NHS is tackling this challenge with several teams working on different solutions in parallel, some of them borrowed from seemingly unrelated industries. “We’ve taken a leaf out of the enterprise approach with vertical integration,” Zhang said. He described cutting-edge AI and machine learning (ML) work happening in radiomics, a field distinct from radiology that extracts quantitative features from medical images and analyzes them for insights. “We’ve taken all of the elements of the AI lifecycle and brought them together under one roof: everything from defining the need, to developing the products, to collecting the data, to the model development, deployment, post-deployment, and production lifecycle, and measuring impact.”

Choosing the Appropriate Test Bed

To give this holistic data management vision a strong foundation, Zhang and his team needed to select the appropriate geographical area in which to test their models, and they found their answer: northeast London. “We chose northeast London for several reasons. They have very high data maturity, strong and innovative leadership, and a strong data engineering department.”

Zhang described the area’s infrastructure built in Snowflake, where data scientists can ingest linked data from two million patients. “We brought the clinical AI research into the ICB (integrated care board) region and had a local instance of Dataiku that could read the raw data that was being ingested into Snowflake.”

We would use it for data wrangling, for prototyping, for product development, but more importantly, as a deployment platform where we had easy-to-use MLOps and a production infrastructure around the models.

The ICBs are the commissioning bodies in the NHS: they pay for clinical services, including population health pathways and integrated care pathways. With this setup, Zhang and his team had a strong foundation in place. “This puts everything we’re doing close to the intervention arm: identifying need, or proactively anticipating need, and feeding predictions back to those who need it.” He added that the entire infrastructure conforms to the NHS England secure data environment rules and regulations.

Integrating AI Into Multivariate Hospital Load Forecasting

Zhang then turned to what he called the “bread and butter” of healthcare operations, familiar to anyone with even basic knowledge of the industry’s challenges: hospital forecasting. This type of forecasting has been in use in the NHS for about two decades, but it has traditionally been tied to recurring seasonal patterns, namely the flu. The rise in seasonal flu cases is so reliable an indicator that it is used to estimate hospital traffic each year. The transmission of respiratory diseases drives up emergency demand and elective cancellations, which in turn cost the NHS money.

“This all went out the window after COVID, because suddenly we had multiple viruses in the community with competing transmissions and new epidemiological patterns,” Zhang said. Yet even with these new variables in the picture, their AI modeling remained a reliable source of hospital forecasting because of the consistency and accuracy of the data flow. “We had linked and low-latency data flows into a platform where we could employ and deploy deep learning. We used multivariate forecasting with transformer models.”

The NHS was able to use primary care data, including visit data, to track what happened and when, along with patients’ symptoms, disorders, and diseases, all stratified by vulnerability to hospital admission and combined with other environmental variables. The results spoke for themselves, even in the middle of a pandemic.

We could predict what would happen in hospitals two weeks from now, and even a month from now, and what would come through the front door. 
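The talk did not show code, but the data preparation behind multivariate forecasting can be sketched in a few lines. The example below is a minimal illustration, not the NHS implementation: it turns several aligned weekly series into (input window, future target) pairs of the kind a sequence model such as a transformer would be trained on. The function name `make_windows`, the series names, and all numbers are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def make_windows(
    series: Dict[str, Sequence[float]],
    target: str,
    lookback: int,
    horizon: int,
) -> Tuple[List[List[List[float]]], List[float]]:
    """Build supervised examples from aligned time series.

    Each input is a window of `lookback` consecutive time steps, where
    every step holds one value per series (ordered by series name).
    Each label is the target series value `horizon` steps after the
    last step in the window.
    """
    names = sorted(series)
    n = len(series[target])
    if any(len(series[name]) != n for name in names):
        raise ValueError("all series must have the same length")

    inputs: List[List[List[float]]] = []
    labels: List[float] = []
    # t is the index of the first unseen step; the window covers t-lookback .. t-1.
    for t in range(lookback, n - horizon + 1):
        window = [[series[name][i] for name in names] for i in range(t - lookback, t)]
        inputs.append(window)
        labels.append(series[target][t + horizon - 1])
    return inputs, labels


# Hypothetical weekly counts, invented purely for illustration.
weekly = {
    "admissions": [10, 12, 15, 20, 26, 30],
    "gp_visits": [100, 110, 130, 160, 150, 140],
}
X, y = make_windows(weekly, target="admissions", lookback=3, horizon=2)
```

With weekly data, `horizon=2` corresponds to the two-week-ahead prediction Zhang mentioned, and a larger horizon would correspond to the month-ahead case. The model itself is out of scope here.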

Getting Ahead of Major Cardiovascular Events

Zhang opened the next section of his talk with a bold statement. “Here, we can predict who is going to get a heart attack, a stroke, or renal failure over the next three years.” He displayed an ML model that performs with higher precision than the standard NHS models for cardiovascular risk. The model was trained using AutoML on patient data and is supported by data engineering. “We deployed this with a rudimentary but functional MLOps to monitor the data quality, the coding patterns, and the distribution of the data coming in for drift, and we’re also able to monitor predictive performance not just from the population, but in subgroups of the population across the geography in different locations,” he said.
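Zhang did not say which drift measure the team uses. As one common way to monitor "the distribution of the data coming in for drift," the sketch below computes the population stability index (PSI) between a reference sample and a newer sample of a single numeric feature. This is an assumed technique chosen for illustration, not the team's documented method; the function name and the sample values are hypothetical.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference sample (`expected`) and a new sample (`actual`).

    Bins are equal-width over the range of the reference sample; values in
    the new sample that fall outside that range are clamped into the first
    or last bin. Empty bins are floored at `eps` to avoid log(0).
    """
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("reference sample has no spread")
    width = (hi - lo) / bins

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        return [max(c / len(values), eps) for c in counts]

    ref = proportions(expected)
    new = proportions(actual)
    # Each term is non-negative, so PSI is 0 only when the binned shapes match.
    return sum((n - r) * math.log(n / r) for r, n in zip(ref, new))


# Hypothetical feature values, invented for illustration.
reference = [i / 100 for i in range(100)]        # spread evenly over 0.00 to 0.99
incoming = [0.5 + i / 200 for i in range(100)]   # concentrated in the upper half
psi = population_stability_index(reference, incoming)
```

A PSI of zero means the binned distributions are identical, and larger values mean a larger shift. A commonly quoted rule of thumb treats values above roughly 0.25 as a significant shift, but any alert threshold should be tuned to the feature being monitored.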

He emphasized the partnership with Dataiku that helped to make these models possible. “This was all put into one Dataiku environment in production and the outputs we could use to predict risk at a practice level, but also at an individual level. Because we have access to the health record,” he added, “we can identify methods of treatment optimization.”

Population Healthcare With Precision

Another use case Zhang presented was the NHS effort to anticipate the trajectory and outcomes of patients with multi-morbidity, a condition in which a patient has two or more long-term chronic illnesses. The NHS used a dataset of 20,000 disease concepts mapped to the recorded lifetimes of two million individuals in the region. With this enormous dataset, his team used ML to group disease concepts with similar meaning, then to subdivide them into groups that indicate different patterns of progression and different outcomes.

“We can look into these groups and see how these groups have a trajectory of disease in multi-morbidity and divide them by key demographic groups, like ethnicity and deprivation. We can then use this data to model the onset not just by population, but also by region and key population subgroups that we can target with preventative healthcare interventions.” Take a closer look at this kind of analysis with the Dataiku solution for Social Determinants of Health.

This is an example of what happens when you take very good data and good infrastructure, and bring the type of work that is usually found in clinical research in close proximity to population health and potential interventions.

Looking Ahead: The London Health Data Service

Zhang also shared an encouraging development in the region where his team is focusing its AI efforts. “Northeast London hosts what we call the London Health Data Service, which will come into operation at the end of this year or early next [2024 at the time of this writing].” The service will ingest data from almost 2,000 healthcare providers covering a population of ten million patients.

The service represents the next phase in NHS analysis and insight. It is powered by low-latency data, supported by a strong set of data governance agreements with providers and by widespread public outreach, and has been developed by a well-staffed expert team and stakeholders across London. “The idea is to have one central data layer to provide data as a service not just to the five ICBs across London, but to the subnational secure data environment, which will be one of the main nodes in the country for clinical and life sciences research going forwards.”

Personalized Healthcare at Scale

Despite the positive outlook and the work already underway, Zhang believes some areas can still be improved, particularly around sustainability and regulation. “We’ll be working with regulators to make sure that we can still produce models responsibly to population needs.”

He believes that responsibility goes beyond regulation, however. “It means embedding ethical AI and fairness concepts into processes that we can deploy on the ground. It’s a strong production lifecycle where we can monitor models, their safety, and their impact and fairness, and respond to them in an effective fashion.” Keeping a staff of experts from recognized academic centers in the city brings a diversity of viewpoints that should help ensure quick, intelligent monitoring and response.

The NHS can potentially build up a library of these models which are trained by the NHS, developed for the NHS, on NHS data, for NHS patients, that we can expand across the region, but also take to other regions, adapt them for that population, and deploy them there.

The ongoing validation of the models they create can only be a benefit over time, not just for population health, but also for further advocacy of AI and ML technology in large populations.

Zhang fully recognizes the success the NHS has had in recent years, but also knows what lies ahead for them. “A promising road,” he said, “but a long way to go.”



