Driving Increased Efficiency in Pharmaceuticals

Data science, machine learning, and AI have the capacity to add tremendous value to the pharmaceutical industry, particularly in research and development.

The pharmaceutical industry is at a crossroads. After a century of rapid progress in the development of new medications, the discovery of new drugs has slowed down significantly and the process of developing new pharmaceuticals has become more expensive. At the same time, the regulatory environment has become more challenging, demanding far more extensive testing before drugs can go to market.

Drug companies have a strong incentive to reduce R&D spending, both to free up funds for additional ventures and to offer lower prices for their products. Sophisticated data science can help researchers save money and time in R&D, supply chain management, and manufacturing in a number of ways.


High-Value Use Cases

Identifying patients for clinical trials. Data science, machine learning, and AI can make the clinical trial process more efficient in two ways. First, they can more quickly and precisely identify patients who would be a good fit for a particular trial, either by analyzing medical records with natural language processing (NLP) or by exploring patients across different geographies and symptom profiles at scale. Second, these techniques can examine how potential trial members’ specific biomarkers and current medications interact, in order to predict the drug’s interactions and side effects and avoid potential complications.

Identifying compounds. The estimated cost of drug development for U.S. biopharmaceutical companies is nearly $1 billion per drug. Instead of relying on trial and error and hoping to land on an eventual hit, which is expensive and inefficient, pharmaceutical companies can use machine learning techniques not only to comb through literature and journal publications using (again) NLP but also to pre-screen for the most promising compounds and prioritize their time accordingly.

Pfizer’s researchers use natural language processing to analyze over a million articles in medical journals, 20 million abstracts of journal articles, and 4 million patents.

Medium, interview with Peter Henstock, Machine Learning & AI Technical Lead, Pfizer

The Future of Computational Biochemistry. Computational biochemistry allows drug-makers to cut out a significant portion of the test tube experiments. Instead, a computer simulates the protein and tests all of its atomic interactions. That analysis will yield a far narrower list of “leads” that researchers can take to the next stage of testing.

Recent experimental techniques (including parallel synthesis of drug-like compounds) have drastically increased the amount of data available for deep learning models. This makes such models adept at bioactivity and synthesis prediction, as well as molecular design and biological image analysis. Deep learning has revolutionized drug discovery because it can factor in everything from possible toxicity risks to new applications for existing drugs, which are then spared the expense of a Phase 1 trial.

Supply chain and manufacturing. Identifying the most efficient supply system by optimizing and automating steps of production will become even more important as drugs are increasingly customized to small numbers of patients with certain genetic profiles. Data science, machine learning, and AI techniques allow pharmaceutical companies to better forecast demand and to distribute products more efficiently. When it comes to manufacturing, pharmaceutical companies can harness machine learning to control rising equipment maintenance costs and pave the way for AI-driven self-maintenance. Predictive maintenance is widely considered to be the obvious next step for any business with high-capital assets.

Dataiku for Pharmaceuticals

Dataiku helps streamline the pharmaceutical R&D process and enables robust NLP for clinical trial patient selection and identifying compounds. The platform offers a central, collaborative environment for the major steps including:

  • Pre-processing data: Cleaning usually involves breaking the text into words or chunks of words (tokenization), removing words that carry little meaning on their own (stop words such as a, the, an), making the data more uniform (for example, changing all words to lowercase), and grouping words into pre-defined categories such as the names of persons (entity extraction). Done manually, this process can take an inordinate amount of time, but Dataiku makes data cleaning and prep easy.
  • Vectorization (or “embedding”): After pre-processing, the text is transformed into numerical vectors, which machine learning algorithms can operate on efficiently. Dataiku leverages popular embedding methods such as word2vec to make this straightforward.
  • Testing: Once a baseline project has been created, Dataiku enables users to test its prediction accuracy using cross-validation, a model validation technique that repeatedly divides the data into training and testing subsets. In each round, the model is built on the training subset and then evaluated on the testing subset to see whether it generalizes.
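
To make the steps above more concrete, here is a minimal, standard-library-only Python sketch. It is not how Dataiku implements these steps; every name in it (`preprocess`, `build_vocabulary`, `vectorize`, `k_fold_indices`, the sample notes, and the short stop-word list) is hypothetical. For brevity it uses a bag-of-words count vector as a simple stand-in for a learned embedding such as word2vec, it skips entity extraction, and it only produces the cross-validation splits rather than training a model on them.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are longer).
STOP_WORDS = {"a", "an", "the", "of", "to", "and", "in", "with", "was"}


def preprocess(text):
    """Lowercase the text, tokenize on letters/digits, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_vocabulary(token_lists):
    """Map each distinct token to a column index, in sorted order."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    return {token: i for i, token in enumerate(vocab)}


def vectorize(tokens, vocab):
    """Bag-of-words count vector (a simple stand-in for word2vec)."""
    counts = Counter(tokens)
    return [counts.get(token, 0) for token in vocab]


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# Made-up clinical notes, for illustration only.
notes = [
    "The patient was treated with Metformin.",
    "Patient reports a headache and nausea.",
    "Headache resolved; metformin dose unchanged.",
    "No nausea reported in the follow-up visit.",
]
token_lists = [preprocess(n) for n in notes]
vocab = build_vocabulary(token_lists)
vectors = [vectorize(t, vocab) for t in token_lists]
splits = list(k_fold_indices(len(notes), k=2))
```

In a real project, each `(train, test)` pair in `splits` would be used to fit a model on the training vectors and score it on the held-out ones, and the scores would be averaged across folds.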

Additionally, Dataiku helps pharmaceutical organizations nurture a culture that is receptive to data and willing to change its processes to incorporate that value into workflows:

  • Maintaining security and data privacy regulation compliance at every step of the data pipeline. Not every user who can benefit from data-driven information can necessarily have access to the underlying data. Pseudonymization allows sensitive data (e.g. transaction information) to be used without sharing the raw data with each user.
  • Production-ready models that drive value. Unless machine learning models can be used on a regular basis with real data, their insights are a curiosity at best, and they could be harmful if the data does not reflect the population in which a drug will be used. Dataiku provides a seamless environment for the entire data pipeline, from data cleaning to production.
  • Subpopulation analysis. This indicates whether a model is biased towards a particular population. This is especially useful when exploring clinical trials or patient responses to new drugs; if a certain chemical agent works well for some subpopulations, and poorly for others, it may require modification before it can be released to diverse patient pools.
  • Stable machine and deep learning technologies. By taking advantage of popular open source libraries and toolkits, Dataiku’s machine and deep learning resources provide robust and dependable insights.
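
Two of the ideas in this list, replacing identifiers with pseudonyms and checking a model across subpopulations, can be illustrated with a short Python sketch. This is an assumption-laden example rather than Dataiku's own mechanism: the function names, the keyed-hash (HMAC-SHA256) approach, the group labels, and the records are all hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict


def pseudonymize(patient_id, secret_key):
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    The same ID always maps to the same token, so records can still be
    joined. The key must stay secret: anyone holding it could recompute
    tokens for guessed IDs.
    """
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def accuracy_by_group(records):
    """Return {group: accuracy} from (group, actual, predicted) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, actual, predicted in records:
        total[group] += 1
        correct[group] += int(actual == predicted)
    return {group: correct[group] / total[group] for group in total}


# Made-up predictions for two hypothetical age groups.
records = [
    ("under_65", 1, 1), ("under_65", 0, 0),
    ("under_65", 1, 1), ("under_65", 0, 1),
    ("65_plus", 1, 0), ("65_plus", 0, 0),
    ("65_plus", 1, 0), ("65_plus", 1, 1),
]
scores = accuracy_by_group(records)
token = pseudonymize("patient-001", b"demo-key")
```

A large gap between the per-group scores is the kind of signal a subpopulation analysis is meant to surface; what counts as too large depends on the use case.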

Getting Started With Deep Learning

Deep learning's main advantage is that it handles massive amounts of data, particularly unstructured data, well. Getting started doesn't have to be hard: publicly available pre-trained deep learning models offer a place to begin.


Go Further

Improving Manufacturing Processes with Essilor

See how one manufacturing company, Essilor, uses Dataiku to harness large, heterogeneous datasets and develop a robust predictive maintenance solution.


Data Enrichment and Preparation

Dataiku offers a user-friendly visual interface for interactively cleaning and enriching data. More than 90 built-in visual processors support data preparation without any coding.


Data Exploration and Visualization

With Dataiku, you can immediately draw insights from your data and share them with your colleagues, regardless of the format and size of the datasets.


Bringing AI to Marketing

The evolution of AI, machine learning, and data science has been an increasingly integral part of the transformation of many industries, and marketing is no exception.
