Explore previously studied and novel molecular structures (SMILES) for drug candidates. Easily query public chemical databases through the ChEMBL and PubChem APIs. Build the solution flow by selecting a preferred chemical database, querying by target protein, and specifying feature generation and modeling options for bioactivity prediction.
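As a rough illustration of what querying by target protein can look like, the sketch below builds a request URL for the public ChEMBL web services. It is not taken from the Solution itself: the helper name `chembl_activity_url`, the example target ID `CHEMBL203`, and the chosen filters are assumptions for illustration, and no request is actually sent.

```python
from urllib.parse import urlencode

# Public ChEMBL web services base URL.
CHEMBL_BASE = "https://www.ebi.ac.uk/chembl/api/data"


def chembl_activity_url(target_chembl_id, standard_type="IC50", limit=100):
    """Build a ChEMBL activity query URL for one target (hypothetical helper).

    Only the URL is constructed here; fetching and paging are left out.
    """
    params = {
        "target_chembl_id": target_chembl_id,  # target protein identifier
        "standard_type": standard_type,        # bioactivity measurement type
        "limit": limit,                        # page size
    }
    return f"{CHEMBL_BASE}/activity.json?{urlencode(params)}"


# Example target ID chosen for illustration only.
url = chembl_activity_url("CHEMBL203")
print(url)
```

The returned URL can be passed to any HTTP client; the response is paginated, so a real pipeline would also need to follow the paging metadata.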
Power recipes in the flow using Python with open-source cheminformatics libraries. Create molecular descriptors and fingerprint features from SMILES chemical structures with RDKit and with pre-trained transformer language models like ChemBERTa from Hugging Face.
Gain a deeper understanding of the molecular structure space with dimensionality reduction methods like t-SNE and PCA, as well as clustering, then train ML models to predict the bioactivity of small molecules for a given protein target.
Use dynamic, interactive dashboards to understand previously studied and novel molecules. Explore chemical properties and molecular descriptors with interactive visualizations, analyses, and tables. Filter to review individual molecule summaries, and screen leading candidates in silico through the Dataiku App.
Assess novel small molecules and gain valuable insights by computing and visualizing molecular descriptors and fingerprints. Use these insights and the trained bioactivity prediction model to score new molecules, and find similar molecules in previous experiments.
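One common way to find similar molecules is to compare fingerprints with the Tanimoto coefficient. The sketch below is a minimal, library-free illustration and not necessarily the method the Solution uses: each fingerprint is represented as a set of "on" bit indices, and the molecule names and bit values are made-up toy data. In practice the bits would come from a fingerprinting tool such as RDKit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: treat as no similarity
    return len(a & b) / len(union)


def most_similar(query_fp, library, top_k=3):
    """Rank library molecules by Tanimoto similarity to the query (hypothetical helper)."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy fingerprints (illustrative bit indices, not real molecules).
library = {
    "mol_A": {1, 4, 9, 15},
    "mol_B": {1, 4, 9, 15, 22},
    "mol_C": {2, 3, 30},
}
query = {1, 4, 9, 22}

ranked = most_similar(query, library, top_k=2)
print(ranked)
```

Here `mol_B` ranks first because it shares four of the five bits in the union with the query, while `mol_C` shares none.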
Gain full flexibility by adjusting to specific research fields with a composable and extendable solution. Adapt and expand research to gain new insights and make progressively better molecular predictions.
The Dataiku Solution for Molecular Property Prediction helps answer a broad range of questions like:
Improve the success and efficiency of the drug development cycle by prioritizing which lead candidates to take into experimentation. Accelerate the prediction of bioactivity against a target protein for novel small molecules. Leverage AI to build a pipeline of potential drug candidates and help bring a stable compound quickly to preclinical and clinical testing.
A composite organization in the commissioned study conducted by Forrester Consulting on behalf of Dataiku saw the following benefits:
reduction in time spent on data analysis, extraction, and preparation.
reduction in time spent on model lifecycle activities (training, deployment, and monitoring).
return on investment
net present value over three years.