Architecture with Dataiku

Deliver results on data science, machine learning, and AI initiatives at scale, on any cloud platform or on-premises.

Support for Cloud or On-Premises

Dataiku can run on-premises or in the cloud, with supported instances on Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure. On each cloud, it integrates with the provider's storage and computational layers.

Pushdown Execution

Many data analysis and data science systems bundle their own computation infrastructure. This tight coupling causes problems when the bundled infrastructure cannot handle the data volume or the type of workload.

The solution is to push jobs down to computation systems such as Spark and Kubernetes, which handle large workloads with a distributed architecture. Dataiku uses a pushdown architecture so that organizations can take advantage of existing, elastic, and highly scalable computing systems, including SQL databases, Spark, Kubernetes, and more.
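
As a rough illustration of the pushdown idea (not Dataiku's own API), the sketch below contrasts aggregating rows inside the application with delegating the same aggregation to a SQL engine. It uses Python's built-in `sqlite3` module as a stand-in for a production database. The `orders` table, its columns, and all function names are hypothetical and chosen only for this example.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a small, hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("EU", 10.0), ("EU", 5.0), ("US", 7.5), ("US", 2.5), ("APAC", 4.0)],
    )
    return conn


def totals_in_process(conn: sqlite3.Connection) -> dict:
    """No pushdown: every row travels to the application, which aggregates."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM orders"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_pushed_down(conn: sqlite3.Connection) -> dict:
    """Pushdown: the database aggregates and returns one row per region."""
    rows = conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region")
    return dict(rows)


if __name__ == "__main__":
    conn = build_demo_db()
    # Both approaches give the same totals; they differ in where the work runs.
    print(totals_in_process(conn))
    print(totals_pushed_down(conn))
```

Both functions return the same totals. The difference is where the computation runs and how much data crosses the connection, which is what matters once the table no longer fits comfortably on the application host.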

Elastic Computing with Kubernetes

Elastic computing using cloud services is often the most cost-effective way to handle the large and dynamic loads created by big data analysis and machine learning.

Dataiku provides a fully managed Kubernetes solution that is compatible with all of the major cloud container services, namely Amazon EKS, Google Kubernetes Engine (GKE), and Azure Kubernetes Service (AKS), as well as with on-premises Kubernetes/Docker clusters.

CPUs and GPUs

Graphics processing units (GPUs) can dramatically accelerate certain types of model training, especially for deep learning models.

Dataiku supports both CPUs and GPUs for model training. If multiple GPUs are available, Dataiku can distribute model training workloads across them to reduce training time.

Reusable Components

To avoid duplicating work across projects, it helps to be able to share and reuse objects. Dataiku has features that help both coders and non-coders maximize the reuse of their work.

In the Dataiku project flow, all visual components are reusable and portable. Individual preparation steps or entire sections of a flow (datasets and recipes together) can also be shared with other projects, and users can rename and re-tag objects in the process.

Extensibility with Plugins

Organizations can extend the power of Dataiku with custom plugins. The Dataiku plugin library includes over 100 plugins that enhance existing Dataiku instances, including access to new data sources, charts, programming languages, algorithms and modeling techniques, partner integrations, and more.
