Architecture with Dataiku
Deliver results on data science, machine learning, and AI initiatives at scale on any cloud platform or on-premises.
Support for Cloud or On-Premises
Dataiku can run on-premises or in the cloud, with supported instances on Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure, and it integrates with the storage and computational layers of each cloud.
Many data analysis and data science systems bundle their own computation infrastructure. This tight coupling causes problems when the bundled infrastructure cannot handle the data volume or the type of workload.
The solution is pushing down jobs to computation systems like Spark and Kubernetes to handle large workloads with a distributed architecture. Dataiku uses a pushdown architecture to allow organizations to take advantage of existing, elastic, and highly scalable computing systems, including SQL databases, Spark, Kubernetes, and more.
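To make the pushdown idea concrete, here is a minimal sketch in Python. It is not Dataiku's API: the `orders` table and the function names are hypothetical, and the standard-library `sqlite3` module stands in for a real SQL database. The first function pulls every row to the client and aggregates there. The second pushes the aggregation down to the database engine, so only the small result crosses the connection.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, made-up orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("east", 10.0), ("west", 5.0), ("east", 2.5)],
    )
    conn.commit()
    return conn


def totals_client_side(conn):
    """No pushdown: fetch every row and aggregate in the client process."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM orders"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_pushed_down(conn):
    """Pushdown: the database computes the sums and returns one row per region."""
    cursor = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region"
    )
    return dict(cursor.fetchall())
```

Both functions return the same totals. The difference is where the work happens, which matters once the table is too large to move to, or process on, the client.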
Elastic Computing with Kubernetes
Elastic computing using cloud services is often the most cost-effective way to handle large and dynamic loads created by big data analysis and machine learning.
Dataiku provides a fully managed Kubernetes solution that is compatible with all of the major cloud container services — Amazon EKS, Google Kubernetes Engine (GKE), and Azure Kubernetes Service (AKS) — as well as with on-premises Kubernetes/Docker clusters.
CPUs and GPUs
Graphics processing units (GPUs) can dramatically accelerate certain types of model training, especially for deep learning models.
Dataiku supports the use of both CPUs and GPUs for model training. If multiple GPUs are available, Dataiku can distribute model training workloads across the GPUs to dramatically decrease training time.
To avoid duplicating work across projects, it is helpful if objects can be shared and reused. Dataiku has features that help both coders and non-coders maximize the reuse of their work.
In the Dataiku project flow, all visual components are reusable and portable. Individual preparation steps or entire sections of a flow (datasets and recipes together) can also be shared externally to other projects, allowing users to rename and re-tag objects in the process.
Organizations can extend the power of Dataiku with custom plugins. The Dataiku plugin library includes over 100 plugins that enhance existing Dataiku instances, including access to new data sources, charts, programming languages, algorithms and modeling techniques, partner integrations, and more.
Get Started with Dataiku
Start an online hosted trial or download the free edition.