Connect, cleanse, and prepare data for analytics and machine learning projects at scale.


Visual Flow

The Dataiku flow provides a visual representation of a project’s data pipeline and is the central space where coders and non-coders view and analyze data, add recipes to join and transform datasets, and build predictive models.

The visual flow also contains code-based and plugin elements for added customization and extensibility.


Connect to Leading Data Sources

Dataiku provides pre-built connectors to dozens of leading data sources both on-premises and in the cloud, including Amazon S3, Azure Blob Storage, Google Cloud Storage, Snowflake, SQL databases, NoSQL databases, HDFS, and more.


Data Preparation and Enrichment

Dataiku provides easy-to-use visual interfaces to join datasets, group and aggregate, clean, transform, and enrich data, all with a few clicks. Best of all, Dataiku automatically documents all steps in a recipe as part of the visual flow.

If you’d rather code than click, create code recipes using familiar languages such as Python, R, and SQL, developed and edited in your favorite IDE.
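As a rough illustration of what a join followed by a group-and-aggregate step computes, here is a minimal sketch in plain Python. The `customers` and `orders` records and the helper names `inner_join` and `group_sum` are hypothetical. An actual Python code recipe in Dataiku would read and write datasets through the platform's own API rather than through in-memory lists.

```python
from collections import defaultdict

# Hypothetical sample data standing in for two datasets.
customers = [
    {"customer_id": 1, "country": "FR"},
    {"customer_id": 2, "country": "US"},
    {"customer_id": 3, "country": "FR"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 20.0},
    {"order_id": 11, "customer_id": 2, "amount": 35.0},
    {"order_id": 12, "customer_id": 1, "amount": 5.0},
    {"order_id": 13, "customer_id": 3, "amount": 12.5},
]


def inner_join(left, right, key):
    """Inner-join two lists of dicts on a shared key column."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    joined = []
    for lrow in left:
        # Rows on the left with no match on the right are dropped.
        for rrow in index.get(lrow[key], []):
            joined.append({**lrow, **rrow})
    return joined


def group_sum(rows, group_col, value_col):
    """Group rows by one column and sum another."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_col]] += row[value_col]
    return dict(totals)


joined = inner_join(orders, customers, "customer_id")
revenue_by_country = group_sum(joined, "country", "amount")
print(revenue_by_country)
```

Here each order is matched to its customer, and the order amounts are then summed per country.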


100 Built-in Data Transformers

The powerful prepare recipe includes 100 built-in data transformers for common data manipulations like binning, concatenation and string manipulation, currency and date conversions, geo-enrichment, and reshaping.

Dataiku even suggests relevant functions for you based on the data’s type and values.

For custom transformations, write formulas using a spreadsheet-like expression language or Python code for ultimate flexibility.
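To give a feel for the kind of logic a custom Python transformation might hold, the sketch below applies binning, string concatenation, and a date conversion to a single row. The column names, bin edges, and the `transform` function are hypothetical and are not part of any Dataiku API; this is plain standard-library Python.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical bin edges and labels for an "age" column.
AGE_EDGES = [18, 35, 65]
AGE_LABELS = ["<18", "18-34", "35-64", "65+"]


def bin_age(age):
    """Map a numeric age to a labeled bin (lower edge inclusive)."""
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]


def transform(row):
    """Apply binning, string concatenation, and date conversion to one row."""
    # Assumes the input date is formatted as day/month/year.
    signup = datetime.strptime(row["signup_date"], "%d/%m/%Y")
    return {
        "full_name": f'{row["first_name"].strip()} {row["last_name"].strip()}',
        "age_bin": bin_age(row["age"]),
        "signup_date": signup.date().isoformat(),
    }


row = {
    "first_name": " Ada",
    "last_name": "Lovelace ",
    "age": 36,
    "signup_date": "10/12/2021",
}
result = transform(row)
print(result)
```

Keeping the transformation as a pure function of one row makes it easy to test on a handful of sample records before applying it to a full dataset.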


Specialized Data Preparation

Dataiku offers a wide variety of functions and tools to parse specialized data types, such as geospatial data, time series, images, and text, and to enrich them with additional metadata and structure.

Examples include geo joins and geocoding, time series resampling, image annotation, text vectorization, and much more.
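As one concrete case, time series resampling can be pictured as regrouping irregular readings onto a fixed time grid. The sketch below is a minimal standard-library version that averages values per hour; the sample `readings` and the `resample_hourly_mean` helper are hypothetical, and it does not represent how Dataiku implements resampling.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical irregular sensor readings: (ISO timestamp, value).
readings = [
    ("2024-01-01T00:05:00", 10.0),
    ("2024-01-01T00:40:00", 14.0),
    ("2024-01-01T01:15:00", 20.0),
    ("2024-01-01T03:30:00", 8.0),
]


def resample_hourly_mean(points):
    """Downsample (timestamp, value) pairs to one mean value per hour."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Truncate each timestamp to the start of its hour.
        hour = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour.isoformat(): mean(vals) for hour, vals in sorted(buckets.items())}


hourly = resample_hourly_mean(readings)
print(hourly)
```

Note that hours with no readings (02:00 in this sample) are simply absent from the result; a fuller resampler would also decide how to fill such gaps, for example by interpolation.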

