Machine Learning with Dataiku

Build advanced machine learning models using the latest techniques.

Feature Engineering

To aid in the feature engineering process, Dataiku AutoML automatically fills missing values and converts non-numeric data into numerical values using well-established encoding techniques.

Users can also create new features using formulas, code, or built-in visual recipes, adding signals that can improve model accuracy. Once created, feature engineering steps are stored in recipes for reuse in scoring and model retraining.
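
To make the two automatic steps in this section concrete, here is a minimal plain-Python sketch of mean imputation and one-hot encoding. It does not show how Dataiku implements them: the sample rows, the column names, and the helper functions `impute_mean` and `one_hot` are all hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical sample rows; None marks a missing value.
rows = [
    {"age": 34, "plan": "basic"},
    {"age": None, "plan": "premium"},
    {"age": 52, "plan": "basic"},
    {"age": 40, "plan": None},
]


def impute_mean(rows, column):
    """Replace missing numeric values with the mean of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def one_hot(rows, column, missing_label="missing"):
    """Replace a categorical column with one 0/1 column per category."""
    labelled = [r[column] if r[column] is not None else missing_label for r in rows]
    categories = sorted(set(labelled))
    encoded = []
    for row, value in zip(rows, labelled):
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if value == category else 0
        encoded.append(new_row)
    return encoded


# Impute the numeric column first, then encode the categorical one.
features = one_hot(impute_mean(rows, "age"), "plan")
```

Treating a missing category as its own `missing` label is one possible choice among several; dropping the row or filling with the most frequent value are common alternatives.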

Delivering More Models with AutoML

Automating model training with best-practice techniques and built-in guardrails allows business analysts to build and compare multiple production-ready models.

Dataiku AutoML uses leading algorithms and frameworks such as scikit-learn and XGBoost to find the best-performing models, in an easy-to-use interface for users across the business.

Notebook ML

Dataiku supports a variety of Jupyter-based notebooks for code-based experimentation and model development in Python, R, and Scala.

Dataiku also includes eight prebuilt notebooks for data analysis, covering statistics, dimensionality reduction, time series, and topic modeling.

Time Series Visualization and Forecasting

Dataiku supports time series data preparation, including resampling, windowing, extrema extraction, and interval extraction. Time series visualization uses line charts to display the data for analysis.
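
As an illustration of what resampling, windowing, and extrema extraction mean in practice, the following plain-Python sketch applies them to a handful of readings. It is not Dataiku's implementation: the readings, the hourly bucket size, and the helpers `resample_hourly` and `rolling_mean` are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical irregularly spaced readings: (timestamp, value).
readings = [
    (datetime(2024, 1, 1, 0, 10), 10.0),
    (datetime(2024, 1, 1, 0, 50), 14.0),
    (datetime(2024, 1, 1, 1, 20), 20.0),
    (datetime(2024, 1, 1, 2, 5), 30.0),
    (datetime(2024, 1, 1, 2, 45), 34.0),
]


def resample_hourly(readings):
    """Resampling: average all readings that fall within the same clock hour."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[ts.replace(minute=0, second=0, microsecond=0)].append(value)
    return [(hour, sum(vals) / len(vals)) for hour, vals in sorted(buckets.items())]


def rolling_mean(series, window):
    """Windowing: trailing average over the last `window` points (fewer at the start)."""
    out = []
    for i, (ts, _) in enumerate(series):
        chunk = [v for _, v in series[max(0, i - window + 1): i + 1]]
        out.append((ts, sum(chunk) / len(chunk)))
    return out


hourly = resample_hourly(readings)
smoothed = rolling_mean(hourly, window=2)

# Extrema extraction: the hourly bucket with the highest average value.
peak = max(hourly, key=lambda point: point[1])
```

This simple version emits only the hours that contain at least one reading; a fuller resampler would also decide how to fill empty intervals, for example by interpolation.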

Data scientists can develop forecasting models with the forecasting plugin or with custom code and notebooks, and combine them with data preparation and visualization in the same project to ensure the forecast model is ready for production use.

Deep Learning with Keras and TensorFlow

Dataiku fully supports deep learning with Keras and TensorFlow, including training and deployment on CPUs and GPUs.

Deep learning models are treated like any other model created and managed in Dataiku, which makes them easy to deploy as part of projects and business applications.

Custom Models using Python and Scala

Dataiku does not restrict users to the algorithms that are part of its AutoML capabilities; it also allows them to write custom models using Python or Scala. Custom models are first-class citizens in Dataiku.

Once deployed in a project, custom models are handled like any other model. This powerful capability to use custom-coded models opens up various use cases that may not be easily modeled by other methods (such as AutoML).
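
To give a sense of what a custom-coded model can look like, here is a minimal sketch of a nearest-centroid classifier in plain Python. The exact interface Dataiku expects is not described on this page; the sketch assumes the common scikit-learn-style convention of `fit` and `predict` methods, and the class name and training data are hypothetical.

```python
from collections import defaultdict
from math import dist


class NearestCentroidClassifier:
    """Toy custom model: predicts the label whose training centroid is closest.

    Assumption: a scikit-learn-style fit/predict interface, used here only
    for illustration.
    """

    def fit(self, X, y):
        # Group training rows by label, then average each feature per label.
        grouped = defaultdict(list)
        for row, label in zip(X, y):
            grouped[label].append(row)
        self.centroids_ = {
            label: [sum(col) / len(col) for col in zip(*rows)]
            for label, rows in grouped.items()
        }
        return self

    def predict(self, X):
        # For each row, pick the label with the smallest Euclidean distance.
        return [
            min(self.centroids_, key=lambda label: dist(row, self.centroids_[label]))
            for row in X
        ]


# Hypothetical training data: two well-separated groups.
X_train = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [4.8, 5.2]]
y_train = ["low", "high", "high", "high"][:1] + ["low", "high", "high"]

model = NearestCentroidClassifier().fit(X_train, y_train)
predictions = model.predict([[0.9, 1.1], [5.1, 4.9]])
```

Keeping the learned state in attributes set during `fit` (here `centroids_`) lets the fitted object be reused for scoring without retraining.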

Training on Large Datasets with Spark

Dataiku supports model training on large datasets that do not fit into memory, using Spark MLlib or H2O Sparkling Water.

Once configured, Spark becomes available to users for model training. Depending on the configuration, users can train models with the algorithms available in MLlib, such as regression and decision trees, or use H2O Sparkling Water, which supports deep learning, GBM, GLM, random forest, and more.
