Deep Learning, Time Series, MLlib, Partitioned Models, and Custom Models
Machine Learning with Dataiku
Build advanced machine learning models using the latest techniques.
To aid in the feature engineering process, Dataiku AutoML automatically fills missing values and converts non-numeric data into numerical values using well-established encoding techniques.
Users can also create new features using formulas, code, or built-in visual recipes to provide additional signals to improve model accuracy. Once created, Dataiku stores feature engineering steps in recipes for reuse in scoring and model retraining.
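As a rough illustration of the two automatic steps described above, the following sketch fills a missing numeric value with the column mean and one-hot encodes a categorical column. It uses only the Python standard library and does not show Dataiku's own implementation; the helper names (`impute_mean`, `one_hot`) and the sample columns are hypothetical.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def one_hot(values):
    """Encode a categorical column as one 0/1 column per distinct category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded

# Hypothetical sample columns: one numeric with a gap, one categorical.
ages = [34, None, 52, 40]
plans = ["basic", "pro", "basic", "team"]

ages_filled = impute_mean(ages)             # the gap becomes the mean of 34, 52, 40
categories, plans_encoded = one_hot(plans)  # columns ordered basic, pro, team
```

Mean imputation and one-hot encoding are only two of several common choices; median imputation or ordinal encoding would follow the same pattern.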
Delivering More Models with AutoML
Automating model training with best-practice techniques and built-in guardrails allows business analysts to build and compare multiple production-ready models.
Dataiku AutoML uses leading algorithms and frameworks like Scikit-Learn and XGBoost to find the best modeling results in an easy-to-use interface for users across the business.
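The idea of training several candidate models and comparing them on held-out data can be sketched in a few lines. This is a conceptual example only, written with the Python standard library rather than Scikit-Learn or XGBoost, and it makes no claim about how Dataiku AutoML works internally. The two candidates, the fold assignment, and the sample data are all assumptions made for illustration.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Candidate: ordinary least squares with a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def cv_mse(fit, xs, ys, k=3):
    """Mean squared error over k folds; row i belongs to fold i % k."""
    errors = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.extend((model(xs[i]) - ys[i]) ** 2 for i in test)
    return mean(errors)

# Hypothetical data with a roughly linear relationship.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0]

candidates = {"mean_baseline": fit_mean, "linear": fit_linear}
scores = {name: cv_mse(fit, xs, ys) for name, fit in candidates.items()}
best = min(scores, key=scores.get)  # candidate with the lowest cross-validated error
```

Scoring every candidate on data it was not trained on is what makes the comparison fair; selecting on training error would favor whichever model overfits most.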
Dataiku supports a variety of Jupyter-based notebooks for code-based experimentation and model development using Python, R, and Scala.
Dataiku also includes eight prebuilt notebooks for data analysis, covering statistics, dimensionality reduction, time series, and topic modeling.
Time Series Visualization and Forecasting
Dataiku supports time-series data preparation, including resampling, windowing, extrema extraction, and interval extraction. Time series visualization creates line charts to display time-series data for analysis.
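To make the preparation steps named above concrete, the sketch below resamples irregular readings to an hourly grid, applies a rolling window, and extracts the maximum. It is a standard-library illustration, not Dataiku's implementation; the function names and sample readings are hypothetical, and resampling here means averaging within each hour, which is only one possible strategy.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical irregular sensor readings: (timestamp, value).
readings = [
    (datetime(2024, 1, 1, 0, 10), 10.0),
    (datetime(2024, 1, 1, 0, 50), 14.0),
    (datetime(2024, 1, 1, 1, 20), 20.0),
    (datetime(2024, 1, 1, 2, 5), 16.0),
    (datetime(2024, 1, 1, 2, 45), 12.0),
]

def resample_hourly(points):
    """Resampling: average all readings that fall within the same hour."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts.replace(minute=0, second=0, microsecond=0)].append(value)
    return [(hour, mean(vals)) for hour, vals in sorted(buckets.items())]

def rolling_mean(values, window):
    """Windowing: trailing mean; early points use a shorter window."""
    return [mean(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]

hourly = resample_hourly(readings)
hourly_values = [value for _, value in hourly]
smoothed = rolling_mean(hourly_values, window=2)

# Extrema extraction: the hour with the highest resampled value.
peak_time, peak_value = max(hourly, key=lambda point: point[1])
```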
Data scientists can develop forecasting models using the forecasting plugin or using custom code and notebooks combined with data preparation and visualization in a project to ensure their forecast model is ready for production use.
Deep Learning with Keras and TensorFlow
Dataiku fully supports deep learning with Keras and TensorFlow, including training and deployment to CPUs and GPUs.
In Dataiku, deep learning models are treated just like any other model created and managed in Dataiku, making deep learning models easy to deploy as part of projects and business applications.
Custom Models using Python and Scala
Dataiku does not restrict you to the algorithms that are part of its AutoML capabilities; users can also write custom models using Python or Scala. Custom models are first-class citizens in Dataiku.
Once deployed in a project, custom models are handled like any other model. This powerful capability to use custom-coded models opens up various use cases that may not be easily modeled by other methods (such as AutoML).
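A custom Python model is commonly written as a class exposing `fit` and `predict` methods in the scikit-learn style. The sketch below shows that shape with a small nearest-centroid classifier built on the standard library. The class name, the data, and the assumption that this interface is the one a custom model must follow are all illustrative, not taken from Dataiku's documentation.

```python
from statistics import mean

class NearestCentroidClassifier:
    """Hypothetical custom model: predict the class whose centroid is closest."""

    def fit(self, X, y):
        # One centroid per class: the column-wise mean of that class's rows.
        self.centroids_ = {}
        for label in sorted(set(y)):
            rows = [x for x, target in zip(X, y) if target == label]
            self.centroids_[label] = [mean(column) for column in zip(*rows)]
        return self

    def predict(self, X):
        def squared_distance(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        predictions = []
        for x in X:
            closest = min(
                self.centroids_,
                key=lambda label: squared_distance(x, self.centroids_[label]),
            )
            predictions.append(closest)
        return predictions

# Hypothetical training data with two well-separated classes.
X_train = [[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 7.5]]
y_train = ["low", "low", "high", "high"]

model = NearestCentroidClassifier().fit(X_train, y_train)
predictions = model.predict([[2.0, 1.0], [7.0, 9.0]])
```

Keeping the learned state in attributes set by `fit` (here `centroids_`) is the convention that lets a platform train, store, and score a custom model the same way as a built-in one.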
Training on Large Datasets with Spark
Dataiku supports model training on large datasets that don’t fit into memory using Spark MLlib or H2O Sparkling Water.
Once configured, Spark becomes available to users for model training. Depending on the configuration, users can then train models using the algorithms available in MLlib, such as regression and decision trees, or use H2O Sparkling Water with support for deep learning, GBM, GLM, random forest, and more.
Get Started with Dataiku
Start an online hosted trial, download the free edition,
or compare the features of the Lite, Team, and Enterprise editions.