Machine Learning
Concepts
Experiment
Guided Machine Learning
Getting started
The Dataiku DSS interface is designed to make building powerful machine learning models easy. The visual interface is enough for most use cases, whether you are an expert data scientist or a beginner. Here is what you can do in the visual interface:
- Follow Machine Learning Basics to create your first predictive model.
- Learn how to interpret a regression model.
- For clustering applications, see how to identify clusters and name them.
Batch scoring
- Learn how to save a model to the flow and use it to score another dataset.
- A Dataiku model consists of a whole pipeline that combines data preparation, feature handling and an ML model. This means that you can score raw data directly with a Dataiku model, without reimplementing data cleaning or feature preprocessing.
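To make the pipeline idea concrete, here is a minimal sketch in plain Python. It is not how Dataiku DSS implements saved models; the function names (`prepare`, `featurize`, `model`), the `ScoringPipeline` class, the imputation value and the model weights are all hypothetical, chosen only to show why bundling the steps lets you score raw records directly.

```python
from typing import Callable, Dict, List, Optional

RawRecord = Dict[str, Optional[str]]


def prepare(record: RawRecord) -> Dict[str, object]:
    """Data preparation: trim text and parse numbers, treating blanks as missing."""
    age_text = (record.get("age") or "").strip()
    plan = (record.get("plan") or "").strip().lower()
    return {"age": float(age_text) if age_text else None, "plan": plan}


def featurize(prepared: Dict[str, object]) -> List[float]:
    """Feature handling: impute a missing age and encode the plan as 0/1."""
    # 40.0 is an assumed imputation value, used here for illustration only.
    age = prepared["age"] if prepared["age"] is not None else 40.0
    return [age / 100.0, 1.0 if prepared["plan"] == "premium" else 0.0]


def model(features: List[float]) -> float:
    """Toy linear model with hand-picked weights (not trained on real data)."""
    score = 0.2 + 0.5 * features[0] - 0.3 * features[1]
    return min(1.0, max(0.0, score))


class ScoringPipeline:
    """Hypothetical container that chains preparation, features and model."""

    def __init__(self, steps: List[Callable]):
        self.steps = steps

    def score(self, raw_record: RawRecord) -> float:
        value = raw_record
        for step in self.steps:
            value = step(value)  # each step consumes the previous step's output
        return value


pipeline = ScoringPipeline([prepare, featurize, model])

# Raw, uncleaned input goes straight in; no separate cleaning code is needed.
risk = pipeline.score({"age": " 30 ", "plan": "Premium"})
risk_missing_age = pipeline.score({"age": "", "plan": "basic"})
```

Because the preparation and feature steps travel with the model, the caller never has to repeat them when scoring a new dataset.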
The lifecycle of a model in production
Models saved in the flow can be retrained, versioned and monitored. These capabilities are critical to any predictive application used in production. See how to handle the whole lifecycle of a model.
Machine learning engines
Dataiku DSS lets you use multiple machine learning engines within its guided machine learning framework.
Deep Learning
Deep learning offers extremely flexible modeling of the relationships between a target and its input features, and is used in a variety of challenging applications, such as image processing, text analysis, and time series, in addition to models for structured data.
Custom models
The Python and MLlib machine learning engines allow you to define custom models by adding your own code while still taking advantage of the Dataiku DSS visual interface for machine learning.
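As a rough illustration of what custom model code can look like, the sketch below assumes that the Python engine expects an estimator following the scikit-learn convention (`fit`, `predict`, `get_params`, `set_params`); check the custom models documentation for the exact requirements. The class `MajorityClassClassifier` and its `fallback` parameter are hypothetical, and the example uses plain lists so that it runs without any third-party library.

```python
from collections import Counter


class MajorityClassClassifier:
    """Toy estimator that always predicts the most frequent training label.

    Hypothetical example following the scikit-learn fit/predict convention.
    """

    def __init__(self, fallback=None):
        # Label to predict if the estimator is fitted on an empty target.
        self.fallback = fallback

    def get_params(self, deep=True):
        return {"fallback": self.fallback}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        counts = Counter(y)
        self.classes_ = sorted(counts)
        self.majority_ = counts.most_common(1)[0][0] if counts else self.fallback
        return self

    def predict(self, X):
        # One prediction per input row, regardless of the feature values.
        return [self.majority_ for _ in X]


clf = MajorityClassClassifier().fit([[1], [2], [3]], ["a", "b", "a"])
predictions = clf.predict([[0], [9]])
```

In a real project you would more likely subclass the scikit-learn base classes or wrap an existing library model; the point here is only the shape of the interface.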
Score in real time through a REST API
A saved model can be deployed to a Dataiku DSS API node, which you can then query for predictions on new data.
The API node provides all the necessary features for scoring in production:
- High availability and scalability for scoring new records.
- Model versioning and rollback using model packages.
- The ability to score in real time, even with models trained using a distributed engine.
- More advanced capabilities, such as enriching queries in real time or handling custom models.
Refer to the Automation and real-time scoring portal for more information.
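A real-time scoring call can be sketched as a plain HTTP request. The example below only builds the request and does not send it. The URL layout (`/public/api/v1/<service>/<endpoint>/predict`), the `{"features": ...}` payload shape, the host and port, and the service, endpoint and feature names are all assumptions made for illustration; verify them against the API node documentation for your deployment.

```python
import json
import urllib.request


def build_predict_request(base_url, service_id, endpoint_id, features):
    """Build (but do not send) a POST request asking for one prediction.

    The path and payload layout are assumptions, not a confirmed API contract.
    """
    url = f"{base_url.rstrip('/')}/public/api/v1/{service_id}/{endpoint_id}/predict"
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical service, endpoint and feature names.
request = build_predict_request(
    "http://localhost:12000/",
    "churn_service",
    "churn_model",
    {"age": 30, "plan": "premium"},
)

# To actually query a running API node, you would send the request, e.g.:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect and test before it reaches a production endpoint.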
Prediction Examples
Churn
- SAMPLE PROJECT: Predict the risk of customer churn
- SAMPLE PROJECT: Uplift modeling
Revenue forecast
- SAMPLE PROJECT: Forecasting sales
- SAMPLE PROJECT: Predict Customer Lifetime Value
Energy & environment
- SAMPLE PROJECT: Model energy consumption and predict power peaks
Clustering Examples
Geographical
- SAMPLE PROJECT: Geographic clustering based on POIs
Full Control With Code
Dataiku DSS also lets users write everything themselves in Python, R, Scala, SQL or Shell. See the portal on coding in Dataiku DSS for more information.
- SAMPLE PROJECT: Build a model using 5 different ML libraries
- HOWTO: Mining frequent itemsets in R
- HOWTO: Tuning XGBoost models in DSS