H2O is a server for distributed machine learning over a cluster.

This plugin extends the DSS machine learning capabilities with the H2O library for distributed machine learning, including algorithms exclusive to H2O such as Deep Learning.

The H2O plugin provides two recipes for training models and scoring datasets.

Example H2O-based Flow.

Plugin information

Last updated: 2016/01/15
License: Apache Software License
Source code: Github
Reporting issues: Github

How to use

The plugin has two recipes:

  • Build model takes a dataset as input (and optionally a validation dataset), trains a model, and saves it in a folder. The training dataset should already be clean: DSS does not prepare it as it does for native DSS models (for example, it does not rescale variables or drop columns that look like IDs).
  • Predict takes such a folder as input, together with a dataset to score, and produces a new dataset with one added column: the predictions of the model. (While the training set can be as big as H2O allows, the output dataset is currently exported via pandas and must fit in memory.)
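The shape of the Predict recipe's output can be sketched as follows. This is a minimal illustration with pandas, not the plugin's actual code; the column names and prediction values are hypothetical:

```python
import pandas as pd

# Hypothetical dataset to score (column names are illustrative only).
to_score = pd.DataFrame({"age": [25, 40, 31], "income": [30000, 52000, 47000]})

# Hypothetical predictions returned by the H2O model, one value per row.
predictions = [0, 1, 0]

# The recipe outputs the input rows plus a single added prediction column.
# Because the result is exported via pandas, it must fit in memory.
scored = to_score.copy()
scored["prediction"] = predictions
```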


  • After installing the plugin, go to Administration → plugins → H2O → settings and fill in the form.
  • If your H2O instance is running on a Hadoop cluster, input datasets must be of type HDFS, so that they are accessible to every node. Otherwise you will get a “File foo does not exist” error.
  • If your H2O instance is running locally on the DSS host, input datasets must be of type Filesystem or Upload. (This allows you to locally train a model that doesn't fit in memory.)
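The reason for the HDFS requirement can be sketched as follows: H2O nodes load data by path, and a node in a Hadoop cluster can only resolve paths on a shared filesystem. The paths and the helper below are purely illustrative:

```python
# Paths are illustrative. On a Hadoop cluster, every H2O node must be able
# to resolve the path itself, hence the HDFS requirement:
cluster_path = "hdfs://namenode:8020/data/train.csv"

# With a local H2O instance, a plain filesystem path on the DSS host works:
local_path = "/data/dss/train.csv"

def reachable_by_all_nodes(path):
    # A local path only exists on one machine; a shared-filesystem URL
    # (here, HDFS) is visible to every node in the cluster.
    return path.startswith("hdfs://")
```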

For documentation about model parameters, see the H2O documentation.

The parameters x and y (for supervised models) and validation_{x,y} are handled by DSS: their configuration is read from the input datasets, whose paths are passed to H2O. Likewise, DSS chooses a model_id.
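The column handling can be sketched as follows. This is pure Python, and the column and target names are hypothetical; in the plugin, DSS reads them from the input dataset's schema:

```python
# Hypothetical schema read from the input dataset.
columns = ["age", "income", "clicked"]
target = "clicked"  # the target column chosen in the recipe form

# y is the target column; x is every remaining column.
y = target
x = [c for c in columns if c != target]

# DSS also chooses a model_id so the trained model can be retrieved from
# the output folder later (this naming scheme is purely illustrative).
model_id = "dss_h2o_model_" + target
```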

More information about the plugin is available in the Github repository.

Logo Copyright H2O