
Coding R in Dataiku DSS

February 01, 2017

R is a language and environment providing a wide variety of statistical and graphical techniques: linear and nonlinear modeling, statistical tests, time series analysis, classification, clustering, and more. This page explains how Dataiku DSS integrates R.

Data processing

R code can be written in code Recipes or Notebooks.
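As a rough illustration of what an R code Recipe can look like, here is a minimal sketch. It assumes the `dataiku` R package and its `dkuReadDataset` / `dkuWriteDataset` functions are available in the recipe environment; the dataset names `orders` and `orders_summary`, the column names, and the helper `summarise_by_group` are hypothetical. The fallback branch only exists so the sketch can be tried outside DSS.

```r
# Transformation step: plain base R, independent of DSS.
# summarise_by_group is a hypothetical helper written for this example.
summarise_by_group <- function(df, group_col, value_col) {
  out <- aggregate(df[[value_col]], by = list(df[[group_col]]), FUN = mean)
  names(out) <- c(group_col, paste0("mean_", value_col))
  out
}

if (requireNamespace("dataiku", quietly = TRUE)) {
  # Inside DSS: read the input dataset, transform it, write the output dataset.
  # "orders" and "orders_summary" are hypothetical dataset names.
  library(dataiku)
  input <- dkuReadDataset("orders")
  result <- summarise_by_group(input, "country", "amount")
  dkuWriteDataset(result, "orders_summary")
} else {
  # Outside DSS: use a small in-memory data frame as a stand-in.
  input <- data.frame(country = c("FR", "FR", "US"), amount = c(10, 20, 30))
  result <- summarise_by_group(input, "country", "amount")
}
```

Keeping the transformation in its own function means the same code can be developed interactively in a Notebook and then pasted into a Recipe with only the read and write calls changed.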

In-memory

In-database

In-cluster

To process your data in Hadoop, you can run Spark through R. Take a look at how to code within a SparkR Recipe. This how-to shows you how to use SparkR in a notebook or in a recipe.

For distributed data processing, use sparklyr, which provides a dplyr interface to Spark.
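To give an idea of the sparklyr style, here is a minimal sketch. It assumes the `sparklyr` and `dplyr` packages and a Spark installation are available. The local master and the built-in `mtcars` data frame are stand-ins chosen for illustration; how the connection is obtained and how datasets are read inside DSS is not covered here.

```r
# Sketch only: requires the sparklyr and dplyr packages and a Spark installation.
library(sparklyr)
library(dplyr)

# Assumption: a local Spark master, used here purely for illustration.
sc <- spark_connect(master = "local")

# Copy a small built-in data frame to Spark as a stand-in for a real dataset.
cars_tbl <- copy_to(sc, mtcars, "cars", overwrite = TRUE)

# The dplyr verbs below are translated to Spark SQL and executed by Spark,
# not in the local R session.
mpg_by_cyl <- cars_tbl %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  arrange(cyl) %>%
  collect()  # bring the small aggregated result back into R memory

spark_disconnect(sc)
```

Only the aggregated result is collected, so the full table never has to fit in R's memory.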

Reference documentation

Read the full internal R API documentation.

Starter code

Dataiku provides many code snippets to get you started:

  • Whenever you are coding, there is a Sample Code button on the top right of the editor with a list of code snippets. You can also add your own!
  • Upon notebook creation, you can use a predefined Notebook with template code or create your own.

More Advanced R topics

Environment

Additional contents