R is a language and environment providing a wide variety of statistical and graphical techniques: linear and nonlinear modeling, statistical tests, time series analysis, classification, clustering, and more. We explain below how Dataiku DSS integrates R.
To process your data in Hadoop, you can run Spark through R. Take a look at how to code within a SparkR recipe. This how-to shows how to use SparkR in a notebook or in a recipe.
Use sparklyr, which provides a dplyr interface to Spark, for distributed data processing.
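As a rough illustration of what dplyr-style processing with sparklyr looks like, here is a minimal sketch. It assumes a local Spark installation (for example one set up with `sparklyr::spark_install()`) and uses the built-in `mtcars` data frame as a stand-in for a real dataset. The connection settings and the table name `mtcars_spark` are illustrative only, and nothing here is specific to how DSS itself exposes datasets to R.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally; a cluster would use a different master.
sc <- spark_connect(master = "local")

# Copy a small built-in data frame into Spark (hypothetical table name).
# A real job would more likely read a table that already lives in the cluster.
cars_tbl <- copy_to(sc, mtcars, name = "mtcars_spark", overwrite = TRUE)

# The dplyr verbs below are translated to Spark SQL and executed by Spark.
result <- cars_tbl %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE), n_cars = n()) %>%
  arrange(cyl) %>%
  collect()  # bring the small aggregated result back into the R session

print(result)

spark_disconnect(sc)
```

Only the final `collect()` moves data into R, so the grouping and aggregation stay in Spark until the result is small enough to handle locally.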
Read the full internal R API documentation.
Dataiku provides a lot of code snippets to start with: