Any Dataiku DSS tool, whether a visual data manipulation recipe, a code recipe, guided machine learning, or a data visualization, can run on an in-cluster engine. To achieve this, Dataiku DSS leverages technologies such as Hive, Impala, Spark, MLlib, and H2O.
Big Data Architecture
Dataiku DSS Architecture
Pushing computation to your data
Machine learning engines
Machine learning algorithms can run in a distributed fashion, for both training and scoring, using these engines:
- Vertica Advanced Analytics
Optimize your Spark jobs
- Reduce Spark engine overhead and unnecessary writes of intermediate datasets with Spark pipelines
- Become a Spark master by learning the Spark tips and troubleshooting
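
The benefit of skipping intermediate writes can be illustrated with a loose analogy in plain Python. This is only a sketch, not the DSS or Spark API: the function names (`parse`, `keep_large`, `total_by_name`) and the sample data are hypothetical. Each step is a generator, so rows flow from one step to the next without any intermediate result being stored, and only the final aggregate is materialized.

```python
def parse(lines):
    """Step 1: turn raw 'name,amount' text lines into (name, amount) tuples."""
    for line in lines:
        name, amount = line.split(",")
        yield name, float(amount)


def keep_large(rows, threshold):
    """Step 2: keep only rows whose amount is at least the threshold."""
    for name, amount in rows:
        if amount >= threshold:
            yield name, amount


def total_by_name(rows):
    """Step 3: aggregate amounts per name; the only step that stores results."""
    totals = {}
    for name, amount in rows:
        totals[name] = totals.get(name, 0.0) + amount
    return totals


# Hypothetical input; in a real flow this would be a dataset on the cluster.
raw = ["a,10", "b,2.5", "a,7", "c,1"]

# The three steps are chained; no intermediate list or file is ever written.
totals = total_by_name(keep_large(parse(raw), threshold=5))
print(totals)
```

Writing the output of `parse` and `keep_large` to storage between steps would give the same totals, but at the cost of two extra write and read passes.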
Working with partitions
Speed up your computation by partitioning datasets on HDFS, so that a job reads only the partitions it needs.
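
As a rough illustration of why partitioning helps, the sketch below builds Hive-style partition directories (one `dt=YYYY-MM-DD` folder per day) and selects only those inside a date range. It is plain Python, not a DSS API: the `/data/events` root, the `dt` partition key, and both helper functions are assumptions made for the example.

```python
from datetime import date, timedelta


def partition_path(root, day):
    """Build a Hive-style partition directory such as root/dt=2024-01-03."""
    return f"{root}/dt={day.isoformat()}"


def prune_partitions(paths, start, end):
    """Keep only partition directories whose dt value lies in [start, end]."""
    kept = []
    for path in paths:
        value = path.rsplit("dt=", 1)[1]
        if start <= date.fromisoformat(value) <= end:
            kept.append(path)
    return kept


root = "/data/events"  # hypothetical HDFS location

# One partition directory per day of January 2024.
all_paths = [
    partition_path(root, date(2024, 1, 1) + timedelta(days=i)) for i in range(31)
]

# A job that needs three days only has to read three of the 31 directories.
selected = prune_partitions(all_paths, date(2024, 1, 10), date(2024, 1, 12))
print(selected)
```

An engine that understands the partitioning scheme can apply the same kind of filter on directory names before reading any data, which is where the speed-up comes from.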