Classroom Training

Learn from the very best, in person. Sign up for the live classroom sessions of your choice by filling out the associated forms below!

Dataiku offers instructor-led training courses to help you get the most out of Dataiku DSS, from building your first data science workflow to automating a complete predictive application.

Upcoming Sessions

Start Date | Course | Duration | Location

Training Content

DSS 101: Dataiku DSS Fundamentals

Objectives

DSS 101 covers the fundamentals of building an application in Dataiku DSS: loading data sources, cleaning and enriching data, machine learning, and visualization.

The course's objectives are to:

  • Provide a first understanding of data science and machine learning techniques
  • Illustrate techniques with concrete use cases and walk through a typical data project
  • Provide a hands-on methodology for building data products in DSS

Audience: beginner to intermediate users with a background in engineering/programming or business analytics.

Contents

  • Data science theory and concepts: new data, new technology, new algorithms, new deliverables, project workflow
  • Dataiku DSS fundamentals: creating value with data-driven applications
  • Main DSS concepts: projects, collaboration, datasets, analysis, flow, recipes, notebooks, insights
  • Building a predictive application with DSS, step-by-step methodology:
    - Creating a dataset
    - Exploring and visualizing data
    - Working with samples
    - Creating and deploying preparation scripts
    - Feature engineering
    - Creating and deploying prediction models

Practical Example: using a real-life dataset to build a prediction model and a dynamic webapp
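For readers who want a feel for this methodology before the session, the same steps can be sketched outside DSS in plain Python. This is a minimal illustration only: the inline dataset, column names, and the trivial prediction rule are hypothetical stand-ins, not course material, and DSS performs these steps through its visual interface rather than code.

```python
# A minimal sketch of the step-by-step methodology using only the
# Python standard library. Dataset and thresholds are hypothetical.
import csv
import io
import statistics

# 1. Creating a dataset (here: an inline CSV instead of a DSS connection)
raw = io.StringIO(
    "age,visits,churned\n"
    "34,10,0\n"
    "51,2,1\n"
    "29,15,0\n"
    "47,3,1\n"
)
rows = list(csv.DictReader(raw))

# 2. Exploring the data: a simple summary statistic
ages = [int(r["age"]) for r in rows]
print("mean age:", statistics.mean(ages))

# 3./4. Preparation and feature engineering: derive a binary feature
for r in rows:
    r["low_activity"] = int(r["visits"]) < 5  # hypothetical threshold

# 5. A trivial stand-in for a deployed prediction model:
# predict churn whenever activity is low, then measure accuracy
predictions = [int(r["low_activity"]) for r in rows]
actual = [int(r["churned"]) for r in rows]
accuracy = sum(p == a for p, a in zip(predictions, actual)) / len(rows)
print("accuracy:", accuracy)
```

In DSS, each of these steps maps to a visual component (datasets, the explore view, preparation recipes, and the machine learning lab) instead of hand-written code.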

Duration: 1 day

Pricing: 500 euros

DSS 201: Advanced Operations in DSS

Objectives

DSS 201 covers the techniques required to install, manage, and build data products in Dataiku DSS.

The course's objectives are to:

  • Walk through the software's installation process and administration tools
  • Perform complex operations in DSS: data sources, partitioning, model refinement...
  • Improve productivity and proficiency with DSS

Audience: intermediate to advanced users with a strong background in engineering/programming. DSS 101 is a prerequisite.

Contents

  • DSS installation: installation scripts, upgrade procedure, and commonly encountered issues
  • DSS architecture: options for sampling and executing computations and preparation scripts
  • DSS administration: access rights, security, and backup
  • Connecting to data sources: SQL recipes, APIs, custom Python scripts...
  • Partitioning, dependencies, and schema sync
  • Machine learning model refinement tour
  • Reporting (project documentation, notebooks, insights) and debugging tools
  • Building web applications in DSS
  • Tips and tricks: improving productivity with DSS

Practical Examples:

  • Importing data and building a clustering model with Python
  • Working with SQL databases and creating custom variables
  • Building and running partitioned recipes

Next sessions: Date 1, Paris; Date 2, Paris

Duration: 2 days

Pricing: 1,000 euros

Hadoop and Spark in DSS

Objectives

Working efficiently with large datasets requires large-scale, distributed technologies such as Hadoop or Spark.

The course's objectives are to:

  • Understand the "Big Data" ecosystem and the technologies associated with it
  • Show how to use Hadoop and Spark directly from within DSS
  • Prepare you to build large-scale data science workflows on your own using DSS and these technologies

Audience: intermediate to advanced users with a strong background in engineering/programming. General knowledge of distributed processing technologies and practical knowledge of Python, R, and SQL are preferable but not required.

Contents

  • Introduction to the Hadoop and Spark ecosystems
  • Connecting DSS to a Hadoop cluster and configuring connectivity with Spark
  • Introduction to the different ways to interact with Hadoop and Spark from DSS (datasets and recipes)
  • Using visual tools (data preparation, joins, aggregations...) with Spark or Hadoop engines
  • Interactive analysis of large datasets from a Jupyter notebook (PySpark and SparkR)
  • Using SQL-based systems (Hive, Impala, SparkSQL)
  • Using MLlib to train machine learning models on Spark

Practical Examples :

We'll use real-world datasets (retail and healthcare) throughout this course to build complete data science workflows using Hadoop and Spark in DSS.

Duration: 2 days

Pricing: 1,000 euros