howto

Coding SQL in Dataiku DSS

February 01, 2017

The SQL language is a domain-specific language used in programming and designed for managing data held in a relational database. It is widely used both within opensource projects and for enterprise databases. We explain below how Dataiku DSS integrates SQL.

Writing SQL

Connecting to Database

Please refer to the reference documentation to see which databases are supported by Dataiku DSS.

Where to write SQL code

It is possible to write SQL code in two places :

Using a SQL visual recipe

Dataiku DSS provides some visual recipes that can be executed on a SQL engine if the input datasets lie on a SQL database.

It is also possible to transform most of the Visual SQL recipes to SQL code that you can then modify within a code recipe. These recipes speed your development by reducing the need to type long queries, and make spotting what their purpose is very easy thanks to their icons.

  • Group recipe: This recipe allows you to compute aggregations on any dataset. This recipe can be exported to SQL code.

  • Window recipe: Compute sliding aggregations on a dataset where each line corresponds to, for example, a given time. Set up the Window Frame you want to compute by defining how many lines below and after each record you want to use, define what column to order your dataset with, and select aggregations for your features. This recipe can be exported to SQL code.

  • Join with… recipe: Enrich a dataset with information from another by joining them together using a common key. This recipe can be exported to SQL code.

  • Stack recipe: Stack multiple datasets with the same columns into a single one. This recipe can be exported to SQL code.

  • Split recipe: Split datasets into multiple smaller ones by defining some rules or filters.

Fine tuning SQL code

Working with partitions

In order to save computation time, you can use partitioned datasets to only recompute parts of your data, depending on your needs. See how to partition SQL datasets and how to handle partitions in your SQL code.

Calling SQL from other languages

When writing very long queries or dynamic queries, it can be useful to use an additional language.

Additional contents

In-cluster

The Hadoop ecosystem provides interaction with data living on HDFS though SQL like coding. Take a look at the code portal to get more information on Hive, Impala or SparkSQL.

Geographical data

Sessionization

Vertica

PostgreSQL

Redshift