Azure Data Lake Store (ADLS) is an enterprise-wide hyper-scale repository for big data analytic workloads. Azure Data Lake enables you to capture data of any size, type, and ingestion speed in one single place for operational and exploratory analytics.
This DSS Plugin provides a custom DSS file system provider to read data from Azure Data Lake Store.
It is a convenient way to read small- to medium-scale datasets from ADLS.
To benefit from the full features of DSS, you may want to access ADLS as an "HDFS" dataset instead,
as described in this article.
Build Flows with ADLS data.
|License|Apache Software License|
|Source code|GitHub repository|
This Plugin uses "service-to-service" authentication to interact with the ADLS APIs. If you have questions, please refer to the official Azure documentation. To authenticate against the ADLS APIs, the following credentials are required:
The App will need at least read-access on the ADLS directories you want to access.
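As a rough illustration of service-to-service authentication, here is a minimal sketch using the azure-datalake-store Python library. It assumes the credentials are the tenant ID, client ID, and client secret that the library's `lib.auth` accepts. The environment variable names and the `load_credentials` and `connect` helpers are hypothetical, introduced only for this example; they are not part of the Plugin.

```python
import os

# Hypothetical environment variable names, chosen for this sketch only.
REQUIRED_KEYS = ("ADLS_TENANT_ID", "ADLS_CLIENT_ID", "ADLS_CLIENT_SECRET")


def load_credentials(env=None):
    """Collect the service-to-service credentials from a mapping (default: os.environ)."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise ValueError("Missing ADLS credentials: " + ", ".join(missing))
    return {
        "tenant_id": env["ADLS_TENANT_ID"],
        "client_id": env["ADLS_CLIENT_ID"],
        "client_secret": env["ADLS_CLIENT_SECRET"],
    }


def connect(credentials, store_name):
    """Authenticate and return a file system object for the given ADLS store."""
    # Imported lazily so this module loads even where the library is not installed.
    from azure.datalake.store import core, lib

    token = lib.auth(**credentials)
    return core.AzureDLFileSystem(token, store_name=store_name)
```

Calling `connect(load_credentials(), "mystore")` (where `mystore` is a placeholder store name) should return an object whose `ls("/")` lists the root directory, provided the App has read access there.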
ADLS is a fully compatible HDFS-like file system for DSS. As such, it can be used directly with systems such as Azure HDInsight (which can be configured to automatically use ADLS as primary or secondary storage) or even with on-premises or non-Azure managed clusters (see for instance this blogpost).
This Plugin does not require Hadoop or Spark integration to interact with ADLS. It addresses the simple case where a DSS user wants to connect to ADLS, browse its directories, and read data into a regular DSS Dataset for further processing. It is a "lightweight" integration for simple use cases.
The Plugin relies on the Azure azure-datalake-store Python library.
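To give an idea of what browsing directories and reading data looks like with this library, here is a small sketch. It is not the Plugin's actual code. The helpers `list_csv_files` and `read_csv_rows` are hypothetical and only assume an object exposing `ls(path)` and `open(path, mode)`, which is the interface offered by the library's `AzureDLFileSystem`.

```python
import csv
import io


def list_csv_files(fs, directory):
    """Return the paths listed under `directory` whose names end in .csv."""
    return [path for path in fs.ls(directory) if path.lower().endswith(".csv")]


def read_csv_rows(fs, path, encoding="utf-8"):
    """Read a whole CSV file into memory and return its rows as dictionaries."""
    with fs.open(path, "rb") as handle:
        text = handle.read().decode(encoding)
    return list(csv.DictReader(io.StringIO(text)))
```

Because the whole file is loaded into memory, this approach only suits the small to medium datasets this Plugin targets; larger data is better served by the HDFS-style integration.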
The Plugin contains a custom "file system provider" for DSS that will let you:
To use this Plugin, start by installing it, then:
Create a new ADLS Dataset
Configure your ADLS Dataset
Start using your ADLS Dataset
You now have a regular DSS Dataset pointing at files stored on Azure Data Lake Store, which can be used like any other DSS Dataset (in Recipes, Analyses or Flows).
When using this Plugin, please keep in mind that:
Additional instructions and source code are available in our GitHub repository.