This plugin provides a dataset and a recipe to download scraped web data from a web scraping service into DSS. The service lets users automatically turn web pages into data, thanks to its powerful and easy-to-use scraping and parsing technology.

This plugin offers advanced connectivity to scrapers. With it, you can retrieve data hidden in web pages, or enrich existing datasets with external web data.

The DSS plugin can:

  • Retrieve data from a single API using the dataset
  • Bulk-enrich a dataset containing URLs, repeatedly getting data from an extractor on each URL, using the recipe

Plugin Information

Version 1.0.1
Author Dataiku
Released 2016-01-10
Last updated 2016-06-28
License Apache Software License
Source code GitHub
Reporting issues GitHub

How To Use

The plugin offers connectivity through three components: one dataset and two recipes.

Dataset for single API

The dataset is the simplest integration. It calls the API once and populates a dataset with the results.

Use this to fetch structured data from a single page.

Start by defining your extractor in the scraping service, then create the dataset and paste the API URL into the dataset configuration.
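As a rough illustration of what the dataset does, the sketch below calls an API URL once and turns the response into rows. It is not the plugin's actual code: the `fetch_json` callable, the `results` key in the response, and the example URL are all assumptions made for this example.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def load_rows_once(api_url: str,
                   fetch_json: Callable[[str], Dict[str, Any]]) -> List[Row]:
    """Call the API a single time and return its results as a list of rows.

    Assumption: the response is a JSON object whose "results" key holds a
    list of flat records. The real response layout may differ.
    """
    payload = fetch_json(api_url)  # exactly one call, as the dataset does
    return [dict(record) for record in payload.get("results", [])]


# Hypothetical stand-in for the HTTP call, so the sketch runs offline.
def fake_fetch_json(url: str) -> Dict[str, Any]:
    return {"results": [{"title": "Page A", "price": "10"},
                        {"title": "Page B", "price": "12"}]}


single_call_rows = load_rows_once("https://api.example.com/my-extractor",
                                  fake_fetch_json)
print(len(single_call_rows))
```

In a real setup, `fetch_json` would perform the HTTP request against the API URL pasted into the dataset configuration.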

Recipes for bulk enrich

The enrichment recipes are used to enrich a dataset: for each row of the input dataset, the recipe reads the URL in a given column, calls the service's API with it, and writes the results to the output dataset. This way of repeatedly calling the API to retrieve data is sometimes called “Bulk extract” or “Chain API” on the service's website.
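The per-row loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the recipe's real implementation: `call_api` is a hypothetical callable that returns a list of records for one URL, and the decision to skip rows with an empty URL is a choice made for this example.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def enrich_rows(rows: Iterable[Row],
                url_column: str,
                call_api: Callable[[str], List[Row]]) -> Iterator[Row]:
    """Yield one output row per record returned by the API for each input row."""
    for row in rows:
        url = row.get(url_column)
        if not url:
            continue  # assumption: rows without a URL are skipped
        for record in call_api(url):
            out = dict(row)     # keep the input columns
            out.update(record)  # add the columns extracted from the page
            yield out


# Hypothetical stand-in for the extractor API, so the sketch runs offline.
def fake_call_api(url: str) -> List[Row]:
    return [{"title": "Title for " + url}]


input_rows = [
    {"id": 1, "url": "https://example.com/a"},
    {"id": 2, "url": ""},
    {"id": 3, "url": "https://example.com/b"},
]
enriched = list(enrich_rows(input_rows, "url", fake_call_api))
print(len(enriched))
```

Passing the API call in as a parameter keeps the loop independent of any particular HTTP client, which makes it easy to try on a small list of URLs first.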

Start by defining your extractor on one example page in the scraping service, then create the recipe.

A great way to use this is together with the editable datasets in DSS.

The “Connector” recipe is also used for bulk enrichment. To bring in new data, the service offers a choice between “Magic”, “Extractor”, “Crawler”, and the more advanced “Connector”. This recipe lets you query an API created with the “Connector” option.

