This plugin uses the ESRI® ArcGIS Online® API and allows DSS users to:
This plugin is a perfect companion for users who want to enrich their dataset for analysis or for feature engineering.
This plugin requires an ArcGIS Online account. Users can buy credits directly from ArcGIS Online.
A sample flow using ESRI geo enrichment.
|Author|Dataiku (Nicolas Gakrelidz)|
|License|Apache Software License|
First, create an ArcGIS Online account at https://www.arcgis.com/home/signin.html
Depending on the use case:
This plugin calls the ArcGIS Online API, so you need an ArcGIS Online account. Check the cost of each API call, which varies with the feature used (geocoding, geo enrichment, getting the data collections). Note that this plugin is developed for data storage use.
The API only supports numerical identifiers (object IDs).
Countries should be given in ISO format (the value can be obtained from the geocoding recipe or from the dataset named "Show enrichment API coverage"). The country is required for enrichment. For geocoding, it is recommended because it improves the precision of the returned results.
Dataiku DSS doesn't automatically back up your data. Because the data acquired by this plugin has a cost, we recommend that you regularly back up the data collected by the plugin.
An option is available in the enrichment recipes to export the collected data into the tmp folder of your DSS data directory.
When performing enrichment for several countries, please note that the data collections differ (in name and content) from country to country. Thus, a cross-country enrichment may generate a huge number of columns.
You may choose either to use the "generate the output as key, value" option, whose output can then be processed with a preparation script, or to create one enrichment recipe per country.
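To illustrate why the key, value output keeps the column count under control, here is a minimal sketch in plain Python that pivots long (identifier, key, value) rows back into one record per identifier. The field names `object_id`, `key` and `value` and the variable names in the sample rows are assumptions made for this example; they are not taken from the plugin's actual output schema.

```python
from collections import defaultdict


def pivot_key_value(rows, id_field="object_id", key_field="key", value_field="value"):
    """Group long (id, key, value) rows into one dict per id.

    The field names are hypothetical defaults; adapt them to the
    columns actually produced by your enrichment recipe.
    """
    wide = defaultdict(dict)
    for row in rows:
        wide[row[id_field]][row[key_field]] = row[value_field]
    # Each record only carries the keys that exist for its own country,
    # instead of one (mostly empty) column per variable of every country.
    return [{id_field: oid, **cols} for oid, cols in wide.items()]


# Made-up sample rows: two objects enriched with country-specific variables.
rows = [
    {"object_id": 1, "key": "gb_var_a", "value": 10},
    {"object_id": 1, "key": "gb_var_b", "value": 2.5},
    {"object_id": 2, "key": "fr_var_x", "value": 7},
]

records = pivot_key_value(rows)
```

In a real flow the same reshaping can be done in a preparation script; the sketch only shows the shape of the transformation.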
For an enrichment at a specific named statistical level (e.g. postcode), you may want to try different settings for the data collection level name to match before enriching a large dataset.
For instance, if you want to enrich data containing UK postcodes, you should first create a "get content catalog for countries" recipe and look at the output dataset to find the required layer_id. At that point, it is not easy to choose between GB.PostcodeSectors, GB.PostcodeDistricts and GB.PostcodeAreas. The right choice may depend on your input data.
Thus, we recommend that you first create a small sample of the input data in order to check which layer corresponds to it.
NB: the input postcode must be written in the right format for each layer. For example, for the layer_id GB.PostcodeSectors, the postcode DL12 8UN should be formatted as DL12 8. Don't forget that DSS Visual Prepare can help you with this.
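The postcode formatting described above can be sketched in plain Python as follows. Only the GB.PostcodeSectors case (DL12 8UN becomes DL12 8) comes from this page; the formats used for GB.PostcodeDistricts (outward code, e.g. DL12) and GB.PostcodeAreas (leading letters, e.g. DL) are assumptions based on the usual UK postcode structure and should be checked against a small sample first. The function name `format_postcode` is hypothetical.

```python
import re

# Area letters, rest of the outward code, sector digit, two-letter unit.
_POSTCODE = re.compile(r"^([A-Z]{1,2})([0-9][0-9A-Z]?)\s*([0-9])([A-Z]{2})$")


def format_postcode(postcode, layer_id):
    """Truncate a full UK postcode to the level expected by a layer.

    Only the sector format is documented on this page; the district
    and area formats are assumptions to validate on sample data.
    """
    match = _POSTCODE.match(postcode.strip().upper())
    if match is None:
        raise ValueError(f"Unrecognised UK postcode: {postcode!r}")
    area, district_suffix, sector_digit, _unit = match.groups()
    district = area + district_suffix
    if layer_id == "GB.PostcodeSectors":
        return f"{district} {sector_digit}"
    if layer_id == "GB.PostcodeDistricts":  # assumed format
        return district
    if layer_id == "GB.PostcodeAreas":  # assumed format
        return area
    raise ValueError(f"Unsupported layer_id: {layer_id!r}")


sector = format_postcode("DL12 8UN", "GB.PostcodeSectors")
```

Running the function on a handful of rows for each candidate layer is a cheap way to see which layer your input data matches before spending credits on a large enrichment.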