Named Entity Recognition

This plugin provides a recipe to recognize named entities in text data.

Plugin Information

Version: 1.3.2
Author: Dataiku (Alex COMBESSIE, Du PHAN, Hicham EL BOUKKOURI)
Released: 2018-09
Last updated: 2020-11
License: Apache Software License
Source code: GitHub
Reporting issues: GitHub

This plugin provides a tool for extracting named entities (e.g. names of people, dates, places) from text, which is useful for extracting knowledge from your documents.

The plugin comes with a single recipe that extracts entities using one of two models:
– spaCy: a faster but slightly less precise model, with support for many languages.
– Flair: a slower but more precise model for Named Entity Recognition.

 

How to use

Named Entity Recognition recipe

This recipe extracts named entities such as LOC (location) and PER (person) from your texts. The default model is spaCy, which is available for 7 languages. For more precise (but slower) extraction on English text, choose Flair.

Using the recipe is straightforward. Just plug in your dataset, select the column containing your texts and run the recipe!

Optionally, you can adjust some advanced settings. For example, you can choose Flair (available for English only) for more precise extraction. You can also choose how the extracted entities are presented: a separate column for each entity type (default), or a single column holding all the entities as JSON.
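To make the two output formats concrete, here is a minimal sketch in plain Python. It is not the plugin's implementation: the sample entities, the helper names (group_by_type, to_columns, to_json_column), the "entities" column name, and the choice to store each cell as a JSON string are all assumptions made for illustration.

```python
import json
from collections import defaultdict

# Hypothetical entities for one input text, as (text, label) pairs.
# In the plugin these would come from the spaCy or Flair model.
entities = [("Paris", "LOC"), ("Marie Curie", "PER"), ("Warsaw", "LOC")]


def group_by_type(entities):
    """Group entity texts by label, keeping their order of appearance."""
    grouped = defaultdict(list)
    for text, label in entities:
        grouped[label].append(text)
    return dict(grouped)


def to_columns(entities, labels=("LOC", "PER")):
    """Assumed 'one column per entity type' layout: one JSON list per label."""
    grouped = group_by_type(entities)
    return {label: json.dumps(grouped.get(label, [])) for label in labels}


def to_json_column(entities, column="entities"):
    """Assumed 'single JSON column' layout: one JSON object keyed by label."""
    return {column: json.dumps(group_by_type(entities))}


columns_row = to_columns(entities)   # one output cell per entity type
json_row = to_json_column(entities)  # a single output cell for all types
```

The column-per-type layout tends to be easier to filter on in later recipes, while the single JSON column keeps the output schema stable when the set of entity types changes.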

Named Entity Visualization webapp

You can start the webapp from the main webapp menu, under Visual Webapp > Named Entity Visualization. Once the webapp backend has started, you can visualize the named entities found in any input text using spaCy.

 

References

Alan Akbik, Duncan Blythe and Roland Vollgraf. Contextual String Embeddings for Sequence Labeling. In 27th International Conference on Computational Linguistics, 2018.
