This plugin provides a tool for extracting named entities (e.g. people's names, dates, places), which can be useful for extracting knowledge from your texts.
The plugin comes with a single recipe that extracts entities using one of two possible models:
– SpaCy: a faster but slightly less precise model. Another advantage of SpaCy is its support for many languages.
– Flair: a slower but more precise model for Named Entity Recognition.
|Author|Dataiku (Alexandre COMBESSIE and Hicham EL BOUKKOURI)|
|License|Apache Software License|
Extract Named Entities
This recipe extracts named entities such as LOC (location) and PER (person) from your texts. The default model is SpaCy, which is available for both English and French. To use a more precise (but slower) model, choose Flair.
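As a rough illustration of what entity extraction with SpaCy looks like outside the recipe, here is a minimal Python sketch. It is not the plugin's implementation: the function names `load_spacy_pipeline` and `extract_entities` are hypothetical, and the model name `en_core_web_sm` is an assumption (the model the recipe actually loads is not stated here). The label names returned also depend on the model; for example, some SpaCy English models use PERSON where others use PER.

```python
def load_spacy_pipeline(model_name="en_core_web_sm"):
    """Load a spaCy pipeline. Assumes spaCy and the named model are installed."""
    import spacy  # imported lazily so extract_entities can be used without spaCy
    return spacy.load(model_name)


def extract_entities(nlp, text):
    """Return (entity text, entity label) pairs found by the pipeline `nlp`.

    `nlp` is any callable returning an object with an `ents` attribute,
    which is how a loaded spaCy pipeline behaves.
    """
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]
```

With SpaCy installed, `extract_entities(load_spacy_pipeline(), "Marie Curie was born in Warsaw.")` would return one pair per detected entity; the exact entities and labels depend on the model.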
How to use the recipe
Using the recipe is straightforward. Just plug in your dataset, select the column containing your texts and run the recipe!
Optionally, you can adjust some advanced settings. For example, you can choose Flair (only available in English) for a more precise extraction. You can also choose the format in which the extracted entities are presented: a separate column for each entity type (default) or a single column holding a JSON object with all the entities.
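The two output formats can be pictured with a short Python sketch. This is only an illustration under stated assumptions: the function names are hypothetical, the sample entities are made up for the example, and the exact column names and JSON layout produced by the recipe may differ.

```python
import json
from collections import defaultdict


def entities_to_columns(entities):
    """One entry per entity type, each holding the list of matching texts."""
    columns = defaultdict(list)
    for text, label in entities:
        columns[label].append(text)
    return dict(columns)


def entities_to_json(entities):
    """A single JSON string mapping each entity type to its texts."""
    return json.dumps(entities_to_columns(entities), ensure_ascii=False, sort_keys=True)


# Hypothetical (text, label) pairs, as an extraction step might return them.
entities = [("Marie Curie", "PER"), ("Paris", "LOC"), ("Warsaw", "LOC")]

print(entities_to_columns(entities))  # separate value per entity type
print(entities_to_json(entities))     # everything in one JSON cell
```

The per-type layout is convenient for filtering on a single entity type, while the JSON layout keeps the output to one column regardless of how many entity types the model emits.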
This plugin offers a WebApp template for testing SpaCy’s NER model. To run the webapp successfully, you will need to:
- Create a Standard WebApp using the template, then enable a Python backend.
- Create a Python code environment following these requirements:
- In the WebApp’s settings page, select the previously created code environment and activate
Alan Akbik, Duncan Blythe and Roland Vollgraf. Contextual String Embeddings for Sequence Labeling. In 27th International Conference on Computational Linguistics, 2018.