Project Gutenberg is a volunteer-run initiative that offers free ebooks of works that are no longer protected by copyright.
It was created to encourage the creation and distribution of ebooks.
Project Gutenberg hosts a large collection of classic literature. This plugin lets you retrieve the complete text of a book directly as a DSS dataset, with one record per line of the book.
This is a great plugin to get started with Natural Language Processing, i.e. processing human-written text.
We even have a great tutorial that uses this plugin to get you started on your first NLP predictive model: a service that automatically recognizes writings by Mark Twain and Charles Dickens.
Learning to recognize authors from books downloaded from Project Gutenberg.
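To illustrate the "one record per line" layout described above, here is a minimal sketch in plain Python. It is not the plugin's actual code: the function name `book_to_records` and the column names `line_number` and `text` are assumptions chosen for illustration, and the sample text is a hand-typed excerpt rather than a real download.

```python
def book_to_records(text):
    """Split a book's full text into one record per line.

    Hypothetical helper: the column names below are illustrative
    and may not match the dataset schema the plugin produces.
    """
    records = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        # Blank lines are kept so that line numbers match the source text.
        records.append({"line_number": line_number, "text": line})
    return records


# Hand-typed sample standing in for a downloaded book.
sample = (
    "CHAPTER I\n"
    "\n"
    "You don't know about me without you have read a book\n"
    "by the name of The Adventures of Tom Sawyer"
)

records = book_to_records(sample)
for record in records:
    print(record)
```

Keeping blank lines as their own records preserves the original line numbering; a text-processing step can filter them out later if they are not needed.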
Version | 0.1.0 |
---|---|
Author | Dataiku (Hanna Julienne) |
Released | 2015/11/12 |
Last updated | 2015/11/12 |
License | Apache Software License |
Source code | GitHub |
Reporting issues | GitHub |
You need to install the plugin's dependencies. Go to the Administration > Plugins page to get the command line that installs them.
To learn more about how to use this plugin, please read our tutorial.