Project Gutenberg Books Plugin

Project Gutenberg is a volunteer-run initiative that offers free ebooks of works that are no longer protected by copyright.
It was created to encourage the creation and distribution of ebooks.

Project Gutenberg hosts a large share of the major works of literature that are in the public domain. This plugin lets you retrieve the complete text of a book directly as a DSS dataset, with one record per line of the book.
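To make the "one record per line" layout concrete, here is a minimal Python sketch of how a book's text could map to records. It is an illustration only, not the plugin's actual code: the function name `book_to_records` and the column names `line_number` and `text` are assumptions chosen for this example, and the real dataset schema may differ.

```python
def book_to_records(text):
    """Split a book's full text into one record per line.

    Hypothetical helper for illustration; the column names are assumptions,
    not the plugin's real schema.
    """
    return [
        {"line_number": number, "text": line}
        for number, line in enumerate(text.splitlines(), start=1)
    ]


# A few lines in the style of a Project Gutenberg plain-text file.
sample = (
    "It was the best of times,\n"
    "it was the worst of times,\n"
    "\n"
    "it was the age of wisdom,"
)

records = book_to_records(sample)
for record in records:
    print(record)
```

Note that blank lines become records with empty text in this sketch, so an NLP workflow would typically filter them out or merge lines into paragraphs before building features.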

This is a great plugin to get started with Natural Language Processing, i.e. processing human-written text.
We even have a great tutorial that uses this plugin to get you started on your first NLP predictive model: a service that automatically recognizes writings by Mark Twain and Charles Dickens.

Product screenshot

Learning to recognize authors from books downloaded from Project Gutenberg.

Plugin information

Author: Dataiku (Hanna Julienne)
Last updated: 2015/11/12
License: Apache Software License
Source code: GitHub
Reporting issues: GitHub

How to use

Before using the plugin, you need to install its dependencies. Go to the Administration > Plugins page to get the command line that installs them.

To learn more about how to use this plugin, please read our tutorial.