Project Gutenberg Books

Provides a dataset to retrieve free ebooks from Project Gutenberg.

Project Gutenberg is a volunteer effort to digitize and offer free ebooks of works that are no longer protected by copyright.
It was created to encourage the creation and distribution of ebooks.

Project Gutenberg contains many of the major works of literature that are in the public domain. This plugin lets you retrieve the complete text of a book directly as a DSS dataset, with one record per line of the book.
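
To make the "one record per line" layout concrete, here is a minimal Python sketch of how a book's text maps to records. It is an illustration only, not the plugin's actual code: the function name book_to_records, the column names line_number and text, and the sample text are all assumptions made for this example.

```python
from typing import Dict, Iterator, Union

Record = Dict[str, Union[int, str]]


def book_to_records(book_text: str) -> Iterator[Record]:
    """Yield one record per line of the book, numbered from 1.

    Hypothetical helper: the column names are assumptions for this
    sketch and may differ from the plugin's real output schema.
    """
    for line_number, line in enumerate(book_text.splitlines(), start=1):
        yield {"line_number": line_number, "text": line}


# Placeholder text standing in for a downloaded book.
sample_text = "CHAPTER I\n\nThis is the first sentence of the book."

records = list(book_to_records(sample_text))
for record in records:
    print(record)
```

Blank lines are kept as records with empty text in this sketch, so line numbers stay aligned with the source file. Whether the plugin itself keeps or drops blank lines is not stated here.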

This is a great plugin for getting started with Natural Language Processing (NLP), i.e. processing human-written text.
We also have a tutorial that uses this plugin to build a first NLP predictive model: a service that automatically recognizes writings by Mark Twain and Charles Dickens.

Learning to recognize authors from books downloaded from Project Gutenberg.

Plugin Information

Version: 1.0.1
Author: Dataiku (Hanna Julienne)
Released: 2015/11/12
Last updated: 2015/11/12
License: Apache Software License
Source code: GitHub
Reporting issues: GitHub

How To Use

You need to install the plugin's dependencies. Go to the Administration > Plugins page to get the command line for installing them.

To learn more about how to use this plugin, please read our tutorial.
