US Patents

This plugin provides a dataset to retrieve US Patents

US Patents bulk datasets are made available as XML file from Google

This connector retrieves that data

Since these XML files are not well-formed, this connector provides built-in cleansing and parsing. The resulting dataset contains one “patent” column (JSON) containing the patent metadata, abstract, description and claims

The connector enables a local folder cache to simplify your developments. You can choose to retrieve the whole patent database (beware it’s big : 40GB) or any year between 2005 and 2015

Plugin Information

Version	0.0.3
Author	Dataiku
Released	2015-11-15
Last updated	2015-11-15
License	Apache Software License
Source code	Github
Reporting issues	Github

How To Use

You need to install the dependencies of the plugin. Go to the Administration > Plugins page to get the command-line to install dependencies

The dataset is partitioned by year if you select to.

Get the Dataiku Data Sheet

Learn everything you ever wanted to know about Dataiku (but were afraid to ask), including detailed specifications on features and integrations.

Get the Data Sheet