US Patents bulk datasets are made available as XML file from Google
This connector retrieves that data
Since these XML files are not well-formed, this connector provides built-in cleansing and parsing. The resulting dataset contains one "patent" column (JSON) containing the patent metadata, abstract, description and claims
The connector enables a local folder cache to simplify your developments. You can choose to retrieve the whole patent database (beware it's big : 40GB) or any year between 2005 and 2015
|License||Apache Software License|
You need to install the dependencies of the plugin. Go to the Administration > Plugins page to get the command-line to install dependencies
The dataset is partitioned by year if you select to.