Google Spreadsheet is very convenient when it comes to sharing data with people. At the end of your workflow in Dataiku DSS, you might want to export a dataset to a Google Spreadsheet document.
This how-to will help you do it with a Python recipe.
Be aware that we tested this how-to only with small datasets.
Head over to Installing Python packages to know how to install them.
Also, you will need to get OAuth Credentials to connect to Google Drive API. Since May 2015, Google does not allow you anymore to connect with your login and password via the API. This page will help you to get a JSON file of the credentials.
In DSS, open a project and create a new Python recipe. Select as an input the dataset you want to export.
As output, we are going to create a “fake” dataset. We will not write anything to it, but in DSS, you always need to have a dataset to build (you build datasets, you don’t run recipes).
Create a new managed dataset in any connection you wish. Name it as you wish.
Name and create the recipe.
In the coding tab, start with importing the libraries and your dataset. Note that this tutorial would work with any Pandas dataframe.
We first need an helper function that will convert a number to a letter. 1->A, 2->B, … 26->Z, 27->AA, 28->AB…
We will use this function to select a range of cells (“A1:B4” for example).
Then, we connect to the spreadsheet with the Gspread library. You can find the
document_id of the spreadsheet in its URL. You need to refer to the JSON file containing OAuth Credentials.
To send data with the Gspread library, we need to select in advance a range of cell. (More info in the documentation of the library.)
Let’s first send the columns names of our dataframe to the spreadsheet.
Now, let’s send the dataframe to the spreadsheet via the API.
Run the recipe, or build the “fake” output dataset.
Be sure that your spreadsheet is editable. You might need to share your spreadsheet with the email you have in your
json_key['client_email']. Otherwise you will get a SpreadsheetNotFound exception when trying to build