howto

How to export a dataset to a Google Spreadsheet?

August 02, 2015

Google Spreadsheet is very convenient when it comes to sharing data with people. At the end of your workflow in Data Science Studio, you might want to export a dataset to a Google Spreadsheet document.

This how-to will help you do it with a Python recipe.

Be aware that we tested this how-to only with small datasets.

Prerequisites

You will need to install the Gspread, OAuth2Client and JSON librairies.

Head over to Installing Python packages to know how to install them.

Also, you will need to get OAuth Credentials to connect to Google Drive API. Since May 2015, Google does not allow you anymore to connect with your login and password via the API. This page will help you to get a JSON file of the credentials.

Create the Python recipe with a fake output

In DSS, open a project and create a new Python recipe. Select as an input the dataset you want to export.

As output, we are going to create a "fake" dataset. We will not write anything to it, but in DSS, you always need to have a dataset to build (you build datasets, you don't run recipes).

Create a new managed dataset in any connection you wish. Name it as you wish.

Name and create the recipe.

Write the code

In the coding tab, start with importing the libraries and your dataset. Note that this tutorial would work with any Pandas dataframe.

# -*- coding: utf-8 -*-
import dataiku
from dataiku import pandasutils as pdu
import pandas as pd
import numpy as np
import json
import gspread
from oauth2client.client import SignedJwtAssertionCredentials

# input dataset
df = dataiku.Dataset("my_dataset_name").get_dataframe()

Export the dataframe to Google Spreadsheet with Gspread

We first need an helper function that will convert a number to a letter. 1->A, 2->B, ... 26->Z, 27->AA, 28->AB...

We will use this function to select a range of cells ("A1:B4" for example).

def numberToLetters(q):
    q = q - 1
    result = ''
    while q >= 0:
        remain = q % 26
        result = chr(remain+65) + result;
        q = q//26 - 1
    return result

Then, we connect to the spreadsheet with the Gspread library. You can find the document_id of the spreadsheet in its URL. You need to refer to the JSON file containing OAuth Credentials.

json_key = json.load(open('/path/to/CredentialsGoogleDriveAPI.json'))
scope = ['https://spreadsheets.google.com/feeds']
credentials = SignedJwtAssertionCredentials(json_key['client_email'], json_key['private_key'], scope)
gc = gspread.authorize(credentials)

ws = gc.open_by_key('document_id').sheet1

To send data with the Gspread library, we need to select in advance a range of cell. (More info in the documentation of the library.)

Let's first send the columns names of our dataframe to the spreadsheet.

# columns names
columns = df.columns.values.tolist()
# selection of the range that will be updated
cell_list = ws.range('A1:'+numberToLetters(len(columns))+'1')
# modifying the values in the range
for cell in cell_list:
    val = columns[cell.col-1]
    if type(val) is str:
        val = val.decode('utf-8')
    cell.value = val
# update in batch
ws.update_cells(cell_list)

Now, let's send the dataframe to the spreadsheet via the API.

# number of lines and columns
num_lines, num_columns = df.shape
# selection of the range that will be updated
cell_list = ws.range('A2:'+numberToLetters(num_columns)+str(num_lines+1))
# modifying the values in the range
for cell in cell_list:
    val = df.iloc[cell.row-2,cell.col-1]
    if type(val) is str:
        val = val.decode('utf-8')
    elif isinstance(val, (int, long, float, complex)):
        # note that we round all numbers
        val = int(round(val))
    cell.value = val
# update in batch
ws.update_cells(cell_list)

Build it

Run the recipe, or build the "fake" output dataset.

Be sure that your spreadsheet is editable. You might need to share your spreadsheet with the email you have in your json_key['client_email']. Otherwise you will get a SpreadsheetNotFound exception when trying to build

That's it!