
Google Cloud Translation

This plugin provides a recipe to use Google Cloud Translation to translate text.

Plugin information

Version: 1.0.3
Author: Dataiku (Alex COMBESSIE, Arnaud D’ESQUERRE, Niklas MUENNIGHOFF)
Released: 2021-04
Last updated: 2023-04
License: Apache Software License
Source code: GitHub
Reporting issues: GitHub

This plugin lets you translate text to another language using Google Cloud Translation.

Note that the Google Cloud Translation API is a paid service; check Google's API pricing page for more information.

How to set up

If you are a Dataiku and Google Cloud admin user, follow these configuration steps right after you install the plugin. If you are not an admin, you can forward this to your admin and scroll down to the How to use section.

1. Get a service account key for the Translation API – in Google Cloud Console

You can follow the step-by-step instructions on this Google Cloud documentation page. Make sure that billing is activated on your Google Cloud project.

Once you complete the “Create service accounts and keys” step, you will receive your service account key as a JSON file.
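In Step 3 you will paste the content of this file into DSS, and it must be valid JSON. The minimal Python sketch below checks the file before you paste it. The function name is hypothetical, and the list of required fields is an assumption based on the usual layout of Google service account key files; it is not a check the plugin is documented to perform.

```python
import json
from typing import Any, Dict

# Assumed minimal set of fields found in a Google Cloud service account key file.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def parse_service_account_key(raw: str) -> Dict[str, Any]:
    """Parse a service account key and fail early with a readable message (hypothetical helper)."""
    try:
        key = json.loads(raw)
    except json.JSONDecodeError as error:
        raise ValueError(f"Service account key is not valid JSON: {error}") from error
    if not isinstance(key, dict):
        raise ValueError("Service account key must be a JSON object")
    # Collect every missing or empty field so the user sees them all at once.
    missing = [field for field in REQUIRED_KEY_FIELDS if not key.get(field)]
    if missing:
        raise ValueError(f"Service account key is missing fields: {', '.join(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"Expected type 'service_account', got {key['type']!r}")
    return key
```

You could run it on the downloaded file, for example `parse_service_account_key(open(path).read())`, where `path` is wherever you saved the key. If the `google-auth` library is installed, the returned dict can also be passed to `google.oauth2.service_account.Credentials.from_service_account_info` to confirm that credentials can be built from it.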

Service Account Creation

2. Create an API configuration preset – in Dataiku DSS

In Dataiku DSS, navigate to the Plugin page > Settings > API configuration and create your first preset.

API Configuration Preset Creation

3. Configure the preset – in Dataiku DSS

Completed API Configuration Preset
  • Fill in the AUTHENTICATION settings
    • Copy-paste the content of your service account key from Step 1 into the GCP service account key field. Make sure the key is valid JSON.
    • Alternatively, you may leave the field empty so that the key is retrieved from the server environment. If you choose this option, please follow this documentation.
  • (Optional) Review the API QUOTA and PARALLELIZATION settings
    • The default API Quota settings throttle each recipe calling the API to 6000 requests (Rate limit parameter) per minute (Period parameter).
      • In other words, after sending 6000 requests, the recipe waits for 60 seconds before sending the next batch of up to 6000 requests.
    • You may need to decrease the Rate limit parameter if you expect multiple recipes to call the API concurrently.
      • For instance, if you want to allow 10 concurrent DSS activities, you can set this parameter to 6000/10 = 600 requests per minute.
    • The default Concurrency parameter means that 4 threads will call the API in parallel. This parallelization operates within the API Quota settings defined above.
      • We do not recommend changing this default parameter unless your server has a much higher number of CPU cores.
  • Set the Permissions of your preset
    • You can declare yourself as the Owner of this preset and make it available to everybody, or to a specific group of users.
    • Any user belonging to one of these groups on your Dataiku DSS instance will be able to see and use this preset.
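To make the API QUOTA and PARALLELIZATION settings concrete, here is a minimal Python sketch of a fixed-window throttle shared by several worker threads. It follows the behavior described above (up to Rate limit requests, then a wait until the Period ends, with 4 threads by default), but it is an illustration under assumptions, not the plugin's implementation: `FixedWindowThrottle`, `per_recipe_rate_limit`, `translate_all` and the `call_api` callable are hypothetical names.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


class FixedWindowThrottle:
    """At most `rate_limit` calls per `period` seconds, shared by all threads.

    Hypothetical helper mirroring the Rate limit / Period settings; it is not
    the plugin's actual implementation.
    """

    def __init__(self, rate_limit: int = 6000, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.rate_limit = rate_limit
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._window_start = clock()
        self._count = 0

    def acquire(self) -> None:
        """Block until one more request is allowed in the current period."""
        with self._lock:  # holding the lock makes every thread wait together
            now = self._clock()
            if now - self._window_start >= self.period:
                # The period has elapsed: start a fresh window.
                self._window_start, self._count = now, 0
            elif self._count >= self.rate_limit:
                # Quota used up: wait for the rest of the period, then reset.
                self._sleep(self._window_start + self.period - now)
                self._window_start, self._count = self._clock(), 0
            self._count += 1


def per_recipe_rate_limit(total_per_period: int, concurrent_activities: int) -> int:
    """Split a shared quota evenly between concurrent activities."""
    return total_per_period // concurrent_activities


def translate_all(texts: List[str], call_api: Callable[[str], str],
                  throttle: FixedWindowThrottle, concurrency: int = 4) -> List[str]:
    """Call `call_api` from `concurrency` threads, each call gated by `throttle`."""
    def worker(text: str) -> str:
        throttle.acquire()
        return call_api(text)

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(worker, texts))  # map() preserves input order
```

For the 10-activity example above, you would build the throttle with `FixedWindowThrottle(rate_limit=per_recipe_rate_limit(6000, 10))`, which allows 600 requests per 60 seconds. Note that the threads only share one quota: raising the concurrency makes each batch finish sooner, but never lets the recipe exceed the Rate limit. A fixed window is the most literal reading of the description; a token bucket would smooth out bursts at the start of each period, and the plugin's own throttling may differ.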

Voilà! Your preset is ready to be used.

Later, you (or another Dataiku admin) will be able to add more presets. This can be useful to segment plugin usage by user group. For instance, you can create a “Default” preset for everyone and a “High performance” one for your Marketing team, with separate billing for each team.

How to use

Let’s assume that you have installed this plugin and that you have a Dataiku DSS project with a dataset containing a column of text to translate.

Input

  • Dataset with a text column to translate
Google Cloud Translation – Example Input Dataset

Google Cloud Translation recipe

To create your first recipe, navigate to the Flow, click on the + RECIPE button and access the Natural Language Processing menu. If your dataset is selected, you can directly find the plugin in the right panel.

Plugin Recipe Creation

Settings

Google Cloud Translation – Settings
  • Review INPUT parameters
    • The Text column parameter is the column in the input dataset that you wish to translate.
    • The Source language parameter is the original language of the Text column. If you would like the Translation API to infer the original language, select the Auto-detect option.
    • The Target language parameter is the language you would like to translate to.
    • The Format parameter is the format of your input text. Keep the default, Text, for plain text, or change it to HTML if your text contains HTML markup.
  • Review CONFIGURATION parameters
    • The Preset parameter is automatically filled with the default preset made available by your Dataiku admin. You may select another one if multiple presets have been created.
    • The Fail on error parameter lets you choose whether the recipe should abort execution if any error is raised. If unchecked, errors are logged in two additional columns of the output.
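To illustrate how the INPUT parameters could translate into an API call, here is a minimal Python sketch that builds a request body for the Basic (v2) REST endpoint of the Cloud Translation API (`POST https://translation.googleapis.com/language/translate/v2`, fields `q`, `target`, `source` and `format`). This page does not say which API version or client the plugin uses, so treat this as an assumption; `build_translate_request` and the `"auto"` sentinel for Auto-detect are hypothetical names.

```python
from typing import Dict

AUTO_DETECT = "auto"  # hypothetical sentinel standing for the Auto-detect option


def build_translate_request(text: str, target: str, source: str = AUTO_DETECT,
                            fmt: str = "text") -> Dict[str, str]:
    """Build a JSON body for the v2 translate endpoint (sketch, not the plugin's code)."""
    fmt = fmt.lower()  # accept the UI labels "Text" / "HTML"
    if fmt not in ("text", "html"):
        raise ValueError(f"Unsupported format: {fmt!r} (expected 'text' or 'html')")
    # Always send `format` explicitly rather than relying on the API default.
    body = {"q": text, "target": target, "format": fmt}
    if source != AUTO_DETECT:
        # Only send `source` when a language was picked; omitting it asks the
        # API to detect the source language itself.
        body["source"] = source
    return body
```

Authentication and the HTTP call itself are left out. Leaving `source` unset is what lets the v2 API return a `detectedSourceLanguage` for each translation, which corresponds to the extra language column described below.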

Output

  • Dataset with text translated to another language
Google Cloud Translation – Example Output Dataset

The columns of the output dataset are as follows:

  • [Input dataset columns]
    • All columns from the input dataset will be preserved
  • [selected column]_language
    • The detected language of the selected column
    • Only present if Auto-detect has been selected as the source language
  • [selected column]_{target iso code}
    • The selected column in its translated version
  • translation_api_response
    • Raw API response in JSON form
  • translation_api_error_message
    • The error message in case an error occurred
    • Only present if Fail on error is not selected during configuration
  • translation_api_error_type
    • The error type in case an error occurred
    • Only present if Fail on error is not selected during configuration
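The column layout above, together with the effect of Fail on error, can be sketched in Python as follows. This is an illustration under assumptions, not the plugin's code: `output_columns`, `process_row` and the `call_api` callable are hypothetical, the response is assumed to follow the Basic (v2) shape `{"data": {"translations": [{"translatedText": ..., "detectedSourceLanguage": ...}]}}`, and columns are ordered as listed here.

```python
import json
from typing import Any, Callable, Dict, List


def output_columns(input_columns: List[str], text_column: str, target: str,
                   auto_detect: bool, fail_on_error: bool) -> List[str]:
    """Output column names in the order listed above (hypothetical helper)."""
    columns = list(input_columns)  # all input columns are preserved
    if auto_detect:
        columns.append(f"{text_column}_language")
    columns += [f"{text_column}_{target}", "translation_api_response"]
    if not fail_on_error:
        columns += ["translation_api_error_message", "translation_api_error_type"]
    return columns


def process_row(row: Dict[str, Any], text_column: str, target: str,
                auto_detect: bool, fail_on_error: bool,
                call_api: Callable[[str], Dict[str, Any]]) -> Dict[str, Any]:
    """Translate one row; record errors in columns unless Fail on error is set."""
    columns = output_columns(list(row), text_column, target, auto_detect, fail_on_error)
    out = {name: row.get(name, "") for name in columns}  # new columns start empty
    try:
        response = call_api(row[text_column])
        translation = response["data"]["translations"][0]
        out[f"{text_column}_{target}"] = translation["translatedText"]
        if auto_detect:
            out[f"{text_column}_language"] = translation.get("detectedSourceLanguage", "")
        out["translation_api_response"] = json.dumps(response)  # raw JSON kept as text
    except Exception as error:  # any API or parsing failure
        if fail_on_error:
            raise  # abort the whole recipe
        out["translation_api_error_message"] = str(error)
        out["translation_api_error_type"] = type(error).__name__
    return out
```

With a text column named `review` and target `en`, the translation lands in `review_en` and, with Auto-detect, the detected language in `review_language`. When Fail on error is unchecked, a failing row keeps its input values, gets empty translation columns, and carries the error message and exception type instead.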

Happy natural language processing!