
Teradata

The Teradata plugin enhances Dataiku's built-in interaction capabilities with Teradata Vantage systems. The plugin provides visual recipes for scaled, in-Database Analytics with your Vantage data.

Plugin information

Version 2.1.1
Author Teradata
Released 2022-09
Last updated 2022-10-04
License MIT + Apache Software License
Source code GitHub
Reporting issues Teradata support

With this plugin, you will be able to:

  • Access and execute analytic functions that reside in the Teradata Vantage Analytics Database. These functions comprise the following sets:
    • A select subset of the Analytics Database analytic functions.
    • The entire Vantage Analytics Library (VAL) functions set.
  • Submit and run R and Python scripts natively in the Analytics Database for scaled and performant in-Vantage execution via Table Operator objects in the Analytics Database.
  • Store your Dataiku-built, PMML-formatted models in Analytics Database tables.
  • Run scaled scoring tasks in the Analytics Database via the Bring Your Own Model Analytics Database software with models you have previously stored in Analytics Database tables. The present plugin version supports scoring with models in PMML and H2O MOJO formats.

Note that to use the plugin:

  • You will need Teradata Vantage access credentials to establish a connection to a target Analytics Database.
  • To connect to an Analytics Database, the Teradata JDBC Driver 16.20 or later is required. The driver can be downloaded from:
    https://downloads.teradata.com/download/connectivity/jdbc-driver
    Your administrator must then install the Teradata JDBC Driver file on your Dataiku server per the directions on the page https://doc.dataiku.com/dss/latest/installation/custom/jdbc.html.
    After the Teradata JDBC Driver is installed, and when a connection to a Teradata Vantage system is specified in Dataiku, observe the following driver-related details on the Connections page under the Dataiku Settings:

    1. Specify the following properties under the “Advanced JDBC properties”:
      • CHARSET: UTF8
      • TMODE: TERA
      • If your target system connection is LDAP-based, then also specify the property: LOGMECH: LDAP
      • For VantageCloud system users, communication errors can be avoided by specifying the property: ENCRYPTION: TRUE
    2. Check the “Autocommit mode” button to enable the autocommit mode.
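The Dataiku UI collects these as key/value property pairs; conceptually, the Teradata JDBC driver receives them as a comma-separated list appended to the connection URL. The following sketch illustrates that mapping in Python (the host name is a placeholder, and the exact URL Dataiku assembles internally may differ):

```python
# Sketch: rendering the advanced JDBC properties above in the comma-separated
# form the Teradata JDBC driver expects in its connection URL.
# The host name "vantage.example.com" is a placeholder.
def build_teradata_jdbc_url(host, props):
    prop_str = ",".join(f"{k}={v}" for k, v in props.items())
    return f"jdbc:teradata://{host}/{prop_str}"

props = {"CHARSET": "UTF8", "TMODE": "TERA"}
# For an LDAP-based connection, add:  props["LOGMECH"] = "LDAP"
# For VantageCloud systems, add:      props["ENCRYPTION"] = "TRUE"
url = build_teradata_jdbc_url("vantage.example.com", props)
print(url)  # jdbc:teradata://vantage.example.com/CHARSET=UTF8,TMODE=TERA
```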

How to set up the plugin

Right after installing the plugin, you will need to build its code environment. Python 3.6 or 3.7 is required; ensure one of these versions is present on your Dataiku server. When creating the code environment, you can accept the default recommendation to have it managed by Dataiku. If you are updating the plugin, you may be prompted to retain a compatible existing code environment from the previously installed version; you can accept this.

The plugin requires the following dependency packages that are fetched when you build its code environment:

  • pandas >= 0.23.4
  • SQLAlchemy 1.3.23
  • teradatasqlalchemy 17.0.0.0

After installation is complete, ensure the plugins are reloaded as follows:

  1. Go to the Applications button on the right side of the Dataiku top banner and select the Plugins page. Once on the page, select the Installed tab at the top of the screen.
  2. Push the RELOAD ALL button in the top right corner of the screen.
  3. Proceed to your project flow and refresh the page.

Furthermore, as a Teradata Vantage user, you need to confirm with your Analytics Database Administrator that the following requirements are met:
  • The target Analytics Database must be of version 16.20 or later.
  • To use the “In-Vantage Scripting” recipe
    • the corresponding language bundles need to be installed directly on each node of the target Analytics Database per the following table:
      PID  Product name
      9687-2000-0120  R Interpreter and Add-on Pkg on Teradata Advanced SQL
      9687-2000-0122  Python Interpreter and Add-on Pkg on Teradata Advanced SQL
    • your Analytics Database Administrator must grant to the Analytics Database user account the additional following privileges in advance:
      GRANT EXECUTE FUNCTION ON td_sysfnlib.script TO user;
      GRANT EXECUTE PROCEDURE ON SYSUIF.INSTALL_FILE TO user;
      GRANT EXECUTE PROCEDURE ON SYSUIF.REPLACE_FILE TO user;
      GRANT EXECUTE PROCEDURE ON SYSUIF.REMOVE_FILE TO user;
      GRANT EXECUTE ON SYSUIF.DEFAULT_AUTH TO user;
      GRANT CREATE EXTERNAL PROCEDURE ON user TO user;
      GRANT DROP PROCEDURE ON user TO user;
      Note: The preceding privileges can alternatively be granted to a role instead of a user account.
      If your input data for the SCRIPT Table Operator resides in a database other than your default user database, then your Analytics Database Administrator must also grant the user some of the above privileges on that database in advance, as follows:
      GRANT EXECUTE FUNCTION ON td_sysfnlib.script TO database WITH GRANT OPTION;
      GRANT EXECUTE ON SYSUIF.DEFAULT_AUTH TO database WITH GRANT OPTION;
      GRANT CREATE EXTERNAL PROCEDURE ON database TO user;
      GRANT DROP PROCEDURE ON database TO user;
    • a database name must be specified in your connection’s advanced connection properties as the value to the special property STO_DATABASE. This database is meant to host the files you wish to upload to the target server for in-Database execution. See also the section In-Vantage Scripting recipe in the following.
  • To use the “BYOM Scoring” recipe, the Teradata Vantage BYOM software package must be installed on the target Analytics Database, and you need to have execute privilege on the installation database; see the related BYOM documentation at the following link:
    https://docs.teradata.com/r/Teradata-VantageTM-Bring-Your-Own-Model-User-Guide/May-2022/Bring-Your-Own-Model
    In the Teradata Vantage Analytics Database, the default installation location of the BYOM software is a database called mldb. In line with the preceding, your Analytics Database Administrator needs to grant your Database user account in advance the privilege
    GRANT EXECUTE FUNCTION ON mldb TO user WITH GRANT OPTION;
    on mldb. If BYOM is installed in a different database on your system, then your Analytics Database Administrator needs to grant you the above privilege on the corresponding database, instead.
  • To use the functions in the “Teradata Analytic Functions” recipe, a default database must be specified in your Teradata connection settings.
  • In addition, to use the VAL functions in the “Teradata Analytic Functions” recipe
    • the Vantage VAL software package must be installed on the target Analytics Database. If VAL is not installed in a publicly available account or database on the target Analytics Database, then each user account who needs to run the analytic functions needs suitable privileges granted by the following statements:
      GRANT EXECUTE PROCEDURE ON val.td_analyze TO user;
      GRANT EXECUTE FUNCTION ON val.tda_dt_calc TO user;
      GRANT EXECUTE FUNCTION ON val.tda_kmeans TO user;
      GRANT SELECT ON stats_dbname TO user;
      In the preceding, we assume val to be the target VAL installation database, and stats_dbname to be the database where the VAL Statistical Test Tables are installed.
      Alternatively, the preceding privileges can be granted to a role instead of a user. For more information, see the related VAL documentation at
      https://docs.teradata.com/r/Vantage-Analytics-Library-User-Guide/January-2022
    • a database name must be specified in your connection’s advanced connection properties as the value to the special property VAL_DATABASE. This name must be the target VAL installation database name. See also the section Teradata Analytic Functions recipe in the following.

How to use the plugin

1. Introduction

The Teradata plugin recipes provide wrapper objects for objects in the Teradata Vantage Analytics Database, so that you can access and use a broad range of Teradata Vantage analytical tools, and push analytical tasks into the Analytics Database for scaled and efficient computations, all from the convenience of the Dataiku environment.

To use the Teradata plugin in your project flow, go to the flow view of your project, click on the +RECIPE button, then select the Teradata plugin; see Figure 1.1.

Figure 1.1: Selecting the Teradata plugin.

Then proceed to select the desired item from the list of available recipes in the plugin; see Figure 1.2. The recipes are described in detail in the following sections.

Figure 1.2: Window with the list of recipes in the Teradata plugin.

2. Teradata Analytic Functions recipe

The Teradata Analytic Functions recipe integrates Dataiku wrappers for the following 2 groups of functions:

  • A series of select native analytic functions in the Teradata Vantage Analytics Database.
    In particular, the present plugin version provides wrappers for a select group of Analytics Database analytic functions.

  • The entire range of the Vantage Analytics Library (VAL) functions.
    For a complete breakdown of the VAL functions and syntax, see the details on the page
    https://docs.teradata.com/r/Vantage-Analytics-Library-User-Guide/January-2022.
    To use any of the VAL functions, you must first specify in your target connection’s advanced connection properties the special property VAL_DATABASE. Its value must be the database name where VAL is installed in your connected Analytics Database. You can further choose to keep this property secret by checking the nearby “Secret” button; in our use case example, we leave this button unchecked. Also, if you no longer need the property, then you can remove it by clicking on the red trash can icon button on the right-most end of the property row. If the custom property VAL_DATABASE is not specified in the Teradata connection you use with the Teradata Analytic Functions recipe, then an error will occur when you attempt to specify the recipe settings for a VAL function.
    In addition, your user account for the target Analytics Database must have been granted specific privileges on the VAL installation database, as detailed in the earlier section How to set up the plugin.

All included functions can be invoked from the plugin interface with input Datasets from Vantage tables. Function execution is pushed into the Analytics Database, and each function runs on the server where the table corresponding to the input Dataset resides.

Note: To use the Teradata Analytic Functions recipe, a default database must be specified in advance in your target Teradata connection settings; otherwise, an error will occur during recipe execution.

Note: Depending on the intended usage and the syntax of individual functions, some analytic functions might require more than one input Dataset. Similarly, depending on the function, you might be able to specify multiple output Datasets. Check the corresponding documentation for the appropriate usage syntax of a function in advance, because the recipe does not check the specified input/output of each analytic function for consistency. The recipe Inputs/Outputs window provides +ADD buttons for multiple inputs, if needed.

The main environment of the Analytic Functions recipe is the recipe Settings screen, where you are taken after you specify the recipe input and output. On this screen, you can type the desired function name straight away in the “Function Name” field; as you type, you will see matching options from which you can choose to auto-complete your function name choice. Alternatively, you can double-click in the “Function Name” field; then, as illustrated in Figure 2.1, a drop-down menu appears where you can scroll through the complete list of available functions in the recipe and select the one you wish to use. The VAL functions are designated by the string “VAL” at the end of their name.

Figure 2.1: Entering the Teradata Analytic Functions settings screen.

By default, the buttons next to the “Analytics Database” and “VAL” function group titles are both selected, so that the “Function Name” field accounts for functions from all Analytic Functions groups. Click on either button to exclude (when unchecked) or include (when checked) the functions of the corresponding group from your search.

Click on the “Version” button on the upper part of the screen (see Figure 2.2) to inquire about:

  • the Analytics Database version of the Teradata Vantage system you are connected to.
  • the highest Analytics Database version whose analytic function titles are supported by the present plugin version.

For example, assume the plugin provides wrappers for analytic functions that appear up to Analytics Database v.17.05, and you are connected to an Analytics Database v.17.10. Any new analytic function titles introduced in v.17.10 will be unavailable in the present plugin version.

Figure 2.2: Checking versioning information.

Finally, when you assemble a function call in the recipe interface, you can push the “Query” button on the upper part of the screen to produce the SQL query format that is pushed into the Analytics Database for execution on the basis of your specifications. While the syntax is incomplete or no function is selected, pushing the button will produce the note “No query present to report.”

In the following, we explore how to use the Analytic Functions recipe with one example for each of the categories of the Analytics Database analytic functions, and the VAL functions.

2.1. Example 1

In this example, we illustrate an instance of using Analytics Database analytic functions by using the time series analysis function nPath. This function searches the input data for user-specified patterns, and identifies paths that lead to an outcome. Assume a project flow with a Dataset “bank_web_clicks” that you wish to use as input for nPath. First, select the Teradata Analytic Functions recipe from the list of the Teradata plugin recipes.

In the pop-up window that follows, specify the input and output Datasets; see Figure 2.3. The output Dataset will be stored in the database and schema corresponding to the connection selected in the Store Into field. Click on the CREATE button when done.

Figure 2.3: Inputs and outputs for nPath function.

You are then transferred to the recipe settings screen, as illustrated in the preceding Figure 2.1. Specify the nPath function in the “Function Name” field. Upon the function specification, the screen adjusts and presents in separate tabs all required and optional arguments for the selected function. Beyond the pre-selected function input, you further specify information about data partitioning and other function-specific parameters and attributes. We fill out the required argument fields with the desired information for the specific analysis, as dictated by the nPath function syntax. We make no use of optional arguments in this example. In the end, our specifications look like the contents of Figure 2.4.

Figure 2.4: Required arguments screen for nPath function.

By pushing the “Query” button on the upper part of the screen, the recipe produces a pop-up window with the current query that will be sent to the connected Analytics Database for execution; see Figure 2.5.

Figure 2.5: SQL query generated by the recipe for executing nPath in-Database.

Upon running the recipe, the output Dataset is populated with data from the Analytics Database table that contains the function execution results, as was specified in the recipe outputs. A sample of the results is shown in Figure 2.6.

Figure 2.6: Results from the nPath function execution.

2.2. Example 2

In this example, we illustrate the VAL functions in the recipe by building a flow with the Decision Tree, Decision Tree Evaluate, and Decision Tree Predict VAL functions. Starting with an input Dataset of banking customer data, we build a small-depth, gain ratio-based decision tree to predict customer gender from information about customer age, income, and number of children. Assume a project flow with a Dataset “customer_analysis” that contains columns with the relevant feature data.

The initial task is to build the decision tree for our analysis. Select the Teradata Analytic Functions recipe from the list of the Teradata plugin recipes. In the pop-up window that appears, specify the input and output Datasets; see Figure 2.7. The output Dataset will be stored in the database and schema corresponding to the connection selected in the Store Into field. Click on the CREATE button when done.

Figure 2.7: Inputs and outputs for decision tree VAL function.

After creating the recipe, its settings screen appears. Double-click on the “Function Name” field and select the “Decision Tree VAL” function from the drop-down menu. When done, the screen shows tabs with the required and optional arguments for the decision tree function. In the required argument fields, the input field is already picked up from the recipe input information. We then fill in 3 “Input Columns” fields with the parameter names “age”, “income”, and “nbr_children”. Observe that as a column name is typed, the recipe provides suggestions of possible columns in the table to select the correct one from. We further specify “gender” to be the response column name. The recipe then appears like in Figure 2.8.

Figure 2.8: Required arguments for the decision tree VAL function.

Under the optional arguments tab, we specify “gainratio” in the “Algorithm” and “Pruning” fields, skip binning, and further specify we would like a maximum depth of 5 with 2 splits; see Figure 2.9.

Figure 2.9: Optional arguments for the decision tree VAL function.

Executing this recipe produces the output Dataset that, as seen in Figure 2.10, essentially contains an XML string with details about the created model.

Figure 2.10: Dataset with model created by the decision tree VAL function.
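Behind the scenes, VAL functions are exposed through the td_analyze stored procedure of the VAL installation database, and the recipe’s “Query” button reveals the exact SQL it generates. As a rough, hypothetical sketch of assembling such a call for this example (the parameter names and layout here are assumptions for illustration, not the plugin’s actual output):

```python
# Hypothetical sketch of a VAL td_analyze call for the decision tree example.
# VAL takes a function name plus a single semicolon-separated parameter string;
# the exact parameters the plugin emits may differ from this illustration.
def build_td_analyze_call(val_db, function_name, params):
    param_str = ";".join(f"{k}={v}" for k, v in params.items())
    return f"CALL {val_db}.td_analyze('{function_name}', '{param_str};')"

sql = build_td_analyze_call("val", "decisiontree", {
    "database": "mydb",                      # assumed source database
    "tablename": "customer_analysis",
    "columns": "age,income,nbr_children",
    "dependent": "gender",
    "algorithm": "gainratio",
    "max_depth": "5",
})
print(sql)
```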

Back in our flow, we now create in the same manner a couple more Analytic Function recipes from our Teradata plugin. The first invokes the decision tree evaluate VAL function to produce a confusion matrix from the model; the second performs the prediction via the decision tree predict VAL function. Both recipes use the same inputs, namely, the “customer_analysis” and “ca_decision_tree” Datasets.

In the decision tree evaluate VAL function, the required arguments are the names of the input data Dataset and the decision tree model Dataset; see Figure 2.11. In the optional arguments tab, we only specify that we wish to accumulate by the “customer_analysis” Dataset columns “city_name” and “state_code”; see Figure 2.12.

Figure 2.11: Required arguments for the decision tree evaluate VAL function.
Figure 2.12: Optional arguments for the decision tree evaluate VAL function.

When the recipe is executed, the confusion matrix output is saved in the recipe output Dataset “ca_decision_tree_evaluate”, as shown in Figure 2.13.

Figure 2.13: Output of the decision tree evaluate VAL function.

Finally, we perform the prediction task with our model. In the recipe with the decision tree predict VAL function, the required arguments are again the names of the input data Dataset and the decision tree model Dataset (not shown here). In the optional arguments tab, we only specify accumulation by the “customer_analysis” Dataset columns “city_name” and “state_code”, as done with the evaluation function (not shown here). Executing this recipe yields rows of customer ID numbers and their gender, as predicted by our decision tree model. A sample of this decision tree prediction analysis is shown in Figure 2.14.

Figure 2.14: Results of the decision tree prediction VAL function from application of the earlier trained model.

2.3. Usage notes

  1. The Analytic Functions recipe inputs and outputs can only be Datasets that represent tables on the connected Analytics Database.
  2. It can be a common use case that the table for the input Dataset needs to be read from a read-only database on the connected Teradata Vantage server, and the output Dataset(s) results should be written into tables on the same server but in a different database for which you have both read and write permissions. To implement this, you can specify a connection to the target Teradata Vantage server with
    • the read-only database specified as the “Default database” in the “Basic Params” section of your connection.
    • the read-write database specified as the “Schema” value in the “Naming rules for new datasets” section of your connection. This will be the database where all output is written to by any recipe that uses this connection.
  3. Analytic functions with multiple output Datasets typically require that you specify an output Dataset for the function’s output message/result, in addition to any other output tables/Datasets specified in the recipe. The output dataset(s) name(s) must match any name(s) specified within the recipe settings.
  4. For analytic functions that
    • accept output table names as arguments, and
    • produce only a message table that indicates the name of the output model/metrics table,
      you need to specify output Datasets with names different from those tables.
      Some analytic functions provide an option to delete an already existing output table prior to executing an algorithm, but others do not. In the latter case, if the output table already exists, the connected Analytics Database throws an “Already exists” exception.
  5. In the Analytic Functions recipe back-end, the plugin submits SQL queries to create output tables in the target Teradata Vantage system by using CREATE SET TABLE. That is, the function output is checked for duplicate rows. If any duplicate rows are found, then they are removed from the output table. The check for duplicate rows adds a slight performance burden compared to using unfiltered output. This behavior is not adjustable in the present version of the plugin.
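The effect of CREATE SET TABLE on duplicates can be sketched in plain Python: a SET table silently drops any row that duplicates one already present.

```python
# Sketch of SET-table semantics: duplicate rows are silently dropped.
# (Tables are unordered; first-occurrence order here is only for illustration.)
def set_table_insert(rows):
    seen, kept = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            kept.append(row)
    return kept

rows = [("a", 1), ("b", 2), ("a", 1)]
print(set_table_insert(rows))  # [('a', 1), ('b', 2)]
```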

3. In-Vantage Scripting recipe

The In-Vantage Scripting recipe enables you to manipulate Table Operator objects in the Analytics Database to execute Python and R scripts natively in the Analytics Database nodes environment. In particular, the In-Vantage Scripting recipe enables wrapper calls to the Analytics Database SCRIPT Table Operator (STO) object. The STO streams data from Analytics Database tables to the standard input of Python and R scripts, and streams the scripts’ standard output back to the Analytics Database. The recipe makes it easy for you to upload Python and R script files to all nodes of the connected Analytics Database, where the STO scales your analytics by simultaneously invoking an instance of your script on each processing unit of the Analytics Database. To run in the Analytics Database nodes, your Python and R scripts need only be suitably adjusted to account for the Analytics Database data input/output streaming mode. As a result, your script performance is enhanced by the Teradata Vantage architectural parallelism.
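A minimal sketch of a Python script adjusted for this streaming mode follows; the tab delimiter and the per-row logic are assumptions for illustration:

```python
# Sketch of a Python script adjusted for SCRIPT Table Operator streaming:
# rows arrive on stdin as delimited text (a tab delimiter is assumed here),
# and rows written to stdout are returned to the Analytics Database.
import io
import sys

DELIMITER = "\t"

def process(line):
    # Placeholder per-row logic: echo the row plus one computed column.
    fields = line.rstrip("\n").split(DELIMITER)
    score = len(fields)  # stand-in for a real model score
    return DELIMITER.join(fields + [str(score)])

def main(stdin=sys.stdin, stdout=sys.stdout):
    # On the Analytics Database, each script instance loops over the rows
    # streamed to its own processing unit.
    for line in stdin:
        if line.strip():
            stdout.write(process(line) + "\n")

# Local dry run with in-memory streams standing in for the STO streams:
out = io.StringIO()
main(io.StringIO("35\t52000\n41\t61000\n"), out)
print(out.getvalue())  # each input row comes back with one extra column
```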

Note: In the present Teradata plugin version, the In-Vantage Scripting recipe works with connections to all different Vantage editions except for the VantageCloud Lake edition.

We examine how to use the In-Vantage Scripting recipe by means of a scoring use case. In particular, we assume the following set-up:

  • You have a Dataiku project flow with a testing Dataset “ADS_Final_Test”, whose rows you can score with a Python model you have available in a binary file “ex1pModForDSS.out”, and a Python scoring script “ex1pScoViaDSS.py”. You keep both the model and the scoring script in a managed folder “language_scripts” in your project.
  • On the Teradata Vantage side, you are Analytics Database user “me”, and you wish to upload and store your scripts and associated files in database “myDB”, for which you have suitable permissions.

To use the Analytics Database STO object from within Dataiku, you must first ensure you specify the custom property STO_DATABASE on your associated Teradata connection page; see Figure 3.1. Specifically:

  • On your connection properties page go to the Custom Properties area, and push the + ADD PROPERTY button.
  • In the Property name field, specify the keyword STO_DATABASE.
  • In the Property value field, specify the database name where you want to upload your files to; in our use case example, this value is myDB.

You can further choose to keep this property secret by checking the nearby “Secret” button; in our use case example, we leave this button unchecked. Also, if you no longer need the property, then you can remove it by clicking on the red trash can icon button on the right-most end of the property row.

Finally, your user account for the target Analytics Database must have been granted specific privileges to use the STO, as detailed in the earlier section How to set up the plugin.

Figure 3.1: Setting the STO_DATABASE connection custom property.

If the custom property STO_DATABASE is not specified in the Teradata connection you use with the In-Vantage Scripting recipe, then an error will occur when you attempt to specify the recipe settings; see the following section “Script Loading”.

Next, in your project flow select the In-Vantage Scripting recipe from the list of the Teradata plugin recipes. Upon creating the recipe, specify the input and output. Figure 3.2 shows the corresponding specifications for our use case example.

Figure 3.2: Inputs and outputs.

The recipe features 2 tabs with arguments used to install/replace the script files on the Analytics Database and/or invoke the SCRIPT Table Operator (STO). Each tab is presented in detail in the following sections.

3.1. Script loading

If you have omitted specifying the custom property STO_DATABASE in the associated Teradata connection, then the Script Loading tab will immediately present you with an error, as shown in Figure 3.3. Attempting to run the recipe while the error is showing will result in job failure.

Figure 3.3: Error when the STO_DATABASE connection custom property is not specified.

Assume you have otherwise specified the STO_DATABASE custom property in your connection. Then, the Script Loading tab screen displays at the bottom of the File Name and File Details section a path in the Analytics Database, relative to your language script. This path designates the location where any file <filename> will be uploaded by the recipe onto every Analytics Database node; see Figure 3.4.

Note: If your script reads from any auxiliary file <filename> that you also upload to the Analytics Database, then you must adjust your script code by suitably specifying the <filename> location in the Analytics Database where needed, as shown in Figure 3.4. This adjustment should be made before the script is uploaded to the Analytics Database.
In our use case, for example, your Python script is expected to read the binary model file “ex1pModForDSS.out” after the 2 files have been uploaded into the Analytics Database. There, the model file could be accessed by your code at the path shown in Figure 3.4, where the corresponding Python code could involve a file-opening statement like:
fileIn = open('myDB/ex1pModForDSS.out', 'rb')

Figure 3.4: Script and auxiliary files specification.

In this tab, you can specify the following arguments:

  • Script File Location
    • Start here to specify the current location of the language script to upload and install on the target Vantage server. The script can be loaded from a Dataiku Managed Folder or it may already exist on the Vantage server; make a suitable selection from the drop-down menu.
  • Script File Name
    • This field appears after you select a script file location option. Specifies the name of the script file to be uploaded.
    • This is the main script used in the SCRIPT Table Operator.
    • Depending on the selected Script File Location, this input form changes as follows:
      • If the script is on the Teradata Vantage server, then a text input field is provided to specify a file name.
      • If the script is in Dataiku Managed Folders, then a drop-down box is provided that contains a list of the files in the respective location.
  • Script File Alias
    • The file name alias to be used in the SQL statement.
    • This is mainly used by the SCRIPT Installation/Replace process in the metadata tables.
  • Script File Address
    • The fully qualified file location on the Teradata Vantage server.
    • This field only appears if the selected option for Script File Location is “Teradata Vantage Server”.
  • Replace Script button
    • If a file with the same name already exists at the destination location, then selecting this button replaces the file; if no such file exists, the replacement request is ignored.
    • Specify to replace an existing script on the Analytics Database when you would like to upload an updated version of the script.
  • Add More Files
    • This button enables you to have one or more additional auxiliary files (e.g., model files) installed in the Analytics Database for use by the script. Push the button once for every additional file you would like to specify.
  • Additional Files:
    • File Location
      • The current location of the file to be installed; can be either a Dataiku Managed Folder or the target Teradata Vantage server.
    • File Name
      • This is the file name of an additional file.
      • Similar to the Script File Name above, this input option is either a Text Field (to specify files located in the Analytics Database) or a drop-down box (if DSS Managed Folder is selected as the File Location).
    • File Address
      • The fully qualified file location on the Teradata Vantage server.
      • Similarly to the Script File Address above, this field only appears when “Teradata Vantage Server” is selected as the file location.
    • File Format
      • Specifies whether the additional file to be installed is a BINARY or TEXT file.
    • Replace Current File
      • Similar to the Replace Script above, selecting this button specifies to replace an existing file with the same name in the destination Analytics Database.

Note 1: Make any necessary script adjustments before you run the recipe, so that a suitable script version is uploaded.

Note 2: If you should need to modify an uploaded version of your script and/or auxiliary files, then you need to make the changes in a copy of the file(s) on your client, and then run the recipe again by uploading the updated file(s) to replace the existing one(s).

3.2. STO arguments

Note: In the following, the recipe prevents using specific characters in some fields for security purposes as follows:

  • When you specify column names in the input Dataset, the double quote " (Unicode character U+0022) cannot be used in the name string. You can otherwise use any character in the column naming specification, according to the related Vantage documentation at: https://docs.teradata.com/r/Teradata-Database-SQL-Fundamentals/June-2017/Basic-SQL-Syntax/Object-Names
  • The semicolon ; (Unicode character U+003B) cannot be used in the Conditions field.
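The kind of check involved can be sketched as follows (assumed logic for illustration, not the plugin’s actual implementation):

```python
# Sketch of the character restrictions above (assumed validation logic,
# not the plugin's actual implementation).
def validate_column_name(name):
    if '"' in name:  # U+0022 is not allowed in column names here
        raise ValueError(f"double quote not allowed in column name: {name!r}")
    return name

def validate_conditions(cond):
    if ";" in cond:  # U+003B is not allowed in the Conditions field
        raise ValueError("semicolon not allowed in Conditions")
    return cond

validate_column_name("customer_id")             # passes
validate_conditions("age > 30 AND income > 0")  # passes
```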

Figure 3.5: Arguments to invoke the SCRIPT Database object.

  • Script Language
    • Select whether you are submitting a Python or an R script.
  • Script Arguments
    • Optional arguments that your script may need for execution. Specify one argument per field, and click on the + button to add argument fields.
  • Script Input Columns
    • Specify the column names of the recipe input dataset in the order needed by your script. Specify one column name per box, and click on the + button to add as many column names as you need. If you leave the field blank and specify no names, then the script receives as input all columns in the input dataset.
    • This field corresponds to the SCRIPT Table Operator ON clause.
    • Optionally, use the Conditions field to specify any logical conditions for row selection across your script input columns. Conditions must be in SQL format, and multiple conditions should be combined with the AND and OR keywords. This field populates the optional WHERE clause within the SCRIPT Table Operator ON clause.
  • Data Partition Option
    • Select the default option None in the drop-down menu to have the SCRIPT input data treated as a single partition. Select any of the other options so that one or more columns of your input Dataset are designated as partition columns in the Column name(s) to partition by field.
    • If you specify to partition By column values, then different partitions are determined by sets of rows with unique values (if one partition column is specified) or with unique value combinations (if multiple partition columns are specified) in the designated partition column(s). This option corresponds to the SCRIPT Table Operator PARTITION BY clause.
    • If you specify to partition By Database AMP Hash, then data are distributed to Analytics Database AMPs based on AMP hash values contained in the designated partition column(s). This option corresponds to the SCRIPT Table Operator HASH BY clause.
  • Column name(s) to partition by
    • Specify one or more column names from the recipe input dataset to use for data partitioning according to the Data Partition Option menu choice. Specify one column name per box, and click on the + button to add column names as desired.
    • This option appears only when the Data Partition Option is other than None.
  • Column name(s) to order by
    • Specify one or more column names from the recipe input dataset to use their data for row ordering in the partitions. Specify one column name per box, and click on the + button to add column names as desired.
    • For each specified column, also specify the order sequencing in the neighboring drop-down menu. By default, columns are ordered in descending order.
    • This option appears only when the Data Partition Option is other than None.
    • This option corresponds to the SCRIPT Table Operator ORDER BY clause (if data partitioning is by column values) or LOCAL ORDER BY clause (if data partitioning is by Database AMP hash).
  • Script output variables
    • In this section you must specify all output variables of your script. Each section row corresponds to one variable, so you need to know the number and types of the variables your script returns. Use the + button to add as many rows as needed. This specification corresponds to the SCRIPT Table Operator RETURNS clause. Each row comprises two fields:
      • In the first column, specify a name for a variable returned by the script.
      • In the second column, specify the SQL data type of the corresponding variable. The SCRIPT Table Operator stores your script output variables in an Analytics Database table, hence the requirement to specify SQL data types.
    • Push the Select all variables button to include all of your script output variables in the recipe output dataset. This button is selected by default. This option corresponds to specifying SELECT * in a SCRIPT Table Operator query.
    • Select the blank button to the right of the fields in a row to include the corresponding variable in the recipe output dataset. If any of those buttons is selected at any point, then the Select all variables button is deselected. This option enables you to select specific output variables of your script to be included in the recipe output dataset.
    • Selecting the Select all variables button is equivalent to selecting the buttons of all individual output variable rows.
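Taken together, the arguments above map onto a SCRIPT Table Operator query that the recipe assembles behind the scenes. The following sketch shows one plausible way to build such a query string; the helper function and clause layout are illustrative, so consult the Teradata SCRIPT documentation for the authoritative syntax:

```python
# Simplified sketch of the SQL the recipe assembles from its fields.
# The clause layout follows the SCRIPT Table Operator, but details are
# illustrative; see the Teradata SCRIPT documentation for full syntax.

def build_script_query(input_table, columns, command, returns,
                       conditions=None, partition_by=None, order_by=None):
    cols = ', '.join(columns) if columns else '*'   # Script Input Columns
    on_clause = f'SELECT {cols} FROM {input_table}'
    if conditions:                                  # optional WHERE in the ON clause
        on_clause += f' WHERE {conditions}'
    partition = f' PARTITION BY {", ".join(partition_by)}' if partition_by else ''
    order = f' ORDER BY {", ".join(order_by)}' if order_by else ''
    # Script output variables: (name, SQL type) pairs -> RETURNS clause
    rets = ', '.join(f"'{name} {sqltype}'" for name, sqltype in returns)
    return (f'SELECT * FROM SCRIPT(\n'
            f'  ON ({on_clause}){partition}{order}\n'
            f"  SCRIPT_COMMAND('{command}')\n"
            f'  RETURNS ({rets})\n)')
```

For example, a Python script partitioned by a customer column would produce a query with an ON clause, a PARTITION BY clause, and a RETURNS list of typed output variables.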

3.3. Running the SCRIPT recipe

Upon setting up the arguments and running the recipe, the output Dataset is populated with data from the Analytics Database table that holds the scoring results, as specified in the recipe outputs; see Figure 3.6.

Figure 3.6: Sample output from the In-Database SCRIPT scoring analysis.

4. BYOM recipes

Assume a project flow, in which you have trained and deployed a model with a visual recipe. Further assume you would like to use this model for data scoring in the Teradata Vantage Analytics Database with the Teradata BYOM software. The following sections illustrate using the BYOM recipes to perform these steps, namely:

  • The “Model Export to Vantage” recipe section explores how to BYOM from Dataiku to Vantage by exporting your Dataiku PMML-formatted models to a target Teradata Vantage server.
  • The “BYOM Scoring” recipe section explores in-Database scoring with models that have been imported in select formats from Dataiku or other platforms.

4.1. Model export to Vantage

The recipe takes as input deployed Dataiku models that you have previously saved in PMML format, and produces a dummy Dataset upon completion that logs the recipe operation. The recipe exports the PMML-formatted model to a Teradata Vantage server by saving it into a table on your target Analytics Database. Upon creating the recipe, you specify the destination table and model names you wish to use. The following paragraphs provide a closer look at these steps.

4.1.1. Saving a Dataiku model in PMML format

Click on the Visual Analyses button on the left side of the Dataiku banner. Select the analyses you have created, and then choose the model you would like to use further in Teradata Vantage. Assume the model you want to save is the Random Forest model that tops the list shown in Figure 4.1.

Figure 4.1: Model listing

On the model screen, click the Deploy button at the top right; see Figure 4.2. Select the Download as PMML option to see the model saved in that format in your project flow.
Note: The option to save a model in PMML format might be unavailable for some models.

Figure 4.2: Saving a model in PMML format.
4.1.2. Selecting the recipe Input/Output

Back in your project flow, select the BYOM – Model Export to Vantage recipe from the list of the Teradata plugin recipes. A pop-up window will then ask you about the recipe input and output; see Figure 4.3. In the input section, specify the saved model in your flow that you wish to export to Teradata Vantage. In the output section, specify a name for the output Dataset. The recipe output is a dummy dataset that will contain a string with success/failure information about the recipe execution.

Figure 4.3: Inputs and outputs in the model export recipe.

Once you create the recipe, you are taken to the recipe main screen; see Figure 4.4. On this screen, you specify information about the target Teradata Vantage server you want to send the model to, and the table to save the model in. In more detail:

4.1.3. Target Teradata Vantage server connection properties

In the Select Connection field, click the + ADD A CONNECTION button, and a drop-down menu appears with the existing list of Dataiku connections. Choose your target Teradata Vantage server connection from this list. To drop this choice and select a different connection, push the red trashcan icon on the far-right end of a selected connection’s field; then choose a different connection by clicking again the + ADD A CONNECTION button.

Note 1: The + ADD A CONNECTION button can be pushed multiple times, and multiple connections can be specified. Any connections added below your first choice are ignored, and can be removed from the screen with their corresponding red trashcan icon.

Note 2: If you specify no connection through the + ADD A CONNECTION button and proceed to run the recipe, then an error will occur.

In the Database Name field, specify the database where you would like to store the models table on the target Teradata Vantage server. If you leave the field blank, then the connection's default database will be used.

Note 3: Your Database user account in the target connection must have permissions to create, modify, and drop tables in the database where you decide to store the model table.

Figure 4.4: Settings for model exporting.
4.1.4. Model table details

Subsequently, in the Table Details lower section of the screen you can specify the Teradata Vantage table where you wish to save the model, and a name for your model so you can identify it among others in the table. For an example, see the bottom part of Figure 4.4.

Note 1: When saving the model, you can either create a new table by clicking the corresponding CREATE NEW TABLE button on the screen, or use an existing table on the Vantage server. If you specify to create a new table and the table name already exists, then the old table will be overwritten.

Note 2: The model name can be at most 30 characters long; excess characters are ignored.

4.1.5. Running the BYOM model export recipe

Upon setting up the arguments and running the recipe, the specified PMML-formatted model is exported into the target Teradata Vantage table. At the same time, a successful execution message is written in the recipe output dataset. Figure 4.5 shows the output of the recipe execution based on the export-to-Vantage task that was set up in Figure 4.4. The model_id referred to in the output message is the column name that contains the model name in the target models table.

Figure 4.5: Report about the model export recipe execution.
4.1.6. Using an exported model on the target Teradata Vantage server

After the PMML model has been stored in a Teradata Vantage table, it can be accessed and used for any suitable operation. As an example, we illustrate querying the target table with SQL. Figure 4.6 shows a screenshot from a SQL editor with the query results. If the same table hosts multiple models in different rows, a model you stored can be identified by its name in the table column model_id.

Figure 4.6: Querying a saved model.
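A lookup query like the one in Figure 4.6 has roughly the following shape. The helper function below is a hypothetical illustration, and the database and table names used with it are placeholders:

```python
# Sketch of a lookup query against a models table; model_id is the
# column that holds the model name. The helper is illustrative only.

def model_lookup_query(database, table, model_id):
    # Select the row that holds the model named model_id.
    return f"SELECT * FROM {database}.{table} WHERE model_id = '{model_id}'"

# Example with placeholder names from this walkthrough:
# model_lookup_query('dssuser', 'byom_models', 'Random_Forest_1')
```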

Further in this example, Figure 4.7 illustrates using a SQL query to invoke the model through the Advanced SQL Engine PMMLPredict function. PMMLPredict is a function in the Teradata Vantage BYOM software package. For details about the function syntax and query options, see the Teradata documentation page:
https://docs.teradata.com/r/Teradata-VantageTM-Bring-Your-Own-Model-User-Guide/May-2022/BYOM-Functions.

Figure 4.7: Scoring with a saved model on a Vantage server.

In the following section, we illustrate how to score with models stored in Analytics Database tables directly from the Dataiku environment without any SQL programming. This approach is enabled by the BYOM Scoring recipe in the Teradata plugin.

4.2. BYOM scoring

This recipe enables you to perform fast, scaled in-Database scoring against table data in the Analytics Database where you have imported one or more models. As in the Model Export to Vantage recipe, you specify the model source table and model names inside the recipe after you create it. The following paragraphs detail the steps. We illustrate them by assuming a use case of consuming a saved Dataiku model that you previously exported to the same Teradata Vantage system in PMML format.

In your project flow, select the BYOM – Scoring recipe from the list of the Teradata plugin recipes. A pop-up window will then ask you about the recipe input and output; see Figure 4.8. In the input section, specify the test input Dataset to score. In the output section, specify a name for the output Dataset with the scored data.

Figure 4.8: Inputs and outputs in the BYOM scoring recipe.

The recipe then takes you to its main screen to specify parameters for the task. Figure 4.9 shows this screen with suitable specifications for our Dataiku model consumption use case. In the following, we step through the different fields in detail.

Figure 4.9: Settings for BYOM scoring.
4.2.1. BYOM scoring information

The upper part of the screen is titled Specify BYOM Scoring Information, and prompts you to select the type and sub-types, if appropriate, of the scoring model you wish to use.

The first field is the Scoring Model Type. In the present version of the Teradata plugin, you can choose between the options PMML and H2O MOJO; see Figure 4.10.

Figure 4.10: Permissible input model types for BYOM scoring.

Example: To illustrate our Dataiku model consumption use case, we assume using a model that has been previously exported from Dataiku to the connected Teradata Vantage server in PMML format.

Note for H2O MOJO models: If you select the H2O MOJO option as the scoring model type, then an additional field H2O Model Type shows up. You can choose between using

  • OPENSOURCE, and
  • DAI (Driverless AI)

options as the H2O model type. Furthermore, if you have an H2O DAI model to use, then you are also required to specify the associated license information. In this scenario, the two additional fields H2O DIA License DB Input and H2O DIA License Table Input appear in the lower part of the screen (see next subsection), in which you specify the database and table names where your H2O DAI model license resides on the connected server.

After the model type has been selected, the last field in the upper part of the screen is the Predict Function Database Name. In this field, you specify the database name in the Vantage server where the prediction function resides. The default installation location for the BYOM software inside an Analytics Database is a user database called mldb. If unsure, first validate this information with your Teradata Vantage Database Administrator. If applicable, choose the option INPUT-DB-NAME from the drop-down menu of this field to specify a different database name, as shown in Figure 4.11.

Figure 4.11: Determination of BYOM prediction function location in Vantage.
4.2.2. Specify Scoring Model Information

The lower screen section is titled Specify Scoring Model Information, and the recipe asks you to

  • pinpoint the model you wish to use for the scoring task on the Teradata Vantage system where the input dataset table is located, and
  • specify additional options

Sequentially, the section presents you with the following fields:

Model DB Name Input: Use the drop-down menu to select how to specify the database where your model table resides. You can either opt for

  • the “DATABASE LIST”, where you are provided with a new drop-down menu to select the database from a list of available databases on the connected server, or
  • the “INPUT-DB-NAME”, where you will be provided an empty field to specify explicitly the database name.

Example: In the illustration of Figure 4.9, the database name dssuser was explicitly specified.

Model Table Name Input: Use the drop-down menu to select how to specify the table that contains the model you wish to use. You can either opt for

  • the “TABLES-LIST”, where you are provided with a new drop-down menu to select the target table from a list of available tables in the previously selected database, or
  • the “INPUT-TABLE-NAME”, where you will be provided an empty field to specify explicitly the target table name.

Example: In the illustration of Figure 4.9, we opted for the TABLES-LIST option, and selected the table BYOM_Demo-Models from the associated drop-down menu.

Model Id/Name: Use the drop-down menu to choose the model you wish to use among the available models in the previously selected table. Models are named on the basis of their model ID.

Example: In the illustration of Figure 4.9, we located the desired model with model ID Random_Forest_1 by inspecting the drop-down menu next to the Model Id/Name field.

As indicated in the previous subsection, if you specified your model to be of type H2O DAI, then two more fields appear at this point, as follows:

H2O DIA License DB Input: Use the drop-down menu to select how to specify the database where your model license information resides. You can either opt for

  • the “LICENSE-DATABASE-LIST”, where you are provided with a new drop-down menu to select the database from a list of available databases on the server, or
  • the “INPUT-LICENSE-DB-NAME”, where you will be provided an empty field to specify explicitly the database name.

H2O DIA License Table Input: Use the drop-down menu to select how to specify the table that contains your H2O DIA model license. You can either opt for

  • the “LICENSE-TABLES-LIST”, where you are provided with a new drop-down menu to select the target table from a list of available tables in the previously selected database, or
  • the “INPUT-LICENSE-TABLE-NAME”, where you will be provided an empty field to specify explicitly the target table name.

The remaining settings on the recipe screen enable specification of syntax elements of the Analytics Database BYOM prediction function that are supported by the recipe, as follows:

The Accumulate options specify whether to show all columns or selected columns of the test table in the recipe's output table. The recipe enables you to

  • accumulate all columns by pushing the corresponding Accumulate All Columns button, or
  • accumulate specific columns, by pushing one or more times the Add A Column button. Each time you push the button, a drop-down list of the test table column names appears so you can specify the columns to accumulate by. To discard a column choice, push the red trashcan icon on the far-right end of the field that contains the column name.

Note 1: When you select the Accumulate All Columns check box, then the Accumulate Specific Columns field is hidden.

Note 2: When you specify to use the Accumulate Specific Columns option but fail to specify any column names, then an error is produced.

The ModelOutputFields check box enables you to optionally specify the scoring fields to output as individual columns.

The Overwrite model cache check box enables you to activate the OverwriteCachedModel option of the Analytics Database BYOM prediction function. This option specifies that any model left in the cache be cleared. It is typically used when you update a model, so that the newer model version is used instead of the cached one.

For further details about these syntax elements, see the Analytics Database BYOM prediction functions Syntax Elements sections at the Teradata documentation address:
https://docs.teradata.com/r/Teradata-VantageTM-Bring-Your-Own-Model-User-Guide/May-2022/BYOM-Functions.
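To make the mapping from recipe settings to the prediction function more concrete, the following sketch assembles a PMMLPredict-style query from the fields discussed above. The helper function is a hypothetical illustration and the clause shapes are simplified; refer to the BYOM documentation above for the authoritative syntax:

```python
# Sketch of how the recipe settings map onto a PMMLPredict call.
# Clause shapes follow the BYOM documentation in simplified form;
# treat this as an illustration, not authoritative syntax.

def build_pmml_predict(fn_db, input_table, model_db, model_table, model_id,
                       accumulate=None, overwrite_cached_model=False):
    using = []
    if accumulate:                         # Accumulate specific columns
        cols = ', '.join(f"'{c}'" for c in accumulate)
        using.append(f'Accumulate({cols})')
    if overwrite_cached_model:             # clear any cached model version
        using.append("OverwriteCachedModel('*')")
    using_clause = f'  USING {" ".join(using)}\n' if using else ''
    return (f'SELECT * FROM {fn_db}.PMMLPredict(\n'
            f'  ON {input_table}\n'
            f'  ON (SELECT * FROM {model_db}.{model_table}\n'
            f"      WHERE model_id = '{model_id}') DIMENSION\n"
            f'{using_clause})')
```

The database, table, and model names passed to the helper are placeholders; in the recipe they come from the Predict Function Database Name, Model DB Name Input, Model Table Name Input, and Model Id/Name fields.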

4.2.3. Running the BYOM Scoring Recipe

Upon setting up the arguments and running the recipe, a corresponding query is formed in the recipe back-end and pushed to the connected Analytics Database for execution. Figure 4.12 shows the scoring results output Dataset in our Dataiku model consumption use case.

Figure 4.12: BYOM scoring output Dataset.

In the present example, we execute the scoring recipe for a PMML model. This action in Dataiku is equivalent to submitting the explicit SQL query that was illustrated in the earlier Section “Using an Exported Model on the Target Vantage Server”.

Note: Since the BYOM scoring recipe is only a wrapper for the corresponding Analytics Database BYOM prediction function, errors that occur while using the recipe should be cross-checked against the function documentation on the BYOM page at the web address in the previous section.

5. Troubleshooting

5.1. “External Code Failed” Error

Assume that upon attempting to load one of the plugin recipes, you encounter an error immediately after the plugin user interface screen appears. The error might look like the banner in Figure 5.1, where the title is “External code failed” and the actual error message might vary.

Figure 5.1: Plugin “External code failed” error.

This scenario might appear after a change occurs in the Teradata plugin registered on your Dataiku server. A possible fix is to attempt reloading the plugins as follows:

  1. Go to the Applications button on the right side of the Dataiku top banner and select the Plugins page. Once on the page, select the Installed tab on the top of the screen.
  2. Push the RELOAD ALL button on the top right corner of the screen.
  3. Return to your project flow, refresh the page, and attempt to create the recipe anew.

5.2. “Required string parameter ‘query’ is not present” Error

If you encounter the error “Required string parameter ‘query’ is not present” while attempting to run a Teradata recipe, or any recipe that involves Datasets based on Teradata Vantage tables, then check the settings of your Teradata connection. Ensure that the “Autocommit Mode” box is checked under the “Advanced Params” section of the connection, and attempt executing the recipe again.

5.3. “Recipe type X is unknown” Error

The error “Recipe type X is unknown” can appear when importing a project to a target Dataiku server that carries the Teradata plugin v.2.0.0 or later. In the above message, X relates to a recipe name from a Teradata plugin version older than 2.0.0 on the Dataiku server from which the project was exported.

5.3.1. Background

Teradata plugin versions 2.0.0 or later are incompatible with older plugin versions. In an imported project with older recipes, as defined earlier, the older recipes will not appear on the new project flow on the target server. Instead, you will need to create a corresponding recipe anew by using the newer Teradata plugin version. Note also that in the project migration process, output Datasets of the older recipes retain metadata information that is specific to the old recipes that created those Datasets.

Assume the output Dataset in the new recipe is specified to be the same existing Dataset that was created by the corresponding old recipe. In light of the earlier facts, the existing Dataset metadata expects the Dataset to be linked to the old recipe, but the plugin that contained the old recipe is not present on the new server. Due to the incompatibility between the recipes of the Teradata plugin v.2.0.0 or later and their older counterparts, an error like “Recipe type X is unknown” is thrown upon creation of the new recipe.

5.3.2. Resolution

To circumvent the situation described above and resolve the issue:

  1. Create a new dummy output Dataset for the new recipe.
  2. After the recipe is created, go to the recipe interface, click on the “Input/Output” tab, click on the trash bin icon next to the dummy output dataset, and replace it with the original old recipe output Dataset that you intended to use.
    This explicit action replaces the project metadata for the specific output Dataset with information about the new recipe, and prevents errors due to metadata related to the old recipe.

5.4. My Teradata recipes do not work

The Teradata plugin recipes are not backwards-compatible with older deprecated counterparts featured in releases of individual Teradata plugins prior to v.2.0.0. If your project contains any of the deprecated recipes from the older, individual Teradata plugins (that is, the former Teradata SQLE Functions Plugin, Teradata SCRIPT Table Operator Plugin, or the Teradata BYOM plugin), and your server has been updated with the Teradata plugin v.2.0.0 or newer, then you will need to explicitly replace each one of those recipes with its corresponding counterpart in the Teradata plugin.

5.5. The BYOM Model Export to Vantage recipe fails with an error

Assume you have made the correct specifications in the “BYOM – Model Export to Vantage” recipe, yet the recipe fails to execute successfully. If the error is unrelated to the Analytics Database or your connection settings, consider the following additional troubleshooting step. The recipe requires that its input Dataiku model can be saved in PMML format. Dataiku may restrict some models from being saved as PMML-formatted objects; in this scenario, the recipe cannot use the model and will fail to run with this input. To inspect for any such restrictions, go to your project flow, select and open the Dataiku model icon that contains the target model, click on the model active version title, and in the resulting screen inspect the options under the “Actions” button. The “Download as PMML” option should be available; otherwise, the BYOM recipe will be unable to export this model to Vantage.

Plugin support policy

For any issues, support, remarks, comments, or suggestions about the plugin, please visit the Teradata Support website at https://support.teradata.com. Expected response time is typically one business day.
