Introduction to UDFs#
A user-defined function (UDF) is a custom function authored by a user of Trino in a client application. UDFs are scalar functions that return a single output value, similar to built-in functions.
Note
Custom functions can alternatively be written in Java and deployed as a plugin. Details are available in the developer guide.
UDF declaration#
Declare the UDF with the SQL FUNCTION keyword and the supported statements for SQL user-defined functions or Python user-defined functions.
A UDF can be declared as an inline UDF to be used in the current query, or declared as a catalog UDF to be used in any future query.
Inline user-defined functions#
An inline user-defined function (inline UDF) declares and uses the UDF within a
query processing context. The UDF is declared in a WITH
block before the
query:
WITH
FUNCTION doubleup(x integer)
RETURNS integer
RETURN x * 2
SELECT doubleup(21);
-- 42
Inline UDF names must follow SQL identifier naming conventions, and cannot
contain .
characters.
The UDF declaration is only valid within the context of the query. A separate later invocation of the UDF is not possible. If this is desired, use a catalog UDF.
Multiple inline UDF declarations are comma-separated, and can include UDFs calling each other, as long as a called UDF is declared before the first invocation.
WITH
FUNCTION doubleup(x integer)
RETURNS integer
RETURN x * 2,
FUNCTION doubleupplusone(x integer)
RETURNS integer
RETURN doubleup(x) + 1
SELECT doubleupplusone(21);
-- 43
Note that inline UDFs can mask and override the meaning of a built-in function:
WITH
FUNCTION abs(x integer)
RETURNS integer
RETURN x * 2
SELECT abs(-10); -- -20, not 10!
Catalog user-defined functions#
You can store a UDF in the context of a catalog, if the connector used in the catalog supports UDF storage. The following connectors support catalog UDF storage:
In this scenario, the following commands can be used:
CREATE FUNCTION to create and store a UDF.
DROP FUNCTION to remove a UDF.
SHOW FUNCTIONS to display a list of UDFs in a catalog.
Catalog UDFs must use a name that combines the catalog name and schema name with
the UDF name, such as example.default.power
for the power
UDF in the
default
schema of the example
catalog.
Invocation must use the fully qualified name, such as example.default.power
.
SQL environment configuration for UDFs#
Configuration of the sql.default-function-catalog
and
sql.default-function-schema
SQL environment properties allows you
to set the default storage for UDFs. The catalog and schema must be added to the
sql.path
as well. This enables users to call UDFs and perform all
User-defined function management without specifying the full path to the UDF.
Note
Use the Memory connector in a catalog for simple storing and testing of your UDFs.
Recommendations#
Processing UDFs can potentially be resource intensive on the cluster in terms of memory and processing. Take the following considerations into account when writing and running UDFs:
Some checks for the runtime behavior of queries, and therefore UDF processing, are in place. For example, if a query takes longer to process than a hardcoded threshold, processing is automatically terminated.
Avoid creation of arrays in a looping construct. Each iteration creates a separate new array with all items and copies the data for each modification, leaving the prior array in memory for automated clean up later. Use a lambda expression instead of the loop.
Avoid concatenating strings in a looping construct. Each iteration creates a separate new string and copying the old string for each modification, leaving the prior string in memory for automated clean up later. Use a lambda expression instead of the loop.
Most UDFs should declare the
RETURNS NULL ON NULL INPUT
characteristics unless the code has some special handling for null values. You must declare this explicitly sinceCALLED ON NULL INPUT
is the default characteristic.