
Trino Community Broadcast

68: Year of the Snake - Python UDFs

Jan 16, 2025

Introduction

Manfred and Cole are joined by David Phillips to talk about the new support for user-defined functions written in Python. We discuss the motivation and development history, dive into implementation details, and explore some examples.

Releases

Following are some highlights of the Trino releases since episode 67:

Trino 465

  • Add support for customer-provided SSE keys in the S3 file system, relevant for the Hive, Iceberg, Delta Lake, and Hudi connectors.
  • Add deterministic data generation, locale support, and the random_string function to the Faker connector.
  • Add support for extra_properties in the Iceberg connector.
  • Add support for the geometry type in the PostgreSQL connector.

Trino 466

  • Remove the Python requirement for Trino by replacing the launcher script.
  • Improve client protocol throughput with the new spooling protocol, shipped with documentation and implemented in the JDBC driver and the CLI.
  • Add support for data access control with Apache Ranger, including support for column masking, row filtering, and audit logging.

Trino 467

  • Change the default for internal communication to HTTP/1.1.
  • Add support for OpenTelemetry tracing to the HTTP, Kafka, and MySQL event listeners.
  • Remove the microdnf package manager from the Docker image.
  • Add the $all_manifests metadata table in the Iceberg connector.
  • Add the $transactions metadata table in the Delta Lake connector.

Trino 468

  • Add Python user-defined functions.
  • Rename SQL routines to SQL user-defined functions.
  • Add cluster overview to the Preview Web UI.
  • Improve bucket execution for Hive and Iceberg.
  • Add support for non-transactional MERGE statements in the PostgreSQL connector.

As always, numerous performance improvements, bug fixes, and other features were added as well.

Other news

User-defined functions in Trino

First there were custom plugins with user-defined functions, and for a long time, that was all there was.

In 2023, David contributed SQL user-defined functions, also known as SQL routines, and we ran a competition for examples. Manfred wrote the docs and did a training session with Dain and Martin. Even back then, David had plans to add other languages and started working on Python.

At Trino Summit 2024, Martin Traverso announced the upcoming feature in the keynote, and with Trino 468 we shipped support for Python user-defined functions.

Motivation

Why support Python for user-defined functions, as compared to just SQL? Simply put, more is better, and Python is everywhere. We chat with David about the details.

Development history and collaboration

David tells us more about figuring out how to make it work at all. He touches on topics such as security, performance, deployment, monitoring, and collaboration with other projects. We also talk about why other approaches, such as using a local CPython installation, were discarded.

Architecture and consequences

In this discussion we try to cover the following topics:

  • How does it all work?
  • What are some restrictions?
  • What performance can users expect?
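Whatever restrictions apply in a given deployment, a conservative way to write handler logic is to stick to standard-library modules and exercise the function as plain Python before registering it in Trino. The sketch below is hypothetical: `normalize_name` and `run_scalar` are illustrative names, `run_scalar` only simulates calling a scalar function once per row on your own machine and says nothing about how Trino actually executes it, and SQL NULL is assumed to arrive in the handler as `None`.

```python
import re
import unicodedata


# Hypothetical handler for a function such as normalize_name(name varchar)
# RETURNS varchar. It imports only standard-library modules, a conservative
# assumption about what the runtime provides.
def normalize_name(value):
    # Assumption: a SQL NULL argument arrives as None; return NULL for it.
    if value is None:
        return None
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse runs of whitespace, trim the ends, and lower-case the result.
    return re.sub(r"\s+", " ", without_marks).strip().lower()


def run_scalar(handler, rows):
    """Local simulation only: call the handler once per input row tuple."""
    return [handler(*row) for row in rows]


if __name__ == "__main__":
    rows = [("  José   Núñez ",), (None,), ("TRINO",)]
    print(run_scalar(normalize_name, rows))
```

Running the file prints `['jose nunez', None, 'trino']`. Keeping the logic in a plain function like this makes it easy to unit test before pasting it between the `$$` delimiters.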

Let’s chat about this nesting:

Examples and demo

The general form of a Python user-defined function, from the documentation:

FUNCTION python_udf_name(input_parameter data_type)
  RETURNS result_data_type
  LANGUAGE PYTHON
  WITH (handler = 'python_function')
  AS $$
  ...
  def python_function(input):
      return ...
  ...
  $$
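In the template, the Python code between the `$$` delimiters is an ordinary function, and the `handler` property names the function Trino calls. The sketch below fills in a hypothetical doubling handler, checks it locally, and wraps it in the documented shape using the inline `WITH FUNCTION ... SELECT` form that Trino also accepts for SQL user-defined functions. The names `twice`, `doubleup`, `load_handler`, and `inline_function_sql` are illustrative, and the `None` check assumes that a SQL NULL argument reaches the handler as `None`.

```python
import textwrap

# Python source for the handler, as it would appear between $$ ... $$.
# The function name must match the handler property in WITH (handler = ...).
HANDLER_SOURCE = textwrap.dedent("""
    def twice(a):
        # Assumption for this sketch: a SQL NULL argument arrives as None.
        if a is None:
            return None
        return a * 2
""")


def load_handler(source: str, name: str):
    """Execute the handler source locally and return the named function."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace[name]


def inline_function_sql(name: str, params: str, returns: str,
                        handler: str, source: str, query: str) -> str:
    """Wrap the handler in the FUNCTION ... LANGUAGE PYTHON layout from the
    template, declared inline with WITH FUNCTION in front of a query."""
    body = textwrap.indent(source.strip(), "  ")
    return (
        f"WITH FUNCTION {name}({params})\n"
        f"  RETURNS {returns}\n"
        f"  LANGUAGE PYTHON\n"
        f"  WITH (handler = '{handler}')\n"
        f"  AS $$\n"
        f"{body}\n"
        f"  $$\n"
        f"{query}"
    )


if __name__ == "__main__":
    twice = load_handler(HANDLER_SOURCE, "twice")
    print(twice(21))  # 42
    print(inline_function_sql("doubleup", "x integer", "integer",
                              "twice", HANDLER_SOURCE, "SELECT doubleup(21)"))
```

The generated statement can be pasted into the Trino CLI to try the function without registering it in a catalog; testing the handler as plain Python first separates logic bugs from SQL and type-mapping issues.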

David shows us more, and we talk about the details.

Feedback and future work

We are looking for feedback:

  • More examples for the documentation for our users
  • Use cases and experience testing the feature
  • Production deployment experiences

Future work depends on the feedback but definitely includes the following:

  • Performance improvements
  • Fine-tuning of available Python packages

Rounding out

  • You are all invited to chat with us about development at the Trino contributor call on the 23rd of January.
  • Join us on the 30th of January with Mateusz Gajewski to learn about client protocol improvements.

If you want to learn more about Trino, check out Trino: The Definitive Guide from O’Reilly. You can get the free PDF from Starburst or buy the English, Polish, Chinese, or Japanese edition.