Official highlights from Martin Traverso:
table_changesfunction in Delta Lake connector.
We’ve put out nearly 50 Trino Community Broadcast episodes, but we haven’t yet done the simplest, most obvious topic of them all - an exploration of what Trino is, how Trino works, and how you can run it. This week, we’re taking a step back and doing a broader overview of those things, because the world needs to know… what is Trino?
If you check the Trino documentation, it starts with a definition of what Trino isn’t. But we’ll start with what Trino is: a distributed SQL query engine written in Java. If you have a SQL query, Trino can process and run it on an extremely wide variety of data sources and return a result to you that you’d expect from that SQL query. It can run queries on traditional relational databases like Oracle, MySQL, and PostgreSQL; it works on data likes like Hive, Iceberg, Delta Lake, and Hudi; and it runs on no-SQL databases like Cassandra and MongoDB. You give Trino a query, Trino gives you results. And the best part is that it doesn’t just work, it works blazing fast.
The key thing to point out is that Trino does not store data, and it is not a database on its own. It is a query engine, designed to sit on top of databases and provide an ANSI-standard SQL interface to query whatever you’re storing your data in. In order to use Trino, you need to start by having data stored somewhere else. Of course, Trino can write data to those underlying sources with the same SQL syntax, so for the end user, it can be an all-in-one interface to those underlying data sources, an abstraction that saves users from needing to understand the differences between data being stored in Iceberg and data being stored in Oracle.
Trino uses a distributed architecture, with a singular coordinator node that schedules and orchestrates the workload, as well as many worker nodes that carries out tasks and processes data.
The better question might be “how can’t you run Trino?” As the project has matured, it’s been added to various third-party tools and integrated into different apps that help make it easier to run than ever before. We have some exciting news to share on that front soon, but for now, the biggest ways to run Trino include:
You can directly download the Trino server, manually configure it, and start it up like any other program. Clients can connect to the server from there, utilizing the web interface or the CLI to run queries. This is the most manual way to set up Trino, but it works, and it doesn’t depend on anything else. Our docs go into a ton of detail on this process.
Trino provides a Docker image that can be run through the Docker software. You start by downloading and installing Docker, create a container from the Trino image, and then you can run that image to immediately get Trino up and running. No manual configuration needed, no messing around with creating directories or files, it just works. It’s perhaps the simplest way to get Trino off the ground, and recommended for anyone trying to run it independently just to fiddle around with it. As always, you can refer to the docs for more information.
Trino provides a Helm chart for use with Kubernetes, so after setting up Kubernetes, kubectl, and Helm, you can install Trino on your Kubernetes cluster with Helm. It comes with the same pre-configured image as Docker, so there’s no need to manually set that up, but in order to run queries, you’ll also need to set up a tunnel between the coordinator pod within Kubernetes and whatever machine you want to run those queries on. If this is the right setup for you, you probably already know that, and you don’t need us to go into more detail. More info is in the Trino docs.
On the most basic side of things, Trino provides a command-line interface and a web UI. If you want something more robust, a couple open source clients have been made in the community - one written for Python and one written in Go. There’s a couple other Python clients that will be even easier to run coming soon, and we’ll be hearing from them at Trino Fest in just two weeks.
On the not-so-free side of things, Starburst Galaxy and AWS Athena offer Trino as a cloud service, which can make life even easier.
We’ve got a page on the website dedicated to the contribution process, though we’d like to welcome anyone and everyone listening to take a crack at contributing to Trino if it’s something you’re interested in. Open source projects can always use more help, and we’d like to see community contributions whenever. From that process page, the steps are:
Nessie is a transactional catalog designed for use with data lakes like Iceberg and Delta Lake. Its key selling point is git-like version control, making it easy to view history, roll back, and see who made what adjustments when. PR #11701 allows Trino’s Iceberg connector to query Nessie, adding yet another tool and opportunity for query federation to Trino’s belt.
And though we hate to say it, Nessie might just be the only other project in the world with a mascot that can compete with Commander Bun Bun.
Coming up in just two weeks, Trino Fest is a two-day event that will feature talks from a wide range of speakers surrounding the Trino ecosystem. As already hinted at, we’ll be hearing from a couple new Python clients, from Trino users sharing tips and tricks to maximize the utility of the software, and from community contributors adding exciting new features and extensions to Trino.
Register to attend if you’re interested and want to tune in to an awesome speaker lineup! It’s virtual and completely free to attend, so all you’ve got to do is sign up.
If you have an event that is related to Trino, let us know so we can add it to the Trino events calendar.
If you want to learn more about Trino, get the definitive guide from O’Reilly. You can download the free PDF or buy the book online.
Music for the show is from the Megaman 6 Game Play album by Krzysztof Slowikowski.