If “Wrapped” is good enough for Spotify, it’s good enough for Trino, right? As we look forward to a bright 2024, we can also take a moment to get sentimental, look back at everything we’ve accomplished, and reflect on the progress we’ve made. Commander Bun Bun has been hard at work, so if you haven’t been paying close attention to Trino or want an idea of all that went down in 2023, we’re happy to present you with an end-of-year recap. We’ll be exploring what’s gone on in the community and in development, the events we’ve hosted, and the cool new features and technologies you can use when you’re running Trino.
2023 by the numbers #
- 64,288 views 👀 on YouTube
- 5,872 hours watched ⌚ on YouTube
- 5,018 new commits 💻 in GitHub
- 2,985 new stargazers ⭐ in GitHub
- 2,494 pull requests merged ✅ in GitHub
- 1,227 issues 📝 created in GitHub
- 704 new subscribers 📺 in YouTube
- 45 videos 🎥 uploaded to YouTube
- 30 Trino 🚀 releases
- 39 blog ✍️ posts
- 10 Trino Community Broadcast ▶️ episodes
- 2 Trino ⛰️ Summits
We’re excited to say that Trino continued to grow in 2023:
- GitHub stars increased by nearly 50% total and by 8% more than last year
- Commits increased by 7%
- Slack usage picked up dramatically
- YouTube viewership was up 7% despite a lack of Pokemon-themed musical content compared to 2022 (our bad)
- 30 releases kept new versions of Trino coming out more than every other week.
Thanks in part to all that growth, it’s more important than ever to be on our Slack. If you’re a Trino user or community member and aren’t already on there, you’re missing out! Make sure to join up for community announcements, release statuses, the shared expertise of the entire Trino community, and event-specific channels for discussion when we’re hosting things like Trino Fest and Trino Summit. Speaking of those…
Trino events #
One of the best parts of being an open source community is that it’s easy to be excited and connect with others about using such a cool piece of technology. Whether that’s bringing Trino to new users who can take advantage of it, or sharing what we’ve learned with other Trino users so they can make the most of it, events are one of the best ways to distribute that knowledge. So what were we up to this year?
Trino Fest and Trino Summit #
Trino Fest and Trino Summit are becoming mainstays on the Trino calendar each year, and 2023 was no different. Formerly “Cinco de Trino,” we ditched the Cinco de Mayo theme and went with the simpler “Trino Fest” in June, opting to theme it around Commander Bun Bun’s Lake House Summer Camp, with a focus on integrating Trino with lakehouse and data lake architectures. Trino Summit only wrapped up a little over a month ago, rounding out the year and highlighting some amazing developments that we’ll be talking about later in this blog post.
Trino Fest has historically been the smaller event, but it did some catching up in 2023, as both Trino Fest and Trino Summit were made virtual and expanded to 2 days this year. With the events easier to attend than ever before, we reached a combined total of about 1,200 live attendees, with thousands more views on demand.
The lineups were packed with 34 talks across both events, featuring speakers from huge Trino users like Salesforce, Stripe, Apple, and Lyft, as well as from major Trino contributors like Starburst, Tabular, and Bloomberg. You can view recordings of every Trino Fest talk and every Trino Summit talk on the Trino YouTube channel if you missed out.
Meetups and international events #
One of the more exciting developments was a major event in Japan - Trino Conference Tokyo. A virtual event with four sessions, it brought Trino to a Japanese-speaking audience and further pushed our favorite query engine across language borders. On top of that, Starburst co-hosted a Trino meetup in Bengaluru, and the community organized the first-ever Korean Trino meetup (pictured below).
And last but not least, Trino: The Definitive Guide, 2nd Edition was translated into Mandarin and Polish.
The Trino Gateway #
One of the biggest announcements in the Trino community this year was the launch of the Trino Gateway. A proxy and load-balancer, it’s a crucial piece of Trino infrastructure for organizations that need more than one Trino cluster to suit their needs.
Why would you want more than one Trino cluster? Maybe you want one cluster with fault-tolerant execution enabled for ETL workloads and another cluster for speedy ad-hoc analytics. Perhaps you have analysts performing wildly differently-sized queries, and high-volume compute-intensive queries are proving to be bad neighbors for lightweight and low-latency queries that shouldn’t take more than milliseconds. Historically, users would have to manually manage swapping between clusters, establish a new connection, and try not to get a headache in the process.
Enter the Trino Gateway! By routing all of your Trino traffic automatically, it’s never been easier to manage, maintain, and query multiple Trino clusters at once. Load balancing ensures that no one cluster gets overworked, and it’s the perfect way to stop large queries from getting in the way of the little guys. Add in the fact that you can seamlessly shut down an individual cluster for updates or maintenance while the Trino Gateway routes traffic elsewhere, and it’s easy to see why this is such a game-changer. We’re super excited for it to be out there in the world, and we hope it makes running Trino at the largest scales simpler and faster than ever before.
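To make the routing idea above a bit more concrete, here’s a toy sketch in Python. It is not how the Trino Gateway is implemented or configured: the `Cluster` class, the cluster names, the routing-group labels, and the “pick the least-loaded active cluster” rule are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """A hypothetical backend cluster as a gateway-style router might track it."""
    name: str
    routing_group: str        # made-up labels such as "etl" or "adhoc"
    running_queries: int = 0
    active: bool = True       # False while the cluster is down for maintenance


def pick_cluster(clusters: list[Cluster], routing_group: str) -> Cluster:
    """Return the least-loaded active cluster in the requested routing group."""
    candidates = [
        c for c in clusters
        if c.active and c.routing_group == routing_group
    ]
    if not candidates:
        raise LookupError(f"no active cluster for routing group {routing_group!r}")
    return min(candidates, key=lambda c: c.running_queries)


clusters = [
    Cluster("etl-1", "etl", running_queries=4),
    Cluster("adhoc-1", "adhoc", running_queries=12),
    Cluster("adhoc-2", "adhoc", running_queries=3),
]

# A lightweight ad-hoc query lands on the less busy ad-hoc cluster.
print(pick_cluster(clusters, "adhoc").name)  # adhoc-2

# Take adhoc-2 down for an upgrade; new ad-hoc traffic shifts to adhoc-1.
clusters[2].active = False
print(pick_cluster(clusters, "adhoc").name)  # adhoc-1
```

The point of the sketch is only that clients talk to one endpoint while the choice of cluster happens behind it, which is why a cluster can drop out for maintenance without anyone changing their connection settings.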
For more information on the Trino Gateway, check out:
New features #
With more development on Trino than ever before, there were obviously a ton of new things being added to it. Let’s go over some of the biggest adds in 2023.
SQL routines #
Whether you want to refer to them as SQL routines or as user-defined functions, they’re a big deal. Fresh off the presses and only a few months old, they do exactly what you’d expect them to do: you, a user, can define and re-use your own functions! Define and use them inline as part of a query to make that query cleaner, easier, and simpler to understand. Or, if you’re really cooking, you can run a query that defines the routine in the schema of the catalog. This allows other Trino users to access the same routine time and time again as part of their other queries. It’s a level of customization that we’ve never had before in Trino, and no longer do you need to write your own Java plugins to create and re-use functions that do exactly what you need them to do.
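As a rough illustration of the two styles described above, here are the statements held as Python strings so they can be sent through whichever client you like. The `doubleup` routine and the `example.default` catalog and schema are placeholders, and storing a routine assumes the catalog’s connector supports it.

```python
# Inline routine: declared in a WITH clause, visible only to this one query.
INLINE_ROUTINE = """
WITH
  FUNCTION doubleup(x integer)
    RETURNS integer
    RETURN x * 2
SELECT doubleup(21)
"""

# Catalog routine: stored in a schema so other users can call it by name.
# "example.default" is a placeholder; the catalog's connector has to support
# storing routines for this statement to succeed.
CREATE_ROUTINE = """
CREATE FUNCTION example.default.doubleup(x integer)
  RETURNS integer
  RETURN x * 2
"""

# Once created, any query can call the routine by its fully qualified name.
CALL_ROUTINE = "SELECT example.default.doubleup(21)"

for statement in (INLINE_ROUTINE, CREATE_ROUTINE, CALL_ROUTINE):
    print(statement.strip(), end="\n\n")
```

The inline form is handy for tidying up a single query, while the stored form is the one to reach for when several people need the same logic.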
If you want to learn more about SQL routines, you can check out the introduction to SQL routines in our documentation, as well as a video from our SQL training series and a few example routines which give a good look at how they can be used.
Schema evolution and dynamic catalogs #
While we’re providing more power, customization, and flexibility to Trino users, it’s also important to highlight just how much has been added this year to make it easier to adjust things on the fly.
Schema evolution in Hive was a big addition, allowing you to alter columns’ data types, rename columns, and handle nested fields when dropping columns. Instead of needing to use the underlying database or modify it some other way and reboot Trino, Trino can handle the adjustments on the fly.
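For a sense of what those adjustments look like, here’s a small sketch with the statements kept as Python strings. The table and column names are invented, and which type changes are actually allowed depends on the connector and the table’s file format, so treat the `bigint` change as an assumption.

```python
# Placeholder table in a Hive catalog named "example".
TABLE = "example.sales.orders"

SCHEMA_CHANGES = [
    # Change a column's data type in place.
    f"ALTER TABLE {TABLE} ALTER COLUMN quantity SET DATA TYPE bigint",
    # Rename a column.
    f"ALTER TABLE {TABLE} RENAME COLUMN cust_id TO customer_id",
    # Drop a column that is no longer needed.
    f"ALTER TABLE {TABLE} DROP COLUMN legacy_flag",
]

for statement in SCHEMA_CHANGES:
    print(statement)
```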
But if you don’t use Hive and are feeling left out, we’ve experimentally taken things one step further in 2023, adding dynamic catalogs to Trino. Rather than adjusting your schema one column at a time, what about adding or dropping an entire catalog in one go? You can do that now. Though it’s currently still bleeding-edge and not ready for widespread use on your important production data sources, we’re looking forward to improving it and making it resilient and stable in 2024.
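Here’s a minimal sketch of what adding and dropping a catalog can look like, again as Python strings. It assumes the cluster has been started with dynamic catalog management turned on (`catalog.management=dynamic` in the coordinator configuration); the catalog names `scratch` and `brain` are placeholders.

```python
CATALOG_STATEMENTS = [
    # Add a catalog backed by the in-memory connector, no restart needed.
    "CREATE CATALOG scratch USING memory",
    # Connector properties can be passed along in a WITH clause.
    "CREATE CATALOG brain USING memory "
    "WITH (\"memory.max-data-per-node\" = '128MB')",
    # Check that the new catalogs show up.
    "SHOW CATALOGS",
    # Remove a catalog again in one go.
    "DROP CATALOG scratch",
]

for statement in CATALOG_STATEMENTS:
    print(statement)
```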
Project Hummingbird #
Trino has always been about squeezing out every ounce of performance that you can get. Check out our release notes and you’ll see that every version includes at least a couple performance improvements. Over time, these performance improvements add up to a substantial gain, meaning that version-over-version, year-over-year, Trino is always getting faster. Project Hummingbird was a concerted effort this year to take a look at the core engine and make a number of architectural changes paired with small improvements that would add up to something very substantial. The GitHub issue tracking it lists a ton of work that’s been accomplished already, with a lot of that work done in 2023. Though stay tuned for more, because that’s only scratching the surface…
Lakehouse improvements #
Want to leverage the historical log of all actions taken on a table in Hudi? The new $timeline system table has you covered. How about in Delta Lake? We’ve got the table_changes function for that, and views were added there, too. Too many metadata tables to list were added to Iceberg, along with the REST, JDBC, and Nessie catalogs for metadata.
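To give a feel for the Hudi and Delta Lake additions, here’s a short sketch of the two queries as Python strings. The `sales` schema and `orders` table are made-up names, and each query assumes it runs against a catalog using the matching connector.

```python
# Hudi: the timeline is exposed as a hidden table named "<table>$timeline".
# The double quotes are needed because of the "$" in the identifier.
HUDI_TIMELINE = 'SELECT * FROM "orders$timeline"'

# Delta Lake: table_changes is a table function that returns the row-level
# changes made to a table since a given version.
DELTA_CHANGES = """
SELECT *
FROM TABLE(
  system.table_changes(
    schema_name => 'sales',
    table_name => 'orders',
    since_version => 0
  )
)
"""

for statement in (HUDI_TIMELINE, DELTA_CHANGES):
    print(statement.strip(), end="\n\n")
```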
Java 21! #
Java 21. It’s required to run Trino versions 436 and later. With the upgrade from Java 17 to 21 comes a ton of improvements that will make development on Trino easier and better than ever, which will in turn make it faster and smoother than ever. Though not as huge a deal as our upgrade to Java 17 last year, expect to see the benefits coming down the pipeline as the engineers working on Trino are able to take advantage of the latest and greatest features in Java.
Trino ecosystem updates #
There’s more to Trino than Trino itself! With community updates and other technologies integrating with Trino, the number of ways you can access and use Trino is always growing. And the number of people taking care of Trino is growing, too.
Python clients #
Trino’s own Python client saw heavy development in 2023. It was updated to support SQLAlchemy 2.0 and had type support fully fleshed out, making it a robust, free, and open-source tool for running your Trino queries.
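If you haven’t tried the Python client yet, here’s a minimal sketch of both ways in: the DB-API interface and SQLAlchemy 2.0. The host, port, user, and catalog values are placeholders for a local test cluster, the helper names are our own, and the imports sit inside the functions so the sketch loads even where the `trino` and `sqlalchemy` packages aren’t installed.

```python
def trino_url(user, host="localhost", port=8080, catalog="system"):
    """Build a SQLAlchemy URL for the Trino dialect (placeholder defaults)."""
    return f"trino://{user}@{host}:{port}/{catalog}"


def fetch_all_dbapi(sql, host="localhost", port=8080, user="analyst"):
    """Run one statement through the client's DB-API interface."""
    from trino.dbapi import connect  # pip install trino

    connection = connect(host=host, port=port, user=user)
    cursor = connection.cursor()
    cursor.execute(sql)
    return cursor.fetchall()


def fetch_all_sqlalchemy(sql, url):
    """Run one statement through SQLAlchemy 2.0 using the Trino dialect."""
    from sqlalchemy import create_engine, text  # pip install "trino[sqlalchemy]"

    engine = create_engine(url)
    with engine.connect() as connection:
        return connection.execute(text(sql)).fetchall()


print(trino_url("analyst"))  # trino://analyst@localhost:8080/system
```

With a cluster running locally, something like `fetch_all_dbapi("SELECT 1")` or `fetch_all_sqlalchemy("SELECT 1", trino_url("analyst"))` should be enough to confirm the connection works.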
Elsewhere in the Python ecosystem, we heard from both Fugue and Ibis at Trino Fest, two different Python clients that integrate Trino with Python in new ways. Fugue is a wrapper that helps integrate with other Python tools and clients, and Ibis can help convert your Python code into SQL queries, making it feasible to be a 100% Python-based organization that still leverages the speed and power of a SQL query engine like Trino. We had Phillip Cloud from Voltron Data on for an episode of the Trino Community Broadcast to talk about Ibis in even more detail.
And other clients, too! #
Also on the Trino Community Broadcast repping new client support for Trino in 2023 were Dolphin Scheduler, PopSQL, and Coginiti. Dolphin Scheduler is a workflow orchestrator - and scheduler! - that can be used to routinely run and coordinate Trino queries. PopSQL is like Google Drive for SQL, providing a suite of collaborative tools for editing and working on queries as a team, including synchronous query editing, storing query history, and a robust commenting and feedback system. Coginiti is a high-powered data workspace that connects to Trino among many other things, supporting a host of powerful features that make it easier to reuse code and snippets of queries, as well as featuring embedded variables to minimize redundancy. If you want to learn more about any of these clients, click on the links above to check out the Trino Community Broadcast where we went in-depth with them!
Oh, and don’t forget the Trino TypeScript client, for when you want to work at the beautiful intersection of web development and accessing tons of data.
New maintainers #
Trino saw three new maintainers added to its ranks this year:
Manfred even took the liberty of updating the website’s roles page to list out all our maintainers. Thank you to them for their dedication to making Trino the best it can be, and congratulations to them on their shiny maintainer titles!
Conclusion #
2022 had been the busiest year in Trino’s history, but 2023 has managed to surpass it. If you’re interested in contributing to Trino, make sure to check it out on GitHub. Even if you’re not interested in contributing, give us a star on GitHub, anyway! It’s been a great year for Commander Bun Bun, and we can’t wait to show you what 2024 has in store for everyone’s favorite data rabbit.