Official highlights from Martin Traverso:
DROP COLUMN.
DISTINCT.
LIKE with dynamic patterns.
sheet table function in Google Sheets.

The Trino Ignite connector was added a couple releases ago in Trino 408. It’s not every day that we add a new connector to Trino, and so the topic of today’s episode is exploring the connector, what it does, and what its use cases are. After that, we are going to talk about the process of coming in as an outside engineer and contributing an entirely new connector to Trino.
Apache Ignite is an in-memory distributed database, comparable to others you may be familiar with like Redis and SingleStore. If you’re not familiar with them or with in-memory computing, the gist is that by focusing on using RAM instead of disk storage, you can create a database system that is much faster: the Ignite website advertises 10-1000x improvements. Of course, this is more expensive, too, so it thrives in settings where performance is critical.
With an initial release 7 years ago, Ignite is still a relative newcomer among in-memory databases, coming with modern bells and whistles that have it positioned to become a successor to those other, comparable databases mentioned above. It also has some key functionality that sets it apart, including a fully-distributed architecture which can use disk storage, allowing it to scale horizontally.
The Trino community and developers try their best to be active reviewers, collaborators, and participants on pull requests coming in from outside contributors. Massive contributions like the Ignite connector can take a lot of round trips, back-and-forth discussion, and work from both the contributor and the project’s maintainers to get it into a state where it is ready to merge and go live for users to try out.
To give you an idea, the pull request (PR) to contribute Ignite was opened in mid-June 2021. It received immediate feedback from a couple of maintainers, went through a few round trips with amendments, re-reviews, more edits, and then other reviews. But in an open source environment, each round trip tends to take longer and longer. Progress stalled in November 2021, and neither Jason nor the maintainers poked the Ignite PR for nearly a year. In October 2022, as part of Trino DevRel’s roundup of stale and out-of-date pull requests, we bumped back into the work that Jason had done. The wheels began to turn again, starting slow but picking up the pace, until it returned to full and active development, with several maintainers checking in frequently until the connector was ready to go. But that’s the story from an observer, and we’ve got Jason here to go into more detail.
migrate procedure in Iceberg

If you’ve been in the data space for a while, you may know that there’s a bit of
a prevailing current in migrating from Hive to Iceberg. Out with the old, in
with the new, and in with the performance gains. Yuya Ebihara, one of the Trino
maintainers, has added a table procedure to Trino’s Iceberg connector
to make that process much, much simpler. Rather than a slow, manual, and arduous
process, if you have a Hive table stored in a file format supported by Iceberg,
it’s now as simple as calling the migrate
table procedure and letting it run.
The procedure copies the schema, partitioning, properties, and location of the
source table, then streams in all the data files from the source table to
re-build it all in the Iceberg format. Neat, right?
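
As a rough illustration of what calling the procedure might look like, here is a minimal sketch. It assumes an Iceberg catalog named iceberg that can see the Hive table, and a source table testdb.customer_orders; the catalog, schema, and table names are hypothetical placeholders, and the exact arguments accepted may vary by Trino version, so check the Iceberg connector documentation for your release.

```sql
-- Hypothetical names: catalog "iceberg", schema "testdb", table "customer_orders".
-- Migrate the existing Hive table to the Iceberg format.
CALL iceberg.system.migrate(
    schema_name => 'testdb',
    table_name => 'customer_orders'
);

-- Afterwards, the table should be queryable through the Iceberg catalog.
SELECT count(*) FROM iceberg.testdb.customer_orders;
```

It is probably worth trying this on a copy or a non-critical table first, since the migration changes how the table is managed.
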
If you have an event that is related to Trino, let us know so we can add it to the Trino events calendar.
Kevin Haley will be hosting an in-person event, Getting to Know Trino, in Boston, Massachusetts on Wednesday, April 5. You need to register in advance, so if you’re in the Boston area and interested in attending, go sign up!
Check out the in-person and virtual Trino Meetup groups.
If you want to learn more about Trino, get the definitive guide from O’Reilly. You can download the free PDF or buy the book online.
Music for the show is from the Megaman 6 Game Play album by Krzysztof Slowikowski.