Trino Community Broadcast

46: Trino heats up with Ignite

Mar 15, 2023

Guests

  • Jason, Senior Data Engineer at Shopee.

Releases 408-410

Official highlights from Martin Traverso:

Trino 408

  • New Apache Ignite connector!
  • Add support for writing decimal types to BigQuery.
  • Improve performance when reading structural types from Parquet files in Delta Lake.

Trino 409

  • Support for nested fields in DROP COLUMN.
  • Support for sorted tables in Iceberg.
  • Support for time type in Cassandra.
  • Faster aggregations containing DISTINCT.
  • Faster LIKE with dynamic patterns.

Trino 410

  • Support for the sheet table function in Google Sheets.
  • Better file pruning in Iceberg.

Introducing the Ignite connector to Trino

The Trino Ignite connector was added a couple of releases ago in Trino 408. It’s not every day that we add a new connector to Trino, so today’s episode explores the connector, what it does, and what its use cases are. After that, we are going to talk about the process of coming in as an outside engineer and contributing an entirely new connector to Trino.

What is Ignite?

Apache Ignite is an in-memory distributed database, comparable to others you may be familiar with, such as Redis and SingleStore. If you’re not familiar with them or with in-memory computing, the gist is that by keeping data in RAM instead of on disk, a database system can be much faster: the Ignite website advertises 10-1000x improvements. Of course, RAM is also more expensive, so Ignite thrives in settings where performance is critical.

With an initial release 7 years ago, Ignite is still a relative newcomer among in-memory databases, and it comes with modern bells and whistles that position it to become a successor to the comparable databases mentioned above. It also has some key functionality that sets it apart, including a fully distributed architecture that can use disk storage, allowing it to scale horizontally.

Contributing the Ignite connector

The Trino community and developers try their best to be active reviewers, collaborators, and participants on pull requests coming in from outside contributors. Massive contributions like the Ignite connector can take a lot of round trips, back-and-forth discussion, and work from both the contributor and the project’s maintainers to get it into a state where it is ready to merge and go live for users to try out.

To give you an idea, the pull request (PR) to contribute Ignite was opened in mid-June 2021. It received immediate feedback from a couple of maintainers and went through a few round trips of amendments, re-reviews, more edits, and then further reviews. But in an open source environment, each round trip tends to take longer than the last. Progress stalled in November 2021, and neither Jason nor the maintainers touched the Ignite PR for nearly a year. In October 2022, as part of Trino DevRel’s roundup of stale and out-of-date pull requests, we bumped back into the work that Jason had done. The wheels began to turn again, starting slow but picking up the pace, until the PR returned to full and active development, with several maintainers checking in frequently until the connector was ready to go. But that’s the story from an observer, and we’ve got Jason here to go into more detail.

Questions for Jason

  • How was the Trino review process?
  • Were there any major lessons you picked up along the way?
  • What tips would you give to someone else looking to add something into Trino?

PR of the episode: #13493: Add support for migrate procedure in Iceberg

If you’ve been in the data space for a while, you may know that there’s a prevailing current of migration from Hive to Iceberg. Out with the old, in with the new, and in with the performance gains. Yuya Ebihara, one of the Trino maintainers, has added a table procedure to Trino’s Iceberg connector to make that process much, much simpler. Rather than a slow, manual, and arduous process, if you have a Hive table stored in a file format supported by Iceberg, it’s now as simple as calling the migrate table procedure and letting it run. The procedure copies the schema, partitioning, properties, and location of the source table, then streams in all the data files from the source table to rebuild it all in the Iceberg format. Neat, right?
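To make that a little more concrete, here is a minimal Python sketch of what calling the procedure could look like. It assumes the procedure is exposed as system.migrate with schema_name and table_name arguments, which is our reading of the Iceberg connector documentation, so check the docs for your Trino version. The catalog name iceberg, the schema testdb, the table customer_orders, and the helper names build_migrate_call and migrate_table are all made up for illustration, and the cursor is assumed to be any DB-API style cursor connected to Trino.

```python
import re

# Hypothetical helper names; only the CALL statement shape comes from Trino.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _sql_string(value: str) -> str:
    """Render a Python string as a SQL string literal, doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_migrate_call(catalog: str, schema: str, table: str) -> str:
    """Build a CALL statement for the Iceberg migrate procedure.

    Assumes the procedure lives at <catalog>.system.migrate and takes the
    named arguments schema_name and table_name.
    """
    # The catalog is interpolated as an identifier, so keep it strict.
    if not _IDENTIFIER.match(catalog):
        raise ValueError(f"unexpected catalog name: {catalog!r}")
    return (
        f"CALL {catalog}.system.migrate("
        f"schema_name => {_sql_string(schema)}, "
        f"table_name => {_sql_string(table)})"
    )


def migrate_table(cursor, catalog: str, schema: str, table: str):
    """Run the migrate call on a DB-API style cursor and return its rows."""
    cursor.execute(build_migrate_call(catalog, schema, table))
    return cursor.fetchall()


if __name__ == "__main__":
    # Print the statement instead of running it, since no cluster is assumed.
    print(build_migrate_call("iceberg", "testdb", "customer_orders"))
```

Building the statement separately from executing it means you can review or log the exact SQL before pointing it at a real Hive table.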

Trino events

If you have an event that is related to Trino, let us know so we can add it to the Trino events calendar.

Kevin Haley will be hosting an in-person event, Getting to Know Trino, in Boston, Massachusetts on Wednesday, April 5. You need to register in advance, so if you’re in the Boston area and interested in attending, go sign up!

Rounding out

Check out the in-person and virtual Trino Meetup groups.

If you want to learn more about Trino, get the definitive guide from O’Reilly. You can download the free PDF or buy the book online.

Music for the show is from the Megaman 6 Game Play album by Krzysztof Slowikowski.