Do you ❤️ Trino? Give us a 🌟 on GitHub

Trino Community Broadcast

49: Trino, Ibis, and wrangling Python in the SQL ecosystem

Jul 6, 2023

Audio

 

Video

Video sections

Hosts

Guest

Releases 419-420

Official highlights from Martin Traverso:

Trino 419

  • New array_histogram function.
  • Faster reading and writing of Parquet data.
  • Support for Nessie catalog in Iceberg connector.

Trino 420

  • Underscores in numeric literals (e.g. 1_000_000)
  • Hexadecimal, binary and octal numeric literals (e.g., 0x1a, 0b1010, 0o12)
  • Support for comments on view columns in Delta Lake connector.
  • Support for RENAME COLUMN in MongoDB connector.
  • Support for mixed case table names in Druid connector.
  • Faster queries when statistics are unavailable.

Question of the episode: What is Ibis?

Taken straight from the Ibis website, Ibis is a dataframe interface to execution engines with support for 15+ backends (including Trino!). Ibis doesn’t replace your existing execution engine, it extends it with powerful abstractions and intuitive syntax.

For those who love doing all their data-related work in Python, this allows you to write Python code that leverages the speed and power of Trino without needing to become a SQL master. For the die-hard SQL users out there, they have a guide on Ibis for SQL users that explains how it fully replaces SQL with Python code that is:

  • Type-checked and validated as you go.
  • Easier to write. Pythonic function calls with tab completion in IPython.
  • More composable. Break complex queries down into easier-to-digest pieces.
  • Easier to reuse. Mix and match Ibis snippets to create expressions tailored for your analysis.

Even if you’ve been writing SQL queries since day 1 and swear by it, opening the door to using Python for analytics creates many new possibilities, widens the possible talent pool you can work with, and gives you an entire second ecosystem to integrate with.

And ultimately, at the end of the day, the idea is that you get the ease of writing Python code with the power and performance of a blazing fast SQL engine. You get the best of both worlds, and using Ibis doesn’t lock you out of rolling up your sleeves and writing some SQL when a situation calls for it.

And you don’t need to learn different SQL dialects

Trino more or less adheres to ANSI SQL, but it implements some ANSI features that are rarely seen in other query engines, and other query engines choose to deviate in a variety of ways. This can be a headache if you’re migrating to Trino, as queries need to be re-written, re-structured, and tested to make sure they return the same results. If you got set up with Ibis, first, it would do that thinking for you, and a Python query could be converted to whatever dialect of SQL you need without any issue. It can save time, effort, headaches, or a sense of being locked into a specific SQL dialect, freeing you up to move between query engines without any pain points… because of course, you want to move to Trino, which is the best query engine.

It also needs pointing out that this allows you to federate your queries while you federate your queries.

Concept of the episode: Converting Python to SQL

Take some Python like so:

>>> import ibis
>>> movies = ibis.examples.ml_latest_small_movies.fetch()
>>> rating_by_year = movies.group_by('year').avg_rating.mean()
>>> q = rating_by_year.order_by(rating_by_year.year.desc())

And Ibis can automatically turn it into SQL that executes on Trino:

>>> con.compile(q)

SELECT year, avg(avg_rating)
FROM movies t1
GROUP BY t1.year
ORDER BY t1.year DESC

Obviously, this example is lightweight, but as queries grow more complex and sophisticated, the conversion becomes more and more worthwhile. And we mentioned that the Python code is easier to re-use, but it really is - if you want to run a similar query in conjunction with the query above, those movies and rating_by_year variables still exist, and writing some code to leverage them is a lot easier and more intuitive than setting up SQL sub-queries and aliases.

Questions for Phillip

  • Why is it called Ibis?
  • How much of a normal SQL workload do you think could be handled and run by Ibis?
  • How much can Ibis optimize SQL queries for performance?
  • Which SQL dialect has been the worst to deal with?

PR of the episode: #15026: Support INSERT in Google Sheets connector

Google Sheets is one of our not-as-talked-about connectors in Trino, but it still sees use and community updates, and we want to give that a shoutout in today’s Trino Community Broadcast. #15026 from Sebastien Bernauer adds INSERT support to the connector, so now you can read and write from Google Sheets in Trino, empowering the world of SQL-on-spreadsheets.

PR of the episode: #477 on trino.io: Add Mateusz Gajewski to maintainer list

We’ve added another maintainer to Trino! We just spend an episode introducing Manfred and James Petty as maintainers, and Mateusz is right behind them after years of effort helping Trino as a contributor and reviewer.

Trino events

Trino Fest wrapped up a few weeks ago, and we’re publishing recaps of all the talks to the Trino blog! Keep an eye on our YouTube channel and the Trino website to catch up on everything you missed.

If you have an event that is related to Trino, let us know so we can add it to the Trino events calendar.

Rounding out

If you want to learn more about Trino, get the definitive guide from O’Reilly. You can download the free PDF or buy the book online.

Music for the show is from the Megaman 6 Game Play album by Krzysztof Slowikowski.