Trino in 2020 - An amazing year in review

Jan 8, 2021 • Martin Traverso, Manfred Moser, Brian Olsen

Wow! If you had to sum up what happened in the last year in this great community, wow would be it. It is truly awe-inspiring to be part of this incredible journey of Trino. Oh yeah, on that note: our community and project chose the new name Trino to be able to continue to innovate and develop freely as a community of peers. Presto® and Presto® SQL are things of the past.

Now that that is out of the way, let’s dive right in and see what all our community members across the globe have created with us!

2019 was a big year for us, but check out how 2020 eclipsed even that!

By the numbers #

Even the size and growth of our community on Slack is impressive:

Started in January 2020 with ~1600 members and 280 weekly active
Over 3200 members by December 2020
560 members active weekly

The innovation and change of the source code on GitHub is a result of the hard work of the community:

Over 4000 commits merged
More than 2800 pull requests received
23 releases, roughly one every two weeks!

The excitement around the name change has quickly increased the number of stars we have on GitHub. While some of this certainly stems from an initial buzz around a shiny new name, we also believe that this name change has brought clarity to the community. Trino is an improved version, supported by the founders and creators of Presto®, along with the major contributors.

And if you have not done so already, make sure to star the repository and join us on Slack.

Features and code #

While everything mentioned is already exciting, the true work is visible in the new features and improvements in Trino. It is a long list, but read on. You won’t want to miss anything.

Improvements to ANSI SQL support #

A core feature of Trino is the ability to use the same standard SQL for any connected data source. These improvements empower all users.

Variable-precision temporal types, with precision down to picoseconds (10⁻¹²s). This is a very important feature for time-critical systems such as financial transaction processing
Correct timestamp semantics that now comply with the SQL specification, making it easier to migrate SQL statements from other compliant systems such as many RDBMSs
Implicit coercions for INSERT clause
Support for RANGE and GROUPS-based window frames
More support for various shapes of correlated subqueries
Support for INTERSECT ALL and EXCEPT ALL
Parameter support in LIMIT, FETCH FIRST, and OFFSET clause
Experimental support for recursive queries
Enforcement of NOT NULL constraints when inserting data
Quantified comparisons (e.g., > ALL (...)) in aggregation queries
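
To make the INTERSECT ALL and EXCEPT ALL item above more concrete, here is a minimal Python sketch of the multiset semantics defined by the SQL standard. It is not Trino code. The function names `intersect_all` and `except_all` and the sample data are made up for illustration, and single values stand in for rows.

```python
from collections import Counter

def intersect_all(left, right):
    """A row seen m times in left and n times in right appears min(m, n) times."""
    return sorted((Counter(left) & Counter(right)).elements())

def except_all(left, right):
    """A row seen m times in left and n times in right appears max(m - n, 0) times."""
    return sorted((Counter(left) - Counter(right)).elements())

# Hypothetical single-column inputs standing in for two query results.
left = [1, 1, 1, 2, 3]
right = [1, 1, 2, 2, 4]

print(intersect_all(left, right))        # [1, 1, 2]
print(except_all(left, right))           # [1, 3]

# The plain (DISTINCT) variants drop duplicates instead of counting them.
print(sorted(set(left) & set(right)))    # [1, 2]
print(sorted(set(left) - set(right)))    # [3]
```

The ALL variants matter whenever duplicate rows carry meaning, for example when comparing two event logs row by row.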

Other query improvements #

A number of other features were added to make querying your data sources with Trino even more powerful:

T-digest data type and functions for approximate quantile computations
Support for setting and reading column comments
Numerous new functions including concat_ws(), regexp_count(), regexp_position(), contains_sequence(), murmur3(), from_unixtime_nanos(), from_iso8601_timestamp_nanos(), human_readable_seconds(), bitwise operations, luhn_check(), approx_most_frequent(), translate(), starts_with()
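
As an illustration of what a function like luhn_check() verifies, the following is a small Python sketch of the standard Luhn checksum used for credit card and similar identification numbers. It is written from the public algorithm, not from Trino's source, so the handling of edge cases such as empty or non-digit input is an assumption.

```python
def luhn_check(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    # Assumption: anything other than a non-empty ASCII digit string is invalid.
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk the digits from right to left and double every second one.
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0

print(luhn_check("79927398713"))  # True
print(luhn_check("79927398710"))  # False
```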

Performance #

Trino is already ludicrously fast. But then again, even faster is better, so we worked on that:

Improved pushdown of complex operations into connectors, including aggregation pushdown and TopN pushdown.
Dynamic filtering and partition pruning, which can improve performance of highly selective joins manyfold.
Cost-based decisions for queries containing IN <subquery> in WHERE clause.
Information_schema performance improvements, which benefit third-party BI tools that need to inspect table metadata, for example DBeaver, Datagrip, Power BI, Tableau, Looker, and others.
Faster queries on nested data in Parquet and ORC.
Faster and more accurate approx_percentile, based on t-digest data structure.
Support of Bloom filters in ORC.
Experimental, optimized Parquet writer.
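
The idea behind dynamic filtering can be sketched in a few lines of Python. This is a conceptual toy, not how Trino implements it: the function name `join_with_dynamic_filter`, the in-memory lists, and the sample tables are all hypothetical, and the sketch ignores distribution, connector pushdown, and partition pruning. It only shows why collecting the join keys from the small side first lets the scan of the large side skip most rows when the join is highly selective.

```python
def join_with_dynamic_filter(probe_rows, build_rows, key):
    """Hash join that filters the probe side using keys seen on the build side."""
    # Build side: index the (small) table by join key.
    build_index = {}
    for row in build_rows:
        build_index.setdefault(row[key], []).append(row)

    # The dynamic filter is simply the set of join keys found at runtime.
    dynamic_filter = set(build_index)

    # Probe side: rows whose key cannot match are dropped during the scan,
    # before they ever reach the join.
    scanned = [row for row in probe_rows if row[key] in dynamic_filter]

    joined = [{**p, **b} for p in scanned for b in build_index[p[key]]]
    return joined, len(scanned)

# Hypothetical data: a large fact table and a tiny, selective dimension table.
orders = [{"order_id": i, "day": i % 5} for i in range(10)]
holidays = [{"day": 3, "holiday": "example"}]

rows, scanned_count = join_with_dynamic_filter(orders, holidays, "day")
print(scanned_count)  # 2 of the 10 order rows survive the filter
print(rows)
```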

Security #

The more data you access with Trino, the more it becomes critical to secure it. With that in mind we added a lot of improvements:

The Web UI now requires authentication. Various actions such as viewing query details, killing queries, etc., are protected with authorization checks based on the identity of the user. Additionally, the UI now supports OAuth2 for user identification.
External and internal APIs are now properly secured with authentication and authorization checks. Importantly, this fixes a vulnerability, reported as a CVE, that affects all older versions of Presto®.
A new mechanism to externalize secrets in configuration files that makes it easier to integrate with third-party secret managers and deployment tools.
Support for JSON Web Key (JWK) authentication and pluggable certificate authenticators.
New Salesforce authenticator.
The query engine and access control SPIs now support injecting row filters and column masks.
New syntax for managing permissions (GRANT/REVOKE on schema, ALTER TABLE/SCHEMA/VIEW ... SET AUTHORIZATION).

Data sources #

Trino empowers you to use one platform to access all data sources. Connectors enable this and we added numerous new connectors:

All other connectors received a large host of improvements. Let’s just look at two popular connectors:

Hive connector for HDFS, S3, Azure and cloud object storage systems #

Support for complex Hive views, which allows integration with Hive and simplifies migration from Hive
ACID transactional tables with INSERT and DELETE support
Built-in storage caching and support for external caching with Alluxio
New procedures: system.drop_stats(), register_partition(), unregister_partition()
Support for Azure object storage
Support for S3 encrypted files, flexible S3 security mappings and Intelligent-Tiering S3 storage

Elasticsearch connector #

The Elasticsearch connector received numerous powerful improvements:

Password authentication
Support for index aliases
Support for array types, Nested, and IP type
Support for Elasticsearch 7.x

Runtime improvements #

Operating and maintaining a Trino cluster takes a significant amount of resources, so any work to improve the runtime has a significant positive impact:

Requirement to use Java 11, with better GC performance, overall performance, and improved container support
Support for ARM64-based processors to run Trino
Support for minimum number of workers before query starts, useful for implementing autoscaling
Data integrity checks for network transfers to prevent data corruption during processing

Everything else #

There is so much more to capture, and you really would have to read all the release notes in detail to know it all. To save you from that, here are a few more noteworthy changes:

Experimental support for materialized views in Iceberg connector
JDBC driver backward compatibility tests
Support for multiple event listeners
Added Python client support for exec with parameters
New look and navigation for the documentation, and lots of new content

Community resources and events #

Beyond the raw code and helping each other, the community collaborated on other helpful resources like books and in-depth video tutorials.

Matt, Manfred, and Martin published the book Trino: The Definitive Guide with O’Reilly. Over 5000 readers took advantage of the free digital copy.

Brian and Manfred launched the live streaming event Trino Community Broadcast, and grew their audience and back catalog to include some very useful material. If you have not seen it yet, go and watch some old episodes and join us in the next ones.

We also had a number of other online events and presentations, with direct participation of our community members:

A dedicated conference event for the community in Japan was very successful.
The Argentina Big Data Meetup had a large audience from the community in South America

A series of virtual events around the project started with a roadmap and overview meeting and included a number of real-world use case examples at scale:

Another series of training classes with the project founders was hugely successful. It includes very valuable content for any Trino user, from beginners to experts, that you should not miss:

Conclusion #

2020 was a wild ride for us all. Trino and the Trino community definitely emerged as a winner, and we are looking forward to a very bright future with you all.

Several efforts are already underway and look very promising:

Optimized Parquet reader, on par with ORC reader support
Support for SQL UPDATE and MERGE statements
OAuth2 support for JDBC
Support for SQL WINDOW clause and MATCH_RECOGNIZE usage

We’re starting the new year with a shiny new name, a cute little bunny, and a very vibrant community. The future is looking great for Trino!

Don’t hesitate, and don’t miss out on all the benefits of Trino. Join us on Slack to get started!

Do you ❤️ Trino? Give us a 🌟 on GitHub
