Wow! If you had to sum up what happened over the last year in this great community, "wow" would be it. It is truly awe-inspiring to be part of this incredible journey of Trino. Oh yeah, on that note: our community and project chose the new name Trino to be able to continue to innovate and develop freely as a community of peers. Presto® and Presto® SQL are a thing of the past.
Now that that is out of the way, let’s dive right in and see what all our community members across the globe have created with us!
2019 was a big year for us, but check out how 2020 eclipsed even that!
By the numbers
The size and growth of our community on Slack alone is impressive:
- Started in January 2020 with ~1600 members and 280 weekly active
- Over 3200 members by December 2020
- 560 members active weekly
The innovation and change of the source code on GitHub is a result of the hard work of the community:
- Over 4000 commits merged
- More than 2800 pull requests received
- 23 releases, roughly one every two weeks!
Much of the excitement around the name change has quickly increased the number of stars we have on GitHub. While some of this certainly stems from the initial buzz around a shiny new name, we also believe that the name change has brought clarity to the community. Trino is an improved version, supported by the founders and creators of Presto®, along with the major contributors.
Features and code
While everything mentioned is already exciting, the true work is visible in the new features and improvements in Trino. It is a long list, but read on. You won’t want to miss anything.
Improvements to ANSI SQL support
A core feature of Trino is the ability to use the same standard SQL for any connected data source. These improvements empower all users.
- Variable-precision temporal types, with precision down to picoseconds (10⁻¹² s). This is a very important feature for time-critical systems such as financial transaction processing
- Correct, SQL specification-compliant timestamp semantics, making it easier to migrate SQL statements from other compliant systems such as many RDBMSs
- Implicit coercions for
- Support for GROUPS-based window frames
- More support for various shapes of correlated subqueries
- Support for
- Parameter support in
FETCH FIRST, and
- Experimental support for recursive queries
- Enforcement of NOT NULL constraints when inserting data
- Quantified comparisons (e.g., > ALL (...)) in aggregation queries
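To make the GROUPS-based window frames a bit more tangible, here is a minimal Python sketch of the frame semantics as defined by standard SQL. It is only an illustration, not how Trino implements window functions, and the function name `groups_frame_sum` and the sample rows are made up for this example. The idea is that rows sharing the same ORDER BY value form a peer group, and the frame is counted in peer groups rather than in rows or value distances.

```python
from itertools import groupby


def groups_frame_sum(rows, preceding=1, following=0):
    """Sum `value` over a GROUPS frame for each row.

    rows: list of (order_key, value) tuples, already sorted by order_key.
    The frame of a row spans `preceding` peer groups before and
    `following` peer groups after the row's own peer group.
    (Hypothetical helper for illustration only.)
    """
    # Rows with the same order_key are peers and form one group.
    groups = [(key, [value for _, value in grp])
              for key, grp in groupby(rows, key=lambda r: r[0])]
    group_sums = [sum(values) for _, values in groups]

    result = []
    for i, (key, values) in enumerate(groups):
        lo = max(0, i - preceding)
        hi = min(len(groups), i + following + 1)
        frame_sum = sum(group_sums[lo:hi])
        # Every peer in a group sees the same frame.
        result.extend((key, value, frame_sum) for value in values)
    return result


# Roughly: sum(value) OVER (ORDER BY order_key
#                           GROUPS BETWEEN 1 PRECEDING AND CURRENT ROW)
rows = [(1, 10), (1, 20), (3, 5), (7, 1), (7, 2)]
framed = groups_frame_sum(rows, preceding=1, following=0)
for row in framed:
    print(row)
```

Note that the row with key 3 includes the peer group with key 1 in its frame, even though the keys are two apart. A RANGE frame with `1 PRECEDING` would only look at keys between 2 and 3, which is exactly the difference between the two frame types.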
Other query improvements
A number of other features were added to make querying your data sources with Trino even more powerful:
- T-digest data type and functions for approximate quantile computations
- Support for setting and reading column comments
- Numerous new functions including
Trino is already ludicrously fast. But then again, even faster is better, so we worked on that:
- Improved pushdown of complex operations into connectors, including aggregation pushdown and TopN pushdown.
- Dynamic filtering and partition pruning, which can improve performance of highly selective joins manyfold.
- Cost-based decisions for queries containing
- Information_schema performance improvements, which benefit third-party BI tools that need to inspect table metadata, for example DBeaver, Datagrip, Power BI, Tableau, Looker, and others.
- Faster queries on nested data in Parquet and ORC.
- Faster and more accurate approx_percentile, based on the t-digest data structure.
- Support for Bloom filters in ORC.
- Experimental, optimized Parquet writer.
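The idea behind dynamic filtering and partition pruning can be sketched in a few lines of Python. This is a conceptual illustration under simplifying assumptions, not Trino's actual implementation: the function `join_with_dynamic_filter`, the in-memory "partitions", and the sample data are all hypothetical. The point is that the join keys found on the small, selective side of the join are collected at runtime and used to skip data on the large side before it is ever read.

```python
def join_with_dynamic_filter(dim_rows, fact_partitions):
    """Join a small dimension side against a partitioned fact side.

    dim_rows: list of (key, label) tuples that survived a selective predicate.
    fact_partitions: dict mapping key -> list of amounts, one partition per key.
    Returns the joined rows and the number of fact partitions actually read.
    (Hypothetical helper for illustration only.)
    """
    # Build side: index the dimension rows by join key.
    labels_by_key = {}
    for key, label in dim_rows:
        labels_by_key.setdefault(key, []).append(label)

    # The dynamic filter is the set of join keys seen at runtime.
    dynamic_filter = set(labels_by_key)

    partitions_read = 0
    joined = []
    for key, amounts in fact_partitions.items():
        if key not in dynamic_filter:
            continue  # partition pruned, its rows are never scanned
        partitions_read += 1
        for amount in amounts:
            for label in labels_by_key[key]:
                joined.append((key, label, amount))
    return joined, partitions_read


dim_rows = [("2020-12-24", "holiday")]
fact_partitions = {
    "2020-12-23": [5, 7],
    "2020-12-24": [11],
    "2020-12-25": [3],
}
joined, partitions_read = join_with_dynamic_filter(dim_rows, fact_partitions)
print(joined, partitions_read)
```

In this toy example only one of the three partitions is read. The more selective the join, the more data can be skipped, which is where the large speedups come from.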
The more data you access with Trino, the more it becomes critical to secure it. With that in mind we added a lot of improvements:
- The Web UI now requires authentication. Various actions such as viewing query details, killing queries, etc., are protected with authorization checks based on the identity of the user. Additionally, the UI now supports OAuth2 for user identification.
- External and internal APIs are now properly secured with authentication and authorization checks. Importantly, this fixes a vulnerability reported as a CVE that affects all older versions of Presto®.
- A new mechanism to externalize secrets in configuration files that makes it easier to integrate with third-party secret managers and deployment tools.
- Support for JSON Web Key (JWK) authentication and pluggable certificate authenticators.
- A new Salesforce authenticator.
- The query engine and access control SPIs now support injecting row filters and column masks.
- New syntax for managing permissions (ALTER TABLE/SCHEMA/VIEW ... SET AUTHORIZATION).
Trino empowers you to use one platform to access all data sources. Connectors enable this and we added numerous new connectors:
All other connectors received a large host of improvements. Let’s just look at two popular connectors:
Hive connector for HDFS, S3, Azure and cloud object storage systems
- Support for complex Hive views, which allows integration with Hive and simplifies migration from Hive
- ACID transactional tables with
- Built-in storage caching and support for external caching with Alluxio
- New procedures:
- Support for Azure object storage
- Support for S3 encrypted files, flexible S3 security mappings and Intelligent-Tiering S3 storage
The Elasticsearch connector received numerous powerful improvements:
- Password authentication
- Support for index aliases
- Support for array, nested, and IP types
- Support for Elasticsearch 7.x
Operating and maintaining a Trino cluster takes a significant amount of resources, so any work to reduce the runtime needs has a significant positive impact:
- Requirement to use Java 11, with better GC performance, overall performance, and improved container support
- Support for ARM64-based processors to run Trino
- Support for minimum number of workers before query starts, useful for implementing autoscaling
- Data integrity checks for network transfers to prevent data corruption during processing
There is so much more to capture, and you really would have to read all the release notes in detail to know it all. To save you from that, here are a few more noteworthy changes:
- Experimental support for materialized views in Iceberg connector
- JDBC driver backward compatibility tests
- Support for multiple event listeners
- Added Python client support for exec with parameters
- New look and navigation for the documentation, and lots of new content
Community resources and events
Beyond the raw code and helping each other, the community collaborated on other helpful resources like books and in-depth video tutorials.
Brian and Manfred launched the live streaming event Trino Community Broadcast, and grew their audience and back catalog to include some very useful material. If you have not seen it yet, go and watch some old episodes and join us in the next ones.
We also had a number of other online events and presentations, with direct participation of our community members:
- A dedicated conference event for the community in Japan was very successful.
- The Argentina Big Data Meetup had a large audience from the community in South America.
A series of virtual events around the project started with a roadmap and overview meeting and included a number of real-world use case examples at scale:
Another series of training classes with the project founders was hugely successful. It includes very valuable content for any Trino user, from beginners to experts, that you should not miss:
- Advanced SQL in Trino with David
- Understanding and Tuning Trino Query Processing with Martin
- Securing Trino with Dain
- Configuring and Tuning Trino with Dain
2020 was a wild ride for us all. Trino and the Trino community definitely emerged as a winner, and we are looking forward to a very bright future with you all.
Several efforts are already underway and look very promising:
- Optimized Parquet reader, on par with ORC reader support
- Support for SQL
- OAuth2 support for JDBC
- Support for SQL
We’re starting the new year with a shiny new name, a cute little bunny, and a very vibrant community. The future is looking great for Trino!
Don’t hesitate, and don’t miss out on all the benefits of Trino. Join us on Slack to get started!