Jordan Zimmerman and Pablo Arteaga introduce us to aws-proxy, a new Trino subproject, and its use cases with AWS S3 and other storage systems, metastores, and query engines.
Trino 478 is in the final stages of being released. We will talk about the details in the next episode.
Manfred chats with Pablo and Jordan about their involvement in the Trino community. We end up chatting a bunch about the Airlift framework, which is a foundation for Trino, since Jordan has been involved in that project for a long time. Pablo has been involved in Trino itself and has worked on the OPA plugin and the Trino Gateway, among other things.
The AWS Proxy is an open-source Java toolkit and library, not a standalone application, designed to act as a transparent proxy for object storage protocols compatible with AWS Simple Storage Service (S3).
It was created by developers from Starburst, Bloomberg, and other organizations in the Trino community to address the need for enhanced governance and security when using tools like Apache Spark that lack security controls. It also supports direct data access to S3 or S3-compatible systems, like MinIO or Dell ECS.
In essence, it takes standard S3 requests from data tools and mediates them, applying security, control, and abstraction before forwarding them to the actual data lake storage.
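To make the mediation idea concrete, here is a minimal Java sketch of the flow described above: a request comes in, a policy check is applied, and only then is the request mapped to the real storage location. This is not the aws-proxy API. The `ProxySketch` class, the `S3Request` record, the `mediate` method, the policy tables, and the `storage.example.internal` endpoint are all hypothetical names chosen for illustration, and the sketch uses only the Java standard library.

```java
import java.net.URI;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Illustrative only: none of these names come from aws-proxy itself.
class ProxySketch {
    // Hypothetical, simplified view of an incoming S3 request.
    record S3Request(String principal, String method, String bucket, String key) {}

    // Hypothetical policy tables: which principals may read or write each bucket.
    static final Map<String, Set<String>> READERS =
            Map.of("sales", Set.of("alice", "spark-job"));
    static final Map<String, Set<String>> WRITERS =
            Map.of("sales", Set.of("alice"));

    // Hypothetical mapping from a logical bucket to the real storage endpoint.
    static final Map<String, URI> BACKENDS =
            Map.of("sales", URI.create("https://storage.example.internal/"));

    // Returns the backend URI to forward to, or empty if the request is rejected.
    // Assumes the key contains only characters that are legal in a URI path.
    static Optional<URI> mediate(S3Request request) {
        boolean isRead = request.method().equals("GET") || request.method().equals("HEAD");
        Map<String, Set<String>> policy = isRead ? READERS : WRITERS;

        // Security step: reject principals that are not listed for this bucket.
        boolean allowed = policy.getOrDefault(request.bucket(), Set.of())
                .contains(request.principal());
        URI backend = BACKENDS.get(request.bucket());
        if (!allowed || backend == null) {
            return Optional.empty();
        }

        // Abstraction step: the caller never sees the real endpoint.
        return Optional.of(backend.resolve(request.bucket() + "/" + request.key()));
    }

    public static void main(String[] args) {
        System.out.println(mediate(new S3Request("spark-job", "GET", "sales", "2024/q1.parquet")));
        System.out.println(mediate(new S3Request("spark-job", "PUT", "sales", "2024/q1.parquet")));
    }
}
```

In this sketch the Spark job is allowed to read from the `sales` bucket, so the first call yields a forwarding URI, while the write attempt from the same principal is rejected. A real deployment would presumably also need to handle request signing and credentials, which the sketch leaves out entirely.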