Trino is designed as a query engine for data lakes and data lakehouses. A complete lake or lakehouse uses numerous components, including object storage systems, table formats, metastores (often also called metadata catalogs), file formats, and other tools.
Configure a catalog with the Trino connector required for the specific table format, and configure access to the object storage and metastore that hold the data. With Trino you are then ready to use any supported client to query the lake using SQL and the features of that client.
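As a minimal sketch of that flow, assuming the Iceberg table format, a Hive metastore, and data in S3, a catalog is a properties file in etc/catalog; the catalog name lake and all host names below are placeholders:

```properties
# etc/catalog/lake.properties -- "lake" and all hosts are hypothetical
connector.name=iceberg
iceberg.catalog.type=hive_metastore
hive.metastore.uri=thrift://metastore.example.com:9083
# Object storage support, using the native S3 file system as an example
fs.native-s3.enabled=true
s3.region=us-east-1
```

A client then addresses tables as catalog.schema.table, for example `SELECT * FROM lake.web.events LIMIT 10;`, with web and events standing in for your own schema and table names.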
Support for the following data lake or lakehouse components is developed and maintained by the Trino community.
Integration developed and maintained by the Trino community
Alluxio provides a single pane of glass for enterprises to manage data and AI workloads across diverse infrastructure environments with ease. Alluxio Data Platform has two product offerings, Alluxio Enterprise Data and Alluxio Enterprise AI.
Alluxio provides an open source object storage caching solution that forms the basis of the file system cache and the Alluxio file system support in Trino. The commercial platform, with its distributed block-level read/write caching functionality, can be used for further integration.
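As a hedged sketch of the open source caching side, the file system cache is enabled per catalog with properties along these lines; the directory path and size are illustrative values for fast local storage on each worker:

```properties
# Added to a catalog properties file; assumes local SSD paths on each worker
fs.cache.enabled=true
fs.cache.directories=/mnt/trino-cache
fs.cache.max-sizes=100GB
```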
Integration developed and maintained by the Trino community
Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance. Millions of customers of all sizes and industries store, manage, analyze, and protect any amount of data for virtually any use case, such as data lakes, cloud-native applications, and mobile apps. With cost-effective storage classes and easy-to-use management features, you can optimize costs, organize and analyze data, and configure fine-tuned access controls to meet specific business and compliance requirements.
Use the Trino S3 file system support with the Delta Lake, Hive, Hudi, or Iceberg connectors to access data in S3.
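For example, a sketch of the native S3 file system properties in a Hive catalog; the static keys are placeholders and can typically be omitted in favor of the standard AWS credential mechanisms, such as an instance IAM role:

```properties
connector.name=hive
hive.metastore.uri=thrift://metastore.example.com:9083
fs.native-s3.enabled=true
s3.region=us-east-1
# Optional static credentials; placeholders shown
s3.aws-access-key=AKIA...
s3.aws-secret-key=...
```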
Integration developed and maintained by the Trino community
Apache Hive is a distributed, fault-tolerant data warehouse system that enables analytics at a massive scale and facilitates reading, writing, and managing petabytes of data residing in distributed storage using SQL. It is part of the larger Apache Hadoop project.
Use an Apache Hive data warehouse as a data source in Trino by configuring a catalog with the Hive connector. Use the Hive metastore service as a metastore with the Delta Lake, Hive, Hudi, and Iceberg connectors. Use the Hadoop Distributed File System (HDFS) file system support as a file system with the Delta Lake, Hive, Hudi, and Iceberg connectors.
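A sketch of a Hive catalog that uses both the metastore service and HDFS, with hypothetical host names and typical Hadoop configuration paths:

```properties
# etc/catalog/hive.properties
connector.name=hive
hive.metastore.uri=thrift://metastore.example.com:9083
# HDFS file system support reads the Hadoop cluster configuration files
fs.hadoop.enabled=true
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
```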
Integration developed and maintained by the Trino community
Apache Hudi is a transactional data lake platform that brings database and data warehouse capabilities to the data lake. Hudi reimagines slow old-school batch data processing with a powerful new incremental processing framework for low-latency, minute-level analytics.
Use an Apache Hudi data lake as a data source in Trino by configuring a catalog with the Hudi connector.
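A minimal sketch of such a catalog; the Hudi connector locates tables through a Hive metastore, and the host name is a placeholder:

```properties
# etc/catalog/hudi.properties
connector.name=hudi
hive.metastore.uri=thrift://metastore.example.com:9083
```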
Integration developed and maintained by the Trino community
Apache Iceberg is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables, at the same time.
Use an Apache Iceberg data lakehouse as a data source in Trino by configuring a catalog with the Iceberg connector.
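A sketch of an Iceberg catalog; iceberg.catalog.type selects the metastore, with hive_metastore shown here and a REST catalog variant shown in the Apache Polaris and Lakekeeper entries below:

```properties
# etc/catalog/iceberg.properties
connector.name=iceberg
# Other supported catalog types include glue, jdbc, nessie, and rest
iceberg.catalog.type=hive_metastore
hive.metastore.uri=thrift://metastore.example.com:9083
```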
Integration developed and maintained by the Trino community
Apache ORC is a self-describing type-aware columnar file format designed for Hadoop workloads. It is optimized for large streaming reads, but with integrated support for finding required rows quickly. Storing data in a columnar format lets the reader read, decompress, and process only the values that are required for the current query.
Access ORC files from Trino with the built-in readers and writers used by the Hive or Iceberg connectors.
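For example, in a catalog using the Hive connector, the file format is chosen per table; the catalog, schema, table, and column names here are illustrative:

```sql
-- Store the table as ORC files in a Hive catalog named "example"
CREATE TABLE example.default.events (
    id BIGINT,
    name VARCHAR
)
WITH (format = 'ORC');
```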
Integration developed and maintained by the Trino community
Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides high performance compression and encoding schemes to handle complex data in bulk and is supported in many programming languages and analytics tools.
Access Parquet files from Trino with the built-in readers and writers used by the Delta Lake, Hive, Hudi, or Iceberg connectors.
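Analogously to the ORC example above, an Iceberg table can be declared with Parquet as its file format; the names are again placeholders:

```sql
-- Store the table as Parquet files in an Iceberg catalog named "lake"
CREATE TABLE lake.analytics.clicks (
    user_id BIGINT,
    url VARCHAR
)
WITH (format = 'PARQUET');
```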
Integration developed and maintained by the Trino community
Apache Polaris is an open-source, fully-featured catalog for Apache Iceberg™. It implements Iceberg’s REST API, enabling seamless multi-engine interoperability across a wide range of platforms.
Use Apache Polaris with the Trino Iceberg connector and the configuration for an Iceberg REST catalog.
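A hedged sketch of such a catalog; the URI path and warehouse name depend on your Polaris deployment and are placeholders here:

```properties
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=https://polaris.example.com/api/catalog
iceberg.rest-catalog.warehouse=my_warehouse
```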
Integration developed and maintained by the Trino community
Azure Storage is a scalable, durable, and secure cloud storage solution from Microsoft. It offers a variety of storage options, including Blob Storage, which is ideal for data lakes and lakehouses. Azure Storage provides high availability, strong consistency, and disaster recovery capabilities, making it suitable for storing and managing large volumes of unstructured data. With Azure Storage, you can easily integrate with other Azure services and third-party tools to build comprehensive data solutions.
Use the Trino Azure Storage file system support with the Delta Lake, Hive, Hudi, or Iceberg connectors to access data in Azure Storage.
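A sketch of the native Azure Storage file system properties in a Delta Lake catalog, assuming access-key authentication; the key value and metastore host are placeholders:

```properties
connector.name=delta_lake
hive.metastore.uri=thrift://metastore.example.com:9083
fs.native-azure.enabled=true
azure.auth-type=ACCESS_KEY
azure.access-key=...
```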
Integration developed and maintained by the Trino community
Delta Lake is an open source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, and with APIs for Scala, Java, Rust, Ruby, and Python.
Use a Delta Lake data lakehouse as a data source in Trino by configuring a catalog with the Delta Lake connector.
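A minimal sketch; like the Hive and Hudi connectors, the Delta Lake connector needs a metastore, here a Hive metastore at a placeholder host:

```properties
# etc/catalog/delta.properties
connector.name=delta_lake
hive.metastore.uri=thrift://metastore.example.com:9083
```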
Integration developed and maintained by the Trino community
Google Cloud Storage is a scalable, secure, and durable object storage service provided by Google Cloud. It offers high availability and performance, making it suitable for a wide range of use cases, including data lakes and lakehouses. Google Cloud Storage integrates seamlessly with other Google Cloud services and third-party tools, providing a comprehensive solution for storing, managing, and analyzing large volumes of unstructured data.
Use the Trino Google Cloud Storage file system support with the Delta Lake, Hive, Hudi, or Iceberg connectors to access data in Google Cloud Storage.
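A hedged sketch assuming service-account key authentication; the project ID and key file path are placeholders:

```properties
fs.native-gcs.enabled=true
gcs.project-id=my-project
gcs.json-key-file-path=/secrets/gcs-key.json
```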
Integration developed and maintained by the Trino community
MinIO is a high-performance, distributed object storage system designed for data lake and lakehouse use cases. It is compatible with the Amazon S3 API, making it easy to integrate with existing applications and tools that support S3. MinIO is optimized for large-scale data storage and retrieval, providing high availability, durability, and performance.
Use the Trino S3 file system support with the Delta Lake, Hive, Hudi, or Iceberg connectors to access data in MinIO.
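Because MinIO speaks the S3 API, the native S3 file system only needs its endpoint pointed at the MinIO deployment; the host, region, and credentials below are placeholders, and path-style access is commonly required for MinIO:

```properties
fs.native-s3.enabled=true
s3.endpoint=http://minio.example.com:9000
s3.region=us-east-1
s3.path-style-access=true
s3.aws-access-key=minio-access-key
s3.aws-secret-key=minio-secret-key
```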
Integration developed and maintained by the Trino community
Unity Catalog is a unified governance solution for data and AI assets in the cloud. It provides a centralized metadata store, fine-grained access controls, and audit logging to help manage and secure your data. Unity Catalog simplifies data discovery, access, and sharing across your organization, ensuring compliance and enhancing collaboration.
Use Unity Catalog with the Trino Delta Lake, Hive, or Iceberg connectors and the configuration for a Thrift metastore.
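A hedged sketch for the Delta Lake connector, assuming the Unity Catalog deployment exposes a Hive Thrift metastore compatible endpoint; the URI is a placeholder and the exact endpoint depends on your Unity Catalog setup:

```properties
connector.name=delta_lake
# Placeholder for the Thrift-compatible metastore endpoint of Unity Catalog
hive.metastore.uri=thrift://unity.example.com:9083
```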
Support for the following data lake or lakehouse components is developed and maintained by other communities and vendors.
Backblaze B2 Cloud Storage is a scalable and cost-effective cloud storage solution designed for storing and managing large amounts of data. It offers high durability, availability, and performance, making it suitable for a wide range of use cases, including backups, data archiving, and serving media files. Backblaze B2 provides easy integration with various tools and services, allowing seamless data access and management.
Use the Trino S3 file system support with the Delta Lake, Hive, Hudi, or Iceberg connectors to access data in Backblaze B2.
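The configuration mirrors the MinIO example above, with the endpoint and region set to the values of your B2 bucket; the region shown is a placeholder:

```properties
fs.native-s3.enabled=true
s3.endpoint=https://s3.us-west-004.backblazeb2.com
s3.region=us-west-004
s3.aws-access-key=...
s3.aws-secret-key=...
```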
Gravitino is a high-performance, geo-distributed, and federated metadata lake. It manages the metadata directly in different sources, types, and regions. It also provides users with unified metadata access for data and AI assets, and is available as an open source project.
Use Gravitino as a data source in Trino by configuring a catalog with the Gravitino connector.
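A hedged sketch based on the Gravitino project's connector documentation; the connector name, property names, and values below are assumptions to verify against the Gravitino release you deploy:

```properties
# Property names per the Gravitino Trino connector docs; verify before use
connector.name=gravitino
gravitino.uri=http://gravitino.example.com:8090
gravitino.metalake=example_metalake
```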
Lakekeeper is an Apache-Licensed, secure, fast, and easy-to-use Apache Iceberg REST Catalog written in Rust. It provides a scalable solution for managing Lakehouses with enterprise-grade governance. Key features include Kubernetes and OpenID integration, fine-grained access control, and multi-tenancy. Designed for interoperability, it supports major cloud providers as well as on-premise deployments. Lakekeeper emits change events, ensures high availability, and offers a lightweight, easily deployable architecture. Prioritizing ecosystem-wide compatibility, it empowers organizations to manage their Lakehouse infrastructure without vendor lock-in.
Use Lakekeeper with the Trino Iceberg connector and the configuration for an Iceberg REST catalog.
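A hedged sketch combining the REST catalog configuration with OAuth2 client credentials, matching Lakekeeper's OpenID integration; the URI, warehouse, and credential values are placeholders:

```properties
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=https://lakekeeper.example.com/catalog
iceberg.rest-catalog.warehouse=my_warehouse
iceberg.rest-catalog.security=OAUTH2
iceberg.rest-catalog.oauth2.credential=client-id:client-secret
```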