Data sources

Data sources provide the data for Trino to query. To access a data source, configure a catalog with the Trino connector for that data source. You can then use any supported client to query the data source using SQL and the features of your client.
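
As a rough illustration of what configuring a catalog involves, the following Python sketch renders the text of a catalog properties file, which Trino reads from its `etc/catalog` directory. The helper `render_catalog_properties` is hypothetical and not part of Trino. The PostgreSQL host, database name, and credentials are placeholders chosen for this example only.

```python
def render_catalog_properties(connector_name: str, properties: dict[str, str]) -> str:
    """Return the text of a Trino catalog properties file.

    Hypothetical helper for illustration; Trino itself only needs the
    resulting key=value file, for example etc/catalog/example.properties.
    """
    # Every catalog file names the connector it uses.
    lines = [f"connector.name={connector_name}"]
    # Remaining properties are connector-specific key=value pairs.
    lines += [f"{key}={value}" for key, value in properties.items()]
    return "\n".join(lines) + "\n"


# Placeholder values: replace host, database, user, and password with your own.
example_text = render_catalog_properties(
    "postgresql",
    {
        "connection-url": "jdbc:postgresql://example.net:5432/exampledb",
        "connection-user": "example_user",
        "connection-password": "example_password",
    },
)
print(example_text)
```

Saving this text as `example.properties` in the catalog directory would make the data source available under the catalog name `example`, since the file name determines the catalog name.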

Amazon Kinesis #

Amazon Kinesis cost-effectively processes and analyzes streaming data at any scale as a fully managed service. With Kinesis, you can ingest real-time data, such as video, audio, application logs, website clickstreams, and IoT telemetry data, for machine learning (ML), analytics, and other applications.

Use an Amazon Kinesis stream as a data source in Trino by configuring a catalog with the Kinesis connector.

Amazon Redshift #

Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and machine learning to deliver the best price performance at any scale.

Use an Amazon Redshift data warehouse as a data source in Trino by configuring a catalog with the Redshift connector.

Apache Accumulo #

Apache Accumulo® is a sorted, distributed key-value store that provides robust, scalable data storage and retrieval.

Use an Apache Accumulo key-value store as a data source in Trino by configuring a catalog with the Accumulo connector.

Apache Cassandra #

Apache Cassandra is an open source NoSQL distributed database trusted by thousands of companies for scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data.

Use an Apache Cassandra database as a data source in Trino by configuring a catalog with the Cassandra connector.

Apache Druid #

Druid is a high performance, in-memory, real-time analytics database that delivers sub-second queries on streaming and batch data at scale and under load.

Use an Apache Druid database as a data source in Trino by configuring a catalog with the Druid connector.

Apache Hive #

Apache Hive is a distributed, fault-tolerant data warehouse system that enables analytics at a massive scale and facilitates reading, writing, and managing petabytes of data residing in distributed storage using SQL.

Use an Apache Hive data warehouse as a data source in Trino by configuring a catalog with the Hive connector.

Apache Hudi #

Apache Hudi is a transactional data lake platform that brings database and data warehouse capabilities to the data lake. Hudi reimagines slow old-school batch data processing with a powerful new incremental processing framework for low latency minute-level analytics.

Use an Apache Hudi data lake as a data source in Trino by configuring a catalog with the Hudi connector.

Apache Iceberg #

Apache Iceberg is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables, at the same time.

Use an Apache Iceberg data lakehouse as a data source in Trino by configuring a catalog with the Iceberg connector.

Apache Ignite #

Apache Ignite is a distributed in‑memory database for high‑performance applications. It scales across memory, disk, and multiple machines without compromise.

Use an Apache Ignite database as a data source in Trino by configuring a catalog with the Ignite connector.

Apache Kafka #

Apache Kafka is an open source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

Use an Apache Kafka event stream as a data source in Trino by configuring a catalog with the Kafka connector.

Apache Kudu #

Apache Kudu is an open source distributed data storage engine that makes fast analytics on fast and changing data easy.

Use an Apache Kudu data store as a data source in Trino by configuring a catalog with the Kudu connector.

Apache Phoenix #

Apache Phoenix enables OLTP and operational analytics in Hadoop for low latency applications by combining the best of both worlds:

The power of standard SQL and JDBC APIs with full ACID transaction capabilities, and
The flexibility of late-bound, schema-on-read capabilities from the NoSQL world by leveraging HBase as its backing store.

Use an Apache Phoenix key-value store as a data source in Trino by configuring a catalog with the Phoenix connector.

Apache Pinot #

Apache Pinot is a real-time distributed OLAP datastore, designed to answer OLAP queries with low latency.

Use an Apache Pinot datastore as a data source in Trino by configuring a catalog with the Pinot connector.

ClickHouse #

ClickHouse is the fastest and most resource efficient open source real-time database for applications and analytics.

Use a ClickHouse database as a data source in Trino by configuring a catalog with the ClickHouse connector.

Delta Lake #

Delta Lake is an open source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, and APIs for Scala, Java, Rust, Ruby, and Python.

Use a Delta Lake data lakehouse as a data source in Trino by configuring a catalog with the Delta Lake connector.

Elasticsearch #

Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data for lightning fast search, fine‑tuned relevancy, and powerful analytics that scale with ease.

Use an Elasticsearch index as a data source in Trino by configuring a catalog with the Elasticsearch connector.

Git #

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

Use a Git repository as a data source in Trino by configuring a catalog with the Git connector.

Google BigQuery #

BigQuery is a serverless and cost-effective enterprise data warehouse that works across clouds and scales with your data. Use built-in ML/AI and BI for insights at scale.

Use a Google BigQuery data warehouse as a data source in Trino by configuring a catalog with the BigQuery connector.

Google Sheets #

Google Sheets enables you to create and collaborate on online spreadsheets in real-time and from any device.

Use a Google Sheets spreadsheet as a data source in Trino by configuring a catalog with the Google Sheets connector.

Gravitino #

Gravitino is a high-performance, geo-distributed, and federated metadata lake. It manages the metadata directly in different sources, types, and regions. It also provides users with unified metadata access for data and AI assets, and is available as an open source project.

Use Gravitino as a data source in Trino by configuring a catalog with the Gravitino connector.

MariaDB #

MariaDB Server is one of the most popular open source relational databases. It’s made by the original developers of MySQL and guaranteed to stay open source. It is part of most cloud offerings and the default in most Linux distributions.

Use a MariaDB database as a data source in Trino by configuring a catalog with the MariaDB connector.

Microsoft SQL Server #

Microsoft SQL Server is a proprietary relational database management system developed by Microsoft. Microsoft provides different editions of Microsoft SQL Server, aimed at different audiences and for workloads ranging from small single-machine applications to large Internet-facing applications with many concurrent users.

Use a Microsoft SQL Server database as a data source in Trino by configuring a catalog with the SQL Server connector.

MongoDB #

MongoDB is a source-available cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas.

Use a MongoDB database as a data source in Trino by configuring a catalog with the MongoDB connector.

MySQL #

MySQL is the world’s most popular open source relational database management system (RDBMS).

Use a MySQL database as a data source in Trino by configuring a catalog with the MySQL connector.

OpenAPI #

OpenAPI is a specification language for REST APIs that provides a standardized means to define your API.

Use any REST API that publishes an OpenAPI specification as a data source in Trino by configuring a catalog with the OpenAPI connector, and avoid having to generate a client.

OpenSearch #

OpenSearch is a scalable, flexible, and extensible open-source software suite for search, analytics, and observability applications. OpenSearch offers a vendor-agnostic toolset you can use to build secure, high-performance, cost-efficient applications. OpenSearch includes a data store and search engine, a visualization and user interface, and a library of plugins you can use to tailor your tools to your requirements.

Use an OpenSearch index as a data source in Trino by configuring a catalog with the Elasticsearch connector.

Oracle #

Oracle database services and products offer customers cost-optimized and high-performance versions of Oracle Database, the world’s leading converged, multi-model database management system.

Use an Oracle database as a data source in Trino by configuring a catalog with the Oracle connector.

PostgreSQL #

PostgreSQL is the world’s most advanced open source relational database. PostgreSQL is a powerful system with over 35 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance.

Use a PostgreSQL database as a data source in Trino by configuring a catalog with the PostgreSQL connector.

Prometheus #

Prometheus is an open source systems monitoring and alerting toolkit with a very active developer and user community. Prometheus collects and stores its metrics as time series data, i.e. metrics information is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels.

Use a Prometheus database as a data source in Trino by configuring a catalog with the Prometheus connector.

Trino also supports observability with OpenTelemetry, and therefore Prometheus.

Redis #

Redis is an open source, in-memory data store used by millions of developers as a database, cache, streaming engine, and message broker.

Use a Redis data store as a data source in Trino by configuring a catalog with the Redis connector.

SingleStore #

SingleStoreDB is a unified data engine for transactional and analytical workloads, used to power fast, real-time analytics and applications.

Use a SingleStore database as a data source in Trino by configuring a catalog with the SingleStore connector.

Snowflake #

Snowflake is a Data Cloud platform provider. Snowflake easily enables governed access to near-infinite amounts of data, cutting-edge tools, applications, and services. With the Data Cloud, you can collaborate locally and globally to reveal new insights, create previously unforeseen business opportunities, and identify and know your customers in the moment with seamless and relevant experiences.

Use a Snowflake data cloud as a data source in Trino by configuring a catalog with the Snowflake connector.

TPC #

TPC is a non-profit corporation focused on developing data-centric benchmark standards and disseminating objective, verifiable data to the industry.

The Trino TPC-H and TPC-DS connectors are data generators that provide the benchmark data sets for direct querying or copying into other data sources for testing and benchmarking.
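
Copying generated benchmark data into another data source can be sketched as a `CREATE TABLE ... AS SELECT` statement. The Python snippet below only builds the SQL text. It assumes a catalog named `tpch` configured with the TPC-H connector, and a writable target catalog and schema named `example.benchmark`, which is a placeholder. The helpers `qualified_name` and `copy_statement` are hypothetical and written for this illustration.

```python
import re

# Accept only simple lowercase identifiers so the generated SQL needs no quoting.
_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")


def qualified_name(*parts: str) -> str:
    """Join catalog, schema, and table into a dotted name after validating each part."""
    for part in parts:
        if not _IDENTIFIER.match(part):
            raise ValueError(f"unsupported identifier: {part!r}")
    return ".".join(parts)


def copy_statement(source: tuple[str, str, str], target: tuple[str, str, str]) -> str:
    """Build a CREATE TABLE AS SELECT statement that copies source into target."""
    return (
        f"CREATE TABLE {qualified_name(*target)} "
        f"AS SELECT * FROM {qualified_name(*source)}"
    )


# Assumed names: 'tpch' catalog with the 'tiny' schema; 'example.benchmark' is a placeholder target.
statement = copy_statement(("tpch", "tiny", "nation"), ("example", "benchmark", "nation"))
print(statement)
```

The resulting statement can be run with any supported Trino client, provided the target connector supports creating tables.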

VAST #

VAST is a data platform that includes storage and database services.

Use a VAST data store as a data source in Trino by configuring a catalog with the VAST Trino connector.