Trino is a highly parallel, distributed query engine built from the ground up for efficient, low-latency analytics.
The largest organizations in the world use Trino to query exabyte-scale data lakes and massive data warehouses alike.
Trino is an ANSI SQL-compliant query engine that works with BI tools such as R, Tableau, Power BI, Superset, and many others.
Trino supports diverse use cases: ad hoc analytics at interactive speeds, massive multi-hour batch queries, and high-volume apps that perform sub-second queries.
You can natively query data in Hadoop, S3, Cassandra, MySQL, and many others, without the need for complex, slow, and error-prone processes for copying the data.
Access data from multiple systems within a single query. For example, join historical log data stored in S3 object storage with customer data stored in a MySQL relational database.
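As a rough illustration of the join described above, the following sketch builds a single SQL statement that reads from two systems at once. It relies on Trino's `catalog.schema.table` naming. The catalog names (`hive`, `mysql`), the schemas, the tables, and the columns are all hypothetical, and the helper functions are not part of any Trino API; a real deployment would use whatever catalogs its administrators have configured.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified table name in catalog.schema.table form."""
    return f"{catalog}.{schema}.{table}"


def build_federated_query(logs_table: str, customers_table: str) -> str:
    """Build one SQL statement that joins tables living in two catalogs."""
    return (
        "SELECT c.customer_id, c.name, count(*) AS request_count\n"
        f"FROM {logs_table} AS l\n"
        f"JOIN {customers_table} AS c ON l.customer_id = c.customer_id\n"
        "GROUP BY c.customer_id, c.name\n"
        "ORDER BY request_count DESC\n"
        "LIMIT 10"
    )


# Hypothetical names: a "hive" catalog backed by S3 object storage and a
# "mysql" catalog backed by a MySQL database. Schemas, tables, and columns
# are invented for this example.
logs = qualified("hive", "web", "request_logs")
customers = qualified("mysql", "crm", "customers")

query = build_federated_query(logs, customers)
print(query)
```

The resulting string can be submitted through any Trino client. The point of the sketch is that both tables appear in one ordinary `SELECT`, with no copy step in between.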
Trino is optimized for both on-premises and cloud environments, including Amazon, Azure, Google Cloud, and others.
Trino is used for critical business operations, including financial results for public markets, by some of the largest organizations in the world.
The Trino project is a community-driven project under the non-profit Trino Software Foundation.
A primary driver for Trino usage is interactive analytics. A user submits a query, either written directly in SQL or generated through a user interface, and waits for the results to come back as quickly as possible. Trino returns results to the user as soon as they are available. This gives data analysts and data scientists the ability to query large amounts of data, test hypotheses, run A/B tests, and build visualizations or dashboards.
The original use case for the development of Trino was enabling SQL-based analytics on HDFS/Hive object storage systems. Trino is so performant that it enables analytics that used to be impossible or took hours. Migrating from Hive-based systems and querying cloud object storage systems is still a major use case for Trino.
The ability to query many disparate data sources in the same system with the same SQL greatly simplifies analytics that require understanding the big picture of all your data. Federated queries in Trino can access your object storage, your main relational databases, and your new streaming or NoSQL system, all in the same query. Trino completely changes what is possible in this central data consumption layer.
Large Extract, Transform, Load (ETL) processes running in batches are generally very resource intensive. They are routinely run by engineers, and how quickly they return is a low priority as long as they eventually finish. Trino can tremendously speed up ETL processes, lets them all use standard SQL statements, and works with numerous data sources and targets, all in the same system.
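To make the idea of ETL in standard SQL concrete, here is a minimal sketch that builds a `CREATE TABLE ... AS SELECT` statement reading from one catalog and writing an aggregated table into another. The table names, column names, and the `build_etl_statement` helper are assumptions made up for illustration. Whether a given target catalog accepts writes depends on its connector.

```python
def build_etl_statement(source: str, target: str) -> str:
    """Build a CREATE TABLE AS statement that aggregates source rows into target."""
    return (
        f"CREATE TABLE {target} AS\n"
        "SELECT customer_id, date(event_time) AS event_date, count(*) AS events\n"
        f"FROM {source}\n"
        "GROUP BY customer_id, date(event_time)"
    )


# Hypothetical source and target tables in two different catalogs.
statement = build_etl_statement(
    source="hive.web.request_logs",
    target="mysql.reporting.daily_customer_events",
)
print(statement)
```

Extraction, transformation, and load all happen in one statement here, which is why a batch job written this way needs no separate export or import tooling.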
Get a digital copy of the definitive guide about the Trino distributed query engine. Useful for beginners and existing users.
For technical background, read our paper: Presto: SQL on Everything
The community is very active and helpful on Slack, with users and developers from all around the world. If you need help using or running Trino, this is the place to ask.
Curious to learn new insights into the community behind this incredible query engine? Subscribe to our blog where the project maintainers, contributors, and users share updates, stories, knowledge, and lessons learned.