<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator>
  <link href="https://trino.io/feed.xml" rel="self" type="application/atom+xml" />
  <link href="https://trino.io/" rel="alternate" type="text/html" />
  <updated>2026-04-11T18:56:10+00:00</updated>
  <id>https://trino.io/feed.xml</id>

  <title>Trino RSS Feed</title>
  <description>This feed combines blog posts and Trino Community Broadcast episodes in one chronological feed.</description>

  <subtitle>Trino is a high-performance, distributed SQL query engine for big data.</subtitle>
    <entry>
      <title>Introducing the NUMBER data type</title>
      <link href="https://trino.io/blog/2026/03/25/number-data-type.html" rel="alternate" type="text/html" title="Introducing the NUMBER data type" />
      <published>2026-03-25T00:00:00+00:00</published>
      <updated>2026-03-25T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2026/03/25/number-data-type</id>
      <content type="html" xml:base="https://trino.io/blog/2026/03/25/number-data-type.html">&lt;p&gt;One of Trino’s core strengths is breaking down data silos—enabling data
engineers to query diverse data sources through a single SQL interface. However,
when those sources use high-precision numeric types beyond Trino’s 38-digit
DECIMAL limit, that promise breaks down. Users faced an impossible choice: skip
the columns entirely and lose access to critical data, or accept lossy rounding
that compromises data integrity.&lt;/p&gt;

&lt;p&gt;This challenge required a new approach: a dedicated data type for high-precision,
variable-scale decimals.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Adding a new built-in data type to Trino is exceptionally rare. The last time we
introduced a new type was the UUID type in May 2019—nearly seven years ago.
Types are fundamental building blocks that touch many parts of the system, from
the type registry, through coercion rules to connectors, functions, and the protocol.
They require careful design and long-term commitment.&lt;/p&gt;

&lt;p&gt;With Trino 480, we’re excited to introduce the NUMBER type—a high-precision
decimal type that breaks down these data silos and enables seamless access to
numeric data across diverse database systems. This addition is particularly
powerful for data engineers working with Oracle, PostgreSQL, MySQL, MariaDB, and
SingleStore, which support numeric precision beyond the traditional 38-digit
DECIMAL limit.&lt;/p&gt;

&lt;p&gt;Let’s explore why NUMBER matters, how it works, and how it will simplify your
data integration workflows.&lt;/p&gt;

&lt;h2 id=&quot;the-challenge-precision-beyond-38-digits&quot;&gt;The challenge: precision beyond 38 digits&lt;/h2&gt;

&lt;p&gt;Trino’s DECIMAL type has long supported exact numeric values with precision up
to 38 decimal digits, which covers the vast majority of use cases. However,
many database systems support higher precision:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Oracle NUMBER&lt;/strong&gt;: when declared as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER(p, s)&lt;/code&gt;, precision must be in [1, 38] and
scale in [-84, 127]. When declared as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER&lt;/code&gt; without precision/scale, each value
can have different scale, and actual precision can reach 40 decimal digits. Oracle can
store values from 10^-130 to (but not including) 10^126.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;PostgreSQL NUMERIC&lt;/strong&gt;: when declared as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMERIC(p, s)&lt;/code&gt;, precision can be
declared up to 1000 and scale can range from -1000 to 1000. When declared without
precision/scale constraints, each value can have a different scale, and values can have
up to 131,072 digits before the decimal point.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;MySQL, MariaDB, SingleStore DECIMAL&lt;/strong&gt;: up to 65 digits of precision (scale 0-30)&lt;/li&gt;
&lt;/ul&gt;
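
&lt;p&gt;To see why this matters, recall that Trino rejects any DECIMAL beyond the
38-digit cap outright, so a 39-digit value cannot even be named as a cast target.
This illustrative query fails before touching any data:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Fails: Trino caps DECIMAL precision at 38 digits
SELECT CAST(&apos;123456789012345678901234567890123456789&apos; AS DECIMAL(39, 0));
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;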

&lt;p&gt;Before Trino 480, accessing these high-precision numeric columns required
choosing between two unsatisfying options:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Skip the columns entirely&lt;/strong&gt; and lose access to potentially critical data.
This was the default behavior.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Accept lossy conversions&lt;/strong&gt;: use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal-mapping=ALLOW_OVERFLOW&lt;/code&gt; with
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal-default-scale=S&lt;/code&gt; to force values into &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL(38, S)&lt;/code&gt;, losing precision
through rounding and failing for numbers greater than or equal to 10^(38-S).
For example, with scale 10, values ≥ 10^28 would fail.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Neither option is ideal for data federation and warehousing scenarios where
preserving data fidelity is essential.&lt;/p&gt;

&lt;h2 id=&quot;enter-number-arbitrary-precision-decimals-in-trino&quot;&gt;Enter NUMBER: arbitrary-precision decimals in Trino&lt;/h2&gt;

&lt;p&gt;The NUMBER type solves this problem by supporting floating-point decimal numbers
of high precision and flexible scale. In practice, NUMBER supports values with
up to 200 digits of precision – far exceeding what most database workloads require.
Each value can have a different scale, allowing for values as small as 10^-16000
(or even smaller) and as large as 10^16000 (or even larger) within the same column.&lt;/p&gt;

&lt;p&gt;Here’s what NUMBER looks like in action:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- High-precision literal (50+ digits)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;3.1415926535897932384626433832795028841971693993751&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; 3.1415926535897932384626433832795028841971693993751
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Scientific notation with extreme precision&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;12345678901234567890123456789012345678901234567890e30&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; 1.234567890123456789012345678901234567890123456789E+79
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Verify the type&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;typeof&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;123.456&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; number
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;special-values&quot;&gt;Special values&lt;/h3&gt;

&lt;p&gt;NUMBER also supports special values similar to IEEE 754 floating-point types:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;Infinity&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;positive_infinity&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;-Infinity&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;negative_infinity&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;NaN&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;not_a_number&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; positive_infinity | negative_infinity | not_a_number
-------------------+-------------------+--------------
 +Infinity         | -Infinity         | NaN
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;These special values follow comparison and ordering semantics consistent with
DOUBLE behavior. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt; compares as unequal to all values, including
itself: any comparison with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt; returns false. When sorting, values are
ordered as follows: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-Infinity&lt;/code&gt;, all finite values, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+Infinity&lt;/code&gt;, followed by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt;.&lt;/p&gt;
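
&lt;p&gt;These rules are easy to confirm directly. The queries below are illustrative,
with results following the comparison and ordering semantics described above:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- NaN is unequal to everything, including itself
SELECT NUMBER &apos;NaN&apos; = NUMBER &apos;NaN&apos;;      -- false
SELECT NUMBER &apos;NaN&apos; &lt; NUMBER &apos;Infinity&apos;; -- false

-- When sorting, NaN comes after +Infinity
SELECT x
FROM (VALUES NUMBER &apos;NaN&apos;, NUMBER &apos;1&apos;, NUMBER &apos;-Infinity&apos;, NUMBER &apos;Infinity&apos;) t(x)
ORDER BY x;
-- yields -Infinity, 1, +Infinity, NaN
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;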

&lt;p&gt;The special values are particularly useful for handling edge cases in source data.
In particular, PostgreSQL’s NUMERIC type can represent &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Infinity&lt;/code&gt;, and
these values are now seamlessly mapped to NUMBER when queried through the PostgreSQL
connector.&lt;/p&gt;

&lt;h2 id=&quot;seamless-connector-integration&quot;&gt;Seamless connector integration&lt;/h2&gt;

&lt;p&gt;The real power of NUMBER becomes apparent when querying external databases. Five
connectors now automatically map high-precision numeric types to NUMBER,
requiring &lt;strong&gt;no configuration changes&lt;/strong&gt;:&lt;/p&gt;

&lt;h3 id=&quot;oracle-connector&quot;&gt;Oracle connector&lt;/h3&gt;

&lt;p&gt;Oracle’s NUMBER type supports variable precision and scale. The Oracle connector
now maps:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER(p, s)&lt;/code&gt; where p &amp;gt; 38 → Trino &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER&lt;/code&gt; without precision/scale → Trino &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER&lt;/code&gt; with extreme scale values → Trino &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Query an Oracle table with high-precision columns&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;order_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;unit_price&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;extended_price&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oracle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sales&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;extended_price&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;1000000000000000000000000&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;postgresql-connector&quot;&gt;PostgreSQL connector&lt;/h3&gt;

&lt;p&gt;PostgreSQL’s NUMERIC type supports very high precision and even “unconstrained”
precision. The connector automatically handles:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMERIC(p, s)&lt;/code&gt; where p &amp;gt; 38 → Trino &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMERIC&lt;/code&gt; without precision/scale → Trino &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Access PostgreSQL scientific data without precision loss&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;measurement_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;precise_value&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- a NUMERIC column&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgresql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lab&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;measurements&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;mysql-mariadb-and-singlestore-connectors&quot;&gt;MySQL, MariaDB, and SingleStore connectors&lt;/h3&gt;

&lt;p&gt;These MySQL-compatible databases support DECIMAL precision up to 65 digits. The
connectors now map:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL(p, s)&lt;/code&gt; where p &amp;gt; 38 → Trino &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NUMBER&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Join across different databases with high precision&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;account_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;balance&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mysql_balance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;balance&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oracle_balance&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mysql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;banking&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;accounts&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oracle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;banking&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;accounts&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;account_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;account_id&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;abs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;balance&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;balance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;0.01&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;backwards-compatibility-and-migration&quot;&gt;Backwards compatibility and migration&lt;/h2&gt;

&lt;p&gt;The NUMBER type integration is designed to be seamless and backward compatible:&lt;/p&gt;

&lt;h3 id=&quot;automatic-mapping&quot;&gt;Automatic mapping&lt;/h3&gt;

&lt;p&gt;If you previously relied on the default behavior (no &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal-mapping&lt;/code&gt;
configuration), your queries now automatically use NUMBER for high-precision
columns. No configuration changes needed.&lt;/p&gt;

&lt;h3 id=&quot;legacy-configurations-still-work&quot;&gt;Legacy configurations still work&lt;/h3&gt;

&lt;p&gt;If you explicitly configured &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal-mapping=ALLOW_OVERFLOW&lt;/code&gt; or
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal-mapping=STRICT&lt;/code&gt;, your existing configuration continues to work. The
NUMBER mapping is disabled when these options are set, ensuring no surprises.&lt;/p&gt;

&lt;p&gt;However, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal-mapping&lt;/code&gt; configuration and related session properties
(&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal_mapping&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal_default_scale&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal_rounding_mode&lt;/code&gt;) are now
&lt;strong&gt;deprecated&lt;/strong&gt; and will be removed in a future Trino release. We recommend
migrating to NUMBER-based workflows:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before (with lossy conversion):&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;language-properties highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# catalog/postgresql.properties
&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;connection-url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;jdbc:postgresql://host:5432/database&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;connection-user&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;user&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;connection-password&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;password&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;decimal-mapping&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;ALLOW_OVERFLOW&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;decimal-default-scale&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;10&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;decimal-rounding-mode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;HALF_UP&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;After (lossless with NUMBER):&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;language-properties highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# catalog/postgresql.properties
&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;connection-url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;jdbc:postgresql://host:5432/database&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;connection-user&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;user&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;connection-password&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;password&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# No decimal-mapping needed - NUMBER is used automatically!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For Oracle, if you previously used &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;oracle.number.rounding-mode&lt;/code&gt; to handle
high-precision NUMBER columns, you can now remove this configuration to enable
native NUMBER mapping.&lt;/p&gt;
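
&lt;p&gt;In practice, the Oracle migration is a one-line removal from the catalog file.
The snippet below uses hypothetical host and service names:&lt;/p&gt;

&lt;div class=&quot;language-properties highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# catalog/oracle.properties
connection-url=jdbc:oracle:thin:@example-host:1521/orcl
connection-user=user
connection-password=password
# Remove the deprecated rounding mode to enable native NUMBER mapping:
# oracle.number.rounding-mode=HALF_UP
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;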

&lt;h2 id=&quot;working-with-number&quot;&gt;Working with NUMBER&lt;/h2&gt;

&lt;h3 id=&quot;type-conversions&quot;&gt;Type conversions&lt;/h3&gt;

&lt;p&gt;NUMBER integrates naturally with Trino’s type system:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Convert from other numeric types&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;DECIMAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;123.45&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;from_decimal&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;12345&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;from_integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;123&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;45&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;e0&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;from_double&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; from_decimal | from_integer | from_double
--------------+--------------+-------------
 123.45       | 12345        | 123.45
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Convert NUMBER to other types&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;123.456&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to_bigint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;123.456&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;DOUBLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to_double&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;123.456&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;DECIMAL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to_decimal&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; to_bigint | to_double | to_decimal
-----------+-----------+------------
 123       | 123.456   | 123.46
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
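
&lt;p&gt;Casts from NUMBER back to fixed-precision types are bounded by the target type’s
limits. Consistent with the overflow behavior described earlier, a value that cannot
fit is rejected rather than silently truncated (illustrative example):&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Fails: 51 integer digits cannot fit into 38 digits of precision
SELECT CAST(NUMBER &apos;1e50&apos; AS DECIMAL(38, 0));
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;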

&lt;h3 id=&quot;aggregate-functions&quot;&gt;Aggregate functions&lt;/h3&gt;

&lt;p&gt;Common aggregate functions work naturally with NUMBER:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Aggregate high-precision values&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;department&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;revenue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;total_revenue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;avg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;revenue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;average_revenue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;min&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;revenue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;min_revenue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;max&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;revenue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;max_revenue&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oracle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sales&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;transactions&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;department&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;creating-tables-with-number-columns&quot;&gt;Creating tables with NUMBER columns&lt;/h3&gt;

&lt;p&gt;The Oracle and PostgreSQL connectors support creating tables with NUMBER columns:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Create a PostgreSQL table with NUMBER column&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgresql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;measurements&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;precise_value&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- Create an Oracle table with NUMBER column&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oracle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;scientific_data&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;experiment_id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;measurement&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NUMBER&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;technical-characteristics-and-limitations&quot;&gt;Technical characteristics and limitations&lt;/h2&gt;

&lt;p&gt;While NUMBER provides high precision, it’s important to understand its
characteristics:&lt;/p&gt;

&lt;h3 id=&quot;precision-and-scale&quot;&gt;Precision and scale&lt;/h3&gt;

&lt;p&gt;Trino’s NUMBER type characteristics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Supported precision&lt;/strong&gt;: currently 200 decimal digits.
The maximum precision is an implementation detail that may change in future
releases, though it is unlikely to decrease.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Scale range&lt;/strong&gt;: -16,384 to 16,383&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Variable scale&lt;/strong&gt;: each value can have a different scale, similar to
PostgreSQL NUMERIC and Oracle NUMBER&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Special values&lt;/strong&gt;: supports &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Infinity&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-Infinity&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
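&lt;p&gt;To build intuition for variable scale and special values, the following sketch
uses Python’s decimal module, which also implements arbitrary-precision,
variable-scale decimals with NaN and Infinity. This is only an analogy for the
semantics, not Trino’s implementation:&lt;/p&gt;

```python
from decimal import Decimal, getcontext

# Analogy only: Python's decimal module models variable-scale decimal
# semantics similar to Trino's NUMBER; it is not Trino's implementation.
getcontext().prec = 200  # mirror NUMBER's current 200-digit precision

# Each value carries its own scale, unlike a fixed DECIMAL(p, s) column.
a = Decimal("1.5")    # scale 1
b = Decimal("2.25")   # scale 2
print(a + b)          # 3.75

# Special values are first-class, as with NUMBER.
print(Decimal("NaN").is_nan())  # True
print(Decimal("-Infinity"))     # -Infinity
```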

&lt;p&gt;Comparison of decimal numeric types across database systems:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Database&lt;/th&gt;
      &lt;th&gt;Max Precision&lt;/th&gt;
      &lt;th&gt;Scale Range&lt;/th&gt;
      &lt;th&gt;Variable Scale&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Oracle NUMBER(p, s)&lt;/td&gt;
      &lt;td&gt;38&lt;/td&gt;
      &lt;td&gt;-84 to 127&lt;/td&gt;
      &lt;td&gt;No&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Oracle NUMBER&lt;/td&gt;
      &lt;td&gt;40&lt;/td&gt;
      &lt;td&gt;Approximately -130 to 126&lt;/td&gt;
      &lt;td&gt;Yes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;PostgreSQL NUMERIC(p, s)&lt;/td&gt;
      &lt;td&gt;1,000&lt;/td&gt;
      &lt;td&gt;-1000 to 1000&lt;/td&gt;
      &lt;td&gt;No&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;PostgreSQL NUMERIC&lt;/td&gt;
      &lt;td&gt;131,072&lt;/td&gt;
      &lt;td&gt;0 to 16,383&lt;/td&gt;
      &lt;td&gt;Yes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;MySQL/MariaDB/SingleStore DECIMAL&lt;/td&gt;
      &lt;td&gt;65&lt;/td&gt;
      &lt;td&gt;0 to 30&lt;/td&gt;
      &lt;td&gt;No&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Trino DECIMAL&lt;/td&gt;
      &lt;td&gt;38&lt;/td&gt;
      &lt;td&gt;0 to 38&lt;/td&gt;
      &lt;td&gt;No&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Trino NUMBER&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;200&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;-16,384 to 16,383&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h3 id=&quot;storage-and-representation&quot;&gt;Storage and representation&lt;/h3&gt;

&lt;p&gt;NUMBER uses a variable-width binary format optimized for flexibility:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;2-byte header encoding sign and scale&lt;/li&gt;
  &lt;li&gt;Variable-length magnitude in big-endian format&lt;/li&gt;
  &lt;li&gt;The binary format is considered unstable and may evolve in future releases to
enable optimizations and performance improvements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This flexibility allows Trino to improve NUMBER’s internal representation over
time without breaking connector compatibility.
The Trino SPI provides a stable API for connectors to read and write NUMBER values,
abstracting away the internal format.&lt;/p&gt;
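&lt;p&gt;As a rough illustration of this layout, a sign-plus-scale header followed by a
big-endian magnitude could be sketched as below. The real format is internal and
unstable, so this is a hypothetical sketch of the described idea, not the actual
encoding:&lt;/p&gt;

```python
# Hypothetical sketch of the described layout: a 2-byte header carrying
# sign and scale, followed by a variable-length big-endian magnitude.
# Not Trino's actual binary format, which is internal and unstable.

def encode_number(unscaled: int, scale: int) -> bytes:
    """Encode an unscaled integer and scale, e.g. 123.45 as (12345, 2)."""
    sign = 1 if unscaled < 0 else 0
    # Top bit: sign. Remaining 15 bits: scale as two's complement,
    # which covers the stated range of -16,384 to 16,383.
    header = (sign << 15) | (scale & 0x7FFF)
    magnitude = abs(unscaled)
    length = max(1, (magnitude.bit_length() + 7) // 8)
    return header.to_bytes(2, "big") + magnitude.to_bytes(length, "big")

def decode_number(data: bytes) -> tuple[int, int]:
    """Reverse of encode_number: returns (unscaled value, scale)."""
    header = int.from_bytes(data[:2], "big")
    sign = -1 if header >> 15 else 1
    raw = header & 0x7FFF
    scale = raw - 0x8000 if raw >= 0x4000 else raw
    magnitude = int.from_bytes(data[2:], "big")
    return sign * magnitude, scale

# 123.45 represented as unscaled value 12345 with scale 2
assert decode_number(encode_number(12345, 2)) == (12345, 2)
# Negative values and negative scales round-trip as well
assert decode_number(encode_number(-7, -3)) == (-7, -3)
```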

&lt;h3 id=&quot;performance-considerations&quot;&gt;Performance considerations&lt;/h3&gt;

&lt;p&gt;NUMBER uses Java’s BigDecimal for arithmetic operations, which provides exact
precision at the cost of being slower than fixed-width types like BIGINT,
DOUBLE, or DECIMAL. For this reason, NUMBER is designed for scenarios where
precision matters more than computational speed:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Best for&lt;/strong&gt;: reading and storing high-precision data from source systems,
data federation, reporting, data warehousing&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Not optimal for&lt;/strong&gt;: computational heavy-lifting, complex mathematical
operations, high-performance analytics on numeric columns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your workload involves extensive numeric computation, consider whether DECIMAL
(for up to 38 digits), DOUBLE (for approximate arithmetic), or BIGINT (for
integer arithmetic) might be more appropriate.&lt;/p&gt;
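&lt;p&gt;The tradeoff is the classic exact-versus-approximate one. Using Python’s
decimal module as a stand-in for exact decimal arithmetic (an analogy, not Trino
code), compare it with binary floating point, which is what DOUBLE uses:&lt;/p&gt;

```python
from decimal import Decimal

# Exact decimal arithmetic (analogous to NUMBER/DECIMAL semantics):
# slower, but results are exact.
exact = Decimal("0.1") + Decimal("0.2")
print(exact)   # 0.3

# Approximate binary floating point (analogous to DOUBLE):
# fast, but 0.1 and 0.2 have no exact binary representation.
approx = 0.1 + 0.2
print(approx)  # 0.30000000000000004
```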

&lt;h3 id=&quot;function-support&quot;&gt;Function support&lt;/h3&gt;

&lt;p&gt;NUMBER supports essential operations:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Arithmetic: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;*&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Aggregations: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sum()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;avg()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;min()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;max()&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Rounding and sign functions: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;abs()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sign()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ceiling()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;floor()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;truncate()&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;round()&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Special value checks: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;is_nan()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;is_finite()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;is_infinite()&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many advanced mathematical functions (trigonometric, logarithmic, etc.)
do not work with NUMBER directly and require explicit type conversions to DOUBLE or DECIMAL.&lt;/p&gt;

&lt;h2 id=&quot;whats-next&quot;&gt;What’s next&lt;/h2&gt;

&lt;p&gt;Support for the NUMBER type will continue to evolve. Additional connectors are
planned for future releases:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;ClickHouse&lt;/strong&gt;: for Decimal256 type mapping&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Apache Ignite&lt;/strong&gt;: for high-precision numeric support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’re also exploring performance optimizations and expanding function support
based on community feedback.&lt;/p&gt;

&lt;h2 id=&quot;getting-started&quot;&gt;Getting started&lt;/h2&gt;

&lt;p&gt;NUMBER support is available now in Trino 480. To start using it:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Upgrade to Trino 480&lt;/strong&gt; - NUMBER is available out of the box&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Remove deprecated configs&lt;/strong&gt; - If you used &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal-mapping&lt;/code&gt; configurations,
consider removing them to enable automatic NUMBER mapping&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Query your data&lt;/strong&gt; - High-precision columns are now accessible without
configuration&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For detailed documentation, refer to:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/language/types.html&quot;&gt;NUMBER type reference&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/oracle.html&quot;&gt;Oracle connector documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/postgresql.html&quot;&gt;PostgreSQL connector documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/mysql.html&quot;&gt;MySQL connector documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/mariadb.html&quot;&gt;MariaDB connector documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/singlestore.html&quot;&gt;SingleStore connector documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Have questions or feedback? Join the discussion on the &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Trino community
Slack&lt;/a&gt; in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;#dev&lt;/code&gt; channel, or open an issue on
&lt;a href=&quot;https://github.com/trinodb/trino/issues&quot;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The NUMBER type represents a significant milestone in Trino’s evolution,
eliminating precision loss barriers and making high-precision numeric data from
diverse sources readily accessible for analytics and reporting. We’re excited to
see how the community uses this powerful new capability!&lt;/p&gt;

</content>

      
        <author>
          <name>Piotr Findeisen, Starburst Data</name>
        </author>
      

      <summary>One of Trino’s core strengths is breaking down data silos—enabling data engineers to query diverse data sources through a single SQL interface. However, when those sources use high-precision numeric types beyond Trino’s 38-digit DECIMAL limit, that promise breaks down. Users faced an impossible choice: skip the columns entirely and lose access to critical data, or accept lossy rounding that compromises data integrity. This challenge required a new approach: a dedicated data type for high-precision, variable-scale decimals.</summary>

      
      
    </entry>
  
    <entry>
      <title>78: A view with a view with a view</title>
      <link href="https://trino.io/episodes/78.html" rel="alternate" type="text/html" title="78: A view with a view with a view" />
      <published>2026-01-16T00:00:00+00:00</published>
      <updated>2026-01-16T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/78</id>
      <content type="html" xml:base="https://trino.io/episodes/78.html">&lt;h2 id=&quot;host&quot;&gt;Host&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Sr. Principal
DevRel Engineer at &lt;a href=&quot;https://chainguard.dev&quot;&gt;Chainguard&lt;/a&gt;, open source hacker at
&lt;a href=&quot;https://github.com/simpligility&quot;&gt;simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Senior Developer
Advocate at &lt;a href=&quot;https://www.influxdata.com/&quot;&gt;InfluxData&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/robfromboulder/&quot;&gt;Rob Dickinson&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;h3 id=&quot;trino-478&quot;&gt;&lt;a href=&quot;/docs/current/release/release-478.html&quot;&gt;Trino 478&lt;/a&gt;&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for multiple plugin directories.&lt;/li&gt;
  &lt;li&gt;Propagate queryId to the Open Policy Agent authorizer.&lt;/li&gt;
  &lt;li&gt;Add support for reading encrypted Parquet files with the Hive connector.&lt;/li&gt;
  &lt;li&gt;Add numerous performance improvements and bug fixes for the Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Update Docker container to use Java 25.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;trino-479&quot;&gt;&lt;a href=&quot;/docs/current/release/release-479.html&quot;&gt;Trino 479&lt;/a&gt;&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Require Java 25 to build and run Trino.&lt;/li&gt;
  &lt;li&gt;Publish processing time for a query in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FINISHING&lt;/code&gt; state to event
listeners.&lt;/li&gt;
  &lt;li&gt;Deprecate &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt; type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LOGICAL&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DISTRIBUTED&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Add an extraHeaders option to the JDBC driver and the CLI to support
sending arbitrary HTTP headers.&lt;/li&gt;
  &lt;li&gt;Add &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;APPLICATION_DEFAULT&lt;/code&gt; authentication type for GCS.&lt;/li&gt;
  &lt;li&gt;Remove support for unauthenticated access when GCS authentication type is set
to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SERVICE_ACCOUNT&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Add support for setting and dropping column defaults via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER TABLE ...
ALTER COLUMN&lt;/code&gt; to the memory connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;View &lt;a href=&quot;https://www.youtube.com/watch?v=7clvlAxGFOI&amp;amp;t=6s&amp;amp;pp=ygUSbWFuZnJlZCBtZW50b3JzIDEw&quot;&gt;Manfred mentors 10&lt;/a&gt; for a more detailed discussion.&lt;/p&gt;

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well.&lt;/p&gt;

&lt;h3 id=&quot;other-releases-and-news&quot;&gt;Other releases and news&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Contributor Call minutes are available:
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-22-oct-2025&quot;&gt;October 2025&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-26-nov-2025&quot;&gt;November 2025&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-query-ui&quot;&gt;Trino query UI&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.npmjs.com/package/trino-query-ui&quot;&gt;v0.1.1 successfully released&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;Now blocked by an npm process change and the work needed to adapt to it&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;OpenText and Vertica connector
    &lt;ul&gt;
      &lt;li&gt;OpenText is looking for expressions of interest from users - contact Manfred
or comment on the &lt;a href=&quot;https://github.com/trinodb/trino/pull/26904&quot;&gt;PR for potential
removal&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;Working on collaboration to set up test environment with Trino project&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;PowerBI connector for Trino
    &lt;ul&gt;
      &lt;li&gt;Manfred working with Microsoft and others to figure out future plans&lt;/li&gt;
      &lt;li&gt;Microsoft is looking for &lt;a href=&quot;https://community.fabric.microsoft.com/t5/Fabric-Ideas/Trino-connector/idi-p/4849124&quot;&gt;your votes for a Trino Fabric
connector&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Trino 480 and Trino Gateway 17 are hopefully coming soon&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/playlist?list=PLHdo8mJLIMWALFrGgA6-wWcWgyZmjAex-&quot;&gt;Manfred
mentors&lt;/a&gt;
videos up to episode 10 now about various Trino topics&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-rob&quot;&gt;Introducing Rob&lt;/h2&gt;

&lt;p&gt;Rob tells us about his history with Trino, software engineering, and management.&lt;/p&gt;

&lt;h2 id=&quot;a-view-with-a-view-with-a-view&quot;&gt;A view with a view with a view&lt;/h2&gt;

&lt;p&gt;We recap Rob’s past presentation and concepts from Trino Summit 2024 about views
and hierarchies of views. Then we move on to discuss his recent development
work. This includes the
&lt;a href=&quot;https://github.com/robfromboulder/virtual-view-manifesto&quot;&gt;virtual-view-manifesto&lt;/a&gt;
and the &lt;a href=&quot;https://github.com/robfromboulder/viewmapper&quot;&gt;viewmapper&lt;/a&gt; and
&lt;a href=&quot;https://github.com/robfromboulder/viewzoo&quot;&gt;viewzoo&lt;/a&gt; projects.&lt;/p&gt;

&lt;p&gt;We also chat about Rob’s journey with AI tooling.&lt;/p&gt;

&lt;p&gt;A comparison of how application code accesses database storage using three
approaches: an ORM layer, a microservice and API layer, and a query engine and
view layer:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/tcb78_virtual_view_comparison.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;A detailed topology of an application taking advantage of virtual view
hierarchies:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/tcb78_virtual_view_topology.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;A concrete example of a view hierarchy for events – two swappable layers, one
for mapping to physical databases, and one for calculating event priority:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/tcb78_virtual_view_example.png&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/robfromboulder/virtual-view-manifesto&quot;&gt;virtual-view-manifesto&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/robfromboulder/viewmapper&quot;&gt;viewmapper&lt;/a&gt; for view storage&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/robfromboulder/viewzoo&quot;&gt;viewzoo&lt;/a&gt; for view visualization&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;28 Jan 2026 - &lt;a href=&quot;/community.html#events&quot;&gt;Trino Contributor Call&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;7 Feb 2026 - &lt;a href=&quot;https://www.meetup.com/trino-apac/events/312457635/&quot;&gt;Trino meetup in Bangalore&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Looking for guests and topics for Trino Community Broadcast 79 and beyond&lt;/li&gt;
&lt;/ul&gt;</content>

      

      <summary>Host</summary>

      
      
    </entry>
  
    <entry>
      <title>77: One tool to proxy them all</title>
      <link href="https://trino.io/episodes/77.html" rel="alternate" type="text/html" title="77: One tool to proxy them all" />
      <published>2025-10-29T00:00:00+00:00</published>
      <updated>2025-10-29T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/77</id>
      <content type="html" xml:base="https://trino.io/episodes/77.html">&lt;h2 id=&quot;host&quot;&gt;Host&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Sr. Principal
DevRel Engineer at &lt;a href=&quot;https://chainguard.dev&quot;&gt;Chainguard&lt;/a&gt;, open source hacker at
&lt;a href=&quot;https://github.com/simpligility&quot;&gt;simpligility&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/jordanzimmerman/&quot;&gt;Jordan Zimmerman&lt;/a&gt;, Senior
Staff Engineer at &lt;a href=&quot;https://www.starburst.io/&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/pablo-arteaga-20b547101/&quot;&gt;Pablo Arteaga&lt;/a&gt;,
Software Engineer at
&lt;a href=&quot;https://www.bloomberg.com/company/values/tech-at-bloomberg/&quot;&gt;Bloomberg&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;Trino 478 is in the final stages of release. We will talk about the
details in the next episode.&lt;/p&gt;

&lt;h3 id=&quot;other-releases-and-news&quot;&gt;Other releases and news&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-22-oct-2025&quot;&gt;October contributor call recap and
recording&lt;/a&gt;
is available.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/playlist?list=PLHdo8mJLIMWALFrGgA6-wWcWgyZmjAex-&quot;&gt;Manfred
mentors&lt;/a&gt;, a new video tutorial series about working on Trino and other open
source projects, is live now and looking for
&lt;a href=&quot;https://github.com/sponsors/mosabua&quot;&gt;sponsors&lt;/a&gt;.
Details about the tasks are available in the &lt;a href=&quot;https://github.com/simpligility/contributions&quot;&gt;contribution tracker
project&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-jordan-and-pablo&quot;&gt;Introducing Jordan and Pablo&lt;/h2&gt;

&lt;p&gt;Manfred chats with Pablo and Jordan about their involvement in the Trino
community. We end up chatting a bunch about the Airlift framework that is a
foundation for Trino since Jordan has been involved in that project for a long
time. Pablo has been involved in Trino itself and worked on the OPA plugin and
the Trino Gateway, among other things.&lt;/p&gt;

&lt;h2 id=&quot;aws-proxy&quot;&gt;aws-proxy&lt;/h2&gt;

&lt;p&gt;The AWS Proxy is an open-source Java toolkit and library, not a standalone
application, designed to act as a transparent proxy for AWS Simple Storage
Service (S3) compatible object storage protocols.&lt;/p&gt;

&lt;p&gt;It was created by developers from Starburst, Bloomberg and other organizations
in the Trino community to address the need for enhanced governance and security
with tools like Apache Spark that lack security controls. It also supports
direct data access to S3 or S3-compatible systems, like MinIO or Dell ECS.&lt;/p&gt;

&lt;h3 id=&quot;key-functionality-and-use-cases&quot;&gt;Key functionality and use cases&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Security and governance layer&lt;/strong&gt;: The primary goal is to prevent client
applications from bypassing governance systems by accessing S3 directly. It
ensures all data access is channeled through the proxy, where custom business
logic can be applied.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Signature handling:&lt;/strong&gt; It handles the complex AWS Signature Version 4 (SIGv4)
protocol used for authenticating requests, which was the most challenging part
of its development.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Emulated credentials&lt;/strong&gt;: Clients are configured to use fake, worthless
credentials that are only recognized by the proxy. The proxy then validates
the user’s identity and request against security policies (like OPA), signs
the request with the real, secure AWS keys (kept safe behind the firewall),
and forwards it to the real S3 store.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Extensibility&lt;/strong&gt;: It’s built on the Airlift framework and uses a simple
Service Provider Interface (SPI) plugin mechanism. This allows users to add
custom authorization logic, object storage abstraction from buckets to tables,
redirection, and other use cases.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In essence, it takes standard S3 requests from data tools and mediates them,
applying security, control, and abstraction before forwarding them to the actual
data lake storage.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/episode/tcb77-aws-proxy.pdf&quot;&gt;Presentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/aws-proxy&quot;&gt;aws-proxy&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://github.com/Randgalt/record-builder&quot;&gt;Jordan’s record-builder open source project&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Looking for guests and topics for Trino Community Broadcast 78&lt;/li&gt;
  &lt;li&gt;26 November 2025 - &lt;a href=&quot;/community.html#events&quot;&gt;Trino Contributor Call&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content>

      

      <summary>Host</summary>

      
      
    </entry>
  
    <entry>
      <title>76: Triple platform treat</title>
      <link href="https://trino.io/episodes/76.html" rel="alternate" type="text/html" title="76: Triple platform treat" />
      <published>2025-09-26T00:00:00+00:00</published>
      <updated>2025-09-26T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/76</id>
      <content type="html" xml:base="https://trino.io/episodes/76.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Sr. Principal
DevRel Engineer at &lt;a href=&quot;https://chainguard.dev&quot;&gt;Chainguard&lt;/a&gt;, open source hacker at
&lt;a href=&quot;https://github.com/simpligility&quot;&gt;simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/jo-perez-data/&quot;&gt;Jo Perez&lt;/a&gt;, Founding Solutions
Engineer at &lt;a href=&quot;https://www.getcollate.io/&quot;&gt;Collate&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/shawn-gordon-37b9916/&quot;&gt;Shawn Gordon&lt;/a&gt;, Sr.
Developer Advocate at &lt;a href=&quot;https://www.getcollate.io/&quot;&gt;Collate&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;Finally shipped a huge new release:&lt;/p&gt;

&lt;h3 id=&quot;trino-477&quot;&gt;&lt;a href=&quot;/docs/current/release/release-477.html&quot;&gt;Trino 477&lt;/a&gt;&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Add Lakehouse connector.&lt;/li&gt;
  &lt;li&gt;Add SQL language features including &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER MATERIALIZED VIEW ... SET
AUTHORIZATION&lt;/code&gt;, default column values, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER VIEW ... REFRESH&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Add new SQL functions like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cosine_distance()&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;to_geojson_geometry()&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Add lots of new features to the preview UI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are too many connector improvements to list them all. Check out the
release notes. Also inspect the changes to the SPI, since there are quite a few.&lt;/p&gt;

&lt;p&gt;Importantly, this release also includes some breaking changes.&lt;/p&gt;

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well.&lt;/p&gt;

&lt;p&gt;And before Trino 477 we also shipped Trino Gateway:&lt;/p&gt;

&lt;h3 id=&quot;trino-gateway-16&quot;&gt;&lt;a href=&quot;https://trinodb.github.io/trino-gateway/release-notes/#16&quot;&gt;Trino Gateway 16&lt;/a&gt;&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Add numerous UI improvements and fixes.&lt;/li&gt;
  &lt;li&gt;Require Java 24 and PostgreSQL 17 or higher.&lt;/li&gt;
  &lt;li&gt;Allow default routing group configuration.&lt;/li&gt;
  &lt;li&gt;Improve error propagation with external routing service.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;other-releases-and-news&quot;&gt;Other releases and news&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/charts/tags&quot;&gt;trino-1.41.0 and trino-gateway-1.16.0 Helm charts&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-python-client/releases/tag/0.336.0&quot;&gt;trino-python-client 0.336.0&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-27-jun-2025&quot;&gt;June contributor call recap and recording&lt;/a&gt; is available.&lt;/li&gt;
  &lt;li&gt;The August contributor call recap and recording from Wednesday is in the works.&lt;/li&gt;
  &lt;li&gt;Java 25 shipped and adoption in Trino is on the way.&lt;/li&gt;
  &lt;li&gt;The new &lt;a href=&quot;https://github.com/trinodb/trino-odbc&quot;&gt;trino-odbc&lt;/a&gt; project was
contributed by &lt;a href=&quot;https://github.com/rileymcdowell&quot;&gt;Riley McDowell&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/dprophet&quot;&gt;Erik Anderson&lt;/a&gt; is stepping up as subproject
maintainer for the ODBC driver.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/vagaerg&quot;&gt;Pablo Arteaga&lt;/a&gt; will lead the new efforts for
better OPA tooling and support in the &lt;a href=&quot;https://github.com/trinodb/trino-opa-tools&quot;&gt;trino-opa-tools
repository&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;We send our thanks to &lt;a href=&quot;https://github.com/mosiac1&quot;&gt;Cristian Osiac&lt;/a&gt; for his
contributions as subproject maintainer for
&lt;a href=&quot;https://github.com/trinodb/aws-proxy&quot;&gt;aws-proxy&lt;/a&gt;. He is unfortunately
stepping down from this work.&lt;/li&gt;
  &lt;li&gt;Trino recently overtook the old Presto in the &lt;a href=&quot;https://db-engines.com/en/ranking&quot;&gt;DB-Engines
ranking&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-jo-and-shawn&quot;&gt;Introducing Jo and Shawn&lt;/h2&gt;

&lt;p&gt;We chat with Jo and Shawn about their background in the big data and data lake
community and beyond.&lt;/p&gt;

&lt;h2 id=&quot;collate&quot;&gt;Collate&lt;/h2&gt;

&lt;p&gt;We talk about the &lt;a href=&quot;https://open-metadata.org/&quot;&gt;OpenMetadata open source project&lt;/a&gt;
as a unified platform for data discovery, observability, and governance, with
80+ data connectors and a collaborative interface.&lt;/p&gt;

&lt;p&gt;Jo and Shawn teach us how OpenMetadata can help build and manage high-quality
data assets at scale, with case studies, documentation, and community
resources, and we dive into how Collate offers a platform around OpenMetadata
and more.&lt;/p&gt;

&lt;h2 id=&quot;triple-platform-treat&quot;&gt;Triple platform treat&lt;/h2&gt;

&lt;p&gt;Building a modern data platform isn’t just about picking tools—it’s about
creating a unified ecosystem where performance, governance, and trust work
seamlessly together. See how the power trio of Trino, Collate, and Apache Ranger
transforms your data operations:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino: Lightning-fast analytics at scale. Query across any data source, any
format, anywhere—without the complexity of data movement or vendor lock-in.&lt;/li&gt;
  &lt;li&gt;Collate: Intelligent data trust and discovery. AI-powered profiling, automated
quality testing, and smart alerting that keeps your data reliable and
discoverable.&lt;/li&gt;
  &lt;li&gt;Apache Ranger: Enterprise-grade security and governance. Fine-grained access
controls, policy management, and audit trails that keep your data secure and
compliant.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The integration advantage: Watch these three platforms work together to deliver
what every data team needs—fast queries, trusted data, and bulletproof
security—all in one cohesive stack.&lt;/p&gt;

&lt;p&gt;Jo and Shawn tell us more about “Trino + Collate + Apache Ranger = Data Platform
Excellence”, talk about the components and value provided by each of them, and
dive in with a demo, while Manfred and Cole ask more questions to dive deeper.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/episode/tcb76-collate.pdf&quot;&gt;Presentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://open-metadata.org/&quot;&gt;OpenMetadata&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.getcollate.io/&quot;&gt;Collate&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.getcollate.io/connectors/database/trino&quot;&gt;Collate Trino connector&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://youtu.be/x4BvgSMitL0&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Apache Ranger sink for
reverse metadata with Collate&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Community Broadcast 77: One tool to proxy them all (aws-proxy) planned
for October&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let us know if you want to be a guest in a future broadcast.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>75: Your app sees clearly into Trino</title>
      <link href="https://trino.io/episodes/75.html" rel="alternate" type="text/html" title="75: Your app sees clearly into Trino" />
      <published>2025-07-05T00:00:00+00:00</published>
      <updated>2025-07-05T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/75</id>
      <content type="html" xml:base="https://trino.io/episodes/75.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Dev Rel Engineer
at &lt;a href=&quot;https://chainguard.dev&quot;&gt;Chainguard&lt;/a&gt;, open source hacker at 
&lt;a href=&quot;https://github.com/simpligility&quot;&gt;simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://www.firebolt.io/&quot;&gt;Firebolt&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/trevor-denning/&quot;&gt;Trevor Denning&lt;/a&gt;, Solutions
Engineer at &lt;a href=&quot;https://insightsoftware.com/&quot;&gt;insightsoftware&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases&quot;&gt;Releases&lt;/h2&gt;

&lt;p&gt;What’s going on with our releases?&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Summer slump&lt;/li&gt;
  &lt;li&gt;Reduced maintainer work&lt;/li&gt;
  &lt;li&gt;Necessary migration for Maven Central as release blocker&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other announcements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-27-jun-2025&quot;&gt;June contributor call recap and recording&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/foundation.html&quot;&gt;Trino Software Foundation&lt;/a&gt; and
&lt;a href=&quot;/sponsor.html&quot;&gt;documentation for supporting the project&lt;/a&gt; on
the website.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-trevor&quot;&gt;Introducing Trevor&lt;/h2&gt;

&lt;p&gt;Trevor has been developing software for over 20 years and has deep knowledge of
ODBC and JDBC drivers for databases. He tells us more about his experience and
how he came to learn about Trino.&lt;/p&gt;

&lt;h2 id=&quot;more-about-insightsoftware&quot;&gt;More about insightsoftware&lt;/h2&gt;

&lt;p&gt;We untangle the long history of Simba, Logi Symphony, and insightsoftware with
the Trino project up to the current status, before we dive into the technical
details.&lt;/p&gt;

&lt;h2 id=&quot;odbc-and-jdbc&quot;&gt;ODBC and JDBC&lt;/h2&gt;

&lt;p&gt;After talking a bit about Trino, Iceberg, data lakes and related topics, we get
into the details about Simba Trino data connectivity with the ODBC and JDBC
drivers.&lt;/p&gt;

&lt;h2 id=&quot;demo&quot;&gt;Demo&lt;/h2&gt;

&lt;p&gt;Trevor shows us how you can use the ODBC driver to query Trino catalogs from
Microsoft Excel, which is arguably the most widely used reporting and analytics
tool, despite really being a spreadsheet application. After that demo he moves
on to some business intelligence analytics with PowerBI.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/episode/tcb75-insightsoftware.pdf&quot;&gt;Presentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://insightsoftware.com/drivers/trino-odbc-jdbc/&quot;&gt;Simba Trino ODBC &amp;amp; JDBC Drivers&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://documentation.insightsoftware.com/simba-home-olh/content/homepage/trino.htm&quot;&gt;Simba Trino Driver Documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/ecosystem/client-application.html#logi-symphony&quot;&gt;Logi Symphony&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://insightsoftware.com/resources/scaling-bi-with-trino-and-apache-iceberg/&quot;&gt;Video: Scaling BI with Trino and Apache Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://insightsoftware.com/blog/unlocking-trinos-full-potential-with-simba-drivers-for-bi-etl/&quot;&gt;Blog post: Unlocking Trino’s Full Potential With Simba Drivers for BI &amp;amp; ETL&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://insightsoftware.com/blog/enhance-trino-performance-with-simbas-powerful-connectivity/&quot;&gt;Blog post: Enhance Trino Performance With Simba’s Powerful Connectivity&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;We give a quick update on where to see Cole or Manfred next, and talk about
upcoming Trino events:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Meet Manfred at the &lt;a href=&quot;https://www.chainguard.dev/&quot;&gt;Chainguard&lt;/a&gt; booth at the Black Hat conference in Las Vegas&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings&quot;&gt;Trino Contributor
Call&lt;/a&gt; planned for
the 23rd of July&lt;/li&gt;
  &lt;li&gt;Trino Community Broadcast: One tool to proxy them all (aws-proxy)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let us know if you want to be a guest in a future broadcast.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>74: Insights from a Norse god</title>
      <link href="https://trino.io/episodes/74.html" rel="alternate" type="text/html" title="74: Insights from a Norse god" />
      <published>2025-06-06T00:00:00+00:00</published>
      <updated>2025-06-06T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/74</id>
      <content type="html" xml:base="https://trino.io/episodes/74.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Dev Rel Engineer
at &lt;a href=&quot;https://chainguard.dev&quot;&gt;Chainguard&lt;/a&gt;
&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://www.firebolt.io/&quot;&gt;Firebolt&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/jeschkies/&quot;&gt;Karsten Jeschkies&lt;/a&gt; from &lt;a href=&quot;https://grafana.com/&quot;&gt;Grafana
Labs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases&quot;&gt;Releases&lt;/h2&gt;

&lt;p&gt;Following are some highlights of the recent releases:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-475.html&quot;&gt;Trino 475&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CORRESPONDING&lt;/code&gt; clause in set operations.&lt;/li&gt;
  &lt;li&gt;Add support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AUTO&lt;/code&gt; grouping set that includes all non-aggregated
columns in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; clause.&lt;/li&gt;
  &lt;li&gt;Allow cross-region data retrieval when using the S3 native filesystem.&lt;/li&gt;
  &lt;li&gt;Add support for all storage classes when using the S3 native filesystem for
writes.&lt;/li&gt;
  &lt;li&gt;Numerous improvements on Iceberg, Hive, and Delta Lake connectors.&lt;/li&gt;
  &lt;li&gt;SPI - Remove the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LazyBlock&lt;/code&gt; class.&lt;/li&gt;
&lt;/ul&gt;
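
The two new SQL features in Trino 475 can be sketched briefly. Note that the catalog, schema, and table names below are made up for illustration only:

```sql
-- CORRESPONDING matches the columns of set-operation branches by name,
-- so the two SELECT lists may appear in different order.
SELECT name, price FROM web.store.products
UNION CORRESPONDING
SELECT price, name FROM legacy.shop.products;

-- The AUTO grouping set groups by every non-aggregated column in the
-- SELECT clause, so the GROUP BY list need not be repeated by hand.
SELECT region, category, sum(price) AS total
FROM web.store.products
GROUP BY AUTO;
```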

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-476.html&quot;&gt;Trino 476&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Another big release with lots of changes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Require JDK 24 as runtime.&lt;/li&gt;
  &lt;li&gt;Add support for comparing values of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;geometry&lt;/code&gt; type.&lt;/li&gt;
  &lt;li&gt;Remove Example HTTP connector from binaries.&lt;/li&gt;
  &lt;li&gt;New required JVM config for BigQuery and Snowflake connectors.&lt;/li&gt;
  &lt;li&gt;Fix regression with graceful shutdown from Trino 474.&lt;/li&gt;
  &lt;li&gt;Improve performance of selective joins for federated queries for nearly all
connectors.&lt;/li&gt;
  &lt;li&gt;Add columns to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$all_manifests&lt;/code&gt; metadata tables for Iceberg tables.&lt;/li&gt;
  &lt;li&gt;Add support for user-assigned managed identity authentication for AzureFS for
object storage connectors.&lt;/li&gt;
  &lt;li&gt;Add support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FOR TIMESTAMP AS OF&lt;/code&gt; clause in Delta Lake connector.&lt;/li&gt;
&lt;/ul&gt;
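
The `FOR TIMESTAMP AS OF` clause enables time travel on Delta Lake tables. A minimal sketch, with illustrative catalog and table names:

```sql
-- Read the table as it existed at a given point in time.
SELECT *
FROM delta.sales.orders
FOR TIMESTAMP AS OF TIMESTAMP '2025-06-01 00:00:00 UTC';
```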

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well.&lt;/p&gt;

&lt;p&gt;Other releases and announcements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Gateway 16 still delayed, but Trino Gateway Helm chart 1.15.2&lt;/li&gt;
  &lt;li&gt;Trino Helm chart with 475 -&amp;gt; 1.39.1&lt;/li&gt;
  &lt;li&gt;Trino Python client &lt;a href=&quot;https://github.com/trinodb/trino-python-client/releases/tag/0.334.0&quot;&gt;0.334.0&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-karsten-and-grafana-labs&quot;&gt;Introducing Karsten and Grafana Labs&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/jeschkies/&quot;&gt;Karsten Jeschkies&lt;/a&gt; is an experienced software
engineer:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;2013 - 2016 Engineer at the Core Machine Learning team at Amazon&lt;/li&gt;
  &lt;li&gt;2016 - 2020 Mesosphere and D2IQ, maintainer of Marathon, a container
orchestrator for Mesos&lt;/li&gt;
  &lt;li&gt;2020 - now Maintainer of Loki for two years and now Cloud Provider
observability engineer at Grafana Labs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://grafana.com/&quot;&gt;Grafana Labs&lt;/a&gt; is the home of the well-known Grafana for
visualizations and dashboards, and of other powerful products such as Grafana Tempo, Grafana Mimir,
and Grafana Loki. Grafana Labs is also involved in well-known projects such as
Prometheus and OpenTelemetry.&lt;/p&gt;

&lt;h2 id=&quot;log-management-with-loki&quot;&gt;Log management with Loki&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://grafana.com/oss/loki/&quot;&gt;Loki&lt;/a&gt; is a horizontally-scalable,
highly-available, multi-tenant log aggregation system inspired by Prometheus. It
helps you to drill into petabytes of logging data.&lt;/p&gt;

&lt;h2 id=&quot;analytics-with-trino&quot;&gt;Analytics with Trino&lt;/h2&gt;

&lt;p&gt;Karsten tells us about the motivation to create a Trino connector, how the two
tools work together, what features are available, and what his plans are for the
future.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/episode/tcb74-loki-connector.pdf&quot;&gt;Presentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/jeschkies/loki-trino-demo&quot;&gt;Demo source code&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://grafana.com/oss/loki/&quot;&gt;Grafana Loki website&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/grafana/loki&quot;&gt;Loki source code repo&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/loki.html&quot;&gt;Loki connector documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Quick update on where to see Cole or Manfred next, and then join us for the
upcoming Trino events:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Contributor Call - May skipped, June edition to be determined&lt;/li&gt;
  &lt;li&gt;Trino Community Broadcast: Visualizing with Logi Symphony and ODBC&lt;/li&gt;
  &lt;li&gt;Trino Community Broadcast: One tool to proxy them all (aws-proxy)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let us know if you want to be a guest in a future broadcast.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>73: Wrapping Trino packages with a bow</title>
      <link href="https://trino.io/episodes/73.html" rel="alternate" type="text/html" title="73: Wrapping Trino packages with a bow" />
      <published>2025-04-09T00:00:00+00:00</published>
      <updated>2025-04-09T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/73</id>
      <content type="html" xml:base="https://trino.io/episodes/73.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Dev Rel Engineer
at &lt;a href=&quot;https://chainguard.dev&quot;&gt;Chainguard&lt;/a&gt;
&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://www.firebolt.io/&quot;&gt;Firebolt&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases&quot;&gt;Releases&lt;/h2&gt;

&lt;p&gt;Following are some highlights of the recent releases:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-473.html&quot;&gt;Trino 473&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for array literals.&lt;/li&gt;
  &lt;li&gt;Add LDAP-based group provider.&lt;/li&gt;
  &lt;li&gt;Remove the deprecated glue-v1 metastore type.&lt;/li&gt;
  &lt;li&gt;Remove the deprecated Databricks Unity catalog integration.&lt;/li&gt;
  &lt;li&gt;Remove the Kudu connector.&lt;/li&gt;
  &lt;li&gt;Remove the Phoenix connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But don’t use 473 since there were &lt;a href=&quot;https://github.com/trinodb/trino/issues/25381&quot;&gt;some breaking changes&lt;/a&gt;, fixed in…&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-474.html&quot;&gt;Trino 474&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Fix a correctness bug in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DISTINCT&lt;/code&gt; queries with a large number
of unique groups.&lt;/li&gt;
  &lt;li&gt;Add &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;originalUser&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;authenticatedUser&lt;/code&gt; as resource group selectors.&lt;/li&gt;
  &lt;li&gt;Use JDK 24 as the runtime in the Docker container.&lt;/li&gt;
&lt;/ul&gt;
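
The new selectors plug into the resource groups configuration file. A minimal sketch, assuming the standard JSON layout with `rootGroups` and `selectors`; the group name, regular expression, and limits here are purely illustrative:

```json
{
  "rootGroups": [
    {
      "name": "etl",
      "softMemoryLimit": "80%",
      "maxQueued": 100,
      "hardConcurrencyLimit": 10
    }
  ],
  "selectors": [
    {
      "originalUser": "airflow-.*",
      "group": "etl"
    }
  ]
}
```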

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well. Java 24 is coming as a requirement soon - test the container!&lt;/p&gt;

&lt;p&gt;Releases continue to be slower. Trino needs your help.&lt;/p&gt;

&lt;p&gt;Other releases and announcements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Gateway 16 delayed, but Trino Gateway Helm chart 1.15.1&lt;/li&gt;
  &lt;li&gt;Trino Helm chart with 474 -&amp;gt; 1.38.0&lt;/li&gt;
  &lt;li&gt;New book: &lt;a href=&quot;/blog/2025/03/27/olap-principles-book.html&quot;&gt;Core Principles and Design Practices of OLAP Engines from Yiteng Xu
and Gary Gao&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Massive new contribution looking for helpers - &lt;a href=&quot;https://github.com/trinodb/trino-query-ui&quot;&gt;trino-query-ui&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s explore the query ui repo a bit more…&lt;/p&gt;

&lt;h2 id=&quot;application-packaging-and-trino&quot;&gt;Application packaging and Trino&lt;/h2&gt;

&lt;p&gt;Manfred and Cole muse about the package artifacts from Trino, their history,
scope and pain points:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;RPM&lt;/li&gt;
  &lt;li&gt;tarball&lt;/li&gt;
  &lt;li&gt;Docker container&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of them have and had issues, and everyone knew about them. Manfred
documented a lot of the usage in &lt;a href=&quot;/trino-the-definitive-guide&quot;&gt;Trino: The Definitive
Guide&lt;/a&gt;. Finally, some time in 2024
Manfred put some ideas down, and in the last months he implemented a lot of it.&lt;/p&gt;

&lt;p&gt;We discuss a few aspects such as the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Plugin architecture of Trino&lt;/li&gt;
  &lt;li&gt;What plugins are core or optional?&lt;/li&gt;
  &lt;li&gt;Are artifacts ready to use or not?&lt;/li&gt;
  &lt;li&gt;How painful is configuration?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;demo-time&quot;&gt;Demo time&lt;/h2&gt;

&lt;p&gt;In our demo session we look at some of the changes and the new trino-packages
repository:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;RPM removal from Trino, and replacement module&lt;/li&gt;
  &lt;li&gt;trino-server-core tarball in Trino and plugin selection&lt;/li&gt;
  &lt;li&gt;trino-server-custom module&lt;/li&gt;
  &lt;li&gt;trinodb/trino-core:latest Docker container in Trino&lt;/li&gt;
  &lt;li&gt;custom-docker module&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred runs a build, shows the results, and walks through the packages
repository structure and instructions. To finish off, we talk about next steps
such as removing plugins from the default binaries and therefore making them
optional.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-packages&quot;&gt;trino-packages repository&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/22597&quot;&gt;Packaging improvement issue&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/installation/plugins.html&quot;&gt;Trino plugin documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Quick update on where to see Cole or Manfred next, and then join us for the
upcoming Trino events:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Contributor Call - 23rd of April&lt;/li&gt;
  &lt;li&gt;Trino Community Broadcast 74: One tool to proxy them all (aws-proxy)&lt;/li&gt;
  &lt;li&gt;Trino Community Broadcast 75: Insights from a Norse god (Loki connector)&lt;/li&gt;
  &lt;li&gt;Trino Community Broadcast 76: Visualizing with Logi Symphony and ODBC&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let us know if you want to be a guest in a future broadcast.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Core Principles and Design Practices of OLAP Engines</title>
      <link href="https://trino.io/blog/2025/03/27/olap-principles-book.html" rel="alternate" type="text/html" title="Core Principles and Design Practices of OLAP Engines" />
      <published>2025-03-27T00:00:00+00:00</published>
      <updated>2025-03-27T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2025/03/27/olap-principles-book</id>
      <content type="html" xml:base="https://trino.io/blog/2025/03/27/olap-principles-book.html">&lt;p&gt;Yiteng Xu and Yingju Gao are proudly announcing the new book “Core Principles and
Design Practices of OLAP Engines” from China Machine Press. This is great news
for the Trino community, since the book is based on the open source project
Trino, specifically Trino 350. It took more than four years for the two authors
to finish writing. All concepts and details are explained with a Trino flavor
and generalized to all OLAP engines. Let us walk through the chapters, and you
will find that the two authors dive deep into the source code layer and bring
you many treasures.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;author-introduction&quot;&gt;Author introduction&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/medsmeds&quot;&gt;Yiteng (Ivan) Xu&lt;/a&gt; is a data security engineer and
is currently utilizing Trino, Spark, and Calcite for SQL analysis. His work
encompasses various scenarios, including data warehouse metrics, SQL
auto-rewriting, SQL purpose detection, and the development of a SQL-based
purpose-aware access control system.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/garyelephant&quot;&gt;Yingju (Gary) Gao&lt;/a&gt; is an Apache SeaTunnel PMC
member and the lead of the time series database team. He currently serves as the
technical lead for the observability-engine team, and is responsible for
building the ecosystem for observability data, including metrics, trace, log,
and event data, providing a high-performance, high-throughput data pipeline from
ingestion to consumption, storage, querying, and data warehousing. Additionally,
he oversees metrics stability, multi-tenant access, and user requirement
integration.&lt;/p&gt;

&lt;p&gt;Both authors are passionate about sharing their technical knowledge. They have
delved deep into source code and excel in technical writing, breaking down
complex underlying principles into a linear and comprehensible format for
readers. They firmly believe that sharing is a virtue and are committed to
continuing their technical contributions.&lt;/p&gt;

&lt;p&gt;So now it is time to get the book, or read on for a walk through of the content:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; target=&quot;_blank&quot; href=&quot;https://product.dangdang.com/11974653727.html&quot;&gt;
        Get the book from dangdang.com
    &lt;/a&gt;
    &lt;a class=&quot;btn btn-pink&quot; target=&quot;_blank&quot; href=&quot;https://item.m.jd.com/product/10136949561522.html&quot;&gt;
        Get the book from jd.com
    &lt;/a&gt;
&lt;/div&gt;

&lt;h2 id=&quot;walk-through&quot;&gt;Walk through&lt;/h2&gt;

&lt;p&gt;Let’s have a look at the different chapters in a high-level walk through.&lt;/p&gt;

&lt;h3 id=&quot;part-1-background-knowledge&quot;&gt;Part 1: Background knowledge&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Chapter 1&lt;/strong&gt;: Introduces the concept of OLAP (Online Analytical Processing)
and provides a comparison among different engines like Trino, Impala, Doris, and others.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chapter 2&lt;/strong&gt;: Provides a comprehensive introduction to the Trino engine,
covering its principles, architecture, enterprise use cases, compilation, and
execution. It also compares Trino with the Presto project and introduces the
SQL statements that are referenced throughout the book.&lt;/p&gt;

&lt;h3 id=&quot;part-2-core-principles&quot;&gt;Part 2: Core principles&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Chapter 3&lt;/strong&gt;: Offers an overview of the distributed SQL query process, serving
as a high-level introduction to the subsequent chapters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chapter 4&lt;/strong&gt;: Begins with the generation of query execution plans, including
the transformation of SQL into abstract syntax trees, semantic analysis, and the
creation of initial logical plans. It then delves into the theoretical knowledge
of optimizers and the overall framework of the Trino optimizer.&lt;/p&gt;

&lt;h3 id=&quot;part-3-classic-sql&quot;&gt;Part 3: Classic SQL&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Chapter 5&lt;/strong&gt;: Explains the generation and optimization of execution plans for
SQL statements involving only &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TableScan&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Filter&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Project&lt;/code&gt; operations,
along with their scheduling and execution processes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chapter 6&lt;/strong&gt;: Focuses on SQL statements with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Limit&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Sort&lt;/code&gt; operations,
detailing the generation and optimization of execution plans, as well as their
scheduling and execution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chapter 7&lt;/strong&gt;: Introduces the basic principles of aggregate queries. It then
covers the generation and optimization of execution plans for grouped and
non-grouped aggregate SQL statements, along with their scheduling and execution
processes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chapter 8&lt;/strong&gt;: Discusses SQL statements with count distinct and multiple
aggregate operations, explaining the generation and optimization of execution
plans, as well as their scheduling and execution. This includes the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Scatter-Gather&lt;/code&gt; model and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MarkDistinct&lt;/code&gt; optimization. Finally, a complex SQL
statement is used to tie together the concepts from Chapters 5 to 8.&lt;/p&gt;

&lt;h3 id=&quot;part-4-data-exchange-mechanism&quot;&gt;Part 4: Data exchange mechanism&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Chapter 9&lt;/strong&gt;: Introduces the overall concept of data exchange mechanisms and
how data exchange is incorporated during the query optimization phase via the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AddExchanges&lt;/code&gt; optimizer, along with the design principles for scheduling and
execution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chapter 10&lt;/strong&gt;: Explains how tasks establish connections during the query
scheduling phase and the mechanisms for upstream and downstream data flow during
execution. It also covers the principles of intra-task data exchange, RPC
interaction mechanisms, and analyzes backpressure, Limit semantics, and
out-of-order request handling.&lt;/p&gt;

&lt;h3 id=&quot;part-5-plugin-mechanisms-and-connectors&quot;&gt;Part 5: Plugin mechanisms and connectors&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Chapter 11&lt;/strong&gt;: Begins with an introduction to Trino’s plugin system and SPI
mechanism, including plugin loading and JVM’s class loading principles. It then
dissects connectors, covering metadata modules, read modules, pushdown
optimization, and providing in-depth insights into connector design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chapter 12&lt;/strong&gt;: Uses the example-http connector to help readers understand
connector design and implements a simple data source using Python’s Flask
framework.&lt;/p&gt;

&lt;h3 id=&quot;part-6-function-principles-and-development&quot;&gt;Part 6: Function principles and development&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Chapter 13&lt;/strong&gt;: Provides an overview of Trino’s function system, including
function types, lifecycle, and several function development methods. It delves
into the data structures and annotations related to functions and explains the
function registration and parsing process during semantic analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chapter 14&lt;/strong&gt;: Focuses on how to write a UDF in practice. It covers
annotation-based development methods for scalar functions, as well as low-level
development methods using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;codeGen&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;methodHandle&lt;/code&gt; APIs. For aggregate
functions, it introduces annotation-based development methods and low-level
methods where developers handle serialization and state on their own.&lt;/p&gt;

&lt;h3 id=&quot;why-trino&quot;&gt;Why Trino?&lt;/h3&gt;

&lt;p&gt;In 2020, one of the authors, Yiteng Xu, encountered a scenario at work where
data needed to be read from two Hive instances, each modified by different
internal teams. The company’s infrastructure team attempted a simple solution by
registering virtual tables and using MapReduce for federated queries. However,
this approach proved inadequate for the agile analysis needs of data analysts,
with complex queries taking nearly 12 hours to complete. A single mistake in a
SQL statement meant an entire day was wasted.&lt;/p&gt;

&lt;p&gt;Later, another team researched and adopted Presto (before Trino became
independent). By adapting the Hive engine at the connector level, they enabled
federated queries across the two Hive instances without data migration or
extensive code changes. Users only needed to be aware of a catalog prefix,
making the process incredibly convenient. The author later had the opportunity
to participate in the project and developed a strong interest in its source
code. The elegance of the open-source project, its plugin design, and the inner
workings of connectors and Airlift framework sparked a deep curiosity, leading
the author on a journey of source code exploration. As the PrestoSQL project was
more active and receptive to developer feedback, the author chose to continue
following the Trino project when it emerged in late 2020.&lt;/p&gt;

&lt;h2 id=&quot;get-your-copy&quot;&gt;Get your copy&lt;/h2&gt;

&lt;p&gt;Now it is time for you to get your copy of &lt;strong&gt;Core Principles and Design Practices of OLAP Engines&lt;/strong&gt;:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; target=&quot;_blank&quot; href=&quot;https://product.dangdang.com/11974653727.html&quot;&gt;
        Get the book from dangdang.com
    &lt;/a&gt;
    &lt;a class=&quot;btn btn-pink&quot; target=&quot;_blank&quot; href=&quot;https://item.m.jd.com/product/10136949561522.html&quot;&gt;
        Get the book from jd.com
    &lt;/a&gt;
&lt;/div&gt;</content>

      
        <author>
          <name>Yiteng Xu, Yingju Gao, Manfred Moser</name>
        </author>
      

      <summary>Yiteng Xu and Yingju Gao are proudly announcing the new book “Core Principles and Design Practices of OLAP Engines” from China Machine Press. This is great news for the Trino community, since the book is based on the open source project Trino, specifically Trino 350. It took more than four years for the two authors to finish writing. All concepts and details are explained with a Trino flavor and generalized to all OLAP engines. Let us walk through the chapters, and you will find that the two authors dive deep into the source code layer and bring you many treasures.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/core-principles-olap-book.jpg" />
      
    </entry>
  
    <entry>
      <title>72: Keeping the lake clean</title>
      <link href="https://trino.io/episodes/72.html" rel="alternate" type="text/html" title="72: Keeping the lake clean" />
      <published>2025-03-17T00:00:00+00:00</published>
      <updated>2025-03-17T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/72</id>
      <content type="html" xml:base="https://trino.io/episodes/72.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Dev Rel Engineer
at &lt;a href=&quot;https://chainguard.dev&quot;&gt;Chainguard&lt;/a&gt;
&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://www.firebolt.io/&quot;&gt;Firebolt&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/viktor-kessler&quot;&gt;Viktor Kessler&lt;/a&gt;, Co-founder at Vakamo&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/thielc&quot;&gt;Christian Thiel&lt;/a&gt;, Co-founder at Vakamo&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases&quot;&gt;Releases&lt;/h2&gt;

&lt;p&gt;Following are some highlights of the recent releases:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-472.html&quot;&gt;Trino 472&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Color the server console output for improved readability.&lt;/li&gt;
  &lt;li&gt;Fix initialization failure for the DuckDB connector on Docker container.&lt;/li&gt;
  &lt;li&gt;Add support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;row&lt;/code&gt; type and generate empty values for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;array&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;map&lt;/code&gt;,
and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json&lt;/code&gt; types in the Faker connector.&lt;/li&gt;
  &lt;li&gt;Add the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$partition&lt;/code&gt; hidden column in the Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trinodb.github.io/trino-gateway/release-notes/#15&quot;&gt;Trino Gateway 15&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Pop up messages in UI&lt;/li&gt;
  &lt;li&gt;Consistent use of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.yaml&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Use of OpenMetrics data from Trino clusters&lt;/li&gt;
  &lt;li&gt;Fix query errors when adhoc routing group has no healthy backends.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-viktor-and-christian&quot;&gt;Introducing Viktor and Christian&lt;/h2&gt;

&lt;p&gt;We talk with Viktor and Christian about their experience in software engineering
and the world of big data, and what led them to start Vakamo together.&lt;/p&gt;

&lt;h2 id=&quot;metastores-and-catalogs&quot;&gt;Metastores and catalogs&lt;/h2&gt;

&lt;p&gt;We talk about data lakes, data lakehouses, object storage and the role of
metadata. Details we cover include the Hive Metastore Service, the Thrift
protocol, Amazon Glue, and the new wave of catalogs. Specifically we also talk
about Apache Iceberg and the Iceberg REST catalog standard as a basis for
Lakekeeper, and then learn all the details about Lakekeeper.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/logos/lakekeeper-small.png&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;demo-time&quot;&gt;Demo time&lt;/h2&gt;

&lt;p&gt;In their demo Viktor and Christian show a multi-user Trino cluster secured by
OAuth 2, Open Policy Agent, and Lakekeeper.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://lakekeeper.io/&quot;&gt;Lakekeeper&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.lakekeeper.io/&quot;&gt;Lakekeeper documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/lakekeeper/lakekeeper&quot;&gt;Lakekeeper source&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/lakekeeper/lakekeeper/tree/main/examples/trino-opa&quot;&gt;Example project with Trino and OPA&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/iceberg.html&quot;&gt;Iceberg connector documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/object-storage/metastores.html#rest-catalog&quot;&gt;Iceberg REST catalog documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Join us for upcoming events and let us know if you want to be a guest:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Community Broadcast 73: Wrapping Trino packages with a bow&lt;/li&gt;
&lt;/ul&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Twenty four</title>
      <link href="https://trino.io/blog/2025/03/03/java-24.html" rel="alternate" type="text/html" title="Twenty four" />
      <published>2025-03-03T00:00:00+00:00</published>
      <updated>2025-03-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2025/03/03/java-24</id>
      <content type="html" xml:base="https://trino.io/blog/2025/03/03/java-24.html">&lt;p&gt;Six months ago &lt;a href=&quot;/blog/2024/09/17/java-23.html&quot;&gt;we adopted Java 23 as a requirement&lt;/a&gt;, following our standard procedure to upgrade to each Java version as soon
as it becomes available. This allows us to take advantage of all the great
improvements each release brings. The upgrade to 23 was pretty easy since the
changes from 22 to 23 were not that big. The story turns out to be a bit
different now with our upgrade to Java 24.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;java-24-features&quot;&gt;Java 24 features&lt;/h2&gt;

&lt;p&gt;We have been &lt;a href=&quot;https://github.com/trinodb/trino/issues/23498&quot;&gt;planning and working towards the
upgrade&lt;/a&gt; consistently since the
23 bump in September. Java 24 is set to be released in March 2025 and the list
of changes is quite significant:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;JEP 450 Compact Object Headers (Experimental)&lt;/li&gt;
  &lt;li&gt;JEP 472 Prepare to Restrict the Use of JNI&lt;/li&gt;
  &lt;li&gt;JEP 475 Late Barrier Expansion for G1&lt;/li&gt;
  &lt;li&gt;JEP 478 Key Derivation Function API (Preview)&lt;/li&gt;
  &lt;li&gt;JEP 483 Ahead-of-Time Class Loading &amp;amp; Linking&lt;/li&gt;
  &lt;li&gt;JEP 484 Class-File API&lt;/li&gt;
  &lt;li&gt;JEP 485 Stream Gatherers&lt;/li&gt;
  &lt;li&gt;JEP 486 Permanently Disable the Security Manager&lt;/li&gt;
  &lt;li&gt;JEP 487 Scoped Values (Fourth Preview)&lt;/li&gt;
  &lt;li&gt;JEP 488 Primitive Types in Patterns, instanceof, and switch (Second Preview)&lt;/li&gt;
  &lt;li&gt;JEP 489 Vector API (Ninth Incubator)&lt;/li&gt;
  &lt;li&gt;JEP 490 ZGC: Remove the Non-Generational Mode&lt;/li&gt;
  &lt;li&gt;JEP 491 Synchronize Virtual Threads without Pinning&lt;/li&gt;
  &lt;li&gt;JEP 492 Flexible Constructor Bodies (Third Preview)&lt;/li&gt;
  &lt;li&gt;JEP 494 Module Import Declarations (Second Preview)&lt;/li&gt;
  &lt;li&gt;JEP 495 Simple Source Files and Instance Main Methods (Fourth Preview)&lt;/li&gt;
  &lt;li&gt;JEP 496 Quantum-Resistant Module-Lattice-Based Key Encapsulation Mechanism&lt;/li&gt;
  &lt;li&gt;JEP 497 Quantum-Resistant Module-Lattice-Based Digital Signature Algorithm&lt;/li&gt;
  &lt;li&gt;JEP 498 Warn upon Use of Memory-Access Methods in sun.misc.Unsafe&lt;/li&gt;
  &lt;li&gt;JEP 499 Structured Concurrency (Fourth Preview)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can find more details in the
&lt;a href=&quot;https://jdk.java.net/24/release-notes&quot;&gt;release notes&lt;/a&gt; and each
individual JEP.&lt;/p&gt;

&lt;h2 id=&quot;trino-perspective&quot;&gt;Trino perspective&lt;/h2&gt;

&lt;p&gt;From a Trino perspective we want to specifically take advantage of performance
improvements to MemorySegment (mismatch, copy, fill), “JEP 491 Synchronize
Virtual Threads without Pinning” and “JEP 475 Late Barrier Expansion for G1”. On
the other hand &lt;a href=&quot;https://openjdk.org/jeps/486&quot;&gt;JEP 486 Permanently Disable the Security
Manager&lt;/a&gt; turned out to be the most impactful.&lt;/p&gt;

&lt;p&gt;Since Trino and its connectors have a large footprint of dependencies, there was
a high chance that some projects were not keeping up with the security manager
removal, although it was first deprecated with Java 17 in 2021.&lt;/p&gt;

&lt;p&gt;At this stage the Kafka, Kudu, and Phoenix connectors are affected. The Kafka
project is planning to make a new compatible release available in time and we
will adopt that version.&lt;/p&gt;

&lt;p&gt;The Kudu and Phoenix connectors however will be removed, since it is not
possible to use them with Java 24 as a requirement. Neither connector is heavily
used in our community, as we learned from our communication with numerous
users, integrators, and the results from our &lt;a href=&quot;/blog/2025/01/07/2024-and-beyond.html&quot;&gt;user survey&lt;/a&gt;. We are tracking progress for each removal in the
issues &lt;a href=&quot;https://github.com/trinodb/trino/issues/24419&quot;&gt;#24419 Phoenix connector&lt;/a&gt;
and &lt;a href=&quot;https://github.com/trinodb/trino/issues/24417&quot;&gt;#24417 Kudu connector&lt;/a&gt;. If
either of these communities ends up supporting Java 24, or a newer version as
required by Trino, in the future, we can potentially add the connectors back in
if community members contribute updated versions.&lt;/p&gt;

&lt;h2 id=&quot;release-plans&quot;&gt;Release plans&lt;/h2&gt;

&lt;p&gt;In terms of shipping the changes we follow our established pattern:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Clean up the codebase and get it ready, specifically this includes the removal
of the Kudu and Phoenix connectors.&lt;/li&gt;
  &lt;li&gt;Cut a release that is completely ready to be used with Java 24, but does not
yet make it a hard requirement.&lt;/li&gt;
  &lt;li&gt;Allow for community testing and feedback using Java 24.&lt;/li&gt;
  &lt;li&gt;Introduce Java 24 as hard requirement in another release.&lt;/li&gt;
  &lt;li&gt;Adopt Java 24 features and bring the benefits to our users with the
following releases.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As you see, there is a bunch of work waiting, so we better get back to it. As usual,
if you have questions or comments, chime in on the relevant issue or chat with
us on &lt;a href=&quot;/slack.html&quot;&gt;Trino Slack&lt;/a&gt; in the &lt;a href=&quot;https://trinodb.slack.com/messages/C07ABNN828M&quot;&gt;core-dev
channel&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Mateusz Gajewski</name>
        </author>
      

      <summary>Six months ago we adopted Java 23 as a requirement, following our standard procedure to upgrade to each Java version as soon as it becomes available. This allows us to take advantage of all the great improvements each release brings. The upgrade to 23 was pretty easy since the changes from 22 to 23 were not that big. The story turns out to be a bit different now with our upgrade to Java 24.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/coffee-24.png" />
      
    </entry>
  
    <entry>
      <title>71: Fake it real good</title>
      <link href="https://trino.io/episodes/71.html" rel="alternate" type="text/html" title="71: Fake it real good" />
      <published>2025-02-27T00:00:00+00:00</published>
      <updated>2025-02-27T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/71</id>
      <content type="html" xml:base="https://trino.io/episodes/71.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director/Open
Source Engineering at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt; -
&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://www.firebolt.io/&quot;&gt;Firebolt&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/janwas/&quot;&gt;Jan Waś&lt;/a&gt;, 
Software Engineer at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases&quot;&gt;Releases&lt;/h2&gt;

&lt;p&gt;Following are some highlights of the recent releases:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-471.html&quot;&gt;Trino 471&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add &lt;a href=&quot;https://trino.io/docs/current/functions/ai.html&quot;&gt;AI functions&lt;/a&gt; for textual
tasks on data using OpenAI, Anthropic, or other LLMs using Ollama as backend.&lt;/li&gt;
  &lt;li&gt;Add support for logging output to the console in JSON format (useful in containers).&lt;/li&gt;
  &lt;li&gt;Support additional Python libraries for use with Python user-defined functions.&lt;/li&gt;
  &lt;li&gt;Remove the RPM package.&lt;/li&gt;
  &lt;li&gt;Add &lt;a href=&quot;https://trino.io/docs/current/object-storage/file-system-local.html&quot;&gt;local file system support&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Add support for S3 Tables in Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trinodb.github.io/trino-gateway/release-notes/#14&quot;&gt;Trino Gateway 14&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our first Trino Gateway release of 2025 shipped, and it is packed with great new
features and fixes. Some examples are the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Rules editor in the web interface&lt;/li&gt;
  &lt;li&gt;Automatic database schema update and support for Oracle&lt;/li&gt;
  &lt;li&gt;Trino cluster monitoring with JMX and OpenMetrics&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-jan-waś&quot;&gt;Introducing Jan Waś&lt;/h2&gt;

&lt;p&gt;Jan, also known as &lt;a href=&quot;https://github.com/nineinchnick/&quot;&gt;nineinchnick on GitHub&lt;/a&gt;,
is a very active Trino contributor with a wide range of his own plugins and
projects. He is subproject maintainer for the Helm charts and the Grafana
plugin, and is heavily involved in GitHub actions setup and numerous other
efforts. Jan resides in Poland. When he is not working on Trino, you can find
him at metal, electronics, and even opera concerts across Europe or at home
playing video games.&lt;/p&gt;

&lt;h2 id=&quot;datafaker-faker-connector-and-trino&quot;&gt;Datafaker, Faker connector, and Trino&lt;/h2&gt;

&lt;p&gt;We talk about using simulated data from the TPC-H and TPC-DS connectors to learn
SQL and use it for other scenarios such as benchmarking, testing for SQL
support, and validating other connectors and data sources. This leads us to the
limitations of these connectors and how the Faker connector is the next step.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/logos/datafaker-small.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Jan tells us about the Datafaker library and his motivation to create a
connector, and how it eventually landed in Trino itself.&lt;/p&gt;

&lt;h2 id=&quot;demo-time&quot;&gt;Demo time&lt;/h2&gt;

&lt;p&gt;Jan shows us how to configure the connector and then demos a number of use
cases from learning SQL to populating and testing other data sources.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/faker.html&quot;&gt;Faker connector documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/ecosystem/data-source.html#datafaker&quot;&gt;Datafaker project&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/reports&quot;&gt;Trino reports repository&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/nineinchnick/&quot;&gt;Other project repositories from Jan&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/06/28/trino-fest-2023-starburst-recap.html&quot;&gt;Zero-cost reporting, presented at Trino Fest 2023&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Watch the &lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings&quot;&gt;recording of the Trino contributor call or read the
minutes&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Join us for upcoming events and let us know if you want to be a guest:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Community Broadcast 72: Keeping the lake clean, all about
&lt;a href=&quot;https://lakekeeper.io/&quot;&gt;Lakekeeper&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Trino Community Broadcast 73: Wrapping Trino packages with a bow&lt;/li&gt;
&lt;/ul&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>70: Previewing a new UI</title>
      <link href="https://trino.io/episodes/70.html" rel="alternate" type="text/html" title="70: Previewing a new UI" />
      <published>2025-02-13T00:00:00+00:00</published>
      <updated>2025-02-13T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/70</id>
      <content type="html" xml:base="https://trino.io/episodes/70.html">&lt;h2 id=&quot;host&quot;&gt;Host&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director/Open
Source Engineering at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt; -
&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/peter-kosztolanyi-5617938/&quot;&gt;Peter Kosztolanyi&lt;/a&gt;, 
Analytics Platform Lead at &lt;a href=&quot;https://wise.com/&quot;&gt;Wise&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases&quot;&gt;Releases&lt;/h2&gt;

&lt;p&gt;Following are some highlights of the Trino releases since episode 69:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-470.html&quot;&gt;Trino 470&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New DuckDB connector&lt;/li&gt;
  &lt;li&gt;New Grafana Loki connector&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WITH SESSION&lt;/code&gt; for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; queries&lt;/li&gt;
  &lt;li&gt;Raise minimum runtime requirement to Java 11 for JDBC driver and CLI&lt;/li&gt;
  &lt;li&gt;Remove Kinesis connector&lt;/li&gt;
  &lt;li&gt;Deprecate use of the legacy file system support for Azure Storage, Google
Cloud Storage, IBM Cloud Object Storage, S3 and S3-compatible object storage
systems - &lt;a href=&quot;/blog/2025/02/10/old-file-system.html&quot;&gt;check out the blog post&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well.&lt;/p&gt;

&lt;h2 id=&quot;introducing-peter-kosztolanyi&quot;&gt;Introducing Peter Kosztolanyi&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/koszti&quot;&gt;Peter Kosztolanyi&lt;/a&gt; is the Analytics Platform Lead at
&lt;a href=&quot;https://wise.com/&quot;&gt;Wise&lt;/a&gt; and he &lt;a href=&quot;https://youtu.be/K5RmYtbeXAc&quot;&gt;presented about their data
lake&lt;/a&gt; with Abdullah Alkhawatrah at &lt;a href=&quot;/blog/2024/12/18/trino-summit-2024-quick-recap.html&quot;&gt;Trino Summit
2024&lt;/a&gt;. Peter has a lot
of experience in the data and business intelligence fields.&lt;/p&gt;

&lt;p&gt;He also contributes to the Trino Python client, and worked on his own phone and
messaging app for iOS and Android in the past.&lt;/p&gt;

&lt;h2 id=&quot;trino-legacy-web-ui&quot;&gt;Trino legacy web UI&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;/docs/current/admin/web-interface.html&quot;&gt;existing main web UI for
Trino&lt;/a&gt; has been around
for a long time, and sees very limited development and maintenance. It lacks
documentation, a modern look, a clean codebase, and is inconsistent across
screens. It is also very technical and developer focused, and lacks features
like a SQL console to run queries.&lt;/p&gt;

&lt;h2 id=&quot;efforts-for-a-new-ui&quot;&gt;Efforts for a new UI&lt;/h2&gt;

&lt;p&gt;While we all knew about the problems of the old UI, nobody with enough UI coding
knowledge or time and motivation ever took up the banner to change the
situation. We did however get a great new UI contributed in Trino Gateway, and
that motivated some people in the community, especially Peter.&lt;/p&gt;

&lt;p&gt;Peter started with the same stack, pulled in maintainers like Mateusz Gajewski
and Manfred Moser, and kept working on improvements. We talk more about the
following aspects:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Problems with the old UI and its technology stack&lt;/li&gt;
  &lt;li&gt;Trino Gateway UI&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/22697&quot;&gt;Roadmap issue&lt;/a&gt; and discussion around the new UI&lt;/li&gt;
  &lt;li&gt;What is the stack now?&lt;/li&gt;
  &lt;li&gt;Look at the
&lt;a href=&quot;https://github.com/trinodb/trino/tree/master/core/trino-web-ui&quot;&gt;codebase&lt;/a&gt;,
tools, development, and
&lt;a href=&quot;/docs/current/admin/preview-web-interface.html&quot;&gt;documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Current status and next steps&lt;/li&gt;
  &lt;li&gt;What do we need from others?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;demo-time&quot;&gt;Demo time&lt;/h2&gt;

&lt;p&gt;Peter shows us the new UI from his development setup - the latest and greatest
set of features.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/admin/preview-web-interface.html&quot;&gt;Preview Web UI documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/tree/master/core/trino-web-ui&quot;&gt;Preview Web UI codebase&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/22697&quot;&gt;Roadmap issue&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/admin/web-interface.html&quot;&gt;Legacy Web UI documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Join us for upcoming events and let us know if you want to be the next guest.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino contributor call, 27th of February&lt;/li&gt;
  &lt;li&gt;Trino Community Broadcast 71 with Jan Waś about the new &lt;a href=&quot;/docs/current/connector/faker.html&quot;&gt;Faker
connector&lt;/a&gt;, 27th of
February&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Host</summary>

      
      
    </entry>
  
    <entry>
      <title>Out with the old file system</title>
      <link href="https://trino.io/blog/2025/02/10/old-file-system.html" rel="alternate" type="text/html" title="Out with the old file system" />
      <published>2025-02-10T00:00:00+00:00</published>
      <updated>2025-02-10T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2025/02/10/old-file-system</id>
      <content type="html" xml:base="https://trino.io/blog/2025/02/10/old-file-system.html">&lt;p&gt;What a long journey it has been! From the start Trino supported querying Hive
data and used libraries from the Hive and Hadoop ecosystem. With the release of
&lt;a href=&quot;/docs/current/release/release-470.html&quot;&gt;Trino 470&lt;/a&gt; we mark
another milestone to more features and better performance for data lake and
lakehouse querying with Trino. We deprecated the legacy file system support, and
will permanently remove it in an upcoming release.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;background&quot;&gt;Background&lt;/h2&gt;

&lt;p&gt;Trino always had a focus on performance and security. As a result we implemented
custom readers for file formats like Apache ORC and Apache Parquet many years
ago. We have also improved libraries for compression and decompression of files
from object storage, and implemented our own support for other table formats
with the Apache Iceberg, Delta Lake and Apache Hudi connectors.&lt;/p&gt;

&lt;p&gt;For the underlying object storage solutions and file systems, we originally
extended the libraries around the Hive system and added implementations for
Amazon S3, Azure Storage, Google Cloud Storage and others. Over time the
mismatch of the HDFS libraries and the cloud-centric usage with modern file
systems became more and more of a maintenance headache. It also represented an
unnecessary complexity overhead, resulted in performance problems, and forced us
to carry the Hadoop dependencies with all their baggage of old Java code and
security issues.&lt;/p&gt;

&lt;p&gt;In the end David Phillips, as our file system lead, decided in 2022 that it was
time to write our own file system support as needed for Trino. By summer of 2023
and with Trino 419 a &lt;a href=&quot;https://github.com/trinodb/trino/pull/17498&quot;&gt;first support for
S3&lt;/a&gt; became available for the
Iceberg and Delta Lake connectors. Over a year later in September 2024 and with
&lt;a href=&quot;/docs/current/release/release-458.html&quot;&gt;Trino 458&lt;/a&gt;, we declared
the old file system support on top of the Hadoop libraries legacy and advised
users to migrate.&lt;/p&gt;

&lt;p&gt;Since then you are required to declare what file system you want to enable in
each catalog with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fs.native-azure.enabled=true&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fs.native-gcs.enabled=true&lt;/code&gt;, or
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fs.native-s3.enabled=true&lt;/code&gt;. If you are truly using HDFS, or if you insist on
using the old legacy support you can also use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fs.hadoop.enabled=true&lt;/code&gt;.&lt;/p&gt;
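
&lt;p&gt;As a sketch, a minimal catalog properties file with the native S3 file system
enabled could look like the following. The catalog file name, the connector, and
the region value are placeholders for your own setup:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# etc/catalog/example.properties
connector.name=iceberg
fs.native-s3.enabled=true
s3.region=us-east-1
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Only one file system may be enabled per catalog, so pick the property that
matches your storage.&lt;/p&gt;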

&lt;h2 id=&quot;trino-470&quot;&gt;Trino 470&lt;/h2&gt;

&lt;p&gt;With the recent &lt;a href=&quot;/docs/current/release/release-470.html&quot;&gt;Trino 470
release&lt;/a&gt; from February
2025, we took the next step. All catalog configuration properties for using the
old, legacy support for accessing Azure Storage, Google Cloud Storage, S3, and
S3-compatible file systems are now &lt;strong&gt;deprecated&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;These properties include all names starting with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.azure&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.cos&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.gcs&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.s3&lt;/code&gt;. The result of this deprecation is that Trino emits
warnings during the startup for each of these properties in the server log.&lt;/p&gt;

&lt;p&gt;We also removed all documentation for the old properties, leaving only relevant
migration guides in place.&lt;/p&gt;

&lt;h2 id=&quot;next-steps&quot;&gt;Next steps&lt;/h2&gt;

&lt;p&gt;Within the next weeks or months we will completely remove all these properties
and the underlying code. We therefore renew our call out from numerous
contributor calls, Trino Community Broadcast episodes, and our Trino Fest and
Trino Summit events:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Stop using the old legacy file systems today.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you need help, have a look at the documentation for your connector, the file
system you use, and the migration guide for each file system:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/delta-lake.html&quot;&gt;Delta Lake connector&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/hive.html&quot;&gt;Hive connector&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/hudi.html&quot;&gt;Hudi connector&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/iceberg.html&quot;&gt;Iceberg connector&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/object-storage/file-system-azure.html&quot;&gt;Azure Storage file system support&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/object-storage/file-system-gcs.html&quot;&gt;Google Cloud Storage file system support&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/object-storage/file-system-s3.html&quot;&gt;S3 file system support&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The new systems are more stable and performant, and save you time and money.
Migrate today, and if you encounter any issues, or find that there are features
missing, ping us on &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt; and chime in on the
&lt;a href=&quot;https://github.com/trinodb/trino/issues/24878&quot;&gt;roadmap issue for the removal of the legacy file system
support&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, David Phillips, Mateusz Gajewski</name>
        </author>
      

      <summary>What a long journey it has been! From the start Trino supported querying Hive data and used libraries from the Hive and Hadoop ecosystem. With the release of Trino 470 we mark another milestone to more features and better performance for data lake and lakehouse querying with Trino. We deprecated the legacy file system support, and will permanently remove it in an upcoming release.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/hadoop-trashcan.png" />
      
    </entry>
  
    <entry>
      <title>69: Client protocol improvements</title>
      <link href="https://trino.io/episodes/69.html" rel="alternate" type="text/html" title="69: Client protocol improvements" />
      <published>2025-01-30T00:00:00+00:00</published>
      <updated>2025-01-30T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/69</id>
      <content type="html" xml:base="https://trino.io/episodes/69.html">&lt;h2 id=&quot;host&quot;&gt;Host&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director/Open
Source Engineering at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt; -
&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://www.firebolt.io/&quot;&gt;Firebolt&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/wendigo&quot;&gt;Mateusz Gajewski&lt;/a&gt;, Sr. Staff Software Engineer at 
&lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases&quot;&gt;Releases&lt;/h2&gt;

&lt;p&gt;Following are some highlights of the first release of 2025. It took us a bit longer to work through release blockers this time:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-469.html&quot;&gt;Trino 469&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FIRST&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AFTER&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LAST&lt;/code&gt; clauses to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER TABLE ...
ADD COLUMN&lt;/code&gt; for Iceberg, MySQL, and MariaDB.&lt;/li&gt;
  &lt;li&gt;SSE-C in S3 security mapping for Delta Lake, Hive, Hudi, and Iceberg&lt;/li&gt;
  &lt;li&gt;Allow configuration for Google Cloud Storage endpoint with object storage
connectors.&lt;/li&gt;
  &lt;li&gt;Allow connection validation and add more statistics for the JDBC driver.&lt;/li&gt;
  &lt;li&gt;Remove support for connector-level event listeners.&lt;/li&gt;
  &lt;li&gt;Misc improvements for the Faker connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well.&lt;/p&gt;

&lt;h2 id=&quot;other-news&quot;&gt;Other news&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Python client 0.332.0 with spooling support&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-23-jan-2025&quot;&gt;Trino contributor call&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-wendigo&quot;&gt;Introducing wendigo&lt;/h2&gt;

&lt;p&gt;What can we say? Top contributor and maintainer, and all-around hacker on Trino,
numerous Trino subprojects, Airlift, and beyond.&lt;/p&gt;

&lt;h2 id=&quot;main-topic&quot;&gt;Main topic&lt;/h2&gt;

&lt;p&gt;Let’s talk about the Trino client protocol. Following are some topics we cover:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;What is the client protocol for?&lt;/li&gt;
  &lt;li&gt;History of the client protocol&lt;/li&gt;
  &lt;li&gt;Available client drivers and client applications&lt;/li&gt;
  &lt;li&gt;Architecture and flow&lt;/li&gt;
  &lt;li&gt;Motivation to improve the protocol&lt;/li&gt;
  &lt;li&gt;Direct and spooling modes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Mateusz walks through the presentation and Cole and Manfred ask a lot of
questions:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; target=&quot;_blank&quot; href=&quot;/assets/episode/tcb69-client-protocol.pdf&quot;&gt;
        Presentation
    &lt;/a&gt;
&lt;/div&gt;

&lt;h2 id=&quot;demo-time&quot;&gt;Demo time&lt;/h2&gt;

&lt;p&gt;Mateusz shows us his example and testing setup with Starburst Galaxy clusters
configured for spooling protocol use, and shares some of the performance gains he
observes.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/episode/tcb69-client-protocol.pdf&quot;&gt;Presentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/client/client-protocol.html&quot;&gt;Client protocol documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/ecosystem/client-driver.html&quot;&gt;Available client drivers&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/ecosystem/client-application.html&quot;&gt;Available client applications&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Join us for upcoming events and let us know if you want to be the next guest.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Host</summary>

      
      
    </entry>
  
    <entry>
      <title>68: Year of the Snake - Python UDFs</title>
      <link href="https://trino.io/episodes/68.html" rel="alternate" type="text/html" title="68: Year of the Snake - Python UDFs" />
      <published>2025-01-16T00:00:00+00:00</published>
      <updated>2025-01-16T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/68</id>
      <content type="html" xml:base="https://trino.io/episodes/68.html">&lt;h2 id=&quot;host&quot;&gt;Host&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director/Open
Source Engineering and Trino maintainer at
&lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt; -
&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://www.firebolt.io/&quot;&gt;Firebolt&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/electrum&quot;&gt;David Phillips&lt;/a&gt;, Trino co-creator and maintainer&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases&quot;&gt;Releases&lt;/h2&gt;

&lt;p&gt;Following are some highlights of the Trino releases since episode 67:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-465.html&quot;&gt;Trino 465&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for a customer-provided SSE key in the S3 file system, relevant for
the Hive, Iceberg, Delta Lake, and Hudi connectors.&lt;/li&gt;
  &lt;li&gt;Add deterministic data, locale support, and a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;random_string&lt;/code&gt; function for the Faker
connector.&lt;/li&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;extra_properties&lt;/code&gt; in the Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Add support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;geometry&lt;/code&gt; type in the PostgreSQL connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-466.html&quot;&gt;Trino 466&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Remove Python requirement for Trino by replacing the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;launcher&lt;/code&gt; script.&lt;/li&gt;
  &lt;li&gt;Improve client protocol throughput by introducing the spooling protocol and
ship it with documentation, including implementation in the JDBC driver and
the CLI.&lt;/li&gt;
  &lt;li&gt;Add support for data access control with Apache Ranger, including support for
column masking, row filtering, and audit logging.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-467.html&quot;&gt;Trino 467&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Change default for internal communication to HTTP/1.1.&lt;/li&gt;
  &lt;li&gt;Add support for OpenTelemetry tracing to the HTTP, Kafka, and MySQL event
listeners.&lt;/li&gt;
  &lt;li&gt;Remove the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;microdnf&lt;/code&gt; package manager from the Docker image.&lt;/li&gt;
  &lt;li&gt;Add the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$all_manifests&lt;/code&gt; metadata tables in the Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Add the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$transactions&lt;/code&gt; metadata table in the Delta Lake connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-468.html&quot;&gt;Trino 468&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add &lt;a href=&quot;/docs/current/udf/python.html&quot;&gt;Python user-defined functions&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Rename SQL routines to SQL user-defined functions.&lt;/li&gt;
  &lt;li&gt;Add cluster overview to the Preview Web UI.&lt;/li&gt;
  &lt;li&gt;Improve bucket execution for Hive and Iceberg.&lt;/li&gt;
  &lt;li&gt;Add support for non-transactional &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; statements for PostgreSQL.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well.&lt;/p&gt;

&lt;h2 id=&quot;other-news&quot;&gt;Other news&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trinodb.github.io/trino-gateway/release-notes/#13&quot;&gt;Trino Gateway 13&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/12/18/trino-summit-2024-quick-recap.html&quot;&gt;Trino Summit recap&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2025/01/07/2024-and-beyond.html&quot;&gt;Trino in 2024 and beyond&lt;/a&gt;, answer
our survey!&lt;/li&gt;
  &lt;li&gt;December 2024 Trino maintainer and contributor calls took place virtually.&lt;/li&gt;
  &lt;li&gt;Trino Python client 0.332.0 includes support for the spooling mode of the
client protocol.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;user-defined-functions-in-trino&quot;&gt;User-defined functions in Trino&lt;/h2&gt;

&lt;p&gt;First there were &lt;a href=&quot;/docs/current/develop/functions.html&quot;&gt;custom plugins with user-defined
functions&lt;/a&gt;, and for a long
time, that was all there was.&lt;/p&gt;

&lt;p&gt;In 2023, David contributed SQL user-defined functions, also known as SQL
routines, and we ran a &lt;a href=&quot;/blog/2023/11/09/routines.html&quot;&gt;competition for examples&lt;/a&gt;. Manfred wrote the docs and did a &lt;a href=&quot;/blog/2023/11/29/sql-training-4.html&quot;&gt;training session with
Dain and Martin&lt;/a&gt;. And even back then,
David had plans to add other languages, and started working on Python.&lt;/p&gt;

&lt;p&gt;At &lt;a href=&quot;/blog/2024/12/18/trino-summit-2024-quick-recap.html&quot;&gt;Trino Summit 2024&lt;/a&gt;, Martin Traverso announced the upcoming feature in his keynote, and with
&lt;a href=&quot;/docs/current/release/release-468.html&quot;&gt;Trino 468&lt;/a&gt; we shipped
support for &lt;a href=&quot;/docs/current/udf/python.html&quot;&gt;Python user-defined functions&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;motivation&quot;&gt;Motivation&lt;/h2&gt;

&lt;p&gt;Why support Python for user-defined functions, as compared to just SQL? Simply
put, more is better, and Python is everywhere. We chat with David about the
details.&lt;/p&gt;

&lt;h2 id=&quot;development-history-and-collaboration&quot;&gt;Development history and collaboration&lt;/h2&gt;

&lt;p&gt;David tells us more about figuring out how to make it all work. He touches
on topics such as security, performance, deployment, monitoring, and
collaboration with other projects. We also talk about why other approaches, like
using local CPython, were discarded.&lt;/p&gt;

&lt;h2 id=&quot;architecture-and-consequences&quot;&gt;Architecture and consequences&lt;/h2&gt;

&lt;p&gt;In this discussion we try to cover the following topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;How does it all work?&lt;/li&gt;
  &lt;li&gt;What are some restrictions?&lt;/li&gt;
  &lt;li&gt;What performance can users expect?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s chat about this nesting:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/tcb68-python-udf-architecture.png&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;examples-and-demo&quot;&gt;Examples and demo&lt;/h2&gt;

&lt;p&gt;A simple example from the documentation:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;FUNCTION&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;python_udf_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input_parameter&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;data_type&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;RETURNS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result_data_type&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;LANGUAGE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PYTHON&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;handler&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;python_function&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;$$&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;python_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
  &lt;span class=&quot;err&quot;&gt;$$&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
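
&lt;p&gt;To make the template concrete, here is a minimal sketch of an inline Python UDF
that doubles an integer. The function name
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;doubled&lt;/code&gt; and the handler
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;twice&lt;/code&gt; are illustrative names,
not part of any shipped example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;WITH
  FUNCTION doubled(x integer)
    RETURNS integer
    LANGUAGE PYTHON
    WITH (handler = 'twice')
    AS $$
    def twice(a):
        return a * 2
    $$
SELECT doubled(21);
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The Python code between the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$$&lt;/code&gt;
markers runs in the sandboxed environment, and the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;handler&lt;/code&gt; property names the
Python function Trino invokes for each row.&lt;/p&gt;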

&lt;p&gt;David shows us more, and we talk about the details.&lt;/p&gt;

&lt;h2 id=&quot;feedback-and-future-work&quot;&gt;Feedback and future work&lt;/h2&gt;

&lt;p&gt;We are looking for feedback:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;More examples for the documentation for our users&lt;/li&gt;
  &lt;li&gt;Use cases and experience testing the feature&lt;/li&gt;
  &lt;li&gt;Production deployment experiences&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Future work depends on the feedback but definitely includes the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Performance improvements&lt;/li&gt;
  &lt;li&gt;Fine-tuning of available Python packages&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.python.org/&quot;&gt;Python&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://webassembly.org/&quot;&gt;WebAssembly (Wasm)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://chicory.dev/&quot;&gt;Chicory&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/udf.html&quot;&gt;Trino user-defined functions overview&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/udf/python.html&quot;&gt;Python user-defined functions&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-wasm-python&quot;&gt;trino-wasm-python&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;You are all invited to chat with us about development at the Trino contributor
call on the 23rd of January.&lt;/li&gt;
  &lt;li&gt;Join us on the 30th of January with Mateusz Gajewski to learn about client
protocol improvements.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Host</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino in 2024 and beyond</title>
      <link href="https://trino.io/blog/2025/01/07/2024-and-beyond.html" rel="alternate" type="text/html" title="Trino in 2024 and beyond" />
      <published>2025-01-07T00:00:00+00:00</published>
      <updated>2025-01-07T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2025/01/07/2024-and-beyond</id>
      <content type="html" xml:base="https://trino.io/blog/2025/01/07/2024-and-beyond.html">&lt;p&gt;Wow, what an amazing year 2024 was for Trino! Martin Traverso presented about
the achievements and progress of the project at the &lt;a href=&quot;/blog/2024/12/18/trino-summit-2024-quick-recap.html&quot;&gt;recent Trino Summit
2024&lt;/a&gt;. Let me dive
deeper into the content of his keynote and elaborate some more about our amazing
plans for the future.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;statistics&quot;&gt;Statistics&lt;/h2&gt;

&lt;p&gt;In the first slide of his presentation &lt;strong&gt;Enduring with persistence to reach the
summit&lt;/strong&gt;, Martin presented some of the amazing statistics of the year:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Over 30 releases packed with features and improvements - &lt;a href=&quot;/docs/current/release.html#releases-2024&quot;&gt;Trino releases 436-467&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;5,000+ additional commits to the 40,000+ total commits since project start&lt;/li&gt;
  &lt;li&gt;225+ unique contributors in 2024, 925+ total&lt;/li&gt;
  &lt;li&gt;10.5k+ stars on GitHub&lt;/li&gt;
  &lt;li&gt;13,500+ Slack members&lt;/li&gt;
  &lt;li&gt;Trino Community Broadcast episodes 54-67&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;improvements&quot;&gt;Improvements&lt;/h2&gt;

&lt;p&gt;Some of the major improvements in Trino are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Access controls with
&lt;a href=&quot;/docs/current/security/opa-access-control.html&quot;&gt;Open Policy Agent&lt;/a&gt; and
&lt;a href=&quot;/docs/current/security/ranger-access-control.html&quot;&gt;Apache Ranger&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Improved observability with &lt;a href=&quot;/docs/current/admin/event-listeners-openlineage.html&quot;&gt;OpenLineage&lt;/a&gt;, 
&lt;a href=&quot;/docs/current/admin/opentelemetry.html&quot;&gt;OpenTelemetry&lt;/a&gt;, OpenMetrics, and 
&lt;a href=&quot;/docs/current/admin/event-listeners-kafka.html&quot;&gt;Kafka&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Significant &lt;a href=&quot;/docs/current/client/client-protocol.html&quot;&gt;client protocol&lt;/a&gt; improvements&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/udf/python.html&quot;&gt;Python user-defined functions&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;New connectors such as &lt;a href=&quot;/docs/current/connector/faker.html&quot;&gt;Faker&lt;/a&gt;,
&lt;a href=&quot;/docs/current/connector/snowflake.html&quot;&gt;Snowflake&lt;/a&gt;, or
&lt;a href=&quot;/docs/current/connector/vertica.html&quot;&gt;Vertica&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Numerous improvements on object storage connectors and integrations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of course we also paid a lot of attention to bug fixes and shipped tremendous
performance improvements.&lt;/p&gt;

&lt;h2 id=&quot;slides-and-video&quot;&gt;Slides and video&lt;/h2&gt;

&lt;p&gt;If you want to find out all the details, have a look at the
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-keynote.pdf&quot;&gt;&lt;strong&gt;slides&lt;/strong&gt;&lt;/a&gt;
and the video recording:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=wmR6kzOCo-I&quot;&gt;&lt;img src=&quot;https://img.youtube.com/vi/wmR6kzOCo-I/0.jpg&quot; alt=&quot;YouTube&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;other-projects&quot;&gt;Other projects&lt;/h2&gt;

&lt;p&gt;Martin also talked about the many improvements in other Trino projects such as
&lt;a href=&quot;https://trinodb.github.io/trino-gateway/&quot;&gt;Trino Gateway&lt;/a&gt;,
&lt;a href=&quot;https://github.com/trinodb/trino-python-client&quot;&gt;trino-python-client&lt;/a&gt;, the new
&lt;a href=&quot;https://github.com/trinodb/trino-js-client&quot;&gt;trino-js-client&lt;/a&gt;, and the new
&lt;a href=&quot;https://github.com/trinodb/trino-csharp-client&quot;&gt;trino-csharp-client&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;plans-for-2025&quot;&gt;Plans for 2025&lt;/h2&gt;

&lt;p&gt;For 2025, we have some pretty big plans in addition to our continued attention to
the software supply chain, performance improvements, and bug fixes.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Secrets management and dynamic catalogs&lt;/li&gt;
  &lt;li&gt;Client protocol improvements for all client drivers&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/22597&quot;&gt;Packaging improvements&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;More connectors such as DuckDB, LanceDB, HSQLDB, Loki, …&lt;/li&gt;
  &lt;li&gt;Continued and even increased work on performance improvements&lt;/li&gt;
  &lt;li&gt;Research and prototype towards a next generation optimizer&lt;/li&gt;
  &lt;li&gt;SQL language improvements such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PIVOT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ASOF&lt;/code&gt; joins, …&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of course, what really happens with Trino in 2025 depends on you all. The project
lives and breathes only thanks to the efforts of all our contributors and
maintainers, and we look forward to working with you all.&lt;/p&gt;

&lt;h2 id=&quot;trino-survey&quot;&gt;Trino survey&lt;/h2&gt;

&lt;p&gt;Besides filing issues, sending pull requests, and discussing topics on Slack and
GitHub, we also have some specific questions and would really appreciate your
feedback. Answering should take less than a minute.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; target=&quot;_blank&quot; href=&quot;https://docs.google.com/forms/d/e/1FAIpQLSfrEIZ_5iyj17_hMJMdFhCIx9bQyHm6G-x6-CIq2VajURm6cQ/viewform?usp=sharing&quot;&gt;
        Help by answering the Trino survey
    &lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;With Trino as a huge collaborative effort, only one thing is certain:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;2025 will be an exciting year for Commander Bun Bun, Trino, and the Trino project.&lt;/p&gt;
&lt;/blockquote&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Wow, what an amazing year 2024 was for Trino! Martin Traverso presented about the achievements and progress of the project at the recent Trino Summit 2024. Let me dive deeper into the content of his keynote and elaborate some more about our amazing plans for the future.</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino Summit 2024 resources</title>
      <link href="https://trino.io/blog/2024/12/18/trino-summit-2024-quick-recap.html" rel="alternate" type="text/html" title="Trino Summit 2024 resources" />
      <published>2024-12-18T00:00:00+00:00</published>
      <updated>2024-12-18T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/12/18/trino-summit-2024-quick-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2024/12/18/trino-summit-2024-quick-recap.html">&lt;p&gt;What a view we had at the summit! Over 700 live attendees enjoyed the sessions
and learned more about Trino-related use cases and projects. Now it is time for
the additional 1,000 registrants, our 13,000+ Trino users on
&lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt;, and everyone else in the Trino community
and beyond to enjoy the presentations and recordings at their leisure.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;day-1-sessions&quot;&gt;Day 1 sessions&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Enduring with persistence to reach the summit&lt;/strong&gt;
&lt;br /&gt;   Presented by Martin Traverso, co-creator of Trino and CTO at &lt;a href=&quot;/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/wmR6kzOCo-I&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-keynote.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Running Trino as exabyte-scale data warehouse&lt;/strong&gt;
&lt;br /&gt;   Presented by Alagappan Maruthappan from &lt;a href=&quot;/users.html#netflix&quot;&gt;Netflix&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/WuUS73QPuZE&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-netflix.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Data lake at Wise powered by Trino and Iceberg&lt;/strong&gt;
&lt;br /&gt;   Presented by Peter Kosztolanyi and Abdullah Alkhawatrah from &lt;a href=&quot;https://wise.com&quot;&gt;Wise&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/K5RmYtbeXAc&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Using Trino as a strangler fig&lt;/strong&gt;
&lt;br /&gt;   Presented by Trevor Kennedy from &lt;a href=&quot;https://www.fanduel.com/&quot;&gt;Fanduel&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/cVA5IPWdHRs&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-fanduel.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;A lakehouse that simply works&lt;/strong&gt;
&lt;br /&gt;   Presented by Vincenzo Cassaro from &lt;a href=&quot;https://prezi.com/&quot;&gt;Prezi&lt;/a&gt; 
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/6xdPRqpA8FA&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-prezi.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Empowering self-serve data analytics with a text-to-SQL assistant at LinkedIn&lt;/strong&gt; 
&lt;br /&gt;   Presented by Gaurav Ahlawat, Albert Chen, and Manas Bundele from
&lt;a href=&quot;/users.html#linkedin&quot;&gt;LinkedIn&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/rl4GLNEVkjo&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-linkedin-ai.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;How Trino and dbt unleashed many-to-many interoperability at Bazaar&lt;/strong&gt;
&lt;br /&gt;   Presented by Shahzad Siddiqi, Siddique Ahmad, and Usman Ghani from
  &lt;a href=&quot;/users.html#bazaar_technologies&quot;&gt;Bazaar&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/G9jafHdH8FY&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-bazaar.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Maximizing cost efficiency in data analytics with Trino and Iceberg&lt;/strong&gt;
&lt;br /&gt;   Presented by Gopi Bhagavathula from &lt;a href=&quot;https://www.branch.io/&quot;&gt;Branch&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/Yaz7fwvOPdY&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-branch.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Lessons and news from the AI world for Trino&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Manfred Moser, panel moderator and Trino maintainer at &lt;a href=&quot;/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Gunther Hagleitner, CEO and Co-founder at &lt;a href=&quot;https://waii.ai/&quot;&gt;Waii&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Rong Rong, Software Engineer at &lt;a href=&quot;https://character.ai/&quot;&gt;CharacterAI&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;William Chang, Co-founder and CTO of &lt;a href=&quot;/users.html#canner&quot;&gt;Canner&lt;/a&gt; and
&lt;a href=&quot;/ecosystem/client-application.html#wren-ai&quot;&gt;WrenAI&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Mustafa Sakalsiz, Founder and CEO at &lt;a href=&quot;/users.html#peaka&quot;&gt;Peaka&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dain Sundstrom, Trino co-creator and CTO at &lt;a href=&quot;/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/gobl6PhIWeE&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;day-2-sessions&quot;&gt;Day 2 sessions&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Trino for observability at Intuit&lt;/strong&gt; 
&lt;br /&gt;   Presented by Ujjwal Sharma and Riya John from &lt;a href=&quot;https://www.intuit.com/&quot;&gt;Intuit&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/47dMrURt7us&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-intuit.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Hassle-free dynamic policy enforcement in Trino&lt;/strong&gt;
&lt;br /&gt;   Presented by Ramanathan Ramu and Pratham Desai from &lt;a href=&quot;/users.html#linkedin&quot;&gt;LinkedIn&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/GAudNEmbvsc&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-linkedin-policy.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Empowering HugoBank’s digital services through Trino&lt;/strong&gt;
&lt;br /&gt;   Presented by Mustafa Mirza and Razi Moosa from &lt;a href=&quot;https://www.hugobank.com.pk&quot;&gt;HugoBank&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/51JVd25behQ&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-hugobank.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Optimizing Trino on Kubernetes: Helm chart enhancements for resilience and security&lt;/strong&gt; 
&lt;br /&gt;   Presented by Sebastian Daberdaku from &lt;a href=&quot;https://cardoai.com&quot;&gt;CardoAI&lt;/a&gt; and
Jan Waś from &lt;a href=&quot;/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/MGuOf45cGwA&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-cardoai.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Virtual view hierarchies with Trino&lt;/strong&gt;
&lt;br /&gt;   Presented by Rob Dickinson from &lt;a href=&quot;https://graylog.org/&quot;&gt;Graylog&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/z8eh_3vBpvg&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-graylog.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Opening up the Trino Gateway&lt;/strong&gt;
&lt;br /&gt;   Presented by Manfred Moser and Will Morrison from &lt;a href=&quot;/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;, 
&lt;br /&gt;   Vishal Jadhav from &lt;a href=&quot;https://www.bloomberg.com/company/values/tech-at-bloomberg/&quot;&gt;Bloomberg&lt;/a&gt;, and Jaehoo Yoo from &lt;a href=&quot;/users.html#naver&quot;&gt;Naver&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/MiQEngRJk8g&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-trino-gateway.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Wvlet: A new flow-style query language for functional data modeling and interactive analysis&lt;/strong&gt;
&lt;br /&gt;   Presented by Taro L. Saito from &lt;a href=&quot;/users.html#treasuredata&quot;&gt;Treasure Data&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/ot7z7J6h9rM&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-wvlet.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;
&lt;p&gt;&lt;strong&gt;Securing data pipelines at the storage layer&lt;/strong&gt;
&lt;br /&gt;   Presented by Andrew MacKay from &lt;a href=&quot;https://superna.io/&quot;&gt;Superna&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/Lxr4Rzn27cw&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-superna.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;
&lt;p&gt;&lt;strong&gt;Empowering pharmaceutical drug launches with Trino-powered sales data analytics&lt;/strong&gt;
&lt;br /&gt;   Presented by Harpreet Singh from &lt;a href=&quot;https://www.gilead.com/&quot;&gt;Gilead&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/ELsBGx1Sv3o&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;
&lt;p&gt;&lt;strong&gt;Connecting to Trino with C# and ADO.net&lt;/strong&gt; 
&lt;br /&gt;   Presented by George Fischer from &lt;a href=&quot;https://www.microsoft.com&quot;&gt;Microsoft&lt;/a&gt;
&lt;br /&gt;   &lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://youtu.be/x2rF6IEjFK0&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2024/trino-summit-2024-csharp-client.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our thanks go out to all our speakers as well as our event sponsor:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/users.html#starburst&quot;&gt;
&lt;img src=&quot;/assets/images/logos/starburst.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;See you at Trino Fest 2025, one of our &lt;a href=&quot;/community.html#events&quot;&gt;other events and
meetings&lt;/a&gt;, and on &lt;a href=&quot;/slack.html&quot;&gt;Trino
Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred, Monica, and Anna&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Monica Miller, Anna Schibli</name>
        </author>
      

      <summary>What a view we had at the summit! Over 700 live attendees enjoyed the sessions and learned more about Trino-related use cases and projects. Now it is time for the additional 1000 registrants, our 13000+ Trino users on Slack, and everyone else in the Trino community and beyond to enjoy the presentations and recordings at their leisure.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2024/recap-blog-banner.png" />
      
    </entry>
  
    <entry>
      <title>The long journey to Apache Ranger</title>
      <link href="https://trino.io/blog/2024/12/02/ranger.html" rel="alternate" type="text/html" title="The long journey to Apache Ranger" />
      <published>2024-12-02T00:00:00+00:00</published>
      <updated>2024-12-02T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/12/02/ranger</id>
      <content type="html" xml:base="https://trino.io/blog/2024/12/02/ranger.html">&lt;p&gt;&lt;a href=&quot;/ecosystem/add-on.html#apache-ranger&quot;&gt;Apache Ranger&lt;/a&gt; has
arrived! With the new &lt;a href=&quot;/docs/current/release/release-466.html&quot;&gt;Trino
466&lt;/a&gt; you all get another
jam-packed release of Trino awesomeness. One of the goodies is a new plugin for
access control for your data with Apache Ranger, and it has gone through a long
story to get here.&lt;/p&gt;

&lt;p&gt;Apache Ranger has a long history and wide adoption as an access control system
for data lakes using Hadoop and Hive. Since Trino brings fast analytics to this
space, and also supports modern data lakehouses and other data sources, Apache
Ranger is a natural fit for access control on a Trino-powered data platform.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;the-beginnings&quot;&gt;The beginnings&lt;/h2&gt;

&lt;p&gt;Apache Ranger has been in use with Trino for a long time - in fact there are
&lt;a href=&quot;https://github.com/trinodb/trino/pull/244&quot;&gt;early&lt;/a&gt;,
&lt;a href=&quot;https://github.com/trinodb/trino/pull/1069&quot;&gt;rudimentary&lt;/a&gt; pull requests from
2019 that implemented some support. And even before then, various hacks existed.
In 2020, a plugin for PrestoSQL was added to Apache Ranger. Aakash Nand blogged
about &lt;a href=&quot;https://towardsdatascience.com/integrating-trino-and-apache-ranger-b808f6b96ad8&quot;&gt;Integrating Trino and Apache
Ranger&lt;/a&gt;
in 2021 to adjust for the changes to Trino. Jeff Xu followed up with
&lt;a href=&quot;https://medium.com/@jeff.xu.z/integrating-trino-and-apache-ranger-in-a-kerberos-secured-enterprise-environment-997c95cd10e9&quot;&gt;Integrating Trino and Apache Ranger in a Kerberos-secured enterprise
environment&lt;/a&gt;
in 2022, followed quickly by the addition of Trino support to the Apache
Ranger repository.&lt;/p&gt;

&lt;h2 id=&quot;testing-and-container-images&quot;&gt;Testing and container images&lt;/h2&gt;

&lt;p&gt;However, that was only half of the needed support. The Trino project moves
very fast, with nearly weekly releases, so the best approach is to have the
supporting plugin in Trino directly, ensuring every release includes the
relevant updates. &lt;a href=&quot;https://github.com/dprophet&quot;&gt;Erik
Anderson&lt;/a&gt; created a more mature plugin that was in
production use with Trino for quite a while. His &lt;a href=&quot;https://github.com/trinodb/trino/pull/13297&quot;&gt;pull request from July
2022&lt;/a&gt; included great background
reasoning for having the plugin in Trino. One of the issues that Erik solved
for the Trino project is testing: Trino plugins require a container image to
test the integration. Apache Ranger still did not ship a container image in
2022, but thanks to Erik’s lobbying efforts this changed, and an image became
available over the following months.&lt;/p&gt;

&lt;h2 id=&quot;a-long-sprint&quot;&gt;A long sprint&lt;/h2&gt;

&lt;p&gt;Unfortunately, focus changed, and while the PR from Erik existed and was
useful, it never made it to merge due to waning priorities. That changed when
&lt;a href=&quot;https://github.com/mneethiraj&quot;&gt;Madhan Neethiraj&lt;/a&gt; from the Apache
Ranger project stepped up and created a &lt;a href=&quot;https://github.com/trinodb/trino/pull/22675&quot;&gt;new PR&lt;/a&gt; in July 2024.&lt;/p&gt;

&lt;p&gt;We knew this was another shot at getting it in, and that it would require a
lot of work, since we put a high focus on quality so that we can maintain the
Trino codebase for the long run. While monitoring all PRs regularly, &lt;a href=&quot;https://github.com/mosabua&quot;&gt;I (Manfred
Moser)&lt;/a&gt; noticed it and jumped in to help.&lt;/p&gt;

&lt;p&gt;Erik and other interested users chimed in.
&lt;a href=&quot;https://github.com/lozbrown&quot;&gt;lozbrown&lt;/a&gt; and Manfred helped with documentation
and getting other developers interested. The heavy technical reviews and lots of
guidance came from &lt;a href=&quot;https://github.com/ksobolew&quot;&gt;Krzysztof Sobolewski&lt;/a&gt; and
&lt;a href=&quot;https://github.com/kokosing&quot;&gt;Grzegorz Kokosiński&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;During the whole process, Madhan had to react to comments, update the code, and
also regularly rebase his PR to adjust for the constantly changing Trino
codebase in the master branch. Starburst recognized Madhan’s effort and
&lt;a href=&quot;https://www.starburst.io/community/trino-champions/&quot;&gt;featured him as Starburst Trino
Champion&lt;/a&gt;. Interestingly,
the container image ended up not being used for testing; however, it remains
crucially important for the many users deploying Apache Ranger on Kubernetes.
Nearly 400 comments and over four months later, we all got to celebrate. The
Trino maintainer Grzegorz took on the responsibility and merged the PR. &lt;a href=&quot;https://github.com/ebyhr&quot;&gt;Yuya
Ebihara&lt;/a&gt; and &lt;a href=&quot;https://github.com/martint&quot;&gt;Martin
Traverso&lt;/a&gt; followed up with
&lt;a href=&quot;https://github.com/trinodb/trino/pull/24238&quot;&gt;minor&lt;/a&gt;
&lt;a href=&quot;https://github.com/trinodb/trino/pull/24252&quot;&gt;cleanups&lt;/a&gt;, and we finally shipped
the plugin as part of &lt;a href=&quot;/docs/current/release/release-466.html&quot;&gt;Trino
466&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;A huge congratulations and thank you goes out to everyone involved.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now it is your turn to have a look at the
&lt;a href=&quot;/docs/current/security/apache-ranger-access-control.html&quot;&gt;documentation&lt;/a&gt;,
learn more about Trino and Apache Ranger, and maybe even proceed to help us
improve the integration.&lt;/p&gt;

&lt;h2 id=&quot;next-steps&quot;&gt;Next steps&lt;/h2&gt;

&lt;p&gt;Beyond our celebration, more tasks are waiting for all of us:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Test it out in your environment and migrate from any old or custom versions.&lt;/li&gt;
  &lt;li&gt;Help us improve the
&lt;a href=&quot;/docs/current/security/apache-ranger-access-control.html&quot;&gt;documentation&lt;/a&gt;
significantly to allow easier adoption.&lt;/li&gt;
  &lt;li&gt;Work with lozbrown on adding support to the &lt;a href=&quot;https://github.com/trinodb/charts&quot;&gt;Helm chart&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Check out the codebase and help us fix bugs and add features.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And last but not least, join us all to celebrate Trino at the upcoming &lt;a href=&quot;/blog/2024/11/22/trino-summit-2024-lineup.html&quot;&gt;Trino
Summit 2024 for two days of amazing sessions and interaction with your peers
from the Trino community&lt;/a&gt;
and the &lt;a href=&quot;/community.html#events&quot;&gt;Trino Contributor Call&lt;/a&gt; for
more open community chat and discussion.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Apache Ranger has arrived! With the new Trino 466 you all get another jam-packed release of Trino awesomeness. One of the goodies is a new plugin for access control for your data with Apache Ranger, and it has gone through a long story to get here. Apache Ranger has a long history and wide adoption as an access control system for data lakes using Hadoop and Hive. Since Trino brings fast analytics to this space, and also supports modern data lakehouses and other data sources, Apache Ranger is a natural fit for access control on a Trino-powered data platform.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/apache-ranger.png" />
      
    </entry>
  
    <entry>
      <title>The glorious lineup for Trino Summit 2024</title>
      <link href="https://trino.io/blog/2024/11/22/trino-summit-2024-lineup.html" rel="alternate" type="text/html" title="The glorious lineup for Trino Summit 2024" />
      <published>2024-11-22T00:00:00+00:00</published>
      <updated>2024-11-22T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/11/22/trino-summit-2024-lineup</id>
      <content type="html" xml:base="https://trino.io/blog/2024/11/22/trino-summit-2024-lineup.html">&lt;p&gt;We just wrapped up our mini training series &lt;a href=&quot;/blog/2024/11/21/sql-basecamps-view.html&quot;&gt;SQL basecamps before Trino
Summit&lt;/a&gt;, and now Trino Summit 2024
is less than three busy weeks away. It’s a good thing that we have also been
working hard on all the preparations for the summit. Everything is coming
together, and we are excited to share the full lineup for the free, virtual,
two-day event today.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;In &lt;a href=&quot;/blog/2024/10/17/trino-summit-2024-tease.html&quot;&gt;our first glimpse at the summit&lt;/a&gt; we shared a few sessions in
more detail. Now have a look at the whole lineup, with speakers from these
and many other companies:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2024/summit-wall.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Make sure you register to get up-to-date information and more details for all
the sessions. Registration allows you to join us live and chat with the
speakers during the event. You will also get important follow-up information,
including when recordings and slide decks become available, so you can review
sessions, watch anything you missed, and share them with your peers.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; target=&quot;_blank&quot; href=&quot;https://www.starburst.io/info/trino-summit-2024/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;[…]mpaign=NORAM-FY25-Q4-CM-Trino-Summit-2024&amp;amp;utm_content=blog-3&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;

&lt;h2 id=&quot;keynote&quot;&gt;Keynote&lt;/h2&gt;

&lt;p&gt;In the keynote &lt;strong&gt;Enduring with persistence to reach the summit&lt;/strong&gt; Martin
Traverso, co-creator of Trino and CTO at
&lt;a href=&quot;/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;, covers the developments from
2024 in the Trino projects and the Trino community. Martin also reveals details
about new features, new projects, and plans for 2025.&lt;/p&gt;

&lt;h2 id=&quot;panel-discussion&quot;&gt;Panel discussion&lt;/h2&gt;

&lt;p&gt;The hype and reality of AI have swept through the industry. In the panel
discussion &lt;strong&gt;Lessons and news from the AI world for Trino&lt;/strong&gt;, Manfred Moser is
moderating experts from the community:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Gunther Hagleitner, CEO and Co-founder at &lt;a href=&quot;https://waii.ai/&quot;&gt;Waii&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Rong Rong, Software Engineer at &lt;a href=&quot;https://character.ai/&quot;&gt;CharacterAI&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;William Chang, Co-founder and CTO of &lt;a href=&quot;/users.html#canner&quot;&gt;Canner&lt;/a&gt; and
&lt;a href=&quot;/ecosystem/client#wren-ai&quot;&gt;WrenAI&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Mustafa Sakalsiz, Founder and CEO at &lt;a href=&quot;/users.html#peaka&quot;&gt;Peaka&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dain Sundstrom, Trino co-creator and CTO at &lt;a href=&quot;/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All panelists have extensive experience with AI and Trino, and will share
their knowledge and different perspectives.&lt;/p&gt;

&lt;h2 id=&quot;sessions&quot;&gt;Sessions&lt;/h2&gt;

&lt;p&gt;The following sessions allow our speakers to really dig into the details of
their topic:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Optimizing Trino on Kubernetes: Helm chart enhancements for resilience and
security&lt;/strong&gt; presented by Sebastian Daberdaku from
&lt;a href=&quot;https://cardoai.com/&quot;&gt;CardoAI&lt;/a&gt; and Jan Waś from
&lt;a href=&quot;/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Trino for Observability at Intuit&lt;/strong&gt; presented by Ujjwal Sharma and Riya John
from &lt;a href=&quot;https://www.intuit.com/&quot;&gt;Intuit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Opening up the Trino Gateway&lt;/strong&gt; presented by the Trino Gateway maintainers&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Data Lake at Wise powered by Trino and Iceberg&lt;/strong&gt; presented by Peter
Kosztolanyi and Abdallah Alkhawatrah from &lt;a href=&quot;https://wise.com&quot;&gt;Wise&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Hassle-free dynamic policy enforcement in Trino&lt;/strong&gt; presented by Ramanathan
Ramu and Pratham Desai from &lt;a href=&quot;/users.html#linkedin&quot;&gt;LinkedIn&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Empowering self-serve data analytics with a text-to-SQL assistant at
LinkedIn&lt;/strong&gt; presented by Gaurav Ahlawat, Albert Chen, and Manas Bundele from
&lt;a href=&quot;/users.html#linkedin&quot;&gt;LinkedIn&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;A Lakehouse that simply works&lt;/strong&gt; presented by Vincenzo Cassaro from
  &lt;a href=&quot;https://prezi.com/&quot;&gt;Prezi&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Securing data pipelines at the storage layer&lt;/strong&gt; presented by Andrew MacKay
from &lt;a href=&quot;https://superna.io/&quot;&gt;Superna&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Maximizing cost efficiency in data analytics with Trino and Iceberg&lt;/strong&gt;
presented by Gopi Bhagavathula from &lt;a href=&quot;https://www.branch.io/&quot;&gt;Branch&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Wvlet: A new flow-style query language for functional data modeling and
interactive analysis&lt;/strong&gt; presented by Taro L. Saito from &lt;a href=&quot;/users.html#treasuredata&quot;&gt;Treasure
Data&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Running Trino as exabyte-scale data warehouse&lt;/strong&gt; presented by Alagappan
Maruthappan from &lt;a href=&quot;/users.html#netflix&quot;&gt;Netflix&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;lightning-talks&quot;&gt;Lightning talks&lt;/h2&gt;

&lt;p&gt;Our lightning talks provide inspiration with some great examples of Trino
adoption and usage:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Using Trino as a strangler fig&lt;/strong&gt; presented by Trevor Kennedy from
&lt;a href=&quot;https://www.fanduel.com/&quot;&gt;Fanduel&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Virtual view hierarchies with Trino&lt;/strong&gt; presented by Rob Dickinson from
&lt;a href=&quot;https://graylog.org/&quot;&gt;Graylog&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Empowering HugoBank’s digital services through Trino&lt;/strong&gt; presented by Mustafa
Mirza and Razi Moosa from &lt;a href=&quot;https://www.hugobank.com.pk&quot;&gt;HugoBank&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;How Trino and dbt unleashed many-to-many interoperability at Bazaar&lt;/strong&gt;
presented by Shahzad Siddiqi, Siddique Ahmad, and Usman Ghani from
&lt;a href=&quot;/users.html#bazaar_technologies&quot;&gt;Bazaar&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Connecting to Trino with C# and ADO.net&lt;/strong&gt; presented by George Fischer from
&lt;a href=&quot;https://www.microsoft.com&quot;&gt;Microsoft&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our special thanks go out to all our speakers as well as our event sponsor:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/users.html#starburst&quot;&gt;
&lt;img src=&quot;/assets/images/logos/starburst.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;See you on the summit.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred, Monica, and Anna&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Monica Miller, Anna Schibli</name>
        </author>
      

      <summary>We just wrapped up our mini training series SQL basecamps before Trino Summit, and now Trino Summit 2024 is less than three busy weeks away. It’s a good thing that we have also been working hard on all the preparations for the summit. Everything is coming together, and we are excited to share the full lineup for the free, virtual, two-day event today.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2024/lineup-blog-banner.png" />
      
    </entry>
  
    <entry>
      <title>View the SQL basecamps before Trino Summit</title>
      <link href="https://trino.io/blog/2024/11/21/sql-basecamps-view.html" rel="alternate" type="text/html" title="View the SQL basecamps before Trino Summit" />
      <published>2024-11-21T00:00:00+00:00</published>
      <updated>2024-11-21T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/11/21/sql-basecamps-view</id>
      <content type="html" xml:base="https://trino.io/blog/2024/11/21/sql-basecamps-view.html">&lt;p&gt;Trino Summit is inching closer fast, and we are busy with all the preparation.
Nevertheless, we thought we would bring you some more SQL and Trino-related training.
The two live classes from our &lt;a href=&quot;/blog/2024/10/07/sql-basecamps.html&quot;&gt;SQL basecamps before Trino Summit&lt;/a&gt; are now available for you all to enjoy, just in
case you missed them.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;For the two classes I teamed up with Dain Sundstrom and Martin Traverso to
create interview-style training sessions. Hopefully you learned something from
their insights, and my guidance and questions.&lt;/p&gt;

&lt;p&gt;Check out the two session recordings and the supporting material:&lt;/p&gt;

&lt;h2 id=&quot;moving-supplies&quot;&gt;Moving supplies&lt;/h2&gt;

&lt;p&gt;In the first episode &lt;strong&gt;SQL basecamp 1 – Moving supplies&lt;/strong&gt; Dain and I discussed
the core concepts of a Trino-powered lakehouse, getting data in and maintaining
the lakehouse.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
  &lt;a class=&quot;btn btn-pink&quot; target=&quot;_blank&quot; href=&quot;https://trinodb.github.io/presentations/presentations/moving-supplies/index.html&quot;&gt;
    Look at the slides
  &lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/LyBSHiCd2A8&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;getting-ready-to-summit&quot;&gt;Getting ready to summit&lt;/h2&gt;

&lt;p&gt;The second episode &lt;strong&gt;SQL Basecamp 2 – Getting ready to summit&lt;/strong&gt; builds on the
foundation established in episode 1. Martin and I discussed some further details
for lakehouse usage and then looked at structural data types and views.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
  &lt;a class=&quot;btn btn-pink&quot; target=&quot;_blank&quot; href=&quot;https://trinodb.github.io/presentations/presentations/getting-ready-to-summit/index.html&quot;&gt;
    Look at the slides
  &lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/32uGABdBCTQ&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;next-up-trino-summit&quot;&gt;Next up, Trino Summit&lt;/h2&gt;

&lt;p&gt;If you think those two sessions were great, how about two days’ worth of great
presentations at Trino Summit?&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://www.starburst.io/info/trino-summit-2024/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=NORAM-FY25-Q4-CM-Trino-Summit-2024&amp;amp;utm_content=sql-series-recap-blog&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Trino Summit is inching closer fast, and we are busy with all the preparation. Nevertheless, we thought we would bring you some more SQL and Trino-related training. The two live classes from our SQL basecamps before Trino Summit are now available for you all to enjoy, just in case you missed them.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2024/sql-basecamps-2024.png" />
      
    </entry>
  
    <entry>
      <title>Trino and Javascript?! YES!</title>
      <link href="https://trino.io/blog/2024/11/18/javascript.html" rel="alternate" type="text/html" title="Trino and Javascript?! YES!" />
      <published>2024-11-18T00:00:00+00:00</published>
      <updated>2024-11-18T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/11/18/javascript</id>
      <content type="html" xml:base="https://trino.io/blog/2024/11/18/javascript.html">&lt;p&gt;Trino is written in Java. Trino contributors and maintainers are often veterans
in the Java ecosystem and community, and Trino is very modern when it comes to
Java. For example, Trino now requires the latest Java version and actively uses
new features.&lt;/p&gt;

&lt;p&gt;When it comes to JavaScript however, the story is a bit more complicated. Of
course, JavaScript is commonly used in the Trino ecosystem and codebase. Let’s
look at some of the specifics.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;client-driver-and-applications&quot;&gt;Client driver and applications&lt;/h2&gt;

&lt;p&gt;Client applications that allow users to submit queries to Trino and then
receive the results are written in numerous languages. Trino has good support
for &lt;a href=&quot;/ecosystem/index.html#clients&quot;&gt;many of them&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thanks to the collaboration with &lt;a href=&quot;https://github.com/regadas&quot;&gt;Filipe Regadas&lt;/a&gt;
and the contribution of his JavaScript client driver to the Trino community, we
now have an official
&lt;a href=&quot;https://github.com/trinodb/trino-js-client&quot;&gt;trino-js-client&lt;/a&gt; project. After his
initial donation we have applied numerous improvements and recently cut our
first release.&lt;/p&gt;

&lt;p&gt;The client is already used in the &lt;a href=&quot;/ecosystem/client#vscode&quot;&gt;Visual Studio Code
support&lt;/a&gt;, the &lt;a href=&quot;/ecosystem/client#emacs&quot;&gt;Emacs
support&lt;/a&gt;, the example project discussed
in &lt;a href=&quot;/episodes/63.html&quot;&gt;Trino Community Broadcast episode 63&lt;/a&gt;,
and numerous other applications.&lt;/p&gt;

&lt;p&gt;And we have big plans as well:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for more authentication methods supported in Trino&lt;/li&gt;
  &lt;li&gt;Improve documentation and example projects&lt;/li&gt;
  &lt;li&gt;Add support for the new spooling client protocol from Trino&lt;/li&gt;
  &lt;li&gt;Test with Trino Gateway and adjust as needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While this project is a great addition for many users of Trino and their custom
web applications, there are numerous other usages of JavaScript in the project.&lt;/p&gt;

&lt;h2 id=&quot;user-interfaces&quot;&gt;User interfaces&lt;/h2&gt;

&lt;p&gt;Web-based user interfaces are one important use of JavaScript. Trino includes
the &lt;a href=&quot;/docs/current/admin/web-interface.html&quot;&gt;Trino Web UI&lt;/a&gt; and
the ongoing effort to replace it with a more modern and feature-rich UI -
currently called the &lt;a href=&quot;/docs/current/admin/preview-web-interface.html&quot;&gt;Preview
UI&lt;/a&gt;. It was
inspired by the replacement of the legacy UI for &lt;a href=&quot;https://trinodb.github.io/trino-gateway/&quot;&gt;Trino
Gateway&lt;/a&gt; with a new UI based on
current tools and libraries.&lt;/p&gt;

&lt;p&gt;All three user interfaces require constant work: keeping libraries up to
date, fixing bugs, and adding new features.&lt;/p&gt;

&lt;h2 id=&quot;other-projects&quot;&gt;Other projects&lt;/h2&gt;

&lt;p&gt;Beyond the user interfaces we also provide a &lt;a href=&quot;https://github.com/trinodb/grafana-trino&quot;&gt;plugin for
Grafana&lt;/a&gt; that is mostly written in
JavaScript, and there might be more projects on the way.&lt;/p&gt;

&lt;h2 id=&quot;whats-next&quot;&gt;What’s next?&lt;/h2&gt;

&lt;p&gt;The skills and experience needed for these JavaScript-based efforts are
different enough from Trino internals that developers can help with them
without knowing much about Trino and Java.&lt;/p&gt;

&lt;p&gt;If that is you, we want to hear from you. And if you are knowledgeable in
Trino, Java, and many other things, and also interested in helping with the
JavaScript work, we want to hear from you too. There is always more we want
to get done, and we need your help.&lt;/p&gt;

&lt;p&gt;So have a look at the codebase that interests you the most, chat with us on
&lt;a href=&quot;/slack.html&quot;&gt;Trino Slack&lt;/a&gt;, join an &lt;a href=&quot;/community.html#events&quot;&gt;upcoming Trino contributor
call&lt;/a&gt; and &lt;a href=&quot;/blog/2024/10/17/trino-summit-2024-tease.html&quot;&gt;Trino Summit&lt;/a&gt;, and let me know if you would be
interested in a regular Trino JavaScript call - for example monthly?&lt;/p&gt;

&lt;p&gt;And if you don’t want to code in Java or JavaScript? Well, you can help us write
&lt;a href=&quot;https://github.com/trinodb/trino/tree/master/docs&quot;&gt;documentation in Markdown&lt;/a&gt;,
work on the &lt;a href=&quot;https://github.com/trinodb/trino-python-client&quot;&gt;Python client&lt;/a&gt;, the
&lt;a href=&quot;https://github.com/trinodb/trino-go-client&quot;&gt;Go client&lt;/a&gt;, or maybe even
contribute a client we don’t even have yet.&lt;/p&gt;

&lt;p&gt;In all cases, we look forward to your help.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Trino is written in Java. Trino contributors and maintainers are often veterans in the Java ecosystem and community, and Trino is very modern when it comes to Java. For example, Trino now requires the latest Java version and actively uses new features. When it comes to JavaScript however, the story is a bit more complicated. Of course, JavaScript is commonly used in the Trino ecosystem and codebase. Let’s look at some of the specifics.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/javascript-small.png" />
      
    </entry>
  
    <entry>
      <title>67: Extra speed with Exasol and Trino</title>
      <link href="https://trino.io/episodes/67.html" rel="alternate" type="text/html" title="67: Extra speed with Exasol and Trino" />
      <published>2024-10-30T00:00:00+00:00</published>
      <updated>2024-10-30T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/67</id>
      <content type="html" xml:base="https://trino.io/episodes/67.html">&lt;h2 id=&quot;host&quot;&gt;Host&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt; - 
&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://www.firebolt.io/&quot;&gt;Firebolt&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/thomas-bestfleisch/&quot;&gt;Thomas Bestfleisch&lt;/a&gt;, 
Senior Product Manager at &lt;a href=&quot;https://www.exasol.com/&quot;&gt;Exasol&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;Following are some highlights of the recent Trino releases:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-461.html&quot;&gt;Trino 461&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;add_files&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;add_files_from_table&lt;/code&gt; procedures in the
Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-462.html&quot;&gt;Trino 462&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for read operations when using the Unity catalog as Iceberg REST
catalog in the Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Improve performance and memory usage when decoding data in the CLI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-463.html&quot;&gt;Trino 463&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Enable HTTP/2 for internal communication by default.&lt;/li&gt;
  &lt;li&gt;Add &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timezone()&lt;/code&gt; functions.&lt;/li&gt;
  &lt;li&gt;Include table functions with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW FUNCTIONS&lt;/code&gt; output.&lt;/li&gt;
  &lt;li&gt;Add support for writing change data feed when deletion vector is enabled to
the Delta Lake connector.&lt;/li&gt;
&lt;/ul&gt;
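&lt;p&gt;As a quick, hedged example of the new functions from Trino 463 (the exact
result of the first query depends on your session time zone):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Returns the name of the current session time zone, for example 'UTC'
SELECT timezone();

-- Returns the time zone identifier of a timestamp with time zone
SELECT timezone(TIMESTAMP '2024-10-30 12:00:00 Europe/Berlin');
&lt;/code&gt;&lt;/pre&gt;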

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-464.html&quot;&gt;Trino 464&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Require JDK 23 to run Trino.&lt;/li&gt;
  &lt;li&gt;Add the Faker connector.&lt;/li&gt;
  &lt;li&gt;Add the Vertica connector.&lt;/li&gt;
  &lt;li&gt;Remove the Accumulo connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As always, numerous performance improvements, bug fixes, and other features were
added as well.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino maintainer call - great sync with some exciting news coming to the community soon.&lt;/li&gt;
  &lt;li&gt;Trino contributor call - &lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-24-oct-2024&quot;&gt;recording and minutes available now&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Trino Kubernetes operator meeting - minutes coming soon.&lt;/li&gt;
  &lt;li&gt;Trino Summit call for speakers closed - stay tuned for announcements and
&lt;a href=&quot;/blog/2024/10/17/trino-summit-2024-tease.html&quot;&gt;don’t forget to register&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-thomas-and-exasol&quot;&gt;Introducing Thomas and Exasol&lt;/h2&gt;

&lt;p&gt;Exasol is a lightning-fast, in-memory database for analytics. And this is not
just a marketing slogan: Exasol has been at the top of the TPC-H benchmarks for
a long time now. Thomas tells us more about the database and his role.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/logos/exasol-small.png&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;exasol-and-trino&quot;&gt;Exasol and Trino&lt;/h2&gt;

&lt;p&gt;Trino and Exasol bridge the gap between extreme performance with in-memory usage
from Exasol, and massive scale from a lakehouse with Trino.&lt;/p&gt;

&lt;p&gt;We learn more about Exasol as Thomas guides us through his &lt;a href=&quot;/assets/episode/tcb67-exasol.pdf&quot;&gt;presentation about
Exasol and Trino&lt;/a&gt;, and take
the opportunity to question him for more details.&lt;/p&gt;

&lt;p&gt;The pull request for the Exasol connector has been a long time in the works and
was finally merged for Trino 452. We talk about the motivation, the process,
the results, and the future for the connector.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.exasol.com/&quot;&gt;Exasol&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/connector/exasol.html&quot;&gt;Trino’s Exasol connector&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.exasol.com/exasol-saas/&quot;&gt;Exasol SaaS&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/exasol/ai-lab&quot;&gt;Exasol AI lab&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://hub.docker.com/r/exasol/docker-db&quot;&gt;Exasol container&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/10/07/sql-basecamps.html&quot;&gt;SQL basecamps before Trino Summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/10/17/trino-summit-2024-tease.html&quot;&gt;Trino Summit 2024&lt;/a&gt;:
Information about first sessions and more available. Call for speakers closed.
Announcements coming soon.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Host</summary>

      
      
    </entry>
  
    <entry>
      <title>A glimpse at the summit</title>
      <link href="https://trino.io/blog/2024/10/17/trino-summit-2024-tease.html" rel="alternate" type="text/html" title="A glimpse at the summit" />
      <published>2024-10-17T00:00:00+00:00</published>
      <updated>2024-10-17T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/10/17/trino-summit-2024-tease</id>
      <content type="html" xml:base="https://trino.io/blog/2024/10/17/trino-summit-2024-tease.html">&lt;p&gt;Our efforts around &lt;a href=&quot;/blog/2024/07/11/trino-summit-2024-call-for-speakers.html&quot;&gt;Trino Summit 2024&lt;/a&gt; are ramping up and the event
is creeping closer and closer. We are really looking forward to the two-day,
free, virtual event in December about all things Trino.&lt;/p&gt;

&lt;p&gt;While we are working hard to put together the &lt;a href=&quot;/blog/2024/10/07/sql-basecamps.html&quot;&gt;SQL basecamps before Trino Summit
training sessions&lt;/a&gt; and &lt;a href=&quot;/community.html#events&quot;&gt;other community
events&lt;/a&gt;, a number of your awesome peers
from the Trino community submitted session proposals, and we are excited to
share this glimpse of the agenda for Trino Summit 2024.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;first-batch-of-sessions&quot;&gt;First batch of sessions&lt;/h2&gt;

&lt;p&gt;Let’s see what has already settled on the agenda.&lt;/p&gt;

&lt;h3 id=&quot;running-trino-as-exabyte-scale-data-warehouse&quot;&gt;Running Trino as exabyte-scale data warehouse&lt;/h3&gt;

&lt;p&gt;Presented by Alagappan Maruthappan from &lt;a href=&quot;https://netflix.com&quot;&gt;Netflix&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Netflix operates over 15 Trino clusters, efficiently handling more than 10
million queries each month. As the initial creator of Apache Iceberg,
Netflix has over 1 million Iceberg tables and makes extensive use of the Trino
Iceberg connector. In this session we talk about the operational challenges faced,
internal efficiency improvements, and our experience with upgrading to the
latest Trino version.&lt;/p&gt;

&lt;h3 id=&quot;a-lakehouse-that-simply-works&quot;&gt;A Lakehouse that simply works&lt;/h3&gt;

&lt;p&gt;Presented by Vincenzo Cassaro from &lt;a href=&quot;https://prezi.com/&quot;&gt;Prezi&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With billions of tech and vendor proposals, it’s easy to lose track of what
truly matters. Vincenzo shows how a simple combination of established,
maintained, open source technologies can make a lakehouse that truly works for a
company with 150 million users.&lt;/p&gt;

&lt;h3 id=&quot;how-trino-and-dbt-unleashed-many-to-many-interoperability-at-bazaar&quot;&gt;How Trino and dbt unleashed many-to-many interoperability at Bazaar&lt;/h3&gt;

&lt;p&gt;Presented by Shahzad Siddiqi, Siddique Ahmad, and Usman Ghani from
&lt;a href=&quot;/users.html#bazaar_technologies&quot;&gt;Bazaar&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Learn how Bazaar leveraged the combined power of Trino and dbt to scale their
data platform effectively. This talk delves into the strategies and technologies
used to enable many-to-many integration, fueling data-driven decision-making
across the organization.&lt;/p&gt;

&lt;h3 id=&quot;maximizing-cost-efficiency-in-data-analytics-with-trino-and-iceberg&quot;&gt;Maximizing cost efficiency in data analytics with Trino and Iceberg&lt;/h3&gt;

&lt;p&gt;Presented by Gopi Bhagavathula from &lt;a href=&quot;https://www.branch.io/&quot;&gt;Branch&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At Branch, we realized that our existing architecture was not only expensive
but also becoming unsustainable as data volumes grew for one of our business
units, so we decided to adopt Trino and Apache Iceberg. Our journey of migrating
from Apache Druid to Trino and Iceberg taught us that the right combination of
tools can transform data analytics for one of our internal business units,
offering the perfect balance between cost savings, performance, and scalability.
Learn how we achieved 7-figure savings with a few “compromises”.&lt;/p&gt;

&lt;h3 id=&quot;using-trino-as-a-strangler-fig&quot;&gt;Using Trino as a strangler fig&lt;/h3&gt;

&lt;p&gt;Presented by Trevor Kennedy from &lt;a href=&quot;https://www.fanduel.com/&quot;&gt;Fanduel&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This talk discusses how FanDuel uses Trino to migrate analysts from Redshift to
Delta Lake using Martin Fowler’s Strangler Fig pattern. Trino slowly took root
after initial trials, started replacing parts of the legacy system, and
eventually will be a complete replacement with a shadow of the original system.&lt;/p&gt;

&lt;h3 id=&quot;enduring-with-persistence-to-reach-the-summit&quot;&gt;Enduring with persistence to reach the summit&lt;/h3&gt;

&lt;p&gt;Presented by Martin Traverso from &lt;a href=&quot;/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the keynote Martin presents the latest and greatest news from the Trino
project and the Trino community. With more contributors, more maintainers, and a
larger community we got a lot done since Trino Fest in June. Find out the
details from the co-creator of Trino.&lt;/p&gt;

&lt;p&gt;Surely, you don’t need any more convincing and you are ready to proceed to&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://www.starburst.io/info/trino-summit-2024/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=NORAM-FY25-Q4-CM-Trino-Summit-2024-IMC-Upgrade&amp;amp;utm_content=blog-2&quot;&gt;
        Register to attend!
    &lt;/a&gt;
&lt;/div&gt;

&lt;h2 id=&quot;continued-call-for-speakers&quot;&gt;Continued call for speakers&lt;/h2&gt;

&lt;p&gt;Now that you have registered and seen what others submitted and got accepted,
we are sure you are thinking:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Well, that’s interesting, but I can submit a talk like that, and even better!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We agree and know you are up to it, so go ahead and submit a proposal:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://sessionize.com/trino-summit-2024&quot;&gt;
        Submit a talk!
    &lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;And if necessary, check the &lt;a href=&quot;/blog/2024/07/11/trino-summit-2024-call-for-speakers.html&quot;&gt;original announcement&lt;/a&gt; for more tips and ideas.&lt;/p&gt;

&lt;h2 id=&quot;sponsor-trino-summit&quot;&gt;Sponsor Trino Summit&lt;/h2&gt;

&lt;p&gt;To make the event a smashing hit, we are also looking for more sponsors.
Starburst, as the organizing sponsor of the event, is excited to collaborate
with other organizations from the Trino community. If you are
interested in sponsoring, email
&lt;a href=&quot;mailto:events@starburstdata.com?subject=Sponsor%20Trino%20Summit&quot;&gt;events@starburstdata.com&lt;/a&gt;
for information.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Monica Miller, Anna Schibli</name>
        </author>
      

      <summary>Our efforts around Trino Summit 2024 are ramping up and the event is creeping closer and closer. We are really looking forward to the two-day, free, virtual event in December about all things Trino. While we are working hard to put together the SQL basecamps before Trino Summit training sessions and other community events, a number of your awesome peers from the Trino community submitted session proposals, and we are excited to share this glimpse of the agenda for Trino Summit 2024.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2024/lineup-blog-banner.png" />
      
    </entry>
  
    <entry>
      <title>A Kubernetes operator for Trino?</title>
      <link href="https://trino.io/blog/2024/10/10/operator.html" rel="alternate" type="text/html" title="A Kubernetes operator for Trino?" />
      <published>2024-10-10T00:00:00+00:00</published>
      <updated>2024-10-10T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/10/10/operator</id>
      <content type="html" xml:base="https://trino.io/blog/2024/10/10/operator.html">&lt;p&gt;Trino is deployed everywhere – on-premise, in private data centers, in the cloud
with hosting providers, on bare metal servers, on virtual machines, and with
containers. With all these options for deployments, a Kubernetes-based platform
with a container emerged as the most widely used approach.&lt;/p&gt;

&lt;p&gt;The Trino project caters for this usage with our &lt;a href=&quot;/docs/current/installation/containers.html&quot;&gt;container
images&lt;/a&gt; for every
release and our &lt;a href=&quot;https://github.com/trinodb/charts&quot;&gt;Helm chart&lt;/a&gt;. However, we keep
hearing from people who want to use a Kubernetes operator…&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;existing-operators&quot;&gt;Existing operators&lt;/h2&gt;

&lt;p&gt;We know that various companies have Kubernetes operators developed internally,
and we also know that open source ones exist, for example:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/stackabletech/trino-operator&quot;&gt;trino-operator&lt;/a&gt; from
Stackable with integration in
&lt;a href=&quot;https://github.com/stackabletech/trino-lb&quot;&gt;trino-lb&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://charmhub.io/trino-k8s&quot;&gt;Charmed Trino K8s Operator&lt;/a&gt; from Canonical&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ideally these separate efforts can combine their work and create a great
operator in the Trino project that is closely aligned with Trino itself, and
also suitable for future integration with Trino Gateway. In fact, Trino
Gateway is a good example of different parties coming together and innovating
considerably. Hopefully we can achieve the same with the operator. It can
still be extensible and modular to suit specific needs on different
platforms and for different users.&lt;/p&gt;

&lt;p&gt;We also know that this is a long-standing community wish, documented in &lt;a href=&quot;https://github.com/trinodb/trino/issues/396&quot;&gt;the
issue&lt;/a&gt; and various discussions with
users.&lt;/p&gt;

&lt;h2 id=&quot;discussing-next-steps&quot;&gt;Discussing next steps&lt;/h2&gt;

&lt;p&gt;However, there are some complications, such as the choice of programming
language or the commitment to help within the Trino project as subproject
maintainers. We kicked off some of these discussions in the past at Trino
contributor meetings, and hope that now is a good time to continue.&lt;/p&gt;

&lt;p&gt;To that end we are arranging a community meeting:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Virtual video call&lt;/li&gt;
  &lt;li&gt;30th of October 2024&lt;/li&gt;
  &lt;li&gt;8:00 PDT / 11:00 EDT / 15:00 GMT / 16:00 CET&lt;/li&gt;
  &lt;li&gt;Invite available from Manfred on Trino Slack or via email:&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;mailto:manfred@starburst.io?subject=trino-k8s-operator&quot;&gt;
        Tell Manfred you want to join
    &lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;We will also post connection details on the #kubernetes channel and we are
collecting related discussion points on
&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-kubernetes-operator-discussion-30-oct-2024&quot;&gt;our contributor meeting page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Looking forward to a great discussion.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Martin Traverso</name>
        </author>
      

      <summary>Trino is deployed everywhere – on-premise, in private data centers, in the cloud with hosting providers, on bare metal servers, on virtual machines, and with containers. With all these options for deployments, a Kubernetes-based platform with a container emerged as the most widely used approach. The Trino project caters for this usage with our container images for every release and our Helm chart. However we keep hearing from people who want to use a Kubernetes operator…</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/kubernetes.png" />
      
    </entry>
  
    <entry>
      <title>SQL basecamps before Trino Summit</title>
      <link href="https://trino.io/blog/2024/10/07/sql-basecamps.html" rel="alternate" type="text/html" title="SQL basecamps before Trino Summit" />
      <published>2024-10-07T00:00:00+00:00</published>
      <updated>2024-10-07T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/10/07/sql-basecamps</id>
      <content type="html" xml:base="https://trino.io/blog/2024/10/07/sql-basecamps.html">&lt;p&gt;Later in December your knowledge of our Trino SQL query engine will certainly
peak again at &lt;a href=&quot;/blog/2024/07/11/trino-summit-2024-call-for-speakers.html&quot;&gt;Trino Summit 2024&lt;/a&gt;. To reach those heights and
absorb all there is to learn at Trino Summit, you need to get ready.&lt;/p&gt;

&lt;p&gt;That is why I teamed up with our &lt;a href=&quot;/development/roles#benevolent-dictators-for-life-&quot;&gt;Trino creators and
BDFLs&lt;/a&gt; –
Martin Traverso, Dain Sundstrom, and David Phillips. We aim to be your coaches
and trainers to get you ready and get to the summit without the need for oxygen
masks and sherpas. Join us for the &lt;strong&gt;“SQL basecamps before Trino Summit”&lt;/strong&gt;,
where we expand on our &lt;a href=&quot;https://www.youtube.com/watch?v=SnvSBYhRZLg&amp;amp;list=PLFnr63che7wYzZoo5yyEF5R1QrOH6VRq3&quot;&gt;past SQL training
series&lt;/a&gt;
with two new episodes.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://www.starburst.io/info/sql-basecamps-before-trino-summit/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=NORAM-FY25-Q4-SQL-Basecamps-Before-Trino-Summit&amp;amp;utm_content=blog-1&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Both planned sessions provide a high-level overview and some practical tips and
tricks over the course of an hour. Each session concludes with an open
question-and-answer section with the speakers.&lt;/p&gt;

&lt;h2 id=&quot;moving-supplies&quot;&gt;Moving supplies&lt;/h2&gt;

&lt;p&gt;In the first episode &lt;strong&gt;SQL basecamp 1 – Moving supplies&lt;/strong&gt; David and Dain will
help me provide an overview of the wide range of possibilities when it comes to
moving data to Trino and moving data with Trino.&lt;/p&gt;

&lt;p&gt;We specifically look at the strengths of Trino for running your data lakehouse
and migrating to it from legacy data lakes or other systems. SQL skills
discussed include tips for creating schemas and tables, adding and updating
data, and inspecting metadata. We talk about table procedures for data
management and also cover some operational aspects. For example, we talk about
the right configuration in your catalogs for your object storage, specifically
the new file system support in Trino.&lt;/p&gt;

&lt;h2 id=&quot;getting-ready-to-summit&quot;&gt;Getting ready to summit&lt;/h2&gt;

&lt;p&gt;The second episode &lt;strong&gt;SQL Basecamp 2 – Getting ready to summit&lt;/strong&gt; builds on the
foundation established in episode 1. Data has moved into the lakehouse, powered
by Trino, and more data is added and changed as part of normal operation. In
this episode Martin and I look at maintaining the data in a healthy state
and explore some tips and tricks for querying data. For example, we look at data
management with procedures, analyzing data with window functions, and examine
more complex structural data.&lt;/p&gt;

&lt;h2 id=&quot;what-do-want-to-learn&quot;&gt;What do you want to learn?&lt;/h2&gt;

&lt;p&gt;So there you have it - enough reason to register. Well, if not, we can do better:
Both sessions are aimed at all of you out there using Trino and we are ready to
discuss your questions during class. More importantly though, I would also love
to hear your suggestions for these and other topics about SQL and Trino. We can
adjust this series, figure out a session for Trino Summit, or bring another SQL
training series to you next year.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;mailto:manfred@starburst.io?subject=SQL%20basecamp%20idea&quot;&gt;
        Submit an idea to Manfred
    &lt;/a&gt;
&lt;/div&gt;

&lt;h2 id=&quot;trino-summit-needs-you&quot;&gt;Trino Summit needs you!&lt;/h2&gt;

&lt;p&gt;Now with all that in mind, what are you waiting for? Get ready to learn more
about SQL with Trino in the series and at Trino Summit.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://www.starburst.io/info/sql-basecamps-before-trino-summit/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=NORAM-FY25-Q4-SQL-Basecamps-Before-Trino-Summit&amp;amp;utm_content=blog-1&quot;&gt;
        I am convinced - register now
    &lt;/a&gt;
&lt;/div&gt;

&lt;p&gt;And of course, we are also interested in your 
&lt;a href=&quot;https://sessionize.com/trino-summit-2024&quot;&gt;speaker proposals&lt;/a&gt; and 
&lt;a href=&quot;mailto:events@starburstdata.com?subject=Sponsor%20Trino%20Summit%202024&quot;&gt;sponsorships&lt;/a&gt;
for Trino Summit to make it an awesome event for everyone again.&lt;/p&gt;

&lt;p&gt;See you soon,&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Later in December your knowledge of our Trino SQL query engine will certainly peak again at Trino Summit 2024. To reach those heights and absorb all there is to learn at Trino Summit, you need to get ready. That is why I teamed up with our Trino creators and BDFLs – Martin Traverso, Dain Sundstrom, and David Phillips. We aim to be your coaches and trainers to get you ready and get to the summit without the need for oxygen masks and sherpas. Join us for the “SQL basecamps before Trino Summit”, where we expand on our past SQL training series with two new episodes. Register now</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2024/sql-basecamps-2024.png" />
      
    </entry>
  
    <entry>
      <title>23 is a go, keeping pace with Java</title>
      <link href="https://trino.io/blog/2024/09/17/java-23.html" rel="alternate" type="text/html" title="23 is a go, keeping pace with Java" />
      <published>2024-09-17T00:00:00+00:00</published>
      <updated>2024-09-17T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/09/17/java-23</id>
      <content type="html" xml:base="https://trino.io/blog/2024/09/17/java-23.html">&lt;p&gt;Only about ten Trino releases or six months ago, we released &lt;a href=&quot;https://trino.io/docs/current/release/release-447.html&quot;&gt;Trino
447&lt;/a&gt; with the requirement to
use Java 22. In recent releases we started to take more and more advantage of
features that are only available with that upgrade. We made some big steps in
terms of performance and talked about some of those performance
enhancements around aircompressor in the recent &lt;a href=&quot;https://trino.io/episodes/65.html&quot;&gt;Trino Community Broadcast
65&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The Java community runs its release processes on a very predictable schedule -
March and September mean new Java releases. This time it’s Java 23, and
Trino will not be left behind. We are upgrading to &lt;a href=&quot;https://github.com/trinodb/trino/issues/21316&quot;&gt;use and require Java
23&lt;/a&gt; soon!&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;background-and-motivation&quot;&gt;Background and motivation&lt;/h2&gt;

&lt;p&gt;While the new features and improvements in Java 23 are not as impactful as in
Java 22, we still need to keep pace to take advantage of the improvements and
avoid any problems in the future. Here are the Java Enhancement Proposals that
are &lt;a href=&quot;https://openjdk.org/projects/jdk/23/&quot;&gt;included with Java 23&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/455&quot;&gt;JEP 455:	Primitive Types in Patterns, instanceof, and switch (Preview)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/466&quot;&gt;JEP 466:	Class-File API (Second Preview)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/467&quot;&gt;JEP 467:	Markdown Documentation Comments&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/469&quot;&gt;JEP 469:	Vector API (Eighth Incubator)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/473&quot;&gt;JEP 473:	Stream Gatherers (Second Preview)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/471&quot;&gt;JEP 471:	Deprecate the Memory-Access Methods in sun.misc.Unsafe for Removal&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/474&quot;&gt;JEP 474:	ZGC: Generational Mode by Default&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/476&quot;&gt;JEP 476:	Module Import Declarations (Preview)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/477&quot;&gt;JEP 477:	Implicitly Declared Classes and Instance Main Methods (Third Preview)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/480&quot;&gt;JEP 480:	Structured Concurrency (Third Preview)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/481&quot;&gt;JEP 481:	Scoped Values (Third Preview)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.org/jeps/482&quot;&gt;JEP 482:	Flexible Constructor Bodies (Second Preview)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more you can check out the &lt;a href=&quot;https://www.youtube.com/watch?v=ymuv5aUzWu0&quot;&gt;short summary
video&lt;/a&gt; or the &lt;a href=&quot;https://www.youtube.com/watch?v=QG9xKpgwOI4&quot;&gt;three hour long
launch stream&lt;/a&gt;. The &lt;a href=&quot;https://www.oracle.com/news/announcement/oracle-releases-java-23-2024-09-17/&quot;&gt;Oracle press
release&lt;/a&gt;
as well as the &lt;a href=&quot;https://blogs.oracle.com/java/post/the-arrival-of-java-23&quot;&gt;community
announcement&lt;/a&gt; also
bring you a wealth of further information.&lt;/p&gt;

&lt;p&gt;Overall our reasoning is unchanged from the &lt;a href=&quot;/blog/2023/11/03/java-21.html&quot;&gt;upgrade to 21&lt;/a&gt; and the &lt;a href=&quot;/blog/2024/03/13/java-22.html&quot;&gt;upgrade to 22&lt;/a&gt;.
So what are we specifically doing now?&lt;/p&gt;

&lt;h2 id=&quot;current-status-and-plans&quot;&gt;Current status and plans&lt;/h2&gt;

&lt;p&gt;Early access binaries have been in use in our continuous integration builds for
months. Java 23 launched today and the various JDK distribution binary packages
will become available shortly. We are executing on the same blueprint as last
time:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Wait for &lt;a href=&quot;https://adoptium.net/temurin/releases/&quot;&gt;Eclipse Temurin&lt;/a&gt; binaries.&lt;/li&gt;
  &lt;li&gt;Ensure everything works with Java 23.&lt;/li&gt;
  &lt;li&gt;Change the container image to use Java 23.&lt;/li&gt;
  &lt;li&gt;Cut a release and get community feedback from testing with the container.&lt;/li&gt;
  &lt;li&gt;Adjust to any feedback and available improvements for a few releases.&lt;/li&gt;
  &lt;li&gt;Switch the requirement for build and runtime to Java 23.&lt;/li&gt;
  &lt;li&gt;Cut another release and celebrate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Timing on all the work depends on obstacles we find on the way and how we
progress with removing them. We use the &lt;a href=&quot;https://github.com/trinodb/trino/issues/21316&quot;&gt;Java 23 tracking
issue&lt;/a&gt; and the linked issues and
pull requests to manage progress, discuss next steps, and work with the
community.&lt;/p&gt;

&lt;p&gt;Feel free to chime in there, find us on the &lt;a href=&quot;https://trinodb.slack.com/messages/C07ABNN828M&quot;&gt;#core-dev
channel&lt;/a&gt; on the &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Trino
community Slack&lt;/a&gt; or join us for a &lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings&quot;&gt;contributor
call&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Mateusz Gajewski</name>
        </author>
      

      <summary>Only about ten Trino releases or six months ago, we released Trino 447 with the requirement to use Java 22. In recent releases we started to take more and more advantage of features that are only available with that upgrade. We made some big steps in terms of performance and talked about some of those performance enhancements around aircompressor in the recent Trino Community Broadcast 65. The Java community runs its release processes on a very predictable schedule - March and September mean new Java releases. This time it’s Java 23, and Trino will not be left behind. We are upgrading to use and require Java 23 soon!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/java-duke-23.png" />
      
    </entry>
  
    <entry>
      <title>66: Chat with Trino and Wren AI</title>
      <link href="https://trino.io/episodes/66.html" rel="alternate" type="text/html" title="66: Chat with Trino and Wren AI" />
      <published>2024-09-12T00:00:00+00:00</published>
      <updated>2024-09-12T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/66</id>
      <content type="html" xml:base="https://trino.io/episodes/66.html">&lt;h2 id=&quot;host&quot;&gt;Host&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/himanshu-mendapara-a732051aa/&quot;&gt;Himanshu Mendapra&lt;/a&gt;, 
Software Engineer at &lt;a href=&quot;https://begenuin.com/&quot;&gt;Genuin&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/wwwy3y3/&quot;&gt;William Chang&lt;/a&gt;, 
CTO and Co-Founder at &lt;a href=&quot;https://cannerdata.com/&quot;&gt;Canner&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/yadiacolindres/&quot;&gt;Yadia Colindres&lt;/a&gt;, 
Product Management Advisor at &lt;a href=&quot;https://cannerdata.com/&quot;&gt;Canner&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-458.html&quot;&gt;Trino 458&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Deactivate legacy file system support for all catalogs. You must activate the
desired file system support with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fs.native-azure.enabled&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fs.native-gcs.enabled&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fs.native-s3.enabled&lt;/code&gt;, or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fs.hadoop.enabled&lt;/code&gt; in
each catalog using the Delta Lake, Hive, Hudi, or Iceberg connectors.&lt;/li&gt;
  &lt;li&gt;Add support for tracing with OpenTelemetry to the JDBC driver.&lt;/li&gt;
  &lt;li&gt;Reduce data transfer from remote systems for queries with large &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN&lt;/code&gt; lists in
numerous connectors.&lt;/li&gt;
&lt;/ul&gt;
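As a rough illustration of the activation described above, a catalog file might now contain an explicit file system property. The sketch below assumes a hypothetical Iceberg catalog on S3; the catalog name, metastore URI, and region are illustrative, and only the `fs.native-s3.enabled` property name comes from the release notes.

```properties
# etc/catalog/example.properties -- illustrative catalog configuration
connector.name=iceberg
iceberg.catalog.type=hive_metastore
hive.metastore.uri=thrift://metastore.example.com:9083
# As of Trino 458, file system support must be activated explicitly
fs.native-s3.enabled=true
s3.region=us-east-1
```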

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-459.html&quot;&gt;Trino 459&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Docker container now uses Java 23. Please test this and let us know of any
problems since Java 23 is going to be a requirement soon.&lt;/li&gt;
  &lt;li&gt;Add support for KiB and similar data size units for the Trino CLI output.&lt;/li&gt;
  &lt;li&gt;Allow configuring maximum concurrent HTTP requests to Azure on every node.&lt;/li&gt;
  &lt;li&gt;Add support for WASB to Azure Storage file system support.&lt;/li&gt;
  &lt;li&gt;Improve cache hit ratio for the file system cache.&lt;/li&gt;
  &lt;li&gt;Remove the local file connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-460.html&quot;&gt;Trino 460&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for using an Alluxio cluster as file system cache.&lt;/li&gt;
  &lt;li&gt;Add support for WASBS to Azure Storage file system support.&lt;/li&gt;
  &lt;li&gt;Remove the atop connector.&lt;/li&gt;
  &lt;li&gt;Remove the Raptor connector.&lt;/li&gt;
  &lt;li&gt;Numerous performance improvements for the ClickHouse connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As usual, numerous performance improvements, bug fixes, and other features
have been added as well.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Updated and improved documentation for contributors for Trino, Trino Gateway,
and other Trino projects.&lt;/li&gt;
  &lt;li&gt;Jan Was steps up as subproject maintainer for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino-js-client&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Cristian Osiac, Jordan Zimmermann, and Pablo Arteaga are working on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aws-proxy&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-himanshu&quot;&gt;Introducing Himanshu&lt;/h2&gt;

&lt;p&gt;Working at Genuin as a software engineer, learning about new technologies, and
occasionally &lt;a href=&quot;https://github.com/himanshu634&quot;&gt;contributing to open source
projects&lt;/a&gt; like Wren AI.&lt;/p&gt;

&lt;h2 id=&quot;introducing-william-and-yadia&quot;&gt;Introducing William and Yadia&lt;/h2&gt;

&lt;p&gt;William is co-founder at Canner and drives everything about Canner Enterprise
and Wren AI as CTO. Yadia works with William at Canner and is product manager
for Wren AI.&lt;/p&gt;

&lt;p&gt;We talk about the history of Canner and their usage of Trino in Canner
Enterprise.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/logos/canner-small.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Pivoting to talk about Wren AI, we learn about its architecture, use cases and
features, and continue along with an extensive demo of Wren AI.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/logos/wren-ai-small.png&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://cannerdata.com/&quot;&gt;Canner&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.getwren.ai/&quot;&gt;Wren AI&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/Canner/WrenAI/pull/535&quot;&gt;Pull request for Trino integration&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.getwren.ai/oss/guide/connect/trino&quot;&gt;Trino as Wren AI data source documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.producthunt.com/posts/wren-ai-cloud&quot;&gt;Wren AI launch at producthunt&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;A call out to help us &lt;a href=&quot;https://github.com/trinodb/trino/issues/23121&quot;&gt;clean up and close old
issues&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/07/11/trino-summit-2024-call-for-speakers.html&quot;&gt;Trino Summit 2024&lt;/a&gt;
is coming on the 11th and 12th of December, and registration, call for
speakers, and sponsorship opportunities are open.&lt;/li&gt;
  &lt;li&gt;Join us for the next &lt;a href=&quot;https://trino.io/broadcast/index.html&quot;&gt;Trino Community Broadcast
67&lt;/a&gt; about the Exasol database and Trino connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Host</summary>

      
      
    </entry>
  
    <entry>
      <title>65: Performance boosts</title>
      <link href="https://trino.io/episodes/65.html" rel="alternate" type="text/html" title="65: Performance boosts" />
      <published>2024-09-12T00:00:00+00:00</published>
      <updated>2024-09-12T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/65</id>
      <content type="html" xml:base="https://trino.io/episodes/65.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-455.html&quot;&gt;Trino 455&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add query starting time in QueryStatistics in all event listeners, including
the new Kafka event listener.&lt;/li&gt;
  &lt;li&gt;Allow configuring endpoint for the native Azure filesystem.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-456.html&quot;&gt;Trino 456&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Invalid - release process errors resulted in invalid artifacts.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-457.html&quot;&gt;Trino 457&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improve performance of queries involving joins when fault-tolerant execution
is enabled.&lt;/li&gt;
  &lt;li&gt;Improve performance for LZ4, Snappy, and ZSTD compression and decompression.&lt;/li&gt;
  &lt;li&gt;Publish a JDBC driver JAR without bundled, third-party dependencies.&lt;/li&gt;
  &lt;li&gt;Improve performance for concurrent write operations on S3 by using lock-less
Delta Lake write reconciliation, made possible with the release of the AWS SDK
with S3 conditional write support.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As usual, numerous performance improvements, bug fixes, and other features
have been added as well.&lt;/p&gt;

&lt;h2 id=&quot;performance-boosters&quot;&gt;Performance boosters&lt;/h2&gt;

&lt;p&gt;We chat about some of the following aspects and projects and their impact on Trino:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Role and history of Aircompressor.&lt;/li&gt;
  &lt;li&gt;Foundation from Airlift.&lt;/li&gt;
  &lt;li&gt;Relation to Java 22, and soon 23.&lt;/li&gt;
  &lt;li&gt;Status and next steps for improved and modernized file system support.&lt;/li&gt;
  &lt;li&gt;A quick glance at client protocol improvements.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/airlift/aircompressor&quot;&gt;Aircompressor&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/airlift/airlift&quot;&gt;Airlift&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/object-storage.html&quot;&gt;Object storage and file system documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/14237&quot;&gt;Project Hummingbird&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/22271&quot;&gt;Project Swift&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;We chat about the &lt;a href=&quot;https://github.com/trinodb/trino/issues/23122&quot;&gt;recent cleanup of unused Slack
channels&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;A call out to help us &lt;a href=&quot;https://github.com/trinodb/trino/issues/23121&quot;&gt;clean up and close old
issues&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Check out our new &lt;a href=&quot;https://github.com/trinodb/presentations/tree/main/assets/backgrounds&quot;&gt;video call background
images&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/07/11/trino-summit-2024-call-for-speakers.html&quot;&gt;Trino Summit 2024&lt;/a&gt;
is coming on the 11th and 12th of December, and registration, call for
speakers, and sponsorship opportunities are open.&lt;/li&gt;
  &lt;li&gt;Join us for the next &lt;a href=&quot;https://trino.io/broadcast/index.html&quot;&gt;Trino Community Broadcast
66&lt;/a&gt; about Wren AI and Trino.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>64: Control with Open Policy Agent OPA</title>
      <link href="https://trino.io/episodes/64.html" rel="alternate" type="text/html" title="64: Control with Open Policy Agent OPA" />
      <published>2024-08-22T00:00:00+00:00</published>
      <updated>2024-08-22T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/64</id>
      <content type="html" xml:base="https://trino.io/episodes/64.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://x.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://trino.io/users.html#starburst&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/sebastian-bernauer-622b95167&quot;&gt;Sebastian Bernauer&lt;/a&gt;, Software Developer at &lt;a href=&quot;https://trino.io/users.html#stackable&quot;&gt;Stackable&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/soenkeliebau/&quot;&gt;Sönke Liebau&lt;/a&gt;, Co-Founder and CPO
at &lt;a href=&quot;https://trino.io/users.html#stackable&quot;&gt;Stackable&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-454.html&quot;&gt;Trino 454&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improve performance for queries that contain multiple aggregate functions,
including &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DISTINCT&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Add Kafka event listener plugin (yet to be documented).&lt;/li&gt;
  &lt;li&gt;Add configuration for fetch size with JDBC-based connectors (yet to be documented).&lt;/li&gt;
  &lt;li&gt;Add support for writing Deletion Vectors with the Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Add new &lt;strong&gt;Resources&lt;/strong&gt; tab in the web interface with data from the new
lightweight query endpoint &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/v1/query?pruned=true&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Add new Preview Web UI (help us test and develop!).&lt;/li&gt;
  &lt;li&gt;Add S3 security mapping for the native S3 filesystem.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As usual, numerous performance improvements, bug fixes, and other features
have been added as well.&lt;/p&gt;

&lt;h2 id=&quot;stackable-opa-and-more&quot;&gt;Stackable, OPA, and more&lt;/h2&gt;

&lt;p&gt;We chat with Sönke and Sebastian about the following agenda topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;What is Stackable?&lt;/li&gt;
  &lt;li&gt;Open Policy Agent (OPA) authorization plugin
    &lt;ul&gt;
      &lt;li&gt;History&lt;/li&gt;
      &lt;li&gt;Recent development&lt;/li&gt;
      &lt;li&gt;Compatibility layer to Trino’s file-based access control&lt;/li&gt;
      &lt;li&gt;Quick demo on row filtering and column masking&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Auto-scaling Trino clusters using trino-lb
    &lt;ul&gt;
      &lt;li&gt;Differences between &lt;a href=&quot;https://trino.io/ecosystem/add-on.html#trino-gateway&quot;&gt;Trino
Gateway&lt;/a&gt; and
&lt;a href=&quot;https://trino.io/ecosystem/add-on.html#trino-lb&quot;&gt;trino-lb&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other aspects we discuss include the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Performance considerations&lt;/li&gt;
  &lt;li&gt;Aspects of Trino on Kubernetes such as graceful shutdown,
PodDisruptionBudgets, and anti-affinity&lt;/li&gt;
  &lt;li&gt;Plans for next steps&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;other-resources&quot;&gt;Other resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/assets/episode/tcb64-stackable-opa-trino-lb.pdf&quot;&gt;Presentation slide deck&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;i class=&quot;fab fa-youtube watch-listen-icon&quot; title=&quot;Youtube&quot;&gt;&lt;/i&gt; Video for
&lt;a href=&quot;https://www.youtube.com/watch?v=fbqqapQbAv0&quot;&gt;Trino OPA Authorizer - Stackable and Bloomberg at Trino Summit
2023&lt;/a&gt; presented by Sönke from
Stackable and Pablo Arteaga from Bloomberg&lt;/li&gt;
  &lt;li&gt;&lt;i class=&quot;fab fa-github&quot; title=&quot;GitHub&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://github.com/stackabletech/trino-operator/tree/main/tests/templates/kuttl/opa-authorization/trino_rules&quot;&gt;Source code repo for
compatibility layer between Trino classic file-based access control JSON and
OPA/Trino&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;i class=&quot;fab fa-youtube watch-listen-icon&quot; title=&quot;Youtube&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=ATlq_l3WNiA&quot;&gt;Longer demo
video for row filtering and column
masking&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/07/11/trino-summit-2024-call-for-speakers.html&quot;&gt;Trino Summit 2024&lt;/a&gt;
is coming on the 11th and 12th of December, and registration, call for
speakers, and sponsorship opportunities are open.&lt;/li&gt;
  &lt;li&gt;Next &lt;a href=&quot;https://trino.io/broadcast/index.html&quot;&gt;Trino Community Broadcast 65&lt;/a&gt; about
the new Exasol connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>63: Querying with JS</title>
      <link href="https://trino.io/episodes/63.html" rel="alternate" type="text/html" title="63: Querying with JS" />
      <published>2024-08-01T00:00:00+00:00</published>
      <updated>2024-08-01T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/63</id>
      <content type="html" xml:base="https://trino.io/episodes/63.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://www.linkedin.com/in/emilyasunaryo&quot;&gt;Emily Sunaryo&lt;/a&gt;, DevRel Intern at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-452.html&quot;&gt;Trino 452&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add Exasol connector.&lt;/li&gt;
  &lt;li&gt;Add support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;euclidean_distance()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dot_product()&lt;/code&gt;, and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cosine_distance()&lt;/code&gt; functions.&lt;/li&gt;
  &lt;li&gt;Add support for using the BigQuery Storage Read API when using the query table
function with the BigQuery connector.&lt;/li&gt;
  &lt;li&gt;Add &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt; table function for full query pass-through to the ClickHouse
connector.&lt;/li&gt;
  &lt;li&gt;Numerous improvements on the Delta Lake, Hive, Hudi, and Iceberg connectors
and the related file system support in Trino.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-453.html&quot;&gt;Trino 453&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance for non-equality joins.&lt;/li&gt;
  &lt;li&gt;Support for setting the SQL path for JDBC driver and CLI.&lt;/li&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;execute&lt;/code&gt; procedure to run arbitrary statements in the underlying data source.&lt;/li&gt;
  &lt;li&gt;Support for reading &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pgvector&lt;/code&gt; vector types in PostgreSQL connector.&lt;/li&gt;
  &lt;li&gt;Support for views when using the Iceberg JDBC catalog.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As usual, numerous performance improvements, bug fixes, and other features
have been added as well.&lt;/p&gt;

&lt;p&gt;Other noteworthy topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The &lt;a href=&quot;https://trinodb.github.io/trino-gateway/release-notes/&quot;&gt;Trino Gateway 10&lt;/a&gt;
release is out, and includes some major refactoring and new features.&lt;/li&gt;
  &lt;li&gt;The &lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-25-jul-2024&quot;&gt;Trino Contributor Call&lt;/a&gt;
recap is available. Note that the file system support will soon switch to the
new Trino-native implementations as default.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest-emily-sunaryo&quot;&gt;Guest Emily Sunaryo&lt;/h2&gt;

&lt;p&gt;Emily Sunaryo is a recent UC Berkeley graduate working in the Developer
Relations team at Starburst. She has a passion for both technical development
and also enablement of developer communities. With her degree in Data Science,
she is also interested in learning more about modern approaches to data
analytics and how emerging technologies can drive innovation in this space.&lt;/p&gt;

&lt;h2 id=&quot;trino-clients&quot;&gt;Trino clients&lt;/h2&gt;

&lt;p&gt;Trino clients come in many shapes and forms, but all of them allow users to run
SQL queries in Trino and access the results. They all use the Trino client REST
API. To make it easier for developers of these applications, as well as any
custom application, we provide a number of drivers as language-specific
wrappers. These include the JDBC driver, the Python client, the Go client, and
others.&lt;/p&gt;
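As a rough illustration of that shared REST protocol, the loop below sketches how a client submits a statement and then follows nextUri links until the results are exhausted. The HTTP layer is replaced by fake functions so the sketch runs standalone; the URIs and payload shapes are simplified stand-ins, not responses captured from a real coordinator.

```python
# Sketch of the polling loop at the core of every Trino client:
# POST the statement, then GET each nextUri until none remains,
# accumulating any "data" rows returned along the way.

def run_query(post, get, sql):
    """Submit sql via post(), follow nextUri links via get(), return all rows."""
    response = post("/v1/statement", sql)
    rows = []
    while True:
        rows.extend(response.get("data", []))
        next_uri = response.get("nextUri")
        if next_uri is None:
            # No further pages: the query is finished.
            return rows
        response = get(next_uri)

# Fake transport standing in for a coordinator (illustrative URIs/payloads).
_pages = {
    "/v1/statement/queued/q1/1": {"nextUri": "/v1/statement/executing/q1/2"},
    "/v1/statement/executing/q1/2": {"data": [[1, "a"], [2, "b"]]},
}

def fake_post(path, sql):
    return {"id": "q1", "nextUri": "/v1/statement/queued/q1/1"}

def fake_get(uri):
    return _pages[uri]

print(run_query(fake_post, fake_get, "SELECT * FROM t"))  # [[1, 'a'], [2, 'b']]
```

The language-specific drivers mentioned above wrap exactly this kind of loop, adding authentication, typed result mapping, and error handling.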

&lt;h2 id=&quot;javascript&quot;&gt;JavaScript&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;https://trino.io/assets/images/logos/javascript.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/regadas&quot;&gt;Filipe Regadas&lt;/a&gt; agreed to transfer his
&lt;a href=&quot;https://github.com/trinodb/trino-js-client&quot;&gt;trino-js-client&lt;/a&gt; project to
trinodb and is now a subproject maintainer. We are in the process of getting a
first release ready to ship. We would love for you to help us!&lt;/p&gt;

&lt;h2 id=&quot;learning-about-trino&quot;&gt;Learning about Trino&lt;/h2&gt;

&lt;p&gt;Emily’s journey and bringing it all together. From university and Starburst
internship to the Trino Community Broadcast, and a working demo web application.&lt;/p&gt;

&lt;h2 id=&quot;demo-time&quot;&gt;Demo time&lt;/h2&gt;

&lt;p&gt;Emily talks about her demo web application using React, npm, and various other
libraries and tools to build a data application. The data resides in Trino,
specifically in &lt;a href=&quot;https://www.starburst.io/platform/starburst-galaxy/&quot;&gt;Starburst
Galaxy&lt;/a&gt; to make the
management easier, and she uses the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino-js-client&lt;/code&gt; in her application to run
some pretty complex SQL queries against the NYC rideshare data set.&lt;/p&gt;

&lt;p&gt;Find more details in the
&lt;a href=&quot;https://github.com/emilysunaryo/trino-js-demo&quot;&gt;source code repository&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/07/11/trino-summit-2024-call-for-speakers.html&quot;&gt;Trino Summit 2024&lt;/a&gt;
is coming on the 11th and 12th of December, and registration, call for
speakers, and sponsorship opportunities are open.&lt;/li&gt;
  &lt;li&gt;Next &lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-22-aug-2024&quot;&gt;Trino Contributor Call&lt;/a&gt;
on the 22nd of August.&lt;/li&gt;
  &lt;li&gt;Next &lt;a href=&quot;https://trino.io/broadcast/index.html&quot;&gt;Trino Community Broadcast 64&lt;/a&gt; with
the &lt;a href=&quot;https://trino.io/users.html#stackable&quot;&gt;Stackable&lt;/a&gt; team about OPA on the 22nd
of August.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>62: A lakehouse that simply works at Prezi</title>
      <link href="https://trino.io/episodes/62.html" rel="alternate" type="text/html" title="62: A lakehouse that simply works at Prezi" />
      <published>2024-07-11T00:00:00+00:00</published>
      <updated>2024-07-11T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/62</id>
      <content type="html" xml:base="https://trino.io/episodes/62.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://www.linkedin.com/in/vincenzo-cassaro/&quot;&gt;Vincenzo Cassaro&lt;/a&gt; -
&lt;a href=&quot;https://twitter.com/viciocassaro&quot;&gt;@viciocassaro&lt;/a&gt;, Data Engineer at
&lt;a href=&quot;https://prezi.com/&quot;&gt;Prezi&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-451.html&quot;&gt;Trino 451&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for configuring a proxy for the S3 native file system.&lt;/li&gt;
  &lt;li&gt;Add &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t_pdf&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t_cdf&lt;/code&gt; functions.&lt;/li&gt;
  &lt;li&gt;Improve performance of certain queries involving window functions.&lt;/li&gt;
  &lt;li&gt;Lots of Iceberg connector improvements including support for incremental
refresh for basic materialized views.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other noteworthy topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/oneonestar&quot;&gt;Star Poon (oneonestar)&lt;/a&gt; approved as new
subproject maintainer for &lt;a href=&quot;https://trinodb.github.io/trino-gateway/&quot;&gt;Trino Gateway&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/06/24/trino-fest-recap.html&quot;&gt;Recap blog post&lt;/a&gt; from Trino Fest
with video recordings and slides is now available.&lt;/li&gt;
  &lt;li&gt;Trino Contributor Congregation &lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-congregation-14-june-2024&quot;&gt;recap notes&lt;/a&gt; are also available.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techplay.jp/event/944074&quot;&gt;Trino Japan meetup&lt;/a&gt; happened on the 10th of July.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest-vincenzo-cassaro&quot;&gt;Guest Vincenzo Cassaro&lt;/h2&gt;

&lt;p&gt;Vincenzo has been working with data in all its forms, from data modeling to
analytics and ML, since he completed his masters in computer engineering in
Italy. He is joining us from there, more specifically from Sicily, to chat with
us about how he got into computers, learned about Trino, and ended up at Prezi
now.&lt;/p&gt;

&lt;h2 id=&quot;about-prezi&quot;&gt;About Prezi&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://prezi.com/&quot;&gt;Prezi&lt;/a&gt; probably doesn’t need any introduction, but just in
case: Prezi is a popular and powerful platform to create and show engaging
presentations, videos, and infographics.&lt;/p&gt;

&lt;h2 id=&quot;a-lakehouse-that-simply-works&quot;&gt;A Lakehouse that simply works&lt;/h2&gt;

&lt;p&gt;With so many different technologies and vendors making proposals, it’s easy to
lose track of what truly matters. We chat with Vincenzo Cassaro from Prezi about
how a simple combination of established, maintained, open source technologies
can make a lakehouse that truly works at the scale of a company with 150 million
users.&lt;/p&gt;

&lt;p&gt;Check out the &lt;a href=&quot;https://prezi.com/view/P4HYav74ficPkkTAHjXJ/&quot;&gt;Prezi slide deck for Vincenzo’s talk&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/07/11/trino-summit-2024-call-for-speakers.html&quot;&gt;Trino Summit 2024&lt;/a&gt; is coming on the 11th and 12th of December, and registration, call for
speakers, and sponsorship opportunities are open.&lt;/li&gt;
  &lt;li&gt;Next &lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-25-jul-2024&quot;&gt;Trino Contributor Call&lt;/a&gt; on the 25th of July.&lt;/li&gt;
  &lt;li&gt;Next Trino Community Broadcast on 1st of August.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Announcing Trino Summit 2024</title>
      <link href="https://trino.io/blog/2024/07/11/trino-summit-2024-call-for-speakers.html" rel="alternate" type="text/html" title="Announcing Trino Summit 2024" />
      <published>2024-07-11T00:00:00+00:00</published>
      <updated>2024-07-11T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/07/11/trino-summit-2024-call-for-speakers</id>
      <content type="html" xml:base="https://trino.io/blog/2024/07/11/trino-summit-2024-call-for-speakers.html">&lt;p&gt;Fresh off the heels of &lt;a href=&quot;/blog/2024/06/24/trino-fest-recap.html&quot;&gt;Trino Fest 2024&lt;/a&gt;, where Commander Bun Bun was busy meeting the Trino community in-person,
we’re already looking forward to another, bigger event to round out the year in
Trino. For those who’ve been here a while, you know that can only mean one
thing: Trino Summit 2024. Much like last year, it will be a two-day, fully
virtual event, hosting a wide range of talks covering all things Trino on the
11th and 12th of December. Read on for more info, or if you’re already
convinced…&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://www.starburst.io/info/trino-summit-2024/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;[…]Y25-Q4-CM-Trino-Summit-2024-IMC-Upgrade&amp;amp;utm_content=CFS-Blog&quot;&gt;
        Register to attend!
    &lt;/a&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;join-us-online&quot;&gt;Join us online&lt;/h2&gt;

&lt;p&gt;Trino Summit is an event that brings together engineers, analysts, data
scientists, and anyone else interested in using or contributing to Trino. As the
biggest Trino event of the year, we’re excited to bring together professionals
from the big data and analytics community, so they can share experiences and
insights, make connections, and learn from each other.&lt;/p&gt;

&lt;p&gt;The event will be broadcast live, and speakers will be addressing questions
asked in chat, so if you want the full experience, make sure to register and
attend while the talks are happening. Even if you can’t make it, registering
means you’ll be notified when we post videos of all talks to the Trino YouTube
channel after the event, &lt;a href=&quot;https://www.starburst.io/info/trino-summit-2024/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;[…]Y25-Q4-CM-Trino-Summit-2024-IMC-Upgrade&amp;amp;utm_content=CFS-Blog&quot;&gt;so don’t fret - sign up!&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;call-for-speakers&quot;&gt;Call for speakers&lt;/h2&gt;

&lt;p&gt;Interested in speaking? We want to hear from everyone in the Trino community
who has something to share. We are looking for full sessions (about 30 minutes)
and lightning talks (15 minutes). We welcome beginner to highly advanced
submissions for talks that are connected to Trino.&lt;/p&gt;

&lt;p&gt;A two-day event means we’ve got room for everything, so if you’re unsure about
whether to submit a talk, go ahead and do it! We’ll review all submissions, and
we’ll do our best to work with you to turn your talk into a smash hit. Some
possible topics include:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Best practices and use cases&lt;/li&gt;
  &lt;li&gt;Data lake, lakehouse, and data federation architectures&lt;/li&gt;
  &lt;li&gt;Query federation and data migrations&lt;/li&gt;
  &lt;li&gt;Table formats, file formats, and metadata catalogs&lt;/li&gt;
  &lt;li&gt;Optimizations and performance improvements&lt;/li&gt;
  &lt;li&gt;Data engineering, including data cleaning, batch and streaming architectures,
and maintenance&lt;/li&gt;
  &lt;li&gt;Streaming and other data ingestion and pipelines&lt;/li&gt;
  &lt;li&gt;Data science workflows and analytics&lt;/li&gt;
  &lt;li&gt;SQL analytics, business intelligence, dashboarding and other visualizations&lt;/li&gt;
  &lt;li&gt;Data governance and security&lt;/li&gt;
  &lt;li&gt;Writing advanced SQL queries and pipelines&lt;/li&gt;
  &lt;li&gt;Help for Trino deployment on-premise and in the cloud&lt;/li&gt;
  &lt;li&gt;Developing custom connectors and other plugins&lt;/li&gt;
  &lt;li&gt;Contributing to Trino&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Want to speak?&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://sessionize.com/trino-summit-2024&quot;&gt;
        Submit a talk!
    &lt;/a&gt;
&lt;/div&gt;

&lt;h2 id=&quot;sponsor-trino-summit&quot;&gt;Sponsor Trino Summit&lt;/h2&gt;

&lt;p&gt;Starburst is the organizing sponsor of the event, but to make Trino Summit a
smashing success, they’re excited and interested in collaborating with other
organizations within the community. If you are interested in sponsoring, email
&lt;a href=&quot;mailto:events@starburstdata.com&quot;&gt;events@starburstdata.com&lt;/a&gt; for information.&lt;/p&gt;

&lt;p&gt;And regardless of whether you’re planning on attending, speaking, or sponsoring,
we look forward to seeing you soon!&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden, Manfred Moser, and Monica Miller</name>
        </author>
      

      <summary>Fresh off the heels of Trino Fest 2024, where Commander Bun Bun was busy meeting the Trino community in-person, we’re already looking forward to another, bigger event to round out the year in Trino. For those who’ve been here a while, you know that can only mean one thing: Trino Summit 2024. Much like last year, it will be a two-day, fully virtual event, hosting a wide range of talks covering all things Trino on the 11th and 12th of December. Read on for more info, or if you’re already convinced… Register to attend!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2024/summit-logo.png" />
      
    </entry>
  
    <entry>
      <title>Trino Fest 2024 recap</title>
      <link href="https://trino.io/blog/2024/06/24/trino-fest-recap.html" rel="alternate" type="text/html" title="Trino Fest 2024 recap" />
      <published>2024-06-24T00:00:00+00:00</published>
      <updated>2024-06-24T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/06/24/trino-fest-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2024/06/24/trino-fest-recap.html">&lt;p&gt;Trino Fest 2024 is successfully in the books! While over 100 enthusiastic
members of the community gathered in Boston, over 650 virtual attendees joined
us worldwide to learn from our expert speakers as they discussed topics such as
table formats, enhancements and optimizations, and use cases with Trino both
large and small. And now it is your chance to revisit the presentations or catch
up on everything you missed.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;impressions&quot;&gt;Impressions&lt;/h2&gt;

&lt;p&gt;Judging from early attendee and speaker feedback, everyone enjoyed
the event. Asked which sessions they liked best, the audience gave answers like&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;They were all very insightful.&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;All of it, but especially the realtime demos to see speed difference on query
optimization.&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;and &lt;em&gt;All of them, nothing was missed!&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Just like some attendees, our speakers travelled from Europe, Asia, and other
places, and enjoyed the event.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Thanks for organizing the awesome event and inviting me for the talk!&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Was great to finally meet you and we had a great time at Trino Fest!&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Thanks for a great event last week. It was a pleasure to meet you all.&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many of us also &lt;a href=&quot;https://www.linkedin.com/posts/k-shreya-s_trinofest2024-bigdata-analytics-activity-7209236269774585857-p8-e?utm_source=share&amp;amp;utm_medium=member_desktop&quot;&gt;met Commander Bun Bun&lt;/a&gt;,
and &lt;a href=&quot;https://www.youtube.com/watch?v=4jPYpU9Jrrw&quot;&gt;we sent greetings to the remote audience as
well&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://trino.io/assets/blog/trino-fest-2024/cbb-manfred.jpg&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The keynote, the sessions, and all the talk in the hallways confirmed that Trino
continues to thrive and expand in usage. Large companies like &lt;a href=&quot;https://trino.io/users.html&quot;&gt;Apple, Microsoft,
LinkedIn, Amazon, and many other users&lt;/a&gt; openly talk
about shipping Trino as part of their products and using it internally as
well. Smaller companies either run Trino themselves or take advantage of
Trino-based products for all their data platform needs. Our sessions for Trino
Fest offered something to learn for everyone.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://trino.io/assets/blog/trino-fest-2024/hallway-chat.png&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;sponsors&quot;&gt;Sponsors&lt;/h2&gt;

&lt;p&gt;Bringing together the event was only possible thanks to the great Trino events
team around &lt;a href=&quot;https://www.linkedin.com/in/anna-schibli-418692172/&quot;&gt;Anna Schibli&lt;/a&gt;
at our main sponsor Starburst, and the assistance from all our other sponsors. A
heartfelt thank you from Commander Bun Bun and all of us goes out to you!&lt;/p&gt;

&lt;div class=&quot;container&quot;&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.starburst.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/starburst-small.png&quot; title=&quot;Starburst, event host and organizer&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.onehouse.ai/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/onehouse-small.png&quot; title=&quot;Onehouse, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.startree.ai/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/startree-small.png&quot; title=&quot;Startree, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.alluxio.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/alluxio-small.png&quot; title=&quot;Alluxio, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://cloudinary.com/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/cloudinary-small.png&quot; title=&quot;Cloudinary, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.upsolver.com/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/upsolver-small.png&quot; title=&quot;Upsolver, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;h2 id=&quot;sessions&quot;&gt;Sessions&lt;/h2&gt;

&lt;p&gt;Now, on to what you are really looking for: all the talks, speakers,
short recaps, slide decks, video recordings, and the Q&amp;amp;A sessions that followed,
ready for you. Enjoy!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What’s new in Trino this summer&lt;/strong&gt;
&lt;br /&gt;Presented by Martin Traverso from
&lt;a href=&quot;https://www.starburst.io&quot; target=&quot;_blank&quot;&gt;Starburst&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Martin recapped everything that’s happened in Trino over the last six months,
taking a look at the biggest new features and how Trino development is going
better than ever. He also gave a sneak peek at what we can expect soon in Trino.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=mk3n0_tAdZY&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/keynote.pdf&quot; target=&quot;_blank&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Reducing query cost and query runtimes of Trino powered analytics platforms&lt;/strong&gt;
&lt;br /&gt;Presented by Jonas Irgens Kylling from
&lt;a href=&quot;https://dune.com/&quot; target=&quot;_blank&quot;&gt;Dune&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Jonas gave a detailed talk about how Dune has improved their performance of
Trino with a few key tweaks. That includes leveraging caching with Alluxio,
advanced cluster management, and storing, sampling, and filtering query results.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=11yhPXIXiBY&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/dune.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Enhancing Trino’s query performance and data management with Hudi: innovations and future&lt;/strong&gt;
&lt;br /&gt;Presented by Ethan Guo from
&lt;a href=&quot;https://www.onehouse.ai/&quot; target=&quot;_blank&quot;&gt;Onehouse&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Ethan gave a look into development on Hudi and Trino’s Hudi connector,
explaining multi-modal indexing and how it can improve query performance. He
also gave an overview of the roadmap and future of the connector.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=JMzS2BbeK0E&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/onehouse.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Trino Engineering @ Microsoft&lt;/strong&gt;
&lt;br /&gt;Presented by George Fisher and Ishan Patwa from
&lt;a href=&quot;https://www.microsoft.com/&quot; target=&quot;_blank&quot;&gt;Microsoft&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;George and Ishan gave a deep dive into what’s been going on with Microsoft’s
deployment and management of Trino. This included clients and integrations,
result caching, a sharded SQL connector, deep debugging and monitoring, and
seamless security integration with Azure.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=t7ndqYUhKSA&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Enhancing data governance in Trino with the OpenLineage integration&lt;/strong&gt;
&lt;br /&gt;Presented by Alok Kumar Prusty from
&lt;a href=&quot;https://www.apple.com/&quot; target=&quot;_blank&quot;&gt;Apple&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Alok’s lightning talk is all about how Apple deployed OpenLineage, an open
framework for data lineage collection and analysis, and built a Trino plugin to
publish OpenLineage-compliant events that can be viewed and monitored.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=A7hj1M7IYj8&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Best practices and insights when migrating to Apache Iceberg for data engineers&lt;/strong&gt;
&lt;br /&gt;Presented by Amit Gilad from
&lt;a href=&quot;https://cloudinary.com/&quot; target=&quot;_blank&quot;&gt;Cloudinary&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Amit shared how Cloudinary expanded their data lake to use Apache Iceberg. He
demonstrated how moving from Snowflake to an open table format allowed them to
reduce storage costs and leverage different query and processing engines to run
more powerful analytics at scale.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=dKQ2zShNlyQ&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/cloudinary.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Trino query intelligence: insights, recommendations, and predictions&lt;/strong&gt;
&lt;br /&gt;Presented by Marton Bod from &lt;a href=&quot;https://www.apple.com/&quot; target=&quot;_blank&quot;&gt;Apple&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Marton’s lightning talk explored how Apple has monitored and stored metadata for
every Trino query execution, then used that data for real-time cluster
dashboarding, self-service troubleshooting, and automatic generation of
recommendations for users.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=K3iSXOJNaSQ&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;The open source journey of the Trino Delta Lake Connector&lt;/strong&gt;
&lt;br /&gt;Presented by Marius Grama from
&lt;a href=&quot;https://www.starburst.io&quot; target=&quot;_blank&quot;&gt;Starburst&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Marius went into a deep dive on all the work and collaboration that’s gone into
making the Delta Lake connector in Trino a robust, first-class connector. Casual
discussions, engineers working together, GitHub issues filed by the community,
and innovative contributions have all come together, and Marius’ talk shows why
an open source community is so powerful.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=mPfRYdvDcMo&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/delta-lake.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Tiny Trino; new perspectives in small data&lt;/strong&gt;
&lt;br /&gt;Presented by Ben Jeter and Thomas Zugibe from
&lt;a href=&quot;https://www.executivehomes.com/&quot; target=&quot;_blank&quot;&gt;Executive Homes&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Ben and Tommy explore how Executive Homes uses Trino’s robust suite of
integrations to handle data at a small scale. Instead of petabytes, how about a
handful of gigabytes in several different systems? It’s something that Trino is
well-equipped to handle thanks to how well-supported it is in the data
ecosystem, and they explain why.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=ZcY9LJDdB6Y&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/executive-homes.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Bridging the divide: running Trino SQL on a vector data lake powered by Lance&lt;/strong&gt;
&lt;br /&gt;Presented by Lei Xu from &lt;a href=&quot;https://lancedb.com/&quot; target=&quot;_blank&quot;&gt;LanceDB&lt;/a&gt;
and Noah Shpak from &lt;a href=&quot;https://character.ai/&quot; target=&quot;_blank&quot;&gt;Character.ai&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Lei and Noah give an overview of LanceDB, how it works, and what makes it a
great database for multimodal AI. Then they dive into a Trino connector for
Lance, and explore how Trino slots into Character.AI’s workload to blend
analytics with training and generating new models.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=jmOsVbGfon0&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/lance-characterai.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;How FourKites runs a scalable and cost-effective log analytics solution to
handle petabytes of logs&lt;/strong&gt;
&lt;br /&gt;Presented by Arpit Garg from
&lt;a href=&quot;https://www.fourkites.com/&quot; target=&quot;_blank&quot;&gt;FourKites&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;With nearly a petabyte of logs being managed at FourKites, it shouldn’t be a
huge surprise that they’ve turned to Trino to understand and analyze
them. Arpit discusses how they’ve scaled log ingestion, strategically used S3
with Parquet to minimize storage costs, transformed and extracted those logs at
scale, and leveraged Trino to search and explore the datasets with Superset as a
frontend for visualization.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=xdCZBQJt-0g&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/fourkites.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Observing Trino&lt;/strong&gt;
&lt;br /&gt;Presented by Matt Stephenson from
&lt;a href=&quot;https://www.starburst.io&quot; target=&quot;_blank&quot;&gt;Starburst&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Starburst has built a comprehensive observability platform around Trino to
better serve its users and customers. Matt explored all the components of it,
including how to integrate with Jaeger, Prometheus, and ELK.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=v7p72Ggcc5I&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/observing-trino.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Accelerate Performance at Scale: Best Practices for Trino with Amazon S3&lt;/strong&gt;
&lt;br /&gt;Presented by Dai Ozaki from &lt;a href=&quot;https://aws.amazon.com/&quot; target=&quot;_blank&quot;&gt;AWS&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Dai’s talk explores best practices to get the most out of using Trino in
conjunction with Amazon S3. He discusses partitioning, scaling workloads,
reducing latency, and resolving common bottlenecks, providing valuable insights
for anyone trying to manage and deploy Trino with S3.
&lt;br /&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=cjUUcHlUKxQ&quot; target=&quot;_blank&quot;&gt;Video recording&lt;/a&gt;
| &lt;a href=&quot;https://trino.io/assets/blog/trino-fest-2024/aws-s3.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;whats-next&quot;&gt;What’s next&lt;/h2&gt;

&lt;p&gt;While you are busy catching up, we are still working hard on a recap of the
Trino Contributor Congregation. We also had a lot of great conversations that
led to follow-up action items such as more pull requests to review, new
contributors to onboard, and more projects to work on.&lt;/p&gt;

&lt;p&gt;Make sure to &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;join the community on Slack&lt;/a&gt; to learn
more in the next little while.&lt;/p&gt;

&lt;p&gt;Oh, and one last thing…&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://www.starburst.io/info/trino-summit-2024/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=NORAM-FY25-Q4-CM-Trino-Summit-2024-IMC-Upgrade&amp;amp;utm_content=Trino-Fest-Blog-Recap&quot;&gt;
        Trino Summit 2024 registration is open
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;See you soon,&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred, Cole, and Monica&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Cole Bowden, Monica Miller</name>
        </author>
      

      <summary>Trino Fest 2024 is successfully in the books! While over 100 enthusiastic members of the community gathered in Boston, over 650 virtual attendees joined us worldwide to learn from our expert speakers as they discussed topics such as table formats, enhancements and optimizations, and use cases with Trino both large and small. And now it is your chance to revisit the presentations or catch up on everything you missed.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2024/trino-fest-talk.jpg" />
      
    </entry>
  
    <entry>
      <title>61: Trino powers business intelligence</title>
      <link href="https://trino.io/episodes/61.html" rel="alternate" type="text/html" title="61: Trino powers business intelligence" />
      <published>2024-06-20T00:00:00+00:00</published>
      <updated>2024-06-20T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/61</id>
      <content type="html" xml:base="https://trino.io/episodes/61.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/patrick-pichler/&quot;&gt;Patrick Pichler&lt;/a&gt;, Owner and
co-founder at &lt;a href=&quot;https://www.creativedata.io/&quot;&gt;Creative Data&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-449.html&quot;&gt;Trino 449&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add OpenLineage event listener.&lt;/li&gt;
  &lt;li&gt;Add support for views when using the Iceberg REST catalog.&lt;/li&gt;
  &lt;li&gt;Improve write performance for Parquet files in Hive, Iceberg, and Delta Lake
connector.&lt;/li&gt;
  &lt;li&gt;Improve equality delete performance in Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-450.html&quot;&gt;Trino 450&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improve performance for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;first_value()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;last_value()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date_trunc()&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date_add()&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date_diff()&lt;/code&gt; functions.&lt;/li&gt;
  &lt;li&gt;Add support for concurrent &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; queries in Delta
Lake connector.&lt;/li&gt;
  &lt;li&gt;Add support for reading UniForm tables in Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE&lt;/code&gt; in Iceberg and Memory connector.&lt;/li&gt;
  &lt;li&gt;Automatically configure BigQuery scan parallelism.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;first-recap-from-trino-fest-2024&quot;&gt;First recap from Trino Fest 2024&lt;/h2&gt;

&lt;p&gt;Cole and Manfred chat a bit about Trino Fest last week, mentioning that &lt;a href=&quot;https://www.youtube.com/playlist?list=PLFnr63che7waExsD4lWarA3ML4R2HH58A&quot;&gt;all
videos are now available&lt;/a&gt;,
and a blog post with slides and more material is coming as well.&lt;/p&gt;

&lt;h2 id=&quot;impression-from-trino-contributor-congregation&quot;&gt;Impression from Trino Contributor Congregation&lt;/h2&gt;

&lt;p&gt;Manfred and Dain led the discussions in the congregation. We are excited about
the many follow-ups for the project and the increased collaboration and
innovation.&lt;/p&gt;

&lt;h2 id=&quot;guest-patrick-pichler&quot;&gt;Guest Patrick Pichler&lt;/h2&gt;

&lt;p&gt;Patrick specializes in providing guidance, designing, and implementing
sustainable data, analytics and AI solutions utilizing open architectures at
Creative Data. He has a long history of working in the data and data platform
space as user, developer, administrator, manager, consultant, and educator.&lt;/p&gt;

&lt;h2 id=&quot;powerbi-overview&quot;&gt;PowerBI overview&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://powerbi.microsoft.com/&quot;&gt;Power BI&lt;/a&gt; is an interactive data visualization
software product suite developed by Microsoft with a primary focus on business
intelligence. We talk about the different available products and features, and
their usage in the community.&lt;/p&gt;

&lt;h2 id=&quot;trino-client-support-options-for-power-bi&quot;&gt;Trino client support options for Power BI&lt;/h2&gt;

&lt;p&gt;Typically, Power BI relies on ODBC drivers for connecting to specific data
sources. However, since there is no open source Trino ODBC driver, Patrick and
other clever developers have created a &lt;a href=&quot;https://github.com/CreativeDataEU/PowerBITrinoConnector&quot;&gt;Power BI
client&lt;/a&gt; that connects
to Trino directly via the client REST API - the
&lt;a href=&quot;https://github.com/CreativeDataEU/PowerBITrinoConnector&quot;&gt;PowerBITrinoConnector&lt;/a&gt;.
We discuss the details and limitations of both approaches, look at the source
code, and learn about import and direct query modes.&lt;/p&gt;

&lt;h2 id=&quot;demo&quot;&gt;Demo&lt;/h2&gt;

&lt;p&gt;Patrick showcases how to install and use the connector in his demo of Trino and
Power BI.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/info/trino-summit-2024/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=NORAM-FY25-Q4-CM-Trino-Summit-2024-IMC-Upgrade&amp;amp;utm_content=Trino-Fest-Blog-Recap&quot;&gt;Trino Summit 2024&lt;/a&gt;
is coming on the 11th and 12th of December, and registration is open now.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>One busy week to go before Trino Fest 2024</title>
      <link href="https://trino.io/blog/2024/06/06/trino-fest-last-call.html" rel="alternate" type="text/html" title="One busy week to go before Trino Fest 2024" />
      <published>2024-06-06T00:00:00+00:00</published>
      <updated>2024-06-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/06/06/trino-fest-last-call</id>
      <content type="html" xml:base="https://trino.io/blog/2024/06/06/trino-fest-last-call.html">&lt;p&gt;This week has surely started off with a big bang and another boom in the data
platform world. Snowflake &lt;a href=&quot;https://www.snowflake.com/blog/introducing-polaris-catalog/&quot;&gt;introduced the open source Polaris
catalog&lt;/a&gt; as
implementation of the Iceberg REST catalog specification. And Databricks, the
main driver of the Delta Lake table format, &lt;a href=&quot;https://www.databricks.com/blog/databricks-tabular&quot;&gt;announced their acquisition of
Tabular&lt;/a&gt;, a main driver in
the Apache Iceberg community.&lt;/p&gt;

&lt;p&gt;Interestingly enough, Trino is in the middle of all this with great support for
Delta Lake, Hudi, Iceberg, and also the Iceberg REST catalog. And if all that
interoperability with Trino is not enough reason to join us next week at Trino
Fest 2024, I have some more ideas for you to consider.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;reasons-to-attend-trino-fest&quot;&gt;Reasons to attend Trino Fest&lt;/h2&gt;

&lt;p&gt;Trino Fest is happening next week on the 13th of June, and here are all the
reasons I can think of why you should tune in.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The event is free for all attendees. It is available as an in-person event in
Boston and for virtual attendance across the rest of the world.&lt;/li&gt;
  &lt;li&gt;You can learn about real world experience with Trino, Delta Lake, Iceberg,
Hudi, and many &lt;a href=&quot;https://trino.io/ecosystem/index.html&quot;&gt;other data sources, clients, and add-ons&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Many Trino friends, users, and contributors from around the world and
companies like Amazon, Apple, Bloomberg, character.ai, Dune, LanceDB,
Microsoft, Onehouse, and Starburst are going to attend and present.&lt;/li&gt;
  &lt;li&gt;Monica Miller and Manfred Moser will guide you through the event with the help
of the awesome Starburst Trino events team.&lt;/li&gt;
  &lt;li&gt;In-person attendees might just meet our mascot, Commander Bun Bun.&lt;/li&gt;
  &lt;li&gt;On the following day, the &lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-congregation-14-june-2024&quot;&gt;Trino Contributor
Congregation&lt;/a&gt;
will dive super deep into technical details and collaborative efforts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Convinced yet, or still wondering? In either case, go and &lt;a href=&quot;http://www.starburst.io/info/trino-fest-2024?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=Global-FY25-Q2-EV-Trino-Fest-2024&amp;amp;utm_content=Blog-3&quot;&gt;have a look at the
detailed agenda and then register to attend&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;http://www.starburst.io/info/trino-fest-2024?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=Global-FY25-Q2-EV-Trino-Fest-2024&amp;amp;utm_content=Blog-3&quot;&gt;
        Register now!
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;And last, but not least, thank you to our sponsors for making this event happen…&lt;/p&gt;

&lt;div class=&quot;container&quot;&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.starburst.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/starburst-small.png&quot; title=&quot;Starburst, event host and organizer&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.onehouse.ai/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/onehouse-small.png&quot; title=&quot;Onehouse, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.startree.ai/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/startree-small.png&quot; title=&quot;Startree, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.alluxio.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/alluxio-small.png&quot; title=&quot;Alluxio, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://cloudinary.com/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/cloudinary-small.png&quot; title=&quot;Cloudinary, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.upsolver.com/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/upsolver-small.png&quot; title=&quot;Upsolver, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>This week has surely started off with a big bang and another boom in the data platform world. Snowflake introduced the open source Polaris catalog as an implementation of the Iceberg REST catalog specification. And Databricks, the main driver of the Delta Lake table format, announced their acquisition of Tabular, a main driver in the Apache Iceberg community. Interestingly enough, Trino is in the middle of all this with great support for Delta Lake, Hudi, Iceberg, and also the Iceberg REST catalog. And if all that interoperability with Trino is not enough reason to join us next week at Trino Fest 2024, I have some more ideas for you to consider.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2024/announcement-banner.png" />
      
    </entry>
  
    <entry>
      <title>60: Trino calling AI</title>
      <link href="https://trino.io/episodes/60.html" rel="alternate" type="text/html" title="60: Trino calling AI" />
      <published>2024-05-22T00:00:00+00:00</published>
      <updated>2024-05-22T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/60</id>
      <content type="html" xml:base="https://trino.io/episodes/60.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/isainalcik/&quot;&gt;Isa Inalcik&lt;/a&gt;, Principal Data
Engineer at &lt;a href=&quot;https://bestsecret.com/&quot;&gt;BestSecret Group&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-and-news&quot;&gt;Releases and news&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-446.html&quot;&gt;Trino 446&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for the Snowflake catalog in the Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Add support for reading S3 objects restored from Glacier storage in the Hive
connector.&lt;/li&gt;
  &lt;li&gt;Add support for unsupported type handling configuration in the Snowflake
connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-447.html&quot;&gt;Trino 447&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW CREATE FUNCTION&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Require Java 22.&lt;/li&gt;
  &lt;li&gt;Add support for concurrent &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE&lt;/code&gt; in the Delta Lake
connector.&lt;/li&gt;
  &lt;li&gt;Remove support for Phoenix 5.1.x and earlier.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-448.html&quot;&gt;Trino 448&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improve performance of reading from Parquet files.&lt;/li&gt;
  &lt;li&gt;Add support for caching Glue metadata with the update to use the V2 REST
interface.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trinodb.github.io/trino-gateway/release-notes/&quot;&gt;Trino Gateway 8 and 9&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for configurable router policies with two new policies available.&lt;/li&gt;
  &lt;li&gt;Add a Helm chart for deployment.&lt;/li&gt;
  &lt;li&gt;Add new website.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We also had a new Trino Helm chart release 0.20.0.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/nineinchnick&quot;&gt;Jan Waś&lt;/a&gt; is now also
&lt;a href=&quot;https://trino.io/development/roles#subproject-maintainers&quot;&gt;subproject maintainer&lt;/a&gt; of the
&lt;a href=&quot;https://github.com/trinodb/trino-go-client&quot;&gt;go client&lt;/a&gt; and the
&lt;a href=&quot;https://github.com/trinodb/charts&quot;&gt;Helm charts&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;impressions-from-the-iceberg-summit&quot;&gt;Impressions from the Iceberg Summit&lt;/h2&gt;

&lt;p&gt;Last week, Cole attended the &lt;a href=&quot;https://iceberg-summit.org/&quot;&gt;Iceberg Summit&lt;/a&gt; with
a special Trino perspective, and we chat about his impressions and major
takeaways.&lt;/p&gt;

&lt;h2 id=&quot;guest-isa-inalcik-from-bestsecret&quot;&gt;Guest Isa Inalcik from BestSecret&lt;/h2&gt;

&lt;p&gt;Isa is a highly skilled data expert with over a decade of hands-on experience in
the software development lifecycle. He is well versed in many data tools including
Trino/Starburst Enterprise Platform, Snowflake, Airflow, Apache Spark, Hive,
Apache Iceberg, dbt, and others.&lt;/p&gt;

&lt;h2 id=&quot;trino-at-bestsecret&quot;&gt;Trino at BestSecret&lt;/h2&gt;

&lt;p&gt;At BestSecret, a leading online retailer for fashion and lifestyle in Europe,
Isa spearheads the development of efficient and resilient ELT/ETL pipelines and
the implementation of data and AI-driven solutions. We chat in more details
about their setup and use cases, his solutions, and challenges he is facing.&lt;/p&gt;

&lt;h2 id=&quot;generative-ai-interest-and-use-cases&quot;&gt;Generative AI interest and use cases&lt;/h2&gt;

&lt;p&gt;Isa has been following the waves of interest in AI and sees the following use
cases related to data and Trino:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Media (audio, video, image): Extract information out of images.&lt;/li&gt;
  &lt;li&gt;Object categorization: Categorize objects in images and videos.&lt;/li&gt;
  &lt;li&gt;Data masking: For anonymizing sensitive data from unstructured text.&lt;/li&gt;
  &lt;li&gt;Data extraction: To pull structured information from unstructured text.&lt;/li&gt;
  &lt;li&gt;Sentiment analysis: For gauging the sentiment of textual data.&lt;/li&gt;
  &lt;li&gt;Language detection and translation: For detecting the language of text or translating it.&lt;/li&gt;
  &lt;li&gt;Summarization: To generate concise summaries from lengthy texts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This inspired him to try an integration of the new emerging LLMs with Trino.&lt;/p&gt;

&lt;h2 id=&quot;trino-spi&quot;&gt;Trino SPI&lt;/h2&gt;

&lt;p&gt;Trino uses a service provider interface (SPI) to allow developers to create
plugins for features such as connectors, security integrations and custom
functions. This is crucial for businesses to implement required functionality and
enabled Isa to work on a plugin to support custom functions that call LLMs.&lt;/p&gt;

&lt;p&gt;The OpenAI API specification also allowed him to create one function that can be
used with different LLM backends.&lt;/p&gt;

&lt;h2 id=&quot;proof-of-concept-and-demo&quot;&gt;Proof of concept and demo&lt;/h2&gt;

&lt;p&gt;We look at the concept and implementation that Isa developed with the following
architecture:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/60/trino-ai-architecture.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Isa’s &lt;a href=&quot;https://github.com/alaturqua/trino-ai&quot;&gt;trino-ai repository&lt;/a&gt; contains
source code and more details as mentioned in his post on
&lt;a href=&quot;https://www.linkedin.com/posts/isainalcik_trino-trino-llama3-activity-7187411736587587584-e2WW/&quot;&gt;LinkedIn&lt;/a&gt;
and used in the demo.&lt;/p&gt;

&lt;h2 id=&quot;other-resources&quot;&gt;Other resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Post from Isa: &lt;a href=&quot;https://www.linkedin.com/pulse/maximize-performance-secret-scaling-trino-clusters-isa-inalcik-ffo5e/&quot;&gt;Maximize Performance: The Secret to Scaling Trino Clusters with KEDA&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Post from Isa: &lt;a href=&quot;https://www.linkedin.com/pulse/enhancing-security-observability-trino-open-policy-agent-isa-inalcik-zhl9e&quot;&gt;Enhancing Security and Observability in Trino with Open Policy Agent and OpenTelemetry&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://ollama.com/&quot;&gt;Ollama&lt;/a&gt; system used to run LLMs&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/develop.html&quot;&gt;Trino SPI documentation&lt;/a&gt;, including
&lt;a href=&quot;https://trino.io/docs/current/develop/functions.html&quot;&gt;custom function creation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Trino Fest news:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/05/08/trino-fest-lineup-finalized.html&quot;&gt;Finalized speaker lineup announced&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.starburst.io/info/trino-fest-2024/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=Global-FY25-Q2-EV-Trino-Fest-2024&amp;amp;utm_content=banner&quot;&gt;Register for event and hotel now&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Special thanks to our Trino Fest sponsors - Starburst as event host and
Alluxio, Cloudinary, Onehouse, Startree, and Upsolver as event sponsors.&lt;/li&gt;
  &lt;li&gt;Contact us to join the Trino Contributor Congregation the next day.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino Contributor Call on 23rd of May.&lt;/li&gt;
  &lt;li&gt;Check out upcoming &lt;a href=&quot;https://trino.io/community.html#events&quot;&gt;Trino Community Broadcast episodes and other events&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Big names round out the Trino Fest 2024 lineup</title>
      <link href="https://trino.io/blog/2024/05/08/trino-fest-lineup-finalized.html" rel="alternate" type="text/html" title="Big names round out the Trino Fest 2024 lineup" />
      <published>2024-05-08T00:00:00+00:00</published>
      <updated>2024-05-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/05/08/trino-fest-lineup-finalized</id>
      <content type="html" xml:base="https://trino.io/blog/2024/05/08/trino-fest-lineup-finalized.html">&lt;p&gt;We gave
&lt;a href=&quot;/blog/2024/04/15/trino-fest-2024-approaches.html&quot;&gt;a sneak peek of the Trino Fest lineup a month ago&lt;/a&gt;,
and we’re excited to now bring you the full lineup for the event. We’ve got some
major names being added, including Amazon, Microsoft, and another talk from
Apple. With FourKites and a joint talk with LanceDB and character.ai also added
to the schedule, we’re excited to present the
&lt;a href=&quot;https://www.starburst.io/info/trino-fest-2024/#agenda&quot;&gt;full lineup for Trino Fest 2024&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Trino Fest is barely a month away on the 13th of June, and whether you want to
attend live in Boston or tune in virtually, this is a reminder that you
should &lt;a href=&quot;http://www.starburst.io/info/trino-fest-2024?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=Global-FY25-Q2-EV-Trino-Fest-2024&amp;amp;utm_content=Blog-3&quot;&gt;register to attend!&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;trino-fest-the-contributor-congregation-and-logistics&quot;&gt;Trino Fest, the contributor congregation, and logistics&lt;/h2&gt;

&lt;p&gt;In case you missed
&lt;a href=&quot;/blog/2024/02/20/announcing-trino-fest-2024.html&quot;&gt;our announcement of Trino Fest&lt;/a&gt;,
it’s a hybrid event taking place from 9am-5pm Eastern Time on June 13th. It’ll
feature talks from a wide range of Trino users and contributors, with topics
ranging from use cases, migrations, cluster management and administration,
to lakehouse integrations and more. If you want to join us in-person, we’ll be at
the Hyatt Regency Boston. There will also be a meeting for Trino contributors
the day after the event at the Starburst office in Boston from 9am-1pm, and if
you’d be interested in attending that, please reach out to myself (Cole Bowden)
or Manfred Moser on the &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Trino Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you still haven’t booked a hotel, we also have a discounted rate at the Hyatt
for the event to make life easy - whether that’s waking up and heading
downstairs for the start of the event, or being able to quickly duck back to
your room for a 30-minute meeting without missing too much. One link will take
you to a booking for just the night before the event, while the other allows
you to optionally book an extra night prior or include the night after Trino
Fest so you can stick around for the contributor congregation or explore Boston.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.hyatt.com/en-US/group-booking/BOSTO/G-STA4&quot;&gt;
        Book your hotel for June 12-13
    &lt;/a&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.hyatt.com/en-US/group-booking/BOSTO/G-STA3&quot;&gt;
        Book your hotel for June 11-14
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;h2 id=&quot;and-dont-forget-those-additional-speakers&quot;&gt;And don’t forget those additional speakers&lt;/h2&gt;

&lt;p&gt;George Fisher, Ishan Patwa, and Oleg Savin will be diving deep into how Trino is
leveraged at Microsoft. While we’ve previously had LinkedIn at Trino events,
this is the first time the Trino community is getting to hear about the scale of
Trino within Microsoft proper, and with their plans to cover clients,
integrations, result caching, a sharded connector, visualization for monitoring,
and AKS deployment with Azure, there will be a lot to learn.&lt;/p&gt;

&lt;p&gt;Alok Kumar Prusty and Amogh Margoor from Apple will be joining the lineup to
discuss Trino query intelligence. With the mountain of query metadata, the team
at Apple has been able to better understand Trino usage and use that knowledge
to create impactful improvements for their Trino users. With dashboarding,
self-service troubleshooting, and automatic recommendations for query
optimization, Alok and Amogh will detail how a world-class engineering team can
take an awesome tool like Trino and make it even better for the end users.&lt;/p&gt;

&lt;p&gt;Also relatively new to the Trino community is discussing AI workloads. Lei Xu
from &lt;a href=&quot;https://lancedb.com/&quot;&gt;LanceDB&lt;/a&gt; and Noah Shpak from
&lt;a href=&quot;https://character.ai/&quot;&gt;character.ai&lt;/a&gt; will be highlighting exactly that,
using Trino as an analytics engine on top of a LanceDB-powered vector data lake.
With AI data so often being in a silo, analyzing it with a traditional SQL
workload is often expensive or complicated… but Lei and Noah will be
demonstrating how character.ai’s LanceDB/Trino pairing maintains the power of
both systems while making it easy.&lt;/p&gt;

&lt;p&gt;Dai Ozaki from Amazon will be diving into how to optimize Trino with S3. Given
how many people are using Trino with S3 already, hearing directly from Dai, an
engineer at Amazon, regarding best practices and optimizations should prove
beneficial for a massive chunk of the Trino community. Dai plans on talking
about how Trino and S3 interact, and how that knowledge can be used to get the
most out of your stack and avoid common bottlenecks.&lt;/p&gt;

&lt;p&gt;And last but not least, Arpit Garg from &lt;a href=&quot;https://www.fourkites.com/&quot;&gt;FourKites&lt;/a&gt;
will be discussing utilizing Trino to handle nearly a petabyte of logs.
FourKites is able to ingest massive amounts of logs, use S3 and
Parquet to keep storage costs low, transform and extract logs at scale, and then
use Trino as the engine to query those logs and reference them in context with
other data sets and data stores. Arpit will also touch on using Superset as a
frontend for Trino.&lt;/p&gt;

&lt;p&gt;And keep in mind - all of that is in addition to the talks we’ve already
announced!
&lt;a href=&quot;http://www.starburst.io/info/trino-fest-2024?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=Global-FY25-Q2-EV-Trino-Fest-2024&amp;amp;utm_content=Blog-3&quot;&gt;Register to attend&lt;/a&gt;,
&lt;a href=&quot;https://www.hyatt.com/en-US/group-booking/BOSTO/G-STA3&quot;&gt;book your hotel&lt;/a&gt;, and
the Trino community is looking forward to seeing you there!&lt;/p&gt;

&lt;p&gt;Thank you to our sponsors for making this event happen…&lt;/p&gt;

&lt;div class=&quot;container&quot;&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.starburst.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/starburst-small.png&quot; title=&quot;Starburst, event host and organizer&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.onehouse.ai/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/onehouse-small.png&quot; title=&quot;Onehouse, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.startree.ai/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/startree-small.png&quot; title=&quot;Startree, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.alluxio.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/alluxio-small.png&quot; title=&quot;Alluxio, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://cloudinary.com/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/cloudinary-small.png&quot; title=&quot;Cloudinary, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.upsolver.com/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/upsolver-small.png&quot; title=&quot;Upsolver, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>We gave a sneak peek of the Trino Fest lineup a month ago, and we’re excited to now bring you the full lineup for the event. We’ve got some major names being added, including Amazon, Microsoft, and another talk from Apple. With FourKites and a joint talk with LanceDB and character.ai also added to the schedule, we’re excited to present the full lineup for Trino Fest 2024. Trino Fest is barely a month away on the 13th of June, and whether you want to attend live in Boston or tune in virtually, this is a reminder that you should register to attend!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2024/announcement-banner.png" />
      
    </entry>
  
    <entry>
      <title>59: Querying Trino with Java and jOOQ</title>
      <link href="https://trino.io/episodes/59.html" rel="alternate" type="text/html" title="59: Querying Trino with Java and jOOQ" />
      <published>2024-04-24T00:00:00+00:00</published>
      <updated>2024-04-24T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/59</id>
      <content type="html" xml:base="https://trino.io/episodes/59.html">&lt;h2 id=&quot;host&quot;&gt;Host&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Lukas Eder, Creator of &lt;a href=&quot;https://jooq.org&quot;&gt;jOOQ&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/lukaseder&quot;&gt;@lukaseder&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-releases&quot;&gt;Trino releases&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-445.html&quot;&gt;Trino 445&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for time travel queries with the Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Add support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REPLACE&lt;/code&gt; modifier as part of a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE TABLE&lt;/code&gt; statement
with the Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Add support for writing Bloom filters in Parquet files with the Hive connector.&lt;/li&gt;
  &lt;li&gt;Add support for dynamic filtering to the MongoDB connector.&lt;/li&gt;
  &lt;li&gt;Expand support for function pushdown in the Snowflake connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;lukas-eder-and-data-geekery&quot;&gt;Lukas Eder and data geekery&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://twitter.com/lukaseder&quot;&gt;Lukas&lt;/a&gt; is recognized as a Java Champion and
well-known as a very active member of the Java community. We chat about his
history and involvement in the community of Java and related open source
projects, and how it led to &lt;a href=&quot;https://www.jooq.org/&quot;&gt;jOOQ and his company Data
Geekery&lt;/a&gt;. Lukas also briefly talks about other products.&lt;/p&gt;

&lt;h2 id=&quot;jooq&quot;&gt;jOOQ&lt;/h2&gt;

&lt;p&gt;jOOQ stands for jOOQ Object Oriented Querying. It generates Java code
from your database, and lets you build type safe SQL queries through its
fluent API.&lt;/p&gt;

&lt;p&gt;All editions of jOOQ since the 3.19 release include support for Trino. The
level of support depends on the catalog and connector in use, and further
Trino-specific enhancements are in progress.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/ecosystem/add-on.html#jooq&quot;&gt;
  &lt;img src=&quot;https://trino.io/assets/images/logos/jooq.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In our conversation and demo session with Lukas, we cover all the following
aspects and a few other topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;What is jOOQ?&lt;/li&gt;
  &lt;li&gt;What motivated the creation of jOOQ?&lt;/li&gt;
  &lt;li&gt;Discuss the great reasons for using jOOQ:
    &lt;ul&gt;
      &lt;li&gt;Database first&lt;/li&gt;
      &lt;li&gt;Typesafe SQL&lt;/li&gt;
      &lt;li&gt;Code generation&lt;/li&gt;
      &lt;li&gt;Active records&lt;/li&gt;
      &lt;li&gt;Multi-tenancy&lt;/li&gt;
      &lt;li&gt;Standardization&lt;/li&gt;
      &lt;li&gt;Query lifecycle&lt;/li&gt;
      &lt;li&gt;Procedures&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;How does it compare to ORM systems like &lt;a href=&quot;https://hibernate.org/&quot;&gt;Hibernate&lt;/a&gt; or
others like the older &lt;a href=&quot;https://blog.mybatis.org/&quot;&gt;MyBatis&lt;/a&gt;?&lt;/li&gt;
  &lt;li&gt;What databases are supported by jOOQ and commonly used?&lt;/li&gt;
  &lt;li&gt;Chat about some customer use cases.&lt;/li&gt;
  &lt;li&gt;Supported and required Java versions, fun with upgrades, and experience from customers.&lt;/li&gt;
  &lt;li&gt;How Lukas discovered Trino and decided to add support for it.&lt;/li&gt;
  &lt;li&gt;Challenges and interesting aspects of supporting different databases.&lt;/li&gt;
  &lt;li&gt;What is next for jOOQ in general, and Trino support specifically?&lt;/li&gt;
  &lt;li&gt;Cool SQL features in Trino that might be suitable for standardization:
    &lt;ul&gt;
      &lt;li&gt;Higher order functions, partially &lt;a href=&quot;https://www.jooq.org/doc/dev/manual/sql-building/column-expressions/array-functions/&quot;&gt;already supported in jOOQ&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;Integration of object-relational database features, such as nested
collections with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ARRAY&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIST&lt;/code&gt;.&lt;/li&gt;
      &lt;li&gt;Potential introduction of new concepts to SQL, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MAP&lt;/code&gt;.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Complexities from Trino having different catalogs and connectors, and the
catalog, schema, table hierarchy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;jOOQ resources and further information:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.jooq.org/&quot;&gt;Website&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://groups.google.com/g/jooq-user&quot;&gt;User group mailing list&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.jooq.org/learn/&quot;&gt;Documentation and other learning resources&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/jOOQ/jOOQ&quot;&gt;Source code&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/jOOQ/jOOQ/tree/main/jOOQ-examples&quot;&gt;Example projects&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/JavaOOQ&quot;&gt;jOOQ on X&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Trino Fest news:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/04/15/trino-fest-2024-approaches.html&quot;&gt;Great speaker lineup&lt;/a&gt; announced&lt;/li&gt;
  &lt;li&gt;More to come&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.starburst.io/info/trino-fest-2024/?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=Global-FY25-Q2-EV-Trino-Fest-2024&amp;amp;utm_content=banner&quot;&gt;Register for event and hotel now&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Contact us to join the Trino Contributor Congregation the next day&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other news and events:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Manfred’s recap of Open Source Summit NA and Data Engineer Things meeting in Seattle.&lt;/li&gt;
  &lt;li&gt;Trino Contributor Call right after the episode.&lt;/li&gt;
  &lt;li&gt;Contact us to be a guest in upcoming &lt;a href=&quot;https://trino.io/broadcast/index.html&quot;&gt;Trino Community
Broadcast&lt;/a&gt; episodes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Host</summary>

      
      
    </entry>
  
    <entry>
      <title>A sneak peek of Trino Fest 2024</title>
      <link href="https://trino.io/blog/2024/04/15/trino-fest-2024-approaches.html" rel="alternate" type="text/html" title="A sneak peek of Trino Fest 2024" />
      <published>2024-04-15T00:00:00+00:00</published>
      <updated>2024-04-15T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/04/15/trino-fest-2024-approaches</id>
      <content type="html" xml:base="https://trino.io/blog/2024/04/15/trino-fest-2024-approaches.html">&lt;p&gt;Trino Fest is drawing ever closer. Commander Bun Bun has been hard at work
behind the scenes arranging the schedule and making sure that Trino’s trip to
Boston is going to be a great one. In case you missed it,
&lt;a href=&quot;/blog/2024/02/20/announcing-trino-fest-2024.html&quot;&gt;we announced Trino Fest&lt;/a&gt;
a couple months ago, and if you &lt;em&gt;have&lt;/em&gt; missed it, make sure to go register to
attend! All our speakers will be in person in downtown Boston on the 13th of
June, with plenty of opportunities for networking and a happy hour event at the
end of the day. But if you can’t make the trip to enjoy the lovely New England
summer, we’ll also be live-streaming the event, and you can register to join us
virtually.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;http://www.starburst.io/info/trino-fest-2024?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=Global-FY25-Q2-EV-Trino-Fest-2024&amp;amp;utm_content=Blog-2&quot;&gt;
        Register to attend!
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;Still on the fence, though? Read on for a preview of our speaker lineup and
brief summaries of their talks. Keep in mind this also isn’t the full lineup,
and we’ll follow up soon with the last few talks that round out the schedule.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;a-brief-word-from-our-sponsors&quot;&gt;A brief word from our sponsors…&lt;/h2&gt;

&lt;p&gt;Thank you to our sponsors for making this event happen…&lt;/p&gt;

&lt;div class=&quot;container&quot;&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.starburst.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/starburst-small.png&quot; title=&quot;Starburst, event host and organizer&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.onehouse.ai/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/onehouse-small.png&quot; title=&quot;Onehouse, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.startree.ai/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/startree-small.png&quot; title=&quot;StarTree, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.alluxio.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/alluxio-small.png&quot; title=&quot;Alluxio, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://cloudinary.com/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/cloudinary-small.png&quot; title=&quot;Cloudinary, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.upsolver.com/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/upsolver-small.png&quot; title=&quot;Upsolver, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;And now on to what you’re waiting for: a preview of most of the talks coming to
Trino Fest this year!&lt;/p&gt;

&lt;h2 id=&quot;lakehouses&quot;&gt;Lakehouses&lt;/h2&gt;

&lt;p&gt;It’s no secret that using Trino as part of your lakehouse has become one of its
major use cases in the past few years. We’re excited to say that at Trino Fest,
we’ll have representation for each of the modern big three table formats:
Iceberg, Delta Lake, and Hudi.&lt;/p&gt;

&lt;h3 id=&quot;iceberg&quot;&gt;Iceberg&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://iceberg.apache.org/&quot;&gt;Apache Iceberg&lt;/a&gt; will be covered twice: Amogh
Jahagirdar from &lt;a href=&quot;https://tabular.io/&quot;&gt;Tabular&lt;/a&gt; will be diving into the world of
Iceberg views and how they can be leveraged to coordinate across different query
languages and dialects. Amit Gilad from &lt;a href=&quot;https://cloudinary.com/&quot;&gt;Cloudinary&lt;/a&gt;
will be covering the story of migrating out of Snowflake to the wonderful world
of open table formats and Iceberg.&lt;/p&gt;

&lt;h3 id=&quot;delta-lake&quot;&gt;Delta Lake&lt;/h3&gt;

&lt;p&gt;Marius Grama, a Trino contributor at &lt;a href=&quot;https://www.starburst.io/&quot;&gt;Starburst&lt;/a&gt;,
will be going into detail on the history, development, and improvements to the
&lt;a href=&quot;https://delta.io/&quot;&gt;Delta Lake&lt;/a&gt; connector. With
&lt;a href=&quot;/blog/2024/04/11/time-travel-delta-lake.html&quot;&gt;time travel for the Delta Lake connector&lt;/a&gt;
landing in Trino 445, it’s one of the most exciting areas for development in
open source Trino, and there are some interesting stories that Marius is excited
to share with the community.&lt;/p&gt;

&lt;h3 id=&quot;hudi&quot;&gt;Hudi&lt;/h3&gt;

&lt;p&gt;Rounding out data lakes, Ethan Guo from &lt;a href=&quot;https://www.onehouse.ai/&quot;&gt;Onehouse&lt;/a&gt;
will be diving into Trino’s &lt;a href=&quot;https://hudi.apache.org/&quot;&gt;Hudi&lt;/a&gt; connector, giving
an update on what’s landed lately to improve performance and functionality.
He’ll also give a preview of what’s coming soon. The features are flying in, and
if you’re a current or prospective user of Hudi with Trino, you won’t want to
miss out.&lt;/p&gt;

&lt;h2 id=&quot;data-takes&quot;&gt;Data takes&lt;/h2&gt;

&lt;p&gt;Of course, there’s more to Trino than querying data lakes, and there’s a wide
variety of talks to discuss the other activities going on within the Trino
community.&lt;/p&gt;

&lt;h3 id=&quot;small-scale&quot;&gt;Small scale&lt;/h3&gt;

&lt;p&gt;Ben Jeter at &lt;a href=&quot;https://www.executivehomes.com/&quot;&gt;Executive Homes&lt;/a&gt;, who gave
&lt;a href=&quot;/blog/2023/07/25/trino-fest-2023-datto.html&quot;&gt;a talk at Trino Fest last year&lt;/a&gt;
while at &lt;a href=&quot;https://www.datto.com/&quot;&gt;Datto&lt;/a&gt;, is back to discuss running Trino at a
more moderate scale than we’re used to hearing about in the Trino space.
Forget petabytes and exabytes, and welcome a tiny cluster querying thousands,
not millions, of records that still derives huge value from Trino. It’s a great
playbook for smaller startups and enterprises who still need robust, flexible,
performant analytics.&lt;/p&gt;

&lt;h3 id=&quot;maximizing-performance&quot;&gt;Maximizing performance&lt;/h3&gt;

&lt;p&gt;Jonas Kylling from &lt;a href=&quot;https://dune.com/about&quot;&gt;Dune&lt;/a&gt; will be detailing how they’ve
managed to optimize Trino and squeeze out every ounce of performance to reduce
query costs and runtimes. That includes leveraging the new Alluxio-based file
system caching, emulating various cluster sizes to avoid expensive idle cluster
time, and storing, sampling, and filtering query results to avoid re-executing
queries.&lt;/p&gt;

&lt;h3 id=&quot;query-intelligence&quot;&gt;Query intelligence&lt;/h3&gt;

&lt;p&gt;Marton Bod and Vinitha Gankidi from Apple will share their insights on query
intelligence. They’ll demonstrate how Apple came to understand when their
clusters are most utilized and who’s using them, enabling slicing and dicing
along different dimensions. A query intelligence dataset can be used for real-time
cluster dashboarding, self-service troubleshooting, and automatic generation of
recommendations for users, all of which can empower Trino to be better than
ever.&lt;/p&gt;

&lt;h2 id=&quot;and-more&quot;&gt;And more!&lt;/h2&gt;

&lt;p&gt;Of course, Trino’s own Martin Traverso will be giving a keynote on the latest
and greatest in the project, covering everything big that’s landed since Trino
Summit, as well as a glimpse at the roadmap for the project in the coming few
months. Several other big talks are falling into place that we can’t announce
just yet, so stay tuned for more info as the event draws nearer.&lt;/p&gt;

&lt;h2 id=&quot;trino-contributor-congregation&quot;&gt;Trino contributor congregation&lt;/h2&gt;

&lt;p&gt;The day after Trino Fest, we’ll also be hosting an in-person meetup for
Trino contributors and engineers to catch up, discuss the Trino roadmap, and
engage directly with the maintainers in person. It’s a great opportunity to put
faces and voices to those GitHub handles, align on the big ideas or tricky PRs
that have been moving slowly, and find more ways to get involved in Trino
development. If you’re interested in attending, message Manfred Moser or Cole
Bowden on the &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Trino Slack&lt;/a&gt;, and we’ll get you added to
the attendee list and share more details.&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>Trino Fest is drawing ever closer. Commander Bun Bun has been hard at work behind the scenes arranging the schedule and making sure that Trino’s trip to Boston is going to be a great one. In case you missed it, we announced Trino Fest a couple months ago, and if you have missed it, make sure to go register to attend! All our speakers will be in person in downtown Boston on the 13th of June, with plenty of opportunities for networking and a happy hour event at the end of the day. But if you can’t make the trip to enjoy the lovely New England summer, we’ll also be live-streaming the event, and you can register to join us virtually. Register to attend! Still on the fence, though? Read on for a preview of our speaker lineup and brief summaries of their talks. Keep in mind this also isn’t the full lineup, and we’ll follow up soon with the last few talks that round out the schedule.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2024/announcement-banner.png" />
      
    </entry>
  
    <entry>
      <title>Time travel in Delta Lake connector</title>
      <link href="https://trino.io/blog/2024/04/11/time-travel-delta-lake.html" rel="alternate" type="text/html" title="Time travel in Delta Lake connector" />
      <published>2024-04-11T00:00:00+00:00</published>
      <updated>2024-04-11T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/04/11/time-travel-delta-lake</id>
      <content type="html" xml:base="https://trino.io/blog/2024/04/11/time-travel-delta-lake.html">&lt;p&gt;Exciting news - time travel capability has finally arrived in the Delta Lake
connector! After introducing support for time travel in the Iceberg connector
back in 2022, we’re thrilled to announce that the Delta Lake connector now joins
the ranks as the second connector offering this feature.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;background-and-motivation&quot;&gt;Background and motivation&lt;/h2&gt;

&lt;p&gt;Time travel as a feature has a number of practical use cases:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Data recovery and rollback&lt;/strong&gt;: In the event of data corruption or erroneous
 updates, time travel allows users to roll back to a previous version of the
 data, restoring it to a known good state.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Auditing and compliance&lt;/strong&gt;: Time travel enables auditors and compliance
 teams to analyze data changes over time, ensuring regulatory compliance and
 providing transparency into data operations.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Historical analysis&lt;/strong&gt;: Data analysts and data scientists can perform
 historical analysis by querying data at different points in time, uncovering
 trends, patterns, and anomalies that may not be apparent in current data.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;time-travel-sql-example&quot;&gt;Time travel SQL example&lt;/h2&gt;

&lt;p&gt;Start by creating a catalog &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;example&lt;/code&gt; with the &lt;a href=&quot;https://trino.io/docs/current/connector/delta-lake.html&quot;&gt;Delta Lake
connector&lt;/a&gt;, create a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;demo&lt;/code&gt;
schema, and make them the current catalog and schema with the
&lt;a href=&quot;https://trino.io/docs/current/sql/use.html&quot;&gt;USE&lt;/a&gt; statement.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;USE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;example&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;demo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Let’s create a Delta Lake table, add some data, modify the table, and add some
more data using the following SQL statements:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;column_mapping_mode&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;name&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;Alice&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;Bob&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;Mallory&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ALTER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DROP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;COLUMN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Use the following statement to look at all data in the table:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; id
----
  1
  2
  3
  4
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$history&lt;/code&gt; metadata table offers a record of past operations:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;version&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;operation&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;&quot;users$history&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; version |             timestamp              |  operation
---------+------------------------------------+--------------
       0 | 2024-04-10 17:49:18.528 Asia/Tokyo | CREATE TABLE
       1 | 2024-04-10 17:49:18.755 Asia/Tokyo | WRITE
       2 | 2024-04-10 17:49:18.929 Asia/Tokyo | DROP COLUMNS
       3 | 2024-04-10 17:49:19.137 Asia/Tokyo | WRITE
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can specify the version using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FOR VERSION AS OF&lt;/code&gt;. For example, to time
travel to version 1, which includes a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WRITE&lt;/code&gt; operation, the query would look
like this:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FOR&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;VERSION&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OF&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As you can see, time travel rolls back not only the data but also the table definition:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;----+---------&lt;/span&gt;
  &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Alice&lt;/span&gt;
  &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Bob&lt;/span&gt;
  &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Mallory&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;technical-details&quot;&gt;Technical details&lt;/h2&gt;

&lt;p&gt;Delta Lake manages transaction logs in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_delta_log&lt;/code&gt; directory located under
the table’s specified location.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Last checkpoint&lt;/strong&gt;: The optional file that manages the last checkpoint
version is named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_last_checkpoint&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Delta log entries&lt;/strong&gt;: The JSON file contains an atomic set of actions, for
example &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;00000000000000000000.json&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Checkpoints&lt;/strong&gt;: The Parquet file contains the complete replay of all actions,
up to and including the checkpointed table version, for example
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;00000000000000000010.checkpoint.parquet&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More details are available in the &lt;a href=&quot;https://github.com/delta-io/delta/blob/master/PROTOCOL.md&quot;&gt;Delta Lake protocol
documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Following is an example of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_delta_log&lt;/code&gt; directory:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;00000000000000000000.json
00000000000000000001.json
00000000000000000002.json
00000000000000000003.json
00000000000000000003.checkpoint.parquet
00000000000000000004.json
00000000000000000005.json
...
_last_checkpoint
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When the specified version is older than the last checkpoint, such as version 2,
the connector reads the transaction log files starting from the first log
entry (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;00000000000000000000.json&lt;/code&gt;) up to the specified version
(&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;00000000000000000002.json&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;When the specified version is equal to the last checkpoint, in our example
version 3, the connector reads only the checkpoint file for that version
(&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;00000000000000000003.checkpoint.parquet&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;When the specified version is newer than the last checkpoint, such as version 4, the
connector reads the checkpoint file for the last checkpoint version
(&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;00000000000000000003.checkpoint.parquet&lt;/code&gt;) and the transaction log file for the
specified version (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;00000000000000000004.json&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;When the optional &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_last_checkpoint&lt;/code&gt; file is missing, the actual logic is
more complex, because the connector cannot determine the checkpoint versions
without listing the file names in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_delta_log&lt;/code&gt; directory.&lt;/p&gt;
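&lt;p&gt;The version resolution described above can be sketched in a few lines. The
following Python outline is illustrative only, not the connector’s actual
implementation, and it assumes the last checkpoint version is already known
from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_last_checkpoint&lt;/code&gt;:&lt;/p&gt;

```python
# Illustrative sketch of picking which _delta_log files to replay for a
# requested table version, given a known last checkpoint version.
def log_files_to_read(requested_version, checkpoint_version):
    name = lambda v: f"{v:020d}"  # Delta log file names use 20 digits
    if requested_version >= checkpoint_version:
        # At or after the checkpoint: start from the checkpoint file,
        # then apply any newer JSON log entries.
        files = [f"{name(checkpoint_version)}.checkpoint.parquet"]
        files += [f"{name(v)}.json"
                  for v in range(checkpoint_version + 1, requested_version + 1)]
        return files
    # Older than the checkpoint: replay JSON entries from version 0 onward.
    return [f"{name(v)}.json" for v in range(requested_version + 1)]
```

With the example directory above (checkpoint at version 3), requesting version 2
replays the first three JSON files, version 3 reads only the checkpoint file,
and version 4 reads the checkpoint file plus one newer JSON entry.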

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Time travel in the Trino &lt;a href=&quot;https://trino.io/docs/current/connector/delta-lake.html&quot;&gt;Delta Lake
connector&lt;/a&gt; opens up new
possibilities for data exploration and analysis, empowering users to delve into
the past and derive insights from historical data. By seamlessly integrating
with Delta Lake’s versioning and transaction logs, Trino provides a powerful
tool for querying data as it appeared at different points in time. Whether it’s
auditing, historical analysis, or data recovery, time travel adds a valuable
dimension to data-driven decision-making, making it an indispensable feature for
modern data platforms.&lt;/p&gt;

&lt;h2 id=&quot;bonus&quot;&gt;Bonus&lt;/h2&gt;

&lt;p&gt;Join us for &lt;a href=&quot;/blog/2024/02/20/announcing-trino-fest-2024.html&quot;&gt;Trino Fest 2024&lt;/a&gt; where &lt;a href=&quot;https://github.com/findinpath&quot;&gt;Marius Grama&lt;/a&gt; presents &lt;em&gt;“The open
source journey of the Trino Delta Lake connector”&lt;/em&gt; and shares more tips and
tricks.&lt;/p&gt;</content>

      
        <author>
          <name>Yuya Ebihara</name>
        </author>
      

      <summary>Exciting news - time travel capability has finally arrived in the Delta Lake connector! After introducing support for time travel in the Iceberg connector back in 2022, we’re thrilled to announce that the Delta Lake connector now joins the ranks as the second connector offering this feature.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/trino-delta.png" />
      
    </entry>
  
    <entry>
      <title>58: Understanding your users with Trino and Mitzu</title>
      <link href="https://trino.io/episodes/58.html" rel="alternate" type="text/html" title="58: Understanding your users with Trino and Mitzu" />
      <published>2024-04-04T00:00:00+00:00</published>
      <updated>2024-04-04T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/58</id>
      <content type="html" xml:base="https://trino.io/episodes/58.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/imeszaros/&quot;&gt;István Mészáros&lt;/a&gt;, Founder and CEO of
&lt;a href=&quot;https://www.mitzu.io/&quot;&gt;Mitzu&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-releases&quot;&gt;Trino releases&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-442.html&quot;&gt;Trino 442&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for configuring AWS deployment type in OpenSearch connector.&lt;/li&gt;
  &lt;li&gt;Fix a regression from 440 in Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-443.html&quot;&gt;Trino 443&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Ensure all files are deleted when native S3 file system support is enabled,
along with other object storage connector improvements.&lt;/li&gt;
  &lt;li&gt;Add support for a custom authorization header name in Prometheus connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-444.html&quot;&gt;Trino 444&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Update Docker image to use Java 22 for runtime.&lt;/li&gt;
  &lt;li&gt;Numerous performance improvements for the Snowflake connector.&lt;/li&gt;
  &lt;li&gt;Add support for reading &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BYTE_STREAM_SPLIT&lt;/code&gt; encoding in Parquet files.&lt;/li&gt;
  &lt;li&gt;Add support for canned access control lists with the native S3 file system.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;other-trino-news&quot;&gt;Other Trino news&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-gateway/blob/main/docs/release-notes.md#trino-gateway-7-21--mar-2024&quot;&gt;Trino Gateway
7&lt;/a&gt;
shipped with a new user interface thanks to a contribution from our new
&lt;a href=&quot;https://www.starburst.io/community/trino-champions/#peng-wei&quot;&gt;Starburst Trino champion Peng
Wei&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;The continuous integration and build setup with Apache Maven has
improved a lot thanks to our collaboration with the new &lt;a href=&quot;https://www.starburst.io/community/trino-champions/#tamas-cservenak&quot;&gt;Starburst Trino
champion Tamas Cservenak&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Trino Contributor Call &lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-21-mar-2024&quot;&gt;recap is now
available&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;mitzu&quot;&gt;Mitzu&lt;/h2&gt;

&lt;p&gt;Mitzu is a warehouse-native product analytics platform that revolutionizes how
companies leverage their product usage data in the data lake.&lt;/p&gt;

&lt;p&gt;By directly connecting to Trino, Mitzu eliminates the need for traditional
reverse ETL processes to third-party applications such as Amplitude or Mixpanel.
Mitzu enables real-time self-service product analytics on top of the existing
data infrastructure with generated SQL queries.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/ecosystem/client.html#mitzu&quot;&gt;
  &lt;img src=&quot;https://trino.io/assets/images/logos/mitzu.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In our conversation and demo session with István we cover all the following
aspects and a few other topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;What is product analytics?&lt;/li&gt;
  &lt;li&gt;Key terms, such as segmentation, funnels, and retention, and what
insights and benefits become available.&lt;/li&gt;
  &lt;li&gt;What are some example use cases?&lt;/li&gt;
  &lt;li&gt;What kind of products can be analyzed?&lt;/li&gt;
  &lt;li&gt;Use of Mitzu for marketing.&lt;/li&gt;
  &lt;li&gt;What other product analytics tools exist, and what sets Mitzu apart?&lt;/li&gt;
  &lt;li&gt;How is Trino involved in making Mitzu warehouse-native?&lt;/li&gt;
  &lt;li&gt;What are the advantages of being warehouse-native? What does that mean?&lt;/li&gt;
  &lt;li&gt;Compare with Mitzu on other data platforms.&lt;/li&gt;
  &lt;li&gt;Implementation details of the Mitzu and Trino integration, such as connectors,
security, and client libraries&lt;/li&gt;
  &lt;li&gt;How to use Mitzu in terms of deployment and configuration.&lt;/li&gt;
  &lt;li&gt;Cool features of Mitzu.&lt;/li&gt;
  &lt;li&gt;Practical experience and customers.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Trino Fest news:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Speakers have been selected; contact and announcements are coming soon.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/02/20/announcing-trino-fest-2024.html&quot;&gt;Register now&lt;/a&gt;, and book
travel and hotel.&lt;/li&gt;
  &lt;li&gt;Contact us to join the Trino Contributor Congregation the next day&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other news and events:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Manfred will attend &lt;a href=&quot;https://events.linuxfoundation.org/open-source-summit-north-america/&quot;&gt;Open Source Summit
NA&lt;/a&gt;, and
present a Big Data Whirlwind Tour at the &lt;a href=&quot;https://www.meetup.com/data-engineer-things-seattle-meetup/events/300067664/&quot;&gt;inaugural Data Engineer Things
meeting&lt;/a&gt;
in Seattle.&lt;/li&gt;
  &lt;li&gt;Trino Contributor Call is now planned as a monthly event with video recordings.&lt;/li&gt;
  &lt;li&gt;Check out the upcoming &lt;a href=&quot;https://trino.io/broadcast/index.html&quot;&gt;Trino Community
Broadcast&lt;/a&gt; episode about jOOQ.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from O’Reilly.
You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>57: Seeing clearly with OpenTelemetry</title>
      <link href="https://trino.io/episodes/57.html" rel="alternate" type="text/html" title="57: Seeing clearly with OpenTelemetry" />
      <published>2024-03-14T00:00:00+00:00</published>
      <updated>2024-03-14T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/57</id>
      <content type="html" xml:base="https://trino.io/episodes/57.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of Trino
Community Leadership at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/electrum/&quot;&gt;David Phillips&lt;/a&gt;, co-creator of Trino
and CTO at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/jmstephenson/&quot;&gt;Matt Stephenson&lt;/a&gt;, Senior Principal
Software Engineer at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-releases&quot;&gt;Trino releases&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-440.html&quot;&gt;Trino 440&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New Snowflake connector&lt;/li&gt;
  &lt;li&gt;Support for sub-queries inside &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST&lt;/code&gt; clauses&lt;/li&gt;
  &lt;li&gt;Support for row filtering and column masking with Open Policy Agent&lt;/li&gt;
  &lt;li&gt;Improved latency when filesystem caching is enabled in Delta and Iceberg connectors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-441.html&quot;&gt;Trino 441&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Remove the default &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;legacy&lt;/code&gt; mode for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.security&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And there is a regression for Iceberg, so potentially wait for 442. (Update:
&lt;a href=&quot;https://trino.io/docs/current/release/release-442.html&quot;&gt;Trino 442&lt;/a&gt; is released.)&lt;/p&gt;

&lt;h2 id=&quot;other-trino-news&quot;&gt;Other Trino news&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/20980&quot;&gt;Java 22 is coming to Trino&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;David Phillips appointed dedicated &lt;a href=&quot;https://trino.io/development/roles.html#file-system-lead&quot;&gt;file system lead&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-21-mar-2024&quot;&gt;Trino Contributor Call&lt;/a&gt; on the 21st of March&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2024/02/27/the-definitive-guide-2-jp.html&quot;&gt;Japanese edition of Trino: The Definitive Guide is out&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;opentelemetry&quot;&gt;OpenTelemetry&lt;/h2&gt;

&lt;p&gt;OpenTelemetry is a widely-used collection of APIs, SDKs, and tools that
instrument, generate, collect, and export telemetry data such as metrics, logs,
and traces to help you analyze application performance and behavior.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/ecosystem/add-on.html#opentelemetry&quot;&gt;
  &lt;img src=&quot;https://trino.io/assets/images/logos/opentelemetry.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In our conversation with Matt and David we cover all the following aspects, and
a few other topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;What is &lt;a href=&quot;https://trino.io/ecosystem/add-on#opentelemetry&quot;&gt;OpenTelemetry&lt;/a&gt;?&lt;/li&gt;
  &lt;li&gt;Some basic concepts like &lt;a href=&quot;https://opentelemetry.io/docs/concepts/observability-primer/&quot;&gt;logs, spans, traces&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;How this relates to JMX, system data, and other monitoring&lt;/li&gt;
  &lt;li&gt;What is &lt;a href=&quot;https://openmetrics.io/&quot;&gt;OpenMetrics&lt;/a&gt;? How is it related to
&lt;a href=&quot;https://trino.io/ecosystem/data-source.html#prometheus&quot;&gt;Prometheus&lt;/a&gt;?&lt;/li&gt;
  &lt;li&gt;What tools can you use with OpenTelemetry? Jaeger, Datadog, …&lt;/li&gt;
  &lt;li&gt;Reasoning to add OpenTelemetry to Trino&lt;/li&gt;
  &lt;li&gt;Implementation details&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/admin/opentelemetry.html&quot;&gt;Trino documentation&lt;/a&gt; with
local example usage with Docker containers for Trino and Jaeger&lt;/li&gt;
  &lt;li&gt;Practical experience&lt;/li&gt;
  &lt;li&gt;Demo of real world usage with Starburst Galaxy and Datadog&lt;/li&gt;
  &lt;li&gt;Bonus topic - JSON-format logging via TCP socket&lt;/li&gt;
&lt;/ul&gt;
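
&lt;p&gt;The Trino documentation linked above describes the setup in detail. As a
minimal sketch, assuming an OTLP-compatible collector such as Jaeger listening
on the default gRPC port on the same host, tracing is enabled with two
properties in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.properties&lt;/code&gt; on all cluster nodes:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;tracing.enabled=true
tracing.exporter.endpoint=http://localhost:4317
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;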

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;/blog/2024/02/20/announcing-trino-fest-2024.html&quot;&gt;Trino Fest 2024 and Trino Contributor Congregation&lt;/a&gt; are happening in June in Boston.
Submit your speaker proposals now, and register for the free event as soon as
you can, especially for live attendance.&lt;/p&gt;

&lt;p&gt;Check out the upcoming &lt;a href=&quot;https://trino.io/broadcast/index.html&quot;&gt;Trino Community
Broadcast&lt;/a&gt; episodes about Mitzu and jOOQ.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from O’Reilly.
You can get &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or buy the
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;English, Polish, Chinese, or Japanese
edition&lt;/a&gt; online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Blazing ahead with 22</title>
      <link href="https://trino.io/blog/2024/03/13/java-22.html" rel="alternate" type="text/html" title="Blazing ahead with 22" />
      <published>2024-03-13T00:00:00+00:00</published>
      <updated>2024-03-13T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/03/13/java-22</id>
      <content type="html" xml:base="https://trino.io/blog/2024/03/13/java-22.html">&lt;p&gt;It was not that long ago that we &lt;a href=&quot;/blog/2023/11/03/java-21.html&quot;&gt;first announced support for Java 21&lt;/a&gt;, and subsequently made it a build and runtime
requirement with &lt;a href=&quot;https://trino.io/docs/current/release/release-436.html&quot;&gt;Trino 436&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Since then, the codebase has received some significant improvements in
readability, and we have also seen better performance. However, innovation in
Trino and Java is not standing still; on the contrary, it’s accelerating. On
the Java community side, Java 22 is about to be released, and we think it is
time to drive innovation in Trino even further. Trino is going to use and
require Java 22 soon!&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;background-and-motivation&quot;&gt;Background and motivation&lt;/h2&gt;

&lt;p&gt;The planned move to use and require Java 22 for building and running Trino
is driven by several factors:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Take advantage of performance and runtime improvements of the new JVM version.&lt;/li&gt;
  &lt;li&gt;Use the newly available language features to further improve readability and
maintenance aspects of the codebase.&lt;/li&gt;
  &lt;li&gt;Enable the use of further performance improvements for Trino under the umbrella
of &lt;a href=&quot;https://github.com/trinodb/trino/issues/14237&quot;&gt;Project Hummingbird&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Attract and motivate more contributors by offering the opportunity to work
with a modern Java stack and the relevant language features and APIs on a
cutting-edge, complex application.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Speaking of APIs and new features, let’s look at the list of JDK Enhancement
Proposals (JEPs) that we are actively looking at. Specifically, we plan to
experiment with them and adopt any non-preview JEPs where we see benefits. We
also plan to report any issues and problems we encounter back upstream to the
Java community:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Region Pinning for G1 (&lt;a href=&quot;https://openjdk.org/jeps/423&quot;&gt;JEP 423&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Foreign Function &amp;amp; Memory API (&lt;a href=&quot;https://openjdk.org/jeps/454&quot;&gt;JEP 454&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Unnamed Variables and Patterns (&lt;a href=&quot;https://openjdk.org/jeps/456&quot;&gt;JEP 456&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Class File API in preview (&lt;a href=&quot;https://openjdk.org/jeps/457&quot;&gt;JEP 457&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;String Templates in second preview (&lt;a href=&quot;https://openjdk.org/jeps/459&quot;&gt;JEP 459&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Vector API in 7th incubator (&lt;a href=&quot;https://openjdk.org/jeps/460&quot;&gt;JEP 460&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Structured Concurrency in second preview (&lt;a href=&quot;https://openjdk.org/jeps/462&quot;&gt;JEP 462&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Scoped Values in second preview (&lt;a href=&quot;https://openjdk.org/jeps/464&quot;&gt;JEP 464&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
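
&lt;p&gt;To give a flavor of these features, unnamed variables and patterns from JEP
456 let code declare values that must exist but are never read. The following
snippet is a hypothetical illustration, not code from the Trino codebase:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;// JEP 456: an underscore marks a variable that is declared but never read
static int countRows(Iterable&amp;lt;Row&amp;gt; rows)
{
    int count = 0;
    for (Row _ : rows) {
        count++;
    }
    return count;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;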

&lt;p&gt;Many of these APIs allow us to further modernize the feature set of Trino and
adapt it to current hardware and compute power realities. Specifically, we can
continue our commitment to the Java ecosystem and avoid many of the
complexities and pitfalls of JNI, the traditional, now legacy way of
integrating with native code and specific hardware features.&lt;/p&gt;

&lt;p&gt;Another aspect some of you might wonder about is the move from a Java LTS
version to a Java STS release, from “long term support” to “short term
support”. So far, Trino has required Java 8, Java 11, Java 17, and then Java 21.
Since all of them are LTS releases, some of you might have concluded that we
have a policy of only using Java LTS versions. That is not the case; it is only
a coincidence.&lt;/p&gt;

&lt;p&gt;We have always strived to use up-to-date source code, dependencies, runtime
environments, and so forth. The benefits, including better performance,
included bug fixes, a reduced need for backports, fewer security issues, and
support for modern language features, development environments, and tooling,
have always far outweighed the effort of staying up to date.&lt;/p&gt;

&lt;p&gt;We have finally reached the long-planned state where we can move quickly
enough as a project to use the latest tools, dependencies, and Java releases
while keeping up our frequent release cadence. And that is exactly what we are
doing for the benefit of everyone contributing to and using Trino. Java 22 now.
Then later this year we can move to Java 23, and next year to Java 24 and 25.&lt;/p&gt;

&lt;p&gt;So what are we specifically doing now?&lt;/p&gt;

&lt;h2 id=&quot;current-status-and-plans&quot;&gt;Current status and plans&lt;/h2&gt;

&lt;p&gt;Java 22 is scheduled to ship in March 2024. The various JDK distribution
binary packages will become available shortly after the official release.&lt;/p&gt;

&lt;p&gt;Early access (EA) sources and binaries are already available, and our
continuous integration builds already use an EA build successfully.&lt;/p&gt;

&lt;p&gt;Overall the transition is going well. Our plan is to follow the same approach as
our switch to Java 21:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Ensure everything works with Java 22.&lt;/li&gt;
  &lt;li&gt;Change the container image to use Java 22.&lt;/li&gt;
  &lt;li&gt;Cut a release and get community feedback from testing with the container.&lt;/li&gt;
  &lt;li&gt;Adjust to any feedback and available improvements for a few releases.&lt;/li&gt;
  &lt;li&gt;Switch the requirement for build and runtime to Java 22.&lt;/li&gt;
  &lt;li&gt;Cut another release and celebrate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And then the real fun starts all over. We can update code and libraries, and
start working with the new APIs. The timing of all this work depends on the
obstacles we find along the way and how quickly we can remove them.&lt;/p&gt;

&lt;p&gt;We use the &lt;a href=&quot;https://github.com/trinodb/trino/issues/20980&quot;&gt;Java 22 tracking
issue&lt;/a&gt; and the linked issues and
pull requests to manage progress, discuss next steps, and work with the
community.&lt;/p&gt;

&lt;p&gt;Feel free to chime in there or find us on the &lt;a href=&quot;https://trinodb.slack.com/archives/CP1MUNEUX&quot;&gt;#dev
channel&lt;/a&gt; on the &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Trino community
Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Join us in this exciting next step for Trino.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Update from 8 May 2024:&lt;/strong&gt;
The release of &lt;a href=&quot;https://trino.io/docs/current/release/release-447.html&quot;&gt;Trino 447&lt;/a&gt;
includes the switch to Java 22 as a requirement for running Trino.&lt;/p&gt;
&lt;/blockquote&gt;</content>

      
        <author>
          <name>Manfred Moser, Martin Traverso, Dain Sundstrom, David Phillips</name>
        </author>
      

      <summary>It was not that long ago that we first announced support for Java 21, and subsequently made it a build and runtime requirement with Trino 436. Since then, the codebase has received some significant improvements in readability, and we have also seen better performance. However, innovation in Trino and Java is not standing still; on the contrary, it’s accelerating. On the Java community side, Java 22 is about to be released, and we think it is time to drive innovation in Trino even further. Trino is going to use and require Java 22 soon!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/java-duke-22.png" />
      
    </entry>
  
    <entry>
      <title>A cache refresh for Trino</title>
      <link href="https://trino.io/blog/2024/03/08/cache-refresh.html" rel="alternate" type="text/html" title="A cache refresh for Trino" />
      <published>2024-03-08T00:00:00+00:00</published>
      <updated>2024-03-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/03/08/cache-refresh</id>
      <content type="html" xml:base="https://trino.io/blog/2024/03/08/cache-refresh.html">&lt;p&gt;Thinking about our recent work on caching in Trino reminds me of the famous
saying, &lt;a href=&quot;https://www.karlton.org/2017/12/naming-things-hard/&quot;&gt;“There are only two hard things in computer science: cache invalidation
and naming things&lt;/a&gt;.” Well,
in the Trino community we know all about caching and naming. With the recent
&lt;a href=&quot;https://trino.io/docs/current/release/release-439.html&quot;&gt;Trino 439 release&lt;/a&gt;, caching
from object storage file systems got a refresh. Catalogs using the Delta Lake,
Hive, Iceberg, and soon Hudi connectors now get to access performance benefits
from the new Alluxio-powered file system caching.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;in-the-past&quot;&gt;In the past&lt;/h2&gt;

&lt;p&gt;So how did we get here? A long, long time ago, Qubole open-sourced a &lt;a href=&quot;https://github.com/qubole/rubix&quot;&gt;light
light-weight data caching framework called
RubiX&lt;/a&gt;. The library was integrated into the
Trino Hive connector, and it enabled &lt;a href=&quot;https://trino.io/docs/438/connector/hive-caching.html&quot;&gt;Hive connector storage
caching&lt;/a&gt;. But over time, any
open source project without active maintenance becomes stale. And like a stale
cache, a stale open source project can cause issues, or become outdated and
unsuitable for modern use. Though RubiX had once served Trino well, it was time
to remove the dust, and RubiX had to go.&lt;/p&gt;

&lt;h2 id=&quot;making-progress&quot;&gt;Making progress&lt;/h2&gt;

&lt;p&gt;Catching back up to 2024, Trino now includes powerful connectors for the modern
lakehouse formats Delta Lake, Hudi, and Iceberg:&lt;/p&gt;

&lt;div class=&quot;container&quot;&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://trino.io/docs/current/connector/delta-lake.html&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/delta-lake.png&quot; title=&quot;Delta Lake connector&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://trino.io/docs/current/connector/hudi.html&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/apache-hudi.png&quot; title=&quot;Hudi connector&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://trino.io/docs/current/connector/iceberg.html&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/apache-iceberg.png&quot; title=&quot;Iceberg connector&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Hive is still around, just like HDFS, but we consider them both close to legacy
status. Yet all four connectors could benefit from caching. Good news came at
Trino Summit 2022 when Hope Wang and Beinan Wang from
&lt;a href=&quot;https://trino.io/ecosystem/add-on.html#alluxio&quot;&gt;Alluxio&lt;/a&gt; presented about their
integration with Trino and the Hive connector - &lt;a href=&quot;/blog/2023/07/21/trino-fest-2023-alluxio-recap.html&quot;&gt;Trino optimization with
distributed caching on data lake&lt;/a&gt;. They mentioned plans to open
source their implementation and an initial pull request (PR) was created.&lt;/p&gt;

&lt;div class=&quot;container&quot;&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;&lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;img src=&quot;https://trino.io/assets/images/logos/alluxio.png&quot; title=&quot;Alluxio&quot; /&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;&lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;h2 id=&quot;collaboration&quot;&gt;Collaboration&lt;/h2&gt;

&lt;p&gt;The initial presentation and PR planted a seed in the community. The Trino
project had been moving fast in deprecating the old dependencies from the
Hadoop and Hive ecosystem, so the initial Alluxio PR was no longer up to date
or compatible with the latest Trino version. Discussions with &lt;a href=&quot;https://github.com/electrum&quot;&gt;David
Phillips&lt;/a&gt; laid out the path to adjust to the new
file system support and get ready for reviews towards a merge.&lt;/p&gt;

&lt;p&gt;In the end it was &lt;a href=&quot;https://github.com/pluies&quot;&gt;Florent Delannoy&lt;/a&gt; who started
another &lt;a href=&quot;https://github.com/trinodb/trino/pull/18719&quot;&gt;PR for file system caching support, specifically for the Delta Lake
connector&lt;/a&gt;. His teammate &lt;a href=&quot;https://github.com/jkylling&quot;&gt;Jonas
Irgens Kylling&lt;/a&gt;, also a &lt;a href=&quot;/blog/2023/07/14/trino-fest-2023-dune.html&quot;&gt;presenter from Trino Fest
2023&lt;/a&gt;, took over the work on the
PR. The collaboration on it was an &lt;strong&gt;epic effort&lt;/strong&gt;. After many months,
over 300 comments directly on GitHub, and numerous hours of coding, reviewing,
testing, and discussion on Slack and elsewhere, the work finally resulted in a
successful merge, and therefore inclusion in the next release.&lt;/p&gt;

&lt;p&gt;Special props for helping Florent and Jonas must go out to &lt;a href=&quot;https://github.com/electrum&quot;&gt;David
Phillips&lt;/a&gt;, &lt;a href=&quot;https://github.com/raunaqmorarka&quot;&gt;Raunaq
Morarka&lt;/a&gt;, &lt;a href=&quot;https://github.com/findepi&quot;&gt;Piotr
Findeisen&lt;/a&gt;, &lt;a href=&quot;https://github.com/wendigo&quot;&gt;Mateusz
Gajewski&lt;/a&gt;, &lt;a href=&quot;https://github.com/beinan&quot;&gt;Beinan Wang&lt;/a&gt;,
&lt;a href=&quot;https://github.com/amoghmargoor&quot;&gt;Amogh Margoor&lt;/a&gt;, &lt;a href=&quot;https://github.com/osscm&quot;&gt;Manish
Malhorta&lt;/a&gt;, and &lt;a href=&quot;https://github.com/marton-bod&quot;&gt;Marton
Bod&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;finishing&quot;&gt;Finishing&lt;/h2&gt;

&lt;p&gt;In parallel to the work on the initial PR for Delta Lake, yours truly ended up
working on the documentation, and pulled together an &lt;a href=&quot;https://github.com/trinodb/trino/issues/20550&quot;&gt;issue and conversations to
streamline the rollout&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/wendigo&quot;&gt;Mateusz Gajewski&lt;/a&gt; had already put together a PR to
remove the old RubiX integration. With the merge of the initial PR we
were off to the races. We merged the removal of RubiX and the addition of the
docs. Mateusz also added support for OpenTelemetry.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/osscm&quot;&gt;Manish Malhorta&lt;/a&gt; and &lt;a href=&quot;https://github.com/amoghmargoor&quot;&gt;Amogh
Margoor&lt;/a&gt; sent a PR for Iceberg support. They
were also about to add Hive support, when &lt;a href=&quot;https://github.com/raunaqmorarka&quot;&gt;Raunaq
Morarka&lt;/a&gt; beat them to it and submitted that PR.&lt;/p&gt;

&lt;p&gt;After some final clean up, &lt;a href=&quot;https://github.com/colebow&quot;&gt;Cole Bowden&lt;/a&gt; and &lt;a href=&quot;https://github.com/martint&quot;&gt;Martin
Traverso&lt;/a&gt; got the release notes together and shipped
&lt;a href=&quot;https://trino.io/docs/current/release/release-439.html&quot;&gt;Trino 439&lt;/a&gt;! Now you can use
it, too.&lt;/p&gt;

&lt;h2 id=&quot;using-file-system-caching&quot;&gt;Using file system caching&lt;/h2&gt;

&lt;p&gt;There are only a few relatively simple steps to add file system caching to your
catalogs that use Delta Lake, Hive, or Iceberg connectors:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Provision fast local file system storage on all your Trino cluster nodes. How
you do that depends on your cluster provisioning.&lt;/li&gt;
  &lt;li&gt;Enable file system caching and configure the cache location, for example at
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/tmp/trino-cache&lt;/code&gt; on the nodes, in your catalog properties files.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;fs.cache.enabled=true
fs.cache.directories=/tmp/trino-cache
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
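
&lt;p&gt;Put together, a complete catalog properties file with caching enabled might
look like the following sketch. The connector and metastore settings are
placeholders for your environment:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;connector.name=delta_lake
hive.metastore.uri=thrift://metastore.example.net:9083
fs.cache.enabled=true
fs.cache.directories=/tmp/trino-cache
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;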

&lt;p&gt;After a cluster restart, file system caching is active for the configured
catalogs, and you can tweak it with &lt;a href=&quot;https://trino.io/docs/current/object-storage/file-system-cache.html&quot;&gt;further, optional configuration
properties&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;whats-next&quot;&gt;What’s next&lt;/h2&gt;

&lt;p&gt;What a success! It took many members from the global Trino village to get this
feature added. Now our users across the globe can enjoy even more benefits of
using Trino, and also participate in our next steps:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Further improvements to the current implementation, maybe adding
worker-to-worker connections for exchanging cached files.&lt;/li&gt;
  &lt;li&gt;Preparation to add file system caching to the Hudi connector is in progress
with &lt;a href=&quot;https://github.com/codope&quot;&gt;Sagar Sumit&lt;/a&gt; and &lt;a href=&quot;https://github.com/yihua&quot;&gt;Y Ethan
Guo&lt;/a&gt;, with implementation to follow.&lt;/li&gt;
  &lt;li&gt;Adjust to any learnings from production usage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our thanks, and those from all current and future users, go out to everyone
involved in this effort. What are we going to do next?&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;PS: If you want to share your use of Trino or connect with other Trino users,
&lt;a href=&quot;/blog/2024/02/20/announcing-trino-fest-2024.html&quot;&gt;join us for the free Trino Fest 2024&lt;/a&gt; as speaker or attendee live in Boston,
or virtually from your home.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Thinking about our recent work on caching in Trino reminds me of the famous saying, “There are only two hard things in computer science: cache invalidation and naming things.” Well, in the Trino community we know all about caching and naming. With the recent Trino 439 release, caching from object storage file systems got a refresh. Catalogs using the Delta Lake, Hive, Iceberg, and soon Hudi connectors now get to access performance benefits from the new Alluxio-powered file system caching.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-cache-refresh.png" />
      
    </entry>
  
    <entry>
      <title>Japanese edition of Trino: The Definitive Guide</title>
      <link href="https://trino.io/blog/2024/02/27/the-definitive-guide-2-jp.html" rel="alternate" type="text/html" title="Japanese edition of Trino: The Definitive Guide" />
      <published>2024-02-27T00:00:00+00:00</published>
      <updated>2024-02-27T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/02/27/the-definitive-guide-2-jp</id>
      <content type="html" xml:base="https://trino.io/blog/2024/02/27/the-definitive-guide-2-jp.html">&lt;p&gt;Do you know where the name ‘Trino’ comes from? It’s actually a shortened form of
‘neutrino’. These fast and lightweight subatomic particles have recently made
their way to Japan. You can now reserve your copy of the Japanese edition of
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;!&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Today, we are happy to announce that the Japanese translation of the book
&lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt; is
available for the communities all across Japan and far beyond. Preorder today
and get your copy from the first batch in the middle of March. Hopefully it can
lower the barrier to Trino for native speakers. We invite you all to get your
own copy:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.hanmoto.com/bd/isbn/9784798071671&quot;&gt;
        分散SQLクエリエンジンTrino徹底ガイド 秀和システム
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;Our thanks go out to Masanori Nishida and his teams at Shuwa System. I would also
like to thank my great team of translators and collaborators, &lt;a href=&quot;https://github.com/Lewuathe&quot;&gt;Kai
Sasaki&lt;/a&gt;, &lt;a href=&quot;https://github.com/aajisaka&quot;&gt;Akira
Ajisaka&lt;/a&gt;, &lt;a href=&quot;https://github.com/eurekaeru&quot;&gt;Kaname
Nishizuka&lt;/a&gt;, and &lt;a href=&quot;https://github.com/mikiT&quot;&gt;Miki
Takata&lt;/a&gt; for their help in making the book a reality.
We hope many readers can benefit from the translated edition.&lt;/p&gt;

&lt;p&gt;We look forward to chatting with many of our new readers and Trino users on the
&lt;a href=&quot;https://trinodb.slack.com/app_redirect?channel=general-jp&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;general-jp&lt;/code&gt;&lt;/a&gt;
channel in &lt;a href=&quot;/slack.html&quot;&gt;the Trino community Slack&lt;/a&gt;, other
channels, and direct messaging.&lt;/p&gt;

&lt;p&gt;Also, don’t forget to tell us about your usage of &lt;a href=&quot;/blog/2024/02/20/announcing-trino-fest-2024.html&quot;&gt;Trino in the upcoming Trino
Fest 2024 as a speaker. Or just register to attend the free event&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Yuya Ebihara&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Yuya Ebihara</name>
        </author>
      

      <summary>Do you know where the name ‘Trino’ comes from? It’s actually a shortened form of ‘neutrino’. These fast and lightweight subatomic particles have recently made their way to Japan. You can now reserve your copy of the Japanese edition of Trino: The Definitive Guide!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/ttdg2-jp-cover.jpg" />
      
    </entry>
  
    <entry>
      <title>56: The vast possibilities of VAST and Trino</title>
      <link href="https://trino.io/episodes/56.html" rel="alternate" type="text/html" title="56: The vast possibilities of VAST and Trino" />
      <published>2024-02-22T00:00:00+00:00</published>
      <updated>2024-02-22T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/56</id>
      <content type="html" xml:base="https://trino.io/episodes/56.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Trino Community Leadership at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://linkedin.com/in/colleen-tartow-phd&quot;&gt;Colleen Tartow&lt;/a&gt;, Field CTO and
Head of Strategy at &lt;a href=&quot;https://vastdata.com/&quot;&gt;VAST Data&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/roman-zeyde/&quot;&gt;Roman Zeyde&lt;/a&gt;, Senior Software
Engineer at &lt;a href=&quot;https://vastdata.com/&quot;&gt;VAST Data&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-439&quot;&gt;Release 439&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-439.html&quot;&gt;Trino 439&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New caching layer for Delta Lake, Hive, and Iceberg!&lt;/li&gt;
  &lt;li&gt;Documentation for new native file system support.&lt;/li&gt;
  &lt;li&gt;Fix for setting session properties on catalogs with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.&lt;/code&gt; in the name.&lt;/li&gt;
  &lt;li&gt;Fix for reading Snappy data.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-gateway-6&quot;&gt;Trino Gateway 6&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Docker container setup!&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-episode-the-vast-database-and-data-platform&quot;&gt;Concept of the episode: The VAST database and data platform&lt;/h2&gt;

&lt;p&gt;Part database, part data warehouse, part data lake: describing
&lt;a href=&quot;https://vastdata.com/&quot;&gt;VAST&lt;/a&gt; in one sentence is not the easiest undertaking.
You can talk about features like deep write buffers with underlying flash
columnar storage, the automatic contextual layer added on top of the data, or
the similarity-based global compression that more than makes up for the smaller
columnar chunks and makes it so much faster to find exactly the data you’re
looking for.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/ecosystem/data-source.html#vast&quot;&gt;
  &lt;img src=&quot;https://trino.io/assets/images/logos/vast.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So what is VAST? It’s a state-of-the-art data platform. Why are we talking about
it on the Trino Community Broadcast? A world-class data storage solution still
needs a world-class query engine, and its speed paired with Trino’s makes for a
brilliant combination. We’re diving into how it works, why it is designed the
way it is, and maybe talking about the really cool &lt;a href=&quot;https://vastdata.com/database#performance-comparison&quot;&gt;performance
comparison&lt;/a&gt; they have on
their website showcasing Trino as their favorite query engine.&lt;/p&gt;

&lt;p&gt;Check out our conversation about the VAST database, VAST data platform, the
Trino connector, internal workings of the system, use cases, customers, and much
more in the interview.&lt;/p&gt;

&lt;p&gt;Also have a look at the &lt;a href=&quot;https://www.youtube.com/watch?v=RutbCY8i22Q&quot;&gt;presentation from Jason Russler about VAST at Trino
Summit 2023&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;/blog/2024/02/20/announcing-trino-fest-2024.html&quot;&gt;Trino Fest 2024 has been announced&lt;/a&gt; for this summer in Boston! Make sure
to check out the announcement blog post and register to attend, submit your
talks, or contact Starburst for information on sponsoring!&lt;/p&gt;

&lt;p&gt;Check out the upcoming Trino Community Broadcast episodes about OpenTelemetry
and Mitzu.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino Fest goes to Boston in 2024</title>
      <link href="https://trino.io/blog/2024/02/20/announcing-trino-fest-2024.html" rel="alternate" type="text/html" title="Trino Fest goes to Boston in 2024" />
      <published>2024-02-20T00:00:00+00:00</published>
      <updated>2024-02-20T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/02/20/announcing-trino-fest-2024</id>
      <content type="html" xml:base="https://trino.io/blog/2024/02/20/announcing-trino-fest-2024.html">&lt;p&gt;After the resounding success of Trino Fest and Trino Summit in 2023, Commander
Bun Bun has exciting news to share: we’re taking our biggest events of the year
back to being in-person. They’ll be hybrid, to be more specific, so if you can’t
travel, don’t fret, you’ll still be able to watch and ask questions in chat.
But if you can travel, you won’t want to miss out! Everything you already know
and love about Trino Fest is moving to the East Coast for the lovely Boston
summer. The event is on the 13th of June in the Hyatt Regency Boston, where
we’ll have a full day of talks, time to network, and a happy hour at the end of
the day. You may even get to meet Commander Bun Bun, who’s ditching the hiking
gear in favor of training for the Olympics. Sound exciting?&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;http://www.starburst.io/info/trino-fest-2024?utm_medium=trino&amp;amp;utm_source=website&amp;amp;utm_campaign=Global-FY25-Q2-EV-Trino-Fest-2024&amp;amp;utm_content=Blog-1&quot;&gt;
        Register to attend!
    &lt;/a&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;join-us-in-person&quot;&gt;Join us in person&lt;/h2&gt;

&lt;p&gt;Our event will be hosted at the Hyatt Regency in Boston, where we are planning a
full day of festivities followed by a happy hour on the Hyatt Regency deck.
There is a
&lt;a href=&quot;https://www.hyatt.com/en-US/group-booking/BOSTO/G-STA4&quot;&gt;discounted room block&lt;/a&gt;
set aside for those interested in attending live and staying with us in Boston.
If you are looking to book hotel dates in addition to what is provided on the
room block, email &lt;a href=&quot;mailto:events@starburstdata.com&quot;&gt;events@starburstdata.com&lt;/a&gt;,
and they will help you coordinate your reservation.&lt;/p&gt;

&lt;p&gt;Regardless of whether you plan on attending in person or online, you do need to
register, so make sure to click the button above!&lt;/p&gt;

&lt;h2 id=&quot;call-for-speakers&quot;&gt;Call for speakers&lt;/h2&gt;

&lt;p&gt;Interested in speaking? We want to hear from everyone in the Trino community
who has something to share. If you aren’t sure whether it’s worth it to submit,
submit anyway! We’ll review all submissions, and we’ll do our best to work with
you to turn your talk into a smash hit. We are looking for both full sessions
(about 30 minutes) and lightning talks (10-15 minutes). We welcome intermediate
to advanced submissions for talks that are connected to Trino on any of the
following topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Best practices and use cases&lt;/li&gt;
  &lt;li&gt;Data migrations&lt;/li&gt;
  &lt;li&gt;Optimizations and performance improvements&lt;/li&gt;
  &lt;li&gt;Data governance&lt;/li&gt;
  &lt;li&gt;Data engineering, including batch and streaming architectures&lt;/li&gt;
  &lt;li&gt;Data science&lt;/li&gt;
  &lt;li&gt;SQL analytics and BI&lt;/li&gt;
  &lt;li&gt;Cloud data lake use cases&lt;/li&gt;
  &lt;li&gt;Data lake architecture&lt;/li&gt;
  &lt;li&gt;Query federation&lt;/li&gt;
  &lt;li&gt;Table formats&lt;/li&gt;
  &lt;li&gt;Data ingestion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Want to speak?&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://sessionize.com/trino-fest-2024&quot;&gt;
        Submit a talk!
    &lt;/a&gt;
&lt;/div&gt;

&lt;h2 id=&quot;-trino-contributor-congregation&quot;&gt;&lt;a name=&quot;tcc&quot;&gt;&lt;/a&gt; Trino contributor congregation&lt;/h2&gt;

&lt;p&gt;The day after Trino Fest, we’ll also be hosting an in-person meetup for
Trino contributors and engineers to catch up, discuss the Trino roadmap, and
engage directly with the maintainers. It’s a great opportunity to put
faces and voices to those GitHub handles, align on the big ideas or tricky PRs
that have been moving slowly, and find more ways to get involved in Trino
development. If you’re interested in attending, message Manfred Moser or Cole
Bowden on the &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Trino Slack&lt;/a&gt;, and we’ll get you added to
the attendee list and share more details.&lt;/p&gt;

&lt;h2 id=&quot;sponsor-trino-fest&quot;&gt;Sponsor Trino Fest&lt;/h2&gt;

&lt;p&gt;Starburst is the organizing sponsor of the event, but to make Trino Fest a
smashing success, they’re excited and interested in collaborating with other
organizations within the community. If you are interested in sponsoring, email
&lt;a href=&quot;mailto:events@starburstdata.com&quot;&gt;events@starburstdata.com&lt;/a&gt; for information.&lt;/p&gt;

&lt;p&gt;And regardless of whether you’re planning on attending, speaking, or sponsoring,
we look forward to seeing you soon!&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>After the resounding success of Trino Fest and Trino Summit in 2023, Commander Bun Bun has exciting news to share: we’re taking our biggest events of the year back to being in-person. They’ll be hybrid, to be more specific, so if you can’t travel, don’t fret, you’ll still be able to watch and ask questions in chat. But if you can travel, you won’t want to miss out! Everything you already know and love about Trino Fest is moving to the East Coast for the lovely Boston summer. The event is on the 13th of June in the Hyatt Regency Boston, where we’ll have a full day of talks, time to network, and a happy hour at the end of the day. You may even get to meet Commander Bun Bun, who’s ditching the hiking gear in favor of training for the Olympics. Sound exciting? Register to attend!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2024/announcement-banner.png" />
      
    </entry>
  
    <entry>
      <title>Open Policy Agent for Trino arrived</title>
      <link href="https://trino.io/blog/2024/02/06/opa-arrived.html" rel="alternate" type="text/html" title="Open Policy Agent for Trino arrived" />
      <published>2024-02-06T00:00:00+00:00</published>
      <updated>2024-02-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/02/06/opa-arrived</id>
      <content type="html" xml:base="https://trino.io/blog/2024/02/06/opa-arrived.html">&lt;p&gt;Trino now ships with an access control integration using the popular and widely
used &lt;a href=&quot;https://www.openpolicyagent.org/&quot;&gt;Open Policy Agent (OPA)&lt;/a&gt; from the Cloud Native
Computing Foundation. The release of &lt;a href=&quot;https://trino.io/docs/current/release/release-438.html&quot;&gt;Trino
438&lt;/a&gt; marks an important
milestone of the effort towards this integration.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;collaboration-and-history&quot;&gt;Collaboration and history&lt;/h2&gt;

&lt;p&gt;Open Policy Agent was first released in 2016 and has gained more and more
popularity in the ecosystem of cloud native applications and beyond.&lt;/p&gt;

&lt;p&gt;Initial efforts for an integration with Trino started separately at Bloomberg,
Stackable, Raft, and other places, sometimes in parallel and with only partial
collaboration. You might have first heard about it in August 2022 in the &lt;a href=&quot;https://trino.io/episodes/39.html&quot;&gt;Trino
Community Broadcast episode 39&lt;/a&gt; with a team from
Raft as guests.&lt;/p&gt;

&lt;p&gt;Usage and experience with OPA grew. In the end, Pablo Arteaga from
&lt;a href=&quot;https://www.techatbloomberg.com/&quot;&gt;Bloomberg&lt;/a&gt; and Sebastian Bernauer and Sönke
Liebau from &lt;a href=&quot;https://stackable.tech/&quot;&gt;Stackable&lt;/a&gt; took the initiative to open a
pull request to Trino. Their persistence and collaboration carried them through many
review comments, update commits, and even a second PR, and led them to submit a talk
and eventually present at Trino Summit 2023 about the Open Policy Agent access
control with Trino and their motivation to move from Apache Ranger to OPA.&lt;/p&gt;

&lt;h2 id=&quot;opa-at-trino-summit-2023&quot;&gt;OPA at Trino Summit 2023&lt;/h2&gt;

&lt;p&gt;The presentation from Pablo and Sönke titled “Trino OPA authorizer - An open
source love story” received a lot of interest from the audience at the event and
on YouTube since then. They walked through the architectural differences between
using Ranger and OPA. Sönke detailed the usage of OPA in the Stackable platform
and how it enables a single access control platform to apply across many systems.
They discussed their collaboration on the pull request, and Pablo showed a
migration path from Ranger and a full demo of OPA with Trino.&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/fbqqapQbAv0&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;They also made the &lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/opa-trino.pdf&quot;&gt;slide deck available for your
reference&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Edward Morgan and Bhaarat Sharma from &lt;a href=&quot;https://teamraft.com/&quot;&gt;Raft&lt;/a&gt; also
presented &lt;a href=&quot;https://www.youtube.com/watch?v=6KspMwCbOfI&quot;&gt;Avoiding pitfalls with query federation in data
lakehouses&lt;/a&gt; at Trino Summit, and
detailed their OPA usage in their Data Fabric platform. It combines Delta Lake,
Trino, Apache Kafka, and Open Policy Agent (OPA) into a robust lakehouse data
platform. They talked about access control in Trino overall and how important it
is for their customers, including the US Department of Defense. Their
presentation also included a demo of OPA with Trino.&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/6KspMwCbOfI&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;opa-on-the-way-to-trino&quot;&gt;OPA on the way to Trino&lt;/h2&gt;

&lt;p&gt;Pablo and Sebastian continued their efforts on the &lt;a href=&quot;https://github.com/trinodb/trino/pull/19532&quot;&gt;pull
request&lt;/a&gt; after Trino Summit. They
worked successfully with Dain on the code review and necessary changes, and
helped Manfred with the documentation.&lt;/p&gt;

&lt;p&gt;Finally, with the release of Trino 438, the &lt;a href=&quot;https://trino.io/docs/current/security/opa-access-control.html&quot;&gt;Open Policy Agent access
control&lt;/a&gt; is available
to all Trino users.&lt;/p&gt;

&lt;p&gt;The community is already taking notice with follow-up pull requests for further
improvements and blog posts such as &lt;a href=&quot;https://www.linkedin.com/pulse/enhancing-security-observability-trino-open-policy-agent-isa-inalcik-zhl9e/&quot;&gt;Enhancing Security and Observability in
Trino with Open Policy Agent and
OpenTelemetry&lt;/a&gt;
from Isa Inalcik.&lt;/p&gt;

&lt;h2 id=&quot;benefits-of-opa&quot;&gt;Benefits of OPA&lt;/h2&gt;

&lt;p&gt;The arrival of OPA support for Trino marks an important step. OPA is a mature
and widely used access control system. Its
&lt;a href=&quot;https://www.openpolicyagent.org/ecosystem/&quot;&gt;ecosystem&lt;/a&gt; includes many
integrations, user interfaces, development tools, and other resources.&lt;/p&gt;

&lt;p&gt;OPA is a very flexible authorization system, making it an ideal match for Trino.
Trino deployments are often part of a diverse data platform, spanning a variety
of interconnected data sources, pipelines, client tools, and applications.&lt;/p&gt;

&lt;p&gt;Trino users now have an alternative to the file-based access
control from the Trino project itself, maintaining their own Ranger
integration, or using commercial offerings for access control.&lt;/p&gt;

&lt;h2 id=&quot;whats-next&quot;&gt;What’s next&lt;/h2&gt;

&lt;p&gt;We reached another milestone, but we are not done yet. Specifically for OPA, we
are looking at the following next tasks:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Get more features from various older, private forks converted into pull
requests to Trino so everyone can benefit.&lt;/li&gt;
  &lt;li&gt;Update the documentation with more practical advice and tips.&lt;/li&gt;
  &lt;li&gt;Provide further resources for running OPA with Trino, writing Rego policies,
and helping the community.&lt;/li&gt;
  &lt;li&gt;Implement row-level filtering and column masking, based on the
&lt;a href=&quot;https://github.com/bloomberg/trino/pull/16&quot;&gt;draft&lt;/a&gt; from Pablo.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Special thanks go to everyone participating so far. Consider this an open
invitation to join the effort.&lt;/p&gt;

&lt;p&gt;Ping me on Slack directly or find us in #opa-dev.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Trino now ships with an access control integration using the popular and widely used Open Policy Agent (OPA) from the Cloud Native Computing Foundation. The release of Trino 438 marks an important milestone of the effort towards this integration.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/opa-small.png" />
      
    </entry>
  
    <entry>
      <title>Trino 2023 wrapped</title>
      <link href="https://trino.io/blog/2024/01/19/trino-2023-wrapped.html" rel="alternate" type="text/html" title="Trino 2023 wrapped" />
      <published>2024-01-19T00:00:00+00:00</published>
      <updated>2024-01-19T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2024/01/19/trino-2023-wrapped</id>
      <content type="html" xml:base="https://trino.io/blog/2024/01/19/trino-2023-wrapped.html">&lt;p&gt;If &lt;a href=&quot;https://www.newsroom.spotify.com/2023-wrapped/&quot;&gt;“Wrapped” is good enough for Spotify&lt;/a&gt;, 
it’s good enough for Trino, right? As we look forward to a bright 2024, we can
also take a moment to get sentimental, look back at everything we’ve
accomplished, and reflect on the progress we’ve made. Commander Bun Bun has been
hard at work, so if you haven’t been paying close attention to Trino or want an
idea of all that went down in 2023, we’re happy to present you with an end of
year recap. We’ll explore what’s gone on in the community and in development,
revisit the events we’ve hosted, and discuss the cool new features and
technologies you can use when you’re running Trino.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/IRq3ZNR9Dgs&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;2023-by-the-numbers&quot;&gt;2023 by the numbers&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;64,288 views 👀 on YouTube&lt;/li&gt;
  &lt;li&gt;5,872 hours watched ⌚on YouTube&lt;/li&gt;
  &lt;li&gt;5,018 new commits 💻 in GitHub&lt;/li&gt;
  &lt;li&gt;2,985 new stargazers ⭐ in GitHub&lt;/li&gt;
  &lt;li&gt;2,494 pull requests merged ✅ in GitHub&lt;/li&gt;
  &lt;li&gt;1,227 issues 📝 created in GitHub&lt;/li&gt;
  &lt;li&gt;704 new subscribers 📺 in YouTube&lt;/li&gt;
  &lt;li&gt;45 videos 🎥 uploaded to YouTube&lt;/li&gt;
  &lt;li&gt;39 blog ✍️ posts&lt;/li&gt;
  &lt;li&gt;30 Trino 🚀 releases&lt;/li&gt;
  &lt;li&gt;10 Trino Community Broadcast ▶️ episodes&lt;/li&gt;
  &lt;li&gt;2 Trino ⛰️ Summits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’re excited to say that Trino continued to grow in 2023:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;GitHub stars increased by nearly 50% total and by 8% more than last year&lt;/li&gt;
  &lt;li&gt;Commits increased by 7%&lt;/li&gt;
  &lt;li&gt;Slack usage picked up dramatically&lt;/li&gt;
  &lt;li&gt;YouTube viewership was up 7% despite a lack of Pokémon-themed musical content compared to 2022 (our bad)&lt;/li&gt;
  &lt;li&gt;30 releases kept new versions of Trino coming out more often than every other week.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thanks in part to all that growth, it’s more important than ever to be on
&lt;a href=&quot;/slack.html&quot;&gt;our Slack&lt;/a&gt;. If you’re a Trino user or community member and aren’t
already on there, you’re missing out! Make sure to join up for community
announcements, release statuses, the shared expertise of the entire Trino
community, and event-specific channels for discussion when we’re hosting things 
like Trino Fest and Trino Summit. Speaking of those…&lt;/p&gt;

&lt;h2 id=&quot;trino-events&quot;&gt;Trino events&lt;/h2&gt;

&lt;p&gt;One of the best parts of being an open source community is that it’s easy to be
excited and connect with others about using such a cool piece of technology.
Whether that’s bringing Trino to new users who can take advantage of it, or
sharing our learnings with other Trino users to help them make the most of it,
events are one of the best ways to distribute that knowledge. So what were we up to this year?&lt;/p&gt;

&lt;h3 id=&quot;trino-fest-and-trino-summit&quot;&gt;Trino Fest and Trino Summit&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;Trino Fest&lt;/a&gt; and
&lt;a href=&quot;https://trino.io/blog/2023/12/18/trino-summit-recap.html&quot;&gt;Trino Summit&lt;/a&gt; are
becoming mainstays on the Trino calendar each year, and 2023 was no different.
Formerly “Cinco de Trino,” we ditched the Cinco de Mayo theme and went with the
simpler “Trino Fest” in June, opting to theme it around Commander Bun Bun’s Lake
House Summer Camp, with a focus on integrating Trino with lakehouse and data
lake architectures. Trino Summit only wrapped up a little over a month ago,
rounding out the year and highlighting some amazing developments that we’ll be
talking about later in this blog post.&lt;/p&gt;

&lt;p&gt;Trino Fest has historically been the smaller event, but it did some catching up
in 2023, as both Trino Fest and Trino Summit were made virtual and expanded to two
days this year. With the events easier to attend than ever before, we reached a
combined total of about 1,200 live attendees, with thousands more views on demand.&lt;/p&gt;

&lt;p&gt;The lineups were packed with 34 talks across both events, featuring speakers
from huge Trino users like Salesforce, Stripe, Apple, and Lyft, as well as from
major Trino contributors like Starburst, Tabular, and Bloomberg. You can
view &lt;a href=&quot;https://www.youtube.com/playlist?list=PLFnr63che7wbBu_czq-SS9iVdQ4CIv2z1&quot;&gt;recordings of every Trino Fest talk&lt;/a&gt;
and &lt;a href=&quot;https://www.youtube.com/playlist?list=PLFnr63che7wYeJLUjUaEftCFfjymhgLcq&quot;&gt;every Trino Summit talk&lt;/a&gt;
on the Trino YouTube channel if you missed out.&lt;/p&gt;

&lt;h3 id=&quot;meetups-and-international-events&quot;&gt;Meetups and international events&lt;/h3&gt;

&lt;p&gt;One of the more exciting developments was a major event in Japan -
&lt;a href=&quot;https://trino.io/blog/2023/10/11/a-report-about-trino-conference-tokyo-2023.html&quot;&gt;Trino Conference Tokyo&lt;/a&gt;. 
A virtual event with four sessions, it brought Trino to a Japanese-speaking
audience and further pushed our favorite query engine across language borders.
On top of that,
&lt;a href=&quot;https://www.starburst.io/info/india-trino-meetup-miq/?utm_source=trino&amp;amp;utm_medium=slack&amp;amp;utm_campaign=APAC-FY24-Q4-CM-india-Meetup-at-MiQ-Digital&quot;&gt;Starburst co-hosted a Trino meetup in Bengaluru&lt;/a&gt;, 
and the community organized the first-ever Korean Trino meetup (pictured below).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/2023-review/trino-kr-meetup.png&quot; float=&quot;center&quot; /&gt;&lt;/p&gt;

&lt;p&gt;And last but not least,
&lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;Trino, the Definitive Guide, 2nd Edition&lt;/a&gt;
was translated into Mandarin and Polish.&lt;/p&gt;

&lt;h2 id=&quot;the-trino-gateway&quot;&gt;The Trino Gateway&lt;/h2&gt;

&lt;p&gt;One of the biggest announcements in the Trino community this year was
the &lt;a href=&quot;https://trino.io/blog/2023/09/28/trino-gateway.html&quot;&gt;launch of the Trino Gateway&lt;/a&gt;. A proxy and
load-balancer, it’s a crucial piece of Trino infrastructure for organizations
that need more than one Trino cluster to suit their needs.&lt;/p&gt;

&lt;p&gt;Why would you want more than one Trino cluster? Maybe you want one cluster with
fault-tolerant execution enabled for ETL workloads and another cluster for
speedy ad-hoc analytics. Perhaps you have analysts performing wildly
differently-sized queries, and high-volume compute-intensive queries are proving
to be bad neighbors for lightweight and low-latency queries that shouldn’t take
more than milliseconds. Historically, users would have to manually manage
swapping between clusters, establish a new connection, and try not to get a
headache in the process.&lt;/p&gt;

&lt;p&gt;Enter the Trino Gateway! By routing all of your Trino traffic automatically,
it’s never been easier to manage, maintain, and query multiple Trino clusters at
once. Load balancing ensures that no one cluster gets overworked, and it’s the
perfect way to stop large queries from getting in the way of the little guys.
Add in the fact that you can seamlessly shut down an individual cluster for
updates or maintenance while the Trino Gateway routes traffic elsewhere, and
it’s easy to see why this is such a game-changer. We’re super excited for it to
be out there in the world, and we hope it makes running Trino at the largest
scales simpler and faster than ever before.&lt;/p&gt;

&lt;p&gt;For more information on the Trino Gateway, check out:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2023/09/28/trino-gateway.html&quot;&gt;The announcement blog post&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-gateway/blob/main/docs/quickstart.md&quot;&gt;The quickstart guide&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-gateway/tree/main&quot;&gt;The main Trino Gateway repo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;new-features&quot;&gt;New features&lt;/h2&gt;

&lt;p&gt;With more development on Trino than ever before, there were obviously a ton of
new things being added to it. Let’s go over some of the biggest adds in 2023.&lt;/p&gt;

&lt;h3 id=&quot;sql-routines&quot;&gt;SQL routines&lt;/h3&gt;

&lt;p&gt;Whether you want to refer to them as SQL routines or as user-defined functions,
they’re a big deal. Fresh off the presses and only a few months old, they do
exactly what you’d expect them to do: you, a user, can define and re-use your
own functions! Define and use them inline as part of a query to make that query
cleaner, easier, and simpler to understand. Or, if you’re really cooking, you
can run a query that defines the routine in the schema of the catalog. This
allows other Trino users to access the same routine time and time again as part
of their other queries. It’s a level of customization that we’ve never had
before in Trino, and no longer do you need to write your own Java plugins to
create and re-use functions that do exactly what you need them to do.&lt;/p&gt;
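
&lt;p&gt;As a small sketch (the function name and unit conversion here are made up for
illustration), an inline routine defined and used within a single query looks
roughly like this, while the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE FUNCTION&lt;/code&gt;
form stores the same routine in a catalog schema for reuse:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;WITH
  FUNCTION meters_to_feet(m double)
    RETURNS double
    RETURN m * 3.28084
SELECT meters_to_feet(100);

-- Store the routine in a catalog schema so other users can call it later
CREATE FUNCTION example.default.meters_to_feet(m double)
  RETURNS double
  RETURN m * 3.28084;
&lt;/code&gt;&lt;/pre&gt;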

&lt;p&gt;If you want to learn more about SQL routines, you can check
out &lt;a href=&quot;/docs/current/routines/introduction.html&quot;&gt;the introduction to SQL routines&lt;/a&gt;
in our documentation, as well as
&lt;a href=&quot;https://www.youtube.com/watch?v=1siAYR6BzzY&amp;amp;list=PLFnr63che7wYzZoo5yyEF5R1QrOH6VRq3&amp;amp;index=4&quot;&gt;a video from our SQL training series&lt;/a&gt;
and a few &lt;a href=&quot;/docs/current/routines/examples.html&quot;&gt;example routines&lt;/a&gt; which give a
good look at how they can be used.&lt;/p&gt;

&lt;h3 id=&quot;schema-evolution-and-dynamic-catalogs&quot;&gt;Schema evolution and dynamic catalogs&lt;/h3&gt;

&lt;p&gt;While we’re providing more power, customization, and flexibility to Trino users,
it’s also important to highlight just how much has been added this year to make
it easier to adjust things on the fly.&lt;/p&gt;

&lt;p&gt;Schema evolution in Hive was a big addition, allowing you to alter columns’ data
types, rename columns, and handle nested fields when dropping columns. Instead
of requiring you to modify the underlying database some other way and restart
Trino, Trino can handle the adjustments on the fly.&lt;/p&gt;

&lt;p&gt;But if you don’t use Hive and are feeling left out, we’ve experimentally taken
things one step further in 2023, adding dynamic catalogs to Trino. Rather than
adjusting your schema one column at a time, what about adding or dropping an
entire catalog in one go? You can do that now. Though it’s currently still
bleeding-edge and not ready for widespread use on your important production
data sources, we’re looking forward to improving it and making it resilient and
stable in 2024.&lt;/p&gt;
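
&lt;p&gt;As a rough sketch (the connector choice and connection values below are
hypothetical), a dynamic catalog can be added and removed at runtime with SQL
alone:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE CATALOG example USING postgresql
WITH (
  &quot;connection-url&quot; = 'jdbc:postgresql://example.net:5432/database',
  &quot;connection-user&quot; = 'admin'
);

DROP CATALOG example;
&lt;/code&gt;&lt;/pre&gt;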

&lt;h3 id=&quot;project-hummingbird&quot;&gt;Project Hummingbird&lt;/h3&gt;

&lt;p&gt;Trino has always been about squeezing out every ounce of performance that you
can get. Check out our &lt;a href=&quot;/docs/current/release.html&quot;&gt;release notes&lt;/a&gt; and
you’ll see that every version includes at least a couple performance
improvements. Over time, these performance improvements add up to a substantial
gain, meaning that version-over-version, year-over-year, Trino is always getting
faster. Project Hummingbird was a concerted effort this year to take a look at
the core engine and make a number of architectural changes paired with small
improvements that would add up to something very substantial.
&lt;a href=&quot;https://github.com/trinodb/trino/issues/14237&quot;&gt;The GitHub issue tracking it&lt;/a&gt;
lists a ton of work that’s been accomplished already, with a lot of that work
done in 2023. Though stay tuned for more, because that’s only scratching the
surface…&lt;/p&gt;

&lt;h3 id=&quot;lakehouse-improvements&quot;&gt;Lakehouse improvements&lt;/h3&gt;

&lt;p&gt;Want to leverage the historical log of all actions taken on a table in Hudi? The
new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$timeline&lt;/code&gt; system table has you covered. How about in Delta Lake? We’ve got
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;table_changes&lt;/code&gt; function for that, and views were added there, too. Too many
metadata tables to list were added to Iceberg, along with the REST, JDBC, and
Nessie catalogs for metadata.&lt;/p&gt;
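
&lt;p&gt;For instance (schema and table names here are hypothetical), the Hudi timeline
and the Delta Lake change feed can be queried roughly like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Hudi: inspect the commit timeline of a table
SELECT * FROM &quot;example_table$timeline&quot;;

-- Delta Lake: read the changes made to a table since version 0
SELECT * FROM TABLE(system.table_changes('example_schema', 'example_table', 0));
&lt;/code&gt;&lt;/pre&gt;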

&lt;h3 id=&quot;java-21&quot;&gt;Java 21!&lt;/h3&gt;

&lt;p&gt;Java 21. It’s required to run Trino versions 436 and later. With
&lt;a href=&quot;https://trino.io/blog/2023/11/03/java-21.html&quot;&gt;the upgrade from Java 17 to 21&lt;/a&gt;
comes a ton of improvements that will make development on Trino easier and
better than ever, which will in turn make it faster and smoother than ever.
Though not as big a deal as our upgrade to Java 17 last year, expect to see
the benefits coming down the pipeline as the engineers working on Trino are able
to take advantage of the latest and greatest features in Java.&lt;/p&gt;

&lt;h2 id=&quot;trino-ecosystem-updates&quot;&gt;Trino ecosystem updates&lt;/h2&gt;

&lt;p&gt;There’s more to Trino than Trino itself! With community updates and other
technologies integrating with Trino, the number of ways you can access and use
Trino is always growing. And the number of people taking care of Trino is
growing, too.&lt;/p&gt;

&lt;h3 id=&quot;python-clients&quot;&gt;Python clients&lt;/h3&gt;

&lt;p&gt;Trino’s own &lt;a href=&quot;https://github.com/trinodb/trino-python-client&quot;&gt;Python client&lt;/a&gt; saw
heavy development in 2023. It was updated to support SQLAlchemy 2.0 and had type
support fully fleshed out, making it a robust, free, and open-source tool for
running your Trino queries.&lt;/p&gt;

&lt;p&gt;Elsewhere in the Python ecosystem, we heard from
both &lt;a href=&quot;https://youtu.be/aKhI1Phfn-o&quot;&gt;Fugue&lt;/a&gt;
and &lt;a href=&quot;https://youtu.be/JMUtPl-cMRc&quot;&gt;Ibis&lt;/a&gt; at Trino Fest, two different Python
clients that integrate Trino with Python in new ways. Fugue is a wrapper that
helps integrate with other Python tools and clients, and Ibis can help convert
your Python code into SQL queries, making it feasible to be a 100% Python-based
organization that still leverages the speed and power of a SQL query engine like
Trino. We had Phillip Cloud from Voltron Data on
for &lt;a href=&quot;/episodes/49&quot;&gt;an episode of the Trino Community Broadcast&lt;/a&gt; to talk about
Ibis in even more detail.&lt;/p&gt;

&lt;h3 id=&quot;and-other-clients-too&quot;&gt;And other clients, too!&lt;/h3&gt;

&lt;p&gt;Also on the Trino Community Broadcast repping new client support for Trino in
2023 were &lt;a href=&quot;/episodes/45&quot;&gt;Dolphin Scheduler&lt;/a&gt;, &lt;a href=&quot;/episodes/51&quot;&gt;PopSQL&lt;/a&gt;,
and &lt;a href=&quot;/episodes/53&quot;&gt;Coginiti&lt;/a&gt;. Dolphin Scheduler is a workflow orchestrator - and
scheduler! - that can be used to routinely run and coordinate Trino queries.
PopSQL is like Google Drive for SQL, providing a suite of collaborative tools
for editing and working on queries as a team, including synchronous query
editing, storing query history, and a robust commenting and feedback system.
Coginiti is a high-powered data workspace that connects to Trino among many
other things, supporting a host of powerful features that make it easier to
reuse code and snippets of queries, as well as featuring embedded variables to
minimize redundancy. If you want to learn more about any of these clients, click
on the links above to check out the Trino Community Broadcast episodes where we went
in-depth with them!&lt;/p&gt;

&lt;p&gt;Oh, and don’t forget
the &lt;a href=&quot;https://regadas.dev/trino-js-client/&quot;&gt;Trino Typescript client&lt;/a&gt;, for when
you want to work at the beautiful intersection of web development and accessing
tons of data.&lt;/p&gt;

&lt;h3 id=&quot;new-maintainers&quot;&gt;New maintainers&lt;/h3&gt;

&lt;p&gt;Trino saw three new maintainers added to its ranks this year:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/mosabua&quot;&gt;Manfred Moser&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/pettyjamesm&quot;&gt;James Petty&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/wendigo&quot;&gt;Mateusz Gajewski&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred even took the liberty of updating the website’s
&lt;a href=&quot;/development/roles&quot;&gt;roles page&lt;/a&gt; to list out all our maintainers. Thank you to
them for their dedication to making Trino the best it can be, and
congratulations to them on their shiny maintainer titles!&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/blog/2023/01/10/trino-2022-the-rabbit-reflects.html&quot;&gt;2022 had been the busiest year in Trino’s history&lt;/a&gt;,
but 2023 has managed to surpass it. If you’re interested in contributing to
Trino, make sure to check it out on &lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;GitHub&lt;/a&gt;.
Even if you’re not interested in contributing, give us a
&lt;a href=&quot;https://trino.io/star&quot;&gt;star&lt;/a&gt; on GitHub, anyway! It’s been a great year for
Commander Bun Bun, and we can’t wait to show you what 2024 has in store for
everyone’s favorite data rabbit.&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>If “Wrapped” is good enough for Spotify, it’s good enough for Trino, right? As we look forward to a bright 2024, we can also take a moment to get sentimental, look back at everything we’ve accomplished, and reflect on the progress we’ve made. Commander Bun Bun has been hard at work, so if you haven’t been paying close attention to Trino or want an idea of all that went down in 2023, we’re happy to present you with an end of year recap. We’ll be exploring what’s gone on in the community, on development, the events we’ve hosted, and discuss the cool new features and technologies you can use when you’re running Trino.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/2023-review/wrapped.png" />
      
    </entry>
  
    <entry>
      <title>55: Commander Bun Bun peeks at Peaka</title>
      <link href="https://trino.io/episodes/55.html" rel="alternate" type="text/html" title="55: Commander Bun Bun peeks at Peaka" />
      <published>2024-01-18T00:00:00+00:00</published>
      <updated>2024-01-18T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/55</id>
      <content type="html" xml:base="https://trino.io/episodes/55.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Technical Content at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://linkedin.com/in/sakalsiz&quot;&gt;Mustafa Sakalsiz&lt;/a&gt;, CEO at
&lt;a href=&quot;https://www.peaka.com/&quot;&gt;Peaka&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/alitekin/&quot;&gt;Ali Tekin&lt;/a&gt;, Principal Software
Architect at &lt;a href=&quot;https://www.peaka.com/&quot;&gt;Peaka&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-437-438&quot;&gt;Releases 437-438&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-437.html&quot;&gt;Trino 437&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for configuring compression codecs&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;char&lt;/code&gt; values in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;to_utf8()&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lpad()&lt;/code&gt; functions&lt;/li&gt;
  &lt;li&gt;Improved performance for Delta Lake queries without table statistics&lt;/li&gt;
  &lt;li&gt;Improved performance for Iceberg queries with filters on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt; columns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-438.html&quot;&gt;Trino 438&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for access control with &lt;a href=&quot;https://trino.io/blog/2024/02/06/opa-arrived&quot;&gt;Open Policy Agent&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER COLUMN ... DROP NOT NULL&lt;/code&gt; in Iceberg and PostgreSQL&lt;/li&gt;
  &lt;li&gt;Support for configuring page sizes in Delta Lake, Hive, and Iceberg&lt;/li&gt;
  &lt;li&gt;Better type support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;reduce_agg()&lt;/code&gt; function&lt;/li&gt;
&lt;/ul&gt;
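
&lt;p&gt;As a minimal sketch of the new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER COLUMN ... DROP NOT NULL&lt;/code&gt; support, assuming a hypothetical Iceberg catalog, schema, and table:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- allow NULL values in a previously NOT NULL column
ALTER TABLE iceberg.example_schema.orders
ALTER COLUMN comment DROP NOT NULL;
&lt;/code&gt;&lt;/pre&gt;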

&lt;p&gt;And over in the land of the Trino Gateway…&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-gateway/blob/main/docs/release-notes.md#trino-gateway-5-24-jan-2024&quot;&gt;Trino Gateway version 5&lt;/a&gt;
released!&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-episode-peaka&quot;&gt;Concept of the episode: Peaka&lt;/h2&gt;

&lt;p&gt;Another Trino Community Broadcast episode means another cool piece of technology
that uses Trino for us to show off to the community. This time it’s Peaka,
a no-code approach to data warehousing that makes it easier than ever to set up
your data stack without needing a ton of complex engineering.&lt;/p&gt;

&lt;p&gt;In &lt;a href=&quot;https://www.peaka.com/docs/getting-started/what-is-peaka/&quot;&gt;their own words&lt;/a&gt;,
Peaka is a platform that merges disparate data sources into a single data layer,
letting you join and blend them, query them using SQL or natural language, and 
expose your data to outside users through APIs. Sounds a bit like Trino, right?
That’s because underneath the hood, Trino is a key part of how they’re making it
happen. In this episode, we talk to the team at Peaka about where they got
started, how they’re making it easier than ever to leverage the federation that
Trino is capable of, and the work they’ve done on top to integrate their
platform with every SaaS data source under the sun.&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-using-peaka&quot;&gt;Demo of the episode: Using Peaka!&lt;/h2&gt;

&lt;p&gt;If you want to see what the platform is like, then look no further. We’ll be
exploring:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Connecting to data sources&lt;/li&gt;
  &lt;li&gt;Filtering and combining data&lt;/li&gt;
  &lt;li&gt;Editing and running queries, including their visual query editor&lt;/li&gt;
  &lt;li&gt;Natural language queries&lt;/li&gt;
  &lt;li&gt;Visualizing data&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pr-of-the-episode-18719-filesystem-caching-with-alluxio&quot;&gt;PR of the episode: #18719: Filesystem caching with Alluxio&lt;/h2&gt;

&lt;p&gt;Perhaps it’s a little easier to link to the issue for tracking
&lt;a href=&quot;https://github.com/trinodb/trino/issues/20550&quot;&gt;the rollout&lt;/a&gt;, but however you
want to present it, file system caching is back in Trino! Caching is a huge performance win
for a wide variety of use cases, allowing the engine to run faster, better, and
pump out query results at an unparalleled pace. This is going to lead to 
performance improvements for Trino queries using the supported object storage 
connectors, and you’ll hear more from us about it once it’s officially launched.
The best part is that there’s even more coming down the line as support for it
is expanded.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>54: Trino 2023 wrapped</title>
      <link href="https://trino.io/episodes/54.html" rel="alternate" type="text/html" title="54: Trino 2023 wrapped" />
      <published>2024-01-18T00:00:00+00:00</published>
      <updated>2024-01-18T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/54</id>
      <content type="html" xml:base="https://trino.io/episodes/54.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of
Technical Content at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/mtraverso&quot;&gt;Martin Traverso&lt;/a&gt;, Trino co-creator and CTO at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-434-436&quot;&gt;Releases 434-436&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-434.html&quot;&gt;Trino 434&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FILTER&lt;/code&gt; clause to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LISTAGG&lt;/code&gt; function&lt;/li&gt;
  &lt;li&gt;Support reading &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json&lt;/code&gt; columns and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; statements in BigQuery connector&lt;/li&gt;
&lt;/ul&gt;
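
&lt;p&gt;As a minimal sketch of the new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FILTER&lt;/code&gt; clause on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LISTAGG&lt;/code&gt;, assuming the TPC-H &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny&lt;/code&gt; schema is available:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- concatenate region names, skipping any that start with 'A'
SELECT listagg(name, ', ') WITHIN GROUP (ORDER BY name)
         FILTER (WHERE name NOT LIKE 'A%')
FROM tpch.tiny.region;
&lt;/code&gt;&lt;/pre&gt;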

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-435.html&quot;&gt;Trino 435&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JSON_TABLE&lt;/code&gt; function&lt;/li&gt;
  &lt;li&gt;Improve reliability when reading from GCS&lt;/li&gt;
  &lt;li&gt;Improve query planning performance on Delta Lake tables&lt;/li&gt;
  &lt;li&gt;Improve reliability and memory usage for inserts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-436.html&quot;&gt;Trino 436&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for Elasticsearch 8&lt;/li&gt;
  &lt;li&gt;New OpenSearch connector&lt;/li&gt;
  &lt;li&gt;Faster selective joins on partition columns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional comments:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Disallow invalid configuration options with Delta Lake and Iceberg connector in 434&lt;/li&gt;
  &lt;li&gt;Separate metadata caching in numerous connectors&lt;/li&gt;
  &lt;li&gt;Various improvements for schema evolution in Hive connector&lt;/li&gt;
  &lt;li&gt;Require JDK 21.0.1 to run Trino with 436&lt;/li&gt;
  &lt;li&gt;Remove support for Elasticsearch 6 in 436&lt;/li&gt;
  &lt;li&gt;Fix minor issues for SQL routine and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JSON_TABLE&lt;/code&gt; function users&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;recap-of-trino-in-2023&quot;&gt;Recap of Trino in 2023&lt;/h2&gt;

&lt;p&gt;We chat about all the developments in the Trino project and the Trino community
from 2023, including the following topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Various statistics about the project&lt;/li&gt;
  &lt;li&gt;Features and releases&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;Trino Fest&lt;/a&gt;, &lt;a href=&quot;/blog/2023/12/18/trino-summit-recap.html&quot;&gt;Trino
Summit&lt;/a&gt;, and other events&lt;/li&gt;
  &lt;li&gt;New Trino maintainers&lt;/li&gt;
  &lt;li&gt;Polish and Chinese editions of definitive guide published&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Find more details and other topics in our &lt;a href=&quot;/blog/2024/01/19/trino-2023-wrapped.html&quot;&gt;blog post &lt;strong&gt;Trino 2023 wrapped&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Upcoming events in NYC and Vienna, details available in the &lt;a href=&quot;https://trino.io/community.html#events&quot;&gt;events
calendar&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Trino Contributor Congregation coming soon&lt;/li&gt;
  &lt;li&gt;Trino Gateway developer sync every two weeks; ping Manfred for an invite&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from O’Reilly.
You can download &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF from
Starburst&lt;/a&gt; or &lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;buy the book
online&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino Summit 2023 recap</title>
      <link href="https://trino.io/blog/2023/12/18/trino-summit-recap.html" rel="alternate" type="text/html" title="Trino Summit 2023 recap" />
      <published>2023-12-18T00:00:00+00:00</published>
      <updated>2023-12-18T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/12/18/trino-summit-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2023/12/18/trino-summit-recap.html">&lt;p&gt;Two days of non-stop Trino action are done! Last week, Trino Summit 2023
took place virtually as another great community event. Presentations from Trino
experts across the globe showcased different use cases and experiences with Trino.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;During the event, our lively audience of over 600 attendees asked questions of
the speakers and each other in the chat, and we had fun with Trino trivia questions.&lt;/p&gt;

&lt;p&gt;We talked about the &lt;a href=&quot;/blog/2023/11/09/routines.html&quot;&gt;SQL routine competition&lt;/a&gt; and announced Kevin Liu from Stripe and Jan Was from Starburst as the
winners. You can find their submissions in &lt;a href=&quot;https://trino.io/docs/current/routines/examples.html&quot; target=&quot;_blank&quot;&gt;the examples page for SQL
routines&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Starburst announced their &lt;a href=&quot;https://www.starburst.io/community/trino-champions/&quot; target=&quot;_blank&quot;&gt;Trino Champions
program&lt;/a&gt;.
Kevin and Jan are the first recipients of the award and will receive their swag
packs soon. Going forward, new champions will be crowned regularly, and
Starburst is &lt;a href=&quot;https://www.starburst.io/community/trino-champions/&quot; target=&quot;_blank&quot;&gt;looking for
nominations&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;sessions&quot;&gt;Sessions&lt;/h2&gt;

&lt;p&gt;If you missed out on the event, the following list of all the sessions provides
links to the recordings. Over time, we will follow up with blog posts about each
session with the presentation and further details.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=pXdZqpwgdxA&quot; target=&quot;_blank&quot;&gt;The mountains Trino climbed in 2023&lt;/a&gt;
presented by Martin Traverso from
&lt;a href=&quot;https://www.starburst.io&quot; target=&quot;_blank&quot;&gt;Starburst&lt;/a&gt;.
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/mountains-trino-climbed.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=qZejzyxT2fo&quot; target=&quot;_blank&quot;&gt;Trino workload management&lt;/a&gt;
presented by Jinyang Li and Tingting Ma from
&lt;a href=&quot;https://www.airbnb.com&quot; target=&quot;_blank&quot;&gt;Airbnb&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=FaytoXxKXOQ&quot; target=&quot;_blank&quot;&gt;Secure exchange SQL: Building a privacy-preserving data clean room service over Trino&lt;/a&gt;
presented by Taro Saito from
&lt;a href=&quot;https://www.treasuredata.com/&quot; target=&quot;_blank&quot;&gt;Treasure Data&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=MYLepz-hIys&quot; target=&quot;_blank&quot;&gt;Powering Bazaar’s business operation using Trino&lt;/a&gt;
presented by Umair Abro from
&lt;a href=&quot;https://www.youtube.com/watch?v=MYLepz-hIys&quot; target=&quot;_blank&quot;&gt;Bazaar&lt;/a&gt;.
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/powering-bazaar-business-operations.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=qUT-uaEE-Fk&quot; target=&quot;_blank&quot;&gt;Efficient Kappa architecture with Trino&lt;/a&gt;
presented by Sanghyun Lee from
&lt;a href=&quot;https://www.sktelecom.com&quot; target=&quot;_blank&quot;&gt;SK Telecom&lt;/a&gt;.
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/efficient-kappa-architecture-sk-telecom.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=2qwBcKmQSn0&quot; target=&quot;_blank&quot;&gt;Many clusters and only one gateway&lt;/a&gt;
presented by Will Morrison (&lt;a href=&quot;https://www.starburst.io/&quot; target=&quot;_blank&quot;&gt;Starburst&lt;/a&gt;),
Andy Su (&lt;a href=&quot;https://www.techatbloomberg.com/&quot; target=&quot;_blank&quot;&gt;Bloomberg&lt;/a&gt;), and
Jaeho Yoo (&lt;a href=&quot;https://www.naver.com&quot; target=&quot;_blank&quot;&gt;Naver&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=dg16M6bFN2w&quot; target=&quot;_blank&quot;&gt;Trino upgrade at exabytes scale&lt;/a&gt;
presented by Ramanathan Ramu from
&lt;a href=&quot;https://www.linkedin.com/&quot; target=&quot;_blank&quot;&gt;LinkedIn&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=ooUGJ6BYt90&quot; target=&quot;_blank&quot;&gt;Powering data marts through Trino Iceberg connector at Zomato&lt;/a&gt;
presented by Shubham Gupta and Bhanu Mittal from
&lt;a href=&quot;https://www.zomato.com/&quot; target=&quot;_blank&quot;&gt;Zomato&lt;/a&gt;.
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/powering-data-marts-at-zomato.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=RC8K6pIvAtI&quot; target=&quot;_blank&quot;&gt;Pinterest journey to achieving 2x efficiency improvement on Trino&lt;/a&gt;
presented by Carlos Benavides from
&lt;a href=&quot;https://www.pinterest.com/&quot; target=&quot;_blank&quot;&gt;Pinterest&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=6KspMwCbOfI&quot; target=&quot;_blank&quot;&gt;Avoiding pitfalls with query federation in data lakehouses&lt;/a&gt;
presented by Edward Morgan and
Bhaarat Sharma from &lt;a href=&quot;https://teamraft.com/&quot; target=&quot;_blank&quot;&gt;Raft&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=rmotnvBWXv4&quot; target=&quot;_blank&quot;&gt;Adopting Trino’s fault-tolerant execution mode at Quora&lt;/a&gt;
presented by Gabriel Fernandes de Oliveira and Yifan Pan from
&lt;a href=&quot;https://www.quora.com/&quot; target=&quot;_blank&quot;&gt;Quora&lt;/a&gt;.
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/fte-mode-at-quora.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=fYCoI8kkdRQ&quot; target=&quot;_blank&quot;&gt;Inherent race condition in Guava Cache invalidation and how to escape it&lt;/a&gt;
presented by Piotr Findeisen from
&lt;a href=&quot;https://www.starburst.io/&quot; target=&quot;_blank&quot;&gt;Starburst&lt;/a&gt;.
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/inherent-race-in-cache-invalidation.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=LynEiteEtPk&quot; target=&quot;_blank&quot;&gt;Unstructured data analysis using polymorphic table function in Trino&lt;/a&gt;
presented by YongHwan Lee from
&lt;a href=&quot;https://www.sktelecom.com&quot; target=&quot;_blank&quot;&gt;SK Telecom&lt;/a&gt;.
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/polymorphic-table-function-sk-telecom.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=_wocf0NK6Kc&quot; target=&quot;_blank&quot;&gt;Transitioning to Trino: Evaluating Lyft’s query engine capabilities&lt;/a&gt;
presented by Charles Song from
&lt;a href=&quot;https://www.lyft.com/&quot; target=&quot;_blank&quot;&gt;Lyft&lt;/a&gt;.
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/transition-to-trino-at-lyft.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=idk0GMxs8vE&quot; target=&quot;_blank&quot;&gt;Visualizing Trino with Apache Superset&lt;/a&gt;
presented by Evan Rusackas from
&lt;a href=&quot;https://preset.io/&quot; target=&quot;_blank&quot;&gt;Preset&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=fbqqapQbAv0&quot; target=&quot;_blank&quot;&gt;Trino OPA authorizer - An open source love story&lt;/a&gt;
presented by Sönke Liebau (&lt;a href=&quot;https://stackable.tech/&quot; target=&quot;_blank&quot;&gt;Stackable&lt;/a&gt;)
and Pablo Arteaga (&lt;a href=&quot;https://www.techatbloomberg.com/&quot; target=&quot;_blank&quot;&gt;Bloomberg&lt;/a&gt;).
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/opa-trino.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=RutbCY8i22Q&quot; target=&quot;_blank&quot;&gt;VAST database catalog&lt;/a&gt;
presented by Jason Russler from
&lt;a href=&quot;https://vastdata.com/&quot; target=&quot;_blank&quot;&gt;VAST&lt;/a&gt;.
&lt;a href=&quot;https://trino.io/assets/blog/trino-summit-2023/vast-connector.pdf&quot;&gt;(Slides)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=ZJExdGeC4eA&quot; target=&quot;_blank&quot;&gt;Support for Parquet decryption and aggregate pushdown In Trino&lt;/a&gt;
presented by Amogh Margoor and Manish Malhotra from
&lt;a href=&quot;https://www.apple.com/&quot; target=&quot;_blank&quot;&gt;Apple&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;shout-outs&quot;&gt;Shout outs&lt;/h2&gt;

&lt;p&gt;Shout outs go to Anna Schibli, Mandy Darnell, and Monica Miller from the
Trino Summit event team for all their work with the speakers and organizing the
event, and to everyone else at Starburst who helped make this event a success.&lt;/p&gt;

&lt;p&gt;Special thanks for making this Trino Software Foundation event a reality go out
to our hosting sponsor &lt;a href=&quot;https://starburst.io&quot; target=&quot;_blank&quot;&gt;Starburst&lt;/a&gt;, and
our other sponsors &lt;a href=&quot;https://www.alluxio.io/&quot; target=&quot;_blank&quot;&gt;Alluxio&lt;/a&gt;,
&lt;a href=&quot;https://www.coginiti.co&quot; target=&quot;_blank&quot;&gt;Coginiti&lt;/a&gt; and &lt;a href=&quot;https://www.montecarlodata.com/&quot; target=&quot;_blank&quot;&gt;Monte
Carlo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We will see you all at future Trino Contributor Congregations, Trino Fest 2024,
Trino Summit 2024, and &lt;a href=&quot;https://trino.io/community.html#events&quot;&gt;other events related to Trino&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;sponsors&quot;&gt;Sponsors&lt;/h2&gt;

&lt;div class=&quot;container&quot;&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.starburst.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/starburst.png&quot; title=&quot;Starburst, event host and organizer&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.alluxio.io/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/alluxio-small.png&quot; title=&quot;Alluxio, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.coginiti.co&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/coginiti-small.png&quot; title=&quot;Coginiti, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.montecarlodata.com/&quot; target=&quot;_blank&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/monte-carlo-small.png&quot; title=&quot;Monte Carlo, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;</content>

      
        <author>
          <name>Manfred Moser, Cole Bowden</name>
        </author>
      

      <summary>Two days of non-stop Trino action are done! Last week, Trino Summit 2023 took place virtually as another great community event. Presentations from Trino experts across the globe showcased different use cases and experiences with Trino.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2023/summit-logo.png" />
      
    </entry>
  
    <entry>
      <title>Final reminder for Trino Summit 2023</title>
      <link href="https://trino.io/blog/2023/12/11/trino-summit-reminder.html" rel="alternate" type="text/html" title="Final reminder for Trino Summit 2023" />
      <published>2023-12-11T00:00:00+00:00</published>
      <updated>2023-12-11T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/12/11/trino-summit-reminder</id>
      <content type="html" xml:base="https://trino.io/blog/2023/12/11/trino-summit-reminder.html">&lt;p&gt;Are you ready? &lt;a href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=final-reg-blog&quot;&gt;Trino Summit
2023&lt;/a&gt;
is just two days away, and our lineup of speakers, sponsors, and activities is
truly amazing. Make sure to register and join us live.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Over the two days of the event we will enjoy sessions with our speakers from
numerous well-known and respected companies, including Airbnb, Apple, Bloomberg,
LinkedIn, Pinterest, SK Telecom, and others. Look at the &lt;a href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=final-reg-blog&quot;&gt;full lineup for
details&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Just like &lt;a href=&quot;/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;last time at Trino Fest 2023&lt;/a&gt;, we will have some fun Trino quiz
questions for you all to puzzle over, and are ready to reward your fast and
correct answers.&lt;/p&gt;

&lt;p&gt;Cole Bowden and I will guide you through the two days of the event as hosts. The
chat on the event platform as well as the Trino slack channel for the event will
allow you to talk to other community members and the presenters, ask questions,
and follow up for more answers and discussions.&lt;/p&gt;

&lt;p&gt;We will announce the winning entries for our SQL routine competition and look a
bit at the implementations. And if you are keen to write one, there is still
time to share your best SQL routine. You might be among the winners.&lt;/p&gt;

&lt;p&gt;So you see - Trino Summit 2023 will be great. The event is virtual and free, so
there really is no excuse for missing out:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=final-reg-blog&quot;&gt;
        Register for Trino Summit 2023
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;Special thanks for their help with making this Trino Software Foundation event a
reality go out to our hosting sponsor &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;, and our
other sponsors &lt;a href=&quot;https://www.alluxio.io/&quot;&gt;Alluxio&lt;/a&gt;,
&lt;a href=&quot;https://www.coginiti.co&quot;&gt;Coginiti&lt;/a&gt; and &lt;a href=&quot;https://www.montecarlodata.com/&quot;&gt;Monte
Carlo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We all look forward to seeing you in just two days. So exciting!&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Are you ready? Trino Summit 2023 is just two days away, and our lineup of speakers, sponsors, and activities is truly amazing. Make sure to register and join us live.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2023/summit-logo.png" />
      
    </entry>
  
    <entry>
      <title>Functions with SQL and Trino</title>
      <link href="https://trino.io/blog/2023/11/29/sql-training-4.html" rel="alternate" type="text/html" title="Functions with SQL and Trino" />
      <published>2023-11-29T00:00:00+00:00</published>
      <updated>2023-11-29T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/11/29/sql-training-4</id>
      <content type="html" xml:base="https://trino.io/blog/2023/11/29/sql-training-4.html">&lt;p&gt;In the fourth part of our training series &lt;a href=&quot;/blog/2023/09/27/training-series.html&quot;&gt;Learning SQL with Trino from the
experts&lt;/a&gt; Martin Traverso, Dain
Sundstrom and I took on the big topic of aggregation functions, and covered the
two new and exciting features of table functions and SQL routines.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;The recording of the event allows you to watch it all as if you attended live,
jump to specific sections as desired, or pause while you follow along with the
demos:&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/1siAYR6BzzY&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;Following are a couple of specific timestamps for interesting topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=1siAYR6BzzY&amp;amp;t=582&quot;&gt;First simple aggregation example&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=1siAYR6BzzY&amp;amp;t=2384&quot;&gt;Table functions introduction&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=1siAYR6BzzY&amp;amp;t=3093&quot;&gt;Query pass through table function&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=1siAYR6BzzY&amp;amp;t=3442&quot;&gt;SQL routine use cases&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=1siAYR6BzzY&amp;amp;t=4355&quot;&gt;Human readable days example&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More timestamps for every part of the talk are in the description on
YouTube. Also make sure you take advantage of these additional resources:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/blog/sql-training-series-starburst-2023.pdf&quot;&gt;General overview slide deck for the
series&lt;/a&gt;,
with links to resources like our &lt;a href=&quot;/slack.html&quot;&gt;community
chat&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Slide deck for &lt;a href=&quot;https://trinodb.github.io/presentations/presentations/sql-functions/index.html&quot;&gt;Functions with SQL and
Trino&lt;/a&gt;,
including files with all SQL statements, configurations and more ready to go&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With this last episode of the series for 2023 we are ready to showcase Trino
with an &lt;a href=&quot;/blog/2023/11/22/trino-summit-2023-nears-lineup.html&quot;&gt;amazing lineup of speakers and sessions&lt;/a&gt; at the upcoming Trino Summit 2023.
Register now and catch all the presenters live for questions in the chat:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=blog-lineup-announcement&quot;&gt;
        Register for Trino Summit 2023
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;See you at Trino Summit 2023, upcoming &lt;a href=&quot;/broadcast/index.html&quot;&gt;Trino Community Broadcast
episodes&lt;/a&gt;, and maybe even more SQL
training in 2024.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>In the fourth part of our training series Learning SQL with Trino from the experts Martin Traverso, Dain Sundstrom and I took on the big topic of aggregation functions, and covered the two new and exciting features of table functions and SQL routines.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/trino-sql.png" />
      
    </entry>
  
    <entry>
      <title>Trino Summit 2023 nears with an awesome lineup</title>
      <link href="https://trino.io/blog/2023/11/22/trino-summit-2023-nears-lineup.html" rel="alternate" type="text/html" title="Trino Summit 2023 nears with an awesome lineup" />
      <published>2023-11-22T00:00:00+00:00</published>
      <updated>2023-11-22T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/11/22/trino-summit-2023-nears-lineup</id>
      <content type="html" xml:base="https://trino.io/blog/2023/11/22/trino-summit-2023-nears-lineup.html">&lt;p&gt;As winter nears, the days may be getting shorter, but so is the wait until
Trino Summit 2023! It’ll be here before you know it on December 13th and 14th.
We’ve got a packed speaker lineup full of exciting talks, and we’re ready to
share some details with the Trino community today. Read on for a preview of some
talks, and if you’re interested in attending, make sure to…&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=blog-lineup-announcement&quot;&gt;
        Register!
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;So, who’s going to be talking at Trino Summit? Here’s a quick rundown of the
talks coming in from various companies.&lt;/p&gt;

&lt;h2 id=&quot;starburst-the-mountains-trino-climbed-in-2023&quot;&gt;Starburst: The mountains Trino climbed in 2023&lt;/h2&gt;

&lt;p&gt;As always, our keynote will come from Martin Traverso, Trino co-founder and
co-CTO at Starburst. He’ll be giving a project update on everything exciting
that’s happened in Trino since
&lt;a href=&quot;/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;Trino Fest&lt;/a&gt;, as well as a
sneak peek at the roadmap for features coming to Trino in 2024. It’s one of the
best ways to keep up with the ongoing developments in the Trino community, and
you won’t want to miss it.&lt;/p&gt;

&lt;h2 id=&quot;starburst-bloomberg-and-naver-many-clusters-and-only-one-gateway&quot;&gt;Starburst, Bloomberg, and Naver: Many clusters and only one gateway&lt;/h2&gt;

&lt;p&gt;A second talk, which is a collaboration among Starburst, Bloomberg, and Naver,
will be exploring the new &lt;a href=&quot;https://github.com/trinodb/trino-gateway&quot;&gt;Trino Gateway&lt;/a&gt;,
a proxy and load-balancer that has been in the works for a long while in the
Trino community. There’s no more need to worry about noisy neighbors or huge
queries bullying out the quick and small workloads - with multiple clusters and
the Trino Gateway on top, users interact with Trino like normal, but under the
hood, queries get routed to available clusters to ensure that the time it takes
to get your insights is shorter than ever before.&lt;/p&gt;

&lt;h2 id=&quot;airbnb-trino-workload-management&quot;&gt;Airbnb: Trino workload management&lt;/h2&gt;

&lt;p&gt;Trino is the main interactive compute engine for offline ad-hoc analytics at
Airbnb. Recently, they’ve redesigned their query workload processing on Trino
clusters, introducing query cost forecasting and workload-aware scheduling
systems. This helps them deliver a more stable and consistent analytics query
service to offline data users at Airbnb, with improved performance and speed.
And they’ll be explaining how they did it!&lt;/p&gt;

&lt;h2 id=&quot;pinterest-journey-to-achieving-2x-efficiency-improvement-on-trino&quot;&gt;Pinterest: Journey to achieving 2x efficiency improvement on Trino&lt;/h2&gt;

&lt;p&gt;Trino usage has been growing at Pinterest each year, which comes with growing
costs and increased demand on the existing Trino clusters. To help reduce costs
and serve their Trino users, the engineering team there has migrated to AWS
Graviton, taken advantage of Trino improvements, consolidated traffic, improved
job scheduling, and worked to optimize their data and metadata formats. The end
result has been a reduction in cost &lt;em&gt;and&lt;/em&gt; an increase in query throughput.
They’ll be sharing the details on the effort it took to make Trino faster and
cheaper at the same time.&lt;/p&gt;

&lt;h2 id=&quot;quora-adopting-trinos-fault-tolerant-execution-mode&quot;&gt;Quora: Adopting Trino’s fault-tolerant execution mode&lt;/h2&gt;

&lt;p&gt;Quora will be covering how they adopted Trino’s fault-tolerant execution mode
to run some of their heaviest ETL jobs. They separate Trino queries
from their main data pipelines in two clusters, one running the FTE mode for
memory-intensive and longer jobs and another without it for lighter, general
pipelines. This separation helped achieve better query failure rates, improved
the execution time of long queries due to the more flexible autoscaling in
FTE, and provided an alternative to run queries that would otherwise run out of
memory without scaling up the cluster.&lt;/p&gt;

&lt;h2 id=&quot;linkedin-trino-upgrades-at-exabyte-scale&quot;&gt;LinkedIn: Trino upgrades at exabyte scale&lt;/h2&gt;

&lt;p&gt;LinkedIn has been keeping up with Trino releases at an impressive rate, but
getting to that point has required a lot of time, effort, and work on
streamlining the update process. They’ll be discussing the challenges of
breaking changes, applying internal patches, and ensuring that there are no
meaningful performance regressions. They’ve automated much of this, including
implementing a post-commit integration test suite that ensures nothing has
broken, and creating an automated test framework that can validate the
performance of each new Trino release before it deploys to users.&lt;/p&gt;

&lt;h2 id=&quot;ea-migrating-120-million-hms-metadata-records-without-customer-impact&quot;&gt;EA: Migrating 120 million HMS metadata records without customer impact&lt;/h2&gt;

&lt;p&gt;Migrating production databases is a scary task no matter who you are. It’s
scarier when you’re talking about 600+ databases, 35,000+ tables, and over 120
million partitions, all of which you need to migrate while avoiding any customer
impact. EA managed to pull it off with the help of Trino, and they’ll be at
Trino Summit to share how they made it work and what they learned along the way.&lt;/p&gt;

&lt;h2 id=&quot;sk-telecom-efficient-kappa-architecture-with-trino&quot;&gt;SK Telecom: Efficient Kappa architecture with Trino&lt;/h2&gt;

&lt;p&gt;SK Telecom is bringing us two talks this year, as they’ve got a lot going on and
some unique Trino stories to share!&lt;/p&gt;

&lt;p&gt;The first talk will dive into Kappa architecture and the challenges
involved in getting it to run in real-time at the massive scale SK Telecom
needs. They started with Trino’s Kafka connector, but the limitations of that
architecture steered them towards a solution with Flink and Trino’s Iceberg
connector, which they’ll explain. They’ll also be sharing some tips and tricks
for tuning Flink and Iceberg to get the most out of your Trino deployments.&lt;/p&gt;

&lt;h2 id=&quot;sk-telecom-unstructured-data-analysis-using-polymorphic-table-functions-in-trino&quot;&gt;SK Telecom: Unstructured data analysis using polymorphic table functions in Trino&lt;/h2&gt;

&lt;p&gt;The second talk will discuss the challenges of dealing with unstructured data.
Pre-processing is essential for analyzing unstructured data, and it’s difficult
for ordinary users and analysts to distribute that pre-processing over large
amounts of unstructured data. With the power of a custom-built polymorphic table function,
they were able to invoke Python code within Trino to help structure that data
for analysis, solving the problem in a powerful and fascinating way. We’ll get
to hear about polymorphic table functions, how they work in Trino, and how
anyone else may be able to leverage them to solve problems.&lt;/p&gt;

&lt;h2 id=&quot;raft-avoiding-pitfalls-with-query-federation-in-data-lakehouses&quot;&gt;Raft: Avoiding pitfalls with query federation in data lakehouses&lt;/h2&gt;

&lt;p&gt;Raft has partnered with the US Department of Defense to build a data fabric that
is built on top of Delta Lake, Trino, Apache Kafka, and Open Policy Agent (OPA).
This talk will discuss the challenges involved, provide solutions and
considerations for each, and end with a demo of Raft’s data fabric. The talk
will focus on a plugin for Trino, developed by Raft, that uses OPA as a policy
engine to provide fine-grained access control at query time based on a user’s
JWT passed along with the query.&lt;/p&gt;

&lt;h2 id=&quot;treasure-data-secure-exchange-sql&quot;&gt;Treasure Data: Secure exchange SQL&lt;/h2&gt;

&lt;p&gt;Secure Exchange SQL is a production data clean room service deployed at Treasure
Data, which leverages Trino and differential privacy technology to enable
cross-company data analysis while mitigating the risk of privacy breaches.
In their session, they’ll introduce the concept of differential privacy and
discuss the privacy protection methods that need to be implemented during SQL
processing. To minimize changes to Trino’s codebase, they employed approaches of
SQL rewriting and validation at the logical plan level. They’ll explain these
methods and provide some practical use cases of their data clean room.&lt;/p&gt;

&lt;h2 id=&quot;zomato-powering-data-marts-through-the-trino-iceberg-connector&quot;&gt;Zomato: Powering data marts through the Trino Iceberg connector&lt;/h2&gt;

&lt;p&gt;It’s a common theme in the Trino community - Zomato recently migrated from a
traditional data warehouse to a Trino-powered data lakehouse in conjunction with
Iceberg. They’ll be discussing how this has enabled their analytics to run
better than ever, including periodic updates to their data marts and tackling
the challenges involved in maintaining Iceberg tables.&lt;/p&gt;

&lt;h2 id=&quot;bazaar-powering-bazaars-business-operations-using-trino&quot;&gt;Bazaar: Powering Bazaar’s business operations using Trino&lt;/h2&gt;

&lt;p&gt;Bazaar’s talk will discuss how they leverage Trino’s capabilities to optimize
data analysis and support data-driven decision-making. The talk specifically
explores real-time data querying across multiple sources and
performance optimization, illustrating Trino’s role in Bazaar’s data-centric
strategies. This presentation provides in-depth insights for individuals
well-versed in Trino, shedding light on the platform’s transformative impact on
enhancing e-commerce operations.&lt;/p&gt;

&lt;h2 id=&quot;preset-visualizing-trino-with-superset&quot;&gt;Preset: Visualizing Trino with Superset&lt;/h2&gt;

&lt;p&gt;Preset will be diving into the “last mile” of the modern data stack and
show you how to query and visualize data pulled from Trino with Apache Superset
and/or Preset. Specifically, they’ll discuss things like Trino’s federated query
support (a common wish for Superset users) and how Superset can support
near-real-time analytics for Trino users. They’ll also give a demo of connecting
to Trino, building SQL queries, designing charts and dashboards, and other ways
to gain insight and stay on top of your data.&lt;/p&gt;

&lt;h2 id=&quot;vast-the-vast-database-catalog&quot;&gt;VAST: The VAST database catalog&lt;/h2&gt;

&lt;p&gt;The VAST Database connector for Trino was open-sourced this year! They’ll be
discussing the architecture of VAST and the connector, the purpose and major use
cases for it, and demonstrating the workflows surrounding the VAST Database in the
Trino ecosystem.&lt;/p&gt;

&lt;h2 id=&quot;and-still-more-to-come&quot;&gt;And still more to come!&lt;/h2&gt;

&lt;p&gt;Believe it or not, the great lineup we’ve gone over here still isn’t every talk.
Stay tuned here or on the &lt;a href=&quot;https://trino.io/slack&quot;&gt;Trino Slack&lt;/a&gt; to hear about the
other speakers as they’re announced. And of course, if you want to catch all
these talks live, engage in chat, and have an opportunity to ask questions, make
sure to &lt;a href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=blog-lineup-announcement&quot;&gt;register to attend&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>As winter nears, the days may be getting shorter, but so is the wait until Trino Summit 2023! It’ll be here before you know it on December 13th and 14th. We’ve got a packed speaker lineup full of exciting talks, and we’re ready to share some details with the Trino community today. Read on for a preview of some talks, and if you’re interested in attending, make sure to… Register!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2023/lineup-blog-banner.png" />
      
    </entry>
  
    <entry>
      <title>53: Understanding your data with Coginiti and Trino</title>
      <link href="https://trino.io/episodes/53.html" rel="alternate" type="text/html" title="53: Understanding your data with Coginiti and Trino" />
      <published>2023-11-16T00:00:00+00:00</published>
      <updated>2023-11-16T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/53</id>
      <content type="html" xml:base="https://trino.io/episodes/53.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/manfredmoser&quot;&gt;Manfred Moser&lt;/a&gt;, Director of
Technical Content at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/msmullins/&quot;&gt;Matthew Mullins&lt;/a&gt;, CTO at
&lt;a href=&quot;https://www.coginiti.co&quot;&gt;Coginiti&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/mullinsms&quot;&gt;@mullinsms&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/rnestertsov/&quot;&gt;Roman Nestertsov&lt;/a&gt;, Principal
Engineer at &lt;a href=&quot;https://www.coginiti.co&quot;&gt;Coginiti&lt;/a&gt;,
(&lt;a href=&quot;https://twitter.com/nestertsov&quot;&gt;@nestertsov&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-431-433&quot;&gt;Releases 431-433&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-431.html&quot;&gt;Trino 431&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for &lt;a href=&quot;https://trino.io/docs/current/routines.html&quot;&gt;SQL routines&lt;/a&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE/DROP FUNCTION&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REPLACE&lt;/code&gt; modifier in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE TABLE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Improved latency for prepared statements in JDBC driver&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-432.html&quot;&gt;Trino 432&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Faster filtering on columns containing long strings in Parquet data.&lt;/li&gt;
  &lt;li&gt;Predicate pushdown for real and double columns in MongoDB.&lt;/li&gt;
  &lt;li&gt;Support for Iceberg REST catalog in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;register_table&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unregister_table&lt;/code&gt; procedures.&lt;/li&gt;
  &lt;li&gt;Support for BEARER authentication for Nessie catalog.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-433.html&quot;&gt;Trino 433&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved support for Hive schema evolution.&lt;/li&gt;
  &lt;li&gt;Add support for altering table comments in the Glue catalog.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that Trino 433 also includes documentation for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE/DROP CATALOG&lt;/code&gt;.
Check out the third SQL training session for a demo.&lt;/p&gt;

&lt;h2 id=&quot;sql-routine-competition&quot;&gt;SQL routine competition&lt;/h2&gt;

&lt;p&gt;Trino 431 finally delivered the long-awaited support for SQL routines. To
celebrate and see what you all come up with, we are running a competition.
&lt;a href=&quot;/blog/2023/11/09/routines.html&quot;&gt;Share your best SQL routine&lt;/a&gt;, and win a
reward sponsored by &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;call-for-java-21-testing&quot;&gt;Call for Java 21 testing&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/logos/java-duke-21.png&quot; width=&quot;100px&quot; align=&quot;right&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Java 21, the latest LTS release of Java, arrived in September 2023, and we want
to take advantage of the performance improvements, language features, and new
libraries. But to do so, &lt;a href=&quot;/blog/2023/11/03/java-21.html&quot;&gt;we need your input and confirmation that everything
works as expected&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-jdbc-driver&quot;&gt;Concept of the episode: JDBC driver&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/logos/jdbc-small.png&quot; width=&quot;100px&quot; align=&quot;right&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Java_Database_Connectivity&quot;&gt;Java Database Connectivity
(JDBC)&lt;/a&gt; is an
important standard for any JVM-based application that wants to access a
relational database. Trino ships a JDBC driver that abstracts all the low-level
details of our conversational REST API for client tools and supports various
authentication mechanisms, TLS, and other features. This allows tools like
Coginiti to ignore those details, and work with the community on any
improvements for the benefit of all users.&lt;/p&gt;
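&lt;p&gt;As a concrete sketch, this is roughly what any JVM-based tool does with the
driver under the hood. The host, catalog, and user below are made-up
placeholders, not values from the episode:&lt;/p&gt;

```java
import java.util.Properties;

// A minimal sketch of using the Trino JDBC driver. Host, port, catalog,
// and user are hypothetical placeholders for your own deployment.
public class TrinoJdbcExample {

    // The driver accepts URLs of the form jdbc:trino://host:port/catalog/schema
    static String buildUrl(String host, int port, String catalog, String schema) {
        return String.format("jdbc:trino://%s:%d/%s/%s", host, port, catalog, schema);
    }

    public static void main(String[] args) {
        String url = buildUrl("trino.example.com", 443, "hive", "default");

        // Authentication, TLS, and other driver features are configured
        // through connection properties.
        Properties properties = new Properties();
        properties.setProperty("user", "analyst");
        properties.setProperty("SSL", "true");

        // With the trino-jdbc jar on the classpath, connecting is a single
        // call, and the driver handles the REST protocol details:
        // Connection connection = DriverManager.getConnection(url, properties);

        System.out.println(url);
    }
}
```

&lt;p&gt;Tools like Coginiti build on exactly this interface instead of talking to
the REST API directly.&lt;/p&gt;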

&lt;h2 id=&quot;client-tool-focus-on-coginiti&quot;&gt;Client tool focus on Coginiti&lt;/h2&gt;

&lt;p&gt;Matthew and Roman are joining us from &lt;a href=&quot;https://www.coginiti.co&quot;&gt;Coginiti&lt;/a&gt;.
Coginiti delivers higher-quality analytics faster. Coginiti provides an
AI-enabled enterprise data workspace that integrates modular development,
version control, and data quality testing throughout the analytic development
lifecycle.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.coginiti.co&quot;&gt;
  &lt;img src=&quot;/assets/images/logos/coginiti-small.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With support for Trino, Coginiti as a client tool provides access to all the
configured catalogs in Trino. It enables data engineers and analysts to work
together in a shared platform, reducing duplication in their work, and bringing
“Don’t repeat yourself (DRY)” to analysts.&lt;/p&gt;

&lt;p&gt;We talk about why Coginiti added &lt;a href=&quot;https://www.coginiti.co/databases/trino/&quot;&gt;support for
Trino&lt;/a&gt;. Coginiti is not a compute
platform itself, but access to many platforms enables a “data blender thinking”.
So as a user you start caring less about the location and source of the
database, and more about the data itself and how you can mix it together to gain
better insights. Every enterprise has more than one data platform, with
different data warehouses, RDBMSes, and data lakes. Matthew talks about reasons
for this situation, and how Trino as a partner platform enables users to
federate across all of these platforms when needed.&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-coginiti-and-trino&quot;&gt;Demo of the episode: Coginiti and Trino&lt;/h2&gt;

&lt;p&gt;In the demo of Coginiti, Roman and Matthew show some of the features of the tool
that enable code reuse and managing transformations on Trino. A tour through
the major aspects of the application gives a good impression of the benefits and
supported use cases.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Our lineup of speakers and sessions for Trino Summit is nearly finalized. Join
us on the 13th and 14th of December for the free, virtual event. Stay tuned for
details about all the sessions soon, and in the meantime - &lt;a href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=tcb&quot;&gt;don’t forget to
register&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Our &lt;a href=&quot;/blog/2023/09/27/training-series.html&quot;&gt;Trino SQL training series&lt;/a&gt; just
had a successful third session yesterday, and you can check out all the material
in our follow up blog posts:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/10/18/sql-training-1.html&quot;&gt;Getting started with Trino and SQL&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/11/01/sql-training-2.html&quot;&gt;Advanced analytics with SQL and Trino&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There is still a chance for you &lt;a href=&quot;https://www.starburst.io/info/trino-training-series/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=Global-FY24-Trino-Training-Series&amp;amp;utm_content=1&quot;&gt;to register and attend the fourth session
live&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Data management with SQL and Trino</title>
      <link href="https://trino.io/blog/2023/11/15/sql-training-3.html" rel="alternate" type="text/html" title="Data management with SQL and Trino" />
      <published>2023-11-15T00:00:00+00:00</published>
      <updated>2023-11-15T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/11/15/sql-training-3</id>
      <content type="html" xml:base="https://trino.io/blog/2023/11/15/sql-training-3.html">&lt;p&gt;In the third part of our training series &lt;a href=&quot;/blog/2023/09/27/training-series.html&quot;&gt;Learning SQL with Trino from the
experts&lt;/a&gt; David Phillips and I changed
gears from reading data and performing analytics with Trino. We looked at the
topic of write operations. We covered creating catalogs, schemas, tables, and
then inserting and updating data, and talked about related topics such as data
source and connector support.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;The recording of the event allows you to watch it all as if you attended live,
jump to specific sections as desired, or pause while you follow along with the
demos:&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/q2uyV7mBKVc&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;The full timestamps for every part of the talk are in the description on
YouTube.&lt;/p&gt;

&lt;p&gt;Also make sure you take advantage of these additional resources:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/blog/sql-training-series-starburst-2023.pdf&quot;&gt;General overview slide deck for the
series&lt;/a&gt;,
with links to resources like our &lt;a href=&quot;/slack.html&quot;&gt;community
chat&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Slide deck for &lt;a href=&quot;https://trinodb.github.io/presentations/presentations/sql-data-mgt/index.html&quot;&gt;Data management with SQL and
Trino&lt;/a&gt;,
including a file with all SQL statements ready to go&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One more episode to go this year, and then we are going to celebrate our users
at Trino Summit 2023. Register now and catch us live for both events:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trino-training-series/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=Global-FY24-Trino-Training-Series&amp;amp;utm_content=1&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;See you next time. I am excited to show you more about &lt;a href=&quot;/blog/2023/11/09/routines.html&quot;&gt;SQL routines&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>In the third part of our training series Learning SQL with Trino from the experts David Phillips and I changed gears from reading data and performing analytics with Trino. We looked at the topic of write operations. We covered creating catalogs, schemas, tables, and then inserting and updating data, and talked about related topics such as data source and connector support.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/trino-sql.png" />
      
    </entry>
  
    <entry>
      <title>Share your best Trino SQL routine</title>
      <link href="https://trino.io/blog/2023/11/09/routines.html" rel="alternate" type="text/html" title="Share your best Trino SQL routine" />
      <published>2023-11-09T00:00:00+00:00</published>
      <updated>2023-11-09T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/11/09/routines</id>
      <content type="html" xml:base="https://trino.io/blog/2023/11/09/routines.html">&lt;p&gt;We want to see the best &lt;a href=&quot;/docs/current/routines.html&quot;&gt;SQL routines&lt;/a&gt;
you can write, feature them as &lt;a href=&quot;/docs/current/routines/examples.html&quot;&gt;examples in the
documentation&lt;/a&gt;, and send you
some goodies as a reward!&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;With the recent &lt;a href=&quot;/docs/current/release/release-431.html&quot;&gt;Trino 431
release&lt;/a&gt; we shipped a
feature that has been awaited by many Trino users for a long, long time. &lt;a href=&quot;/docs/current/routines.html&quot;&gt;SQL
routines&lt;/a&gt; are an easy way to define your
own custom, procedural functions. All users on your Trino instance can then use
these functions to simplify their queries.&lt;/p&gt;

&lt;p&gt;The new process of writing a routine in SQL in your client tool is an
alternative to the old way of having to create a custom plugin in Java, compile
it, and deploy the binary in your cluster. The time it takes to get a new
function into use has gone from hours to minutes and a few commands!&lt;/p&gt;

&lt;p&gt;Our documentation includes details for all the supported statements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BEGIN&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CASE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECLARE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FUNCTION&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IF&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ITERATE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LEAVE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LOOP&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REPEAT&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RETURN&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SET&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHILE&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With the memory connector and the Hive connector supporting routine storage, you
can use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE FUNCTION&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP FUNCTION&lt;/code&gt;, so that everyone using the
cluster has access to your routines.&lt;/p&gt;
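&lt;p&gt;As a minimal sketch in the style of the documentation examples, the following
creates and invokes a trivial routine. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;example.default&lt;/code&gt; catalog and
schema prefix is a placeholder for a catalog in your own cluster that supports
routine storage:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE FUNCTION example.default.meaning_of_life()
  RETURNS bigint
  BEGIN
    RETURN 42;
  END;

SELECT example.default.meaning_of_life();
&lt;/code&gt;&lt;/pre&gt;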

&lt;p&gt;The unit tests and our &lt;a href=&quot;/docs/current/routines/examples.html&quot;&gt;examples
documentation&lt;/a&gt; contain a
number of routines that only scratch the surface of what is possible. Now we are
looking for your help to improve the documentation and maybe even find some
bugs. Here is what we are asking from you:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Upgrade your Trino cluster, CLI, and other clients to 431 or newer. Support in
client tools may vary.&lt;/li&gt;
  &lt;li&gt;Learn from the documentation and write your own routines.&lt;/li&gt;
  &lt;li&gt;Send us your best SQL routine.
    &lt;ul&gt;
      &lt;li&gt;Create a pull request to add to the &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/docs/src/main/sphinx/routines/examples.md&quot;&gt;examples in the
documentation&lt;/a&gt;
with a new section, and request a review from &lt;a href=&quot;https://github.com/mosabua&quot;&gt;Manfred
(mosabua)&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;Alternatively, &lt;a href=&quot;mailto:manfred@starburst.io&quot;&gt;email the details&lt;/a&gt; and submit a
&lt;a href=&quot;https://github.com/trinodb/cla&quot;&gt;CLA&lt;/a&gt; separately.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Explain the use case, what the routine does, and maybe also how it works.&lt;/li&gt;
  &lt;li&gt;Include the full statement for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE FUNCTION&lt;/code&gt; definition and an example
invocation.&lt;/li&gt;
  &lt;li&gt;Add any necessary tables or data so we can test the function.&lt;/li&gt;
  &lt;li&gt;Reach out to us on the &lt;a href=&quot;/slack.html&quot;&gt;Trino community Slack&lt;/a&gt;,
if you need any help.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We plan to present submissions at &lt;a href=&quot;/blog/2023/09/14/trino-summit-2023-announcement.html&quot;&gt;Trino Summit 2023&lt;/a&gt;, write a blog post, add them to
the documentation, and &lt;a href=&quot;https://www.starburst.io/&quot;&gt;Starburst&lt;/a&gt; will send a cool
reward for the ten best entries.&lt;/p&gt;

&lt;p&gt;Also, if you have more great Trino usage to talk about and share, we would love
to see your &lt;a href=&quot;https://sessionize.com/trino-summit-2023/&quot;&gt;speaker proposal for Trino
Summit&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We look forward to seeing many great submissions from you all.&lt;/p&gt;

&lt;p&gt;See you at Trino Summit 2023, and don’t forget to
&lt;a href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=blog-1&quot;&gt;register&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Martin, Dain, David, and Manfred&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso, Dain Sundstrom, David Phillips, Manfred Moser</name>
        </author>
      

      <summary>We want to see the best SQL routines you can write, feature them as examples in the documentation, and send you some goodies as a reward!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/trino-sql-routine.png" />
      
    </entry>
  
    <entry>
      <title>Trino is moving to Java 21</title>
      <link href="https://trino.io/blog/2023/11/03/java-21.html" rel="alternate" type="text/html" title="Trino is moving to Java 21" />
      <published>2023-11-03T00:00:00+00:00</published>
      <updated>2023-11-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/11/03/java-21</id>
      <content type="html" xml:base="https://trino.io/blog/2023/11/03/java-21.html">&lt;p&gt;We’re excited to announce that as of version 432, Trino can run with Java 21. In
fact, the Trino Docker image uses Java 21 now. We have done upgrades to newer
Java LTS versions successfully before when we upgraded to Java 11 and then &lt;a href=&quot;/blog/2022/07/14/trino-updates-to-java-17.html&quot;&gt;Java
17 with Trino 390&lt;/a&gt;. Each
time the improvements to the JVM runtime, the garbage collectors, the involved
libraries, and the dependencies resulted in performance gains that came nearly
for free.&lt;/p&gt;

&lt;p&gt;And each time we were able to take advantage of new language constructs and
standard libraries to improve the codebase for all contributors and maintainers
of the project.&lt;/p&gt;

&lt;p&gt;Now it is time to do it again.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;In September, &lt;a href=&quot;https://blogs.oracle.com/java/post/the-arrival-of-java-21&quot;&gt;Java 21 was
released&lt;/a&gt; as the
newest long-term support version. The &lt;a href=&quot;https://www.oracle.com/java/technologies/javase/21all-relnotes.html&quot;&gt;consolidated release
notes&lt;/a&gt; are
truly impressive when it comes to breadth and depth of improvements throughout
the runtime, the standard libraries, the included tools, and the overall system.&lt;/p&gt;

&lt;p&gt;Java 21 provides numerous great opportunities to improve Trino. Even without
many code changes, the performance benefits can have a significant impact on the
cost of running a Trino cluster.&lt;/p&gt;

&lt;p&gt;Taking it one step further, into the codebase and the libraries we use, we
are able to move our performance work to the next level. &lt;a href=&quot;https://github.com/trinodb/trino/issues/14237&quot;&gt;Project
Hummingbird&lt;/a&gt;, our performance
fine-tuning initiative, is already buzzing. &lt;a href=&quot;https://github.com/dain&quot;&gt;Dain Sundstrom&lt;/a&gt; has again shipped some great improvements recently. Just
like with our Java 17 upgrade, &lt;a href=&quot;https://github.com/wendigo&quot;&gt;Mateusz Gajewski&lt;/a&gt;
has been critical in pulling all the necessary changes together.&lt;/p&gt;

&lt;p&gt;With the &lt;a href=&quot;https://trino.io/docs/current/release/release-432.html&quot;&gt;Trino 432
release&lt;/a&gt; we have now
made the next big step. The Trino Docker image was changed to use the &lt;a href=&quot;https://adoptium.net/temurin/releases/&quot;&gt;Eclipse
Temurin&lt;/a&gt; distribution of Java 21. We
have been running our test suites with Java 21 for quite some time and all looks
good. With this release, you can now easily test Trino with Java 21. Just use
the Docker container in your deployment, in your own testing pipeline, or with
the &lt;a href=&quot;https://github.com/trinodb/charts&quot;&gt;Trino Helm charts&lt;/a&gt;. The
new version 0.14.0 of the chart already uses the right JVM configuration and
Trino 432 by default.&lt;/p&gt;

&lt;p&gt;Our plan is to make Java 21 the required runtime and move towards adopting the
new language features and libraries. However, before we do that, we want your
input. Are you ready to move to Java 21 for Trino? Did you do some testing with
it already? Are there any issues you encountered? We want to know all about your
experience. Find us on the Trino community chat and ping us in the &lt;a href=&quot;https://trinodb.slack.com/archives/CP1MUNEUX&quot;&gt;#dev
channel&lt;/a&gt;. Or leave comments in our
&lt;a href=&quot;https://github.com/trinodb/trino/issues/17017&quot;&gt;Java 21 tracking issue&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We want to hear from you. Any input and feedback is welcome.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Update from 11 Jan 2024:&lt;/strong&gt;
The release of &lt;a href=&quot;https://trino.io/docs/current/release/release-436.html&quot;&gt;Trino 436&lt;/a&gt;
includes the switch to Java 21 as a requirement for running Trino.&lt;/p&gt;
&lt;/blockquote&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>We’re excited to announce that as of version 432, Trino can run with Java 21. In fact, the Trino Docker image uses Java 21 now. We have done upgrades to newer Java LTS versions successfully before when we upgraded to Java 11 and then Java 17 with Trino 390. Each time the improvements to the JVM runtime, the garbage collectors, the involved libraries, and the dependencies resulted in performance gains that came nearly for free. And each time we were able to take advantage of new language constructs and standard libraries to improve the codebase for all contributors and maintainers of the project. Now it is time to do it again.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/java-duke-21.png" />
      
    </entry>
  
    <entry>
      <title>Advanced analytics with SQL and Trino</title>
      <link href="https://trino.io/blog/2023/11/01/sql-training-2.html" rel="alternate" type="text/html" title="Advanced analytics with SQL and Trino" />
      <published>2023-11-01T00:00:00+00:00</published>
      <updated>2023-11-01T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/11/01/sql-training-2</id>
      <content type="html" xml:base="https://trino.io/blog/2023/11/01/sql-training-2.html">&lt;p&gt;In the second part of our training series &lt;a href=&quot;/blog/2023/09/27/training-series.html&quot;&gt;Learning SQL with Trino from the
experts&lt;/a&gt; Martin Traverso and I built
on top of the foundational knowledge from the &lt;a href=&quot;/blog/2023/10/18/sql-training-1.html&quot;&gt;first training session&lt;/a&gt;. We continued to learn more about data
types and working with them, including the important string, numeric, temporal,
and JSON types.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;The recording of the event allows you to watch it all as if you attended live,
jump to specific sections as desired, or pause while you follow along with the
demos:&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/S-mfueDmXds&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;Following are a couple of specific timestamps for interesting
topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=S-mfueDmXds&amp;amp;t=601s&quot;&gt;Temporal data types&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=S-mfueDmXds&amp;amp;t=1920s&quot;&gt;Strings&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=S-mfueDmXds&amp;amp;t=2442s&quot;&gt;Numeric types&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=S-mfueDmXds&amp;amp;t=2705s&quot;&gt;URL parsing and more&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=S-mfueDmXds&amp;amp;t=2850s&quot;&gt;JSON&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The full timestamps for every part of the talk are in the description on
YouTube.&lt;/p&gt;

&lt;p&gt;Also make sure you take advantage of these additional resources:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/blog/sql-training-series-starburst-2023.pdf&quot;&gt;General overview slide deck for the
series&lt;/a&gt;,
with links to resources like our &lt;a href=&quot;/slack.html&quot;&gt;community
chat&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Slide deck for &lt;a href=&quot;https://trinodb.github.io/presentations/presentations/sql-adv-analytics/index.html&quot;&gt;Advanced analytics with SQL and
Trino&lt;/a&gt;,
including a file with all SQL statements ready to go&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We are halfway through the series, and there is lots more to cover. Don’t forget
to register for the next session, join us to ask specific questions, and learn
much more about SQL and Trino:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trino-training-series/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=Global-FY24-Trino-Training-Series&amp;amp;utm_content=1&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;See you next time,&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>In the second part of our training series Learning SQL with Trino from the experts Martin Traverso and I built on top of the foundational knowledge from the first training session. We continued to learn more about data types and working with them, including the important string, numeric, temporal, and JSON types.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/trino-sql.png" />
      
    </entry>
  
    <entry>
      <title>52: Commander Bun Bun takes a bite out of Yugabyte</title>
      <link href="https://trino.io/episodes/52.html" rel="alternate" type="text/html" title="52: Commander Bun Bun takes a bite out of Yugabyte" />
      <published>2023-10-26T00:00:00+00:00</published>
      <updated>2023-10-26T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/52</id>
      <content type="html" xml:base="https://trino.io/episodes/52.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Technical Content at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/dmagda/&quot;&gt;Denis Magda&lt;/a&gt;, Director of Developer
Relations at &lt;a href=&quot;https://www.yugabyte.com/&quot;&gt;Yugabyte&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-428-430&quot;&gt;Releases 428-430&lt;/h2&gt;

&lt;p&gt;Unofficial highlights from Cole:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-428.html&quot;&gt;Trino 428&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Reduced memory usage for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Simplified configuration for managing writer counts&lt;/li&gt;
  &lt;li&gt;Faster reads for small Parquet files on data lakes&lt;/li&gt;
  &lt;li&gt;Support for &lt;a href=&quot;https://docs.pinot.apache.org/users/user-guide-query/query-options&quot;&gt;query options&lt;/a&gt;
on dynamic tables in Pinot&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-429.html&quot;&gt;Trino 429&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Faster reading of ORC files in Hive&lt;/li&gt;
  &lt;li&gt;More types supported for schema evolution in Hive&lt;/li&gt;
  &lt;li&gt;Security improvements, including logging out of a session with the Web UI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-430.html&quot;&gt;Trino 430&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Support for setting a timezone on the session level&lt;/li&gt;
  &lt;li&gt;Table statistics in MariaDB&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-episode-jdbc-based-connectors&quot;&gt;Concept of the episode: JDBC-based connectors&lt;/h2&gt;

&lt;p&gt;In Trino, we have a lot of connectors that are based on top of JDBC. JDBC could
stand for “just da best connectors,” but it’s really Java database connectivity,
and it’s one of the core APIs by which many of the most prominent connectors in
the Trino ecosystem function. It’s so common, in fact, that we have
&lt;a href=&quot;/docs/current/develop/example-jdbc.html&quot;&gt;an example JDBC connector in Trino&lt;/a&gt; to
make it easier to implement your own JDBC-based connector if you need one.&lt;/p&gt;
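&lt;p&gt;All JDBC-based connectors share the same basic catalog configuration shape.
As an illustrative sketch, a PostgreSQL catalog properties file, with
placeholder host, database, and credentials, might look like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;connector.name=postgresql
connection-url=jdbc:postgresql://example.net:5432/database
connection-user=admin
connection-password=secret
&lt;/code&gt;&lt;/pre&gt;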

&lt;h2 id=&quot;concept-of-the-episode-yugabytedb&quot;&gt;Concept of the episode: YugabyteDB&lt;/h2&gt;

&lt;p&gt;But if the topic of today’s episode is YugabyteDB, why are we talking about
PostgreSQL? Well, if you’re unfamiliar with Yugabyte, lifting from
&lt;a href=&quot;https://docs.yugabyte.com/&quot;&gt;their docs&lt;/a&gt;: “YugabyteDB is distributed PostgreSQL
that delivers on-demand scale, built-in resilience, and a multi-API interface.”
Distributed architecture should be a familiar concept to a community involved
with a distributed query engine, and if you understand how Trino is able to
leverage it, you should also understand why it makes sense to pair with
Yugabyte. We’ll be discussing why Yugabyte got started, what it does differently
from other databases, what it does better than other databases, and how you
might want to use it with Trino.&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-trino-on-yugabytedb&quot;&gt;Demo of the episode: Trino on YugabyteDB&lt;/h2&gt;

&lt;p&gt;As part of the episode, we’ll also show how you can use YugabyteDB with
Trino. We start with the PostgreSQL connector, and Denis then shows how to use
it to run Trino queries against Yugabyte. It’s always hard to explain demos in
show notes, so tune into the YouTube video and take a look for yourself if
you’re curious!&lt;/p&gt;
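&lt;p&gt;Because YugabyteDB’s YSQL API speaks the PostgreSQL wire protocol, the setup
in the demo boils down to pointing the PostgreSQL connector at Yugabyte. A
rough sketch of a catalog properties file, assuming YSQL on its default port
5433 and placeholder host and credentials:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;connector.name=postgresql
connection-url=jdbc:postgresql://yugabyte.example.net:5433/yugabyte
connection-user=yugabyte
connection-password=secret
&lt;/code&gt;&lt;/pre&gt;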

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Trino Summit, the biggest Trino event of the year, is coming up on the 13th and
14th of December, and like Trino Fest, it’ll be fully virtual. If you’d like to
give a talk about anything related to Trino, we’re looking for speakers now.
&lt;a href=&quot;https://sessionize.com/trino-summit-2023/&quot;&gt;Submit your talk here!&lt;/a&gt; If you’d
rather attend, you can also
&lt;a href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=tcb&quot;&gt;go register to attend now&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Prior to Trino Summit, if you’d like to learn about SQL from the absolute
experts, we’ve also gotten started with the
&lt;a href=&quot;/blog/2023/09/27/training-series&quot;&gt;Trino Training Series&lt;/a&gt;
that we’ll be running as a buildup to the summit. The
&lt;a href=&quot;/blog/2023/10/18/sql-training-1&quot;&gt;recap for the first session&lt;/a&gt;
is live, but there are three more to come! Register now and look forward
to those great sessions starting from the ground up and ending with some key
tricks and Trino specifics that even a seasoned SQL veteran may not know about.&lt;/p&gt;

&lt;p&gt;We also have a talk about Trino on Ice and data meshes coming up in Redwood City
with Slalom and Starburst. If you’re local, consider
&lt;a href=&quot;https://go.slalom.com/starburstnorcal&quot;&gt;signing up and checking it out!&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Getting started with Trino and SQL</title>
      <link href="https://trino.io/blog/2023/10/18/sql-training-1.html" rel="alternate" type="text/html" title="Getting started with Trino and SQL" />
      <published>2023-10-18T00:00:00+00:00</published>
      <updated>2023-10-18T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/10/18/sql-training-1</id>
      <content type="html" xml:base="https://trino.io/blog/2023/10/18/sql-training-1.html">&lt;p&gt;In our training series &lt;a href=&quot;/blog/2023/09/27/training-series.html&quot;&gt;Learning SQL with Trino from the experts&lt;/a&gt; Martin Traverso, Dain Sundstrom, David Phillips,
and I will run through the wide range of SQL support and features of Trino with
our audience. In the first episode, we covered the concepts of Trino and SQL, and
then started to learn some basic SQL. Now you can take advantage of the
recording and available resources to learn at your own pace.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;The recording of the event allows you to watch it all as if you attended live,
jump to specific sections as desired, or pause while you follow along with the
demos:&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/SnvSBYhRZLg&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;Following are a couple of specific timestamps for interesting
topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=SnvSBYhRZLg&amp;amp;t=380&quot;&gt;What is Trino?&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=SnvSBYhRZLg&amp;amp;t=1163&quot;&gt;Catalogs and connectors&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=SnvSBYhRZLg&amp;amp;t=1658&quot;&gt;Clients&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=SnvSBYhRZLg&amp;amp;t=3224&quot;&gt;SQL WHERE statement&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The full timestamps for every part of the talk are in the description on
YouTube.&lt;/p&gt;

&lt;p&gt;Also make sure you take advantage of these additional resources:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/assets/blog/sql-training-series-starburst-2023.pdf&quot;&gt;General overview slide deck for the series&lt;/a&gt;, with links to resources like our &lt;a href=&quot;/slack.html&quot;&gt;community chat&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Slide deck for &lt;a href=&quot;https://trinodb.github.io/presentations/presentations/sql-trino/index.html&quot;&gt;SQL and Trino concepts&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Slide deck for &lt;a href=&quot;https://trinodb.github.io/presentations/presentations/sql-basics/index.html&quot;&gt;SQL basics with Trino&lt;/a&gt;, including a file with all SQL statements ready to go&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now that you know of the series and saw the first part of it, make sure you
register for the next ones, so you can ask specific questions and learn much
more about SQL and Trino:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trino-training-series/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=Global-FY24-Trino-Training-Series&amp;amp;utm_content=1&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;See you then,&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>In our training series Learning SQL with Trino from the experts Martin Traverso, Dain Sundstrom, David Phillips, and I will run through the wide range of SQL support and features of Trino with our audience. In the first episode, we covered the concepts of Trino and SQL, and then started to learn some basic SQL. Now you can take advantage of the recording and available resources to learn at your own pace.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/trino-sql.png" />
      
    </entry>
  
    <entry>
      <title>A report from the Trino Conference Tokyo 2023</title>
      <link href="https://trino.io/blog/2023/10/11/a-report-about-trino-conference-tokyo-2023.html" rel="alternate" type="text/html" title="A report from the Trino Conference Tokyo 2023" />
      <published>2023-10-11T00:00:00+00:00</published>
      <updated>2023-10-11T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/10/11/a-report-about-trino-conference-tokyo-2023</id>
      <content type="html" xml:base="https://trino.io/blog/2023/10/11/a-report-about-trino-conference-tokyo-2023.html">&lt;p&gt;The Trino community in Japan held an online event on October 5th, 2023. This
article summarizes the conference, sharing the presentations and providing an
overview.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Watch a replay of the whole event, or jump to specific time stamps and topic of
interest:&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/CTwk2rkatx8&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;This year, there were four sessions:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Trino, Starburst Galaxy, and Enterprise&lt;/li&gt;
  &lt;li&gt;Log infrastructure using Trino and Iceberg&lt;/li&gt;
  &lt;li&gt;Data infrastructure using Spark and Trino on bare metal k8s&lt;/li&gt;
  &lt;li&gt;Getting started with Trino and a transactional data lake with serverless Athena&lt;/li&gt;
&lt;/ol&gt;

&lt;h1 id=&quot;trino-starburst-galaxy-and-enterprise&quot;&gt;Trino, Starburst Galaxy, and Enterprise&lt;/h1&gt;

&lt;p&gt;The first session was presented by Yuya Ebihara (me) from Starburst. I explained
the Trino changes from 2022 and 2023, as well as features of Starburst Galaxy
and Starburst Enterprise. The session introduced &lt;a href=&quot;https://prtimes.jp/main/html/rd/p/000000226.000025237.html&quot;&gt;a press release of the
partnership of Starburst and Dell Technologies in
Japan&lt;/a&gt;.&lt;/p&gt;

&lt;iframe src=&quot;https://docs.google.com/presentation/d/e/2PACX-1vRubtZB9peROzcGgaTQQYkLs-9jZEbWuRszNInKviuj1RdPwp5CrElssLwLYSUuVeGUfj58wv428UFw/embed&quot; frameborder=&quot;0&quot; width=&quot;595&quot; height=&quot;485&quot; allowfullscreen=&quot;true&quot; mozallowfullscreen=&quot;true&quot; webkitallowfullscreen=&quot;true&quot;&gt;&lt;/iframe&gt;

&lt;h1 id=&quot;log-infrastructure-using-trino-and-iceberg&quot;&gt;Log infrastructure using Trino and Iceberg&lt;/h1&gt;

&lt;p&gt;The second session was presented by Tadahisa Kamijo from Sakura Internet. He
explained some requirements for new analytics environments, such as concurrent
read/write, schema evolution, record-level modification, restoring past
snapshots, and addressing performance issues with the Hive metastore. They
decided to use Trino and Iceberg to handle these requirements. Kamijo-san also
introduced the file layout in Iceberg and demonstrated how to debug Iceberg
files using their Java client.&lt;/p&gt;

&lt;iframe class=&quot;speakerdeck-iframe&quot; frameborder=&quot;0&quot; src=&quot;https://speakerdeck.com/player/4c9229c81e36494ca0c722b20bfdf20e&quot; title=&quot;TrinoとIcebergで ログ基盤の構築 / 2023-10-05 Trino Presto Meetup&quot; allowfullscreen=&quot;true&quot; style=&quot;border: 0px; background: padding-box padding-box rgba(0, 0, 0, 0.1); margin: 0px; padding: 0px; border-radius: 6px; box-shadow: rgba(0, 0, 0, 0.2) 0px 5px 40px; width: 100%; height: auto; aspect-ratio: 560 / 315;&quot; data-ratio=&quot;1.7777777777777777&quot;&gt;&lt;/iframe&gt;

&lt;h1 id=&quot;data-infrastructure-using-spark-an-trino-on-bare-metal-k8s&quot;&gt;Data infrastructure using Spark and Trino on bare metal k8s&lt;/h1&gt;

&lt;p&gt;The third session was presented by Yasukazu Nagatomi from MicroAd. They started
a migration from Impala to Trino to resolve the following issues: separating
compute and storage, refreshing and utilizing table and column statistics even
with large tables, and supporting schema evolution. Nagatomi-san shared a use
case of the Trino features fault-tolerant execution and spill-to-disk, the
first public use case of these features in Japan.&lt;/p&gt;

&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/NTzgv4IUvAPIvp&quot; width=&quot;595&quot; height=&quot;485&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; &lt;/iframe&gt;
&lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/microad_engineer/trino-conference-tokyo-2023&quot; title=&quot;ベアメタルで実現するSpark＆Trino on K8sなデータ基盤&quot; target=&quot;_blank&quot;&gt;ベアメタルで実現するSpark＆Trino on K8sなデータ基盤&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;//www.slideshare.net/microad_engineer&quot; target=&quot;_blank&quot;&gt;MicroAd, Inc.(Engineer)&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;

&lt;h1 id=&quot;getting-started-trino-and-a-transactional-data-lake-with-serverless-athena&quot;&gt;Getting started with Trino and a transactional data lake with serverless Athena&lt;/h1&gt;

&lt;p&gt;The last session was presented by Sotaro Hikita from AWS. Athena is a serverless
service for ad hoc analytics built on Trino and Presto. It supports not only S3
data but also various data sources via Federated Query. In Athena, Iceberg
supports both read and write operations, while Hudi and Delta Lake only support
read operations.&lt;/p&gt;

&lt;iframe class=&quot;speakerdeck-iframe&quot; frameborder=&quot;0&quot; src=&quot;https://speakerdeck.com/player/e1f3188001ca4919b227177f3934b626&quot; title=&quot;サーバレスなAmazon Athenaで始めるTrinoとTransactional Data Lake&quot; allowfullscreen=&quot;true&quot; style=&quot;border: 0px; background: padding-box padding-box rgba(0, 0, 0, 0.1); margin: 0px; padding: 0px; border-radius: 6px; box-shadow: rgba(0, 0, 0, 0.2) 0px 5px 40px; width: 100%; height: auto; aspect-ratio: 560 / 315;&quot; data-ratio=&quot;1.7777777777777777&quot;&gt;&lt;/iframe&gt;

&lt;h1 id=&quot;wrap-up&quot;&gt;Wrap up&lt;/h1&gt;

&lt;p&gt;We sincerely appreciate the participation of community members in Japan. Thank
you so much for watching the live event. We are planning to hold an in-person
event next year. See you next time!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Yuya&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Yuya Ebihara</name>
        </author>
      

      <summary>The Trino community in Japan held an online event on October 5th, 2023. This article summarizes the conference, sharing the presentations and providing an overview.</summary>

      
      
    </entry>
  
    <entry>
      <title>51: Trino cools off with PopSQL</title>
      <link href="https://trino.io/episodes/51.html" rel="alternate" type="text/html" title="51: Trino cools off with PopSQL" />
      <published>2023-10-05T00:00:00+00:00</published>
      <updated>2023-10-05T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/51</id>
      <content type="html" xml:base="https://trino.io/episodes/51.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Technical Content at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/jakeptrsn/&quot;&gt;Jake Peterson&lt;/a&gt;, Head of Customer
Success at &lt;a href=&quot;https://popsql.com/&quot;&gt;PopSQL&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Matthew Peveler, Software Engineer at &lt;a href=&quot;https://popsql.com/&quot;&gt;PopSQL&lt;/a&gt;,
&lt;a href=&quot;https://github.com/MasterOdin&quot;&gt;MasterOdin&lt;/a&gt; on GitHub&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-423-427&quot;&gt;Releases 423-427&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-423.html&quot;&gt;Trino 423&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Schema evolution for nested fields&lt;/li&gt;
  &lt;li&gt;Support for comments on materialized view columns&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CASCADE&lt;/code&gt; option in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP SCHEMA&lt;/code&gt; for Clickhouse, MariaDB, MySQL,
Oracle and SingleStore&lt;/li&gt;
  &lt;li&gt;Various performance improvements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-424.html&quot;&gt;Trino 424&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance for JSON, CSV, text and related formats in Hive&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CASCADE&lt;/code&gt; in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP SCHEMA&lt;/code&gt; for PostgreSQL and Iceberg&lt;/li&gt;
  &lt;li&gt;Improved coordinator CPU utilization for large clusters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-425.html&quot;&gt;Trino 425&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Support for check constraints in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; in the Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Support for the Decimal128 type in the MongoDB connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-426.html&quot;&gt;Trino 426&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SET/RESET SESSION AUTHORIZATION&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Improved performance of aggregations over decimal values.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE TABLE&lt;/code&gt; in Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Support for Databricks 13.3 LTS.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-427.html&quot;&gt;Trino 427&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DISTINCT&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Support for pushing down &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; statements into connectors.&lt;/li&gt;
  &lt;li&gt;Support for reading Delta Lake tables with Deletion Vectors.&lt;/li&gt;
  &lt;li&gt;Faster writing to Parquet files in Delta Lake and Iceberg.&lt;/li&gt;
  &lt;li&gt;Support for querying tags in Iceberg.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-episode-popsql&quot;&gt;Concept of the episode: PopSQL&lt;/h2&gt;

&lt;p&gt;Some of our viewers may be familiar with an environment where
key queries and dashboards are buried in someone’s personal workspace, and you
have to go ask them directly every time you want to check on your metrics.
When you’re running a world-class, highly performant query engine like Trino and
investing time and resources into maintaining it, shouldn’t you treat your
queries like a first-class, collaborative, versioned system, too?&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://popsql.com/&quot;&gt;PopSQL&lt;/a&gt;, a playful spin on the word popsicle, solves the
sadness that is disorganized and siloed insights by centralizing queries into a
platform that has versioning, security, and a suite of collaborative tools
comparable to Google Drive. Want to work with your teammate on a query? You can
open up the same editor and see the same thing. Want to see the query
someone ran last week to check how the new feature is doing? It’s there. Have
a suggestion to improve something? Leave a comment. Realize your suggestion was
wrong and need to undo the change? You can view past versions of the query.&lt;/p&gt;

&lt;p&gt;PopSQL and Trino make sense together. PopSQL provides a best-in-class interface
for organizing and collaborating on all of your SQL queries
across the business, and Trino handles running those queries at unparalleled
speeds. They go hand-in-hand for treating your data and SQL analytics as
first-class citizens. In today’s episode, we’ll be exploring what PopSQL is, how it
integrates with Trino, and how the engineers at PopSQL have done some cool
things with Trino to make the integration better than ever before. We’ll start
with that last one, actually.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-a-new-nodejs-adapter-for-trino&quot;&gt;Concept of the episode: A new Node.js adapter for Trino&lt;/h2&gt;

&lt;p&gt;Trino in the frontend is… a tricky thing. We can go ahead and admit that the
&lt;a href=&quot;/docs/current/admin/web-interface.html&quot;&gt;Trino web UI&lt;/a&gt; isn’t going to win any
awards for design or functionality. A couple of Node-based libraries
exist out there, including &lt;a href=&quot;https://www.npmjs.com/package/presto-client&quot;&gt;presto-client-node&lt;/a&gt;
and &lt;a href=&quot;https://github.com/vweevers/lento&quot;&gt;lento&lt;/a&gt;. But presto-client-node lacked
support for streaming and had some issues handling 500 errors, while lento doesn’t
quite support Trino out of the box and only supports single streams, which
wasn’t ideal for PopSQL’s distributed architecture. So when PopSQL’s engineers
went to build their frontend and integrate with Trino, what did they do? They built
their own adapter.&lt;/p&gt;

&lt;p&gt;We’ll talk about how it was implemented, what key features it unlocks, and why
it makes using PopSQL with Trino an even better experience.&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-using-popsql-with-trino&quot;&gt;Demo of the episode: Using PopSQL with Trino&lt;/h2&gt;

&lt;p&gt;It’s hard to write show notes for a demo, because you can’t really experience
the demo by reading about what’s happening. But as a surface-level overview,
we’ll be going over:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Setting up a connection&lt;/li&gt;
  &lt;li&gt;The schema explorer&lt;/li&gt;
  &lt;li&gt;The SQL editor&lt;/li&gt;
  &lt;li&gt;Query scheduling&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pr-of-the-episode-57-on-trino-gateway-release-version-3&quot;&gt;PR of the episode: #57 on trino-gateway: Release version 3&lt;/h2&gt;

&lt;p&gt;Last week, the community officially released the
&lt;a href=&quot;https://github.com/trinodb/trino-gateway&quot;&gt;trino-gateway&lt;/a&gt;, a proxy and load
balancer that enables large operations to run multiple Trino clusters in
harmony with each other to serve big queries and small queries alike. If you or
your organization have a need for more than one Trino cluster and want the
seamless experience of being able to connect to any of them through a single
interface, then check it out! It’s the product of many months of effort and
should be a fantastic solution for running Trino at the absolute largest scales.&lt;/p&gt;

&lt;p&gt;To learn more about it, you should check out
&lt;a href=&quot;/blog/2023/09/28/trino-gateway&quot;&gt;the blog post announcing its first release.&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Trino Summit, the biggest Trino event of the year, is coming up on the 13th and
14th of December, and like Trino Fest, it’ll be fully virtual. If you’d like to
give a talk about anything related to Trino, we’re looking for speakers now.
&lt;a href=&quot;https://sessionize.com/trino-summit-2023/&quot;&gt;Submit your talk here!&lt;/a&gt; If you’d
rather attend, you can also
&lt;a href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=tcb&quot;&gt;go register to attend now&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Prior to Trino Summit, if you’d like to learn about SQL from the absolute
experts, we’ve also announced the &lt;a href=&quot;/blog/2023/09/27/training-series&quot;&gt;Trino Training Series&lt;/a&gt;
that we’ll be running as a buildup to the summit. Register now and look forward
to four great sessions starting from the ground up and ending with some key
tricks and Trino specifics that even a seasoned SQL veteran may not know about.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino Gateway has arrived</title>
      <link href="https://trino.io/blog/2023/09/28/trino-gateway.html" rel="alternate" type="text/html" title="Trino Gateway has arrived" />
      <published>2023-09-28T00:00:00+00:00</published>
      <updated>2023-09-28T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/09/28/trino-gateway</id>
      <content type="html" xml:base="https://trino.io/blog/2023/09/28/trino-gateway.html">&lt;p&gt;You started with one Trino cluster, and your users like the power of SQL and
&lt;a href=&quot;/ecosystem/index.html#data-sources&quot;&gt;querying all sorts of data sources&lt;/a&gt;.
Then you needed to upgrade and got a cluster for testing going. That was a while
ago, and now you run a separate cluster configured for ETL workloads with
fault-tolerant execution, and some others with different configurations.&lt;/p&gt;

&lt;p&gt;With Trino Gateway, we now have an answer to your users’ request to provide one URL
for all the clusters. Trino Gateway has arrived!&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Today, we are happy to announce our &lt;a href=&quot;https://github.com/trinodb/trino-gateway/blob/main/docs/release-notes.md#trino-gateway-3-26-sep-2023&quot;&gt;first release of Trino
Gateway&lt;/a&gt;.
The release is the result of many, many months of effort to move the legacy
Presto Gateway to Trino, start a refactor of the project, and add numerous new
features.&lt;/p&gt;

&lt;p&gt;Many larger deployments across the Trino community rely on the gateway as a load
balancer, proxy server, and configurable routing gateway for multiple Trino
clusters. Users don’t need to worry about what catalog and data source is
available in what Trino cluster. Trino Gateway exposes one URL for them all.
Administrators can ensure routing is correct and use the REST API to configure
the necessary rules. This also allows seamless upgrades of clusters behind Trino
Gateway in a blue/green deployment mode.&lt;/p&gt;

&lt;p&gt;Up to now, many users had to maintain separate forks of the legacy Presto
Gateway. Some of these users created numerous improvements in isolation from each
other, sometimes even implementing the same feature multiple times. This first
release of Trino Gateway starts a strong collaboration of some of these users.
Bloomberg contributed the main bulk of the new features, including the
much-requested support for authentication and authorization on Trino Gateway
itself. Maintainers and contributors from Starburst pulled together the
stakeholders and managed the project, and collaborators from Naver, LinkedIn,
Dune, and others are already helping out and ready to move the project forward.&lt;/p&gt;

&lt;p&gt;There are exciting times ahead for the project, and we have big plans for
documentation, installation, and general modernization of the app, so go and
have a look at the project, read the documentation and release notes, file an
issue, or submit a pull request:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://github.com/trinodb/trino-gateway&quot;&gt;
        Trino Gateway
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;Interested in finding out more? Find us and other users and contributors on the
&lt;a href=&quot;https://trinodb.slack.com/app_redirect?channel=trino-gateway&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino-gateway&lt;/code&gt;&lt;/a&gt;
and
&lt;a href=&quot;https://trinodb.slack.com/app_redirect?channel=trino-gateway-dev&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino-gateway-dev&lt;/code&gt;&lt;/a&gt;
channels in &lt;a href=&quot;/slack.html&quot;&gt;the Trino community Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Also, don’t forget to tell us about your usage of Trino Gateway or Trino and
&lt;a href=&quot;https://sessionize.com/trino-summit-2023/&quot;&gt;submit a talk for Trino Summit
2023&lt;/a&gt;. And if you just want to learn
and listen to others, &lt;a href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=blog-1&quot;&gt;register as an
attendee&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred, Martin, and all the other Trino Gateway contributors&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Martin Traverso</name>
        </author>
      

      <summary>You started with one Trino cluster, and your users like the power for SQL and querying all sorts of data sources. Then you needed to upgrade and got a cluster for testing going. That was a while ago, and now you run a separate cluster configured for ETL workloads with fault-tolerant execution, and some others with different configurations. With Trino Gateway we now have an answer to your users request to provide one URL for all the clusters. Trino Gateway has arrived!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/trino-gateway-small.png" />
      
    </entry>
  
    <entry>
      <title>Learning SQL with Trino from the experts</title>
      <link href="https://trino.io/blog/2023/09/27/training-series.html" rel="alternate" type="text/html" title="Learning SQL with Trino from the experts" />
      <published>2023-09-27T00:00:00+00:00</published>
      <updated>2023-09-27T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/09/27/training-series</id>
      <content type="html" xml:base="https://trino.io/blog/2023/09/27/training-series.html">&lt;p&gt;Do you have a rough idea of what SQL is? Do you need to get data out of object
storage in the cloud and some relational database at the same time? You should
look at Trino and learn about SQL.&lt;/p&gt;

&lt;p&gt;Or do you know the ins and outs of joins and window functions, and are your SQL
queries counted in pages rather than lines? You may even be the SQL expert on
your team. You should &lt;em&gt;also&lt;/em&gt; look at Trino and SQL.&lt;/p&gt;

&lt;p&gt;Luckily for you all, we have the right SQL training for everyone in our upcoming
series with the founders of the Trino project and SQL experts Martin Traverso,
Dain Sundstrom, and David Phillips, and myself as host and co-trainer.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;In the SQL training series, we start with the basics of Trino. You will learn
that despite the fact that there is a leopard frog on the cover of &lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;Trino: The
Definitive Guide&lt;/a&gt;, SQL does
not stand for Silly Quacking Leopardfrogs. Instead, SQL stands for Structured
Query Language, and you will learn about the benefits of connecting &lt;a href=&quot;/ecosystem/index.html#data-sources&quot;&gt;many
data sources&lt;/a&gt; to Trino, and using
&lt;a href=&quot;/ecosystem/index.html#clients&quot;&gt;different clients&lt;/a&gt;. And you can always use
the same powerful SQL. And for the SQL pros, you learn about catalogs and
queries that go across data sources.&lt;/p&gt;

&lt;p&gt;Then we’ll glance at the basic SQL foundations, since there are literally
hundreds of books, videos, and training courses around. All of them teach you
things like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; statements and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE&lt;/code&gt; clauses, and unravel the confusion
around &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LEFT OUTER JOIN&lt;/code&gt; and the like.&lt;/p&gt;

&lt;p&gt;And after that, we get to the interesting stuff. Here is a list of
some of the topics we will cover:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino concepts like cluster, data source, client, catalog, and more&lt;/li&gt;
  &lt;li&gt;Overview of all the SQL support with statements, data types, functions, and
connector support&lt;/li&gt;
  &lt;li&gt;Working with data types, including numerical and text values, dates and times,
JSON, …&lt;/li&gt;
  &lt;li&gt;Lots of scalar, aggregation, window functions&lt;/li&gt;
  &lt;li&gt;Object storage and other data sources&lt;/li&gt;
  &lt;li&gt;Creating schemas, tables, and views&lt;/li&gt;
  &lt;li&gt;Inserting, merging, moving and deleting data&lt;/li&gt;
  &lt;li&gt;Metadata in general and in hidden tables like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$properties&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Table procedures&lt;/li&gt;
  &lt;li&gt;Trino views, Trino materialized views and other views&lt;/li&gt;
  &lt;li&gt;Global and connector level table functions, including query pass-through&lt;/li&gt;
  &lt;li&gt;Support for SQL routines, also known as user-defined functions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Interested now? No matter how great your SQL knowledge or Trino expertise is,
you will learn something new in this series. So what are you waiting for?&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trino-training-series/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=Global-FY24-Trino-Training-Series&amp;amp;utm_content=1&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;Join us in one or all of the sessions on the following dates:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;18th of October 2023: &lt;a href=&quot;/blog/2023/10/18/sql-training-1.html&quot;&gt;Getting started with Trino and SQL&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;1st of November 2023: &lt;a href=&quot;/blog/2023/11/01/sql-training-2.html&quot;&gt;Advanced analytics with SQL and Trino&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;15th of November 2023: &lt;a href=&quot;/blog/2023/11/15/sql-training-3.html&quot;&gt;Data management with SQL and Trino&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;29th of November 2023: &lt;a href=&quot;/blog/2023/11/29/sql-training-4.html&quot;&gt;Functions with SQL and Trino&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We look forward to seeing you in class.&lt;/p&gt;

&lt;p&gt;Martin, Dain, David, and Manfred&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Videos, slide decks, and other resources for all classes are now available:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Getting started with Trino and SQL: &lt;a href=&quot;/blog/2023/10/18/sql-training-1.html&quot;&gt;Blog post with resources and video&lt;/a&gt;, &lt;a href=&quot;https://www.youtube.com/watch?v=SnvSBYhRZLg&quot;&gt;Video on YouTube&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Advanced analytics with SQL and Trino: &lt;a href=&quot;/blog/2023/11/01/sql-training-2.html&quot;&gt;Blog post with resources and video&lt;/a&gt;, &lt;a href=&quot;https://www.youtube.com/watch?v=S-mfueDmXds&quot;&gt;Video on YouTube&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Data management with SQL and Trino: &lt;a href=&quot;/blog/2023/11/15/sql-training-3.html&quot;&gt;Blog post with resources and video&lt;/a&gt;, &lt;a href=&quot;https://www.youtube.com/watch?v=q2uyV7mBKVc&quot;&gt;Video on YouTube&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Functions with SQL and Trino: &lt;a href=&quot;/blog/2023/11/29/sql-training-4.html&quot;&gt;Blog post with resources and video&lt;/a&gt;, &lt;a href=&quot;https://www.youtube.com/watch?v=1siAYR6BzzY&quot;&gt;Video on YouTube&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Do you have a rough idea of what SQL is? Do you need to get data out of object storage in the cloud and some relational database at the same time? You should look at Trino and learn about SQL. Or do you know the ins and outs of joins, window functions, and your SQL queries are counted by the pages and not lines? You may even be the expert on SQL on your team. You should also look at Trino and SQL. Luckily for you all, we have the right SQL training for everyone in our upcoming series with the founders of the Trino project and SQL experts Martin Traverso, Dain Sundstrom, and David Phillips, and myself as host and co-trainer.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/trino-sql.png" />
      
    </entry>
  
    <entry>
      <title>Chinese edition of Trino: The Definitive Guide</title>
      <link href="https://trino.io/blog/2023/09/21/the-definitive-guide-2-cn.html" rel="alternate" type="text/html" title="Chinese edition of Trino: The Definitive Guide" />
      <published>2023-09-21T00:00:00+00:00</published>
      <updated>2023-09-21T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/09/21/the-definitive-guide-2-cn</id>
      <content type="html" xml:base="https://trino.io/blog/2023/09/21/the-definitive-guide-2-cn.html">&lt;p&gt;Trino, Trino, Trino everywhere. Just looking at our website stats and the users
in our community chat, we know that Trino is going places. We also know that one
of these places with a large user community is China. And now we have good news
for you. A translation of the second edition of the book to Chinese is now
available.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Today, we are happy to announce that a Chinese translation of the book &lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;Trino:
The Definitive Guide&lt;/a&gt; is now
available for the communities all across China and far beyond, and hopefully
lowers the barrier to Trino for native speakers. We invite you all to get your
own copy:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://product.dangdang.com/11487789827.html&quot;&gt;
        Trino权威指南(原书第2版) 机械工业出版社
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;Our thanks go out to the teams at O’Reilly and dangdang for making this happen.
We hope many readers will benefit from the translated edition.&lt;/p&gt;

&lt;p&gt;We look forward to chatting with many of our new readers and Trino users on the
&lt;a href=&quot;https://trinodb.slack.com/app_redirect?channel=general-cn&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;general-cn&lt;/code&gt;&lt;/a&gt; channel in &lt;a href=&quot;/slack.html&quot;&gt;the Trino community Slack&lt;/a&gt;,
other channels, and direct messaging.&lt;/p&gt;

&lt;p&gt;Also, don’t forget to tell us about your usage of Trino. You can contact us on
Slack to be a guest in &lt;a href=&quot;/broadcast/index.html&quot;&gt;Trino Community
Broadcast&lt;/a&gt; or &lt;a href=&quot;https://sessionize.com/trino-summit-2023/&quot;&gt;submit a talk for Trino
Summit 2023&lt;/a&gt;. And if you just want
to learn and listen to others, &lt;a href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=blog-1&quot;&gt;register as an
attendee&lt;/a&gt; for Trino Summit 2023.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred, Martin, and Matt&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Martin Traverso, Matt Fuller</name>
        </author>
      

      <summary>Trino, Trino, Trino everywhere. Just looking at our website stats and the users in our community chat, we know that Trino is going places. We also know that one of these places with a large user community is China. And now we have good news for you. A translation of the second edition of the book to Chinese is now available.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/ttdg2-cn-cover.png" />
      
    </entry>
  
    <entry>
      <title>Join us for Trino Summit 2023</title>
      <link href="https://trino.io/blog/2023/09/14/trino-summit-2023-announcement.html" rel="alternate" type="text/html" title="Join us for Trino Summit 2023" />
      <published>2023-09-14T00:00:00+00:00</published>
      <updated>2023-09-14T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/09/14/trino-summit-2023-announcement</id>
      <content type="html" xml:base="https://trino.io/blog/2023/09/14/trino-summit-2023-announcement.html">&lt;p&gt;The Trino community is buzzing. Commander Bun Bun is ready to invite you all to
join us for Trino Summit 2023. And “all” really means everyone in the community.
The event is free to attend, virtual, and full of news and shared knowledge from
your peers using Trino. Don’t hesitate to submit your talk and register to
attend now.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;We are pleased to announce the upcoming Trino Summit 2023. The summit is
scheduled as a virtual event on the &lt;strong&gt;13th and 14th of December 2023&lt;/strong&gt;, and
attendance is free!&lt;/p&gt;

&lt;p&gt;If you’d like to share your knowledge and information about Trino usage and give
a talk at this year’s Trino Summit, we’re putting out a call for speakers. We
are accepting submissions from now until the &lt;strong&gt;12th of November&lt;/strong&gt;, but we
recommend submitting as soon as possible, because we expect slots to fill up
fast.&lt;/p&gt;

&lt;p&gt;We’re looking for intermediate to advanced-level talks on a variety of themes.
If you have an interesting story about how you leverage Trino in your data
platform for analytics and other workloads, found a neat way to extend it with a
custom plugin or add-on, or swapped to Trino for a performance win, we’d love to
hear about it. We’re excited to expand our speaker lineup with talks from the
broader Trino community. Find more information about duration, technical
details, and more suggestions when you submit your talk.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trinosummit2023/?utm_source=trino&amp;amp;utm_medium=website&amp;amp;utm_campaign=NORAM-FY24-Q4-EV-Trino-Summit-2023&amp;amp;utm_content=blog-1&quot;&gt;
        Register to attend
    &lt;/a&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://sessionize.com/trino-summit-2023/&quot;&gt;
        Submit a talk
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;This event of the Trino Software Foundation is organized and sponsored by
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;, and we invite other sponsors to help make
this a successful event for the Trino community.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://starburst.io&quot;&gt;
  &lt;img src=&quot;/assets/images/logos/starburst-small.png&quot; title=&quot;Starburst&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If that interests you or your employer, &lt;a href=&quot;mailto:events@starburst.io&quot;&gt;contact the Trino events team for more
information&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And of course, we’re looking forward to reading your proposals and seeing you
then.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>The Trino community is buzzing. Commander Bun Bun is ready to invite you all to join us for Trino Summit 2023. And “all” really means everyone in the community. The event is free to attend, virtual, and full of news and shared knowledge from your peers using Trino. Don’t hesitate to submit your talk and register to attend now.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2023/summit-logo.png" />
      
    </entry>
  
    <entry>
      <title>50: Celebrating 50 episodes of Trino Community Broadcast</title>
      <link href="https://trino.io/episodes/50.html" rel="alternate" type="text/html" title="50: Celebrating 50 episodes of Trino Community Broadcast" />
      <published>2023-07-27T00:00:00+00:00</published>
      <updated>2023-07-27T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/50</id>
      <content type="html" xml:base="https://trino.io/episodes/50.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Technical Content at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Brian Olsen, Head of Developer Relations at &lt;a href=&quot;https://tabular.io/&quot;&gt;Tabular&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/bitsondatadev&quot;&gt;@bitsondatadev&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Dain Sundstrom, Trino co-creator and CTO at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/daindumb&quot;&gt;@daindumb&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-421-422&quot;&gt;Releases 421-422&lt;/h2&gt;

&lt;p&gt;Unofficial highlights from Cole:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-421.html&quot;&gt;Trino 421&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CHECK&lt;/code&gt; constraints in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; statements.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; on Google Sheets.&lt;/li&gt;
  &lt;li&gt;Faster queries on MongoDB tables with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;row&lt;/code&gt; columns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-422.html&quot;&gt;Trino 422&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE TABLE AS ... SELECT&lt;/code&gt; queries.&lt;/li&gt;
  &lt;li&gt;Support for nested fields in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ADD COLUMN&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Faster Avro reader for Hive.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;register_table&lt;/code&gt; procedure to register Hadoop tables in Iceberg.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-episode-50&quot;&gt;Concept of the episode: 50!&lt;/h2&gt;

&lt;p&gt;No, that’s not a factorial, we’re just excited to have made it to 50 Trino
Community Broadcast episodes. We’ve brought back some familiar faces to talk
about what we’ve done, how we got here, and what it takes to keep an open source
project ticking for over a decade, and to celebrate the steps we’ve taken along
the way. It’s unscripted, and the discussion wanders wherever it likes.&lt;/p&gt;

&lt;p&gt;Tune in to hear about the history of the Trino Community Broadcast, the upcoming
Snowflake connector, and a few of the core philosophies that have kept Trino
running. Manfred also shows off updates to the Trino website, highlighting all
the tools, data sources, and add-ons that you can use with Trino.&lt;/p&gt;

&lt;h2 id=&quot;trino-events&quot;&gt;Trino events&lt;/h2&gt;

&lt;p&gt;Trino Fest was a little over a month ago, and we’re publishing the last recap of
all the talks to the Trino blog today! Check out our YouTube channel and the
Trino website to catch up on everything you missed.&lt;/p&gt;

&lt;p&gt;If you have an event that is related to Trino, let us know so we can add it to
the &lt;a href=&quot;https://trino.io/community.html#events&quot;&gt;Trino events calendar&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>FugueSQL: Interoperable Python and Trino for interactive workloads</title>
      <link href="https://trino.io/blog/2023/07/27/trino-fest-2023-fugue-recap.html" rel="alternate" type="text/html" title="FugueSQL: Interoperable Python and Trino for interactive workloads" />
      <published>2023-07-27T00:00:00+00:00</published>
      <updated>2023-07-27T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/27/trino-fest-2023-fugue-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/27/trino-fest-2023-fugue-recap.html">&lt;p&gt;Fugue may be an unfamiliar name to those in the Trino ecosystem. It’s another
Python tool, a programming model built to enhance interoperability between
Python and SQL. On the Python side of things, it’s a wrapper around common tools
like pandas and Polars that converts code into SQL for high-performance,
large-scale query execution. So why are we talking about it at Trino Fest?
Because Fugue recently launched an integration with Trino, enabling you to write
Python code that can be converted to SQL to run on a high-powered Trino backend.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/aKhI1Phfn-o&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;Though Trino users are quite familiar with SQL, it does present some challenges.
Iterating on a SQL query and improving it can be difficult, and finding ways to
optimize or speed things up can be a challenge that requires sophisticated
external tools or working on hunches. Testing queries, especially incrementally,
has never been super easy, either. Compare that to Python, which does not have
those problems, but has issues of its own. Python, especially at scale, is not
very performant. So it’s natural to try to take the advantages of both, which is
what Fugue is aiming to do.&lt;/p&gt;

&lt;p&gt;After that brief intro to Fugue, the rest of the talk consists of technical
demos of the many things that you can do with Fugue. This includes
setting a query up, breaking it up into smaller parts, bringing it to pandas,
and demonstrating extensions that are built into Fugue. With all of these
intermediate steps, it becomes easier to unit test queries before sending them
into production, making sure that everything works as expected.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Kevin Kho and Cole Bowden</name>
        </author>
      

      <summary>Fugue may be an unfamiliar name to those in the Trino ecosystem. It’s another Python tool, a programming model built to enhance interoperability between Python and SQL. On the Python side of things, it’s a wrapper around common tools like pandas and Polars that converts code into SQL for high-performance, large-scale query execution. So why are we talking about it at Trino Fest? Because Fugue recently launched an integration with Trino, enabling you to write Python code that can be converted to SQL to run on a high-powered Trino backend.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Fugue.png" />
      
    </entry>
  
    <entry>
      <title>Starburst Galaxy: A romance of many architectures</title>
      <link href="https://trino.io/blog/2023/07/25/trino-fest-2023-datto.html" rel="alternate" type="text/html" title="Starburst Galaxy: A romance of many architectures" />
      <published>2023-07-25T00:00:00+00:00</published>
      <updated>2023-07-25T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/25/trino-fest-2023-datto</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/25/trino-fest-2023-datto.html">&lt;p&gt;Let’s cut straight to the chase with this lightning talk from Benjamin Jeter, a
data architect, platform manager, and data engineer at Datto. For those who are
not familiar with Datto, they are an American cybersecurity and data backup
company. They’re the leading global provider of security and cloud-based
software solutions purpose-built for Managed Service Providers (MSPs). In
Benjamin’s talk, he goes through some of the considerations and design goals of
a reference architecture pattern that they use and why they chose to use Trino
with Starburst Galaxy.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/K3AlAWB-Gmg&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023Datto.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;But you might be wondering: what does Ben mean when he says “reference
architecture”? A reference architecture pattern is a pattern for making
arbitrary data available to end users in a reproducible and modular way. It’s an
opinionated representation of what best practices look like for a given class of
use cases. You can almost think of it as a conceptual tool for thinking
critically about specific patterns through a pragmatic balance of simplicity and
effectiveness. However, it will not work for every use case, and it is not
necessarily the best solution for any given one.&lt;/p&gt;

&lt;p&gt;The main design goal that Benjamin had was to facilitate near real-time data
access while using only Trino. In addition, he wanted it to be simple, easy to
understand, flexible, and adaptable. Accomplishing this design goal requires
several steps, starting with a daily batch transform that converts JSON
into Iceberg and serves as &lt;a href=&quot;https://www.investopedia.com/terms/t/tplus1.asp&quot;&gt;T-1
data&lt;/a&gt;. Then he created an
unpartitioned external table that is rebuilt every day as part of the daily
batch transform. Using the &lt;a href=&quot;https://docs.starburst.io/starburst-galaxy/sql/great-lakes.html&quot;&gt;Great Lakes
connectivity&lt;/a&gt;
with this table allows Datto to have scan-on-query semantics, which enables data
access about as close to real-time as you can get without a streaming solution like
Kafka or Kinesis. Benjamin shows how easy it is to design a use case with just a
couple lines of code using Trino with Starburst Galaxy.&lt;/p&gt;

&lt;p&gt;Interested? Check out the video where Benjamin shows the code and explains how
it works!&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Benjamin Jeter, Ryan Duan</name>
        </author>
      

      <summary>Let’s cut straight to the chase with this lightning talk from Benjamin Jeter, a data architect, platform manager, and data engineer at Datto. For those that are not familiar with Datto, they are an American cybersecurity and data backup company. They’re the leading global provider of security and cloud-based software solutions purpose-built for Managed Service Providers (MSPs). In Benjamin’s talk, he goes through some of the considerations and design goals of a reference architecture pattern that they use and why they chose to use Trino with Starburst Galaxy.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Datto.png" />
      
    </entry>
  
    <entry>
      <title>Trino optimization with distributed caching on data lakes</title>
      <link href="https://trino.io/blog/2023/07/21/trino-fest-2023-alluxio-recap.html" rel="alternate" type="text/html" title="Trino optimization with distributed caching on data lakes" />
      <published>2023-07-21T00:00:00+00:00</published>
      <updated>2023-07-21T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/21/trino-fest-2023-alluxio-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/21/trino-fest-2023-alluxio-recap.html">&lt;p&gt;By 2025, there will be 100 zettabytes stored in the cloud. That’s
100,000,000,000,000,000,000,000 bytes - a huge, eye-popping number. But only
about 10% of that data is actually used on a regular basis. At Uber, for
example, only 1% of their disk space is used for 50% of the data they access on
any given day. With so much data but such a small percentage being used, it
raises the question: how can we identify frequently-used data and make it more
accessible, efficient, and lower-cost to access?&lt;/p&gt;

&lt;p&gt;Once we have identified that “hot data,” the answer is data caching. By caching
that data in storage, you can reap a ton of benefits: performance gains, lower
costs, less network congestion, and reduced throttling on the storage layer.
Data caching sounds great, but why are we talking about it at a Trino event?
Because &lt;a href=&quot;https://github.com/trinodb/trino/pull/16375&quot;&gt;data caching with Alluxio is coming to Trino&lt;/a&gt;!&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/oK1A5U1WzFc&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023Alluxio.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;So what are the key features of data caching? First and foremost,
frequently accessed data gets stored on local SSDs. In the case of Trino, this
means that the Trino worker nodes will store data to reduce latency and decrease
the number of loads from object storage. Even if a worker restarts, the cached
data is preserved. Caching will work on all the data lake connectors, so whether
you’re using Iceberg, Hive, Hudi, or Delta Lake, it’ll be speeding your queries
up. The best part is that once it’s in Trino, all you need to do is enable it,
set three configuration properties, and let the performance improvement speak
for itself. There’s no other change to how queries run or execute, so there’s no
headache or migration needed.&lt;/p&gt;
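
&lt;p&gt;As a rough sketch of what that configuration might look like, a catalog
properties file could enable caching along these lines. The property names are
illustrative only, since the feature had not yet landed in a Trino release at
the time of this episode; check the release notes and documentation for the
final names and values:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical catalog properties for file system caching on workers
fs.cache.enabled=true
fs.cache.directories=/mnt/trino-cache
fs.cache.max-sizes=100GB
&lt;/code&gt;&lt;/pre&gt;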

&lt;p&gt;Hope then gives deeper technical detail on exactly how data caching works. She
highlights a few existing examples of how large-scale companies, Uber and
Shopee, have utilized data caching to reap massive performance gains. Then the
talk is passed off to Beinan, who gives further technical detail,
exploring cache invalidation, how to maximize cache hit rate, cluster
elasticity, cache storage efficiency, and data consistency. He also explores
ongoing work on semantic caching, native/off-heap caching, and distributed
caching, all of which have interesting upsides and benefits.&lt;/p&gt;

&lt;p&gt;Give the full talk a listen if you’re interested, as both Hope and Beinan go
into a lot of great, technical detail that you won’t want to miss out on. And
don’t forget to keep an eye on Trino release notes to see when it’s live!&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Hope Wang, Beinan Wang, and Cole Bowden</name>
        </author>
      

      <summary>By 2025, there will be 100 zettabytes stored in the cloud. That’s 100,000,000,000,000,000,000,000 bytes - a huge, eye-popping number. But only about 10% of that data is actually used on a regular basis. At Uber, for example, only 1% of their disk space is used for 50% of the data they access on any given day. With so much data but such a small percentage being used, it raises the question: how can we identify frequently-used data and make it more accessible, efficient, and lower-cost to access? Once we have identified that “hot data,” the answer is data caching. By caching that data in storage, you can reap a ton of benefits: performance gains, lower costs, less network congestion, and reduced throttling on the storage layer. Data caching sounds great, but why are we talking about it at a Trino event? Because data caching with Alluxio is coming to Trino!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Alluxio.png" />
      
    </entry>
  
    <entry>
      <title>Inspecting Trino on ice</title>
      <link href="https://trino.io/blog/2023/07/19/trino-fest-2023-stripe.html" rel="alternate" type="text/html" title="Inspecting Trino on ice" />
      <published>2023-07-19T00:00:00+00:00</published>
      <updated>2023-07-19T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/19/trino-fest-2023-stripe</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/19/trino-fest-2023-stripe.html">&lt;p&gt;For those unfamiliar, Stripe is an online payment processor that facilitates
online payments for digital-native merchants. They use Trino to facilitate ad
hoc analytics, enable dashboarding, and provide an API for internal services and
data apps. In Kevin Liu’s session at &lt;a href=&quot;/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;Trino Fest 2023&lt;/a&gt;, he showcases the Trino Iceberg
connector and how it can replace more complex usage to access Iceberg metadata.
He also discusses how Trino is a core part of operations at Stripe.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/PSGuAMVc6-w&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023Stripe.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;Trino is the foundational infrastructure on which other data apps and services
are built. In Kevin’s words, “I call Trino the Swiss army knife in the data
ecosystem.”&lt;/p&gt;

&lt;p&gt;At Stripe, they use Iceberg tables extensively, replacing legacy Hive tables.
But Iceberg isn’t perfect: one problem is reading its metadata from
S3. To work with Iceberg metadata, Stripe developed an internal CLI tool. The
tool requires a privileged internal machine that is only accessible to
developers, and it outputs results in JSON format, which is difficult to
process, read, and use for further analysis. However, Kevin found that the Trino
Iceberg connector can replace most of the functionality of the Iceberg CLI. The
connector brings Iceberg metadata information to Trino’s powerful analytical
engine and facilitates lightning fast debugging and analysis.&lt;/p&gt;
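
&lt;p&gt;As a small illustration (catalog, schema, and table names are made up for
this example), the Trino Iceberg connector exposes metadata tables such as
&lt;code&gt;$snapshots&lt;/code&gt;, &lt;code&gt;$files&lt;/code&gt;, and &lt;code&gt;$properties&lt;/code&gt;
that can be queried like any other table:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Inspect the snapshot history of an Iceberg table
SELECT snapshot_id, committed_at, operation
FROM iceberg.analytics.&quot;orders$snapshots&quot;;

-- List the data files backing the table, with sizes and row counts
SELECT file_path, record_count, file_size_in_bytes
FROM iceberg.analytics.&quot;orders$files&quot;;
&lt;/code&gt;&lt;/pre&gt;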

&lt;p&gt;Unfortunately, there was no way to grab all desired table property information
from the Trino Iceberg connector, because they were using an older version.
Thus, they use the Trino PostgreSQL connector to connect directly to the backend
database of the Hive Metastore, allowing them to inspect table metadata
directly. With the two connectors, they have all the information about the data
warehouse, powering their analysis and meta-analysis of the data and how it’s
used.&lt;/p&gt;

&lt;p&gt;They also use Trino to inspect Iceberg usage patterns. They log every Trino
query using the Trino event listener and store that in another PostgreSQL
database. This gives the full information of every query that has ever run
through Trino, and allows them to perform analysis using historical queries.
Combined with Trino’s built-in query metadata enrichment, this method enables a
multitude of auditing, debugging, and optimization use cases.&lt;/p&gt;

&lt;p&gt;In the future, they plan to use Trino to improve data quality by leveraging it
as a validation framework, to perform Iceberg table maintenance, and to optimize
tables based on historical read patterns.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Kevin Liu, Ryan Duan</name>
        </author>
      

      <summary>For those unfamiliar, Stripe is an online payment processor that facilitates online payments for digital-native merchants. They use Trino to facilitate ad hoc analytics, enable dashboarding, and provide an API for internal services and data apps to utilize Trino. In Kevin Liu’s session at Trino Fest 2023, he showcases the Trino Iceberg connector and how it can replace more complex usage to access Iceberg metadata. He also discusses how Trino is a core part of operations at Stripe.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Stripe.png" />
      
    </entry>
  
    <entry>
      <title>Data mesh implementation using Hive views</title>
      <link href="https://trino.io/blog/2023/07/17/trino-fest-2023-comcast-recap.html" rel="alternate" type="text/html" title="Data mesh implementation using Hive views" />
      <published>2023-07-17T00:00:00+00:00</published>
      <updated>2023-07-17T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/17/trino-fest-2023-comcast-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/17/trino-fest-2023-comcast-recap.html">&lt;p&gt;At Comcast, data is used in a data mesh ecosystem, with a vision where users can
discover data and request data through a self-service platform. With federation,
various tools, and the ability to create, read, and write data with different
platforms, it’s a full-blown data mesh. So how do you build that? With Trino, of
course, and with the power of Hive views. Tune in to the 10-minute lightning talk
that Alejandro gave at Trino Fest to learn more about how Comcast pulled it off.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/ZgcVtPFkKHM&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;With various storage systems, like S3 and MinIO, and users who
want to be able to use a variety of data platforms, including Trino, but also
Databricks and Spark, Comcast needed something to sit between the data and those
platforms. The solution was the Hive CLI and Hive views, which could read from 
all their various forms of storage, and which could be read from all the
user-facing query engines and data platforms with no issues.&lt;/p&gt;

&lt;p&gt;By centralizing data, there was also the upside of easily integrating with
Privacera, which allowed for privacy policies to be implemented without much
issue. Users could request access to the data within the Hive views, and data
owners could approve or reject access as appropriate. Because of the
centralization, it was easy to go very fine-grained with data access rules,
allowing for access control as specific as column-level.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Alejandro Rojas, Cole Bowden</name>
        </author>
      

      <summary>At Comcast, data is used in a data mesh ecosystem, with a vision where users can discover data and request data through a self-service platform. With federation, various tools, and the ability to create, read, and write data with different platforms, it’s a full-blown data mesh. So how do you build that? With Trino, of course, and with the power of Hive views. Tune into the 10-minute lightning talk that Alejandro gave at Trino Fest to learn more about how Comcast pulled it off.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Comcast.png" />
      
    </entry>
  
    <entry>
      <title>DuneSQL - A query engine for blockchain data</title>
      <link href="https://trino.io/blog/2023/07/14/trino-fest-2023-dune.html" rel="alternate" type="text/html" title="DuneSQL - A query engine for blockchain data" />
      <published>2023-07-14T00:00:00+00:00</published>
      <updated>2023-07-14T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/14/trino-fest-2023-dune</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/14/trino-fest-2023-dune.html">&lt;p&gt;The need to make blockchain data easily accessible has risen in recent
years due to the popularity of cryptocurrencies, NFTs, and other uses of
blockchains. Dune has made it their mission to make blockchain data more
accessible. Dune is a community data platform for querying public blockchain
data and building beautiful dashboards. They use their own query engine called
DuneSQL, built as an extension of Trino, to query blockchain data. In the session,
Miguel and Jonas from Dune talk about the challenges of querying blockchain
data, their transition to Trino, and how DuneSQL is operated. Watch the
recording of the session or keep reading for a recap.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/sCJncarnGdU&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023Dune.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;The Dune community data platform is a serverless, open access, community-wide
collaboration portal. Dune experienced some difficulties with blockchain data,
such as processing and ingesting raw data, deserializing and decoding function
calls and arguments, and allowing the community to build abstractions. Their
engine, DuneSQL, is Trino with custom extensions that they created. It runs tens
of thousands of queries each day, which are executed, saved, and re-used.&lt;/p&gt;

&lt;p&gt;At first, Dune used PostgreSQL, where they sharded per blockchain and used
vertical scaling. However, they quickly ran into bottleneck issues on storage
size and IOPS (I/O operations per second). Thus, they switched to Apache Spark
with Databricks to allow horizontal scaling, process more blockchains,
and handle their vast query volume. Unfortunately,
the result was not performant and not interactive enough. In the end, Miguel
says that, “Trino was our choice for performance reasons, for the good
environment and ecosystem, and to fully support our scheme and our datasets.”
Using Trino addressed the performance issues.&lt;/p&gt;

&lt;p&gt;Operating DuneSQL requires modifications and extensions of Trino to suit the
needs of the users and platform as a whole. DuneSQL needs to manage the whole
fleet and the capacity they have, because they use over 4000 CPUs per hour, make
more than 100 billion S3 requests per month, and operate over 10 clusters. To
handle the scheduling and load balancing of these massive operations, DuneSQL
uses query execution services and a
&lt;a href=&quot;https://github.com/lyft/presto-gateway&quot;&gt;gateway&lt;/a&gt;. Clusters have a fixed size
for predictable capacity and performance. The gateway fronts the clusters to
reduce the blast radius, so failures in one cluster do not affect the others. Even with all
these adjustments, they still have work to do, as they plan to optimize the
billions of S3 requests they make, improve data layout, and implement
sandboxed user-defined functions.&lt;/p&gt;

&lt;p&gt;Interested in DuneSQL? Check out the video where Jonas goes over the
specificities and unique characteristics of DuneSQL.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Miguel Filipe, Jonas Irgens Kylling, Ryan Duan</name>
        </author>
      

      <summary>The need to make blockchain data easily accessible has risen in recent years due to the popularity of cryptocurrencies, NFTs, and other uses of blockchains. Dune has made it their mission to make blockchain data more accessible. Dune is a community data platform for querying public blockchain data and building beautiful dashboards. They use their own query engine called DuneSQL, built as an extension of Trino, to query blockchain data. In the session, Miguel and Jonas from Dune talk about the challenges of querying blockchain data, their transition to Trino, and how DuneSQL is operated. Watch the recording of the session or keep reading for a recap.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Dune.png" />
      
    </entry>
  
    <entry>
      <title>Let it snow for Trino</title>
      <link href="https://trino.io/blog/2023/07/12/trino-fest-2023-let-it-snow-recap.html" rel="alternate" type="text/html" title="Let it snow for Trino" />
      <published>2023-07-12T00:00:00+00:00</published>
      <updated>2023-07-12T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/12/trino-fest-2023-let-it-snow-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/12/trino-fest-2023-let-it-snow-recap.html">&lt;p&gt;In this recap, we can skip right to the exciting part: through the joint efforts
of engineers at ForePaaS and Bloomberg, there is a Snowflake connector coming
to Trino! Though it hasn’t landed yet, it has been tested and run in production
at both companies, and a pull request is open and working its way towards
completion as this blog post goes up. In the talk, Yu and Erik discuss
difficulties in developing the connector, the motivations to make it happen, and
the new features that come as part of it for Trino users to take advantage of.
Sound interesting? Give the talk a listen, or read on for more details.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/kmpO_yM8OAs&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023LetItSnow.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For those unfamiliar, Snowflake is a cloud-based data warehousing and analytics
platform. It offers a great combination of scale, flexibility, and performance,
with the downside of being proprietary, vendor-locked software: to use
Snowflake, you must go through Snowflake, Inc. ForePaaS and its
customers store data in Snowflake, but they also store data in many other 
formats and systems, and they rely on Trino to run their analytics. With no
Snowflake connector in Trino, this meant that while they could run analytics and
queries on most data, Trino had a blind spot. They needed to develop a Snowflake
connector in order to see and query 100% of their data. Bloomberg was in a
similar boat, having data in Snowflake, using Trino for analytics, and needing a
way to join those two together. With a shared need, ForePaaS and Bloomberg
joined forces and made the connector happen.&lt;/p&gt;

&lt;p&gt;The connector has been in use at both companies for some time, and it comes with
the full feature set one would expect from a Trino connector. With the connector,
you can query Snowflake directly from Trino, taking advantage of Trino’s
lightning-fast speeds and the underlying features of Snowflake with no issue.&lt;/p&gt;

&lt;p&gt;Curious to see more? For the rest of the talk, Erik Anderson at Bloomberg gives
a demo of the connector in action. Give the talk a watch, and you can check out
progress on how adding the connector to Trino is coming along on
&lt;a href=&quot;https://github.com/trinodb/trino/pull/17909&quot;&gt;the pull request contributing it&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Yu Teng, Erik Anderson, Cole Bowden</name>
        </author>
      

      <summary>In this recap, we can skip right to the exciting part: through the joint efforts of engineers at ForePaaS and Bloomberg, there is a Snowflake connector coming to Trino! Though it hasn’t landed yet, it has been tested and run in production at both companies, and a pull request is open and working its way towards completion as this blog post goes up. In the talk, Yu and Erik discuss the difficulties of developing the connector, their motivations for making it happen, and the new features it brings for Trino users to take advantage of. Sound interesting? Give the talk a listen, or read on for more details.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/ForePaaS%20and%20Bloomberg.png" />
      
    </entry>
  
    <entry>
      <title>Redis &amp; Trino - Real-time indexed SQL queries (new connector)</title>
      <link href="https://trino.io/blog/2023/07/10/trino-fest-2023-redis.html" rel="alternate" type="text/html" title="Redis &amp; Trino - Real-time indexed SQL queries (new connector)" />
      <published>2023-07-10T00:00:00+00:00</published>
      <updated>2023-07-10T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/10/trino-fest-2023-redis</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/10/trino-fest-2023-redis.html">&lt;p&gt;Ever since the pandemic, it has become clear that a digital-first
economy is becoming more and more necessary. As Redis’ Field CTO Allen Terleto
said during their talk from &lt;a href=&quot;/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;Trino Fest 2023&lt;/a&gt;, “In a digital first economy, data is the
lifeblood of the organization, which makes the databases the heart of enterprise
architectures”. Redis, a popular open source project, is a distributed in-memory
key–value database. It includes a cache, message broker, and optional
durability. In his talk, Allen demonstrates Redis’ new connector for Trino. It
can push down advanced queries and aggregations while leveraging Redis’ unique
in-memory secondary indexing. As a result, performance with the new connector is
much higher.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/JjBtZ26IHYk&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;Redis is an open source, in-memory, NoSQL database that natively supports a
variety of data structures. Redis is designed for utmost performance and high
throughput use cases across different types of workloads. Redis is widely known
for being the fastest data store on the market with sub-millisecond performance,
its ease of use, and being a multi-model database. Redis is able to map
relational tables to a key-value database by adding a key-value pair as a hash
attribute for each column. However, how can you search for a certain key in a
way that scales well in high throughput databases? Redis has a unique way to
deal with this problem: secondary indexing and Redis Search.&lt;/p&gt;

&lt;p&gt;Redis Search enables secondary indexing and full-text search, which allows Redis
to support many features such as multi-field queries, aggregations, exact phrase
matching, numeric filtering, geo-filtering, and vector similarity semantic
search on top of text queries. As Allen says, “Redis Search will be at the heart
of our new integration with Trino and game-changing better performance at scale
to the existing Redis Trino connector”. In addition, Redis supports a native
data model for JSON documents, allowing you to store, update, and retrieve JSON
values in a Redis database like other Redis data types. It also works with Redis
Search to let you index and query JSON documents.&lt;/p&gt;

&lt;p&gt;The syntax for Redis Search is a bit different from traditional SQL syntax, so
Redis is introducing a quicker and more reliable Redis-Trino connector that lets
you easily integrate with visualization frameworks and platforms that support
Trino. The connector is open source and publicly available on GitHub. In
addition, it will be contributed directly to the Trino project.&lt;/p&gt;

&lt;p&gt;Want to see Redis in action? Check out the video, where Julien demos how
you can load data from a file system, relational database, or data warehouse
and query it without writing a single line of code.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Allen Terleto, Julien Ruaux, Ryan Duan</name>
        </author>
      

      <summary>Ever since the pandemic, it has become clear that a digital-first economy is becoming more and more necessary. As Redis’ Field CTO Allen Terleto said during their talk from Trino Fest 2023, “In a digital first economy, data is the lifeblood of the organization, which makes the databases the heart of enterprise architectures”. Redis, a popular open source project, is a distributed in-memory key–value database. It includes a cache, message broker, and optional durability. In his talk, Allen demonstrates Redis’ new connector for Trino. It can push down advanced queries and aggregations while leveraging Redis’ unique in-memory secondary indexing. As a result, performance with the new connector is much higher.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Redis.png" />
      
    </entry>
  
    <entry>
      <title>Skip rocks and files: Turbocharge Trino queries with Hudi’s multi-modal indexing subsystem</title>
      <link href="https://trino.io/blog/2023/07/07/trino-fest-2023-onehouse-recap.html" rel="alternate" type="text/html" title="Skip rocks and files: Turbocharge Trino queries with Hudi’s multi-modal indexing subsystem" />
      <published>2023-07-07T00:00:00+00:00</published>
      <updated>2023-07-07T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/07/trino-fest-2023-onehouse-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/07/trino-fest-2023-onehouse-recap.html">&lt;p&gt;Optimizing data access and query performance is crucial to building low-latency
applications and running analytics. Even with the modern data lakehouse designed
to be as efficient and performant as possible, there are a number of bottlenecks
that can slow things down and plenty of challenges to overcome. Nadine and Sagar
explored this at Trino Fest, introducing us to multi-modal indexing and the
metadata table in Hudi, how they work, and how leveraging them with Trino can
unlock queries faster than ever before.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/IiDOmAEOXUM&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023Onehouse.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;When you’re building large-scale data-based applications, bottlenecks are
inevitable. Addressing those bottlenecks and optimizing your platform to avoid
them can be a huge cost, so it pays to know your requirements. In the same
vein, if you know the types of services and features you need to scale
effectively, you can build with them in mind from the ground up. Hudi has a
couple of key features you might be interested in that aren’t
present in all lakehouses:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Write indexing, speeding up and optimizing inserts and upserts&lt;/li&gt;
  &lt;li&gt;Automated table services, which handle clustering, cleaning, compacting,
and metadata indexing without any need for manual orchestration or overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nadine also goes on a deep dive into exactly how the Hudi table format works,
but emphasizes that these extra features elevate it to being an entire platform,
not just a table format.&lt;/p&gt;

&lt;p&gt;From there, Nadine passes things off to Sagar, who explains the
multi-modal indexing sub-system in Hudi, which features a scalable metadata
table, different types of indexes, and an async indexer. All of these features
minimize tradeoffs while maximizing performance, helping you read and write data
faster than ever. And with Trino’s Hudi connector, the Trino coordinator is able
to read the feature-rich Hudi metadata to more effectively delegate work to workers,
leveraging that speed as the best-in-class query engine for running analytics on
your data stored in Hudi.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Nadine Farah, Sagar Sumit, Cole Bowden</name>
        </author>
      

      <summary>Optimizing data access and query performance is crucial to building low-latency applications and running analytics. Even with the modern data lakehouse designed to be as efficient and performant as possible, there are a number of bottlenecks that can slow things down and plenty of challenges to overcome. Nadine and Sagar explored this at Trino Fest, introducing us to multi-modal indexing and the metadata table in Hudi, how they work, and how leveraging them with Trino can unlock queries faster than ever before.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Onehouse.png" />
      
    </entry>
  
    <entry>
      <title>49: Trino, Ibis, and wrangling Python in the SQL ecosystem</title>
      <link href="https://trino.io/episodes/49.html" rel="alternate" type="text/html" title="49: Trino, Ibis, and wrangling Python in the SQL ecosystem" />
      <published>2023-07-06T00:00:00+00:00</published>
      <updated>2023-07-06T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/49</id>
      <content type="html" xml:base="https://trino.io/episodes/49.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Technical Content at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guest&quot;&gt;Guest&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/cpcloud&quot;&gt;Phillip Cloud&lt;/a&gt;, Principal Engineer at Voltron
Data. &lt;a href=&quot;https://www.youtube.com/@cpcloud&quot;&gt;Check out his YouTube channel!&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-419-420&quot;&gt;Releases 419-420&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-419.html&quot;&gt;Trino 419&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;array_histogram&lt;/code&gt; function.&lt;/li&gt;
  &lt;li&gt;Faster reading and writing of Parquet data.&lt;/li&gt;
  &lt;li&gt;Support for Nessie catalog in Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-420.html&quot;&gt;Trino 420&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Underscores in numeric literals (e.g., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1_000_000&lt;/code&gt;).&lt;/li&gt;
  &lt;li&gt;Hexadecimal, binary, and octal numeric literals (e.g., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0x1a&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0b1010&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0o12&lt;/code&gt;).&lt;/li&gt;
  &lt;li&gt;Support for comments on view columns in Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RENAME COLUMN&lt;/code&gt; in MongoDB connector.&lt;/li&gt;
  &lt;li&gt;Support for mixed case table names in Druid connector.&lt;/li&gt;
  &lt;li&gt;Faster queries when statistics are unavailable.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;question-of-the-episode-what-is-ibis&quot;&gt;Question of the episode: What is Ibis?&lt;/h2&gt;

&lt;p&gt;Taken straight from &lt;a href=&quot;https://ibis-project.org/concept/why_ibis/&quot;&gt;the Ibis website&lt;/a&gt;,
Ibis is a dataframe interface to execution engines with support for 15+
backends (including Trino!). Ibis doesn’t replace your existing execution
engine, it extends it with powerful abstractions and intuitive syntax.&lt;/p&gt;

&lt;p&gt;For those who love doing all their data-related work in Python, this allows you
to write Python code that leverages the speed and power of Trino without needing
to become a SQL master. For the die-hard SQL users out there,
&lt;a href=&quot;https://ibis-project.org/tutorial/ibis-for-sql-users/&quot;&gt;they have a guide on Ibis for SQL users&lt;/a&gt;
that explains how it fully replaces SQL with Python code that is:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Type-checked and validated as you go.&lt;/li&gt;
  &lt;li&gt;Easier to write. Pythonic function calls with tab completion in IPython.&lt;/li&gt;
  &lt;li&gt;More composable. Break complex queries down into easier-to-digest pieces.&lt;/li&gt;
  &lt;li&gt;Easier to reuse. Mix and match Ibis snippets to create expressions tailored
for your analysis.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even if you’ve been writing SQL queries since day 1 and swear by it, opening the
door to using Python for analytics creates many new possibilities, widens the
possible talent pool you can work with, and gives you an entire second ecosystem
to integrate with.&lt;/p&gt;

&lt;p&gt;And ultimately, at the end of the day, the idea is that you get the ease of
writing Python code with the power and performance of a blazing fast SQL engine.
&lt;a href=&quot;https://youtu.be/pAWseFS4eAk&quot;&gt;You get the best of both worlds&lt;/a&gt;, and using Ibis
doesn’t lock you out of rolling up your sleeves and writing some SQL when a
situation calls for it.&lt;/p&gt;
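
&lt;p&gt;As a quick sketch of that escape hatch (hypothetical code, assuming a Trino
connection object &lt;code&gt;con&lt;/code&gt; as in the Ibis docs), hand-written SQL and
Ibis expressions compose freely:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; # A raw SQL string becomes a regular Ibis table expression...
&amp;gt;&amp;gt;&amp;gt; t = con.sql('SELECT year, avg_rating FROM movies')
&amp;gt;&amp;gt;&amp;gt; # ...which you can keep refining in Python.
&amp;gt;&amp;gt;&amp;gt; t.group_by('year').aggregate(avg=t.avg_rating.mean())
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;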

&lt;h3 id=&quot;and-you-dont-need-to-learn-different-sql-dialects&quot;&gt;And you don’t need to learn different SQL dialects&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;../assets/episode/49/standards_2x.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Trino more or less adheres to ANSI SQL, but it implements some ANSI features
that are rarely seen in other query engines, and other query engines choose to
deviate in a variety of ways. This can be a headache if you’re migrating to
Trino, as queries need to be re-written, re-structured, and tested to make sure
they return the same results. If you set up Ibis first, it does that thinking
for you, and a Python query can be converted to whatever dialect of SQL you
need without any issue. It can save you time, effort, headaches, and the
sense of being locked into a specific SQL dialect, freeing you up to move
between query engines without any pain points… because of course, you want to
move to Trino, which is the best query engine.&lt;/p&gt;

&lt;p&gt;It also needs pointing out that this allows you to federate your queries while
you federate your queries.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-converting-python-to-sql&quot;&gt;Concept of the episode: Converting Python to SQL&lt;/h2&gt;

&lt;p&gt;Take some Python like so:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ibis&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;movies&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ibis&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;examples&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ml_latest_small_movies&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;fetch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rating_by_year&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;movies&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;group_by&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;year&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;avg_rating&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;mean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rating_by_year&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;order_by&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rating_by_year&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;year&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;desc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And Ibis can automatically turn it into SQL that executes on Trino:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;con&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;compile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;year&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;avg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;avg_rating&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;movies&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;year&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;year&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Obviously, this example is lightweight, but as queries grow more complex and
sophisticated, the conversion becomes more and more worthwhile. And we mentioned
that the Python code is easier to re-use, but it really is - if you want to run
a similar query in conjunction with the query above, those &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;movies&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rating_by_year&lt;/code&gt; variables still exist, and writing some code to leverage them
is a lot easier and more intuitive than setting up SQL sub-queries and aliases.&lt;/p&gt;
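
&lt;p&gt;For instance, building on the expressions above (a hypothetical sketch, not
from the episode), reuse is just another method call:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; # Reuse the existing aggregation instead of writing a SQL sub-query:
&amp;gt;&amp;gt;&amp;gt; recent = rating_by_year.filter(rating_by_year.year &amp;gt;= 2000)
&amp;gt;&amp;gt;&amp;gt; con.compile(recent)  # Ibis emits the nested SQL for you
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;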

&lt;h3 id=&quot;questions-for-phillip&quot;&gt;Questions for Phillip&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Why is it called Ibis?&lt;/li&gt;
  &lt;li&gt;How much of a normal SQL workload do you think could be handled and run by
Ibis?&lt;/li&gt;
  &lt;li&gt;How much can Ibis optimize SQL queries for performance?&lt;/li&gt;
  &lt;li&gt;Which SQL dialect has been the worst to deal with?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pr-of-the-episode-15026-support-insert-in-google-sheets-connector&quot;&gt;PR of the episode: #15026: Support INSERT in Google Sheets connector&lt;/h2&gt;

&lt;p&gt;Google Sheets is one of our not-as-talked-about connectors in Trino, but it
still sees use and community updates, and we want to give that a shoutout in
today’s Trino Community Broadcast. &lt;a href=&quot;https://github.com/trinodb/trino/pull/15026&quot;&gt;#15026&lt;/a&gt;
from &lt;a href=&quot;https://github.com/sbernauer&quot;&gt;Sebastian Bernauer&lt;/a&gt; adds &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; support to
the connector, so now you can read &lt;em&gt;and&lt;/em&gt; write from Google Sheets in Trino,
empowering the world of SQL-on-spreadsheets.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-episode-477-on-trinoio-add-mateusz-gajewski-to-maintainer-list&quot;&gt;PR of the episode: #477 on trino.io: Add Mateusz Gajewski to maintainer list&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/trinodb/trino.io/pull/477&quot;&gt;We’ve added another maintainer to Trino!&lt;/a&gt;
We just spent an episode introducing Manfred and James Petty as maintainers, and
Mateusz is right behind them after years of effort helping Trino as a
contributor and reviewer.&lt;/p&gt;

&lt;h2 id=&quot;trino-events&quot;&gt;Trino events&lt;/h2&gt;

&lt;p&gt;Trino Fest wrapped up a few weeks ago, and we’re publishing recaps of all the
talks to the Trino blog! Keep an eye on our YouTube channel and the Trino
website to catch up on everything you missed.&lt;/p&gt;

&lt;p&gt;If you have an event that is related to Trino, let us know so we can add it to
the &lt;a href=&quot;https://trino.io/community.html#events&quot;&gt;Trino events calendar&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>AWS Athena (Trino) in the cybersecurity space</title>
      <link href="https://trino.io/blog/2023/07/05/trino-fest-2023-arcticwolf.html" rel="alternate" type="text/html" title="AWS Athena (Trino) in the cybersecurity space" />
      <published>2023-07-05T00:00:00+00:00</published>
      <updated>2023-07-05T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/05/trino-fest-2023-arcticwolf</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/05/trino-fest-2023-arcticwolf.html">&lt;p&gt;Arctic Wolf Networks, a cybersecurity company that provides security monitoring
against cyber threats, is one of the companies that have recently switched to using
AWS Athena as a new and efficient service to query their data using Trino. AWS
Athena is a serverless, interactive analytics service built on open-source
frameworks that runs on Trino, supporting open table and file formats and
providing a simplified, flexible way to analyze petabytes of data where it
lives. Senior software developer Anas Shakra from Arctic Wolf Networks gave a
talk at &lt;a href=&quot;/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;Trino Fest 2023&lt;/a&gt;
detailing their switch to AWS Athena and how “queries that took hours with old
solution now take around a minute today”. Tune in to the talk, or read
the recap!&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/WCuJaW7zC8k&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;At Arctic Wolf, data access use cases fall into three categories: investigations,
compliance, and the customer self-serve platform. Preparing the data
follows an established pattern: start with a datastore, perform an
operation to filter or transform the data, and then output the data in a
format like CSV or JSON, depending on client needs. Arctic Wolf’s custom
legacy service was unable to match the growing service demand and had four main
problems:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Optimized for breadth over depth&lt;/li&gt;
  &lt;li&gt;Struggles to handle growing service demand&lt;/li&gt;
  &lt;li&gt;Proprietary query language&lt;/li&gt;
  &lt;li&gt;Complicated design&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This compelled Anas’ team to find a different and improved service: Trino as
provided by AWS Athena.&lt;/p&gt;

&lt;p&gt;They had four main objectives for the new service: defined access patterns,
performant at scale, user-friendly, and deterministic pricing. AWS Athena
satisfied these objectives, while also providing numerous benefits such as using
a powerful query engine, being purposefully built for large datasets, using SQL
syntax, and having a clear pricing structure. However, with these benefits come
some drawbacks for Athena. These include being subject to quota limits, having
suboptimal file sizes for their system, and being unable to control access
sufficiently. Anas addresses this by using log queries that resolve these three
main impediments. As a next step, Anas is considering switching to a self-managed
Trino deployment for more control with the same performance gains.&lt;/p&gt;

&lt;p&gt;Want to learn more about log queries that they use? Check out Anas’ explanation
in the video!&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Anas Shakra, Ryan Duan</name>
        </author>
      

      <summary>Arctic Wolf Networks, a cybersecurity company that provides security monitoring against cyber threats, is one of the companies that have recently switched to using AWS Athena as a new and efficient service to query their data using Trino. AWS Athena is a serverless, interactive analytics service built on open-source frameworks that runs on Trino, supporting open table and file formats and providing a simplified, flexible way to analyze petabytes of data where it lives. Senior software developer Anas Shakra from Arctic Wolf Networks gave a talk at Trino Fest 2023 detailing their switch to AWS Athena and how “queries that took hours with old solution now take around a minute today”. Tune in to the talk, or read the recap!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/ArcticWolf.png" />
      
    </entry>
  
    <entry>
      <title>Ibis: Because SQL is everywhere and so is Python</title>
      <link href="https://trino.io/blog/2023/07/03/trino-fest-2023-ibis.html" rel="alternate" type="text/html" title="Ibis: Because SQL is everywhere and so is Python" />
      <published>2023-07-03T00:00:00+00:00</published>
      <updated>2023-07-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/07/03/trino-fest-2023-ibis</id>
      <content type="html" xml:base="https://trino.io/blog/2023/07/03/trino-fest-2023-ibis.html">&lt;p&gt;The PyData stack has been described as “unreasonably effective,” empowering its
users to glean insights and analyze moderate amounts of data with a high level
of flexibility and excellent visualization. The large-scale, production data
stack using a query engine like Trino sits on the other side of the world,
capable of handling petabytes and exabytes, but perhaps not integrating as
seamlessly with the Python ecosystem as one would hope. SQL has been a means of
bridging this gap, but we’ve now got an exciting solution to bridge it even
better: Ibis.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/JMUtPl-cMRc&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023Ibis.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A major problem with bridging the gap between Python and SQL engines has been
the lack of standardization in SQL. Though Trino prides itself on being
ANSI-compliant and many other SQL dialects strive to be similar, the reality is
that every SQL engine is different, and a complicated SQL query may error out
or return different results depending on which engine you’re using. So if you want to
convert some Python code to SQL, the question is… which SQL? If you’re doing
your data analysis in Python because you prefer to use it, spending time
scratching your head and trying to work out a SQL conversion can be frustrating,
time-consuming, and painful. But SQL is everywhere, and for large, performant,
efficient queries, you may need a SQL engine like Trino.&lt;/p&gt;

&lt;p&gt;Enter Ibis, a lightweight Python library for “data wrangling.” It can easily
convert your Python code into SQL queries for 16 different engines, including
Trino. With Ibis, you can leverage the ease of writing Python code with the
power and performance of running queries in Trino, getting the best of both
worlds in both the Python and SQL ecosystems. Want to learn more? Check out
&lt;a href=&quot;https://ibis-project.org/&quot;&gt;the Ibis project website&lt;/a&gt;, give the talk a listen,
and tune into the Trino Community Broadcast on July 6th, where we’ll be going
into even more detail about Ibis.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Phillip Cloud, Cole Bowden</name>
        </author>
      

      <summary>The PyData stack has been described as “unreasonably effective,” empowering its users to glean insights and analyze moderate amounts of data with a high level of flexibility and excellent visualization. The large-scale, production data stack using a query engine like Trino sits on the other side of the world, capable of handling petabytes and exabytes, but perhaps not integrating as seamlessly with the Python ecosystem as one would hope. SQL has been a means of bridging this gap, but we’ve now got an exciting solution to bridge it even better: Ibis.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Ibis.png" />
      
    </entry>
  
    <entry>
      <title>CDC patterns in Apache Iceberg</title>
      <link href="https://trino.io/blog/2023/06/30/trino-fest-2023-apacheiceberg.html" rel="alternate" type="text/html" title="CDC patterns in Apache Iceberg" />
      <published>2023-06-30T00:00:00+00:00</published>
      <updated>2023-06-30T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/06/30/trino-fest-2023-apacheiceberg</id>
      <content type="html" xml:base="https://trino.io/blog/2023/06/30/trino-fest-2023-apacheiceberg.html">&lt;p&gt;Have you ever wanted to keep your data in a table and have an efficient way to
interact with it? Iceberg, an open standard table format, is
exactly what you need. One of the great and unique features of the Iceberg
table format is its support for change data capture (CDC). Co-creator of
Apache Iceberg, Ryan Blue, presented at &lt;a href=&quot;/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;Trino Fest 2023&lt;/a&gt; this past week detailing the CDC support
and the trade-offs between different patterns that can be used for writing
CDC streams into Iceberg tables.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/GM7EvRc7_is&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023Iceberg.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;To begin, what is CDC and why should you use it? CDC is the idea that when
relational or transactional tables are modified, you emit an update stream.
This enables you to keep copies in sync by capturing changes to tables as
they happen. As Ryan states, “[CDC] is very lightweight on the source
database … rather than being super careful with what we run on the database,
what we want to do is just make a copy of it very easily and maintain that
copy.” Ryan then walks through an example of a bank using a transactional table
in Iceberg to illustrate what this looks like in practice.&lt;/p&gt;

&lt;p&gt;Although CDC has many advantages, there are also some problems that make it
difficult:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Lower latency means more work&lt;/li&gt;
  &lt;li&gt;Write amplification - the work necessary to balance the trade-offs between
efficiency at write time and efficiency at read time&lt;/li&gt;
  &lt;li&gt;Batch writes with double update and possible inconsistency&lt;/li&gt;
  &lt;li&gt;Read requirements with the different types of deletes in a table&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With problems like these, the trade-offs between the different patterns
become critical, because efficiency matters at every step. The first trade-off
Ryan covers is the storage trade-off between direct
writes and a change log table, which he considers the most important and most often
overlooked decision. The next concerns the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; pattern’s
choice between lazy merge (merge-on-read) and eager merge (copy-on-write). In
addition, commit frequency is a trade-off of its own, with different benefits depending on
whether you commit faster or slower. The change log pattern and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; pattern each
have benefits you may want, so Ryan suggests a hybrid of the two that
can deliver the best of both. With Iceberg, the choice is yours: all of these
CDC patterns are supported, so you can adjust your usage to your
specific needs. Check out the video and review the slides for more details!&lt;/p&gt;

&lt;p&gt;Want to read more about CDC? Check out some of Ryan Blue’s blog posts:
&lt;a href=&quot;https://tabular.io/blog/hello-world-of-cdc/&quot;&gt;Hello, World of CDC!&lt;/a&gt; and &lt;a href=&quot;https://tabular.io/blog/cdc-data-gremlins/&quot;&gt;CDC
Data Gremlins&lt;/a&gt;!&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Ryan Blue, Ryan Duan</name>
        </author>
      

      <summary>Have you ever wanted to keep your data in a table and have an efficient way to interact with it? Iceberg, an open standard table format, is exactly what you need. One of the great and unique features of the Iceberg table format is its support for change data capture (CDC). Co-creator of Apache Iceberg, Ryan Blue, presented at Trino Fest 2023 this past week detailing the CDC support and the trade-offs between different patterns that can be used for writing CDC streams into Iceberg tables.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/ApacheIceberg.png" />
      
    </entry>
  
    <entry>
      <title>Zero-cost reporting</title>
      <link href="https://trino.io/blog/2023/06/28/trino-fest-2023-starburst-recap.html" rel="alternate" type="text/html" title="Zero-cost reporting" />
      <published>2023-06-28T00:00:00+00:00</published>
      <updated>2023-06-28T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/06/28/trino-fest-2023-starburst-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2023/06/28/trino-fest-2023-starburst-recap.html">&lt;p&gt;Let’s say you have some data. Maybe it’s in a spreadsheet, a CSV file, a
relational database, or multiple terabytes of data in an S3 bucket. You need
to run SQL queries on this data, and you’d like to share those results with your
teammates, coworkers, and partner teams, but you want to do it in a way that
allows everyone to view those results on-demand, on the web, and with the latest
results without the need for any manual effort on your part.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/586qvEyuO_U&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;There are a lot of tools that might be able to do this for you, but whatever you
choose, you’ll need to spend time or money to set it up, and you don’t want to
spend a lot. With so many options, there’s the possibility of getting stuck in
analysis paralysis, and trying to find the best way forward may leave you
stymied. Jan Waś from Starburst has a suggestion: keep it simple with Trino,
plaintext files, Git, and GitHub Actions, and you can set it all up for free.&lt;/p&gt;

&lt;p&gt;To start, why put results into plaintext files? With markdown, files are both
human-readable and machine-readable. By saving queries in normal files, it’s easy
to see and edit those queries. You can commit your queries and results to Git,
and then you can push them to a service like GitHub, where those files will be
even more readable thanks to the web UI. Then, once on GitHub, you can use the
power of actions to re-run the queries, update your results on a schedule, and
keep things up to date for teammates to view via GitHub Pages. Sound neat? Check
out the talk to see how Jan does it!&lt;/p&gt;
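
&lt;p&gt;As a rough sketch of the workflow (the server address, file paths, and
schedule here are illustrative assumptions, not Jan’s exact setup), a scheduled
GitHub Actions job could re-run a saved query with the Trino CLI and commit the
refreshed markdown results:&lt;/p&gt;

```shell
# Illustrative sketch: re-run a saved query and publish the results as markdown.
# Assumes the Trino CLI is on PATH and the query is stored in queries/report.sql.
# MARKDOWN is one of the CLI output formats in recent Trino versions.
trino --server https://trino.example.com:8443 \
      --file queries/report.sql \
      --output-format MARKDOWN > results/report.md

# Commit the refreshed results so GitHub Pages serves the latest numbers.
git add results/report.md
git commit -m "Refresh report results" || true   # no-op when nothing changed
git push
```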

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Jan Waś, Cole Bowden</name>
        </author>
      

      <summary>Let’s say you have some data. Maybe it’s in a spreadsheet, a CSV file, a relational database, or multiple terabytes of data in an S3 bucket. You need to run SQL queries on this data, and you’d like to share those results with your teammates, coworkers, and partner teams, but you want to do it in a way that allows everyone to view those results on-demand, on the web, and with the latest results without the need for any manual effort on your part.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Starburst.png" />
      
    </entry>
  
    <entry>
      <title>Anomaly detection for Salesforce’s production data using Trino</title>
      <link href="https://trino.io/blog/2023/06/26/trino-fest-2023-salesforce.html" rel="alternate" type="text/html" title="Anomaly detection for Salesforce’s production data using Trino" />
      <published>2023-06-26T00:00:00+00:00</published>
      <updated>2023-06-26T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/06/26/trino-fest-2023-salesforce</id>
      <content type="html" xml:base="https://trino.io/blog/2023/06/26/trino-fest-2023-salesforce.html">&lt;p&gt;Rolling into our next presentation from &lt;a href=&quot;/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;Trino Fest 2023&lt;/a&gt;, we’re excited to bring you
Tuli Nivas and Geeta Shankar’s talk from the Performance Engineering Team at
Salesforce. They provide numerous reasons why they need Trino and
further explain how it is essential for anomaly detection in
their data. It’s an insightful talk about using a query engine to ensure data
quality and how switching to Trino has massively improved their performance.
You definitely don’t want to miss it.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/nFuqpb2GjVI&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023Salesforce.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;Salesforce provides customer relationship management software and applications
focused on sales, customer service, marketing automation, e-commerce, analytics,
and application development. They host hundreds of thousands of customers that
generate millions of transactions per day. For a company of this size, they
need a query engine that is fast and efficient. During the talk, Tuli made it
clear how much Salesforce relies on Trino, stating, “Trino has been a one-stop
shop for analytics.” Trino is the perfect solution for them, as Tuli mentions,
“Because of how well Trino scales and how efficiently it has been able to
process even the most gnarly looking queries.” It allows them to do everything
they need.&lt;/p&gt;

&lt;p&gt;In addition, Trino has helped Salesforce get more value from their production
logging data by accelerating their access to it, speeding up their decision
making. For years, they used Splunk for all their production data, but after
switching to Trino, they have had numerous improvements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Reducing their team’s analytics cost&lt;/li&gt;
  &lt;li&gt;Improving their cost-to-serve&lt;/li&gt;
  &lt;li&gt;Improving the time it takes to run the same query by 194%&lt;/li&gt;
  &lt;li&gt;Providing an SLA of 20-minute latency on all production logs&lt;/li&gt;
  &lt;li&gt;Retaining and accessing data for up to 2 years, compared to Splunk’s 30 days&lt;/li&gt;
  &lt;li&gt;Reducing the number of queries needed, which creates a smaller footprint&lt;/li&gt;
  &lt;li&gt;Creating tables and views for temporary data storage and analytics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With this, they use specific heuristics to build an anomaly detection framework
with very quick response times that they can monitor continuously. This
also lets them observe customer behavior efficiently and respond quickly to any
urgent changes. In the future, they plan to expand their usage of Trino
across more of their teams.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Tuli Nivas, Geeta Shankar, Ryan Duan</name>
        </author>
      

      <summary>Rolling into our next presentation from Trino Fest 2023, we’re excited to bring you Tuli Nivas and Geeta Shankar’s talk from the Performance Engineering Team at Salesforce. They provide numerous reasons why they need Trino and further explain how it is essential for anomaly detection in their data. It’s an insightful talk about using a query engine to ensure data quality and how switching to Trino has massively improved their performance. You definitely don’t want to miss it.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Salesforce.png" />
      
    </entry>
  
    <entry>
      <title>Trino for lakehouses, data oceans, and beyond</title>
      <link href="https://trino.io/blog/2023/06/22/trino-fest-2023-keynote-recap.html" rel="alternate" type="text/html" title="Trino for lakehouses, data oceans, and beyond" />
      <published>2023-06-22T00:00:00+00:00</published>
      <updated>2023-06-22T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/06/22/trino-fest-2023-keynote-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2023/06/22/trino-fest-2023-keynote-recap.html">&lt;p&gt;&lt;a href=&quot;/blog/2023/06/20/trino-fest-2023-recap.html&quot;&gt;Trino Fest 2023&lt;/a&gt; kicked off with a
bang, as Trino co-creator and maintainer Martin Traverso gave an update on all
the amazing things that have happened to Trino since
&lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;Trino Summit last year&lt;/a&gt;. He
also provided some insight into what’s coming down the pipeline for Trino, with
a brief look at the project’s roadmap. You can watch the recording of the talk
if you want to see for yourself, or you can read on for the highlights.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/SJ1h-I7HoII&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-fest-2023/TrinoFest2023Keynote.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;It’s only been about 7 months since Trino Summit in 2022, but Trino moves
quickly. In the words of Martin, “the project is on fire” and “is as active as
it’s ever been,” leaving us a lot to catch up on since then:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;16 releases and 2,250 commits&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/episodes/47.html&quot;&gt;Two new maintainers&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Several new table functions&lt;/li&gt;
  &lt;li&gt;Simplified configuration and improved performance for fault-tolerant execution&lt;/li&gt;
  &lt;li&gt;Better support for schema evolution and lakehouse migration&lt;/li&gt;
  &lt;li&gt;45 bullet points worth of performance improvements&lt;/li&gt;
  &lt;li&gt;Tracing with OpenTelemetry&lt;/li&gt;
  &lt;li&gt;An improved Python client and dbt Cloud support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And keep in mind that these are the highlights of the highlights! In the talk,
Martin goes into depth on all of the above, making it a worthwhile watch or
listen. There’s also a lot to look forward to, which you’ll hear more about as
they roll out in the coming months:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;SQL 2023, including enhancements to JSON functions and numeric literals&lt;/li&gt;
  &lt;li&gt;A new Snowflake connector and an improved Redis connector&lt;/li&gt;
  &lt;li&gt;Java 21&lt;/li&gt;
  &lt;li&gt;Project Hummingbird, the ongoing effort to incrementally make Trino faster
than ever before&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso, Cole Bowden</name>
        </author>
      

      <summary>Trino Fest 2023 kicked off with a bang, as Trino co-creator and maintainer Martin Traverso gave an update on all the amazing things that have happened to Trino since Trino Summit last year. He also provided some insight into what’s coming down the pipeline for Trino, with a brief look at the project’s roadmap. You can watch the recording of the talk if you want to see for yourself, or you can read on for the highlights.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/Keynote.png" />
      
    </entry>
  
    <entry>
      <title>Trino Fest 2023 recap</title>
      <link href="https://trino.io/blog/2023/06/20/trino-fest-2023-recap.html" rel="alternate" type="text/html" title="Trino Fest 2023 recap" />
      <published>2023-06-20T00:00:00+00:00</published>
      <updated>2023-06-20T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/06/20/trino-fest-2023-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2023/06/20/trino-fest-2023-recap.html">&lt;p&gt;Last week we held Trino Fest, and it kept us all so busy, we forgot to spend
time chilling by the lakehouse! Great demos, amazing announcements, new plugins,
and use cases reached our active audience. Thanks go to our event host and
organizer &lt;a href=&quot;https://www.starburst.io/&quot;&gt;Starburst&lt;/a&gt;, to our sponsors
&lt;a href=&quot;https://aws.amazon.com/&quot;&gt;AWS&lt;/a&gt; and &lt;a href=&quot;https://www.alluxio.io/&quot;&gt;Alluxio&lt;/a&gt;, to our
many well-prepared speakers, and to our great live audience. Now you get a
chance to catch up on anything you missed.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;container&quot;&gt;
  &lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.starburst.io/&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/starburst-small.png&quot; title=&quot; Starburst, event host and organizer &quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://www.alluxio.io/&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/alluxio-small.png&quot; title=&quot;Alluxio, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
    &lt;div class=&quot;col-sm&quot;&gt;
      &lt;a href=&quot;https://aws.amazon.com/&quot;&gt;
        &lt;img src=&quot;https://trino.io/assets/images/logos/aws-small.png&quot; title=&quot;AWS, event sponsor&quot; /&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;In the weeks leading up to the event, we published numerous blog posts, and
racked up great interest in the Trino community and beyond. Over 1100
registrations blew away our numbers from last year. More importantly, during the
two half-days of the event, we had over 560 attendees watching live and
participating in the busy chat.&lt;/p&gt;

&lt;h2 id=&quot;sessions&quot;&gt;Sessions&lt;/h2&gt;

&lt;p&gt;If you could not attend every session, or if you missed out on attending
completely, then we’ve got great news for you! You still have a chance to learn
from the presentations and the experience and knowledge of our speakers.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/06/22/trino-fest-2023-keynote-recap.html&quot;&gt;Trino for lakehouses, data oceans, and beyond&lt;/a&gt;
presented by Martin Traverso, co-creator of Trino and CTO at
&lt;a href=&quot;https://www.starburst.io/&quot;&gt;Starburst&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/06/26/trino-fest-2023-salesforce.html&quot;&gt;Anomaly detection for Salesforce’s production data using
Trino&lt;/a&gt; presented by Geeta Shankar and Tuli Nivas
from &lt;a href=&quot;https://www.salesforce.com/&quot;&gt;Salesforce&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/06/28/trino-fest-2023-starburst-recap.html&quot;&gt;Zero-cost reporting&lt;/a&gt; presented by Jan Waś from
&lt;a href=&quot;https://www.starburst.io/&quot;&gt;Starburst&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/06/30/trino-fest-2023-apacheiceberg.html&quot;&gt;CDC patterns in Apache Iceberg&lt;/a&gt; presented by Ryan
Blue from &lt;a href=&quot;https://tabular.io/&quot;&gt;Tabular&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/03/trino-fest-2023-ibis.html&quot;&gt;Ibis: Because SQL is everywhere and so is Python&lt;/a&gt;
presented by Phillip Cloud from &lt;a href=&quot;https://voltrondata.com/&quot;&gt;Voltron Data&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/05/trino-fest-2023-arcticwolf.html&quot;&gt;AWS Athena (Trino) in the cybersecurity space&lt;/a&gt;
presented by Anas Shakra from &lt;a href=&quot;https://arcticwolf.com/&quot;&gt;Arctic Wolf&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/07/trino-fest-2023-onehouse-recap.html&quot;&gt;Skip rocks and files: Turbocharge Trino queries with Hudi’s multi-modal
indexing subsystem&lt;/a&gt;
presented by Nadine Farah and Sagar Sumit from &lt;a href=&quot;https://www.onehouse.ai/&quot;&gt;OneHouse&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/10/trino-fest-2023-redis.html&quot;&gt;Redis &amp;amp; Trino - Real-time indexed SQL queries (new
connector)&lt;/a&gt; presented by Allen Terleto and
Julien Ruaux from &lt;a href=&quot;https://redis.com/&quot;&gt;Redis&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/12/trino-fest-2023-let-it-snow-recap.html&quot;&gt;Let it SNOW for Trino&lt;/a&gt;
presented by Erik Anderson from &lt;a href=&quot;https://www.bloomberg.com/company/values/tech-at-bloomberg/open-source/projects/&quot;&gt;Bloomberg&lt;/a&gt;
and Yu Teng from &lt;a href=&quot;https://www.ovhcloud.com/en-ie/public-cloud/data-platform/&quot;&gt;ForePaaS&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/14/trino-fest-2023-dune.html&quot;&gt;DuneSQL, a query engine for blockchain data&lt;/a&gt; presented by Miguel Filipe and Jonas
Irgens Kylling from &lt;a href=&quot;https://dune.com/&quot;&gt;Dune&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/17/trino-fest-2023-comcast-recap.html&quot;&gt;Data Mesh implementation using Hive views&lt;/a&gt;
presented by Alejandro Rojas from &lt;a href=&quot;https://comcast.github.io/&quot;&gt;Comcast&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/19/trino-fest-2023-stripe.html&quot;&gt;Inspecting Trino on ice&lt;/a&gt; presented by Kevin Liu
from &lt;a href=&quot;https://stripe.com/&quot;&gt;Stripe&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/21/trino-fest-2023-alluxio-recap.html&quot;&gt;Trino optimization with distributed caching on Data Lake&lt;/a&gt;
presented by Hope Wang and Beinan Wang from &lt;a href=&quot;https://www.alluxio.io/&quot;&gt;Alluxio&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/25/trino-fest-2023-datto.html&quot;&gt;Starburst Galaxy: A romance of many architectures&lt;/a&gt; presented by Benjamin Jeter from
&lt;a href=&quot;https://www.datto.com/&quot;&gt;Datto&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2023/07/27/trino-fest-2023-fugue-recap.html&quot;&gt;FugueSQL, Interoperable Python and Trino for interactive workloads&lt;/a&gt;
presented by &lt;a href=&quot;https://www.linkedin.com/in/kvnkho/&quot;&gt;Kevin Kho&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;next-up&quot;&gt;Next up&lt;/h2&gt;

&lt;p&gt;This first recap shares all the video recordings with you, in case you can’t
wait. But stay tuned, because we’ll also be publishing individual recap blog
posts for each session, and they’ll include additional useful info:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Summary of the main lessons and takeaways from the session&lt;/li&gt;
  &lt;li&gt;Slide decks for you to browse on your own&lt;/li&gt;
  &lt;li&gt;Interesting and fun quotes from the speakers and audience&lt;/li&gt;
  &lt;li&gt;Notes and impressions from the audience and event hosts&lt;/li&gt;
  &lt;li&gt;Questions and answers from the event&lt;/li&gt;
  &lt;li&gt;Links to further documentation, tutorials, and other resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’ll be rolling out recap posts for a few talks each week, so keep an eye out
on our &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;community chat&lt;/a&gt; or the website for updates.&lt;/p&gt;

&lt;p&gt;At the same time, we are already marching ahead and planning towards our next
major event in autumn. Trino Summit 2023 - here we come!&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Cole Bowden</name>
        </author>
      

      <summary>Last week we held Trino Fest, and it kept us all so busy, we forgot to spend time chilling by the lakehouse! Great demos, amazing announcements, new plugins, and use cases reached our active audience. Thanks go to our event host and organizer Starburst, to our sponsors AWS and Alluxio, to our many well-prepared speakers, and to our great live audience. Now you get a chance to catch up on anything you missed.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/trino-fest.png" />
      
    </entry>
  
    <entry>
      <title>Trino Fest nears with an all-star lineup</title>
      <link href="https://trino.io/blog/2023/06/01/trino-fest-hype-speaker-lineup.html" rel="alternate" type="text/html" title="Trino Fest nears with an all-star lineup" />
      <published>2023-06-01T00:00:00+00:00</published>
      <updated>2023-06-01T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/06/01/trino-fest-hype-speaker-lineup</id>
      <content type="html" xml:base="https://trino.io/blog/2023/06/01/trino-fest-hype-speaker-lineup.html">&lt;p&gt;Trino Fest is just around the corner! We’re only two weeks away, and we’re
excited to share that we’ve got an incredible speaker lineup with a wide variety
of talks about all things Trino. If you’re out of the loop,
&lt;a href=&quot;/2023-04-05-announcing-trino-fest-2023.html&quot;&gt;we announced Trino Fest&lt;/a&gt; back in
April as a two-day, free, virtual event. If you want to attend, see talks live,
and engage with our speakers in Q&amp;amp;As at the end of each session, you’ll need to
register, so don’t delay, and…&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-orange&quot; href=&quot;https://www.starburst.io/info/trinofest/&quot;&gt;
        Register to attend!
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;With that said, we’re also excited to bring you a preview of our exciting
speaker lineup. Read on if you’d like to learn more.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;new-connectors&quot;&gt;New connectors&lt;/h2&gt;

&lt;p&gt;We’ve got two talks, one from Bloomberg and ForePaaS and another from Redis,
about ongoing efforts to extend Trino’s functionality to query even more data
sources. Erik Anderson from Bloomberg and Yu Teng from ForePaaS will talk about
their shared need for a Snowflake connector and the collaboration to merge their
two implementations into one and contribute it to Trino. Allen Terleto and Julien Ruaux
at Redis will be talking about a new, custom, and improved Redis connector for
Trino, showing how you can leverage the speed of both Redis and Trino to run
queries faster than ever while seamlessly integrating with data visualization
frameworks.&lt;/p&gt;

&lt;h2 id=&quot;the-python-ecosystem&quot;&gt;The Python ecosystem&lt;/h2&gt;

&lt;p&gt;We’ve got talks from &lt;a href=&quot;https://github.com/fugue-project/fugue&quot;&gt;Fugue&lt;/a&gt; and
&lt;a href=&quot;https://ibis-project.org/&quot;&gt;Ibis&lt;/a&gt;, two different tools that integrate Python
with SQL, and then run that SQL on underlying data sources. Both have recently
added Trino support, and they’re excited to share their use cases and introduce
the Trino community to the new, powerful ways you can leverage it. Trino has
always been a SQL query engine, but with Fugue and Ibis, writing Python code to
run queries with Trino is suddenly a reality, and analysts and data scientists
may not even need to know much SQL to get the insights they’re looking for.&lt;/p&gt;

&lt;h2 id=&quot;data-lakes&quot;&gt;Data lakes&lt;/h2&gt;

&lt;p&gt;Ryan Blue, the co-founder of Iceberg and founder of Tabular, will be exploring
how to best write CDC (change data capture) streams into Iceberg tables. A talk
from Kevin Liu at Stripe will explore how a data engineer can monitor queries
being run on Iceberg to catch performance outliers and understand usage rates. A
talk from Alluxio highlights caching optimizations with Trino and data lakes.
OneHouse is giving a talk about using Trino with Hudi, exploring how to get
query latency down, how multi-modal indexing works in Hudi, and how Trino can
utilize that indexing to execute queries at astonishing speeds. A lightning talk
from Comcast will explore Hive views, and DuneSQL will be discussing its use of
Trino with Delta Lake, rounding out coverage on all four of Trino’s lakehouse
connectors.&lt;/p&gt;

&lt;h2 id=&quot;and-more&quot;&gt;And more!&lt;/h2&gt;

&lt;p&gt;We’ll hear from customers of Trino’s main commercial vendors - Datto will be
discussing their use of Starburst Galaxy, and Arctic Wolf will give an overview
of how AWS Athena helps them provide data to customers. Jan Waś from Starburst
has a lightning talk on avoiding the costs of BI tools or expensive
visualization software by setting things up for free with GitHub Actions. And
Walmart has a talk on finding ways to cut costs with cloud storage, rounding out
our expansive lineup.&lt;/p&gt;

&lt;p&gt;Does any of that sound exciting?
&lt;a href=&quot;https://www.starburst.io/info/trinofest/&quot;&gt;Go sign up to attend Trino Fest 2023&lt;/a&gt;,
and we look forward to seeing you there!&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>Trino Fest is just around the corner! We’re only two weeks away, and we’re excited to share that we’ve got an incredible speaker lineup with a wide variety of talks about all things Trino. If you’re out of the loop, we announced Trino Fest back in April as a two-day, free, virtual event. If you want to attend, see talks live, and engage with our speakers in Q&amp;amp;As at the end of each session, you’ll need to register, so don’t delay, and… Register to attend! With that said, we’re also excited to bring you a preview of our exciting speaker lineup. Read on if you’d like to learn more.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/trino-fest-featured-talks.png" />
      
    </entry>
  
    <entry>
      <title>48: What is Trino?</title>
      <link href="https://trino.io/episodes/48.html" rel="alternate" type="text/html" title="48: What is Trino?" />
      <published>2023-05-31T00:00:00+00:00</published>
      <updated>2023-05-31T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/48</id>
      <content type="html" xml:base="https://trino.io/episodes/48.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Technical Content at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-417-418&quot;&gt;Releases 417-418&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-417.html&quot;&gt;Trino 417&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNION ALL&lt;/code&gt; queries.&lt;/li&gt;
  &lt;li&gt;Faster processing of Parquet data in Hudi, Iceberg, Hive, and Delta Lake
connectors.&lt;/li&gt;
  &lt;li&gt;Faster reads of nested row fields in Delta Lake connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-418.html&quot;&gt;Trino 418&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXECUTE IMMEDIATE&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Add the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;table_changes&lt;/code&gt; function in Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Faster joins on partition columns in Delta Lake, Hive, Hudi, and Iceberg
connectors.&lt;/li&gt;
  &lt;li&gt;Support for fault-tolerant execution in the Oracle connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;question-of-the-episode-what-is-trino&quot;&gt;Question of the episode: What is Trino?&lt;/h2&gt;

&lt;p&gt;We’ve put out nearly 50 Trino Community Broadcast episodes, but we haven’t yet
done the simplest, most obvious topic of them all - an exploration of what Trino
is, how Trino works, and how you can run it. This week, we’re taking a step back
and doing a broader overview of those things, because the world needs to know…
what is Trino?&lt;/p&gt;

&lt;p&gt;If you check the Trino documentation, it starts with a definition of what Trino
isn’t. But we’ll start with what Trino is: a distributed SQL query engine
written in Java. If you have a SQL query, Trino can process and run it on an
extremely wide variety of data sources and return a result to you that you’d
expect from that SQL query. It can run queries on traditional relational
databases like Oracle, MySQL, and PostgreSQL; it works on data lakes like Hive,
Iceberg, Delta Lake, and Hudi; and it runs on NoSQL databases like Cassandra
and MongoDB. You give Trino a query, Trino gives you results. And the best part
is that it doesn’t just work, it works blazing fast.&lt;/p&gt;

&lt;p&gt;The key thing to point out is that Trino does not store data, and it is not a
database on its own. It is a query engine, designed to sit on top of databases
and provide an ANSI-standard SQL interface to query whatever you’re storing your
data in. In order to use Trino, you need to start by having data stored
somewhere else. Of course, Trino can write data to those underlying
sources with the same SQL syntax, so for the end user, it can be an all-in-one
interface to those underlying data sources, an abstraction that saves users from
needing to understand the differences between data being stored in Iceberg and
data being stored in Oracle.&lt;/p&gt;

&lt;h3 id=&quot;how-does-it-work&quot;&gt;How does it work?&lt;/h3&gt;

&lt;p&gt;Trino uses a distributed architecture, with a single coordinator node that
schedules and orchestrates the workload, as well as many worker nodes that
carry out tasks and process data.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-how-do-you-run-trino&quot;&gt;Concept of the episode: How do you run Trino?&lt;/h2&gt;

&lt;p&gt;The better question might be “how can’t you run Trino?” As the project has
matured, it’s been added to various third-party tools and integrated into
different apps that help make it easier to run than ever before. We have some
exciting news to share on that front soon, but for now, the biggest ways to run
Trino include:&lt;/p&gt;

&lt;h3 id=&quot;tarball&quot;&gt;Tarball&lt;/h3&gt;

&lt;p&gt;You can directly download the Trino server, manually configure it, and start it
up like any other program. Clients can connect to the server from there,
utilizing the web interface or the CLI to run queries. This is the most manual
way to set up Trino, but it works, and it doesn’t depend on anything else.
&lt;a href=&quot;https://trino.io/docs/current/installation/deployment.html&quot;&gt;Our docs go into a ton of detail on this process.&lt;/a&gt;&lt;/p&gt;
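For anyone curious what that manual process looks like, here’s a rough sketch; the version number and the single-node configuration values below are illustrative, and the docs describe the other required files (node.properties and jvm.config) that this skips:

```shell
# Download and unpack the server tarball (version is illustrative).
curl -fsSLO https://repo1.maven.org/maven2/io/trino/trino-server/418/trino-server-418.tar.gz
tar -xzf trino-server-418.tar.gz
cd trino-server-418

# Minimal single-node config.properties; the deployment docs also cover
# etc/node.properties and etc/jvm.config, which are required as well.
mkdir -p etc
cat > etc/config.properties <<'EOF'
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
discovery.uri=http://localhost:8080
EOF

# Run the server in the foreground; use `bin/launcher start` to daemonize.
bin/launcher run
```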

&lt;h3 id=&quot;docker&quot;&gt;Docker&lt;/h3&gt;

&lt;p&gt;Trino provides a Docker image that can be run through the Docker software. You
start by downloading and installing Docker, pull the Trino image, and then run a
container from it to immediately get Trino up
and running. No manual configuration needed, no messing around with creating
directories or files, it just works. It’s perhaps the simplest way to get Trino
off the ground, and recommended for anyone trying to run it independently just
to fiddle around with it.
&lt;a href=&quot;https://trino.io/docs/current/installation/containers.html&quot;&gt;As always, you can refer to the docs for more information.&lt;/a&gt;&lt;/p&gt;
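As a sketch of just how little is involved (the image comes from Docker Hub, and the sample query assumes the tpch catalog that ships in the default configuration):

```shell
# Start a Trino container with the default configuration.
docker run -d --name trino -p 8080:8080 trinodb/trino

# Once the container reports healthy, run a query with the bundled CLI.
docker exec -it trino trino --execute "SELECT count(*) FROM tpch.tiny.nation"
```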

&lt;h3 id=&quot;kubernetes-and-helm&quot;&gt;Kubernetes and Helm&lt;/h3&gt;

&lt;p&gt;Trino provides a Helm chart for use with Kubernetes, so after setting up
Kubernetes, kubectl, and Helm, you can install Trino on your Kubernetes cluster
with Helm. It comes with the same pre-configured image as Docker, so there’s no
need to manually set that up, but in order to run queries, you’ll also need to
set up a tunnel between the coordinator pod within Kubernetes and whatever
machine you want to run those queries on. If this is the right setup for you,
you probably already know that, and you don’t need us to go into more detail.
&lt;a href=&quot;https://trino.io/docs/current/installation/kubernetes.html&quot;&gt;More info is in the Trino docs.&lt;/a&gt;&lt;/p&gt;
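A rough sketch of that flow, assuming the community Helm chart repository and its usual release-name-trino service naming convention:

```shell
# Add the Trino chart repository and install with default values.
helm repo add trino https://trinodb.github.io/charts
helm install my-trino trino/trino

# Tunnel the coordinator port to your machine so clients can connect.
kubectl port-forward svc/my-trino-trino 8080:8080
```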

&lt;h3 id=&quot;trino-clients&quot;&gt;Trino clients&lt;/h3&gt;

&lt;p&gt;On the most basic side of things, Trino provides a command-line interface and a
web UI. If you want something more robust, a couple of open source clients have
been created in the community -
&lt;a href=&quot;https://github.com/trinodb/trino-python-client&quot;&gt;one written for Python&lt;/a&gt; and
&lt;a href=&quot;https://github.com/trinodb/trino-go-client&quot;&gt;one written in Go&lt;/a&gt;. There are a
couple of other Python clients that will be even easier to run coming soon, and
we’ll be hearing from them at Trino Fest in just two weeks.&lt;/p&gt;
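To give a taste of the Python client, here is a minimal sketch using trino-python-client (installed with pip install trino); the host, user, and catalog values are placeholders, and it assumes a Trino server is already running locally:

```python
# Minimal sketch with the trino-python-client; values are placeholders
# and a Trino server must already be listening on localhost:8080.
from trino.dbapi import connect

conn = connect(host="localhost", port=8080, user="demo")
cur = conn.cursor()
cur.execute("SELECT nationkey, name FROM tpch.tiny.nation LIMIT 3")
for row in cur.fetchall():
    print(row)
```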

&lt;h3 id=&quot;or&quot;&gt;Or…&lt;/h3&gt;

&lt;p&gt;On the not-so-free side of things, Starburst Galaxy and AWS Athena offer Trino
as a cloud service, which can make life even easier.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-how-can-you-contribute-to-trino&quot;&gt;Concept of the episode: How can you contribute to Trino?&lt;/h2&gt;

&lt;p&gt;We’ve got a page on the website dedicated to
&lt;a href=&quot;https://trino.io/development/process.html&quot;&gt;the contribution process&lt;/a&gt;, though we’d
like to welcome anyone and everyone listening to take a crack at contributing to
Trino if it’s something you’re interested in. Open source projects can always
use more help, and we’d like to see community contributions whenever possible. From that
process page, the steps are:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Sign the CLA.&lt;/li&gt;
  &lt;li&gt;Make sure your contribution is something that Trino wants/needs.&lt;/li&gt;
  &lt;li&gt;Implement your change.&lt;/li&gt;
  &lt;li&gt;Open a pull request.&lt;/li&gt;
  &lt;li&gt;Request and wait for a review.&lt;/li&gt;
  &lt;li&gt;Address review comments.&lt;/li&gt;
  &lt;li&gt;Wait for it to be merged.&lt;/li&gt;
  &lt;li&gt;Wait for the next release, and then… your code change is in Trino!&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;pr-of-the-episode-11701-support-nessie-catalog-in-iceberg-connector&quot;&gt;PR of the episode: #11701: Support Nessie Catalog in Iceberg connector&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://projectnessie.org/&quot;&gt;Nessie&lt;/a&gt; is a transactional catalog designed for use
with data lakes like Iceberg and Delta Lake. Its key selling point is git-like
version control, making it easy to view history, roll back, and see who made
what adjustments when. &lt;a href=&quot;https://github.com/trinodb/trino/pull/11701&quot;&gt;PR #11701&lt;/a&gt;
allows Trino’s Iceberg connector to query Nessie, adding yet another tool and
opportunity for query federation to Trino’s belt.&lt;/p&gt;

&lt;p&gt;And though we hate to say it, Nessie might just be the only other project in the
world with a mascot that can compete with Commander Bun Bun.&lt;/p&gt;

&lt;h2 id=&quot;trino-events&quot;&gt;Trino events&lt;/h2&gt;

&lt;p&gt;Coming up in just two weeks, Trino Fest is a two-day event that will feature
talks from a wide range of speakers surrounding the Trino ecosystem. As already
hinted at, we’ll be hearing from a couple new Python clients, from Trino users
sharing tips and tricks to maximize the utility of the software, and from
community contributors adding exciting new features and extensions to Trino.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/info/trinofest/&quot;&gt;Register to attend&lt;/a&gt; if you’re
interested and want to tune in to an awesome speaker lineup! It’s virtual and
completely free to attend, so all you’ve got to do is sign up.&lt;/p&gt;

&lt;p&gt;If you have an event that is related to Trino, let us know so we can add it to
the &lt;a href=&quot;https://trino.io/community.html#events&quot;&gt;Trino events calendar&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino at Open Source Summit North America 2023</title>
      <link href="https://trino.io/blog/2023/05/15/oss-na.html" rel="alternate" type="text/html" title="Trino at Open Source Summit North America 2023" />
      <published>2023-05-15T00:00:00+00:00</published>
      <updated>2023-05-15T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/05/15/oss-na</id>
      <content type="html" xml:base="https://trino.io/blog/2023/05/15/oss-na.html">&lt;p&gt;Last week, I had the pleasure to attend &lt;a href=&quot;https://events.linuxfoundation.org/open-source-summit-north-america/&quot;&gt;Open Source Summit North America
2023&lt;/a&gt; in
Vancouver. A quick hop across the &lt;a href=&quot;https://en.wikipedia.org/wiki/Strait_of_Georgia&quot;&gt;Strait of
Georgia&lt;/a&gt; got me right into the
event and into the midst of my peers of open source developers, advocates, and
enthusiasts.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;A highlight of the event for me was catching up with many existing and new
friends from the open source communities. It was inspiring to learn details
about the success of open source projects, including
&lt;a href=&quot;https://opensearch.org/&quot;&gt;Opensearch&lt;/a&gt;, &lt;a href=&quot;https://riscv.org/about/&quot;&gt;RISC-V&lt;/a&gt;, the
British Columbia government &lt;a href=&quot;https://developer.gov.bc.ca/&quot;&gt;DevHub project&lt;/a&gt;, NASA
&lt;a href=&quot;https://code.nasa.gov/&quot;&gt;open source&lt;/a&gt; and &lt;a href=&quot;https://data.nasa.gov/&quot;&gt;open data
projects&lt;/a&gt;, and many others.&lt;/p&gt;

&lt;p&gt;In my interview with John Furrier and Rob Strechay for &lt;a href=&quot;https://www.thecube.net/&quot;&gt;SiliconANGLE
theCUBE&lt;/a&gt;, I was able to share more information about
Trino, query engines, lakehouses, and &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;. We also
talked about the benefits of using Trino for different use cases, how data
continues to be crucial, and how it is even more important thanks to the new
wave of large language models.&lt;/p&gt;

&lt;div style=&quot;padding-bottom: 1rem&quot;&gt;
  &lt;a class=&quot;btn btn-orange&quot; style=&quot;display: inline-grid;&quot; href=&quot;https://siliconangle.com/2023/05/11/making-data-accessibility-faster-and-friendly-using-distributed-query-insights-ossummit/&quot; target=&quot;_blank&quot;&gt;Read more about the interview and watch the video&lt;/a&gt;
&lt;/div&gt;

&lt;p&gt;SiliconANGLE theCUBE features &lt;a href=&quot;https://www.thecube.net/events/linux-foundation/open-source-summit-na-2023&quot;&gt;more interview coverage from the
summit&lt;/a&gt;,
and The Linux Foundation &lt;a href=&quot;https://events.linuxfoundation.org/open-source-summit-north-america/&quot;&gt;makes keynote and session videos as well as
presentation decks available&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;My special thanks goes to Starburst for sending me to represent the Trino
community at the summit. I also really appreciate the help with organizing Trino
Fest. The speaker proposals are all in, and the free, virtual event is promising
to be a great showcase of Trino, modern lakehouse platforms and tools from the
community of users, contributors and vendors, and our increased adoption for a
wide range of use cases.&lt;/p&gt;

&lt;div style=&quot;padding-bottom: 1rem&quot;&gt;
  &lt;a class=&quot;btn btn-pink&quot; style=&quot;display: inline-grid;&quot; href=&quot;https://www.starburst.io/info/trinofest/&quot; target=&quot;_blank&quot;&gt;Register for Trino Fest 2023&lt;/a&gt;
&lt;/div&gt;

&lt;p&gt;Join us in June for the event; you don’t want to miss some of the announcements
and demos.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Last week, I had the pleasure to attend Open Source Summit North America 2023 in Vancouver. A quick hop across the Strait of Georgia got me right into the event and into the midst of my peers of open source developers, advocates, and enthusiasts.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/manfred-open-source-summit.jpg" />
      
    </entry>
  
    <entry>
      <title>47: Meet the new Trino maintainers</title>
      <link href="https://trino.io/episodes/47.html" rel="alternate" type="text/html" title="47: Meet the new Trino maintainers" />
      <published>2023-05-05T00:00:00+00:00</published>
      <updated>2023-05-05T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/47</id>
      <content type="html" xml:base="https://trino.io/episodes/47.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Technical Content at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/pettyjamesm&quot;&gt;James Petty&lt;/a&gt;, Senior Software Engineer at AWS&lt;/li&gt;
  &lt;li&gt;Also Manfred. Kind of.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-411-416&quot;&gt;Releases 411-416&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-411.html&quot;&gt;Trino 411&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;migrate&lt;/code&gt; procedure to convert a Hive table to Iceberg.&lt;/li&gt;
  &lt;li&gt;Join and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIKE&lt;/code&gt; pushdown in Ignite.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; in Ignite.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;procedure&lt;/code&gt; table function for executing stored procedures in SQL Server.&lt;/li&gt;
  &lt;li&gt;Faster join queries over Hive bucketed tables.&lt;/li&gt;
  &lt;li&gt;Faster planning for tables with many columns in Hive.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-412.html&quot;&gt;Trino 412&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;exclude_columns&lt;/code&gt; table function.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ADD COLUMN&lt;/code&gt; in Ignite.&lt;/li&gt;
  &lt;li&gt;Support for table comments in PostgreSQL connector.&lt;/li&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sum(DISTINCT ...)&lt;/code&gt; queries for various connectors.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-413.html&quot;&gt;Trino 413&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; in the Phoenix connector.&lt;/li&gt;
  &lt;li&gt;Support for table comments in the Oracle connector.&lt;/li&gt;
  &lt;li&gt;Improved performance of queries involving window functions or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-414.html&quot;&gt;Trino 414&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Experimental support for tracing using OpenTelemetry.&lt;/li&gt;
  &lt;li&gt;Support for Databricks 12.2 LTS in Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Support for fault-tolerant execution in Redshift connector.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sequence&lt;/code&gt; table function.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-415.html&quot;&gt;Trino 415&lt;/a&gt; and
&lt;a href=&quot;https://trino.io/docs/current/release/release-416.html&quot;&gt;Trino 416&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;A whole lot of minor performance improvements.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introducing-the-two-new-trino-maintainers&quot;&gt;Introducing the two new Trino maintainers&lt;/h2&gt;

&lt;p&gt;Manfred should hardly need an introduction to Trino Community Broadcast viewers,
as he’s been around and hosting episodes from the beginning, and authored
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;.
In the background, he’s also been quietly working on docs, the website, and
a wide variety of other initiatives in the Trino community.&lt;/p&gt;

&lt;p&gt;James should also be familiar to anyone who has contributed to Trino. Iconically
rocking a GitHub avatar of the face of
&lt;a href=&quot;https://en.wikipedia.org/wiki/Bob_Ross&quot;&gt;Bob Ross&lt;/a&gt;, it’s hard to miss when he
shows up on a pull request. And working on Trino as part of
&lt;a href=&quot;https://aws.amazon.com/athena/&quot;&gt;AWS Athena&lt;/a&gt;, he’s been a major engineering
contributor for the last several years, with 262 commits under his belt and more
on the way.&lt;/p&gt;

&lt;h2 id=&quot;what-is-a-maintainer&quot;&gt;What is a maintainer?&lt;/h2&gt;

&lt;p&gt;If you don’t go clicking around on the Trino website fanatically trying to find
everything you can possibly read about the project, there’s a chance you’ve
never bumped into our &lt;a href=&quot;https://trino.io/development/roles.html&quot;&gt;roles&lt;/a&gt; page,
which highlights how Trino is governed. To quote that page:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;In Trino, maintainer is an active role. A maintainer is responsible for
merging code only after ensuring it has been reviewed thoroughly and aligns with
the Trino vision and guidelines. In addition to merging code, a maintainer
actively participates in discussions and reviews. Being a maintainer does not
grant additional rights in the project to make changes, set direction, or
anything else that does not align with the direction of the project. Instead, a
maintainer is expected to bring these to the project participants as needed to
gain consensus. The maintainer role is for an individual, so if a maintainer
changes employers, the role is retained. However, if a maintainer is no longer
actively involved in the project, their maintainer status will be reviewed.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Or, in normal speech, a maintainer is a trusted individual with merge rights.
But with great power comes great responsibility, higher standards, and an
expectation to be an active steward of the Trino project. It’s not easy to
become a maintainer - prior to Manfred and James, it had been over a year since
the most recent maintainer was appointed. The high bar of activity, quality, and
attitude is not trivial by any stretch, and so we’re excited to talk to them
about the role, how they got here, and what they’re looking forward to for the
future of Trino.&lt;/p&gt;

&lt;h2 id=&quot;the-path-to-becoming-a-maintainer&quot;&gt;The path to becoming a maintainer&lt;/h2&gt;

&lt;h3 id=&quot;manfred&quot;&gt;Manfred&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;When did you first start working on Trino?&lt;/li&gt;
  &lt;li&gt;What’s your proudest contribution to the project?&lt;/li&gt;
  &lt;li&gt;Have a funny story you’ve wanted to share with the world?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;james&quot;&gt;James&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;When did you first start working on Trino?&lt;/li&gt;
  &lt;li&gt;What’s your proudest contribution to the project?&lt;/li&gt;
  &lt;li&gt;Why the Bob Ross avatar?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pr-of-the-episode-16753-improve-topn-row-number--rank-performance&quot;&gt;PR of the episode: &lt;a href=&quot;https://github.com/trinodb/trino/pull/16753&quot;&gt;16753: Improve TopN row number / rank performance&lt;/a&gt;&lt;/h2&gt;

&lt;p&gt;We normally focus on flashy and user-facing PRs for the PR of the episode, but
this week, courtesy of our guest James, we’re going to highlight something that
better represents the more routine work that’s going on in Trino all the time:
a performance improvement.&lt;/p&gt;

&lt;h2 id=&quot;trino-events&quot;&gt;Trino events&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/info/trinofest/&quot;&gt;Trino Fest&lt;/a&gt; is coming up in just a
couple months. Register to attend or
&lt;a href=&quot;https://sessionize.com/trino-fest-2023&quot;&gt;sign up to submit a talk&lt;/a&gt; if you have
something to share!&lt;/p&gt;

&lt;p&gt;If you have an event that is related to Trino, let us know so we can add it to
the &lt;a href=&quot;https://trino.io/community.html#events&quot;&gt;Trino events calendar&lt;/a&gt;. Kevin Haley’s
&lt;a href=&quot;https://www.meetup.com/boston-data-engineering/events/291662797/&quot;&gt;Getting to Know Trino&lt;/a&gt;
in Boston was a great success, and we’d love to hear from other Trino community 
members who’d be interested in hosting other events!&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Refreshing at the lakehouse summer camp</title>
      <link href="https://trino.io/blog/2023/05/03/refresh-at-trino-fest.html" rel="alternate" type="text/html" title="Refreshing at the lakehouse summer camp" />
      <published>2023-05-03T00:00:00+00:00</published>
      <updated>2023-05-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/05/03/refresh-at-trino-fest</id>
      <content type="html" xml:base="https://trino.io/blog/2023/05/03/refresh-at-trino-fest.html">&lt;p&gt;Summer is just around the corner, and we are busy getting ready for &lt;a href=&quot;/blog/2023/04/05/announcing-trino-fest-2023.html&quot;&gt;Trino Fest
2023&lt;/a&gt;. Everything is
ramping up. Early birds are starting to register, and &lt;a href=&quot;https://www.starburst.io/info/trinofest&quot;&gt;so should
you&lt;/a&gt;. Our Trino Fest theme song is
available for your listening pleasure, and we are reviewing speaker submissions.
The festival is promising to be another great event to learn about lakehouse use
cases with Trino, but we are also featuring some great presentations for
querying data with Trino. And of course, we are still looking for more
presenters, so don’t hesitate and &lt;a href=&quot;https://sessionize.com/trino-fest-2023&quot;&gt;submit your
proposal&lt;/a&gt;.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Before you dive into the technical details of our upcoming conference, lean back
and listen to our theme song. Hopefully you are feeling the summer vibe coming
your way already.&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/6oN-70jSbF8&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;Our event host &lt;a href=&quot;https://www.starburst.io/&quot;&gt;Starburst&lt;/a&gt; is again helping us ensure
that Trino Fest is a venue for Trino beginners and experts to meet, exchange
ideas, and learn from each other. One of the Starburst engineers, &lt;a href=&quot;https://github.com/nineinchnick&quot;&gt;Jan
Waś&lt;/a&gt;, is scheduled to present about his
amazingly low-effort setup to use Trino for data analysis and report generation.&lt;/p&gt;

&lt;p&gt;Getting closer to the theme of the event “Lakehouse summer camp”, we are
planning to have sessions about Iceberg, Delta Lake, and Hudi usage with Trino.
Learn about the latest developments from these projects and practical tips and
tricks from the user community.&lt;/p&gt;

&lt;p&gt;In the keynote, Martin Traverso will speak about the many new features that
arrived in Trino since &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;Trino Summit last year&lt;/a&gt;. This includes the new Apache Ignite
connector we talked about in the &lt;a href=&quot;https://trino.io/episodes/46.html&quot;&gt;Trino Community Broadcast episode
46&lt;/a&gt;. At Trino Fest we are going to share some
more exciting news about new connectors and integrations for Trino. Specifically
on the client tooling side you can expect some great demos and news from the
Python community.&lt;/p&gt;

&lt;p&gt;So what are you waiting for? It’s time to register for the event. And if you
think you also want to share your knowledge and usage of Trino, submit a speaker
proposal.&lt;/p&gt;

&lt;div style=&quot;padding-bottom: 1rem&quot;&gt;
  &lt;a class=&quot;btn btn-orange&quot; style=&quot;display: inline-grid;&quot; href=&quot;https://www.starburst.io/info/trinofest/&quot; target=&quot;_blank&quot;&gt;Register&lt;/a&gt;
  &lt;a class=&quot;btn btn-pink&quot; style=&quot;display: inline-grid;&quot; href=&quot;https://sessionize.com/trino-fest-2023&quot; target=&quot;_blank&quot;&gt;Submit a talk&lt;/a&gt;
&lt;/div&gt;

&lt;p&gt;In either case, as your hosts and guides through the two half days, we look
forward to having you at the event.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred and Cole&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Cole Bowden</name>
        </author>
      

      <summary>Summer is just around the corner, and we are busy getting ready for Trino Fest 2023. Everything is ramping up. Early birds are starting to register, and so should you. Our Trino Fest theme song is available for your listening pleasure, and we are reviewing speaker submissions. The festival is promising to be another great event to learn about lakehouse use cases with Trino, but we are also featuring some great presentations for querying data with Trino. And of course, we are still looking for more presenters, so don’t hesitate and submit your proposal.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/trino-fest.png" />
      
    </entry>
  
    <entry>
      <title>Just the right time date predicates with Iceberg</title>
      <link href="https://trino.io/blog/2023/04/11/date-predicates.html" rel="alternate" type="text/html" title="Just the right time date predicates with Iceberg" />
      <published>2023-04-11T00:00:00+00:00</published>
      <updated>2023-04-11T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/04/11/date-predicates</id>
      <content type="html" xml:base="https://trino.io/blog/2023/04/11/date-predicates.html">&lt;p&gt;In the data lake world, data partitioning is a technique that is critical to the
performance of read operations. In order to avoid scanning large amounts of data
accidentally, and also to limit the number of partitions that are being
processed by a query, a query engine must push down constant expressions when
filtering partitions.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Partitions in an Iceberg table tend to be fairly large, containing up to tens or
even hundreds of data files. It is therefore crucial to be able to skip
irrelevant partitions while scanning a table in order to ensure high performance
query processing speed. When a table is created in a data lake, its partitioning
scheme constitutes a de-facto index, speeding up queries against it by pruning
out irrelevant partitions from the scan operation.&lt;/p&gt;

&lt;p&gt;Date and time are natural and universal partitioning candidates. Common
partition patterns revolve around month, day, or hour. One exciting feature of the
Iceberg table format is its &lt;a href=&quot;https://trino.io/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html#partition-specification-evolution&quot;&gt;hidden
partitioning&lt;/a&gt;.
Iceberg uses handy
&lt;a href=&quot;https://trino.io/docs/current/connector/iceberg.html#partitioned-tables&quot;&gt;transforms&lt;/a&gt;
such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;year&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;month&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;day&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hour&lt;/code&gt; to deal with the complexities of mapping
a raw timestamp value to an actual partition value in a manner that is
transparent to the user.&lt;/p&gt;

&lt;p&gt;Let’s look at a typical example of an Iceberg table containing log events which
are partitioned by day:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;logs&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;time&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;zone&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;level&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;message&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partitioning&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ARRAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;day(event_time)&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When dealing with logs, it often happens that we want to know what happened
today or within the last few days:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;logs&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CURRENT_DATE&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;logs&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CURRENT_DATE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;7&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DAY&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;constant-folding&quot;&gt;Constant folding&lt;/h2&gt;

&lt;p&gt;Trino handles these types of queries with the &lt;em&gt;constant folding&lt;/em&gt; optimization
technique: it internally rewrites the filter expression as a comparison predicate
against a constant that is evaluated once, before the query executes, so that the
same expression is not recalculated for each row scanned:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/date-predicates/constant_folding.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
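
&lt;p&gt;For illustration (assuming, hypothetically, that the query runs on
2023-04-11), the seven-day filter shown above would be folded into a comparison
against a single precomputed constant, roughly:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;event_time &amp;gt;= TIMESTAMP &apos;2023-04-04 00:00:00.000000 UTC&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;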

&lt;h2 id=&quot;predicate-pushdown&quot;&gt;Predicate pushdown&lt;/h2&gt;

&lt;p&gt;Another common query scenario for log data is to query for a specific date in
the past. A seasoned SQL user, being aware of the underlying data type of the
partitioning column, would likely specify the date to be queried explicitly as
two timestamp constant filter expressions:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;logs&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;TIMESTAMP&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2022-01-20 00:00:00.000000 UTC&apos;&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;TIMESTAMP&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2022-01-21 00:00:00.000000 UTC&apos;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;A different flavor of the above-mentioned query would be to use
the &lt;a href=&quot;/docs/current/functions/comparison.html#range-operator-between&quot;&gt;BETWEEN&lt;/a&gt;
range operator:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;logs&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BETWEEN&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;TIMESTAMP&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2022-01-20 00:00:00.000000 UTC&apos;&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;TIMESTAMP&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2022-01-20 23:59:59.999999 UTC&apos;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Users can focus on writing queries that are concise and readable, and leave
the optimization grunt work to the query engine.&lt;/p&gt;

&lt;p&gt;A succinct way of querying the logs for a specific day would be to cast the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timestamp&lt;/code&gt; field value to its corresponding &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date&lt;/code&gt; value and compare it with
the day containing the relevant logs:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;logs&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;DATE&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2022-01-20&apos;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In this case, Trino &lt;a href=&quot;https://github.com/trinodb/trino/commit/49be4c2a&quot;&gt;unwraps the initial temporal
filter&lt;/a&gt; into a filter that tests
whether the column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; is within the constant timestamp range
corresponding to the date used in the initial filter, which is equivalent to the
most efficient of the explicit filters mentioned above.&lt;/p&gt;

&lt;p&gt;A different approach of querying the log data for a specific date is to use the
&lt;a href=&quot;/docs/current/functions/datetime.html#truncation-function&quot;&gt;date_trunc&lt;/a&gt;
function:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;logs&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;date_trunc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;day&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;DATE&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2022-01-20&apos;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Trino again &lt;a href=&quot;https://github.com/trinodb/trino/commit/80c079f9&quot;&gt;replaces the initial temporal
filter&lt;/a&gt; with a filter testing
whether the column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; is within the constant timestamp range
corresponding to the date used in the initial filter.&lt;/p&gt;

&lt;p&gt;A slightly different use case is querying the log data to see whether an exotic
error type is recorded in the logs during previous months of the current year by
making use of the
&lt;a href=&quot;/docs/current/functions/datetime.html#year&quot;&gt;year()&lt;/a&gt; function:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;logs&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
  &lt;span class=&quot;nb&quot;&gt;year&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2023&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This time, Trino &lt;a href=&quot;https://github.com/trinodb/trino/commit/b8967a3c1550b6e64ad8d3e7979ea46fbfc51550&quot;&gt;rewrites the temporal
filter&lt;/a&gt;
applied to the column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BETWEEN&lt;/code&gt; filter for the unfolded date
range corresponding to the entire span of the specified year:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;event_time&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BETWEEN&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;TIMESTAMP&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2023-01-01 00:00:00.000000 UTC&apos;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;TIMESTAMP&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2023-12-31 23:59:59.999999 UTC&apos;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Without predicate pushdown, the filtering is done by Trino on each tuple, after
scanning the entire content of the table:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/date-predicates/filter_basic_data_flow.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The optimization techniques employed by Trino to speed up the above mentioned
types of queries all involve replacing the provided filter with an equivalent
filter expression. Constant replacement optimizations compare the table column
against a constant or a constant range, so that the filter can be pushed down
to &lt;a href=&quot;https://iceberg.apache.org/&quot;&gt;Iceberg&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As a consequence, the partition pruning happens on the metadata layer of the
table instead of filtering on top of the data itself, dramatically reducing the
amount of actual data files scanned:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/date-predicates/filter_push_down_data_flow.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As described in the &lt;a href=&quot;https://iceberg.apache.org/spec/&quot;&gt;Iceberg Table Spec&lt;/a&gt;, for
any snapshot of the table, Iceberg tracks each individual data file and the
partition to which it belongs. Iceberg uses a hierarchical index in its metadata
layer by storing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lower_bounds&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;upper_bounds&lt;/code&gt; for:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;each partition in the manifest list files&lt;/li&gt;
  &lt;li&gt;each data file in the manifest files&lt;/li&gt;
&lt;/ul&gt;
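
&lt;p&gt;This metadata can be inspected directly from Trino through the Iceberg
connector metadata tables. As a sketch, assuming the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;logs&lt;/code&gt; table from the
earlier examples, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$files&lt;/code&gt; table exposes the per-file bounds:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT file_path, lower_bounds, upper_bounds
FROM &quot;logs$files&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;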

&lt;p&gt;Desugaring seemingly variable filter expressions to comparison predicates
involving only columns and constants or constant ranges pays off. Not only does
it prune out partitions, but it can also skip portions of a data file (for
example, an Apache Parquet row group) or even the entire data file. For
instance, for a filter on a non-partition column, pruning and skipping can
occur if the queried value range does not overlap with the range of values
recorded for the file in the Iceberg metadata.&lt;/p&gt;

&lt;p&gt;To put things in perspective, the optimization techniques presented in this
article, which are already integrated in Trino, can allow queries containing
selective temporal filters to complete in seconds rather than hours, depending
on the size of the table scanned.&lt;/p&gt;

&lt;p&gt;A reader keen to experiment and discover whether the previously mentioned
optimization techniques are actually effective can use
&lt;a href=&quot;/docs/current/sql/explain.html&quot;&gt;EXPLAIN&lt;/a&gt; to examine the output
of the query planning stage. If the temporal predicate employed in the query is
pushed down, the scan operation should report fewer rows than the total number
of rows in the table.&lt;/p&gt;
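
&lt;p&gt;For example, prefixing one of the earlier queries with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt; surfaces the
plan, including the predicate that is passed down to the connector:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;EXPLAIN
SELECT *
FROM logs
WHERE CAST(event_time AS date) = DATE &apos;2022-01-20&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;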

&lt;p&gt;The queries in this post showcase just a small fraction of the many techniques
that can be employed to query date and time columns.
Trino continuously strives to streamline its users’ workflows by providing the
results of queries as fast as possible.&lt;/p&gt;</content>

      
        <author>
          <name>Marius Grama</name>
        </author>
      

      <summary>In the data lake world, data partitioning is a technique that is critical to the performance of read operations. In order to avoid scanning large amounts of data accidentally, and also to limit the number of partitions that are being processed by a query, a query engine must push down constant expressions when filtering partitions.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/date-predicates/christian-pfeifer-l6OraG-v0d8-unsplash.jpg" />
      
    </entry>
  
    <entry>
      <title>Polish edition of Trino: The Definitive Guide</title>
      <link href="https://trino.io/blog/2023/04/06/the-definitive-guide-2-pl.html" rel="alternate" type="text/html" title="Polish edition of Trino: The Definitive Guide" />
      <published>2023-04-06T00:00:00+00:00</published>
      <updated>2023-04-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/04/06/the-definitive-guide-2-pl</id>
      <content type="html" xml:base="https://trino.io/blog/2023/04/06/the-definitive-guide-2-pl.html">&lt;p&gt;At this stage Trino is used all around the globe as we know from the &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;community
chat&lt;/a&gt; and &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;our speakers at Trino Summit 2022&lt;/a&gt;. One large community of Trino
contributors and maintainers, many employed by &lt;a href=&quot;http://starburst.io&quot;&gt;Starburst&lt;/a&gt;,
is located in Poland. Developers and users in Poland also participate very
actively in the Java and Big Data communities.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Today, we are happy to announce that a translation of the book &lt;a href=&quot;https://trino.io/trino-the-definitive-guide.html&quot;&gt;Trino: The
Definitive Guide&lt;/a&gt; to Polish is
now available for the communities in Poland and beyond. We invite you all to get
your own copy:&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://ksiazki.promise.pl/produkt/trino-profesjonalny-przewodnik-sql-w-dowolnej-skali-w-dowolnym-magazynie-i-w-dowolnym-srodowisku/&quot;&gt;
        Trino Profesjonalny Przewodnik
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;Our thanks for making this happen go out to the teams at O’Reilly and
&lt;a href=&quot;https://ksiazki.promise.pl/&quot;&gt;Promise&lt;/a&gt;. We hope many readers will benefit from
the translated edition.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred, Martin, and Matt&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Martin Traverso, Matt Fuller</name>
        </author>
      

<summary>At this stage Trino is used all around the globe as we know from the community chat and our speakers at Trino Summit 2022. One large community of Trino contributors and maintainers, many employed by Starburst, is located in Poland. Developers and users in Poland also participate very actively in the Java and Big Data communities.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/ttdg2-pl-cover.png" />
      
    </entry>
  
    <entry>
      <title>Trino and the BDFL model: a renewed focus</title>
      <link href="https://trino.io/blog/2023/04/06/trino-bdfl-focus.html" rel="alternate" type="text/html" title="Trino and the BDFL model: a renewed focus" />
      <published>2023-04-06T00:00:00+00:00</published>
      <updated>2023-04-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/04/06/trino-bdfl-focus</id>
      <content type="html" xml:base="https://trino.io/blog/2023/04/06/trino-bdfl-focus.html">&lt;p&gt;For those who are paying close attention, you may notice updates to a few pages
across the Trino website with a renewed focus on leadership roles in Trino. This
is part of an effort to re-focus and make the operating model more transparent
both for contributors and for end users. While this is not a functional change,
this does involve clarifying our roles following the
&lt;a href=&quot;https://en.wikipedia.org/wiki/Benevolent_dictator_for_life&quot;&gt;BDFL (benevolent dictator for life)&lt;/a&gt;
model.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Trino has been a popular open source project used by many companies and
organizations since its inception in 2012. As a founder-led project, it has
consistently operated under a BDFL model, though not necessarily by name. The
model is used to describe the persons who can make the final decisions for the
direction and development of the project. Many successful open-source projects,
including Linux, Python, Scala, Ruby, and Rust, operate using a BDFL model.&lt;/p&gt;

&lt;h2 id=&quot;why-the-bdfl-model&quot;&gt;Why the BDFL model?&lt;/h2&gt;

&lt;p&gt;One of the key benefits of the BDFL model is that it allows for a clear
decision-making process. When a project has a large number of contributors, it
can be difficult to reach consensus on certain issues. The BDFL can step in and
make the final decision, which can be particularly helpful in situations where
time is of the essence. Additionally, having a BDFL can provide a sense of
stability and direction for the project.&lt;/p&gt;

&lt;p&gt;It’s important to emphasize that the use of the BDFL model is not a new
development in Trino’s history. We (Dain, David and Martin) have acted in 
this role since the beginning.&lt;/p&gt;

&lt;h2 id=&quot;why-now&quot;&gt;Why now?&lt;/h2&gt;

&lt;p&gt;Why is there a renewed focus on the BDFL model now? Trino has reached a level
of maturity and community size that makes it increasingly important to have
clear leadership and decision-making processes. By making the BDFL model more
explicit, we can ensure that the project remains focused and continues to deliver
value to its users.&lt;/p&gt;

&lt;h2 id=&quot;more-info&quot;&gt;More info&lt;/h2&gt;

&lt;p&gt;You can check out the following pages for additional information:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/development/roles.html&quot;&gt;Roles&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/development/process.html&quot;&gt;Development process&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/individual-code-of-conduct.html&quot;&gt;Individual code of conduct&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content>

      
        <author>
          <name>Martin Traverso, Dain Sundstrom, David Phillips</name>
        </author>
      

      <summary>For those who are paying close attention, you may notice updates to a few pages across the Trino website with a renewed focus on leadership roles in Trino. This is part of an effort to re-focus and make the operating model more transparent both for contributors and for end users. While this is not a functional change, this does involve clarifying our roles following the BDFL (benevolent dictator for life) model.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/bdfl-blog/trino-logo.png" />
      
    </entry>
  
    <entry>
      <title>Lakehouse summer camp at Trino Fest 2023</title>
      <link href="https://trino.io/blog/2023/04/05/announcing-trino-fest-2023.html" rel="alternate" type="text/html" title="Lakehouse summer camp at Trino Fest 2023" />
      <published>2023-04-05T00:00:00+00:00</published>
      <updated>2023-04-05T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/04/05/announcing-trino-fest-2023</id>
      <content type="html" xml:base="https://trino.io/blog/2023/04/05/announcing-trino-fest-2023.html">&lt;p&gt;Get ready to kick off your summer with Commander Bun Bun at Trino Fest 2023!
This year’s event is going virtual and will take place over two days, &lt;strong&gt;the 14th
and 15th of June&lt;/strong&gt;. The focus of the event will be on Trino as a data lakehouse
query engine, with discussions on how new features and the ecosystem around
Trino can support better data lakehouse management.&lt;/p&gt;

&lt;p&gt;Trino Fest 2023 is the new annual summer event dedicated to all things Trino.
Building on the success of last year’s &lt;a href=&quot;/blog/2022/05/17/cinco-de-trino-recap.html&quot;&gt;Cinco de
Trino&lt;/a&gt;, we’re excited to bring
the community together once again to explore the latest trends and innovations
in Trino and data lakehouse management. With a focus on education, community
collaboration, and inspiration, Trino Fest 2023 will be a valuable experience
for anyone interested in improving their data and analytics platform. We hope to
see you there as attendee, speaker, or sponsor! Read below to find out how to
sign up.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;call-for-speakers&quot;&gt;Call for speakers&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://sessionize.com/trino-fest-2023&quot;&gt;Call for speakers&lt;/a&gt; is now open, and we
invite you to submit a talk if you have an interesting perspective on Trino.
We’re particularly interested in talks related to:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Data lake and lakehouse use cases, architectures and experiences&lt;/li&gt;
  &lt;li&gt;Apache Iceberg&lt;/li&gt;
  &lt;li&gt;Delta Lake&lt;/li&gt;
  &lt;li&gt;Hudi&lt;/li&gt;
  &lt;li&gt;Industry use cases for Trino&lt;/li&gt;
  &lt;li&gt;Query federation&lt;/li&gt;
  &lt;li&gt;Data governance with Trino&lt;/li&gt;
  &lt;li&gt;SQL with Trino&lt;/li&gt;
  &lt;li&gt;ETL/ELT/batch query processing&lt;/li&gt;
  &lt;li&gt;Other tools and integrations in the Trino ecosystem&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The call for speakers closes on May 19th, so be sure to submit your talk soon!&lt;/p&gt;

&lt;h2 id=&quot;whats-new-this-year&quot;&gt;What’s new this year?&lt;/h2&gt;

&lt;p&gt;Aside from the new title, this year’s Trino Fest will differ from last year’s
short conference in a few ways. We’re featuring more talks from Trino
practitioners, the event will run over two shorter days to avoid the death march
of talks, and there will be more summer, lakehouse, and camping puns. Of course,
there will be continued use of the &lt;a href=&quot;https://www.youtube.com/watch?v=kfJ63DNbAuI&amp;amp;list=PLFnr63che7wYFsknFAqisURvfm96rW0Dr&amp;amp;index=4&quot;&gt;Trinoritaville song
&lt;/a&gt;.
Whether you’re just getting started with Trino or you’re a seasoned pro, there
will be something for everyone at Trino Fest.&lt;/p&gt;

&lt;h2 id=&quot;what-is-trino-fest-versus-trino-summit&quot;&gt;What is Trino Fest versus Trino Summit&lt;/h2&gt;

&lt;p&gt;Trino was &lt;a href=&quot;/blog/2020/10/20/intro-to-hive-connector.html&quot;&gt;built from the beginning to query Hive data&lt;/a&gt;, so its support for the
data lakehouse is simply an evolution of its flagship use case. Trino Fest covers
the latest features and improvements to Trino that make it an even better choice
for data lakehouse management. You’ll hear from speakers who are using Trino in
innovative ways, and who can provide valuable insights and tips for managing
your own data lakehouse. Going with the chill summer theme, there will be plenty
of time to have fun and relax too!&lt;/p&gt;

&lt;h2 id=&quot;sponsor-trino-fest&quot;&gt;Sponsor Trino Fest&lt;/h2&gt;

&lt;p&gt;If you’re interested in sponsoring Trino Fest 2023, we’d love to hear from you!
Sponsoring the event is a great way to get your brand in front of a highly
engaged audience of Trino enthusiasts and data professionals. Your support will
help make the event a success, and in return, we’ll offer a range of benefits,
such as logo placement on our website, social media shoutouts, and more. To
learn more about sponsoring Trino Fest 2023, reach out to
&lt;a href=&quot;mailto:events@starburst.io&quot;&gt;events@starburst.io&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;see-you-there&quot;&gt;See you there&lt;/h2&gt;

&lt;p&gt;Mark your calendar to save &lt;strong&gt;the 14th
and 15th of June&lt;/strong&gt; for Trino Fest 2023: Lakehouse Summer Camp. Get ready
for a two-day event that will get you diving into the deep end of the data lake.
&lt;a href=&quot;https://www.starburst.io/info/trinofest&quot;&gt;Registration is open now&lt;/a&gt;, and &lt;a href=&quot;https://sessionize.com/trino-fest-2023&quot;&gt;the
call for speakers&lt;/a&gt; closes on April 28th,
so be sure to sign up and submit your talk soon!&lt;/p&gt;

&lt;p&gt;Happy querying!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Get ready to kick off your summer with Commander Bun Bun at Trino Fest 2023! This year’s event is going virtual and will take place over two days, the 14th and 15th of June. The focus of the event will be on Trino as a data lakehouse query engine, with discussions on how new features and the ecosystem around Trino can support better data lakehouse management. Trino Fest 2023 is the new annual summer event dedicated to all things Trino. Building on the success of last year’s Cinco de Trino, we’re excited to bring the community together once again to explore the latest trends and innovations in Trino and data lakehouse management. With a focus on education, community collaboration, and inspiration, Trino Fest 2023 will be a valuable experience for anyone interested in improving their data and analytics platform. We hope to see you there as attendee, speaker, or sponsor! Read below to find out how to sign up.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-fest-2023/trino-fest.png" />
      
    </entry>
  
    <entry>
      <title>46: Trino heats up with Ignite</title>
      <link href="https://trino.io/episodes/46.html" rel="alternate" type="text/html" title="46: Trino heats up with Ignite" />
      <published>2023-03-15T00:00:00+00:00</published>
      <updated>2023-03-15T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/46</id>
      <content type="html" xml:base="https://trino.io/episodes/46.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Information Engineering at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/jian-chen-7aa3a2225/&quot;&gt;Jason&lt;/a&gt;, Senior Data
Engineer at Shopee.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-408-410&quot;&gt;Releases 408-410&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-408.html&quot;&gt;Trino 408&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New Apache Ignite connector!&lt;/li&gt;
  &lt;li&gt;Add support for writing decimal types to BigQuery.&lt;/li&gt;
  &lt;li&gt;Improve performance when reading structural types from Parquet files in Delta Lake.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-409.html&quot;&gt;Trino 409&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for nested fields in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP COLUMN&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Support for sorted tables in Iceberg.&lt;/li&gt;
  &lt;li&gt;Support for time type in Cassandra.&lt;/li&gt;
  &lt;li&gt;Faster aggregations containing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DISTINCT&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIKE&lt;/code&gt; with dynamic patterns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-410.html&quot;&gt;Trino 410&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sheet&lt;/code&gt; table function in Google Sheets.&lt;/li&gt;
  &lt;li&gt;Better file pruning in Iceberg.&lt;/li&gt;
&lt;/ul&gt;
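&lt;p&gt;As a quick sketch of the new table function - assuming a Google Sheets catalog
named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gsheets&lt;/code&gt; and a
placeholder spreadsheet ID - a query can read a sheet directly without
registering a table first:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Read a sheet by its spreadsheet ID
SELECT *
FROM
  TABLE(gsheets.system.sheet(id =&gt; 'your-spreadsheet-id'));
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;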

&lt;h2 id=&quot;introducing-the-ignite-connector-to-trino&quot;&gt;Introducing the Ignite connector to Trino&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://trino.io/docs/current/connector/ignite.html&quot;&gt;Trino Ignite connector&lt;/a&gt;
was added a couple of releases ago, in Trino 408. It’s not every day that we add a
new connector to Trino, and so the topic of today’s episode is exploring the
connector, what it does, and what its use cases are. After that, we are going
to talk about the process of coming in as an outside engineer and contributing 
an entirely new connector to Trino.&lt;/p&gt;

&lt;h2 id=&quot;what-is-ignite&quot;&gt;What is Ignite?&lt;/h2&gt;

&lt;p&gt;Apache Ignite is an in-memory distributed database, comparable to others you may
be familiar with like Redis and SingleStore. If you’re not familiar with them or 
with in-memory computing, the gist is that by focusing on using RAM instead of
disk storage, you can create a database system which is &lt;em&gt;much&lt;/em&gt; faster - the
Ignite website advertises 10-1000x improvements. Of course, this is more
expensive, too, so it thrives in settings where performance is critical.&lt;/p&gt;

&lt;p&gt;With an initial release seven years ago, Ignite is still a relative newcomer
among in-memory databases, and it comes with modern bells and whistles that have
it positioned to become a successor to the other, comparable databases mentioned
above. It also has some key functionality that sets it apart, including a
fully distributed architecture that can also use disk storage, allowing it to
scale horizontally.&lt;/p&gt;
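&lt;p&gt;To give you a feel for querying Ignite from Trino, a minimal catalog
properties file might look like the following sketch - the host, port, and
credentials are placeholders, and the full set of options is in the connector
documentation:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# etc/catalog/ignite.properties
connector.name=ignite
connection-url=jdbc:ignite:thin://ignite.example.com:10800/
connection-user=ignite
connection-password=secret
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;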

&lt;h2 id=&quot;contributing-the-ignite-connector&quot;&gt;Contributing the Ignite connector&lt;/h2&gt;

&lt;p&gt;The Trino community and developers try their best to be active reviewers,
collaborators, and participants on pull requests coming in from outside
contributors. Massive contributions like the Ignite connector can take a lot of
round trips, back-and-forth discussion, and work from both the contributor and
the project’s maintainers to get it into a state where it is ready to merge and
go live for users to try out.&lt;/p&gt;

&lt;p&gt;To give you an idea,
&lt;a href=&quot;https://github.com/trinodb/trino/pull/8323&quot;&gt;the pull request (PR) to contribute Ignite&lt;/a&gt;
was opened in mid-June 2021. It received immediate feedback from a couple of
maintainers and went through a few round trips of amendments, re-reviews, more
edits, and further reviews. But in an open source environment, each round
trip tends to take longer and longer. Progress stalled in November 2021, and
neither Jason nor the maintainers poked the Ignite PR for nearly a year. In
October 2022, as part of Trino DevRel’s roundup of stale and out-of-date pull
requests, we bumped back into the work that Jason had done. The wheels began to
turn again, starting slow but picking up the pace, until it returned to full and
active development, with several maintainers checking in frequently until the
connector was ready to go. But that’s the story from an observer, and we’ve got
Jason here to go into more detail.&lt;/p&gt;

&lt;h3 id=&quot;questions-for-jason&quot;&gt;Questions for Jason&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;How was the Trino review process?&lt;/li&gt;
  &lt;li&gt;Were there any major lessons you picked up along the way?&lt;/li&gt;
  &lt;li&gt;What tips would you give to someone else looking to add something into Trino?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pr-of-the-episode-13493-add-support-for-migrate-procedure-in-iceberg&quot;&gt;PR of the episode: #13493: Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;migrate&lt;/code&gt; procedure in Iceberg&lt;/h2&gt;

&lt;p&gt;If you’ve been in the data space for a while, you may know that there’s a bit of
a prevailing current in migrating from Hive to Iceberg. Out with the old, in
with the new, and in with the performance gains. &lt;a href=&quot;https://github.com/ebyhr&quot;&gt;Yuya Ebihara&lt;/a&gt;,
one of the Trino maintainers,
&lt;a href=&quot;https://github.com/trinodb/trino/pull/13493&quot;&gt;has added a table procedure to Trino’s Iceberg connector&lt;/a&gt;
to make that process much, much simpler. Rather than a slow, manual, and arduous
process, if you have a Hive table stored in a file format supported by Iceberg,
it’s now as simple as calling the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;migrate&lt;/code&gt; table procedure and letting it run.
The procedure copies the schema, partitioning, properties, and location of the
source table, then streams in all the data files from the source table to
re-build it all in the Iceberg format. Neat, right?&lt;/p&gt;
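&lt;p&gt;As a minimal sketch - assuming an Iceberg catalog named
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg&lt;/code&gt; and a Hive table
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;example_schema.example_table&lt;/code&gt;
as placeholders - the migration boils down to a single call:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Convert the Hive table to the Iceberg format in place
CALL iceberg.system.migrate(
  schema_name =&gt; 'example_schema',
  table_name =&gt; 'example_table');
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;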

&lt;h2 id=&quot;more-about-ignite&quot;&gt;More about Ignite&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://ignite.apache.org/&quot;&gt;Check out the Ignite website!&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/ApacheIgnite&quot;&gt;Ignite on Twitter&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/showcase/apache-ignite/&quot;&gt;Ignite on LinkedIn&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-events&quot;&gt;Trino events&lt;/h2&gt;

&lt;p&gt;If you have an event that is related to Trino, let us know so we can add it to
the &lt;a href=&quot;https://trino.io/community.html#events&quot;&gt;Trino events calendar&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Kevin Haley will be hosting an in-person event,
&lt;a href=&quot;https://www.meetup.com/boston-data-engineering/events/291662797/&quot;&gt;Getting to Know Trino&lt;/a&gt;,
in Boston, Massachusetts on Wednesday, April 5. You need to register in advance,
so if you’re in the Boston area and interested in attending, go sign up!&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Check out the in-person and virtual
&lt;a href=&quot;https://www.meetup.com/pro/trino-community/&quot;&gt;Trino Meetup groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>45: Trino swimming with the DolphinScheduler</title>
      <link href="https://trino.io/episodes/45.html" rel="alternate" type="text/html" title="45: Trino swimming with the DolphinScheduler" />
      <published>2023-02-23T00:00:00+00:00</published>
      <updated>2023-02-23T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/45</id>
      <content type="html" xml:base="https://trino.io/episodes/45.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Brian Olsen, Developer Advocate at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/bitsondatadev&quot;&gt;@bitsondatadev&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate at
  &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/davidzollo/&quot;&gt;David Zollo&lt;/a&gt;, Apache
DolphinScheduler PMC Chair&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/zhongjiajie/&quot;&gt;Jay Chung&lt;/a&gt;,  Apache
DolphinScheduler PMC Member&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/niko-zeng/&quot;&gt;Niko Zeng&lt;/a&gt;,  Apache
DolphinScheduler Community Manager&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/williamk2000/&quot;&gt;William Guo&lt;/a&gt;, Apache Software 
Foundation Member&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;recap-of-trino-in-2022&quot;&gt;Recap of Trino in 2022&lt;/h2&gt;

&lt;p&gt;Highlights from the blog post &lt;a href=&quot;/blog/2023/01/10/trino-2022-the-rabbit-reflects.html&quot;&gt;The rabbit reflects on Trino in 2022&lt;/a&gt; cover community growth, events, releases, and many new features.&lt;/p&gt;

&lt;h2 id=&quot;release-407&quot;&gt;Release 407&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-407.html&quot;&gt;Trino 407&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance for highly selective queries.&lt;/li&gt;
  &lt;li&gt;Improved performance when reading numeric, string and timestamp
values from Parquet files.&lt;/li&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt; table function for full query pass-through in Cassandra.&lt;/li&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unregister_table&lt;/code&gt; procedure in Delta Lake and Iceberg.&lt;/li&gt;
  &lt;li&gt;Support for writing to the change data feed in Delta Lake.&lt;/li&gt;
&lt;/ul&gt;
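&lt;p&gt;The new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt; table
function can be sketched as follows, assuming a Cassandra catalog named
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cassandra&lt;/code&gt; and placeholder
table names - the quoted query is passed through to Cassandra unchanged:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Hand the inner query straight to Cassandra for processing
SELECT *
FROM
  TABLE(cassandra.system.query(
    query =&gt; 'SELECT id, name FROM example_keyspace.example_table'));
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;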

&lt;p&gt;Cole’s comments:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;For our contributors, we added a new action to track and ping the developer
relations team on stale pull requests to further prompt maintainers to take a
look. This doesn’t have any immediate impact on end users, but it’ll improve
the development and contribution process.&lt;/li&gt;
  &lt;li&gt;A Kerberos fix for the Kudu connector should make using it much
less of a headache on long-running Trino instances.&lt;/li&gt;
  &lt;li&gt;There were some really sophisticated performance improvements
that came from shifting default config values and adding some new
ones, all of which took a whole lot of testing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-407.html&quot;&gt;Trino 407&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;what-is-workflow-orchestration&quot;&gt;What is workflow orchestration?&lt;/h2&gt;

&lt;p&gt;Workflow orchestration refers to the process of coordinating and automating
complex sequences of operations, known as workflows, that consist of multiple
interdependent tasks. This involves designing and defining the workflow,
scheduling and executing the tasks, monitoring the progress and outcomes, and
handling any errors or exceptions that may arise. In the context of Trino, the
tasks are typically the processing of SQL queries on one or more Trino clusters
and other related systems to create a data pipeline or similar automation.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/45/data-pipelines.png&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;why-do-we-need-a-workflow-orchestration-tool-for-building-a-data-lake&quot;&gt;Why do we need a workflow orchestration tool for building a data lake?&lt;/h2&gt;

&lt;p&gt;Building a data lake can involve many complex and interdependent data processing
tasks, which can be challenging to manage and scale without a workflow
orchestration tool. It is tempting to consider a tool like Trino the center of
the universe, and to assume that scheduling SQL queries with a much simpler tool
would suffice. Most companies, however, require a larger variety of tasks to
build a data lake than just running SQL on Trino. Even if you primarily run
Trino SQL scripts for these jobs, it is better to have an orchestration tool
instead of managing all processes manually.&lt;/p&gt;

&lt;h2 id=&quot;what-is-apache-dolphinscheduler&quot;&gt;What is Apache DolphinScheduler?&lt;/h2&gt;

&lt;p&gt;&lt;img width=&quot;75%&quot; src=&quot;/assets/episode/45/dolphin-scheduler.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Apache DolphinScheduler is an open source, distributed workflow scheduling
platform designed to manage and execute batch jobs, data pipelines, and ETL
processes. DolphinScheduler enables users to easily create and manage sequences
of jobs, with support for different types of tasks, such as SQL
statements, shell scripts, Spark jobs, Kubernetes deployments, and many others.
In short, it’s a powerful and user-friendly workflow orchestration platform that
enables users to automate and manage their complex data processing tasks.&lt;/p&gt;

&lt;p&gt;Read &lt;a href=&quot;https://blog.devgenius.io/dolphinscheduler-helps-trino-quickly-realize-the-integrated-data-construction-of-lake-and-warehouse-cde095b6573b&quot;&gt;this blog on Trino and Apache DolphinScheduler&lt;/a&gt;
to find out more.&lt;/p&gt;

&lt;h3 id=&quot;does-dolphinscheduler-have-any-computing-engine-or-storage-layer&quot;&gt;Does DolphinScheduler have any computing engine or storage layer?&lt;/h3&gt;

&lt;p&gt;DolphinScheduler is a powerful tool for managing and orchestrating data
processing workflows across a range of computing engines and storage systems,
but it does not provide its own computing or storage capabilities.&lt;/p&gt;

&lt;h2 id=&quot;what-are-the-differences-to-other-workflow-orchestration-systems&quot;&gt;What are the differences to other workflow orchestration systems?&lt;/h2&gt;

&lt;p&gt;Airflow is the incumbent, de facto workflow orchestrator. Many data engineers
currently rely on Airflow to handle their workflow orchestration, so it
helps to understand DolphinScheduler’s benefits in relation to Airflow. Both
DolphinScheduler and Airflow are designed to be scalable and highly available
to support large-scale distributed environments.&lt;/p&gt;

&lt;p&gt;Airflow supports a wide range of third-party integrations, including popular
data processing frameworks such as Trino, Spark, and Flink, as well as
cloud services such as AWS and Google Cloud. DolphinScheduler supports a
similar range of data processing frameworks and tools. This makes both platforms
suitable for managing diverse data processing tasks.&lt;/p&gt;

&lt;p&gt;The DolphinScheduler project believes that future data governance belongs to
data engineers and consumers alike and should not be centralized in a single
team. Product-focused engineering teams should have access to data and be able
to orchestrate workflows without the need for extensive coding skills.
DolphinScheduler uses a drag-and-drop web UI to create and manage workflows,
while also providing programmatic access through tools like a Python SDK and an
open API.&lt;/p&gt;

&lt;p&gt;Because DolphinScheduler supports users outside the data team through its UI,
it also offers robust security features. These include authentication,
authorization, and data encryption, to ensure that users’ data and workflows are
protected.&lt;/p&gt;

&lt;p&gt;DolphinScheduler has relatively limited documentation and community support
since it is a newer project, but the community is working hard to improve the
developer experience and documentation.&lt;/p&gt;

&lt;h2 id=&quot;how-does-dolphinscheduler-deal-with-failures&quot;&gt;How does DolphinScheduler deal with failures?&lt;/h2&gt;

&lt;p&gt;Failure is an inevitable aspect of data workflow orchestration. The merits of
many of these orchestration tools come from how well they aid users in
responding to failures by monitoring health and notifying users when things go
wrong.&lt;/p&gt;

&lt;h3 id=&quot;does-dolphinscheduler-have-an-alarm-mechanism-itself&quot;&gt;Does DolphinScheduler have an alarm mechanism itself?&lt;/h3&gt;

&lt;p&gt;Apache DolphinScheduler supports user notifications as part of a workflow. This
mechanism is designed to help users monitor and manage their workflows more
effectively and respond quickly to any issues.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/45/alerts.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;These alerts can be configured to notify users via email, SMS, or other
communication channels, and can include details such as the name of the
workflow, the name of the failed task, and the error message or stack trace
associated with the failure.&lt;/p&gt;

&lt;p&gt;In addition to these configurable alerts, DolphinScheduler provides a dashboard
for monitoring the status and progress of workflows and tasks. It includes
real-time updates and visualizations of workflow performance and status. The
dashboard helps users quickly identify any issues or bottlenecks in their
workflows and take corrective action as needed.&lt;/p&gt;

&lt;p&gt;&lt;img width=&quot;80%&quot; src=&quot;/assets/episode/45/monitoring.png&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-creating-a-simple-trino-workflow-in-dolphinscheduler&quot;&gt;Demo of the episode: Creating a simple Trino workflow in DolphinScheduler&lt;/h2&gt;

&lt;p&gt;For this episode’s demo, we look at creating a workflow in DolphinScheduler
that manages the execution of a Trino query.&lt;/p&gt;

&lt;p&gt;Run the demo by following 
&lt;a href=&quot;https://github.com/bitsondatadev/trino-getting-started/tree/main/community_tutorials/dolphinscheduler&quot;&gt;the steps listed&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-episode-improve-performance-of-parquet-files&quot;&gt;PR of the episode: Improve performance of Parquet files&lt;/h2&gt;

&lt;p&gt;While we’re on the topic of data lakes, release 407 included several
performance improvements for Parquet files from contributor and maintainer
&lt;a href=&quot;https://github.com/raunaqmorarka&quot;&gt;@raunaqmorarka&lt;/a&gt;. These changes
improve the performance of reading Parquet files for
&lt;a href=&quot;https://github.com/trinodb/trino/issues/15713&quot;&gt;decimal types&lt;/a&gt;,
&lt;a href=&quot;https://github.com/trinodb/trino/issues/15850&quot;&gt;numeric types&lt;/a&gt;,
&lt;a href=&quot;https://github.com/trinodb/trino/issues/15923&quot;&gt;string types&lt;/a&gt;, and
&lt;a href=&quot;https://github.com/trinodb/trino/issues/15954&quot;&gt;timestamp and boolean types&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;While Trino has historically had better performance with the ORC format, the
Parquet format has grown drastically in popularity, so this is one of many
examples of improving support for Parquet files in data lakes.&lt;/p&gt;

&lt;h2 id=&quot;find-out-more-about-dolphinscheduler&quot;&gt;Find out more about DolphinScheduler&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://dolphinscheduler.apache.org/&quot;&gt;https://dolphinscheduler.apache.org/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/apache/dolphinscheduler&quot;&gt;https://github.com/apache/dolphinscheduler&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/dolphinschedule&quot;&gt;https://twitter.com/dolphinschedule&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-events&quot;&gt;Trino events&lt;/h2&gt;

&lt;p&gt;If you have an event that is related to Trino, let us know so we can add it to
the &lt;a href=&quot;https://trino.io/community.html#events&quot;&gt;Trino events calendar&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Check out the in-person and virtual
&lt;a href=&quot;https://www.meetup.com/pro/trino-community/&quot;&gt;Trino Meetup groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>44: Seeing clearly with Metabase</title>
      <link href="https://trino.io/episodes/44.html" rel="alternate" type="text/html" title="44: Seeing clearly with Metabase" />
      <published>2023-01-26T00:00:00+00:00</published>
      <updated>2023-01-26T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/44</id>
      <content type="html" xml:base="https://trino.io/episodes/44.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Manfred Moser, Director of Information Engineering at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/luispaolini/&quot;&gt;Luis Paolini&lt;/a&gt;, Success Engineer at
&lt;a href=&quot;https://www.metabase.com/&quot;&gt;Metabase&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/andrewdibiasio/&quot;&gt;Andrew DiBiasio&lt;/a&gt;, Software
Engineer at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/piotrleniartek&quot;&gt;Piotr Leniartek&lt;/a&gt;, Product Manager
at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;recap-of-trino-in-2022&quot;&gt;Recap of Trino in 2022&lt;/h2&gt;

&lt;p&gt;Highlights from the blog post &lt;a href=&quot;/blog/2023/01/10/trino-2022-the-rabbit-reflects.html&quot;&gt;The rabbit reflects on Trino in 2022&lt;/a&gt; include the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Lots of growth for the community celebrating 10 years of Trino&lt;/li&gt;
  &lt;li&gt;Trino Summit, Cinco de Trino, Trino Community Broadcast, and more content&lt;/li&gt;
  &lt;li&gt;Trino: The Definitive Guide second edition&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Lots of Trino releases and new features:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; support&lt;/li&gt;
  &lt;li&gt;JSON functions&lt;/li&gt;
  &lt;li&gt;Table functions&lt;/li&gt;
  &lt;li&gt;Fault-tolerant execution&lt;/li&gt;
  &lt;li&gt;Upgrade to Java 17&lt;/li&gt;
  &lt;li&gt;New Delta Lake, Hudi, and MariaDB connectors&lt;/li&gt;
  &lt;li&gt;Tons and tons of performance improvements&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-404-to-406&quot;&gt;Releases 404 to 406&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-404.html&quot;&gt;Trino 404&lt;/a&gt; not found&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-405.html&quot;&gt;Trino 405&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER COLUMN ... SET DATA TYPE&lt;/code&gt; statement.&lt;/li&gt;
  &lt;li&gt;Support for Apache Arrow when reading from BigQuery.&lt;/li&gt;
  &lt;li&gt;Support for views in the Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Support for the Iceberg REST catalog.&lt;/li&gt;
  &lt;li&gt;Support for Protobuf encoding in the Kafka connector.&lt;/li&gt;
  &lt;li&gt;Support for fault-tolerant execution in the MongoDB connector.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; and query pushdown in the Redshift connector.&lt;/li&gt;
  &lt;li&gt;Performance improvements when reading Parquet data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-406.html&quot;&gt;Trino 406&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for JDBC catalog in the Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Support for fault-tolerant execution in the BigQuery connector.&lt;/li&gt;
  &lt;li&gt;Support for exchange spooling on HDFS.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CHECK&lt;/code&gt; constraints with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; statements.&lt;/li&gt;
  &lt;li&gt;Improved performance for Parquet files with the Delta Lake, Hive, Hudi and
Iceberg connectors.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-405.html&quot;&gt;Trino 405&lt;/a&gt;,
and
&lt;a href=&quot;https://trino.io/docs/current/release/release-406.html&quot;&gt;Trino 406&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We also shipped trino-python-client 0.321.0 with the following improvements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for SQLAlchemy 2.0.&lt;/li&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varbinary&lt;/code&gt; query parameters.&lt;/li&gt;
  &lt;li&gt;Add support for variable precision &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;datetime&lt;/code&gt; types.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;what-is-metabase&quot;&gt;What is Metabase&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;../assets/images/logos/metabase-small.png&quot; align=&quot;right&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.metabase.com/&quot;&gt;Metabase&lt;/a&gt; is an easy, open source BI tool with a
friendly UX and integrated tooling that lets your company explore data on its
own. Everyone in your company can ask questions and learn from your data.&lt;/p&gt;

&lt;p&gt;Running Metabase locally is easy. Try it with a container runtime and the
300 MB image:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker run -it -p 3000:3000 metabase/metabase
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Or use a JVM and the 260 MB single JAR file:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;wget https://downloads.metabase.com/latest/metabase.jar
java -jar metabase.jar
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can go from zero to dashboard in under six minutes - &lt;a href=&quot;https://www.metabase.com/demo&quot;&gt;learn more from the
demo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;../assets/episode/44/metabase-screenshot.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Core features and advantages of Metabase include the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Visual query builder&lt;/li&gt;
  &lt;li&gt;Dashboards&lt;/li&gt;
  &lt;li&gt;Models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Metabase is a web-based application that you run on a server. You can make it
available to multiple users. It uses SQL to create queries, reports,
visualizations, dashboards, and more.&lt;/p&gt;

&lt;p&gt;You can host it yourself locally, run it in your own datacenter, or use the
cloud:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;../assets/episode/44/metabase-self-hosted.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/44/metabase-cloud-hosted.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Metabase is an open source project licensed under the GNU Affero General
Public License (AGPL). It is written in Clojure and therefore runs on the Java
virtual machine.&lt;/p&gt;

&lt;p&gt;The following is a high-level architecture diagram:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;../assets/episode/44/metabase-architecture.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Metabase is also the name of the company, founded in 2014. It provides an
expanded version under a commercial license, a SaaS version of the application,
support and other services, and manages the open source project.&lt;/p&gt;

&lt;p&gt;Metabase runs in more than 50K instances around the world, including over
2K using the SaaS version.&lt;/p&gt;

&lt;h2 id=&quot;history-of-metabase-and-trino&quot;&gt;History of Metabase and Trino&lt;/h2&gt;

&lt;p&gt;Metabase was first released in 2015 as version 0.9. Since the initial release,
it has grown to be a well-known and widely used BI application.&lt;/p&gt;

&lt;p&gt;A Presto driver was created in 2018. It directly integrated with the client REST
API. With the rename of Presto to Trino, Manfred &lt;a href=&quot;https://github.com/metabase/metabase/pull/15160&quot;&gt;created a
PR&lt;/a&gt; that replicated this for
Trino to ensure continued support for the community. In the discussion it was
decided that it would be better to use the Trino JDBC driver, similar to how
other drivers for Metabase work.&lt;/p&gt;

&lt;p&gt;After more demand from the user and customer community, Starburst and
Metabase established a collaboration and started implementing the current
driver. Piotr led the charge, Andrew buckled down and learned Clojure, and
together they created and tested a first release. The driver is now provided as
an open source project managed by Starburst.&lt;/p&gt;

&lt;h2 id=&quot;core-advantages-of-using-metabase-with-trino&quot;&gt;Core advantages of using Metabase with Trino&lt;/h2&gt;

&lt;p&gt;With Metabase and the driver for Trino, Trino users have access to a well
established and proven open source BI tool. It is suitable for internal usage in
any organization, and users can upgrade to the commercial version for more demanding
deployments and use cases.&lt;/p&gt;

&lt;p&gt;The combination of Trino and Metabase also provides a number of unique benefits
for Metabase users that are not available with typical drivers. Those drivers
each connect to a single SQL database, and are limited to that specific
database.&lt;/p&gt;

&lt;p&gt;With Trino and the driver, you have access to the following unique features:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Metabase users can connect to databases that do not yet have a Metabase driver,
but are supported by Trino&lt;/li&gt;
  &lt;li&gt;Trino also enables using SQL for systems that don’t support SQL, such as MongoDB
or Elasticsearch, and therefore allows Metabase usage with these systems.&lt;/li&gt;
  &lt;li&gt;With Trino you can join data from different catalogs in the same SQL query.
This also applies to Metabase reports or visualizations.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
  &lt;p&gt;Can I join multiple engines? Yes &lt;br /&gt;
Can I join SQL and no-SQL engines? YES!&lt;/p&gt;
&lt;/blockquote&gt;
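
&lt;p&gt;As a minimal sketch of such a cross-catalog join, the following query combines
a relational and a non-relational source in one statement. The catalog, schema,
and table names are hypothetical placeholders:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT o.order_id, c.name
FROM mysql.shop.orders AS o
JOIN mongodb.crm.customers AS c
  ON o.customer_id = c.customer_id;
&lt;/code&gt;&lt;/pre&gt;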

&lt;p&gt;Elasticsearch, Google Spreadsheets, Cassandra, Redis, and others are all
accessible with Trino. Specifically this also opens up querying object storage
data lakes on S3 and other systems with the Hive, Delta Lake, Iceberg, and Hudi
connectors - all from Metabase.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;../assets/episode/44/metabase-trino-datasources.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Metabase also includes support for access control for any connected datasource,
all the way to row-level security. This includes Trino, and it can be used to
secure access through Metabase for a large group of your Trino users, such as
all BI users. It can even be used to add row-level security for NoSQL databases.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;../assets/episode/44/metabase-no-sql-security.png&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-metabase-and-trino&quot;&gt;Demo of the episode: Metabase and Trino&lt;/h2&gt;

&lt;p&gt;Luis shows us the demo from his repository at
&lt;a href=&quot;https://github.com/paoliniluis/metabase-trino&quot;&gt;https://github.com/paoliniluis/metabase-trino&lt;/a&gt;.
Watch our video to see it in action, and check out the instructions in the
repository to try it yourself.&lt;/p&gt;

&lt;h2 id=&quot;real-world-use-cases-at-meesho&quot;&gt;Real world use cases at Meesho&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;../assets/images/logos/meesho-small.png&quot; align=&quot;right&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.meesho.com/&quot;&gt;Meesho&lt;/a&gt; is India’s fastest growing internet commerce
company. They provide a large retail website and support small business
entrepreneurs with their platform.&lt;/p&gt;

&lt;p&gt;Meesho relies on Trino, Metabase, and the Trino Metabase driver from
Starburst for their data platform.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;../assets/episode/44/meesho-architecture.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Piotr and Luis share more details:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Meesho needs the ability to query the lake, with high speed, concurrency and
scale. This was not possible before Trino, in the form of Starburst Enterprise,
and Metabase were introduced.&lt;/li&gt;
  &lt;li&gt;Meesho has observed more than 13 million queries from Metabase in 10 months.&lt;/li&gt;
  &lt;li&gt;Meesho uses Metabase to add security and governance for the data assets.&lt;/li&gt;
  &lt;li&gt;A next planned step is to integrate with &lt;a href=&quot;https://www.metabase.com/docs/latest/data-modeling/models#enable-model-caching-in-metabase&quot;&gt;Metabase Model
Caching&lt;/a&gt;
to improve user experience even more.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pr-of-the-episode&quot;&gt;PR of the episode&lt;/h2&gt;

&lt;p&gt;Let’s explore the code a bit, instead of focusing on a specific PR. The whole
driver codebase is open source at
&lt;a href=&quot;https://github.com/starburstdata/metabase-driver&quot;&gt;https://github.com/starburstdata/metabase-driver&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As mentioned earlier the whole driver is written in Clojure, and Andrew tells us
more about his experience writing the driver and working with the two systems.&lt;/p&gt;

&lt;p&gt;We also talk about a recent community &lt;a href=&quot;https://github.com/starburstdata/metabase-driver/pull/59&quot;&gt;PR for datetime
functions&lt;/a&gt; and the
ongoing work to support model caching.&lt;/p&gt;

&lt;h2 id=&quot;datanova-and-other-trino-events&quot;&gt;Datanova and other Trino events&lt;/h2&gt;

&lt;p&gt;We invite you all to join us for the &lt;a href=&quot;http://bit.ly/3j2N9Q9&quot;&gt;free, virtual conference
Datanova&lt;/a&gt; from Starburst. Trino and related tools and
approaches are touched upon in many presentations and discussions.&lt;/p&gt;

&lt;p&gt;If you have an event that is related to Trino, let us know so we can add it to
the &lt;a href=&quot;../community.html#events&quot;&gt;Trino events calendar&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Metabase and Trino are a great combination of tools. Together they unlock use
cases that are difficult or impossible to implement with other tools. Give it a
try!&lt;/p&gt;

&lt;h2 id=&quot;rounding-out&quot;&gt;Rounding out&lt;/h2&gt;

&lt;p&gt;Check out the in-person and virtual
&lt;a href=&quot;https://www.meetup.com/pro/trino-community/&quot;&gt;Trino Meetup groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, get the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>The rabbit reflects on Trino in 2022</title>
      <link href="https://trino.io/blog/2023/01/10/trino-2022-the-rabbit-reflects.html" rel="alternate" type="text/html" title="The rabbit reflects on Trino in 2022" />
      <published>2023-01-10T00:00:00+00:00</published>
      <updated>2023-01-10T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/01/10/trino-2022-the-rabbit-reflects</id>
      <content type="html" xml:base="https://trino.io/blog/2023/01/10/trino-2022-the-rabbit-reflects.html">&lt;p&gt;It’s that time of the year when everyone gives excessively broad or niche
predictions about the finance market, venture capital, or even the data
industry. And we are now bombarded with &lt;a href=&quot;https://www.githubunwrapped.com/&quot;&gt;“year-in-review” 
summaries&lt;/a&gt; where we find out just how much
data is being collected to generate those summaries. End-of-year reflections are
always useful because you can find patterns of what’s going well and what’s
going poorly. It’s also good to pause and take stock of the things that did go
well, because without that, you’ll only be looking at the list of things that
you still have to do, and that isn’t healthy for anybody. In that spirit, let’s
reflect on what we’ve been able to accomplish as a community this year, as well
as what to look forward to in the next year!&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;2022-by-the-numbers&quot;&gt;2022 by the numbers&lt;/h2&gt;

&lt;p&gt;Let’s take a look at the Trino project’s growth and what happened specifically
in the past year:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;1,031,842 unique visits 🙋 to the Trino site&lt;/li&gt;
  &lt;li&gt;116,231 unique blog post views 👩‍💻 on the Trino site&lt;/li&gt;
  &lt;li&gt;60,296 views 👀 on YouTube&lt;/li&gt;
  &lt;li&gt;5,982 hours watched ⌚ on YouTube&lt;/li&gt;
  &lt;li&gt;4,696 new commits 💻 in GitHub&lt;/li&gt;
  &lt;li&gt;2,775 new members 👋 in Slack&lt;/li&gt;
  &lt;li&gt;2,769 new stargazers ⭐ in GitHub&lt;/li&gt;
  &lt;li&gt;2,550 pull requests merged ✅ in GitHub&lt;/li&gt;
  &lt;li&gt;1,465 issues 📝 created in GitHub&lt;/li&gt;
  &lt;li&gt;1,322 new followers 🐦 on Twitter&lt;/li&gt;
  &lt;li&gt;1,068 pull requests closed ❌ in GitHub&lt;/li&gt;
  &lt;li&gt;702 new subscribers 📺 in YouTube&lt;/li&gt;
  &lt;li&gt;658 average weekly members 💬 in Slack&lt;/li&gt;
  &lt;li&gt;56 videos 🎥 uploaded to YouTube&lt;/li&gt;
  &lt;li&gt;37 Trino 🚀 releases&lt;/li&gt;
  &lt;li&gt;36 blog ✍️ posts&lt;/li&gt;
  &lt;li&gt;12 Trino Community Broadcast ▶️ episodes&lt;/li&gt;
  &lt;li&gt;12 Trino 🍕 meetups&lt;/li&gt;
  &lt;li&gt;2 Trino ⛰️ Summits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Trino website got an impressive number of unique visits, also referred to as
entrances. This metric filters out refreshes and through traffic to count the
number of times a visitor started a unique session. Blog posts saw a 47 percent
increase from last year. Slack membership grew 13 percent and average weekly
active members grew an exciting 25 percent. YouTube views have increased by 218
percent. We’ve more than doubled the number of hours watched, which makes sense,
as we’ve nearly doubled the number of subscribers since last year.&lt;/p&gt;

&lt;p&gt;The project’s velocity hasn’t slowed down either. The number of commits grew 
27.6 percent this year and the number of created issues grew by 20 percent. This
increase in demand for features also pushed up merged pull request numbers by
nearly 29 percent!&lt;/p&gt;

&lt;p&gt;Why are we pointing out the number of closed pull requests that weren’t merged?
We are improving communication with contributors regarding when and why we
explicitly decide not to move forward with a pull request. Part of this has
included a new initiative to close out old and inactive pull requests. There
have been a good number of pull requests that have fallen through the cracks and
are missing communication from the pull request creator or reviewer. The DevRel
team, Brian Olsen, Cole Bowden, and Manfred Moser, are actively working on
improving the workflow around pull requests and issues. Cole recently posted a 
&lt;a href=&quot;/blog/2023/01/09/cleaning-up-the-trino-backlog.html&quot;&gt;blog that dives deeper&lt;/a&gt;
into what this team is actively working on to improve the experience of 
contributing to the project.&lt;/p&gt;

&lt;h3 id=&quot;trino-is-trending&quot;&gt;Trino is trending&lt;/h3&gt;

&lt;p&gt;A lot of these metrics indicate the growing popularity of Trino, but they also
help drive further awareness of the project to others. One metric we pay close
attention to is the number of visitors we get through blog posts, as they grow
Trino’s visibility. This increases the number of contributors and users that
shape Trino to be the best analytics SQL query engine on the planet. One of our
most successful blog posts was &lt;a href=&quot;/blog/2022/08/02/leaving-facebook-meta-best-for-trino.html&quot;&gt;Why leaving Facebook/Meta was the best thing we
could do for the Trino Community&lt;/a&gt;.
The day this blog post was released, it doubled the website traffic we received
and set the record for blog post views or website views in a single day. For
reference, our previous record was the post we had when the project was 
rebranded.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/2022-review/web-views.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This post gained a lot of traction for two reasons. Posts related to Meta and
the inner workings of open source communities naturally perform well, as many
developers are interested in these topics - drama is exciting! But you can have
an interesting topic that doesn’t go viral if nobody sees it. The catalyst to
this success was actually when &lt;a href=&quot;https://news.ycombinator.com/item?id=32323746&quot;&gt;David Phillips posted this to Hacker
News&lt;/a&gt;. We hit the top ten of 
Hacker News and occupied the front page for about two days.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/2022-review/hacker-news.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;So what is the takeaway here? We need your help! While it made sense for David
to do this post once, &lt;a href=&quot;https://news.ycombinator.com/newsguidelines.html&quot;&gt;Hacker News generally looks down upon repeated
self-promotion&lt;/a&gt;. Clearly 
&lt;a href=&quot;http://redd.it/zbe333&quot;&gt;there’s a lot of people interested&lt;/a&gt; in Trino, and Hacker
News and many other social media outlets are how we get the word out. If you
don’t think that sharing has much effect, we hope sharing this impact motivates
you to help us. We don’t want to keep Trino the hidden secret of Silicon Valley
much longer. We need your help to really get people continuously reading and
hearing about all things Trino. So share any time you see something cool going
on in our community!&lt;/p&gt;

&lt;h3 id=&quot;trino-touches-the-world&quot;&gt;Trino touches the world&lt;/h3&gt;

&lt;p&gt;Let’s take a look at the number of users who have initiated at least one session
on the Trino site in 2022, broken down by the top 10 countries. This goes to show the true global
reach this project has attained in 10 years.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;123,326 USA 🇺🇸 users&lt;/li&gt;
  &lt;li&gt;33,540 Indian 🇮🇳 users&lt;/li&gt;
  &lt;li&gt;30,955 Chinese 🇨🇳 users&lt;/li&gt;
  &lt;li&gt;12,282 British 🇬🇧 users&lt;/li&gt;
  &lt;li&gt;11,638 German 🇩🇪 users&lt;/li&gt;
  &lt;li&gt;10,760 Canadian 🇨🇦 users&lt;/li&gt;
  &lt;li&gt;9,980 Brazilian 🇧🇷 users&lt;/li&gt;
  &lt;li&gt;9,098 Singaporean 🇸🇬 users&lt;/li&gt;
  &lt;li&gt;8,649 South Korean 🇰🇷 users&lt;/li&gt;
  &lt;li&gt;8,636 Japanese 🇯🇵 users&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/2022-review/world.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Our reach currently favors the USA, but our aim is to grow Trino in all
countries that are starting to show interest. The new edition of “Trino: The
Definitive Guide” is being translated into Chinese, &lt;a href=&quot;https://simpligility.ca/2022/12/trino-guide-for-everyone-in-2023/&quot;&gt;Polish, and
Japanese&lt;/a&gt;. If
you want to translate the book to your local language, please reach out to
Manfred Moser.&lt;/p&gt;

&lt;h2 id=&quot;trino-celebrates-its-tenth-birthday&quot;&gt;Trino celebrates its tenth birthday&lt;/h2&gt;

&lt;p&gt;Of all the incredible things that happened, one that gave us cause to reflect
was Trino’s tenth birthday. Martin, Dain, and David &lt;a href=&quot;https://trino.io/development/vision.html&quot;&gt;cite
longevity&lt;/a&gt; of the project as one of the
core philosophies that govern decisions around Trino. We expect that Trino will
be used for at least the next 20 years. We build for the long term. This first
decade &lt;a href=&quot;/blog/2020/12/27/announcing-trino.html&quot;&gt;has been an adventurous
ride&lt;/a&gt;, and wow has it &lt;a href=&quot;/blog/2022/08/08/trino-tenth-birthday.html&quot;&gt;produced an
incredible system&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-tenth-birthday/how-it-started-going.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;We wanted to do something special with the community to celebrate this
milestone, so Brian put together a birthday video to timeline the evolution of
Presto and now Trino. We had a premiere watch party on the day of the tenth
anniversary and got some folks’ reactions. Take a look at the video if you
haven’t yet, you don’t want to miss it.&lt;/p&gt;

&lt;div class=&quot;youtube-video-container&quot; style=&quot;text-align: center;&quot;&gt;
 
&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/hPD95_-bZZw&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;
      
&lt;/div&gt;

&lt;h2 id=&quot;trino-summit&quot;&gt;Trino Summit&lt;/h2&gt;

&lt;p&gt;The next event in 2022 was the Trino Summit, which was the first in-person
summit we’ve had as Trino, with well over 750 attendees. We had a stellar lineup
of speakers from companies like Apple, Astronomer, Bloomberg, Comcast,
Goldman Sachs, Lyft, Quora, Shopify, Upsolver, and Zillow.&lt;/p&gt;

&lt;p&gt;This summit had a Pokémon theme, making the analogy that data sources are much
like Pokémon and Trino is much like a Pokémon trainer trying to access and
federate all the data, train it, and level the data up. Check out the video for
a small summary, and if you missed this event, we have all 
&lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;the recordings and slides available&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;youtube-video-container&quot; style=&quot;text-align: center;&quot;&gt;
 
&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/R1Z0VnKrQ9w&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;
      
&lt;/div&gt;

&lt;p&gt;We want to thank &lt;a href=&quot;https://starburst.io/&quot;&gt;Starburst&lt;/a&gt; for hosting this event and
all the sponsors for making this year’s summit possible. As usual, a huge thanks
to the community for showing up, engaging with each other, and bringing your
stories and curiosity.&lt;/p&gt;

&lt;h3 id=&quot;cinco-de-trino&quot;&gt;Cinco de Trino&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;/blog/2022/05/17/cinco-de-trino-recap.html&quot;&gt;Cinco de Trino&lt;/a&gt; was
our mini Trino Summit held in the first half of the year. It dove into using
Trino with complementary tools to build a data lakehouse. The virtual event was
held on Cinco de Mayo (5th of May), which gave it a Margaritaville, on-the-lake
vibe. We used this conference as a platform to &lt;a href=&quot;/blog/2022/05/05/tardigrade-launch.html&quot;&gt;launch the long-awaited Project
Tardigrade features&lt;/a&gt;
around the fault-tolerance mode for Trino.&lt;/p&gt;

&lt;h4 id=&quot;trino-contributor-congregation&quot;&gt;Trino Contributor Congregation&lt;/h4&gt;

&lt;p&gt;This year, we began what we are calling the Trino Contributor Congregation
(TCC), which brings together Trino contributors, maintainers, and developer
relations under the same roof. This congregation was to counter the siloed
nature of Trino development that occurred during the pandemic. Many community
members felt like their work wasn’t being seen and much of this was due to lack
of communication, and especially face-to-face communication, which builds
empathy and demands attention. The TCCs aim to increase connections and
collaboration between maintainers and contributors, create opportunities for
highly technical exchange of ideas and plans for Trino, and learn about usage
scenarios and issues from each other. This is different from the Trino Summit
since it focuses on gathering those who contribute code to keep the
conversations focused on developing features and removing blockers for
contributors.&lt;/p&gt;

&lt;p&gt;The first TCC happened just following Trino Summit in Palo Alto. This was
convenient for many, as a lot of folks were already in San Francisco to attend
Trino Summit. Moving forward we will continue having in-person TCCs around Trino
Summit to minimize the travel expected for anyone wanting to attend in-person
TCCs.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/2022-review/tcc.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Along with the in-person TCC, we also had the first virtual TCC in December.
This included many people in Eurasia who weren’t able to travel to
San Francisco in November. We covered mostly similar topics but with a larger
amount of interaction from those new voices.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/2022-review/virtual-tcc.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;During these discussions the biggest topics covered timelines of existing
roadmap items and suggestions for other items that should get more attention.
We talked about upcoming connectors and plugins, and all the required
infrastructure needed to support that. A recurring theme was the need for better
testing infrastructure. The more information we can gather as a community, the
quicker we can remove any issues as new releases come out and increase adoption
of newer versions of Trino. We also discussed desired features around
resource-intensive and batch workloads, and the new polymorphic table function
features.&lt;/p&gt;

&lt;p&gt;The biggest takeaway from these meetings was that everyone now had a better
basis to engage with each other. As we move forward, we will continue the
cadence of having these virtual TCCs to keep everyone on the same page, and have
in-person meetings when there is a larger conference. With that, let’s cover
some of the features we gained this year.&lt;/p&gt;

&lt;h2 id=&quot;features&quot;&gt;Features&lt;/h2&gt;

&lt;p&gt;Of course, one of the main deliverables of our project is Trino releases. In
2022, we improved our release process and cadence, shipping 37 releases that
were packed with features, and we’re about to dive into a high-level list of the
most exciting ones that made their way to you. For details and to keep up you
can check out the &lt;a href=&quot;https://trino.io/docs/current/release.html&quot;&gt;release notes&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;fault-tolerant-execution-mode&quot;&gt;Fault-tolerant execution mode&lt;/h3&gt;

&lt;p&gt;2022 was the year of resiliency for Trino. Users have long requested adding a 
&lt;a href=&quot;https://trino.io/docs/current/admin/fault-tolerant-execution.html&quot;&gt;fault-tolerant mechanism to 
Trino&lt;/a&gt; akin to
query engines like Apache Spark. Users wanted the ability to take the queries
they were running in Trino and scale them to larger data volumes and more
resource-intensive workloads. Experimental features were implemented in late 2021
for &lt;a href=&quot;https://github.com/trinodb/trino/pull/9361&quot;&gt;automatic query retries&lt;/a&gt; and
earlier this year &lt;a href=&quot;https://github.com/trinodb/trino/pull/9818&quot;&gt;task-level
retries&lt;/a&gt;. The efforts for these
features were codenamed &lt;a href=&quot;https://trino.io/episodes/32.html&quot;&gt;Project Tardigrade&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Fault-tolerant execution relies on storing intermediate data between task
shuffles to have data persist in an exchange spool. The first iteration of this
was AWS S3, but eventually Azure Blob Storage and Google Cloud Storage were
included. The Project Tardigrade engineers started &lt;a href=&quot;/blog/2022/02/16/tardigrade-project-update.html&quot;&gt;improving performance and
fixing bugs&lt;/a&gt; in
fault-tolerant execution as users tested the early implementation. Later, memory
efficiency for aggregations, faster data transfers, and dynamic filtering with
fault-tolerant query execution were added. The &lt;a href=&quot;/blog/2022/05/05/tardigrade-launch.html&quot;&gt;launch of fault-tolerant
execution&lt;/a&gt; happened at Cinco de
Trino. The first iterations only applied to queries being run on object-storage
connectors such as Hive, Iceberg, and Delta Lake. Recently, support for MySQL,
PostgreSQL, and SQL Server were added. These contributions added a foundation
for other JDBC connectors. A few companies, &lt;a href=&quot;https://trino.io/blog/2022/12/12/trino-summit-2022-lyft-recap.html&quot;&gt;most notably
Lyft&lt;/a&gt;, have
adopted this feature and are scaling it in production.&lt;/p&gt;

&lt;h3 id=&quot;sql-language-improvements&quot;&gt;SQL language improvements&lt;/h3&gt;

&lt;p&gt;Here are all the notable SQL features that made it to Trino this year:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/sql/merge.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; statement support&lt;/a&gt; is
 the most impactful SQL feature released this year. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; allows users to
 implement &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; functionality in one statement.
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; is not simply syntactic sugar; the implementation delivers profound performance
 improvements. A lot of your operations can be merged (pun intended) from 
 multiple tasks into a single scan over data. This functionality is absolutely
 critical for positioning Trino as a data lakehouse query engine. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; is 
 currently available in the Hive, Iceberg, Delta Lake, Kudu, and Raptor 
 connectors. We discussed this and did a demo with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; on the recent &lt;a href=&quot;https://trino.io/episodes/40.html&quot;&gt;Trino
 Community Broadcast with Iceberg&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Another massive update was the introduction of &lt;a href=&quot;/blog/2022/07/22/polymorphic-table-functions.html&quot;&gt;polymorphic table
 functions&lt;/a&gt; (
 &lt;a href=&quot;https://trino.io/docs/current/functions/table.html&quot;&gt;PTFs&lt;/a&gt;). Table functions
 initially released with some initial passthrough query functionality that we
 see in connectors like Pinot, Elasticsearch, MySQL, PostgreSQL,
 &lt;a href=&quot;https://github.com/trinodb/trino/pull/12325&quot;&gt;and other JDBC connectors&lt;/a&gt;.
 However, this is only one small instance of what can be achieved with PTFs and
 the &lt;a href=&quot;https://www.youtube.com/clip/UgkxQcokpdgPjiuMKMC5-3HwHvlbmZjxAvxe&quot;&gt;true power comes from the generalization of this
 feature&lt;/a&gt;. 
 Dain and David gave &lt;a href=&quot;https://www.youtube.com/clip/Ugkx62IKgPd_v9eGBaPUHP2hyaRkWSXh8w8h&quot;&gt;a simpler explanation of
 PTFs&lt;/a&gt;. To
 dive in deeper, watch &lt;a href=&quot;https://trino.io/episodes/38.html&quot;&gt;this episode of
 the Trino Community Broadcast&lt;/a&gt; where Kasia
 Findeisen and Martin discuss PTFs in greater detail.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/8&quot;&gt;Dynamic function resolution&lt;/a&gt; has
 been discussed for many years and finally arrived. This provides the ability
 for &lt;a href=&quot;https://youtu.be/mUq_h3oArp4?t=680&quot;&gt;connectors to provide functions at
 runtime&lt;/a&gt;. Unlike before, where you needed
 to statically register your functions ahead of time, you can now provide a
 plugin that contains these functions that are resolved at runtime. This enables
 features like supporting function calls to dynamically registered user-defined
 functions in different languages like JavaScript or Python. Martin and Dain go
 into great detail about how this works when &lt;a href=&quot;https://youtu.be/mUq_h3oArp4?t=1596&quot;&gt;answering this question at Trino
 Summit&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Trino gained support for JSON processing functions, which is a part of the
 &lt;a href=&quot;https://en.wikipedia.org/wiki/SQL:2016&quot;&gt;ANSI SQL 2016&lt;/a&gt; specification. This
 resolves a large number of issues reported by the community over the years.
 This includes the
 &lt;a href=&quot;https://trino.io/docs/current/functions/json.html#json-array&quot;&gt;json_array&lt;/a&gt;,
 &lt;a href=&quot;https://trino.io/docs/current/functions/json.html#json-object&quot;&gt;json_object&lt;/a&gt;,
 &lt;a href=&quot;https://trino.io/docs/current/functions/json.html#json-exists&quot;&gt;json_exists&lt;/a&gt;,
 &lt;a href=&quot;https://trino.io/docs/current/functions/json.html#json-query&quot;&gt;json_query&lt;/a&gt;, and
 &lt;a href=&quot;https://trino.io/docs/current/functions/json.html#json-value&quot;&gt;json_value&lt;/a&gt;
 functions that were added to Trino this year.&lt;/li&gt;
  &lt;li&gt;The JSON format was added to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt; statement to provide an anonymized
 query plan output to enable offline analysis.&lt;/li&gt;
  &lt;li&gt;It became possible to comment on tables, columns of tables, and even views for
 various connectors. Support for setting comments on views was introduced very
 recently and includes support for Hive and Iceberg.&lt;/li&gt;
  &lt;li&gt;A ton of new functions were added, including &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;to_base32&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;from_base32&lt;/code&gt;,
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trim_array&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trim&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
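
&lt;p&gt;To illustrate the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; statement described above, here is a
minimal sketch that deletes, updates, and inserts rows in a single statement.
The catalog, table, and column names are hypothetical placeholders:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;MERGE INTO iceberg.sales.accounts AS t
USING iceberg.sales.account_updates AS s
  ON t.account_id = s.account_id
WHEN MATCHED AND s.deleted THEN DELETE
WHEN MATCHED THEN UPDATE SET balance = s.balance
WHEN NOT MATCHED THEN
  INSERT (account_id, balance) VALUES (s.account_id, s.balance);
&lt;/code&gt;&lt;/pre&gt;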

&lt;h3 id=&quot;performance-improvements&quot;&gt;Performance improvements&lt;/h3&gt;

&lt;p&gt;Despite all the hype about vectorization being a silver bullet to make databases
go fast, the real speed comes from &lt;a href=&quot;https://www.youtube.com/clip/UgkxQwDYDS6evVJelNVjWAgrIhzg_Q-cAEyq&quot;&gt;better algorithms and better data structures
that lead to lower resource consumption&lt;/a&gt;.
Following is a list of some improvements that made their way into Trino this
year:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino now offers improved performance for a variety of operations, including
 complex join criteria pushdown to connectors, faster aggregations, faster
 joins, and better performance for large clusters. We have also implemented
 improvements specifically for aggregations with filters and for the Glue
 metastore. In addition, we now support dynamic filtering for various connectors
 and have faster query planning for the Hive, Delta Lake, Iceberg, MySQL,
 PostgreSQL, and SQL Server connectors.&lt;/li&gt;
  &lt;li&gt;Along with general performance optimizations, there have been a great deal of
 query planning optimizations that lead to better performance for specific SQL
 operators. These include faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; queries, improved performance for
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIKE&lt;/code&gt; expressions and highly selective &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt; queries, and enhanced
 performance and reliability for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; operations. We also made
 performance improvements for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JOIN&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNION&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; queries, as well
 as faster planning of queries with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN&lt;/code&gt; predicates.&lt;/li&gt;
  &lt;li&gt;There are also performance optimizations for specific SQL types, such as
 string, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MAP&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt;. We also made aggregations over
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL&lt;/code&gt; columns faster and improved the performance of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt; type and of
 aggregations.&lt;/li&gt;
  &lt;li&gt;A last set of improvements comes from reading open file formats like ORC and
 Parquet efficiently. We improved the speed of reading and writing all
 data types from and to Parquet in general. There were also general performance
 improvements for ORC types, and Trino can now write Bloom filters in ORC files.
 We have also improved performance and efficiency for a wide range of ORC and
 Parquet-related operations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These improvements in aggregate are at the core of what makes Trino fast. There
is no silver bullet you can plug in to speed things up. It takes time, effort,
and smart changes to improve the speed of various systems.&lt;/p&gt;

&lt;h3 id=&quot;runtime-improvements&quot;&gt;Runtime improvements&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/9876&quot;&gt;Trino upgraded to Java 17&lt;/a&gt;. This
upgrade improves the overall speed and lowers the memory footprint of Trino with
various performance fixes to the JVM and garbage collectors. Trino uses the G1
garbage collector which can now more efficiently reclaim memory and reduce pause
times.&lt;/p&gt;

&lt;p&gt;Aside from the work of performing the upgrade itself, we get a lot of these
performance enhancements for free. On top of performance, upgrading to Java 17
adds new language features that make it easier to write and maintain
high-quality code.&lt;/p&gt;

&lt;p&gt;To learn more, read &lt;a href=&quot;/blog/2022/07/14/trino-updates-to-java-17.html&quot;&gt;this blog 
post&lt;/a&gt; and watch episode 36
of &lt;a href=&quot;https://trino.io/episodes/36.html&quot;&gt;the Trino Community Broadcast&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Along with the Java upgrade, Trino now has a Docker image for ppc64le and added
CLI support for ARM64, which means Trino’s Docker image can run on AWS Graviton
processors and the image and CLI can run on the new MacBooks.&lt;/p&gt;

&lt;h3 id=&quot;security&quot;&gt;Security&lt;/h3&gt;

&lt;p&gt;Trino added the following improvements and features relevant for authentication,
authorization, and integration with other security systems:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;There were many updates to &lt;a href=&quot;https://trino.io/docs/current/security/oauth2.html&quot;&gt;OAuth 2.0
 authentication&lt;/a&gt;, such as support for OAuth
 2.0 refresh tokens and allowing access token passthrough with refresh tokens
 enabled. We also added support for &lt;a href=&quot;https://trino.io/docs/current/security/oauth2.html#openid-connect-discovery&quot;&gt;automatic discovery of OpenID
 Connect&lt;/a&gt;
 metadata with OAuth 2.0 authentication, support for groups in OAuth 2.0 claims,
 and reduced latency for OAuth 2.0 authentication.&lt;/li&gt;
  &lt;li&gt;The Hive, Iceberg, and Delta Lake connectors added support for AWS Security
 Token Service (STS) credentials for authentication with the Glue catalog, and
 now allow specifying an AWS role session name via the S3 security mapping
 configuration.&lt;/li&gt;
&lt;/ul&gt;
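&lt;p&gt;For illustration, the S3 security mapping is configured as a JSON file of
mapping rules. The following is a minimal sketch with placeholder bucket,
account, and role values, showing the new &lt;code&gt;roleSessionName&lt;/code&gt;
property:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  &quot;mappings&quot;: [
    {
      &quot;prefix&quot;: &quot;s3://example-bucket/&quot;,
      &quot;iamRole&quot;: &quot;arn:aws:iam::123456789012:role/example_role&quot;,
      &quot;roleSessionName&quot;: &quot;trino-example-session&quot;
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;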

&lt;h3 id=&quot;object-storage-connectors-hive-iceberg-delta-lake-hudi&quot;&gt;Object storage connectors (Hive, Iceberg, Delta Lake, Hudi)&lt;/h3&gt;

&lt;p&gt;One of the most common uses for Trino is as a data lakehouse query engine.
This year we not only added two connectors in this category, but also delivered
performance improvements across the board through file reader and writer
work.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Earlier this year, we added the &lt;a href=&quot;https://trino.io/docs/current/connector/delta-lake.html&quot;&gt;Delta Lake
 connector&lt;/a&gt; to finally
 reach everyone using Trino in the Delta Lake community. Delta Lake is a table
 format that improves on the Hive table format in areas like better support for
 ACID transactions. After the initial release, we added read and write support
 on Google Cloud Storage, added support for Databricks 10.4 LTS, and improved
 overall performance of the connector. To learn more about the Delta Lake
 connector, watch the &lt;a href=&quot;https://trino.io/episodes/34.html&quot;&gt;Trino Community Broadcast on Delta 
 Lake&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/connector/hudi.html&quot;&gt;The Hudi connector&lt;/a&gt; is a
 more recent addition, but it’s just as exciting. Hudi was created at Uber with
 the goal of handling real-time ingestion into a data lake. This connector is the
 youngest of these object storage connectors, so stay tuned to see
 more features land around this connector. See how Robinhood uses &lt;a href=&quot;https://trino.io/episodes/34.html&quot;&gt;Hudi and
 Trino in the Trino Community Broadcast&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;The Iceberg connector had a massive amount of improvements as well, bringing
 it to the same production-ready level as the Hive connector. Iceberg now has
 new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;expire_snapshots&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_orphan_files&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OPTIMIZE&lt;/code&gt; procedures.
 These capabilities, along with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt;, are the keys to being an
 effective lakehouse query engine. This year, Iceberg added support for the Glue
 metastore, the Avro file format, file-based access control, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; and
 time travel syntax. Iceberg also received many performance improvements and
 lower latency when querying tables with many files.&lt;/li&gt;
  &lt;li&gt;Although it seems like Hive is gradually on its way out, many users
 still depend on the Hive connector being performant. Hive received support for
 S3 Select pushdown for JSON data, support for IBM Cloud Object Storage,
 improved performance when querying partitioned Hive tables, and the new
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flush_metadata_cache()&lt;/code&gt; procedure.&lt;/li&gt;
&lt;/ul&gt;
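&lt;p&gt;As a quick illustration of the Iceberg table maintenance commands mentioned
above, the following sketch assumes a hypothetical &lt;code&gt;example.orders&lt;/code&gt;
table; the retention threshold value is illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Compact small files into larger ones
ALTER TABLE example.orders EXECUTE optimize;

-- Remove snapshots older than seven days
ALTER TABLE example.orders EXECUTE expire_snapshots(retention_threshold =&gt; '7d');
&lt;/code&gt;&lt;/pre&gt;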

&lt;h3 id=&quot;other-connectors&quot;&gt;Other connectors&lt;/h3&gt;

&lt;p&gt;A major feature of Trino is the availability of connectors to query all
sorts of databases with SQL, all at the speed that Trino users are used to.
Here are some of the major improvements that landed for these connectors in 2022:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;A new MariaDB connector.&lt;/li&gt;
  &lt;li&gt;Performance improvements with various pushdowns in the MongoDB, MySQL, Oracle,
 PostgreSQL and SQL Server connectors.&lt;/li&gt;
  &lt;li&gt;Support for bulk data insertion in SQL Server connector.&lt;/li&gt;
  &lt;li&gt;Added a query passthrough table function to numerous connectors.&lt;/li&gt;
  &lt;li&gt;Expanded SQL features for various connectors by adding support for
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE TABLE&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE&lt;/code&gt;/&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP&lt;/code&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SCHEMA&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt;, and others.&lt;/li&gt;
  &lt;li&gt;Updated the Cassandra connector to support v5 and v6 protocols.&lt;/li&gt;
  &lt;li&gt;A collection of improvements to the Pinot and BigQuery connectors.&lt;/li&gt;
&lt;/ul&gt;
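&lt;p&gt;The query passthrough table function mentioned above hands a query directly
to the underlying data source for execution, which is useful for syntax Trino
does not support itself. A sketch, assuming a hypothetical PostgreSQL catalog
named &lt;code&gt;example&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT *
FROM TABLE(
    example.system.query(
        query =&gt; 'SELECT id, name FROM public.users'));
&lt;/code&gt;&lt;/pre&gt;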

&lt;h3 id=&quot;bug-fixes&quot;&gt;Bug fixes&lt;/h3&gt;

&lt;p&gt;Any software includes issues and bugs, and Trino is no exception. Thanks to our
community we learned about many of them, and fixed even more. Continue to test new
releases and report issues. Check out &lt;a href=&quot;https://trino.io/docs/current/release.html#releases-2022&quot;&gt;all the release notes for
details&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;updates-in-the-trino-ecosystem&quot;&gt;Updates in the Trino ecosystem&lt;/h2&gt;

&lt;p&gt;Outside of the excitement within the main Trino project, there was a great deal
going on in the larger Trino community and ecosystem:&lt;/p&gt;

&lt;h3 id=&quot;trino-the-definitive-guide-second-edition&quot;&gt;Trino: The Definitive Guide second edition&lt;/h3&gt;

&lt;p&gt;Martin, Manfred, and Matt released the &lt;a href=&quot;/blog/2022/10/03/the-definitive-guide-2.html&quot;&gt;second edition of Trino: The Definitive
Guide&lt;/a&gt;. This update of the
book from O’Reilly fixed errata, expanded the deployment coverage to include newer
Kubernetes installation methods, and documented the features added since the
first edition. Along with this, &lt;a href=&quot;https://simpligility.ca/2022/12/trino-guide-for-everyone-in-2023/&quot;&gt;efforts
are underway to translate this
book&lt;/a&gt; to
different languages. Huge thanks to everyone involved in this!&lt;/p&gt;

&lt;h3 id=&quot;starburst-provides-trino-in-the-cloud&quot;&gt;Starburst provides Trino in the cloud&lt;/h3&gt;

&lt;p&gt;As a major community supporter, &lt;a href=&quot;https://starburst.io/&quot;&gt;Starburst&lt;/a&gt; helped us
with events, marketing, developer relations, and partner cooperation. Starburst
also provided a large part of development and code contributions to Trino and
its related projects. Starburst acquired Varada and integrated the object
storage indexing technology, and they shipped many Starburst Enterprise releases
for self-managed deployments. On top of all that amazing work, Starburst
launched &lt;a href=&quot;https://www.starburst.io/platform/starburst-galaxy/&quot;&gt;Starburst Galaxy&lt;/a&gt;
as a powerful, multi-cloud SaaS offering of Trino. Security, cluster management,
a query editor, and many other features are included in this new platform.&lt;/p&gt;

&lt;h3 id=&quot;amazon-upgrades-athena&quot;&gt;Amazon upgrades Athena&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;/blog/2022/12/01/athena.html&quot;&gt;Athena version three rolled out&lt;/a&gt;
and is now based on a recent Trino release. This is great news for Athena users
who were missing the many performance gains, expanded SQL support, and other
features from Trino, since the prior versions are based on old Presto releases.
As a result, the large Athena community and their feedback and knowledge have
become more integrated with the Trino community, and we are seeing positive
impact for Trino releases already.&lt;/p&gt;

&lt;h3 id=&quot;dbt-trino&quot;&gt;dbt-trino&lt;/h3&gt;

&lt;p&gt;dbt users rejoice! The &lt;a href=&quot;https://docs.getdbt.com/reference/warehouse-setups/trino-setup&quot;&gt;official dbt-Trino
integration&lt;/a&gt;
made it into dbt this year! This means that anyone using dbt can now read and
write data to and from multiple data sources through Trino. If you want to dive into
it, &lt;a href=&quot;https://docs.starburst.io/blog/2022-11-30-dbt0-introduction.html&quot;&gt;check out this blog
post&lt;/a&gt; written
by the contributors of this integration.&lt;/p&gt;
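&lt;p&gt;Connecting dbt to Trino is configured through a profile. The following is a
minimal sketch with placeholder values; the exact options depend on your
dbt-trino version and authentication method, so consult the dbt-trino
documentation:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;my_project:
  target: dev
  outputs:
    dev:
      type: trino
      host: trino.example.com
      port: 8080
      user: example_user
      database: hive
      schema: analytics
      threads: 4
&lt;/code&gt;&lt;/pre&gt;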

&lt;h3 id=&quot;python-client-improvements&quot;&gt;Python client improvements&lt;/h3&gt;

&lt;p&gt;Development of the
&lt;a href=&quot;https://github.com/trinodb/trino-python-client&quot;&gt;trino-python-client&lt;/a&gt; doubled
this year. A major focus was performance improvements in the SQLAlchemy
integration, along with a wide range of bug fixes.&lt;/p&gt;

&lt;h3 id=&quot;airflow-integration&quot;&gt;Airflow integration&lt;/h3&gt;

&lt;p&gt;The long-awaited &lt;a href=&quot;https://airflow.apache.org/docs/apache-airflow-providers-trino/stable/index.html&quot;&gt;Trino/Airflow
integration&lt;/a&gt;
landed this year. This paired well with the new task-retry and fault-tolerant
execution features. To learn more about the full capabilities of pairing Trino’s
new fault-tolerant execution mode with Airflow, check out &lt;a href=&quot;https://www.youtube.com/watch?v=xKDN7RUJ5i4&quot;&gt;Philippe Gagnon’s
talk at this year’s Trino Summit&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;metabase-driver&quot;&gt;Metabase driver&lt;/h3&gt;

&lt;p&gt;A lot of folks in the community were asking for a &lt;a href=&quot;https://github.com/metabase/metabase/issues/17532&quot;&gt;Trino/Metabase
driver&lt;/a&gt; after Trino updated
its name. This was a large blocker for anyone who wanted to move to Trino and
used Metabase. Through a collaboration between Metabase and Starburst engineers,
the &lt;a href=&quot;https://github.com/starburstdata/metabase-driver&quot;&gt;metabase-driver&lt;/a&gt; for
Trino was released, and we saw numerous users migrate to Trino.&lt;/p&gt;

&lt;h2 id=&quot;2023-roadmap&quot;&gt;2023 Roadmap&lt;/h2&gt;

&lt;p&gt;The upcoming roadmap was &lt;a href=&quot;https://youtu.be/mUq_h3oArp4?t=799&quot;&gt;covered in detail&lt;/a&gt;
by Martin at Trino Summit. To avoid extending this blog even further, we’ll
leave you with the featured project that covers many aspects of the Trino core
engine.&lt;/p&gt;

&lt;h3 id=&quot;project-hummingbird&quot;&gt;Project Hummingbird&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/14237&quot;&gt;Project Hummingbird&lt;/a&gt; aims to
improve Trino’s columnar and vectorized evaluation engine. Every year we report
on many incremental performance improvements. These improvements are typically
small in isolation but have a large aggregate impact. This incremental approach
is the real key to improving query engine performance, and there is always room
for further optimization. If you want to get involved with this exciting
project, or to learn about the latest innovations as they are being discussed,
join the #project-hummingbird channel in &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;the Trino Slack
workspace&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;2022 was by far the busiest year this bunny has seen. Trino has
continued growing as we’ve attracted more contributors. We believe this trend
will continue in 2023 as we begin to put more process in place around managing
pull requests. Remember to get the word out and share anything you genuinely
think is cool or important for others to hear! Looking forward to an even more
successful 2023 Trino nation!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen, Manfred Moser, Cole Bowden, Martin Traverso </name>
        </author>
      

      <summary>It’s that time of the year where everyone gives excessively broad or niche predictions about the finance market, venture capital, or even the data industry. And we are now bombarded with “year-in-review” summaries where we find out just how much data is being collected to generate those summaries. End-of-year reflections are always useful because you can find patterns of what’s going well and what’s going poorly. It’s also good to pause and take stock of the things that did go well, because without that, you’ll only be looking at the list of things that you still have to do, and that isn’t healthy for anybody. In that spirit, let’s reflect on what we’ve been able to accomplish as a community this year, as well as what to look forward to in the next year!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/2022-review/cbb-reflection.png" />
      
    </entry>
  
    <entry>
      <title>Cleaning up the Trino pull request backlog</title>
      <link href="https://trino.io/blog/2023/01/09/cleaning-up-the-trino-backlog.html" rel="alternate" type="text/html" title="Cleaning up the Trino pull request backlog" />
      <published>2023-01-09T00:00:00+00:00</published>
      <updated>2023-01-09T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2023/01/09/cleaning-up-the-trino-backlog</id>
      <content type="html" xml:base="https://trino.io/blog/2023/01/09/cleaning-up-the-trino-backlog.html">&lt;p&gt;At some point in the lifecycle of a successful open source project, it reaches a
point where the number of incoming pull requests (PRs) outpaces the project’s
ability to get code merged. It happens for a huge variety of reasons, including
developers moving on to other projects before tying up every loose end,
reviewers who miss a request for review, and because some stagnant PRs were
never going to happen and should have been closed two years ago. The GitHub
notification system doesn’t do anyone any favors, either. Having too many open
PRs is a problem for a project, because they make it harder to tell what is
being worked on and what may as well be dead code walking.&lt;/p&gt;

&lt;p&gt;And when we cross 700 open pull requests in Trino, constantly adding a few more
to the pile every week, what do we do? We clean it up! Let’s talk about how
we’re doing it, why we’re doing it that way, and how we’re planning on
preventing this from happening again. The end result should be some process
improvements that make contributing to Trino a better, faster, and more painless
experience.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;spring-cleaning&quot;&gt;Spring cleaning&lt;/h2&gt;

&lt;p&gt;The “how” is an easy thing to talk about. The Trino developer relations team is
in the process of going through all open PRs, from oldest to newest, manually
taking a look at each one and checking in on how we may want to proceed. For PRs
that the author appears to have abandoned without responding to a review, we close
them, encouraging the authors to reopen them if they decide
they want to continue work. For everything else, though, we’ve been taking a
more measured approach, offering to help facilitate reviews or discussion for
these long-lasting bits of code that may still have a chance of making their way
into Trino.&lt;/p&gt;

&lt;p&gt;To anyone who’s managed a repository before, this may seem like more effort than
necessary. You can add a bot to close anything that’s been stale or inactive for
too long, and problem solved, right? Sure, that does solve the problem, but it
creates a couple of other problems.&lt;/p&gt;

&lt;p&gt;First, and perhaps most importantly: it’s not very human. Having a pull request
that you put time and effort into get shut down by a bot without having another
person swing by to say hello can be demoralizing, and it builds a negative
experience that might discourage future contributions to the project. We want
our contributors to like Trino and to enjoy the process of adding on to it, and
a GitHub bot slamming the door shut on their hard work isn’t going to help with
that. Having a bot do our work for us would also deprive us of a valuable
learning opportunity. Manually checking in on each pull request that slipped
through the cracks has allowed us to identify pain points in Trino code reviews
which we can try to mitigate moving forwards, and it’s provided a ton of
valuable insights for deciding on how to best improve the process.&lt;/p&gt;

&lt;p&gt;Second, and perhaps even more significant: there’s a lot of cool stuff we’d be
missing out on if we automatically closed everything. While going through the
backlog, we’ve found dozens of year-old pull requests that still have a lot of
value for Trino and only needed someone to take another look at them. For some,
the author may be missing, but the ideas are good and the PR can be handed off
to someone else to carry the torch and get it across the finish line. For
others, the author is still happy and ready to iterate on it, and all that’s
needed to get the ball rolling again is to ping a reviewer or two to take
another look. We’ve even found a couple PRs that were approved and ready to go,
and all it took was a simple click of the merge button. The effort-to-impact
ratio on that is off the charts - think of all the value we’d be missing out on
if we’d automatically closed those!&lt;/p&gt;

&lt;p&gt;The result of the effort so far has been excellent.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/backlog-blog/open-pull-requests-graph.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;We’re not completely done with the cleanup effort, but as you can see, we’re
slowing down. Our oldest PRs are increasingly recent, still in development,
and worth having open. Going from a peak of 700+ open pull requests to around
300 is a massive improvement, and the goal is to end up in the vicinity of about
200 open pull requests in Trino at any point in time.&lt;/p&gt;

&lt;h2 id=&quot;keeping-things-pristine&quot;&gt;Keeping things pristine&lt;/h2&gt;

&lt;p&gt;But with the cleanup being so manual, the next challenge is stopping the pull
requests from steadily piling back up while we’re not paying attention to them.
The fix for that is simple - we’re going to keep paying attention. The Trino
developer relations team is planning on tracking and getting involved in two
categories of pull requests to keep the number of open PRs stable.&lt;/p&gt;

&lt;p&gt;The first category is pull requests that don’t get any immediate attention from
a reviewer. While Trino reviewers are overall excellent and quick to take a look
at incoming pull requests, about five percent slip through the cracks, where a
contributor submits something that receives no reviews or comments and lives on
in the pull request backlog. That’s not a good experience for the contributor,
and it’s not good for Trino, either, because that contribution could have a lot
of value. We plan on stopping this from happening by implementing workflows
which spring Trino developer relations into action when these situations arise.
If a pull request goes a few days without a comment, we’ll be the safety net to
ask questions, get engineers involved, and make sure that at least a few pairs
of eyes take a look at every incoming PR in a timely manner.&lt;/p&gt;

&lt;p&gt;The second category is pull requests that get some reviews, but eventually
stagnate or stop being actively worked on. This happens for a lot of reasons,
but in all cases, if a pull request goes a few weeks with no activity, the
developer relations team will be checking in. Our goal will be to figure out the
proper path forward, whether that’s flagging down some reviewers again,
communicating that the pull request should be closed, or anything else. The end
result should be that nothing slips through the cracks and ends up going months
without human contact. If an author vanishes or everyone gets too busy to look
at a pull request again, though, the final stop will ultimately be a stale bot
which closes pull requests that have gone a few months with no activity.&lt;/p&gt;

&lt;p&gt;With all these processes in place, contributors should never feel like their
efforts are going unnoticed. Submitted code should be reviewed quickly,
iterated on in a timely manner, and merged without much delay. In situations
where a pull request is &lt;em&gt;not&lt;/em&gt; going to be merged, the Trino developer relations
team should be able to chime in quickly to make that clear, saving contributors
from wasting time and effort on a false impression that their code will be
landed. And if you have any questions, concerns, or suggestions about all of
this, don’t hesitate to reach out to us directly on the Trino Slack using
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@devrel-team&lt;/code&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>At some point in the lifecycle of a successful open source project, it reaches a point where the number of incoming pull requests (PRs) outpace the project’s ability to get code merged. It happens for a huge variety of reasons, including developers moving on to other projects before tying up every loose end, reviewers who miss a request for review, and because some stagnant PRs were never going to happen and should have been closed two years ago. The GitHub notification system doesn’t do anyone any favors, either. Having too many open PRs is a problem for a project, because they make it harder to tell what is being worked on and what may as well be dead code walking. And when we cross 700 open pull requests in Trino, constantly adding a few more to the pile every week, what do we do? We clean it up! Let’s talk about how we’re doing it, why we’re doing it that way, and how we’re planning on preventing this from happening again. The end result should be some process improvements that make contributing to Trino a better, faster, and more painless experience.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/backlog-blog/so-many-pull-requests.png" />
      
    </entry>
  
    <entry>
      <title>Using Trino to analyze a product-led growth (PLG) user activation funnel</title>
      <link href="https://trino.io/blog/2022/12/23/trino-summit-2022-upsolver-recap.html" rel="alternate" type="text/html" title="Using Trino to analyze a product-led growth (PLG) user activation funnel" />
      <published>2022-12-23T00:00:00+00:00</published>
      <updated>2022-12-23T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/23/trino-summit-2022-upsolver-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/23/trino-summit-2022-upsolver-recap.html">&lt;p&gt;As the holiday season approaches, we have reached the end of our
&lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;Trino Summit 2022 recap posts&lt;/a&gt;.
With the last talk of the summit, Mei Long from Upsolver gave an insightful
overview of how they use data to inform product decisions.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/MCB_1furnAo&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Trino@Upsolver.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;When talking about product-led growth (PLG), it helps to start by defining what
it even means. The core idea is simple: see how users engage with your product,
and make decisions based on how you can improve the product to better serve
those users. At Upsolver, the goal of PLG is to maximize user value. The issue
is that while this can be simple in some situations, when you’re delivering
complicated analytics tools, it’s not always immediately clear what features
would be the most valuable or useful. You need a lot of data to glean a lot of
insight, and you need to make sure those insights can lead to action. And of
course, you need to be absolutely certain that your data is high-quality,
accurate, and trustworthy, lest you end up accidentally giving a customer a
ten million dollar discount.&lt;/p&gt;

&lt;p&gt;Mei explores the initial pass at using analytics to drive PLG at Upsolver,
letting her intern use a tool called Amplitude that worked for a time and for
limited use cases. As Upsolver grew, the analytics requirements did, too, and
Amplitude wasn’t powerful enough for Upsolver’s use case, nor for the more
complicated queries and analysis that needed to be run.&lt;/p&gt;

&lt;p&gt;Want to guess what query engine they swapped to using? Trino. Mei dives into a
quick demo that shows how Upsolver ingests all of its streaming data and stores
it for Trino to query, driving down time-to-insight to make it quick and
efficient to ask questions and make decisions based on those answers. With Trino
at the ready, Upsolver has never been better-equipped to work towards PLG.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, please consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/12/23/trino-summit-2022-upsolver-recap.html&quot;&gt;https://trino.io/blog/2022/12/23/trino-summit-2022-upsolver-recap.html&lt;/a&gt;. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/upsolver-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Mei Long, Cole Bowden</name>
        </author>
      

      <summary>As the holiday season approaches, we have reached the end of our Trino Summit 2022 recap posts. With the last talk of the summit, Mei Long from Upsolver gave an insightful overview of how they use data to inform product decisions.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/upsolver.jpg" />
      
    </entry>
  
    <entry>
      <title>Using Trino with Apache Airflow for (almost) all your data problems</title>
      <link href="https://trino.io/blog/2022/12/21/trino-summit-2022-astronomer-recap.html" rel="alternate" type="text/html" title="Using Trino with Apache Airflow for (almost) all your data problems" />
      <published>2022-12-21T00:00:00+00:00</published>
      <updated>2022-12-21T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/21/trino-summit-2022-astronomer-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/21/trino-summit-2022-astronomer-recap.html">&lt;p&gt;As we close in on the final talks from &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;Trino Summit 2022&lt;/a&gt;, this next talk dives into how to set up
Trino for batch processing. Trino has historically been well-known for
facilitating fast adhoc analytics queries as opposed to long-running, resource
intensive batch/ETL queries. This is due to the fact that Trino kills queries
that run out of resources in order to prioritize faster query execution. Earlier
this year, Trino added features to better support batch queries with a new 
&lt;a href=&quot;https://trino.io/blog/2022/05/05/tardigrade-launch.html&quot;&gt;fault-tolerant execution mode&lt;/a&gt;.
This mode backs up intermediate data during execution time, allowing Trino to
restart individual query tasks on failure rather than a query stage or the query
itself.&lt;/p&gt;
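&lt;p&gt;As a sketch of what enabling this mode involves, fault-tolerant execution is
configured on the cluster along with an exchange manager that stores the spooled
intermediate data; the bucket name below is a placeholder:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# config.properties
retry-policy=TASK

# exchange-manager.properties
exchange-manager.name=filesystem
exchange.base-directories=s3://example-exchange-spooling
&lt;/code&gt;&lt;/pre&gt;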

&lt;p&gt;Batch queries don’t typically involve human intervention and run asynchronously.
These tasks may depend on each other and have a complex workflow. This talk
describes how to orchestrate this complexity using Airflow’s new Trino
integration to run Trino batch queries to solve (almost) all your data problems.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/xKDN7RUJ5i4&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Trino@Astronomer.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;In this talk, we’re going to hear from Philippe, a Trino contributor and
Solutions Architect at Astronomer, the company building a SaaS product around
Apache Airflow. Philippe describes a fictional trading scenario that initially
follows a traditional warehousing approach to storing data. This architecture
has data sources that are queried and submitted as raw data into a centralized
warehouse. Within the warehouse itself, the raw data is transformed into data
ready to be consumed.&lt;/p&gt;

&lt;p&gt;This model enforces centralization, in which one team runs the platform and
builds the integration between producers and consumers. This team focuses on the
technical aspects of the data platform, which further separates it from the
business use case. As source databases evolve, the central data team must keep up with these
changes. As the data consumers that rely on the data infrastructure grow, this
team commonly becomes a bottleneck.&lt;/p&gt;

&lt;p&gt;Trino allows you to move the queries as close as possible to the federated data
sources, removing the labor-intensive process of moving data into stages
before ingesting it into a central warehouse. This doesn’t mean that data
movement is no longer a necessity, but the necessity shifts from an availability
concern to a performance and scalability concern.&lt;/p&gt;

&lt;p&gt;Without investing into more resources, your data professionals are able to work
closely with producers and stakeholders with a shared understanding of the
domain. This increases data literacy and data availability throughout your
organization.&lt;/p&gt;

&lt;p&gt;Trino is not only for fast adhoc analytics with a human in the loop, but now
provides a fault-tolerant execution mode that enables it to run resource-intensive
batch jobs. This, paired with the federation capabilities, makes Trino
able to ingest any data that can be represented in a tabular format. Users can
implement user-defined functions and run transformations using SQL without
involving intermediate systems.&lt;/p&gt;

&lt;p&gt;Running Trino batch queries at scale requires building complex interdependencies
between different tasks and monitoring for any failures that occur. This
configuration also demands reactive automation to handle failing instances.
Apache Airflow is an open-source platform for developing, scheduling, and
monitoring batch-oriented workflows on systems like Trino, making it a perfect
complement for handling these intensive queries at scale.&lt;/p&gt;

&lt;p&gt;Even before introducing fault-tolerant execution mode, &lt;a href=&quot;https://engineering.salesforce.com/how-to-etl-at-petabyte-scale-with-trino-5fe8ac134e36/&quot;&gt;Trino was already being
used to run batch queries at scale&lt;/a&gt;.
In these scenarios, Trino and a tool like Airflow already work well together
because these jobs will take time and likely nobody wants to wait around to run
the pipeline components in sequence. Fault-tolerant execution mode brings the
Trino and Airflow combination to the forefront because Trino is expected to be
adopted as a batch query engine as the learning curve to run ETL jobs on Trino
becomes as gentle as that of other tools in the space.&lt;/p&gt;

&lt;p&gt;Philippe dives into building out basic Airflow jobs to run over Trino and
introduces the concept of a directed acyclic graph (DAG). He then explores
multiple useful features that help break down large jobs into manageable tasks,
and jobs that can adjust the schedule based on runtime execution. Sharded job
creation splits large batch jobs into smaller tasks that can easily be retried.
Dynamic task mapping splits jobs into smaller tasks based on data observed at
runtime. Finally, a new feature called data-aware scheduling can schedule tasks
based on interdependencies between datasets.&lt;/p&gt;
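
&lt;p&gt;The sharding idea can be sketched in plain Python. This is not the Airflow
API, just an illustration of how a large backfill window is cut into
independently retryable date-range shards, the same kind of list a dynamically
mapped task would expand over:&lt;/p&gt;

```python
from datetime import date, timedelta

def shard_backfill(start, end, days_per_shard=1):
    # Cut the backfill window from start to end into small date-range shards.
    # Each shard becomes one task that can be retried on its own.
    total_days = (end - start).days + 1
    shards = []
    for offset in range(0, total_days, days_per_shard):
        shard_start = start + timedelta(days=offset)
        shard_end = min(shard_start + timedelta(days=days_per_shard - 1), end)
        shards.append((shard_start, shard_end))
    return shards

# Ten days of data becomes ten single-day shards.
shards = shard_backfill(date(2022, 12, 1), date(2022, 12, 10))
```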

&lt;p&gt;To get started with Trino in Apache Airflow, check out the
&lt;a href=&quot;https://airflow.apache.org/docs/apache-airflow-providers-trino/stable/index.html&quot;&gt;Airflow Trino provider documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, please consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/12/21/trino-summit-2022-astronomer-recap.html&quot;&gt;https://trino.io/blog/2022/12/21/trino-summit-2022-astronomer-recap.html&lt;/a&gt;. If you think Trino is awesome, 
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/astronomer-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Philippe Gagnon, Brian Olsen</name>
        </author>
      

      <summary>As we close in on the final talks from Trino Summit 2022, this next talk dives into how to set up Trino for batch processing. Trino has historically been well-known for facilitating fast adhoc analytics queries as opposed to long-running, resource intensive batch/ETL queries. This is due to the fact that Trino kills queries that run out of resources in order to prioritize faster query execution. Earlier this year, Trino added features to better support batch queries with a new fault-tolerant execution mode. This mode backs up intermediate data during execution time, allowing Trino to restart individual query tasks on failure rather than a query stage or the query itself. Batch queries don’t typically involve human intervention and run asynchronously. These tasks may depend on each other and have a complex workflow. This talk describes how to orchestrate this complexity using Airflow’s new Trino integration to run Trino batch queries to solve (almost) all your data problems.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/astronomer.jpg" />
      
    </entry>
  
    <entry>
      <title>Journey to Iceberg with Trino</title>
      <link href="https://trino.io/blog/2022/12/19/trino-summit-2022-sk-telecom-recap.html" rel="alternate" type="text/html" title="Journey to Iceberg with Trino" />
      <published>2022-12-19T00:00:00+00:00</published>
      <updated>2022-12-19T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/19/trino-summit-2022-sk-telecom-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/19/trino-summit-2022-sk-telecom-recap.html">&lt;p&gt;This post comes from &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;the second half of the Trino Summit 2022 sessions&lt;/a&gt;. Our friends JaeChang and Jennifer from
SK Telecom traveled across the globe from South Korea to join us in person! SK
Telecom recently had some issues scaling Trino on the Hive model, among other
issues that come with Hive. While some initial tweaking helped speed things up,
it ultimately never solved the problem. After switching to Iceberg, SK Telecom
ran initial performance tests with some very impressive results. In this talk,
Jennifer and JaeChang describe their journey to Iceberg with Trino.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/V9_aPLXATh8&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Trino@SK-Telecom.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;SK Telecom is a South Korean telecom company that has built and operated an
on-premise data platform based on open source software to determine
manufacturing yield since 2015. SK Telecom’s goal has always been to build an observable
federated data platform on open source software at scale.&lt;/p&gt;

&lt;p&gt;SK Telecom manages on-premise Hadoop clusters to store their data. Previously,
they used tools like
&lt;a href=&quot;https://hadoop.apache.org/docs/stable/hadoop-distcp/DistCp.html&quot;&gt;distcp&lt;/a&gt; to
make data available in one center. SK Telecom started using Presto in 2016 and
shifted to Trino in 2021. To run batch queries on their warehouse, Trino workers
are deployed on HDFS data nodes. There is also an adhoc Trino cluster deployed
to manage federated queries over multiple data silos from an array of disparate
data sources. This was one of the slow and brittle processes that Trino
replaced. They chose Trino because it simplifies querying novel big data systems
and combines that data with data from more commonplace systems for their users.&lt;/p&gt;

&lt;p&gt;As Trino adoption within the company grew to 300 requests per minute, they
eventually faced challenges with scaling. Not only were the number of
requests growing, but the range of data being queried grew as well; users were
evaluating petabytes of data, with terabyte-sized query input processed across
hundreds of nodes. Many user queries were blocked while waiting for resources to
become available. In response, the data engineering team began investigating how
they could both scale and improve individual query performance.&lt;/p&gt;

&lt;p&gt;To find the root cause, SK Telecom’s data engineers investigated cluster
behavior beyond what was exposed in the web UI. They began collecting all the
query plan JSON files, coordinator and worker JMX stats, system metrics, and
Trino logs to build out their own metrics dashboard. The two main
causes were that input data was too large, and there were spikes in the number
of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BlockedSplit&lt;/code&gt; operations leading to queries being blocked while waiting for
other tasks to complete. They initially aimed to address this by increasing
thread counts and tuning related settings, but these changes
still didn’t achieve the desired results. The ultimate bottleneck was the Hive
metastore and the expensive list operations that caused many of the blocking
operations to finish slowly.&lt;/p&gt;

&lt;p&gt;At this point, the team reevaluated their needs to consider alternative
solutions. They needed a better indexing strategy on the data with a flexible
partitioning strategy. They also needed to remove the bottleneck on the metadata
for this data while still maintaining compatibility across multiple query
engines as Hive did.&lt;/p&gt;

&lt;p&gt;The team looked at the existing set of novel data lake connectors available in
Trino version 356, which at the time only included Iceberg. SK Telecom was 
immediately impressed by the metadata indexing in the Iceberg project. They 
particularly liked Iceberg’s snapshot isolation as data is created or modified.
They were able to speed up queries using data file pruning on partition and
column stats stored in the manifest file.&lt;/p&gt;

&lt;p&gt;After running a benchmark, the team found that Iceberg reduced the input data
size from hundreds of gigabytes down to under ten. They also
investigated adding a large number of partitions to continue lowering the input
data, but found that there’s a tradeoff where creating too many partitions
increases query planning time. Ultimately, they found a sweet spot where the
input data size was around six gigabytes and planning only took 70 milliseconds.&lt;/p&gt;
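
&lt;p&gt;For readers trying this themselves, partitioning is declared as a table
property in the Trino Iceberg connector. The catalog, schema, and column names
below are hypothetical:&lt;/p&gt;

```sql
-- Partition by day so file pruning can skip data outside the queried range;
-- avoid over-partitioning, which inflates query planning time.
CREATE TABLE iceberg.example_schema.events (
    event_time timestamp(6),
    device_id varchar,
    payload varchar
)
WITH (
    partitioning = ARRAY['day(event_time)']
);
```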

&lt;p&gt;This summary is just the tip of the iceberg of all the information JaeChang and
Jennifer shared with us about how Iceberg helped SK Telecom with their Trino
scaling issues. Watch this incredible talk to learn more if you’re considering
taking the leap from Hive to Iceberg!&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, please consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/12/19/trino-summit-2022-sk-telecom-recap.html&quot;&gt;https://trino.io/blog/2022/12/19/trino-summit-2022-sk-telecom-recap.html&lt;/a&gt;. If you think Trino is awesome, 
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/sk-telecom-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>JaeChang Song, Jennifer Oh, Brian Olsen</name>
        </author>
      

      <summary>This post comes from the second half of Trino Summit 2022 session. Our friends JaeChang and Jennifer from SK Telecom traveled across the globe from South Korea to join us in person! SK Telecom recently had some issues scaling Trino on the Hive model, among other issues that come with Hive. While some initial tweaking helped speed things up, it ultimately never solved the problem. After switching to Iceberg, SK Telecom ran initial performance tests with some very impressive results. In this talk, Jennifer and JaeChang describe their journey to Iceberg with Trino.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/sk-telecom.jpg" />
      
    </entry>
  
    <entry>
      <title>Trino at Quora: Speed, cost, reliability challenges, and tips</title>
      <link href="https://trino.io/blog/2022/12/16/trino-summit-2022-quora-recap.html" rel="alternate" type="text/html" title="Trino at Quora: Speed, cost, reliability challenges, and tips" />
      <published>2022-12-16T00:00:00+00:00</published>
      <updated>2022-12-16T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/16/trino-summit-2022-quora-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/16/trino-summit-2022-quora-recap.html">&lt;p&gt;As we near the end of the &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;Trino Summit 2022 recap series&lt;/a&gt;, it’s time to take a stop at Quora. At
Quora, being an engineer responsible for maintaining Trino comes with its fair
share of challenges. With concerns about cost, performance, and reliability,
Quora has taken several creative steps to ensure that they get the most out of
Trino. Other Trino users may be able to learn a few neat tips and tricks to
do the same by tuning in.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/Q03DzL_fm-I&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Trino@Quora.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;Trino at Quora is used in the big ways that we’re all familiar with. It receives
queries from a variety of clients and services, then executes those queries
on an S3 data lake and Hive metastore to return results at high speeds. With a
wide variety of clients, Quora gets the most out of Trino, using it for ad-hoc
analysis, but also for ETL, backfill jobs, A/B testing, and time series queries.
But as with any large system being used for so many things, this isn’t without a
few challenges.&lt;/p&gt;

&lt;p&gt;The first challenge is a universal one - how can Quora keep the costs of running
Trino to a minimum? One of the biggest strategies was to migrate to AWS Graviton
instances to run Trino clusters, as they have proven to be more cost-efficient
than AMD- and Intel-based EC2 instances at Quora. Graviton does have lower
availability, though, so they sometimes must be complemented with some AMD/Intel
instances in order to avoid any downtime. Auto-scaling also led to great cost
savings, as the workloads varied based on time of day. By tracking usage,
ramping up the number of machines during the busy workday, and ramping back
down when fewer jobs are in progress, Quora was able to minimize
idle machines and cut back on unnecessary spending. Finally, and perhaps most
obviously, the team at Quora worked to make ETL queries more efficient. By using
partitions effectively and creating a tool to detect inefficient queries
scanning too many partition keys, the result is efficient queries that take less
time and use fewer resources, saving on cost.&lt;/p&gt;

&lt;p&gt;Up next - how could Quora maximize Trino’s performance? With data analysts
expecting quick runtimes and occasionally running into problems, fine-tuning
Trino to run as well as it possibly can isn’t always an easy task. One
particular major issue they found at Quora was that some worker nodes which ran
for 24 hours or more straight would utilize less CPU and run slowly, bogging
things down. The fix? Gracefully restart worker nodes that run for over a day,
and implement a detector to flag and restart any nodes which showed signs of
behaving slowly.&lt;/p&gt;

&lt;p&gt;The final big concern at Quora is reliability, as users expect Trino to be up
and running whenever they need it. In one instance, they found that overwriting
a specific configuration option caused a cluster to crash repeatedly and
slow down to a crawl. The issue was that they’d steadily been bumping the value
of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.min-expire-age&lt;/code&gt; configuration property up and up and up from the
default value of 15 minutes, until eventually, unexpired query history was using
up too much memory and causing the cluster to falter. Lowering the value back
down to something more advisable saved the day in that situation. But wanting to
avoid similar situations from happening again, Quora built extensive monitoring
tools to track the health of their Trino clusters. They ensure that even when
user error does cause problems, those problems are flagged and alerts are sent
out, bringing the data engineering team to the rescue.&lt;/p&gt;
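
&lt;p&gt;For reference, the property in question lives in the coordinator’s
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.properties&lt;/code&gt;. A
value close to the default keeps completed-query history from accumulating in
coordinator memory:&lt;/p&gt;

```properties
# config.properties: minimum age before a completed query can be removed
# from coordinator memory (default 15m; very large values retain history
# and grow heap usage)
query.min-expire-age=15m
```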

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, please consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/12/16/trino-summit-2022-quora-recap.html&quot;&gt;https://trino.io/blog/2022/12/16/trino-summit-2022-quora-recap.html&lt;/a&gt;. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/quora-social.jpg&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Yifan Pan, Cole Bowden</name>
        </author>
      

      <summary>As we near the end of the Trino Summit 2022 recap series, it’s time to take a stop at Quora. At Quora, being an engineer responsible for maintaining Trino comes with its fair share of challenges. With concerns about cost, performance, and reliability, Quora has taken several creative steps to ensure that they get the most out of Trino. Other Trino users may be able to learn a few neat tips and tricks to do the same by tuning in.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/quora.jpg" />
      
    </entry>
  
    <entry>
      <title>43: Trino saves trips with Alluxio</title>
      <link href="https://trino.io/episodes/43.html" rel="alternate" type="text/html" title="43: Trino saves trips with Alluxio" />
      <published>2022-12-15T00:00:00+00:00</published>
      <updated>2022-12-15T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/43</id>
      <content type="html" xml:base="https://trino.io/episodes/43.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Brian Olsen, Developer Advocate at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/bitsondatadev&quot;&gt;@bitsondatadev&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Information Engineering at 
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Bin Fan, VP of Open Source at Alluxio and PMC maintainer of Alluxio open 
source and TSC member of Presto (&lt;a href=&quot;https://twitter.com/binfan&quot;&gt;@binfan&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/beinan/&quot;&gt;Beinan Wang&lt;/a&gt;, Software Engineer at 
Alluxio and Presto committer&lt;/li&gt;
&lt;/ul&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/43/alluxio-trino.jpeg&quot; /&gt;
&lt;br /&gt;
The Alluxio crew at Trino Summit 2022. &lt;br /&gt;
From left to right:
&lt;a href=&quot;https://www.linkedin.com/in/beinan/&quot;&gt;Beinan Wang&lt;/a&gt;,
&lt;a href=&quot;https://www.linkedin.com/in/bin-fan/&quot;&gt;Bin Fan&lt;/a&gt;,
&lt;a href=&quot;https://www.linkedin.com/in/bitsondatadev/&quot;&gt;Brian Olsen&lt;/a&gt;,
&lt;a href=&quot;https://www.linkedin.com/in/dennyglee/&quot;&gt;Denny Lee&lt;/a&gt;,
&lt;a href=&quot;https://www.linkedin.com/in/hopechong/&quot;&gt;Hope Wang&lt;/a&gt;,
&lt;a href=&quot;https://www.linkedin.com/in/jasminechenwang/&quot;&gt;Jasmine Wang&lt;/a&gt;.
&lt;br /&gt;
Somehow Denny Lee from &lt;a href=&quot;https://delta.io/&quot;&gt;Delta Lake&lt;/a&gt; snuck in there
😉. Love the data community vibes on this one.

&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-data-caching-and-orchestration&quot;&gt;Concept of the episode: Data caching and orchestration&lt;/h2&gt;

&lt;p&gt;Out of all those petabytes of data you store, only a small fraction of it is
creating business value for you today. When you scan the same data multiple
times and transfer it over the wire, you’re wasting time, compute cycles, and
ultimately money. This gets worse when you’re pulling data across regions or
clouds from disaggregate Trino clusters. In situations like these, caching
solutions can make a tremendous impact on the latency and cost of your queries.&lt;/p&gt;

&lt;h3 id=&quot;trino-without-caching&quot;&gt;Trino without caching&lt;/h3&gt;

&lt;p&gt;There seems to be a sizeable portion of the community who aren’t using a
caching solution. Not all workloads will really benefit from caching. If you
are performing more writes than reads, the cache will need to constantly be
invalidated before performing each read. If you are scanning all your data to
run daily migrations, you would not benefit from caching. However, one of the
most common use cases where Trino shines is interactive adhoc analytics. This 
type of querying is very fast in Trino, especially when using modern storage 
formats like Iceberg.&lt;/p&gt;

&lt;h3 id=&quot;two-types-of-caching&quot;&gt;Two types of caching&lt;/h3&gt;

&lt;p&gt;There are two types of caching used with Trino. The first type caches the
results of a common query or subquery, so that any query whose predicates
overlap can reuse the cached results.&lt;/p&gt;

&lt;p&gt;The other type is file or object caching. Rather than storing the results of
the query, you are caching the files from a file or object store that are
scanned as part of the query.&lt;/p&gt;

&lt;p&gt;In this episode, we will focus on the latter type of caching. This will apply to
connectors like Hive, Iceberg, Delta Lake, and Hudi.&lt;/p&gt;
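
&lt;p&gt;A toy sketch in Python can make the distinction concrete. This is purely
illustrative and not how Trino, Rubix, or Alluxio are implemented: a result
cache is keyed by the query text, while a file cache is keyed by the storage
path of the objects the query scans:&lt;/p&gt;

```python
result_cache = {}
file_cache = {}

def run_query(sql, execute):
    # Result caching: an equivalent query reuses the computed result.
    key = ' '.join(sql.split()).lower()
    if key not in result_cache:
        result_cache[key] = execute(sql)
    return result_cache[key]

def read_object(path, fetch):
    # File/object caching: repeated scans reuse the fetched bytes,
    # no matter which query is doing the scanning.
    if path not in file_cache:
        file_cache[path] = fetch(path)
    return file_cache[path]

calls = []
run_query('SELECT 1', lambda sql: calls.append(sql) or [1])
result = run_query('select  1', lambda sql: calls.append(sql) or [1])
# The second, equivalent query is served from the result cache.
```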

&lt;h3 id=&quot;hive-connector-caching&quot;&gt;Hive connector caching&lt;/h3&gt;

&lt;p&gt;Trino has an &lt;a href=&quot;https://trino.io/docs/current/connector/hive-caching.html&quot;&gt;embedded caching engine&lt;/a&gt;
in the Hive connector. This is convenient as it ships with Trino, however, it 
does not work outside the Hive connector. The caching engine is 
&lt;a href=&quot;https://github.com/qubole/rubix&quot;&gt;Rubix&lt;/a&gt;. While this system works for simple
Hive use cases, it fails to address use cases outside of Hive and hasn’t been
maintained since 2020. There are many features missing like security features
and support for more compute engines.&lt;/p&gt;

&lt;h3 id=&quot;what-is-alluxio&quot;&gt;What is Alluxio?&lt;/h3&gt;

&lt;p&gt;Alluxio is the world’s first open source data orchestration technology for
analytics and AI in the cloud. It provides a common interface that enables
computation frameworks to connect to numerous storage systems.
Alluxio’s memory-first tiered architecture enables data access at speeds orders
of magnitude faster than existing solutions. Alluxio was originally developed at
the Berkeley AMPLab, &lt;a href=&quot;https://amplab.cs.berkeley.edu/wp-content/uploads/2014/11/2014_socc_tachyon.pdf&quot;&gt;and was originally called Tachyon&lt;/a&gt;.
It was less focused on caching and data orchestration and more focused on
fault-tolerance via lineage and other techniques borrowed from Spark.&lt;/p&gt;

&lt;p&gt;Alluxio lies between data driven applications, such as Trino and Apache Spark,
and various persistent storage systems, such as Amazon S3, Google Cloud Storage,
HDFS, Ceph, and MinIO. Alluxio unifies the data stored in these different
storage systems, presenting unified client APIs and a global namespace to its
upper layer data driven applications.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/43/alluxio-architecture.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Alluxio is commonly used as a distributed shared caching service so compute
engines talking to Alluxio can transparently cache frequently accessed data,
especially from remote locations, to provide in-memory I/O throughput. Alluxio
also enables unifying all data storage under a single namespace. This can make
things simpler if your data is stored across different systems, different
regions, or different clouds.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/43/inside-alluxio.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Source: &lt;a href=&quot;https://docs.alluxio.io/os/user/stable/en/Overview.html&quot;&gt;https://docs.alluxio.io/os/user/stable/en/Overview.html&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;what-is-data-orchestration&quot;&gt;What is data orchestration?&lt;/h3&gt;

&lt;p&gt;A data orchestration platform abstracts data access across storage systems,
virtualizes all the data, and presents the data via standardized APIs with
global namespace to data-driven applications. At the same time, it should have
caching functionality to enable fast access to warm data. In summary, a data
orchestration platform provides data-driven applications data accessibility,
data locality, and data elasticity.&lt;/p&gt;

&lt;p&gt;Source: &lt;a href=&quot;https://www.alluxio.io/blog/data-orchestration-the-missing-piece-in-the-data-world/&quot;&gt;https://www.alluxio.io/blog/data-orchestration-the-missing-piece-in-the-data-world/&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;trino-and-alluxio-expedia-use-case&quot;&gt;Trino and Alluxio: Expedia use case&lt;/h3&gt;

&lt;p&gt;Expedia needed to have the ability to query cross cluster over different regions
while simplifying the interface to their local data sources.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/43/expedia-trino-alluxio.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Source: &lt;a href=&quot;https://www.alluxio.io/blog/unifying-cross-region-access-in-the-cloud-at-expedia-group-the-path-toward-data-mesh-in-the-brand-world/&quot;&gt;Unifying cross-region access in the cloud at Expedia Group — The path toward data mesh in the brand world&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-episode-alluxioalluxio-pr-13000-add-a-doc-for-trino&quot;&gt;PR of the episode: Alluxio/alluxio PR 13000 Add a doc for Trino&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/Alluxio/alluxio/pull/13000&quot;&gt;This episode’s PR&lt;/a&gt; is actually
not located in a Trino repository. This PR comes from the Alluxio repository. It
happened in the wake of the rebranding from Presto to Trino. PRs like this
helped the Trino community grow awareness around the new name, as well as fix
any issues that occurred with the hasty renaming we had to do.&lt;/p&gt;

&lt;p&gt;This was submitted by Alluxio engineer, &lt;a href=&quot;https://github.com/yuzhu&quot;&gt;David Zhu&lt;/a&gt;.
A huge thanks to David and his contributions to Trino as well!&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-running-trino-on-alluxio&quot;&gt;Demo of the episode: Running Trino on Alluxio&lt;/h2&gt;

&lt;p&gt;This demo of the episode covers how to configure Alluxio to use write-through
caching to MinIO. This is done using the Iceberg connector with only one change
to the location property on the table from the Trino perspective.&lt;/p&gt;
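
&lt;p&gt;The gist of the change, sketched with hypothetical catalog, schema, table,
and Alluxio master names, is that the table’s
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;location&lt;/code&gt; property
points at Alluxio instead of directly at MinIO, and Alluxio writes the data
through to the underlying store:&lt;/p&gt;

```sql
-- Hypothetical names; the location now routes reads and writes through Alluxio
CREATE TABLE iceberg.lakehouse.events (
    id bigint,
    name varchar
)
WITH (
    location = 'alluxio://alluxio-master:19998/lakehouse/events'
);
```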

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/yaxPEWRpEzc&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;To follow this demo, copy the code located in the 
&lt;a href=&quot;https://github.com/bitsondatadev/trino-getting-started/tree/main/community_tutorials/alluxio/trino-alluxio-iceberg-minio&quot;&gt;trino-getting-started repo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Check out the in-person and virtual
&lt;a href=&quot;https://www.meetup.com/pro/trino-community/&quot;&gt;Trino Meetup groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Federating them all on Starburst Galaxy</title>
      <link href="https://trino.io/blog/2022/12/14/trino-summit-2022-starburst-recap.html" rel="alternate" type="text/html" title="Federating them all on Starburst Galaxy" />
      <published>2022-12-14T00:00:00+00:00</published>
      <updated>2022-12-14T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/14/trino-summit-2022-starburst-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/14/trino-summit-2022-starburst-recap.html">&lt;p&gt;As the &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;Trino Summit 2022 recap post series&lt;/a&gt; continues on, I have been reading all the
wonderful posts by our awesome speakers, facilitated by the Trino developer
relations team. Because I have a perpetual fear of missing out, I convinced them
that I should get in on the fun. For this latest installment in the series, I
will be recapping my very own Trino Summit talk. Basically, I’m ripping off
Bo Burnham’s comedy bit where he &lt;a href=&quot;https://youtu.be/FZVMB8mrNO0?t=35&quot;&gt;reacts to his own reaction video&lt;/a&gt;,
blog style.&lt;/p&gt;

&lt;p&gt;In this session, I demonstrate building a data lakehouse architecture with
&lt;a href=&quot;https://www.starburst.io/platform/starburst-galaxy/&quot;&gt;Starburst Galaxy&lt;/a&gt;, the
fastest and easiest way to get up and running with Trino.
Before I dive into the recap, I want to thank the Trino community for showing
up. I am grateful that I was able to meet and learn from so many members of the
community in person.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/Zfmxwu0m98k&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;The premise of this example is that we have Pokémon Go data being ingested into
S3, which contains each Pokémon’s encounter information. This includes the
geo-location data of where each Pokémon spawned, and how long the Pokémon could
be found at that location. What we don’t have is any
information on that Pokémon’s abilities. That information is contained in the
Pokédex stored in MongoDB which I’ve cleverly nicknamed &lt;strong&gt;PokéMongoDB&lt;/strong&gt;. It
includes data about all the Pokémon including type, legendary status,
catch rate, and more. To create meaningful insights from our data, we need
to combine the incoming geo-location data with the static dimension CSV table
located in MongoDB.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/starburst-architecture.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;To do this, I build out a reporting structure in the data lake using
Starburst Galaxy. The first step is to read the raw data stored in the land
layer, then clean and optimize that data into more performant ORC files in the
structure layer. Finally, I join the spawn data and Pokédex data together into a
single table that is cleaned and ready to be utilized by a data consumer.
Next I apply role-based access control capabilities within Starburst
Galaxy, which provides the proper data governance so that data consumers only
have read permissions on that final table. I then create some visualizations to
analyze which Pokémon are common to spawn in the San Francisco area.&lt;/p&gt;

&lt;p&gt;I walk through all the setup required to put this data lakehouse architecture
into action, including creating my catalogs, cluster, schemas, and tables. After
incorporating open table formats, applying native security, and building
out a reporting structure, I have confidence that my data lakehouse is built
to last, and end up with some really cool final Pokémon graphs.&lt;/p&gt;

&lt;h2 id=&quot;helpful-links&quot;&gt;Helpful links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Sign up for &lt;a href=&quot;https://www.starburst.io/platform/starburst-galaxy/start/&quot;&gt;Starburst Galaxy&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Read the &lt;a href=&quot;https://docs.starburst.io/starburst-galaxy/index.html&quot;&gt;docs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Try a
&lt;a href=&quot;https://docs.starburst.io/starburst-galaxy/tutorials/index.html&quot;&gt;tutorial&lt;/a&gt; for yourself&lt;/li&gt;
  &lt;li&gt;Register for &lt;a href=&quot;https://www.starburst.io/datanova/?utm_source=event&amp;amp;utm_medium=datanova&amp;amp;utm_campaign=[…]Event-Datanova-social-promo&amp;amp;utm_content=trinosummitrecapblog&quot;&gt;Datanova&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/12/14/trino-summit-2022-starburst-recap.html&quot;&gt;https://trino.io/blog/2022/12/14/trino-summit-2022-starburst-recap.html&lt;/a&gt;. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/starburst-social.jpg&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Monica Miller</name>
        </author>
      

      <summary>As the Trino Summit 2022 recap post series continues on, I have been reading all the wonderful posts by our awesome speakers, facilitated by the Trino developer relations team. Because I have a perpetual fear of missing out, I convinced them that I should get in on the fun. For this latest installment in the series, I will be recapping my very own Trino Summit talk. Basically, I’m ripping off Bo Burnham’s comedy bit where he reacts to his own reaction video, blog style. In this session, I demonstrate building a data lakehouse architecture with Starburst Galaxy, the fastest and easiest way to get up and running with Trino. Before I dive into the recap, I want to thank the Trino community for showing up. I am grateful that I was able to meet and learn from so many members of the community in person.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/starburst.jpg" />
      
    </entry>
  
    <entry>
      <title>Trino for large scale ETL at Lyft</title>
      <link href="https://trino.io/blog/2022/12/12/trino-summit-2022-lyft-recap.html" rel="alternate" type="text/html" title="Trino for large scale ETL at Lyft" />
      <published>2022-12-12T00:00:00+00:00</published>
      <updated>2022-12-12T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/12/trino-summit-2022-lyft-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/12/trino-summit-2022-lyft-recap.html">&lt;p&gt;Buckle up for the next &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;post in the Trino Summit 2022 recap series&lt;/a&gt;. In this post, we’re covering the talk
given by Lyft engineers, Charles and Ritesh, on how they have not only scaled
Trino as adoption grew, but done so with fewer nodes and more effective usage.
They have also started using Trino more for ETL rather than just interactive
analytics. Get ready for a smooth ride as Lyft brings you large scale ETL with
Trino.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/FL3c1Ue7YWM&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Trino@Lyft.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;Lyft uses Trino to perform ETL jobs reading 10 petabytes of data per day and
writing 100 terabytes per day. They run 250,000 queries per day, with around
2,000 unique users. This requires approximately 750 EC2 instances scaling up or
down with an autoscaler. Over 90 percent of queries complete within one to
three minutes.&lt;/p&gt;

&lt;p&gt;In the last year, Lyft cut their number of Trino nodes in half, while increasing
their workloads. This is possible due to recent improvements in Trino and
upgrades in Java versions. Lyft is not using fault-tolerant execution, but has
started seeing interest in using Trino for ETL jobs due to the faster
turnaround. Some issues Lyft has faced include how resource-hungry Trino
is, as well as the coordinator being a single point of failure
for queries executing on a cluster.&lt;/p&gt;

&lt;p&gt;Lyft was one of the earliest companies to really push using Trino for ETL use
cases. They built custom best-effort rollback code in Apache Airflow. If a query
fails, the operation reverts to the state before it began. Lyft runs
four Trino clusters split by the type of workload run on each cluster. Their best
practices include careful use of broadcast joins, query sharding, and scaling
writers for ETL loads.&lt;/p&gt;
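
&lt;p&gt;The best-effort rollback idea can be sketched roughly as follows. This is a
minimal, hypothetical illustration of the pattern, not Lyft’s actual Airflow
code; the class and function names are made up:&lt;/p&gt;

```python
# Toy model of best-effort rollback: snapshot state before an operation,
# and restore that snapshot if the operation fails. Names are illustrative.

class BestEffortRollback:
    def __init__(self, table):
        # 'table' stands in for mutable table state; real code would
        # snapshot partitions or metadata, not an in-memory dict.
        self.table = table
        self.snapshot = None

    def run(self, query_fn):
        # capture the state before the operation begins
        self.snapshot = dict(self.table)
        try:
            query_fn(self.table)
            return True
        except Exception:
            # best effort: revert to the pre-operation state
            self.table.clear()
            self.table.update(self.snapshot)
            return False
```

&lt;p&gt;A real implementation would snapshot table partitions or metastore entries
rather than an in-memory dict, but the control flow is the same.&lt;/p&gt;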

&lt;p&gt;One final point Lyft made is that keeping up with the rapid release cycle of
Trino was a challenge. Lyft showcases their regression testing using their query
replay framework. This session is a smooth five out of five ride. Enjoy!&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, please consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/12/12/trino-summit-2022-lyft-recap.html&quot;&gt;https://trino.io/blog/2022/12/12/trino-summit-2022-lyft-recap.html&lt;/a&gt;. If you think Trino is awesome, 
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/lyft-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Charles Song, Ritesh Varyani, Brian Olsen</name>
        </author>
      

      <summary>Buckle up for the next post in the Trino Summit 2022 recap series. In this post, we’re covering the talk given by Lyft engineers, Charles and Ritesh, on how they have not only scaled Trino as adoption grew, but done so with fewer nodes and more effective usage. They have also started using Trino more for ETL rather than just interactive analytics. Get ready for a smooth ride as Lyft brings you large scale ETL with Trino.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/lyft.jpg" />
      
    </entry>
  
    <entry>
      <title>Rewriting History: Migrating petabytes of data to Apache Iceberg using Trino</title>
      <link href="https://trino.io/blog/2022/12/09/trino-summit-2022-shopify-recap.html" rel="alternate" type="text/html" title="Rewriting History: Migrating petabytes of data to Apache Iceberg using Trino" />
      <published>2022-12-09T00:00:00+00:00</published>
      <updated>2022-12-09T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/09/trino-summit-2022-shopify-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/09/trino-summit-2022-shopify-recap.html">&lt;p&gt;Rolling right along with another one of &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;our Trino Summit 2022 recap posts&lt;/a&gt;, we’re excited to bring you the engaging
talk from Marc Laforet at Shopify. He talked about the ordeal (or, if you look
at it in a positive light, the privilege) of migrating petabytes of data from
Hive to Iceberg table formats with the help of Trino. With details on why
Shopify chose to move to Iceberg, the various migration strategies that were
considered, and the ultimate process of moving all that data while the Trino
Iceberg connector was still in active development, it’s an insightful talk that
you don’t want to miss.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/nJBBw-xnLU8&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Shopify@Trino.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;As with many other Trino users, it should come as no surprise that Shopify
has a lot of data to work with. First-party data comes in from a few different
sources, and there’s a mountain of modelled data to go along with it. In
Shopify’s case, one of the issues was that some data sets were built on top of
custom table formats. On top of that, the architecture wasn’t scaled with a
careful plan in mind, leading to limited interoperability of datasets among
various tools. With data scientists unable to unify data across different tools
and storages, it was time for a change.&lt;/p&gt;

&lt;p&gt;When you’ve got tons of data that isn’t currently in one place, what’s the fix?
Create a central lakehouse for all the data to be accessible from, a
single-service portal that could serve all users’ needs. The first question was
which table format to use, and if the title of the blog post didn’t already give
it away, they chose to go with Apache Iceberg. It was an easy, central vision
to work towards: all data in a centralized lakehouse stored in Iceberg, then
queryable by Trino.&lt;/p&gt;

&lt;p&gt;Having a plan and putting that plan into action are two different things,
though. When nothing is already in Iceberg, moving it all there is a migration
on the scale of thousands of tables and petabytes of data. In Marc’s words from
the talk, once Shopify committed to the migration and invested resources into
it, the realization was, “crap, now I have to build it.” Even worse, because the
old data was primarily in gzipped JSON format, it all needed to be rewritten…
and so it was.&lt;/p&gt;

&lt;p&gt;Then, enter Trino! With new Iceberg-based tables, Trino was identified as the
right tool for the job to process all that data. This wasn’t without snags, as
the migration happened while the Iceberg connector was still under active
development. There were a few incidents where Shopify hit
an issue, and an update or bugfix to Trino’s Iceberg connector solved
those problems in a matter of days or weeks.&lt;/p&gt;

&lt;p&gt;The result of all of this? Some incredible benchmark results. Large tables saw a
96% reduction in planning time, a 96% reduction in cumulative user memory, and a
95% reduction in query execution time. That’s going from thousands
of terabytes of memory to under 100, and from a query that took an hour to run
to one taking three minutes. For the absolute largest table at Shopify, some
queries saw a 99.9% reduction in execution time. Yes, that number is real.&lt;/p&gt;

&lt;p&gt;Moral of the story? If you find yourself using an old Hive table with outdated
file formats, lamenting the resources you need and the time it takes, the
decision is easy. Migrate to Iceberg with Trino. Shopify has shown us the way,
and the full talk has plenty of useful advice for how to best go about it.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/12/09/trino-summit-2022-shopify-recap.html&quot;&gt;https://trino.io/blog/2022/12/09/trino-summit-2022-shopify-recap.html&lt;/a&gt;. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/shopify-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Marc Laforet, Cole Bowden</name>
        </author>
      

      <summary>Rolling right along with another one of our Trino Summit 2022 recap posts, we’re excited to bring you the engaging talk from Marc Laforet at Shopify. He talked about the ordeal (or, if you look at it in a positive light, the privilege) of migrating petabytes of data from Hive to Iceberg table formats with the help of Trino. With details on why Shopify chose to move to Iceberg, the various migration strategies that were considered, and the ultimate process of moving all that data while the Trino Iceberg connector was still in active development, it’s an insightful talk that you don’t want to miss.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/shopify.jpg" />
      
    </entry>
  
    <entry>
      <title>Elevating data fabric to data mesh: Solving data needs in hybrid data lakes</title>
      <link href="https://trino.io/blog/2022/12/07/trino-summit-2022-comcast-recap.html" rel="alternate" type="text/html" title="Elevating data fabric to data mesh: Solving data needs in hybrid data lakes" />
      <published>2022-12-07T00:00:00+00:00</published>
      <updated>2022-12-07T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/07/trino-summit-2022-comcast-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/07/trino-summit-2022-comcast-recap.html">&lt;p&gt;Tune in for the next &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;post in the Trino Summit 2022 recap series&lt;/a&gt;. In this post, we’re joining Saj from
Comcast to talk about their migration from data fabric to data mesh. Saj
shows you that there is more to the buzzword than meets the eye. He gives a
solid overview of why Comcast is taking data mesh to heart.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/sSWBi7bBotQ&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Trino@Comcast.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;Comcast engineer Sajuman Joseph walks us through Comcast’s move from
its initial use case of using Trino to power a data fabric architecture to
including more governance features by leveraging Trino. Data fabric enables
querying data across distributed data sets, but importantly, it allows Comcast
to transparently migrate data across on-prem and cloud storage without impacting
users.&lt;/p&gt;

&lt;p&gt;Despite offering query federation, data fabric still lacks the
higher-quality experience that data mesh aims to provide. Access to the data
matters, but so do data quality checks and a dedicated
owner to ensure the data is correct and consumable. The ownership is split by
domains defined by Comcast. It is the responsibility of the owners to ensure
data quality, compliance, and security on the data they own. This data can be
exposed internally or externally as a data product. While many of the drivers
for this are done through company policy, there are technical means to make this
possible. This includes improving metadata on the data, access logs, global
data catalogs, and managing data access.&lt;/p&gt;

&lt;p&gt;Trino facilitates a single point of access and is a primary location where
policies are enforced. Comcast created an engine called the Enterprise Policy
Hub which syncs with all data stores and compute engines to enforce company
policy and update metadata on all data across Comcast. Trino, along with other
query engines, consults this engine to determine what information a user has
access to and who owns the data, and it creates an audit trail of the queries
that are run.&lt;/p&gt;
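
&lt;p&gt;As a toy model of that flow, the access check plus audit trail could look
like the sketch below. This is not Comcast’s actual Enterprise Policy Hub; all
names are illustrative:&lt;/p&gt;

```python
# Minimal sketch of a policy hub that query engines consult for access
# decisions and data ownership, recording every decision as an audit entry.

class PolicyHub:
    def __init__(self):
        self.grants = {}     # user -> set of tables the user may read
        self.owners = {}     # table -> owning domain
        self.audit_log = []  # (user, table, allowed) tuples

    def grant(self, user, table, owner):
        # register an access grant and the table's owning domain
        self.grants.setdefault(user, set()).add(table)
        self.owners[table] = owner

    def check_access(self, user, table):
        allowed = table in self.grants.get(user, set())
        # every decision is recorded, producing an audit trail
        self.audit_log.append((user, table, allowed))
        return allowed
```

&lt;p&gt;A query engine would call something like &lt;code&gt;check_access&lt;/code&gt; before
planning a query, and the audit log would feed compliance reporting.&lt;/p&gt;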

&lt;p&gt;There are still some open challenges Comcast is looking to overcome. Data
discovery is a large challenge for anyone looking to find a specific table and
figure out who is responsible for updating it. Another interesting area Comcast
is researching is automating the retention and minimization of data copies.
This talk is exciting and gives a pretty clear roadmap to some beneficial
changes many teams can make to improve the quality and governance of their data
sets.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social card and
link to &lt;a href=&quot;https://trino.io/blog/2022/12/07/trino-summit-2022-comcast-recap.html&quot;&gt;https://trino.io/blog/2022/12/07/trino-summit-2022-comcast-recap.html&lt;/a&gt;. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/comcast-social.jpg&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Sajuman Joseph, Brian Olsen</name>
        </author>
      

      <summary>Tune in for the next post in the Trino Summit 2022 recap series. In this post, we’re joining Saj from Comcast to talk about their migration from data fabric to data mesh. Saj shows you that there is more to the buzzword than meets the eye. He gives a solid overview of why Comcast is taking data mesh to heart.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/comcast.jpg" />
      
    </entry>
  
    <entry>
      <title>AI and poetry meet Trino</title>
      <link href="https://trino.io/blog/2022/12/06/poetry-ai.html" rel="alternate" type="text/html" title="AI and poetry meet Trino" />
      <published>2022-12-06T00:00:00+00:00</published>
      <updated>2022-12-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/06/poetry-ai</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/06/poetry-ai.html">&lt;p&gt;Sometimes when working in the depth of the query engine core, the planner, the
optimizer, or some other tricky problem or code, you run out of steam and need a
distraction. Your brain will quietly work on the problem in the background while
you get to read or do something interesting.&lt;/p&gt;

&lt;p&gt;During one break we found out more about GPT-3 and the recent improvements on
the OpenAI project. So we had to try.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;It’s pretty amazing that you can feed the AI a prompt just like we did:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Write a short rhyming poem explaining Trino in easy but accurate terms
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And after a few trials, the best one produced by GPT-3 was this:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Trino is a tech platform,&lt;br /&gt;
That makes data integration a breeze,&lt;br /&gt;
It’s designed to help your business grow,&lt;br /&gt;
And optimise your systems with ease.&lt;br /&gt;
&lt;br /&gt;
It links together different sources,&lt;br /&gt;
Both big and small,&lt;br /&gt;
Gathers data of different types,&lt;br /&gt;
Helping you make decisions for all.&lt;br /&gt;
&lt;br /&gt;
So Trino is an integration tool,&lt;br /&gt;
That connects data for you,&lt;br /&gt;
It’ll ease the burden of data sync,&lt;br /&gt;
Giving you more time to do what you do.&lt;br /&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Cool, huh? Have a look at another prompt:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Write a short rhyming poem explaining Trino, the SQL query engine in complex
and accurate terms.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And the compelling result:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Trino is a query engine that does more than just run,&lt;br /&gt;
It’s a SQL engine with performance that stuns.&lt;br /&gt;
It is secure, efficient, and kind,&lt;br /&gt;
Providing real-time analytics of any kind.&lt;br /&gt;
&lt;br /&gt;
It’s a distributed system that’s built to scale,&lt;br /&gt;
Processing queries that never fail.&lt;br /&gt;
Connecting to myriad data sources,&lt;br /&gt;
It can quickly analyze any course.&lt;br /&gt;
&lt;br /&gt;
It’s flexible, reliable, and fast,&lt;br /&gt;
And is designed to make your queries last.&lt;br /&gt;
It can handle large volumes of data with ease,&lt;br /&gt;
Making the most complex queries a breeze.&lt;br /&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you are interested in learning more, you can look at the
&lt;a href=&quot;https://arstechnica.com/information-technology/2022/11/openai-conquers-rhyming-poetry-with-new-gpt-3-update/&quot;&gt;blog post on Ars Technica&lt;/a&gt;
or go and try it yourself &lt;a href=&quot;https://beta.openai.com/playground/&quot;&gt;on the playground&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Enjoy while we are heading back to &lt;a href=&quot;https://github.com/trinodb/trino/pulls&quot;&gt;working on Trino pull
requests&lt;/a&gt; and other code now.&lt;/p&gt;

&lt;p&gt;Martin and Marcos&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso, Marcos Traverso</name>
        </author>
      

      <summary>Sometimes when working in the depth of the query engine core, the planner, the optimizer, or some other tricky problem or code, you run out of steam and need a distraction. Your brain will quietly work on the problem in the background while you get to read or do something interesting. During one break we found out more about GPT-3 and the recent improvements on the OpenAI project. So we had to try.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/graphics/trino-openapi-header.png" />
      
    </entry>
  
    <entry>
      <title>Leveraging Trino to power data at Goldman Sachs</title>
      <link href="https://trino.io/blog/2022/12/05/trino-summit-2022-goldman-sachs-recap.html" rel="alternate" type="text/html" title="Leveraging Trino to power data at Goldman Sachs" />
      <published>2022-12-05T00:00:00+00:00</published>
      <updated>2022-12-05T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/05/trino-summit-2022-goldman-sachs-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/05/trino-summit-2022-goldman-sachs-recap.html">&lt;p&gt;Continuing with &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;the Trino Summit 2022 sessions posts&lt;/a&gt;, we’re diving into an insightful
lightning talk from &lt;a href=&quot;https://www.goldmansachs.com&quot;&gt;Goldman Sachs&lt;/a&gt;. They explore
how they use Trino to help ensure data quality across the board for all users
and customers. By using Trino to federate their various data sources, querying
everything in one place provides them with the flexibility they need. With that
flexibility, they can validate that all data is as it should be where that data
lives, settling any concerns that may exist about data integrity.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/g9fLA3tFG-Q&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;Validating data quality can be a tricky and complicated process. Data resides
in many sources, with different rules and different processes for checking
quality. Goldman’s data ingestion team may not have a detailed understanding
of all data sets. Despite that, there is a need to autonomously verify and
validate all data to be confident in its quality and integrity. The solution to
this challenge? A queryable data quality platform powered by Trino.&lt;/p&gt;

&lt;p&gt;The underlying data quality platform’s logic handles the validation. Resting
on top of it is Trino, the scalable, fast solution to ensure that users can
query what they need. Even when the platform is profiling the data, enforcing
various quality rules, and validating the data in different ways, Trino is there
to provide access to everything contained within, proving that quality, speed,
and accessibility don’t need to be tradeoffs.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/12/05/trino-summit-2022-goldman-sachs-recap.html&quot;&gt;https://trino.io/blog/2022/12/05/trino-summit-2022-goldman-sachs-recap.html&lt;/a&gt;. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/goldman-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Sumit Halder, Siddhant Chadha, Suman-Newton, Ramesh Bhanan, Cole Bowden</name>
        </author>
      

      <summary>Continuing with the Trino Summit 2022 sessions posts, we’re diving into an insightful lightning talk from Goldman Sachs. They explore how they use Trino to help ensure data quality across the board for all users and customers. By using Trino to federate their various data sources, querying everything in one place provides them with the flexibility they need. With that flexibility, they can validate that all data is as it should be where that data lives, settling any concerns that may exist about data integrity.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/goldman-sachs.png" />
      
    </entry>
  
    <entry>
      <title>Trino delivers for Amazon Athena</title>
      <link href="https://trino.io/blog/2022/12/01/athena.html" rel="alternate" type="text/html" title="Trino delivers for Amazon Athena" />
      <published>2022-12-01T00:00:00+00:00</published>
      <updated>2022-12-01T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/01/athena</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/01/athena.html">&lt;p&gt;Our community just keeps growing! Today, it is time to reach out and welcome
another large group of Trino users. The release of the new engine version for
&lt;a href=&quot;https://aws.amazon.com/athena&quot;&gt;Amazon Athena&lt;/a&gt; upgrades Athena from a
rather old version of Trino to a recent one. This update brings a ton of
improvements from the Trino project to the users of the popular cloud-based
query service.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;shared-history&quot;&gt;Shared history&lt;/h2&gt;

&lt;p&gt;Amazon Athena and Trino share a long history. From the beginning of Athena, the
query engine under the hood was Trino, then still called Presto. Athena created
a low-maintenance, powerful access mode to your data in S3 and beyond. It
combined the performance and features of Trino with the convenience of a cloud
service, which enabled new users and use cases. You could take advantage of
Trino without needing a team of experts to deploy and operate a Trino cluster
for your organization. In fact, we wrote about this in the first edition of
&lt;strong&gt;Trino: The Definitive Guide&lt;/strong&gt;. There is also a section in the &lt;a href=&quot;/blog/2022/10/03/the-definitive-guide-2.html&quot;&gt;new second
edition&lt;/a&gt; that you can get for
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;free from Starburst&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;time-flies&quot;&gt;Time flies&lt;/h2&gt;

&lt;p&gt;But since the initial release of Athena, time has not stood still. In fact, the
Trino project has accelerated in &lt;a href=&quot;/blog/2022/08/04/decade-innovation.html&quot;&gt;innovation, features, and releases
tremendously&lt;/a&gt;. Until now, Athena
users missed out on these improvements. However, with the update, Amazon Athena
users now get access to many of these great features. As &lt;a href=&quot;https://aws.amazon.com/about-aws/whats-new/2022/10/amazon-athena-announces-upgraded-query-engine/&quot;&gt;AWS mentions in the
announcement&lt;/a&gt;,
“over 50 new SQL functions, 30 new features, and more than 90 query performance
improvements” are now available due to the upgrade to a new version of Trino. These
include &lt;a href=&quot;/blog/2021/05/19/row_pattern_matching.html&quot;&gt;Row pattern recognition with MATCH_RECOGNIZE&lt;/a&gt;, &lt;a href=&quot;/blog/2021/03/10/introducing-new-window-features.html&quot;&gt;new window features&lt;/a&gt;, support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE&lt;/code&gt; statements, and many others.&lt;/p&gt;

&lt;p&gt;Performance improvements in our core engine and all the Trino connectors show up
in every release note. The &lt;a href=&quot;https://aws.amazon.com/blogs/big-data/upgrade-to-athena-engine-version-3-to-increase-query-performance-and-access-more-analytics-features/&quot;&gt;improvements observed by the Athena team in their
benchmarks&lt;/a&gt;
show the resulting gains nicely. This is great evidence that our approach of
constantly working on small improvements wherever we find potential works well.
This approach is necessary since Trino already operates at a very high
performance level, where, like an elite athlete, every small improvement matters.&lt;/p&gt;

&lt;p&gt;It is also important to note that these improvements are only in the Trino
version of the engine, since the &lt;a href=&quot;/blog/2022/08/02/leaving-facebook-meta-best-for-trino.html&quot;&gt;Presto project does not include these
features&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;client-tools-and-collaboration&quot;&gt;Client tools and collaboration&lt;/h2&gt;

&lt;p&gt;Athena users also benefit from improvements in supporting client tools such as
Python clients, dbt, Metabase, and others. Working with other communities is of
critical importance to the Trino project. The &lt;a href=&quot;https://trino.io/episodes/40.html&quot;&gt;innovations in our Iceberg
connector&lt;/a&gt;, which are all now also available to
Athena users, are a great example of how we can lead the way together. Working with
contributors from Amazon and other companies and projects has yielded some
amazing improvements. At the &lt;a href=&quot;https://trino.io/episodes/42.html&quot;&gt;Trino summit and contributor
congregation&lt;/a&gt;, we reconnected in person and
established even closer collaboration.&lt;/p&gt;

&lt;h2 id=&quot;looking-forward&quot;&gt;Looking forward&lt;/h2&gt;

&lt;p&gt;So, what is next for Trino and Athena users? First up, you should upgrade to the
new Trino engine in Athena, and avoid the legacy Presto engine.&lt;/p&gt;

&lt;p&gt;Second, check out some of the great presentations from &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;Trino Summit 2022&lt;/a&gt; and &lt;a href=&quot;https://trino.io/episodes/42.html&quot;&gt;hear about some of our
impressions&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And last but not least, stay tuned for more goodness. Trino already shipped
further releases that included support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt;, table functions, and more
performance improvements. The Athena team is working hard on updating Trino for
your benefit regularly.&lt;/p&gt;

&lt;p&gt;Celebrating our &lt;a href=&quot;/blog/2022/09/12/tenth-birthday-celebration-recap.html&quot;&gt;first decade of the Trino project this last summer&lt;/a&gt; has shown a great trajectory for
the project and the community, and it looks like the next decade is going to be
even better!&lt;/p&gt;

&lt;p&gt;Sending a warm welcome from the Trino community to the Amazon Athena team and
users. Now you know that you were Trino users all along.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Martin and Manfred&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Martin Traverso</name>
        </author>
      

      <summary>Our community just keeps growing! Today, it is time to reach out and welcome another large group of Trino users. The release of the new engine version for Amazon Athena upgrades Athena to a recent version of Trino from a rather old version. This update brings a ton of improvements from the Trino project to the users of the popular cloud-based query service.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/trino-light.png" />
      
    </entry>
  
    <entry>
      <title>Optimizing Trino using spot instances with Zillow</title>
      <link href="https://trino.io/blog/2022/12/01/trino-summit-2022-zillow-recap.html" rel="alternate" type="text/html" title="Optimizing Trino using spot instances with Zillow" />
      <published>2022-12-01T00:00:00+00:00</published>
      <updated>2022-12-01T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/12/01/trino-summit-2022-zillow-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/12/01/trino-summit-2022-zillow-recap.html">&lt;p&gt;In this installment of &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;the Trino Summit 2022 sessions posts&lt;/a&gt;, we jump into an exciting topic by folks
from &lt;a href=&quot;https://www.zillow.com&quot;&gt;Zillow&lt;/a&gt; about running Trino on spot instances.
Spot instances are cheap, ephemeral nodes that reduce overall compute costs;
they are cheaper because they are not guaranteed to remain available.&lt;/p&gt;

&lt;p&gt;In this session, Zillow engineers talk about how they use Trino on spots to take
advantage of the cost savings while handling the transitory nature of spots.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/vz9reBUgQTE&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Trino@Zillow.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;Zillow’s BI platform team is tasked with enabling access to data and metrics
from their data lake in a self-serving and performant manner. The platform must
handle generating up-to-date reports and metrics to unlock time-critical
opportunities. They also need to enable ad hoc analytics across multiple domains
within Zillow.&lt;/p&gt;

&lt;p&gt;There are close to 600 data pipelines and 65,000 queries running daily. The
average read covers 600 terabytes of data, and the average P95 time is around
20 seconds. They have six Trino clusters that service various workflows based on
load. These are all deployed on Amazon EKS with a range of eight to 60 workers
based on CPU utilization.&lt;/p&gt;

&lt;p&gt;When deploying Trino on EKS, Zillow uses worker groups, which enables them to
collocate nodes in AWS local zones. It also made it possible to choose spot 
instances, which are 90% cheaper than regular on-demand instances. A critical
aspect they needed to cover was to correctly tune the percentage of nodes that
were spot instances. They created pools of nodes that were entirely on-demand
for coordinators, since a coordinator going down brings down the entire cluster.
Other pools used for workers are tuned to an optimal blend of spot and
on-demand.&lt;/p&gt;

&lt;p&gt;Watch this session to learn how to properly optimize the number of spot
instances running for your Trino clusters, without losing reliability of your
service. Also learn some ways that Zillow is planning on using the
fault-tolerant execution mode.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, please consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/12/01/trino-summit-2022-zillow-recap.html&quot;&gt;https://trino.io/blog/2022/12/01/trino-summit-2022-zillow-recap.html&lt;/a&gt;. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/zillow-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Santhosh Venkatraman, Rupesh Kumar Perugu, Brian Olsen</name>
        </author>
      

      <summary>In this installment of the Trino Summit 2022 sessions posts, we jump into an exciting topic by folks from Zillow about running Trino on spot instances. Spot instances are cheap and ephemeral nodes that lead to reduced overall compute costs. Spot instances are cheaper as they are not guaranteed to remain available. In this session, Zillow engineers talk about how they use Trino on spots to take advantage of the cost savings while handling the transitory nature of spots.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/zillow.jpg" />
      
    </entry>
  
    <entry>
      <title>Enterprise-ready Trino at Bloomberg: One Giant Leap Toward Data Mesh!</title>
      <link href="https://trino.io/blog/2022/11/30/trino-summit-2022-bloomberg-recap.html" rel="alternate" type="text/html" title="Enterprise-ready Trino at Bloomberg: One Giant Leap Toward Data Mesh!" />
      <published>2022-11-30T00:00:00+00:00</published>
      <updated>2022-11-30T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/11/30/trino-summit-2022-bloomberg-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/11/30/trino-summit-2022-bloomberg-recap.html">&lt;p&gt;This post continues &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;a larger series of posts&lt;/a&gt; on the Trino Summit 2022 sessions.
Following the &lt;a href=&quot;/blog/2022/11/28/trino-summit-2022-apple-recap.html&quot;&gt;Trino at Apple talk&lt;/a&gt;, engineers from Bloomberg shared
the latest about their additions to Trino. Bloomberg uses Trino to federate huge
amounts of disparate financial data together. When you have many users with
different use cases and resource needs, you need something to ensure that the
huge workloads don’t bully the small ones. Enter the Trino Load Balancer, a
privacy-aware solution to help maintain high availability while still treating
data security as the first-class citizen that it should be.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/ePr-iVQ5ri4&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Trino-at-Bloomberg.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;Bloomberg collects data, creates experimental data, and ingests data from
vendors. Its data analysts then refine, clean, and structure that data using
whatever their preferred method is, generating even more diverse data. Internal
teams and clients then want to look at and query that generated data, too. Sound
like a data mesh? That’s because it is. Trino isn’t new at Bloomberg, and it’s
been in use to help federate all of those varying data sets into one unified
access point.&lt;/p&gt;

&lt;p&gt;When trying to deploy multiple Trino clusters for such a wide array of users who
demand high uptime, high throughput, and fast response times, the Trino
coordinator becomes a single point of failure. There’s the risk of
infrastructure outages, the need to shut things down for occasional upgrades,
and some users run high-throughput jobs for millions of rows while others are
expecting low-latency jobs for only hundreds. Keeping Trino up, running, and
meeting all users’ expectations is no small task.&lt;/p&gt;

&lt;p&gt;And that’s where the Trino Load Balancer comes in! As a fork of the open-source
presto-gateway, it helps to do exactly what it says on the tin for Trino:
balance workloads. By being aware of what’s running on each cluster and how many
resources are being used, it can direct traffic to the ideal clusters to meet
each user’s needs. And with a brief demo, we get a look at how data owners
can set policies that are respected within the load balancer, ensuring that
users can only access and query what they’re supposed to.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/11/30/trino-summit-2022-bloomberg-recap.html&quot;&gt;https://trino.io/blog/2022/11/30/trino-summit-2022-bloomberg-recap.html&lt;/a&gt;. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/bloomberg-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Vishal Jadhav, Pablo Arteaga, Cole Bowden</name>
        </author>
      

      <summary>This post continues a larger series of posts on the Trino Summit 2022 sessions. Following the Trino at Apple talk, engineers from Bloomberg shared the latest about their additions to Trino. Bloomberg uses Trino to federate huge amounts of disparate financial data together. When you have many users with different use cases and resource needs, you need something to ensure that the huge workloads don’t bully the small ones. Enter the Trino Load Balancer, a privacy-aware solution to help maintain high availability while still treating data security as the first-class citizen that it should be.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/bloomberg.jpg" />
      
    </entry>
  
    <entry>
      <title>Trino at Apple</title>
      <link href="https://trino.io/blog/2022/11/28/trino-summit-2022-apple-recap.html" rel="alternate" type="text/html" title="Trino at Apple" />
      <published>2022-11-28T00:00:00+00:00</published>
      <updated>2022-11-28T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/11/28/trino-summit-2022-apple-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/11/28/trino-summit-2022-apple-recap.html">&lt;p&gt;This post continues &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;a larger series of posts&lt;/a&gt; on the Trino Summit 2022 sessions.
Following the &lt;a href=&quot;/blog/2022/11/22/trino-summit-2022-state-of-trino-keynote-recap.html&quot;&gt;Keynote: State of Trino session&lt;/a&gt;, engineers from Apple shared the
current usage of Trino at Apple. They discuss how they support Trino as a
service for multiple end-users, and the critical features that drew Apple to
Trino. They wrap up with some challenges they have faced and some development
they have planned to contribute to Trino.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/3afcRK6Yvio&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/Trino@Apple.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;blockquote&gt;
  &lt;p&gt;Trino is deployed at scale in Apple, and it continues to see tremendous
adoption across multiple teams at Apple. &lt;em&gt;Yathi Peddyshetty, Software Engineer @ Apple&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The commonplace ad hoc and BI analytics use cases make up a lot of how Apple uses
Trino today. They also have increasing uses in federated querying and A/B 
testing.&lt;/p&gt;

&lt;p&gt;To deploy Trino as a service, Apple has an in-house Kubernetes operator to
manage the Trino cluster lifecycles. They also created an orchestrator to
provision clusters and simplify their creation and management. They expose this
as a self-service console that allows users to provision their own clusters on
request. Their custom orchestrator also takes care of autoscaling and other
technical complexities of maintaining a scalable Trino system.&lt;/p&gt;

&lt;p&gt;Apple primarily uses Iceberg, Hive, and Cassandra connectors. They have a heavy
focus on Apache Iceberg as their table format and have contributed a significant
number of PRs to improve interoperability between Trino and Spark and to increase
coverage of Iceberg APIs. Other challenges Apple faces stem from the lack of
flexible routing of queries to achieve zero downtime, and having pluggable
optimizer rules and operators.&lt;/p&gt;

&lt;p&gt;Apple has various features on their roadmap to eventually contribute to the
community. These include exposing remaining functionality in the Iceberg APIs,
support for all partition transforms, predicate pushdowns, bucketed joins, simple
aggregate pushdowns, Iceberg native views in Trino, and more.&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, please consider sharing this on
Twitter, Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social
card and link to &lt;a href=&quot;https://trino.io/blog/2022/11/28/trino-summit-2022-apple-recap.html&quot;&gt;https://trino.io/blog/2022/11/28/trino-summit-2022-apple-recap.html&lt;/a&gt;. If you think Trino is awesome, 
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/apple-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Vinitha Gankidi, Yathi Peddyshetty, Brian Olsen</name>
        </author>
      

      <summary>This post continues a larger series of posts on the Trino Summit 2022 sessions. Following the Keynote: State of Trino session, engineers from Apple shared the current usage of Trino at Apple. They discuss how they support Trino as a service for multiple end-users, and the critical features that drew Apple to Trino. They wrap up with some challenges they have faced and some development they have planned to contribute to Trino.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/apple.jpg" />
      
    </entry>
  
    <entry>
      <title>Trino Summit 2022 recap: The state of Trino</title>
      <link href="https://trino.io/blog/2022/11/22/trino-summit-2022-state-of-trino-keynote-recap.html" rel="alternate" type="text/html" title="Trino Summit 2022 recap: The state of Trino" />
      <published>2022-11-22T00:00:00+00:00</published>
      <updated>2022-11-22T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/11/22/trino-summit-2022-state-of-trino-keynote-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/11/22/trino-summit-2022-state-of-trino-keynote-recap.html">&lt;p&gt;To kick off the &lt;a href=&quot;/blog/2022/11/21/trino-summit-2022-recap.html&quot;&gt;Trino Summit 2022&lt;/a&gt;,
we heard from Trino co-creators Martin Traverso, Dain Sundstrom, and David
Phillips. Martin gave a talk on the state of Trino and project plans for 2023,
then opened the floor to questions from the community. You can watch a recording
of the talk, or read on if you’re only interested in the highlights.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/mUq_h3oArp4&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md&quot; target=&quot;_blank&quot; href=&quot;/assets/blog/trino-summit-2022/State-of-Trino-Nov-2022.pdf&quot;&gt;
  Check out the slides!
&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;recap&quot;&gt;Recap&lt;/h2&gt;

&lt;p&gt;So what &lt;em&gt;has&lt;/em&gt; happened in Trino over the last year?&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2022/08/08/trino-tenth-birthday.html&quot;&gt;We celebrated Trino’s 10th birthday!&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;It was the busiest year in project history, with 600+ contributors, 4000+
commits, and near-weekly releases.&lt;/li&gt;
  &lt;li&gt;Tons of new features were added, including &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt;, JSON functions, table
functions, fault-tolerant execution (look forward to a lot of talking about it
in later recaps!), upgrading to Java 17, and a slide so dense with other
goodies that it needed two columns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And what’s coming down the pipeline?&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/14237&quot;&gt;Project Hummingbird&lt;/a&gt;, a large
set of core engine improvements.&lt;/li&gt;
  &lt;li&gt;Expanded table function support, including accepting tables as arguments.&lt;/li&gt;
  &lt;li&gt;Extra community support, so that contributors have an easier and better time
getting code merged into Trino.&lt;/li&gt;
  &lt;li&gt;New connectors, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE/DROP CATALOG&lt;/code&gt;, query tracing, and more!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There were also tons of great questions asked by live and online attendees
answered by Dain, David, and Martin, so if you want to hear more, take a listen
to the full talk!&lt;/p&gt;

&lt;h2 id=&quot;share-this-session&quot;&gt;Share this session&lt;/h2&gt;

&lt;p&gt;If you thought this talk was interesting, consider sharing this on Twitter,
Reddit, LinkedIn, HackerNews or anywhere on the web. Use the social card and
link to &lt;a href=&quot;https://trino.io/blog/2022/11/22/trino-summit-2022-state-of-trino-keynote-recap.html&quot;&gt;https://trino.io/blog/2022/11/22/trino-summit-2022-state-of-trino-keynote-recap.html&lt;/a&gt;. If you think Trino is awesome,
&lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;give us a 🌟 on GitHub &lt;i class=&quot;fab fa-github&quot;&gt;&lt;/i&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/keynote-social.png&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso, Dain Sundstrom, David Phillips, Cole Bowden</name>
        </author>
      

      <summary>To kick off the Trino Summit 2022, we heard from Trino co-creators Martin Traverso, Dain Sundstrom, and David Phillips. Martin gave a talk on the state of Trino and project plans for 2023, then opened the floor to questions from the community. You can watch a recording of the talk, or read on if you’re only interested in the highlights.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/keynote-header.jpeg" />
      
    </entry>
  
    <entry>
      <title>Trino Summit 2022 recap</title>
      <link href="https://trino.io/blog/2022/11/21/trino-summit-2022-recap.html" rel="alternate" type="text/html" title="Trino Summit 2022 recap" />
      <published>2022-11-21T00:00:00+00:00</published>
      <updated>2022-11-21T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/11/21/trino-summit-2022-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/11/21/trino-summit-2022-recap.html">&lt;p&gt;Trino Summit 2022 was, in a word, invigorating. I’m still coming off the high 
from the amount of energy I gained from being at this summit, meeting many of
you face-to-face for the first time. Most surprisingly, I learned that Trino
contributor James Petty from AWS was actually not famous painter
&lt;a href=&quot;https://en.wikipedia.org/wiki/Bob_Ross&quot;&gt;Bob Ross&lt;/a&gt;.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/james-petty.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;If you’ve ever planned a conference, you know that there are a lot of details to
iron out, and you can be left exhausted by the end. After this year’s Trino
Summit though, rather than being worn out, I felt like it ended too quickly and
I simply wanted more time to chat with everyone. A single day was simply not
enough, and now all I can think about is the next summit. We not only got to
hear an incredible lineup of talks and discussions from first-time Trino Summit
speakers like Apple, Shopify, and Lyft, but also had many engaging discussions
outside the auditorium.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/swag.jpg&quot; /&gt;
&lt;img src=&quot;/assets/blog/trino-summit-2022/authors.jpg&quot; /&gt;
&lt;img src=&quot;/assets/blog/trino-summit-2022/talking-1.jpg&quot; /&gt;
&lt;img src=&quot;/assets/blog/trino-summit-2022/talking-2.jpg&quot; /&gt;&lt;/p&gt;

&lt;p&gt;There were cross-community discussions between Delta Lake, Airflow, and Alluxio
about how to turbo-charge Trino integrations with these communities. There were
many companies talking about best practices and gotchas while migrating from
Hive to Iceberg or Delta Lake. Others wanted to learn how to use fault-tolerant
execution. I spoke with managers of companies like LinkedIn and Bloomberg who
wanted to help develop their engineers to get more involved with contributing to
Trino. We all finally got to see the faces of people we had been talking to for
the past two to three years for the first time. People were getting their free
copies of Trino: The Definitive Guide signed by Manfred, Matt, and Martin and
brought home other swag. After a long day of talks, we wrapped Trino Summit up
with two happy hours on the roof of the Commonwealth Club, watching the sunset
over the San Francisco Bay Bridge.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/speech.jpg&quot; /&gt;
&lt;img src=&quot;/assets/blog/trino-summit-2022/happy-hour.jpg&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;session-summaries&quot;&gt;Session summaries&lt;/h2&gt;

&lt;p&gt;I would like to quickly summarize a few short takeaways I had from each talk at
the summit. I highly recommend you watch the full videos on the Trino YouTube
channel, which are linked in the titles:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=mUq_h3oArp4&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Keynote: State of Trino&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/11/22/trino-summit-2022-state-of-trino-keynote-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Trino co-creator, Martin, covers recently developed features, community 
statistics, and discusses roadmap features like Project Hummingbird.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Dain and David join Martin on the stage to answer audience questions.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=mUq_h3oArp4&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/keynote.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=3afcRK6Yvio&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Trino at Apple&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/11/28/trino-summit-2022-apple-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Apple has an in-house k8s operator to manage Trino cluster lifecycles, and an
orchestrator to provision and simplify cluster creation and management.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Apple has a heavy focus on Apache Iceberg as their table format and has
contributed a significant number of PRs to improve interoperability between
Trino and Spark and to increase coverage of Iceberg APIs.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=3afcRK6Yvio&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/apple.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=ePr-iVQ5ri4&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Enterprise-ready Trino at Bloomberg: One Giant Leap Toward Data Mesh!&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/11/30/trino-summit-2022-bloomberg-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Bloomberg uses Trino to centralize access to their massive number of catalogs
across many different departments.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;To offer Trino-as-a-Service for varying workloads, they use a Trino Load
Balancer (a fork of the popular presto-gateway project at Lyft) to add new
functionality. In talking with them after their presentation, the Bloomberg
team expressed an interest in wanting to open source this work to the
community as a more generalized solution than the gateway project.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=ePr-iVQ5ri4&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/bloomberg.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=vz9reBUgQTE&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Optimizing Trino using spot instances&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/01/trino-summit-2022-zillow-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;In an attempt to minimize costs, Zillow is measuring the efficacy of running
Trino ETL jobs on spot instances.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;This currently runs the risk of query failures and retries, but future work
will look at utilizing the new fault-tolerant execution mode to mitigate
retries in the event of failure.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=vz9reBUgQTE&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/zillow.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=g9fLA3tFG-Q&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Leveraging Trino to Power Data at Goldman Sachs&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/05/trino-summit-2022-goldman-sachs-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Goldman Sachs uses Trino to power their data quality service, taking advantage
of the fact that Trino centralizes all visibility across their platform.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=g9fLA3tFG-Q&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/goldman-sachs.png&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=sSWBi7bBotQ&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Elevating data fabric to data mesh: Solving data needs in hybrid datalakes&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/07/trino-summit-2022-comcast-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Comcast takes us through their Trino architecture journey, providing the
history of their Data Fabric service and discussing the data governance
and culture changes required to realize a Data Mesh with Trino.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=sSWBi7bBotQ&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/comcast.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=nJBBw-xnLU8&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Rewriting History: Migrating petabytes of data to Apache Iceberg using Trino&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/09/trino-summit-2022-shopify-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Shopify recently migrated many of its workloads to Trino. One of the first
hurdles was dealing with many issues in the Hive table format, so they quickly
upgraded to the Iceberg table format.&lt;/li&gt;
  &lt;li&gt;They initially encountered numerous issues, but experienced incredibly fast
turnaround on fixes from the Trino project that resolved their issues during
the migration.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;There’s also a benchmark showing how moving to a columnar format and the
Iceberg table format drastically improves results.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=nJBBw-xnLU8&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/shopify.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=FL3c1Ue7YWM&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Trino for Large Scale ETL at Lyft&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/12/trino-summit-2022-lyft-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Lyft is using Trino to perform ETL jobs scanning 10PB of data per day, and
writing 100TB per day. They are not using fault-tolerant execution.&lt;/li&gt;
  &lt;li&gt;In the last year, Lyft cut their number of Trino nodes in half, while
increasing the volume of their workloads due to recent improvements in Trino
and upgrades in Java versions.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Keeping up with the rapid release cycle of Trino was a challenge, and Lyft
showcases their regression testing using their query replay framework.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=FL3c1Ue7YWM&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/lyft.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Zfmxwu0m98k&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Federating them all on Starburst Galaxy&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/14/trino-summit-2022-starburst-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Running and scaling Trino is difficult. Starburst showcases Starburst Galaxy,
a SaaS data platform built around the Trino query engine.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The talk demos running federated queries over Pokémon data scattered across
MongoDB and Iceberg tables.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Zfmxwu0m98k&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/starburst.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Q03DzL_fm-I&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Trino at Quora: Speed, Cost, Reliability Challenges and Tips&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/16/trino-summit-2022-quora-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Quora uses a large number of Trino clusters for ad-hoc queries, ETL, time
series, A/B testing, and backfill workloads.&lt;/li&gt;
  &lt;li&gt;Quora initially faced high costs on Trino due to inefficient use of
resources.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;To address this they migrated to use Graviton instances, implemented
autoscaling, and optimized query efficiency.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Q03DzL_fm-I&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/quora.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=V9_aPLXATh8&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Journey to Iceberg with SK Telecom&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/19/trino-summit-2022-sk-telecom-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The speakers travelled all the way from South Korea to join us in person.&lt;/li&gt;
  &lt;li&gt;SK Telecom had a multitude of performance issues that all stemmed from the
lack of flexibility in the Hive model and metastore.&lt;/li&gt;
  &lt;li&gt;They migrated to Iceberg to address performance issues, and gained the added
benefits of Iceberg’s table format to improve developer workflow.&lt;/li&gt;
  &lt;li&gt;Housekeeping operations like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;optimize&lt;/code&gt; were already addressed by the Iceberg
community and quickly added to Trino.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;This reduced query processing time by 80%.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=V9_aPLXATh8&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/sk-telecom.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=xKDN7RUJ5i4&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Using Trino with Apache Airflow for (almost) all your data problems&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/21/trino-summit-2022-astronomer-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Airflow is a highly functional and well-adopted workflow management platform
to schedule jobs on your data platform.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The Trino integration for Airflow recently landed, and this coincided with
the GA arrival of fault-tolerant execution mode in Trino.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=xKDN7RUJ5i4&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/astronomer.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=MCB_1furnAo&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; How we use Trino to analyze our Product-led Growth (PLG) user activation funnel&lt;/a&gt;
(&lt;a href=&quot;/blog/2022/12/23/trino-summit-2022-upsolver-recap.html&quot;&gt;Read more&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Upsolver solves a lot of common data problems on their platform.&lt;/li&gt;
  &lt;li&gt;One such problem is measuring activation rates in a product-led growth team.
This requires acting on many sources of data.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Trino makes a natural fit to address the issues of joining this data together.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=MCB_1furnAo&quot;&gt;&lt;img width=&quot;40%&quot; src=&quot;/assets/blog/trino-summit-2022/upsolver.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;federate-em-all&quot;&gt;Federate ‘em all&lt;/h2&gt;

&lt;p&gt;After a whole day of throwing Trino balls out to the crowd, we got to see a
nice metaphor for federated data by throwing them all in the air and yelling,
“Federate ‘em all!”&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/balls.jpg&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;trino-contributor-congregation&quot;&gt;Trino Contributor Congregation&lt;/h2&gt;

&lt;p&gt;The day after the summit, we invited a relatively small group of our
contributors to meet for the inaugural Trino Contributor Congregation (TCC).
This gathered many of our long-time and heavy Trino contributors. We had folks
from companies like Starburst, AWS, Apple, Bloomberg, Lyft, Comcast, LinkedIn,
Treasure Data, and others. Let’s dive into some of the topics we discussed.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/contributor-congregation.jpg&quot; /&gt;&lt;/p&gt;

&lt;p&gt;We discussed feature proposals like:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The Trino load balancer, an adaptation of the popular gateway project from Lyft.&lt;/li&gt;
  &lt;li&gt;A Ranger plugin to be maintained by the Trino community rather than rely on the Ranger project.&lt;/li&gt;
  &lt;li&gt;A Snowflake connector that was traditionally held back by the lack of infrastructure.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We discussed the need for better shared testing datasets beyond TPC-H and
TPC-DS that are more representative of the real workloads many users run.&lt;/p&gt;

&lt;p&gt;We discussed the need for a clearer process for contributors to follow to
minimize the time to get features merged and avoid stale PRs. This is being
addressed by the backlog grooming performed by the developer relations team, and
assigning maintainers to own various PRs. While there is never a promise to
merge a PR, improving the turnaround and communication on PRs is crucial to keep
contributors happy and improve the health of the project.&lt;/p&gt;

&lt;p&gt;While we were sad that not everyone could make the in-person TCC, we plan to
have virtual TCCs on a more frequent cadence and have the in-person TCCs
alongside larger in-person events. Getting these TCCs right is core to growing
the maintainership and continued success of the Trino project.&lt;/p&gt;

&lt;p&gt;We hope all of you who could join us in-person and online enjoyed yourselves. We
all had such a blast! Stay tuned for updates on the next Trino Summit location!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-summit-2022/bun-bun-bye.jpg&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Trino Summit 2022 was in a word, invigorating. I’m still coming off the high from the amount of energy I gained from being at this summit, meeting many of you face-to-face for the first time. Most surprisingly, I learned that Trino contributor James Petty from AWS was actually not famous painter Bob Ross.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/stage.jpg" />
      
    </entry>
  
    <entry>
      <title>42: Trino Summit 2022 recap</title>
      <link href="https://trino.io/episodes/42.html" rel="alternate" type="text/html" title="42: Trino Summit 2022 recap" />
      <published>2022-11-17T00:00:00+00:00</published>
      <updated>2022-11-17T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/42</id>
      <content type="html" xml:base="https://trino.io/episodes/42.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Brian Olsen, Developer Advocate at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/bitsondatadev&quot;&gt;@bitsondatadev&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate at 
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Manfred Moser, Director of Information Engineering at 
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;@simpligility&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Brian Zhan, Product Manager at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/brianzhan1&quot;&gt;@brianzhan1&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/claudiusli&quot;&gt;Claudius Li&lt;/a&gt;, Product Manager at
&lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dain Sundstrom, Trino creator and CTO at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/daindumb&quot;&gt;@daindumb&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Martin Traverso, Trino creator and CTO at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
(&lt;a href=&quot;https://twitter.com/mtraverso&quot;&gt;@mtraverso&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-402-to-403&quot;&gt;Releases 402 to 403&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-402.html&quot;&gt;Trino 402&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for column comments in Hive and Iceberg views.&lt;/li&gt;
  &lt;li&gt;Support predicate pushdown on temporal types in MongoDB connector.&lt;/li&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OR&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nullif&lt;/code&gt;, and arithmetic operations in SQL Server connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-403.html&quot;&gt;Trino 403&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; in MongoDB.&lt;/li&gt;
  &lt;li&gt;Faster aggregations.&lt;/li&gt;
  &lt;li&gt;Faster data transfers with fault-tolerant execution.&lt;/li&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW SCHEMAS&lt;/code&gt; in BigQuery.&lt;/li&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;expire_snapshots&lt;/code&gt; in Apache Iceberg.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-402.html&quot;&gt;Trino 402&lt;/a&gt;,
and
&lt;a href=&quot;https://trino.io/docs/current/release/release-403.html&quot;&gt;Trino 403&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;trino-summit-2022-recap&quot;&gt;Trino Summit 2022 recap&lt;/h2&gt;

&lt;p&gt;This episode we’re doing a recap of both the Trino Summit and the first Trino
Contributor Congregation. We dive into what everyone’s favorite Trino Summit
sessions were. Then we cover key takeaways from the Trino Contributor
Congregation, which took place the day after.&lt;/p&gt;

&lt;p&gt;Check out the in-person and virtual
&lt;a href=&quot;https://www.meetup.com/pro/trino-community/&quot;&gt;Trino Meetup groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Top five reasons to attend Trino Summit 2022</title>
      <link href="https://trino.io/blog/2022/10/31/trino-summit-2022-teaser-3.html" rel="alternate" type="text/html" title="Top five reasons to attend Trino Summit 2022" />
      <published>2022-10-31T00:00:00+00:00</published>
      <updated>2022-10-31T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/10/31/trino-summit-2022-teaser-3</id>
      <content type="html" xml:base="https://trino.io/blog/2022/10/31/trino-summit-2022-teaser-3.html">&lt;p&gt;This blog post wraps up a series of 
&lt;a href=&quot;/blog/2022/09/22/trino-summit-2022-teaser.html&quot;&gt;previous posts&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;/blog/2022/10/19/trino-summit-2022-teaser-2.html&quot;&gt;teasing Trino Summit 2022&lt;/a&gt;.
The conference is free and takes place in San Francisco, California on November
10th. Join us either in-person or virtually!&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;Let’s dive right into the five reasons you should attend Trino Summit 2022. If
you’re not into these lists, go ahead and 
&lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;register now&lt;/a&gt;!&lt;/p&gt;

&lt;h3 id=&quot;1-hear-speakers-from-industry-leading-companies-talk-about-their-trino-architecture-and-use-cases&quot;&gt;1. Hear speakers from industry leading companies talk about their Trino architecture and use cases&lt;/h3&gt;

&lt;p&gt;This year’s summit features industry leaders with varying workloads and
use cases. There are also sessions on tips and tricks to scale and lower the
cost of running Trino in production. Users from the following companies speak
about their challenges and how they use Trino to help overcome them:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Apple&lt;/li&gt;
  &lt;li&gt;Astronomer&lt;/li&gt;
  &lt;li&gt;Bloomberg&lt;/li&gt;
  &lt;li&gt;Comcast&lt;/li&gt;
  &lt;li&gt;Goldman Sachs&lt;/li&gt;
  &lt;li&gt;Lyft&lt;/li&gt;
  &lt;li&gt;Quora&lt;/li&gt;
  &lt;li&gt;Shopify&lt;/li&gt;
  &lt;li&gt;SK Telecom&lt;/li&gt;
  &lt;li&gt;Starburst&lt;/li&gt;
  &lt;li&gt;Upsolver&lt;/li&gt;
  &lt;li&gt;Zillow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To see more information about the talks and the agenda for the conference, check
out the &lt;a href=&quot;https://www.starburst.io/info/trinosummit#agenda&quot;&gt;Trino Summit 2022 agenda&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;2-meet-the-authors-of-the-trino-the-definitive-guide-and-get-that-trino-swag&quot;&gt;2. Meet the authors of the &lt;strong&gt;&lt;em&gt;Trino: The Definitive Guide&lt;/em&gt;&lt;/strong&gt; and get that Trino swag&lt;/h3&gt;

&lt;p&gt;This year, we are giving away autographed copies of the recently updated
&lt;a href=&quot;/blog/2022/10/03/the-definitive-guide-2.html&quot;&gt;&lt;strong&gt;Trino: The Definitive Guide&lt;/strong&gt;&lt;/a&gt; to attendees.
Already have a physical copy? Visit the Trino booth to get your book signed and
meet authors 
&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;Manfred Moser&lt;/a&gt;,
&lt;a href=&quot;https://twitter.com/mfullertweets&quot;&gt;Matt Fuller&lt;/a&gt;, and
&lt;a href=&quot;https://twitter.com/mtraverso&quot;&gt;Martin Traverso&lt;/a&gt; who literally wrote the book on
Trino.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;33%&quot; src=&quot;/assets/ttdg2-cover.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;We will be giving away swag packs containing an autographed copy of Trino: The
Definitive Guide, a Trino Summit 2022 shirt, a Commander Bun Bun plushie, and 
more to both virtual and in-person attendees! This will be done during our
sponsored giveaway breaks between sessions, where we challenge both in-person and
virtual attendees in a race against time to bag the swag!&lt;/p&gt;

&lt;h3 id=&quot;3-federate-em-all&quot;&gt;3. Federate ‘em all&lt;/h3&gt;

&lt;p&gt;This year’s summit will be a free event that federates both data and humans. The
theme extends from a popular show that many of you know called Pokémon. To
understand the connection here, let’s break down what we mean by federate ‘em
all. In the same way that Pokémon protagonist Ash Ketchum catches and trains
heterogeneous creatures called Pokémon, Trino queries and filters heterogeneous
data sets from various data sources.&lt;/p&gt;

&lt;p&gt;If you’re not familiar with Pokémon, a losing strategy is to train just one or
two Pokémon as different types of Pokémon are better suited to different tasks.
In the same way, centralizing all of your data to a single data warehouse or
data lake doesn’t make sense either. There are different use cases and 
different needs across the company. Rather than spending your time building
brittle one-size-fits-all architectures, Trino enables you to connect to
&lt;a href=&quot;https://trino.io/docs/current/connector.html&quot;&gt;multiple data sources&lt;/a&gt; using ANSI SQL.&lt;/p&gt;

&lt;iframe src=&quot;https://www.youtube.com/embed/o2MJvRKG14M&quot; width=&quot;800&quot; height=&quot;500&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px;
margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt;
&lt;/iframe&gt;

&lt;h3 id=&quot;4-experience-beautiful-san-francisco&quot;&gt;4. Experience beautiful San Francisco&lt;/h3&gt;

&lt;p&gt;For those attending in-person, you will get to enjoy the beautiful San Francisco
area. The &lt;a href=&quot;https://www.starburst.io/info/trinosummit/#location&quot;&gt;Commonwealth Club&lt;/a&gt;
is located right on the San Francisco Bay. The building is beautiful, with a
large auditorium for the main event and plenty of floors and rooms for socializing.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;/assets/blog/trino-summit-2022/commonwealth-club.jpeg&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;At the end of the summit, we will have a happy hour on the scenic roof-deck that
gazes over the San Francisco bay at the iconic Oakland Bay Bridge.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; src=&quot;/assets/blog/trino-summit-2022/san-francisco.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;We know this only applies to our in-person attendees, but remember if you join
us virtually, there are still plenty of resources to network and interact 
throughout the conference. We will be taking questions from our virtual audience
and there will also be a chat forum to discuss with attendees from across the
globe. Plus, unlike those of us attending in-person, no travel is required and
pajamas are optional during the event!&lt;/p&gt;

&lt;h3 id=&quot;5-collaborate-with-some-of-the-best-minds-working-on-trino&quot;&gt;5. Collaborate with some of the best minds working on Trino&lt;/h3&gt;

&lt;p&gt;Trino is a relatively new paradigm compared to the rest of the data world. If you
just realized that you don’t have to move all your data into one location,
you’re on the right track. However, there’s still a lot to learn when it comes
to scaling out a query engine that over time grows in usage. To get this right,
you need a community to be successful. The creators Martin, Dain, and David, and
many of the core contributors of Trino will be attending, along with a large
list of folks that are using multiple clusters over hundreds of petabytes of
data.&lt;/p&gt;

&lt;p&gt;Tap into this incredibly passionate group of Trino enthusiasts to augment your
experience with this revolutionary query engine!&lt;/p&gt;

&lt;h2 id=&quot;register-for-the-summit&quot;&gt;Register for the summit&lt;/h2&gt;

&lt;p&gt;Make sure you register quickly for in-person attendance, as it is limited to
250 seats. Spots are running out quickly, so don’t wait!&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;h2 id=&quot;announcing-the-final-round-of-sessions-and-the-agenda&quot;&gt;Announcing the final round of sessions and the agenda!&lt;/h2&gt;

&lt;p&gt;Now for the final list of sessions to announce for this year’s Trino Summit!
This week is quite the reveal as we are showcasing a talk of how engineers at
Apple use Trino for their analytics challenges! 🎉🤯&lt;/p&gt;

&lt;p&gt;We also have three more amazing guests that are heavy hitters in the data and
analytics tech scene.&lt;/p&gt;

&lt;h3 id=&quot;trino-at-apple&quot;&gt;Trino at Apple&lt;/h3&gt;

&lt;p&gt;In this talk the audience will learn how Apple uses Trino to accelerate
analytics, the challenges we face deploying analytics at scale at Apple, and the
areas we would like to collaborate on with the community.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Vinitha Gankidi, Software engineer at Apple&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Yathindranath Peddyshetty, Software engineer at Apple&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;enterprise-ready-trino-at-bloomberg-one-giant-leap-toward-data-mesh&quot;&gt;Enterprise-ready Trino at Bloomberg: One Giant Leap Toward Data Mesh!&lt;/h3&gt;

&lt;p&gt;Enterprises like Bloomberg love Trino. It allows us to embrace the data mesh
with ease. Providing Trino as a service in a highly available, configurable, and
access-controlled manner has been a key enabler for us in this paradigm shift.
Join us to learn how we have leveraged open-source components to achieve these
goals at Bloomberg.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Pablo Arteaga, Software Engineer at Bloomberg&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Vishal Jadhav, Software Engineer at Bloomberg&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;leveraging-trino-to-power-data-quality-at-goldman-sachs&quot;&gt;Leveraging Trino to power data quality at Goldman Sachs&lt;/h3&gt;

&lt;p&gt;Data is at the core of today’s business processes. We are responsible for making
accurate, timely, and modeled data available to our analytics and application
teams. The source of these datasets can be quite heterogeneous like HDFS, S3,
Sybase, Snowflake, Elasticsearch, and more. Also, with an increase in data
volume, velocity, and variety, data quality assurance is extremely critical to
ensure the trustworthiness of data and mark it usable for consumers to use with
confidence. We have leveraged Trino to make high-quality data centrally
accessible through an efficient, secure, governed, and unified way of performing
analytics.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Sumit Halder, Vice President at Goldman Sachs&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Ramesh Bhanan, Vice President at Goldman Sachs&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Siddhant Chadha, Associate at Goldman Sachs&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Suman Baliganahalli Narayan Murthy, Vice President at Goldman Sachs&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;optimizing-trino-using-spot-instances&quot;&gt;Optimizing Trino using spot instances&lt;/h3&gt;

&lt;p&gt;Trino is a critical tool used at Zillow for doing analytics on the data lake. In this
talk we aim to give a general overview of how we leverage Trino and dive deeper
into the optimizations we have done for scaling Trino at Zillow using Spot
instances.&lt;/p&gt;

&lt;p&gt;In this session, we will show how fault-tolerant execution mode enables more
cost-effective and resilient execution when running Trino on Spot instances.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Rupesh Kumar Perugu, Senior Software Engineer at Zillow&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Santhosh Venkatraman, Software Engineer at Zillow&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That finalizes all of our sessions! To see them all, check out the
&lt;a href=&quot;https://www.starburst.io/info/trinosummit#agenda&quot;&gt;Trino Summit 2022 agenda&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Get excited, the conference is in less than two weeks so don’t forget to
&lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;register&lt;/a&gt;, and as always, &lt;strong&gt;&lt;em&gt;Federate them
all&lt;/em&gt;&lt;/strong&gt;! It is really shaping up to be an educational and fun-filled event with
Trino experts and aficionados.&lt;/p&gt;

&lt;p&gt;A huge thanks to our sponsors: Starburst, Privacera, Monte Carlo, Immuta,
CubeJS, Delta Lake, Hightouch, Backblaze, Databricks, Alluxio, and Tabular!&lt;/p&gt;

&lt;p&gt;Well that’s a wrap, we’ll see you all in T-minus ten days!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>This blog post wraps up a series of previous posts teasing Trino Summit 2022. The conference is free and takes place in San Francisco, California on November 10th. Join us either in-person or virtually! Register now</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/summit-logo.png" />
      
    </entry>
  
    <entry>
      <title>41: Trino puts on its Hudi</title>
      <link href="https://trino.io/episodes/41.html" rel="alternate" type="text/html" title="41: Trino puts on its Hudi" />
      <published>2022-10-27T00:00:00+00:00</published>
      <updated>2022-10-27T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/41</id>
      <content type="html" xml:base="https://trino.io/episodes/41.html">&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Brian Olsen, Developer Advocate at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/bitsondatadev&quot;&gt;@bitsondatadev&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate at 
 &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Sagar Sumit, Software Engineer at 
 &lt;a href=&quot;https://www.onehouse.ai&quot;&gt;Onehouse&lt;/a&gt; (&lt;a href=&quot;https://twitter.com/sagarsumit6&quot;&gt;@sagarsumit6&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/yueluhelloworld&quot;&gt;Grace (Yue) Lu&lt;/a&gt;, Software
Engineer at &lt;a href=&quot;https://robinhood.com&quot;&gt;Robinhood&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;register-for-trino-summit-2022&quot;&gt;Register for Trino Summit 2022!&lt;/h2&gt;

&lt;p&gt;Trino Summit 2022 is coming around the corner! This &lt;strong&gt;free&lt;/strong&gt; event on November
10th will take place in-person at the Commonwealth Club in San Francisco, CA or
can also be attended remotely!&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/o2MJvRKG14M&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;Read about the recently announced speaker sessions and details in these blog posts:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2022/09/22/trino-summit-2022-teaser.html&quot;&gt;Trino Summit 2022 first post&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2022/10/19/trino-summit-2022-teaser-2.html&quot;&gt;Trino Summit 2022 second post&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;You can register for the conference&lt;/a&gt;
at any time. We must limit in-person registrations to 250 
attendees, so register soon if you plan to attend in person!&lt;/p&gt;

&lt;h2 id=&quot;releases-396-to-401&quot;&gt;Releases 396 to 401&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-396.html&quot;&gt;Trino 396&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance when processing strings.&lt;/li&gt;
  &lt;li&gt;Faster writing of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;array&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;map&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;row&lt;/code&gt; types to Parquet.&lt;/li&gt;
  &lt;li&gt;Support for pushing down complex join criteria to connectors.&lt;/li&gt;
  &lt;li&gt;Support for column and table comments in BigQuery connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-397.html&quot;&gt;Trino 397&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;S3 Select pushdown for JSON data in Hive connector.&lt;/li&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date_trunc&lt;/code&gt; predicates over partition columns in Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Reduced query latency with Glue catalog in Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-398.html&quot;&gt;Trino 398&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New Hudi connector.&lt;/li&gt;
  &lt;li&gt;Improved performance for Parquet data in Delta Lake, Hive and Iceberg connectors.&lt;/li&gt;
  &lt;li&gt;Support for column comments in Accumulo connector.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timestamp&lt;/code&gt; type in Pinot connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-399.html&quot;&gt;Trino 399&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Faster joins.&lt;/li&gt;
  &lt;li&gt;Faster reads of decimal values in Parquet data.&lt;/li&gt;
  &lt;li&gt;Support for writing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;array&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;row&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timestamp&lt;/code&gt; columns in BigQuery.&lt;/li&gt;
  &lt;li&gt;Support for predicate pushdown involving datetime types in MongoDB.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-400.html&quot;&gt;Trino 400&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for TRUNCATE in BigQuery connector.&lt;/li&gt;
  &lt;li&gt;Support for the Pinot proxy.&lt;/li&gt;
  &lt;li&gt;Improved latency when querying Iceberg tables with many files.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-401.html&quot;&gt;Trino 401&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance and reliability of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Support for writing to Google Cloud Storage in Delta Lake.&lt;/li&gt;
  &lt;li&gt;Support for IBM Cloud Object Storage in Hive.&lt;/li&gt;
  &lt;li&gt;Support for writes with fault-tolerant execution in MySQL, PostgreSQL, and SQL
Server.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional highlights worth a mention according to Cole:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The new Hudi connector is worth mentioning twice. It was in the works for a
while, and we’re really excited it has arrived and continues to improve.&lt;/li&gt;
  &lt;li&gt;Trino 396 added support for version three of the Delta Lake writer, then Trino
401 added support for version four, so we’ve jumped from two to four since the
last time you saw us!&lt;/li&gt;
  &lt;li&gt;There have been a ton of fixes to table and column comments across a wide
variety of connectors.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-396.html&quot;&gt;Trino 396&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-397.html&quot;&gt;Trino 397&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-398.html&quot;&gt;Trino 398&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-399.html&quot;&gt;Trino 399&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-400.html&quot;&gt;Trino 400&lt;/a&gt;,
and
&lt;a href=&quot;https://trino.io/docs/current/release/release-401.html&quot;&gt;Trino 401&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-intro-to-hudi-and-the-hudi-connector&quot;&gt;Concept of the week: Intro to Hudi and the Hudi connector&lt;/h2&gt;

&lt;p&gt;This week we’re talking about the Hudi connector that was added in version 398.&lt;/p&gt;

&lt;h3 id=&quot;what-is-apache-hudi&quot;&gt;What is Apache Hudi?&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://hudi.apache.org/&quot;&gt;Apache Hudi&lt;/a&gt; (pronounced “hoodie”) is a streaming
data lakehouse platform that combines warehouse and database functionality. Hudi
is a table format that enables transactions, efficient upserts and deletes, advanced
indexing, streaming ingestion services, data clustering and compaction
optimizations, and concurrency control.&lt;/p&gt;

&lt;p&gt;Hudi is not just a table format; it also provides many services aimed at building
efficient incremental batch pipelines. Hudi was born out of Uber and is used at
companies like Amazon, ByteDance, and Robinhood.&lt;/p&gt;

&lt;h3 id=&quot;merge-on-read-mor-and-copy-on-write-cow-tables&quot;&gt;Merge on read (MOR) and copy on write (COW) tables&lt;/h3&gt;

&lt;p&gt;The Hudi table format and services aim to provide a suite of tools that make
Hudi adaptable to both realtime and batch use cases on the data lake. Hudi lays
out data following either
&lt;a href=&quot;https://hudi.apache.org/docs/next/table_types#merge-on-read-table&quot;&gt;merge on read&lt;/a&gt;,
which optimizes writes over reads, or
&lt;a href=&quot;https://hudi.apache.org/docs/next/table_types#copy-on-write-table&quot;&gt;copy on write&lt;/a&gt;,
which optimizes reads over writes.&lt;/p&gt;
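&lt;p&gt;As a rough sketch of how the table type is chosen (the table and column names
here are hypothetical), Hudi’s Spark SQL support selects the layout through the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;type&lt;/code&gt; table property:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- merge on read: optimizes writes over reads
CREATE TABLE hudi_events (
  id BIGINT,
  name STRING,
  ts BIGINT
) USING hudi
TBLPROPERTIES (
  type = 'mor',
  primaryKey = 'id',
  preCombineField = 'ts'
);

-- use type = 'cow' instead for copy on write,
-- which optimizes reads over writes
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;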

&lt;h3 id=&quot;hudi-metadata-table&quot;&gt;Hudi metadata table&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;https://hudi.apache.org/docs/next/metadata&quot;&gt;Hudi metadata table&lt;/a&gt; can
improve the read and write performance of your queries. Its main purpose is to
eliminate the requirement for the “list files” operation. That requirement stems
from how
&lt;a href=&quot;/blog/2020/10/20/intro-to-hive-connector.html&quot;&gt;Hive-modelled SQL tables&lt;/a&gt;
point to entire directories rather than to specific files with ranges. Pointing
to files with ranges helps prune out files outside the query criteria.&lt;/p&gt;

&lt;h3 id=&quot;hudi-data-layout&quot;&gt;Hudi data layout&lt;/h3&gt;

&lt;p&gt;Hudi uses
&lt;a href=&quot;https://hudi.apache.org/docs/next/file_layouts&quot;&gt;multiversion concurrency control (MVCC)&lt;/a&gt;,
where a compaction action merges logs and base files to produce new file slices,
and a cleaning action removes unused or older file slices to reclaim space on the
file system.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/41/hudi-mvcc-files.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h3 id=&quot;robinhood-trino-and-hudi-use-cases&quot;&gt;Robinhood’s Trino and Hudi use cases&lt;/h3&gt;

&lt;p&gt;One of the well-known users of Trino and Hudi is Robinhood. Grace (Yue) Lu, who
&lt;a href=&quot;https://www.youtube.com/watch?v=gFTDQGRXOus&quot;&gt;joined us at Trino Summit 2021&lt;/a&gt;,
covers Robinhood’s architecture and use cases for Trino and Hudi.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/41/robinhood-hudi-trino-architecture.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Robinhood ingests data via Debezium and streams it into Hudi. Then Trino is able
to read data as it becomes available in Hudi.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/41/robinhood-use-cases.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Hudi and Trino support critical use cases like IPO company stock allocation,
liquidity risk monitoring, clearing settlement reports, and generally fresher
metrics reporting and analysis.&lt;/p&gt;

&lt;h3 id=&quot;the-current-state-of-the-trino-hudi-connector&quot;&gt;The current state of the Trino Hudi connector&lt;/h3&gt;

&lt;p&gt;Before we had 
&lt;a href=&quot;https://trino.io/docs/current/connector/hudi.html&quot;&gt;the official Hudi connector&lt;/a&gt;,
many, like Robinhood, had to use the Hive connector. They were therefore not
able to take advantage of the metadata table and many other optimizations Hudi
provides out of the box.&lt;/p&gt;

&lt;p&gt;The connector addresses that and now enables using some Hudi abstractions.
However, the connector is currently limited to read-only mode and doesn’t
support writes. Spark remains the primary system for writing data into Hudi
tables that Trino then queries. Check out the demo to see the connector in action.&lt;/p&gt;
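&lt;p&gt;Setting up the connector follows the usual catalog pattern. A minimal sketch of a
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hudi.properties&lt;/code&gt; catalog file, assuming a Hive
metastore at a placeholder host:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;connector.name=hudi
hive.metastore.uri=thrift://example.net:9083
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;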

&lt;h3 id=&quot;upcoming-features-in-hudi-connector&quot;&gt;Upcoming features in Hudi connector&lt;/h3&gt;

&lt;p&gt;First we want to improve read support and cover all query types. As a
next step we aim to add DDL support.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The connector only supports copy on write tables, and soon we will add merge
on read table support.&lt;/li&gt;
  &lt;li&gt;Hudi has multiple 
&lt;a href=&quot;https://hudi.apache.org/docs/next/table_types#query-types&quot;&gt;query types&lt;/a&gt;.
Support for snapshot queries is coming shortly.&lt;/li&gt;
  &lt;li&gt;Integration with the metadata table.&lt;/li&gt;
  &lt;li&gt;Utilization of the column statistics index.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pr-14445-fault-tolerant-execution-for-postgresql-and-mysql-connectors&quot;&gt;PR 14445: Fault-tolerant execution for PostgreSQL and MySQL connectors&lt;/h2&gt;

&lt;p&gt;This &lt;a href=&quot;https://github.com/trinodb/trino/pull/14445&quot;&gt;PR of the episode&lt;/a&gt; was
contributed by Matthew Deady (&lt;a href=&quot;https://github.com/mwd410&quot;&gt;@mwd410&lt;/a&gt;). The
improvements enable writes to PostgreSQL and MySQL when fault-tolerant execution
is enabled (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;retry-policy&lt;/code&gt; is set to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TASK&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;QUERY&lt;/code&gt;). This update included a
few changes to the core classes used by connectors that rely on JDBC clients to
connect to the underlying database. For example, Matthew was able to build on this PR by
adding a few additional changes to get this working for SQL Server in
&lt;a href=&quot;https://github.com/trinodb/trino/pull/14730&quot;&gt;PR 14730&lt;/a&gt;.&lt;/p&gt;
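&lt;p&gt;As a minimal sketch, enabling this amounts to setting the retry policy in the
coordinator’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.properties&lt;/code&gt;. Note
that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TASK&lt;/code&gt; retries also require a
fault-tolerant exchange manager to be configured separately:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# etc/config.properties
retry-policy=TASK
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;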

&lt;p&gt;Thank you so much to Matthew for extending our fault-tolerant execution to
connectors using JDBC clients! As usual, thanks to all the reviewers and
maintainers who got these across the line!&lt;/p&gt;

&lt;h2 id=&quot;demo-using-the-hudi-connector&quot;&gt;Demo: Using the Hudi Connector&lt;/h2&gt;

&lt;p&gt;Let’s start up a local Trino coordinator and Hive metastore. Clone the
repository and navigate to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hudi/trino-hudi-minio&lt;/code&gt; directory. Then
start up the containers using Docker Compose.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-getting-started.git
cd community_tutorials/hudi/trino-hudi-minio
docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For now, you will need to import data using the Spark and Scala method we detail
in the video. In the near term we will provide a SparkSQL variant, and we will
update this demo to show Trino DDL support when it lands.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SHOW CATALOGS;

SHOW SCHEMAS IN hudi;

SHOW TABLES IN hudi.default;

SELECT COUNT(*) FROM hudi.default.hudi_coders_hive;

SELECT * FROM hudi.default.hudi_coders_hive;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://hudi.apache.org/&quot;&gt;Hudi&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://onehouse.io&quot;&gt;Onehouse&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Blog posts&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://robinhood.engineering/author-balaji-varadarajan-e3f496815ebf&quot;&gt;Fresher Data Lake on S3&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check out the in-person and virtual
&lt;a href=&quot;https://www.meetup.com/pro/trino-community/&quot;&gt;Trino Meetup groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Hosts</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino Summit 2022: Federating humans and data</title>
      <link href="https://trino.io/blog/2022/10/19/trino-summit-2022-teaser-2.html" rel="alternate" type="text/html" title="Trino Summit 2022: Federating humans and data" />
      <published>2022-10-19T00:00:00+00:00</published>
      <updated>2022-10-19T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/10/19/trino-summit-2022-teaser-2</id>
      <content type="html" xml:base="https://trino.io/blog/2022/10/19/trino-summit-2022-teaser-2.html">&lt;p&gt;Trino has long been the de facto standard for querying large data sets over your
cloud or on-prem storage, also known as data lakes. This year’s Trino Summit theme
will instead showcase Trino’s other claim to fame: query federation. Trino is a
query engine providing an access point that exposes ANSI SQL across
&lt;a href=&quot;https://trino.io/docs/current/connector.html&quot;&gt;multiple data sources&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I urge you to join us either in-person or virtually if you are a fan of Trino,
big data, open source, data engineering, Java, or all the above! This conference
is free and takes place in San Francisco, California on November 10th.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;register-for-the-summit&quot;&gt;Register for the summit&lt;/h2&gt;

&lt;p&gt;I can’t help but bring up the analogy of how Trino federates heterogeneous data
while this Trino Summit will federate many of us in the community from all
corners of the world. It really gives an appreciation for the international
reach of Trino and makes me look forward to more in-person events!&lt;/p&gt;

&lt;p&gt;Trino Summit will be held at the Commonwealth Club in San Francisco, California.
Make sure you register quickly for in-person registration, as it is limited to
250 seats. Virtual registration is also picking up quickly so register today!&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;h3 id=&quot;get-an-autographed-copy-of-trino-the-definitive-guide-2nd-ed&quot;&gt;Get an autographed copy of Trino: The Definitive Guide, 2nd ed.&lt;/h3&gt;

&lt;p&gt;Want to meet the authors who literally wrote the book on Trino? Visit 
&lt;a href=&quot;https://twitter.com/simpligility&quot;&gt;Manfred Moser&lt;/a&gt;,
&lt;a href=&quot;https://twitter.com/mfullertweets&quot;&gt;Matt Fuller&lt;/a&gt;, and
&lt;a href=&quot;https://twitter.com/mtraverso&quot;&gt;Martin Traverso&lt;/a&gt; at the Trino booth during the
conference. Bring your hard copy of &lt;a href=&quot;/blog/2022/10/03/the-definitive-guide-2.html&quot;&gt;&lt;strong&gt;Trino: The Definitive Guide&lt;/strong&gt;&lt;/a&gt; to get it signed by the authors!&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/ttdg2-cover.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Don’t have a book? We’ll be giving away autographed copies of the book
throughout the conference!&lt;/p&gt;

&lt;h3 id=&quot;trino-summit-2022-teaser&quot;&gt;Trino Summit 2022 teaser&lt;/h3&gt;

&lt;p&gt;Check out the teaser for this year’s Trino Summit and get ready to &lt;strong&gt;&lt;em&gt;Federate ‘em
all&lt;/em&gt;&lt;/strong&gt;!&lt;/p&gt;

&lt;iframe src=&quot;https://www.youtube.com/embed/o2MJvRKG14M&quot; width=&quot;800&quot; height=&quot;500&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px;
margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt;
&lt;/iframe&gt;

&lt;h2 id=&quot;announcing-the-second-round-of-sessions-and-speakers&quot;&gt;Announcing the second round of sessions and speakers&lt;/h2&gt;

&lt;p&gt;As mentioned in the &lt;a href=&quot;/blog/2022/09/22/trino-summit-2022-teaser.html&quot;&gt;previous summit teaser&lt;/a&gt;, we announced some of our exciting
lineup of speakers! The topics range from architectures like data mesh and data
lakehouse, to running Trino at scale with fault-tolerant execution, and of
course, query federation.&lt;/p&gt;

&lt;p&gt;We have a full roster planned, but check out the next round of fully confirmed
sessions. Stay tuned for one more blog post as we announce the final sessions in
our agenda as they are confirmed!&lt;/p&gt;

&lt;h3 id=&quot;sk-telecoms-journey-to-iceberg&quot;&gt;SK Telecom’s journey to Iceberg&lt;/h3&gt;

&lt;p&gt;SK Group is one of South Korea’s largest conglomerates, covering
industries from manufacturing to telecommunications. SK Telecom runs an
on-premise data platform at petabyte scale using Trino as the query engine. We
chose Trino for its ability to connect to heterogeneous data sources and its
fast performance, which plays a key role in our data platform.&lt;/p&gt;

&lt;p&gt;As data volumes and user demands to analyze long-term data increased, the Trino
Hive connector faced several challenges. Queries with an input data size
exceeding a terabyte put a great burden on the cluster. This caused many jobs to
fail, which is problematic because Trino’s resource sharing architecture affects
multiple users when a heavy query occurs.&lt;/p&gt;

&lt;p&gt;To address this situation, we optimized the data structure, tuned queries, and
used resource groups to isolate queries, but none of this fixed the problem.
We investigated Apache Iceberg and realized it could address some of these
scaling issues we were facing. In this talk, we will share our journey.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;JaeChang Song, Data Engineer at SKTelecom and Trino/Iceberg Contributor&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Jennifer OH, Data Engineer at SKTelecom&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;elevating-data-fabric-to-data-mesh-solving-data-needs-in-hybrid-data-lakes&quot;&gt;Elevating Data Fabric to Data Mesh: solving data needs in hybrid data lakes&lt;/h3&gt;

&lt;p&gt;At Comcast, we have long had a complex hybrid data lake environment that consists of
data lakes on-prem and in multiple cloud environments. Comcast uses Trino to
bridge the data in these environments using an architecture we call Data Fabric.
Data Fabric is an abstraction layer that uses an internally built connector that
connects to multiple instances of Trino. This enables us to query across all
of these environments from a single Trino instance.&lt;/p&gt;

&lt;p&gt;In recent years, emerging architectures like Data Mesh have nicely complemented
the goals we have been building toward for years. While we have effectively
implemented some aspects of a Data Mesh, there are still core tenets that
cannot be addressed by Trino alone. This is the journey we are on at Comcast,
and we would like to share our experience so far, the challenges we overcame, and
the ones yet to be resolved. Data abstraction, availability, movement, and
governance are the topics we will touch upon in this session.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Sajumon Joseph, Sr Principal Architect&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Pavan Madhineni, Sr. Manager; Product Development Engineering&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;trino-at-quora-speed-cost-reliability-challenges-and-tips&quot;&gt;Trino at Quora: Speed, Cost, Reliability Challenges and Tips&lt;/h3&gt;

&lt;p&gt;Trino has become an essential part of Quora’s tech stack and a major component
of our A/B testing framework that powers our decision-making on the product.
Trino has brought a lot of advantages to us. However, at Quora’s scale, we face
cost, speed, and reliability challenges when operating Trino.&lt;/p&gt;

&lt;p&gt;In this session, we will talk about how we resolve the challenges. Some
approaches are: auto-scale Trino clusters, experiment with different cluster and
JVM configurations, and instance types, build checkers to detect slow workers
and inefficient queries, and set up extensive monitoring.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Yifan Pan, Software Engineer of Data Infrastructure Team at Quora; 
Administrator/Primary Owner of Trino infrastructure at Quora&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;how-we-use-trino-to-analyze-our-product-led-growth-plg-user-activation-funnel&quot;&gt;How we use Trino to analyze our Product-led Growth (PLG) user activation funnel&lt;/h3&gt;

&lt;p&gt;Being a PLG company, we must track and analyze every action our users perform
within the product to remove friction and maximize usage and satisfaction. To
understand how effectively and quickly users become educated and then active in
the product, we had to instrument the user journey from signup to the Aha moment
and beyond.&lt;/p&gt;

&lt;p&gt;There are many tools on the market that can be used to analyze user behavior,
but none met our needs. In this session you will learn how we built a data
architecture to collect, model, and enrich user behavior events to optimize
Trino query performance that accelerated our ability to understand and improve
user conversion rates.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Roy Hasson, Head of Product at Upsolver&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;I hope you all are as excited as we are to finally federate the Trino community
face-to-face! This conference is shaping up to be educational, fun, and filled
with Trino experts and aficionados.&lt;/p&gt;

&lt;p&gt;Stay tuned for new developments in upcoming blog posts, don’t forget to
&lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;register&lt;/a&gt;, and always, &lt;strong&gt;&lt;em&gt;Federate them
all&lt;/em&gt;&lt;/strong&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Trino has long been the de facto standard for querying large data sets over your cloud or on-prem storage, also known as data lakes. This Trino Summit’s theme instead will showcase Trino’s other claim to fame: query federation. Trino is a query engine providing an access point that exposes ANSI SQL across multiple data sources. I urge you to join us either in-person or virtually if you are a fan of Trino, big data, open source, data engineering, Java, or all the above! This conference is free and takes place in San Francisco, California on November 10th.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/summit-logo.png" />
      
    </entry>
  
    <entry>
      <title>Release of the second edition of Trino: The Definitive Guide</title>
      <link href="https://trino.io/blog/2022/10/03/the-definitive-guide-2.html" rel="alternate" type="text/html" title="Release of the second edition of Trino: The Definitive Guide" />
      <published>2022-10-03T00:00:00+00:00</published>
      <updated>2022-10-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/10/03/the-definitive-guide-2</id>
      <content type="html" xml:base="https://trino.io/blog/2022/10/03/the-definitive-guide-2.html">&lt;p&gt;It was time for a refresh. A little while ago in April 2021, we announced
the &lt;a href=&quot;https://trino.io/blog/2021/04/21/the-definitive-guide.html&quot;&gt;Trino version of our definitive guide&lt;/a&gt;. But again, Trino as a project and community
has continued to innovate and grow. Numerous smaller and larger details changed,
and the examples and resources needed to be fixed.&lt;/p&gt;

&lt;p&gt;Today, we are happy to announce that after a few months of updates, testing, and
editing, the second edition of &lt;strong&gt;Trino: The Definitive Guide&lt;/strong&gt; is available.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;get-a-free-copy-from-starburst-now&quot;&gt;&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;Get a free copy from Starburst now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;!--more--&gt;

&lt;p&gt;The &lt;a href=&quot;https://www.oreilly.com/library/view/trino-the-definitive/9781098137229/&quot;&gt;new edition of the book from
O’Reilly&lt;/a&gt;
is available in digital formats as well as physical copies. You can find more
information about the book on &lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;our permanent page about
it&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The book is now updated to Trino release 392 for all filenames, installation
methods, commands, names and properties. We addressed all problems that our
readers found and reported to us as well.&lt;/p&gt;

&lt;p&gt;We updated to Java 17 usage, added more SQL statements, and added info about
&lt;a href=&quot;https://trino.io/blog/2022/09/20/python-progress.html&quot;&gt;Python tools like dbt&lt;/a&gt; and clients like Metabase. We talk about the lakehouse architecture and new
connectors like Iceberg and Delta Lake.&lt;/p&gt;

&lt;p&gt;So what are you waiting for? Go get a copy, check out the &lt;a href=&quot;https://github.com/trinodb/trino-the-definitive-guide&quot;&gt;updated example code
repository&lt;/a&gt;, &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/README.md&quot;&gt;give us a
star&lt;/a&gt;, provide feedback,
and contact us on &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred, Martin, and Matt&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And one last tip, join us at &lt;a href=&quot;/blog/2022/09/22/trino-summit-2022-teaser.html&quot;&gt;Trino Summit 2022&lt;/a&gt; in San Francisco in November for a chat
and maybe even a signed hardcopy of the book.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Martin Traverso, Matt Fuller</name>
        </author>
      

      <summary>It was time for a refresh. A little while ago in April 2021, we announced the Trino version of our definitive guide. But again, Trino as a project and community has continued to innovate and grow. Numerous smaller and larger details changed, and the examples and resources needed to be fixed. Today, we are happy to announce that after a few months of updates, testing, and editing, the second edition of Trino: The Definitive Guide is available. Get a free copy from Starburst now!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/ttdg2-cover.png" />
      
    </entry>
  
    <entry>
      <title>Trino Summit 2022 will be legendary</title>
      <link href="https://trino.io/blog/2022/09/22/trino-summit-2022-teaser.html" rel="alternate" type="text/html" title="Trino Summit 2022 will be legendary" />
      <published>2022-09-22T00:00:00+00:00</published>
      <updated>2022-09-22T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/09/22/trino-summit-2022-teaser</id>
      <content type="html" xml:base="https://trino.io/blog/2022/09/22/trino-summit-2022-teaser.html">&lt;p&gt;Commander Bun Bun is back and this year we have an exciting lineup of speakers.
Topics range from architectures like data mesh and data lakehouse, to running
Trino at scale with fault-tolerant execution, and query federation. This 
conference is free and takes place on November 10th. The summit is a hybrid
event for in-person and virtual attendance. Find out more details below!&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;register-for-the-summit&quot;&gt;Register for the summit&lt;/h2&gt;

&lt;p&gt;This year’s Trino Summit will be hosted at the Commonwealth Club in San 
Francisco, CA. In-person registration is limited to 250 seats so make sure you
register quickly before spots run out!&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;
        Register now
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;h3 id=&quot;trino-summit-2022-teaser&quot;&gt;Trino Summit 2022 teaser&lt;/h3&gt;

&lt;p&gt;Get ready to federate them all this year! Many times when folks think of Trino,
their first instinct is to consider the data lake use case where it replaces
Hive or other data lakehouse query engines. However, this summit will also drill
into the lesser discussed query federation use case. Federate ‘em all!&lt;/p&gt;

&lt;iframe src=&quot;https://www.youtube.com/embed/o2MJvRKG14M&quot; width=&quot;800&quot; height=&quot;500&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px;
margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt;
&lt;/iframe&gt;

&lt;h2 id=&quot;announcing-the-first-sessions-and-speakers&quot;&gt;Announcing the first sessions and speakers&lt;/h2&gt;

&lt;p&gt;We have a full roster planned, but here is a glance at a few fully confirmed
sessions. Stay tuned for future blog posts as we announce more sessions as they
are confirmed!&lt;/p&gt;

&lt;h3 id=&quot;state-of-trino-keynote&quot;&gt;State of Trino keynote&lt;/h3&gt;

&lt;p&gt;Hear the latest on the state of the open source Trino project. Trino
is the award-winning MPP SQL query engine. In this session, Trino creators
discuss the latest features that have landed in the last year, the roadmap for
the year ahead, and community growth highlights.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Martin Traverso, Co-Creator of Trino and CTO, Starburst&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Dain Sundstrom, Co-Creator of Trino and CTO, Starburst&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;David Phillips, Co-Creator of Trino and CTO, Starburst&lt;/em&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;trino-for-large-scale-etl-at-lyft&quot;&gt;Trino for large scale ETL at Lyft&lt;/h3&gt;

&lt;p&gt;At Lyft, we are processing petabytes of data daily through Trino
for various use cases. A single query can run for as long as 4 hours with
terabytes of memory reserved. There are quite a few challenges in operating Trino
ETL at such a scale: how to make all queries as performant as possible with low
failure rates; how to define clusters, routing groups, and resource
groups for changing volume across a day; how to keep our commitment to user SLOs
during unexpected spikes; and so on.&lt;/p&gt;

&lt;p&gt;We’ll share what we’ve done with our config tuning, large query/user
identification, autoscaling, and fault-tolerant features to run Trino at
such a scale. We’ll also share our upcoming challenges and our plans to take
Trino adoption further across the company.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Charles Song, Senior Software Engineer at Lyft&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;rewriting-history-migrating-petabytes-of-data-to-apache-iceberg-using-trino&quot;&gt;Rewriting history: Migrating petabytes of data to Apache Iceberg using Trino&lt;/h3&gt;

&lt;p&gt;Dataset interoperability between data platform components continues to
be a difficult hurdle to overcome. This shortcoming often results in siloed
data and frustrated users. Although open table formats like Apache Iceberg aim
to break down these silos by providing a consistent and scalable table
abstraction, migrating your pre-existing data archive to a new format can still
be daunting. This talk will outline challenges we faced when rewriting petabytes
of Shopify’s data into the Iceberg table format using the Trino engine. In a
rapidly evolving landscape, I will highlight recent contributions to Trino’s
Iceberg integration that made our work possible, while also illustrating how we
designed our system to scale. Topics will include: what to consider when
designing your migration strategy, how we optimized Trino’s write performance,
and how to recover from corrupt table states. Finally, I will compare the query
performance of old and migrated datasets using Shopify’s datasets as
benchmarks.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Marc Laforet, Senior Data Engineer at Shopify&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;federating-them-all-on-starburst-galaxy&quot;&gt;Federating them all on Starburst Galaxy!&lt;/h3&gt;

&lt;p&gt;You’ve federated them all on Trino, but to beat the Elite Four at
Indigo Plateau, every data trainer needs help. In this talk, I will cover how
Starburst Galaxy is the fastest path to query federation and walk through a demo
that trainers can follow later. We’ll also cover cool features like schema
discovery and fault-tolerant execution. The queries we’ll run will use Pokémon
data so that you don’t have to witness yet another taxi cab or iris data set.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Monica Miller, Developer Advocate at Starburst&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;using-trino-with-apache-airflow-for-almost-all-your-data-problems&quot;&gt;Using Trino with Apache Airflow for (almost) all your data problems&lt;/h3&gt;

&lt;p&gt;Trino is incredibly effective at enabling users to extract insights
quickly and effectively from large amounts of data located in dispersed and
heterogeneous federated data systems. However, some business data problems are
more complex than interactive analytics use cases, and are best broken down into
a sequence of interdependent steps, a.k.a. a workflow. For these use cases,
dedicated software is often required in order to schedule and manage these
processes with a principled approach. In this session, we will look at how we
can leverage Apache Airflow to orchestrate Trino queries into complex workflows
that solve practical batch processing problems, all the while avoiding
repetitive, redundant data movement.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Philippe Gagnon, Solutions Architect at Astronomer&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Stay tuned for new developments in upcoming blog posts, don’t forget to
&lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;register&lt;/a&gt;, and as always, federate them
all!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen, Dain Sundstrom</name>
        </author>
      

      <summary>Commander Bun Bun is back and this year we have an exciting lineup of speakers. Topics range from architectures like data mesh and data lakehouse, to running Trino at scale with fault-tolerant execution, and query federation. This conference is free and takes place on November 10th. The summit is a hybrid event for in-person and virtual attendance. Find out more details below!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-summit-2022/summit-logo.png" />
      
    </entry>
  
    <entry>
      <title>Trino charms Python</title>
      <link href="https://trino.io/blog/2022/09/20/python-progress.html" rel="alternate" type="text/html" title="Trino charms Python" />
      <published>2022-09-20T00:00:00+00:00</published>
      <updated>2022-09-20T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/09/20/python-progress</id>
      <content type="html" xml:base="https://trino.io/blog/2022/09/20/python-progress.html">&lt;p&gt;Wow, have we ever come a long way with Python support for Trino. It feels like
ages ago that we talked about DB-API, trino-python-client, SQLAlchemy, Apache
Superset, and more in &lt;a href=&quot;https://trino.io/episodes/12.html&quot;&gt;Trino Community Broadcast episode
12&lt;/a&gt;. More recently we talked about dbt in
&lt;a href=&quot;https://trino.io/episodes/21.html&quot;&gt;episode 21&lt;/a&gt; and &lt;a href=&quot;https://trino.io/episodes/30.html&quot;&gt;episode
30&lt;/a&gt;, but there is so much more for Pythonistas,
Pythonians, Python programmers, and simply users of Python-powered tools.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;where-are-we-now&quot;&gt;Where are we now&lt;/h2&gt;

&lt;p&gt;Python usage shows up with nearly every Trino deployment these days, and we
have had some really great developments for you all in recent months:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.starburst.io&quot;&gt;Starburst&lt;/a&gt; has really ramped up the contributions to
the foundation of a lot of Python tools connecting to Trino. The
&lt;a href=&quot;https://github.com/trinodb/trino-python-client&quot;&gt;trino-python-client&lt;/a&gt; receives
improvements regularly and is definitely a first-class client at the same
level as the JDBC driver or the CLI.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.getdbt.com/&quot;&gt;dbt Labs&lt;/a&gt; and Starburst have worked hard on
launching and improving the &lt;a href=&quot;https://github.com/starburstdata/dbt-trino&quot;&gt;dbt-trino
project&lt;/a&gt; and enabling automated
data transformation flows.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://airflow.apache.org/&quot;&gt;Apache Airflow&lt;/a&gt; use cases abound, and the
&lt;a href=&quot;/blog/2022/07/13/how-to-use-airflow-to-schedule-trino-jobs.html&quot;&gt;integration is improving&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://superset.apache.org/&quot;&gt;Apache Superset&lt;/a&gt; and
&lt;a href=&quot;https://preset.io/&quot;&gt;Preset&lt;/a&gt; continue to add features and treat Trino as a
major data source and integration, and we should probably have another Trino
Community Broadcast episode to see that all in action.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://airbyte.com/&quot;&gt;Airbyte&lt;/a&gt; was &lt;a href=&quot;/blog/2022/05/17/cinco-de-trino-recap.html&quot;&gt;demoed at Cinco de Trino&lt;/a&gt; and is &lt;a href=&quot;/blog/2022/05/24/an-opinionated-guide-to-consolidating-our-data.html&quot;&gt;widely used by companies such as
Lyft&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And of course there are well-known usages such as notebooks everywhere, on your
workstation, in your company, and out in the cloud. But is there more? There
must be!&lt;/p&gt;

&lt;h2 id=&quot;what-else-could-we-do&quot;&gt;What else could we do&lt;/h2&gt;

&lt;p&gt;All of these developments are great for our users. I want to encourage you all
to try these tools and learn how amazing they are with Trino. At the same time,
it feels like there has to be even more. The Python ecosystem is so large, and
there are probably dozens of use cases we have never heard about, have not
considered, or have not even dreamed about in our wildest dreams.&lt;/p&gt;

&lt;p&gt;On the other hand I am sure there are still problems with these tools and
integrations. What is an edge case for us, might be a daily task for you. What
we consider hard and complicated, might be just what you have to deal with
anyway. And in the spirit of constant improvement, we really want to fix these
things and make it all amazing. But we need your help.&lt;/p&gt;

&lt;h2 id=&quot;let-us-know-what-you-think&quot;&gt;Let us know what you think&lt;/h2&gt;

&lt;p&gt;This is now your opportunity to tell us what you need to make your Trino and
Python experience better.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://forms.gle/4bzMPZxby6E4xKm98&quot; target=&quot;_blank&quot;&gt;
        Help Trino and Python
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Trino, Python, and all the tools in the ecosystem go from strength to strength.
With your help and input, we want to supercharge the tooling to hero levels.&lt;/p&gt;

&lt;p&gt;Join us in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;python-client&lt;/code&gt; channel on &lt;a href=&quot;https://trino.io/community.html&quot;&gt;Trino Slack&lt;/a&gt;,
and don’t forget to &lt;a href=&quot;https://forms.gle/4bzMPZxby6E4xKm98&quot;&gt;answer that survey&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thanks, and see you at the &lt;a href=&quot;/blog/2022/06/30/trino-summit-call-for-speakers.html&quot;&gt;Trino Summit 2022&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Manfred, Brian, and Dain&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser, Brian Zhan, Dain Sundstrom</name>
        </author>
      

      <summary>Wow, have we ever come a long way with Python support for Trino. It feels like ages ago that we talked about DB-API, trino-python-client, SQLAlchemy, Apache Superset, and more in Trino Community Broadcast episode 12. More recently we talked about dbt in episode 21 and episode 30, but there is so much more for Pythonistas, Pythonians, Python programmers, and simply users of Python-powered tools.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/images/logos/python.png" />
      
    </entry>
  
    <entry>
      <title>Trino&apos;s tenth birthday celebration recap</title>
      <link href="https://trino.io/blog/2022/09/12/tenth-birthday-celebration-recap.html" rel="alternate" type="text/html" title="Trino&apos;s tenth birthday celebration recap" />
      <published>2022-09-12T00:00:00+00:00</published>
      <updated>2022-09-12T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/09/12/tenth-birthday-celebration-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/09/12/tenth-birthday-celebration-recap.html">&lt;p&gt;What an exciting month we had in August! August marked the ten-year birthday of
the Trino project. Don’t worry if you missed all the excitement, as we’ve
condensed it all in this post.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;blog-posts&quot;&gt;Blog posts&lt;/h2&gt;

&lt;p&gt;We felt it necessary to chronicle the larger events that happened in the last
decade of the project through the lens of where we are today.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2022/08/02/leaving-facebook-meta-best-for-trino.html&quot;&gt;Why leaving Facebook/Meta was the best thing we could do for the Trino Community&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2022/08/04/decade-innovation.html&quot;&gt;A decade of query engine innovation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2022/08/08/trino-tenth-birthday.html&quot;&gt;Happy tenth birthday Trino!&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We shared these posts on Hacker News, and the Facebook and query innovation
posts both hit the front page. This resulted in one of the highest page-view
counts the Trino website has seen in a single day - more than 25k views!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-tenth-birthday/hn-top.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;trino-ten-year-timeline-video&quot;&gt;Trino ten-year timeline video&lt;/h2&gt;

&lt;p&gt;Another way we celebrated was creating an epic ten-year montage video that
chronicles the incredible journey starting with the Presto project’s humble
beginnings, and how it evolved into the success that Trino is today:&lt;/p&gt;

&lt;iframe src=&quot;https://www.youtube.com/embed/hPD95_-bZZw&quot; width=&quot;800&quot; height=&quot;500&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px;
margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt;
&lt;/iframe&gt;

&lt;h2 id=&quot;birthday-celebration-with-the-creators-of-trino&quot;&gt;Birthday celebration with the creators of Trino&lt;/h2&gt;

&lt;p&gt;To cap things off last month, we hosted a meetup with the creators to reflect
on the last ten years, laugh and listen to some stories from the early days,
talk about the exciting features currently launching, and speculate on the next
ten years of Trino. Here are some highlights you may have missed:&lt;/p&gt;

&lt;h3 id=&quot;adding-dynamic-catalogs&quot;&gt;Adding dynamic catalogs&lt;/h3&gt;

&lt;p&gt;Dain discusses what dynamic catalogs could look like in Trino. Currently, to add
catalogs in Trino, you need to add the new catalog configuration file and then
restart Trino. With dynamic catalogs, you can add and remove these catalogs at
runtime with no restart required. There is still no guarantee of exactly when
this feature would arrive, but some of the foundations are currently being 
added. &lt;a href=&quot;https://www.youtube.com/clip/UgkxkYmwM6gmw9-GceMUb5IxqIKm0qNXt3fY&quot; target=&quot;_blank&quot;&gt; 
&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt; Dain dives into this a bit
more in this clip&lt;/a&gt;&lt;/p&gt;
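
&lt;p&gt;For context, here is a sketch of what the current, static flow looks like: you
drop a properties file into &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc/catalog&lt;/code&gt; on every node and restart
them. The catalog name and connection details below are placeholders:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# etc/catalog/examplepg.properties -- hypothetical PostgreSQL catalog
connector.name=postgresql
connection-url=jdbc:postgresql://db.example.net:5432/exampledb
connection-user=trino
connection-password=secret
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Dynamic catalogs aim to make the equivalent change possible at runtime, with no
restart.&lt;/p&gt;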

&lt;h3 id=&quot;vectorization-and-performance&quot;&gt;Vectorization and performance&lt;/h3&gt;

&lt;p&gt;As more marketing around vectorized databases has come up recently, many have
asked if Trino will be following the trend. This question comes up at an
interesting time, as
&lt;a href=&quot;https://trino.io/episodes/36.html&quot;&gt;Trino now requires Java 17 to run&lt;/a&gt;. Java 17
comes with a lot of capabilities for vectorization, and while we are excited to
start looking into these capabilities, simply updating workloads to use
vectorization doesn’t pack the performance punch that many would expect. The
answer is more complex:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Do modern workloads benefit from vectorization? 
&lt;a href=&quot;https://www.youtube.com/clip/UgkxmPAur8thP_D-_GpCcg-sqprEAqwWdyck&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
See Martin’s answer to this&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Is there a benefit to vectorization over Java’s auto-vectorization?
&lt;a href=&quot;https://www.youtube.com/clip/Ugkx1AKbq0jQyZhOH4MKNf3LO4i9kZAmLqpJ&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
Sometimes, but Dain elaborates on when&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;If not vectorization, what type of performance improvements does Trino focus on?
&lt;a href=&quot;https://www.youtube.com/clip/UgkxQwDYDS6evVJelNVjWAgrIhzg_Q-cAEyq&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
Martin and Dain list some simple but impactful ones&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;The debate around query time optimization versus runtime adaption.
&lt;a href=&quot;https://www.youtube.com/clip/Ugkxt5ryTBP-EPEEo_OOcW2PKvNiJkj5n8UR&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
Which should you optimize first?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;polymorphic-table-functions&quot;&gt;Polymorphic table functions&lt;/h3&gt;

&lt;p&gt;One feature that is top-of-mind for everyone in the Trino project is
&lt;a href=&quot;/blog/2022/07/22/polymorphic-table-functions.html&quot;&gt;polymorphic table functions&lt;/a&gt;,
or simply “table functions” as Dain prefers to call them.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;What is a table function?
&lt;a href=&quot;https://www.youtube.com/clip/Ugkx62IKgPd_v9eGBaPUHP2hyaRkWSXh8w8h&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
David and Dain discuss standard and polymorphic table functions&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Could we rewrite the &lt;a href=&quot;https://trino.io/docs/current/connector/googlesheets&quot;&gt;Google Sheets connector&lt;/a&gt;
as a table function?
&lt;a href=&quot;https://www.youtube.com/clip/UgkxKIhplQHgEULQkSrjKs4M5w8oNdQMJaoL&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
David and Dain discuss how this would work&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Why table functions are so incredibly powerful.
&lt;a href=&quot;https://www.youtube.com/clip/UgkxQcokpdgPjiuMKMC5-3HwHvlbmZjxAvxe&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
Eric and Dain talk about why PTFs are a game changer&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about polymorphic table functions, check out the
recent &lt;a href=&quot;https://trino.io/episodes/38.html&quot;&gt;Trino Community Broadcast episode&lt;/a&gt; that
covers the potential of these functions in much more detail.&lt;/p&gt;

&lt;h3 id=&quot;the-early-days-of-presto-and-trino&quot;&gt;The early days of Presto and Trino&lt;/h3&gt;

&lt;p&gt;We wanted to get some insight into what the early days of the project looked
like, and how Martin, Dain, David, and Eric began the daunting task of designing
and building a distributed query engine from scratch. Some of the discussions
were interesting while others were downright hilarious. Here are some steps you
can take to write your own query engine, at least if you want to do it the way
the Trino creators did it:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Look up a bunch of research papers to see how others are doing this 📑.
  &lt;a href=&quot;https://www.youtube.com/clip/gkxGjPYZRx8rhtAndyho7AZgsM4e9wG9Jt4&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
  Video&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;Side note: Papers tend to be highly aspirational and skip important fundamentals.
&lt;a href=&quot;https://www.youtube.com/clip/Ugkx6Hqe5iglsTgrR9hVo9U3ITi8LSxxMu4U&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
Video&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Address the real challenges of making a query engine.
  &lt;a href=&quot;https://www.youtube.com/clip/Ugkx57PezuXyRWHrxxxoLaKni6jqFZ-StwY-&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
  Video&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Take your initial version and just throw it away 😂🗑🚮.
  &lt;a href=&quot;https://www.youtube.com/clip/UgkxJz7zve36QJZZDdtC3S29vI-Ak1jRifAH&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
  Video&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Expand outside the initial use cases by learning from other companies and
  building community 👥.
  &lt;a href=&quot;https://www.youtube.com/clip/UgkxQrBl0BzOrjvwDcEN4KAAyqehcRUc1tsf&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
  Video&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Cause a &lt;a href=&quot;https://en.wikipedia.org/wiki/Brownout_(software_engineering)&quot;&gt;brownout&lt;/a&gt;
  on the Facebook network 📉.
  &lt;a href=&quot;https://www.youtube.com/clip/Ugkx6SyQTFgwX_kdeH018VGt2pMUbldvuKtC&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
  Video&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Realize the system you replaced was actually faster in some cases, but
  for all the wrong reasons ❌🙅.
  &lt;a href=&quot;https://www.youtube.com/clip/UgkxTqBY2nMAALn-OkglE5DT9dHlBuC18qf8&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;
  Video&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After a lot of the initial work was done, Presto was deployed at Facebook and
soon after open sourced. From here, we know that the velocity of the project
picked up and once the project was independent of Facebook, the features took
off even more. While everything may seem calculated in hindsight, it was a lot
of hard work to grow the community and adoption around Presto and now Trino.
The creators knew they were making a project that would be utilized outside the
walls of Facebook, but
&lt;a href=&quot;https://www.youtube.com/clip/Ugkxh2J-1bi1rUoBpuld_FAuXYZgz2bvqPPx&quot; target=&quot;_blank&quot;&gt;&lt;i class=&quot;fab fa-youtube&quot; style=&quot;color: red;&quot;&gt;&lt;/i&gt;  they could never have 
anticipated the sheer scale of adoption Trino would see&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;We hope you enjoyed all the fun we had celebrating these first ten years of the
Trino project. We are thrilled to think of what the following decades will
bring. We’d like to leave you with closing thoughts from Dain:&lt;/p&gt;

&lt;iframe src=&quot;https://www.youtube.com/embed/6TFLKcF24HM?clip=Ugkx5bFnjvRX0USjk8vgRJdqLwZQo7Ffg0xm&amp;amp;clipt=ELfJ2gEY8o7eAQ&quot; width=&quot;800&quot; height=&quot;500&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px;
margin-bottom:5px; max-width: 100%;&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;
&lt;/iframe&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>What an exciting month we had in August! August marked the ten-year birthday of the Trino project. Don’t worry if you missed all the excitement, as we’ve condensed it all in this post.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-tenth-birthday/creators.jpeg" />
      
    </entry>
  
    <entry>
      <title>40: Trino&apos;s cold as Iceberg!</title>
      <link href="https://trino.io/episodes/40.html" rel="alternate" type="text/html" title="40: Trino&apos;s cold as Iceberg!" />
      <published>2022-09-08T00:00:00+00:00</published>
      <updated>2022-09-08T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/40</id>
      <content type="html" xml:base="https://trino.io/episodes/40.html">&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/15/trino-iceberg.png&quot; /&gt;&lt;br /&gt;
Looks like Commander Bun Bun is safe on this Iceberg&lt;br /&gt;
&lt;a href=&quot;https://joshdata.me/iceberger.html&quot;&gt;https://joshdata.me/iceberger.html&lt;/a&gt;
&lt;/p&gt;

&lt;h2 id=&quot;hosts&quot;&gt;Hosts&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Brian Olsen, Developer Advocate at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/bitsondatadev&quot;&gt;@bitsondatadev&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate at
 &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Ryan Blue, creator of Iceberg and CEO at
 &lt;a href=&quot;https://tabular.io&quot;&gt;Tabular&lt;/a&gt; (&lt;a href=&quot;https://github.com/rdblue&quot;&gt;@rdblue&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Sam Redai, Developer Advocate at &lt;a href=&quot;https://tabular.io&quot;&gt;Tabular&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/samuelredai&quot;&gt;@samuelredai&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/tomnats&quot;&gt;Tom Nats&lt;/a&gt;, Director of Customer Solutions at &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;register-for-trino-summit-2022&quot;&gt;Register for Trino Summit 2022!&lt;/h2&gt;

&lt;p&gt;Trino Summit 2022 is just around the corner! This &lt;strong&gt;free&lt;/strong&gt; event on November
10th will take place in person at the Commonwealth Club in San Francisco, CA,
and can also be attended remotely! If you want to present, the
&lt;a href=&quot;https://sessionize.com/trino-summit-2022/&quot;&gt;call for speakers&lt;/a&gt; is open until
September 15th.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;You can register for the conference&lt;/a&gt;
at any time. We must limit in-person registrations to 250
attendees, so register soon if you plan on attending in person!&lt;/p&gt;

&lt;h2 id=&quot;releases-394-to-395&quot;&gt;Releases 394 to 395&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-394.html&quot;&gt;Trino 394&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;JSON output format for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Improved performance for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIKE&lt;/code&gt; expressions.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt; table function in BigQuery connector.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; support in BigQuery connector.&lt;/li&gt;
  &lt;li&gt;TLS support in Pinot connector.&lt;/li&gt;
&lt;/ul&gt;
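
&lt;p&gt;As an illustration, the new JSON output format for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt; is selected with the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FORMAT&lt;/code&gt; option; the query itself is an arbitrary example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;EXPLAIN (FORMAT JSON)
SELECT nationkey, count(*)
FROM tpch.tiny.nation
GROUP BY nationkey;
&lt;/code&gt;&lt;/pre&gt;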

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-395.html&quot;&gt;Trino 395&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; queries.&lt;/li&gt;
  &lt;li&gt;Better performance for large clusters.&lt;/li&gt;
  &lt;li&gt;Improved memory efficiency for aggregations and fault tolerant execution.&lt;/li&gt;
  &lt;li&gt;Faster aggregations over decimal columns.&lt;/li&gt;
  &lt;li&gt;Support for dynamic function resolution.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional highlights worth a mention according to Cole:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The improved performance of inserts on Delta Lake, Hive, and Iceberg is a huge
one. We’re not entirely sure how much it’ll matter in production use cases, but
some of the benchmarks suggested it could be massive - one test showed a 75%
reduction in query duration.&lt;/li&gt;
  &lt;li&gt;Dynamic function resolution in the SPI is going to unlock some very neat
possibilities down the line.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-394.html&quot;&gt;Trino 394&lt;/a&gt;,
and
&lt;a href=&quot;https://trino.io/docs/current/release/release-395.html&quot;&gt;Trino 395&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-latest-features-in-apache-iceberg-and-the-iceberg-connector&quot;&gt;Concept of the week: Latest features in Apache Iceberg and the Iceberg connector&lt;/h2&gt;

&lt;p&gt;It has been over a year since we had Ryan on the Trino Community Broadcast as
a guest to discuss what Apache Iceberg is and how it can be used with Trino.
Since then, the adoption of Iceberg in our community has skyrocketed. Iceberg is
delivering as a much better alternative to the Hive table format.&lt;/p&gt;

&lt;p&gt;The initial phase of the Iceberg connector in Trino aimed to provide fast and
interoperable read support. A typical usage was Trino alongside other query
engines like Apache Spark, which supported many of the data modification
language (DML) SQL features on Iceberg. One of the biggest requests we got as
adoption increased was the ability to do everything through Trino. This episode
dives into some of the latest features that were missing from the early
iterations of the Iceberg connector, as well as what has changed in Iceberg
itself!&lt;/p&gt;

&lt;h3 id=&quot;what-is-apache-iceberg&quot;&gt;What is Apache Iceberg?&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://iceberg.apache.org/&quot;&gt;Iceberg&lt;/a&gt; is a next-generation table format that
defines a standard around the metadata used to map data to a SQL query engine.
It addresses a lot of the maintainability and reliability issues many engineers
experienced with the way
&lt;a href=&quot;/blog/2020/10/20/intro-to-hive-connector.html&quot;&gt;Hive modeled SQL tables&lt;/a&gt;
over big data files.&lt;/p&gt;

&lt;p&gt;One common confusion to point out is that a table format is not equivalent to a
file format like ORC or Parquet. The table format is the layer that maintains
the metadata mapping these files to the concept of a table and other common
database abstractions.&lt;/p&gt;
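
&lt;p&gt;A simplified sketch of an Iceberg table in storage makes the distinction
concrete: the files under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data/&lt;/code&gt; use a file format, while everything
under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;metadata/&lt;/code&gt; belongs to the table format:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;example_table/
├── data/                    # ORC or Parquet data files
└── metadata/
    ├── v2.metadata.json     # table metadata: schema, partition spec, snapshots
    ├── snap-....avro        # manifest list for a snapshot
    └── ...-m0.avro          # manifests pointing at data files, with statistics
&lt;/code&gt;&lt;/pre&gt;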

&lt;p&gt;This episode assumes you have some basic knowledge of Trino and Iceberg already. If
you are new to Iceberg or need a refresher, we recommend the two older episodes
about Iceberg and Trino basics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/episodes/14.html&quot;&gt;14: Iceberg: March of the Trinos&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/episodes/15.html&quot;&gt;15: Iceberg right ahead!&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;why-iceberg-over-other-formats&quot;&gt;Why Iceberg over other formats?&lt;/h3&gt;

&lt;p&gt;There have been some great advancements in big data technologies that brought
back SQL and data warehouse capabilities. However, Hive and Hive-like table
formats are still missing some capabilities due to limitations that Hive tables
have, such as dropping and reintroducing stale data unintentionally. On top of
that, Hive tables require a lot of knowledge of Hive internals. Some recent
formats aim to remain backwards compatible with Hive, but inadvertently
reintroduce these limitations.&lt;/p&gt;

&lt;p&gt;This is not the case with Iceberg. Iceberg has the broadest query engine
support and puts a heavy emphasis on being an interoperable format. This
improves the flexibility users have to address a wider array of use cases,
which may involve querying across a system like Snowflake and a data lakehouse
running on Iceberg. All of this is made possible by the
&lt;a href=&quot;https://iceberg.apache.org/spec&quot;&gt;Iceberg specification&lt;/a&gt; that all these query
engines must follow.&lt;/p&gt;

&lt;p&gt;Finally, a great video presented by Ryan Blue that dives into Iceberg is,
“&lt;a href=&quot;https://www.youtube.com/watch?v=_GW3GYZK66U&quot;&gt;Why you shouldn’t care about Iceberg&lt;/a&gt;.”&lt;/p&gt;

&lt;h3 id=&quot;metadata-catalogs&quot;&gt;Metadata catalogs&lt;/h3&gt;

&lt;p&gt;Catalogs, in the context of Iceberg, refer to the central storage of metadata.
Catalogs are also used to provide the atomic compare-and-swap needed to support
&lt;a href=&quot;https://iceberg.apache.org/docs/latest/reliability&quot;&gt;serializable isolation in Iceberg&lt;/a&gt;.
We’ll refer to them as metadata catalogs to avoid confusion with Trino
&lt;a href=&quot;https://trino.io/docs/current/sql/show-catalogs.html&quot;&gt;catalogs&lt;/a&gt;.&lt;/p&gt;
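&lt;p&gt;The compare-and-swap can be sketched in a few lines. This is a toy model,
not the Iceberg API: a commit succeeds only if the table pointer still
references the metadata file the writer started from, which is what lets the
catalog serialize concurrent commits.&lt;/p&gt;

```python
# Toy model of the atomic compare-and-swap an Iceberg catalog performs when
# committing a new table metadata file. Names are illustrative, not Iceberg APIs.
class ToyCatalog:
    def __init__(self, current_metadata):
        self.current_metadata = current_metadata

    def commit(self, expected_metadata, new_metadata):
        """Swap the table pointer only if no other writer committed first."""
        if self.current_metadata != expected_metadata:
            return False  # lost the race; caller must re-read and retry
        self.current_metadata = new_metadata
        return True

catalog = ToyCatalog("v1.metadata.json")
ok = catalog.commit("v1.metadata.json", "v2.metadata.json")     # wins the swap
stale = catalog.commit("v1.metadata.json", "v3.metadata.json")  # stale base fails
```

&lt;p&gt;A writer whose commit fails re-reads the current metadata, reapplies its
changes, and retries, which is how concurrent writers stay serializable.&lt;/p&gt;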

&lt;p&gt;The two existing catalogs supported in Trino’s Iceberg connector are the
&lt;a href=&quot;https://trino.io/docs/current/connector/iceberg.html#hive-metastore-catalog&quot;&gt;Hive Metastore Service&lt;/a&gt;
and the AWS metastore counterpart of the Hive Metastore, Glue. While this
provides a nice migration from the Hive model, many are looking to replace these
rather cumbersome catalogs with something lightweight. It turns out that
the Iceberg connector only uses the Hive Metastore Service to point to the
top-level metadata files of an Iceberg table, while the majority of the
metadata lives in metadata files in storage. This makes it even more compelling
to replace the complex Hive service with simpler alternatives. Two popular
catalogs outside of these
are the &lt;a href=&quot;https://iceberg.apache.org/docs/latest/jdbc&quot;&gt;JDBC catalog&lt;/a&gt; and the
&lt;a href=&quot;https://github.com/apache/iceberg/pull/4348&quot;&gt;REST catalog&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;There are two PRs in progress to support these metadata catalogs in Trino:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/11772&quot;&gt;Trino PR 11772: Support JDBC catalog in Iceberg connector&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/13294&quot;&gt;Trino PR 13294: Add Iceberg RESTSessionCatalog Implementation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
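&lt;p&gt;For reference, the metadata catalog is chosen with a single property in the
Trino catalog configuration. The following is a minimal sketch using the
documented Hive Metastore catalog type; the host name is illustrative, and the
property values for the in-progress JDBC and REST catalogs are defined in the
PRs above and may still change.&lt;/p&gt;

```properties
# etc/catalog/iceberg.properties -- example values only
connector.name=iceberg
iceberg.catalog.type=HIVE_METASTORE
hive.metastore.uri=thrift://example.net:9083
```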

&lt;h3 id=&quot;branching-tagging-and-auditing-oh-my&quot;&gt;Branching, tagging, and auditing, oh my!&lt;/h3&gt;

&lt;p&gt;Another feature set that is coming in Iceberg is the ability to use
&lt;a href=&quot;https://github.com/apache/iceberg/pull/5364&quot;&gt;refs to alias your snapshots&lt;/a&gt;.
This would enable branching and tagging behavior similar to git, treating each
snapshot as a commit. This is yet another way to simplify moving between
known states of the data in Iceberg.&lt;/p&gt;

&lt;p&gt;On a related note, branching and tagging will eventually be used in the
&lt;a href=&quot;https://tabular.io/blog/integrated-audits&quot;&gt;audit integration in Iceberg&lt;/a&gt;.
Auditing allows you to push a soft commit by making a snapshot available, but
it is not initially published to the primary table. This is achieved using Spark
and setting the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;spark.wap.id&lt;/code&gt; configuration property. This enables interesting
patterns like the
&lt;a href=&quot;https://www.dremio.com/subsurface/write-audit-publish-pattern-via-apache-iceberg/&quot;&gt;Write-Audit-Publish (WAP) pattern&lt;/a&gt;,
where you first write the data, audit it using a data quality tool like
&lt;a href=&quot;https://greatexpectations.io&quot;&gt;Great Expectations&lt;/a&gt;, and lastly publish the data
to be visible from the main table. Currently, auditing has to use the
cherry-pick operation to publish. This becomes more streamlined with branching
and tagging.&lt;/p&gt;

&lt;h3 id=&quot;the-puffin-file-format&quot;&gt;The Puffin file format&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;https://iceberg.apache.org/puffin-spec&quot;&gt;Puffin file format&lt;/a&gt; is a
companion to file formats like &lt;a href=&quot;https://parquet.apache.org/&quot;&gt;Parquet&lt;/a&gt; and
&lt;a href=&quot;https://orc.apache.org/&quot;&gt;ORC&lt;/a&gt;, rather than a replacement for them. This format stores information such as indexes
and statistics about data managed in an Iceberg table that cannot be stored
directly within the Iceberg manifest. A Puffin file contains arbitrary pieces of
information called “blobs”, along with metadata necessary to interpret them.&lt;/p&gt;

&lt;p&gt;This format &lt;a href=&quot;https://www.mail-archive.com/dev@iceberg.apache.org/msg03593.html&quot;&gt;was proposed&lt;/a&gt;
by long-time Trino maintainer, &lt;a href=&quot;https://github.com/findepi&quot;&gt;Piotr Findeisen @findepi&lt;/a&gt;,
to address a performance issue noted when using Trino on Iceberg. The Puffin
format is a great extension for those using Iceberg tables, as it enables better
query plans in Trino at the file level.&lt;/p&gt;

&lt;h3 id=&quot;pyiceberg&quot;&gt;pyIceberg&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/apache/iceberg/tree/master/python&quot;&gt;pyIceberg library&lt;/a&gt;
is an exciting development that lets users easily read data from Iceberg
tables directly into their own Python code.&lt;/p&gt;

&lt;h3 id=&quot;trino-iceberg-connector-updates&quot;&gt;Trino Iceberg connector updates&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/sql/merge&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt;&lt;/a&gt; (&lt;a href=&quot;https://github.com/trinodb/trino/pull/7933&quot;&gt;PR&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/sql/update&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt;&lt;/a&gt; (&lt;a href=&quot;https://github.com/trinodb/trino/pull/12026&quot;&gt;PR&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/sql/delete&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt;&lt;/a&gt; (&lt;a href=&quot;https://github.com/trinodb/trino/pull/11886&quot;&gt;PR&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Time travel (&lt;a href=&quot;https://github.com/trinodb/trino/pull/10258&quot;&gt;PR&lt;/a&gt;) was initially
released in
&lt;a href=&quot;https://trino.io/docs/current/release/release-385.html#iceberg-connector&quot;&gt;version 385&lt;/a&gt;,
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@&lt;/code&gt; syntax for snapshots/time travel
&lt;a href=&quot;https://github.com/trinodb/trino/pull/10768&quot;&gt;was deprecated&lt;/a&gt; in
&lt;a href=&quot;https://trino.io/docs/current/release/release-387.html#iceberg-connector&quot;&gt;version 387&lt;/a&gt;,
and there were two bug fixes for this feature in versions
&lt;a href=&quot;https://trino.io/docs/current/release/release-386.html#iceberg-connector&quot;&gt;386&lt;/a&gt; and
&lt;a href=&quot;https://trino.io/docs/current/release/release-388.html#iceberg-connector&quot;&gt;388&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/connector/iceberg.html#alter-table-set-properties&quot;&gt;Partition migration&lt;/a&gt;
(&lt;a href=&quot;https://github.com/trinodb/trino/pull/12259&quot;&gt;PR&lt;/a&gt;).
While Trino was already able to read tables with these migrations applied by other query
engines, this feature allows Trino to write these changes as well.&lt;/li&gt;
  &lt;li&gt;The following three features are table maintenance commands.
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/connector/iceberg.html#optimize&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;optimize&lt;/code&gt;&lt;/a&gt;
(&lt;a href=&quot;https://github.com/trinodb/trino/pull/10497&quot;&gt;PR&lt;/a&gt;), which is equivalent to
the Spark SQL
&lt;a href=&quot;https://iceberg.apache.org/docs/latest/spark-procedures/#rewrite_data_files&quot;&gt;rewrite_data_files&lt;/a&gt;.&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/connector/iceberg.html#expire-snapshots&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;expire_snapshots&lt;/code&gt;&lt;/a&gt;
(&lt;a href=&quot;https://github.com/trinodb/trino/pull/10810&quot;&gt;PR&lt;/a&gt;), which uses the same name
as the Spark procedure.&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/connector/iceberg.html#remove-orphan-files&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;remove_orphan_files&lt;/code&gt;&lt;/a&gt;
(&lt;a href=&quot;https://github.com/trinodb/trino/pull/10810&quot;&gt;PR&lt;/a&gt;), which uses the same name
as the Spark procedure.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Iceberg v2 support (&lt;a href=&quot;https://github.com/trinodb/trino/pull/11880&quot;&gt;PR1&lt;/a&gt;, &lt;a href=&quot;https://github.com/trinodb/trino/pull/12351&quot;&gt;PR2&lt;/a&gt;, &lt;a href=&quot;https://github.com/trinodb/trino/pull/12749&quot;&gt;PR3&lt;/a&gt;, &lt;a href=&quot;https://github.com/trinodb/trino/pull/11642&quot;&gt;PR4&lt;/a&gt;, &lt;a href=&quot;https://github.com/trinodb/trino/pull/9881&quot;&gt;PR5&lt;/a&gt;, and many more…)&lt;/li&gt;
&lt;/ul&gt;
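&lt;p&gt;The maintenance commands above are all invoked with the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER TABLE ... EXECUTE&lt;/code&gt;
syntax. A short sketch against a hypothetical table, following the connector
documentation linked above; the retention thresholds are illustrative values:&lt;/p&gt;

```sql
-- Compact small files (Trino's equivalent of Spark's rewrite_data_files)
ALTER TABLE iceberg.logging.logs EXECUTE optimize;

-- Drop snapshots older than the retention threshold
ALTER TABLE iceberg.logging.logs EXECUTE expire_snapshots(retention_threshold => '7d');

-- Delete files no longer referenced by any snapshot
ALTER TABLE iceberg.logging.logs EXECUTE remove_orphan_files(retention_threshold => '7d');
```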

&lt;p&gt;Almost every release has some sort of Iceberg improvement around
&lt;a href=&quot;https://github.com/trinodb/trino/pull/13636&quot;&gt;planning&lt;/a&gt; or
&lt;a href=&quot;https://github.com/trinodb/trino/pull/13395&quot;&gt;pushdown&lt;/a&gt;. If you want all the
latest features and performance improvements described here, it’s important to
keep up with the latest Trino version.&lt;/p&gt;

&lt;h2 id=&quot;pr-13111-scale-table-writers-per-task-based-on-throughput&quot;&gt;PR 13111: Scale table writers per task based on throughput&lt;/h2&gt;

&lt;p&gt;This &lt;a href=&quot;https://github.com/trinodb/trino/pull/13111&quot;&gt;PR of the episode&lt;/a&gt; was
contributed by Gaurav Sehgal (&lt;a href=&quot;https://github.com/gaurav8297&quot;&gt;@gaurav8297&lt;/a&gt;) to
enable Trino to automatically scale writers. The PR scales the number of
writers per task on each worker based on write throughput.&lt;/p&gt;

&lt;p&gt;You can enable this feature by setting &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scale_task_writers&lt;/code&gt; to true in your
configuration. Initial test results show up to a sixfold speed increase.&lt;/p&gt;

&lt;p&gt;Thank you so much to Gaurav and all the reviewers that got this PR through!&lt;/p&gt;

&lt;h2 id=&quot;demo-dml-operations-on-iceberg-using-trino&quot;&gt;Demo: DML operations on Iceberg using Trino&lt;/h2&gt;

&lt;p&gt;For this episode’s demo, we use the same schema as the demo we ran in
&lt;a href=&quot;https://trino.io/episodes/15.html&quot;&gt;episode 15&lt;/a&gt;, and revise the syntax to
include new features.&lt;/p&gt;

&lt;p&gt;Let’s start up a local Trino coordinator and Hive metastore. Clone the
repository and navigate to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg/trino-iceberg-minio&lt;/code&gt; directory. Then
start up the containers using Docker Compose.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-getting-started.git
cd trino-getting-started/iceberg/trino-iceberg-minio
docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now open up your favorite Trino client and connect it to
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;localhost:8080&lt;/code&gt; to run the following commands:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/**
 * Make sure to first create a bucket named &quot;logging&quot; in MinIO before running
 */
CREATE SCHEMA iceberg.logging
WITH (location = &apos;s3a://logging/&apos;);

/**
 * Create table
 */
CREATE TABLE iceberg.logging.logs (
   level varchar NOT NULL,
   event_time timestamp(6) with time zone NOT NULL,
   message varchar NOT NULL,
   call_stack array(varchar)
)
WITH (
   format_version = 2, -- New property to specify Iceberg spec format. Default 2
   format = &apos;ORC&apos;,
   partitioning = ARRAY[&apos;day(event_time)&apos;,&apos;level&apos;]
);

/**
 * Inserting two records. Notice event_time is on the same day but different hours.
 */

INSERT INTO iceberg.logging.logs VALUES
(
  &apos;ERROR&apos;,
  timestamp &apos;2021-04-01 12:23:53.383345&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;,
  &apos;1 message&apos;,
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
),
(
  &apos;ERROR&apos;,
  timestamp &apos;2021-04-01 13:36:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;,
  &apos;2 message&apos;,
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
);

SELECT * FROM iceberg.logging.logs;
SELECT * FROM iceberg.logging.&quot;logs$partitions&quot;;

/**
 * Notice one partition was created for both records at the day granularity.
 */

/**
 * Update the partitioning from daily to hourly 🎉
 */
ALTER TABLE iceberg.logging.logs
SET PROPERTIES partitioning = ARRAY[&apos;hour(event_time)&apos;];

/**
 * Inserting three records. Notice event_time is on the same day but different hours.
 */
INSERT INTO iceberg.logging.logs VALUES
(
  &apos;ERROR&apos;,
  timestamp &apos;2021-04-01 15:55:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;,
  &apos;3 message&apos;,
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
),
(
  &apos;WARN&apos;,
  timestamp &apos;2021-04-01 15:55:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;,
  &apos;4 message&apos;,
  ARRAY [&apos;bad things could be happening&apos;]
),
(
  &apos;WARN&apos;,
  timestamp &apos;2021-04-01 16:55:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;,
  &apos;5 message&apos;,
  ARRAY [&apos;bad things could be happening&apos;]
);

SELECT * FROM iceberg.logging.logs;
SELECT * FROM iceberg.logging.&quot;logs$partitions&quot;;

/**
 * Now there are three partitions:
 * 1) One partition at the day granularity containing our original records.
 * 2) One at the hour granularity for hour 15 containing two new records.
 * 3) One at the hour granularity for hour 16 containing the last new record.
 */

SELECT * FROM iceberg.logging.logs
WHERE event_time &amp;lt; timestamp &apos;2021-04-01 16:55:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;;

/**
 * This query correctly returns 4 records with only the first two partitions
 * being touched. Now let&apos;s check the snapshots.
 */


SELECT * FROM iceberg.logging.logs;

SELECT * FROM iceberg.logging.&quot;logs$snapshots&quot;;

/**
 * Update
 */
UPDATE
  iceberg.logging.logs
SET
  call_stack = call_stack || &apos;WHALE HELLO THERE!&apos;
WHERE
  lower(level) = &apos;warn&apos;;

SELECT * FROM iceberg.logging.logs;

SELECT * FROM iceberg.logging.&quot;logs$snapshots&quot;;

/**
 * Read data from an old snapshot (Time travel)
 *
 * Old way: SELECT * FROM iceberg.logging.&quot;logs@2806470637437034115&quot;;
 */

SELECT * FROM iceberg.logging.logs FOR VERSION AS OF 2806470637437034115;

/**
 * Merge
 */
CREATE TABLE iceberg.logging.src (
   level varchar NOT NULL,
   message varchar NOT NULL,
   call_stack array(varchar)
)
WITH (
   format = &apos;ORC&apos;
);

INSERT INTO iceberg.logging.src VALUES
 (
   &apos;ERROR&apos;,
   &apos;3 message&apos;,
   ARRAY [&apos;This one will not show up because it is an ERROR&apos;]
 ),
 (
   &apos;WARN&apos;,
   &apos;4 message&apos;,
   ARRAY [&apos;This should show up&apos;]
 ),
 (
   &apos;WARN&apos;,
   &apos;5 message&apos;,
   ARRAY [&apos;This should show up as well&apos;]
 );

MERGE INTO iceberg.logging.logs AS t
USING iceberg.logging.src AS s
ON s.message = t.message
WHEN MATCHED AND s.level = &apos;ERROR&apos;
        THEN DELETE
WHEN MATCHED
    THEN UPDATE
        SET message = s.message || &apos;-updated&apos;,
            call_stack = s.call_stack || t.call_stack;

DROP TABLE iceberg.logging.logs;

DROP SCHEMA iceberg.logging;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
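&lt;p&gt;To see why the first two rows landed in one daily partition while the later
inserts split by hour, note that the Iceberg specification defines the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;day&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hour&lt;/code&gt; partition transforms as the
number of days or hours elapsed since the Unix epoch. A small sketch of that
arithmetic, with timestamps simplified to UTC (the demo literals use
America/Los_Angeles):&lt;/p&gt;

```python
# Sketch of Iceberg's day() and hour() partition transforms, which the spec
# defines as days/hours elapsed since the Unix epoch (UTC assumed here).
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def day_transform(ts):
    """Partition value under partitioning = ARRAY['day(event_time)']."""
    return int((ts - EPOCH).total_seconds()) // 86400

def hour_transform(ts):
    """Partition value under partitioning = ARRAY['hour(event_time)']."""
    return int((ts - EPOCH).total_seconds()) // 3600

first = datetime(2021, 4, 1, 12, 23, 53, tzinfo=timezone.utc)
second = datetime(2021, 4, 1, 13, 36, 23, tzinfo=timezone.utc)

# Same day value, so both rows share one daily partition...
same_day = day_transform(first) == day_transform(second)
# ...but different hour values, so hourly partitioning splits them.
same_hour = hour_transform(first) == hour_transform(second)
```

&lt;p&gt;This matches the partitions shown in the demo comments: the original daily
partition stays intact, while rows written after the ALTER TABLE land in hourly
partitions.&lt;/p&gt;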

&lt;p&gt;This is just the tip of the iceberg, showing the powerful &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; statement
and the other features we have added for Iceberg!&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://iceberg.apache.org/&quot;&gt;Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://tabular.io&quot;&gt;Tabular&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://iceberg.apache.org/community&quot;&gt;Iceberg Community&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://iceberg.apache.org/talks&quot;&gt;Iceberg Talks&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://iceberg.apache.org/blogs&quot;&gt;Iceberg Blogs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Blog posts&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;Trino on ice I: A gentle introduction to Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html&quot;&gt;Trino on ice II: In-place table evolution and cloud compatibility with Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/08/12/deep-dive-into-iceberg-internals.html&quot;&gt;Trino on ice IV: Deep dive into Iceberg internals&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check out the in-person and virtual
&lt;a href=&quot;https://www.meetup.com/pro/trino-community/&quot;&gt;Trino Meetup groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Looks like Commander Bun Bun is safe on this Iceberg https://joshdata.me/iceberger.html</summary>

      
      
    </entry>
  
    <entry>
      <title>Make your Trino data pipelines production ready with Great Expectations</title>
      <link href="https://trino.io/blog/2022/08/24/data-pipelines-production-ready-great-expectations.html" rel="alternate" type="text/html" title="Make your Trino data pipelines production ready with Great Expectations" />
      <published>2022-08-24T00:00:00+00:00</published>
      <updated>2022-08-24T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/08/24/data-pipelines-production-ready-great-expectations</id>
      <content type="html" xml:base="https://trino.io/blog/2022/08/24/data-pipelines-production-ready-great-expectations.html">&lt;p&gt;An important aspect of a good data pipeline is ensuring data quality. 
You need to verify that the data is what you’re expecting it to be at any given
state. &lt;a href=&quot;https://greatexpectations.io/&quot;&gt;Great Expectations&lt;/a&gt; is an open source
tool created in Python that allows you to write detailed tests called
&lt;a href=&quot;https://docs.greatexpectations.io/docs/terms/expectation/&quot;&gt;expectations&lt;/a&gt;
against your data. Users write these expectations to run validations against the
data as it enters your system. These expectations are expressed as methods in
Python, and stored in JSON and YAML files. One great advantage of expectations 
is the human readable documentation that results from these tests. As you roll
out different versions of the code, you get alerted to any unexpected changes
and have version-specific generated documentation for what changed. Let’s learn
how to write expectations on tables in Trino!&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;the-need-for-data-quality&quot;&gt;The need for data quality&lt;/h2&gt;

&lt;p&gt;Managing data pipelines is not for the faint of heart. Nodes fail, you run
out of memory, bursty traffic causes abnormal behavior, and that’s just the tip
of the iceberg. Lots of Trino community members build sophisticated
data pipelines and data applications using Trino. Building data pipelines in
Trino became more common with the addition of a
&lt;a href=&quot;/blog/2022/05/05/tardigrade-launch.html&quot;&gt;fault-tolerant execution mode&lt;/a&gt; to
safeguard against failures when executing long-running and 
resource-intensive queries.&lt;/p&gt;

&lt;p&gt;Aside from all the infrastructure problems that concern data teams, another
category of problems that has remained silent for quite some time is
data quality. Faulty data comes in, which can either cause data pipelines to
fail, or go unnoticed and cause inaccurate downstream reporting.
Knowledge is scattered among domain experts, technical experts, and the code and
data itself. Maintenance becomes time-consuming and expensive. Documentation
gets out of date and unreliable. This is why data quality checks using
libraries like Great Expectations are so important when writing ETL applications.&lt;/p&gt;
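&lt;p&gt;Conceptually, an expectation is a named, reusable check that returns a
structured validation result. The following is a toy, standard-library sketch of
that idea only, and deliberately not the Great Expectations API, which provides
methods with similar names on its own validator objects:&lt;/p&gt;

```python
# Toy sketch of the "expectation" idea -- NOT the Great Expectations API.
# Real suites call methods on validator objects and store results as JSON;
# this stdlib version only mirrors the shape of the concept.

def expect_column_values_to_be_between(rows, column, min_value, max_value):
    """Check an integer column and return a small, result-like dict."""
    unexpected = [r[column] for r in rows
                  if r[column] not in range(min_value, max_value + 1)]
    return {"success": not unexpected, "unexpected_values": unexpected}

# Example: hit points in a pokedex row should fall between 1 and 255.
rows = [{"name": "bulbasaur", "hp": 45}, {"name": "glitchmon", "hp": 999}]
result = expect_column_values_to_be_between(rows, "hp", 1, 255)
```

&lt;p&gt;A failing result like this one is exactly what you want surfaced before bad
rows flow into downstream reporting.&lt;/p&gt;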

&lt;h2 id=&quot;improve-data-quality-in-trino-with-great-expectations&quot;&gt;Improve data quality in Trino with Great Expectations&lt;/h2&gt;

&lt;p&gt;As data quality moves to the forefront of the Trino community, the Great
Expectations and Trino communities have partnered to do some events together:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=pcqAOq3O3Ts&amp;amp;list=PLFnr63che7wZij92ynF_egatbsrH7by7T&amp;amp;index=3&quot;&gt;Trino meetup to discuss Great Expectations&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=4SieRmibb0U&quot;&gt;Great Expectations meetup to discuss Trino&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://superconductive.ai/&quot;&gt;Superconductive&lt;/a&gt; joined this year’s mini Trino 
Summit event 
&lt;a href=&quot;https://www.youtube.com/watch?v=kfJ63DNbAuI&amp;amp;list=PLFnr63che7wYDHjUsmp43THLmAlqPDHlM&quot;&gt;Cinco de Trino&lt;/a&gt;
to showcase using 
&lt;a href=&quot;https://www.youtube.com/watch?v=9HE6LawCHP8&amp;amp;list=PLFnr63che7wYDHjUsmp43THLmAlqPDHlM&amp;amp;index=7&quot;&gt;managed solutions for Great Expectations and Trino&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Today, we’re walking through a demo that showcases a scenario with Trino running
as the data lake query engine with multiple phases of data transformation on
some Pokemon data sets. At each phase, we need to validate the schema, row
counts, and various other properties of the data. We use a Trino Hive table
over CSV for the ingest phase, and then move to Iceberg tables for the
structure and consume phases. This is one of the great uses of Trino, in that you
can operate using any of the popular table formats.&lt;/p&gt;

&lt;h2 id=&quot;trino-and-great-expectations-demo&quot;&gt;Trino and Great Expectations demo&lt;/h2&gt;

&lt;p&gt;In this scenario, we’re going to ingest Pokemon pokedex data and Pokemon Go 
spawn location data which lands as raw CSV files in our data lake. We then use
Trino’s Hive catalog to read the data from the landing files, then clean and
optimize that raw data into more performant ORC files in the structure tables.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/data-pipelines-production-ready-great-expectations/trino-ge-lakehouse.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The last step is to join and transform the spawn data and pokedex data into a
single table that is cleaned and ready to be utilized by a data analyst, data
scientist, or other data consumer. Every area of the pipeline where the data is
transformed opens up a liability. The state can go from good to bad when
infrastructure fails or is updated as newer versions of the pipeline roll out.
This is where adding Great Expectations is crucial.&lt;/p&gt;

&lt;p&gt;Now that you have a better understanding of the scenario, feel free to watch the
video, and try running it yourself!&lt;/p&gt;

&lt;iframe src=&quot;https://www.youtube.com/embed/h6UYOilESfQ&quot; width=&quot;800&quot; height=&quot;500&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; 
margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; 
&lt;/iframe&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://github.com/bitsondatadev/trino-datalake/blob/main/tutorials/expecting-greatness-from-trino.md&quot;&gt;Try this Trino demo yourself »&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;While data quality has always been a requirement, the standards for it increase
as the complexity of data lakes increases. It is a necessity that improves the
trust that data consumers have in the data. Dive into the 
&lt;a href=&quot;https://docs.greatexpectations.io/docs/guides/connecting_to_your_data/database/trino/&quot;&gt;Great Expectations documentation&lt;/a&gt;
to learn more about the existing Trino support. If you run into any issues while
running the demo, reach out on &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt; and let us 
know!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen, Brian Zhan</name>
        </author>
      

      <summary>An important aspect of a good data pipeline is ensuring data quality. You need to verify that the data is what you’re expecting it to be at any given state. Great Expectations is an open source tool created in Python that allows you to write detailed tests called expectations against your data. Users write these expectations to run validations against the data as it enters your system. These expectations are expressed as methods in Python, and stored in JSON and YAML files. One great advantage of expectations is the human readable documentation that results from these tests. As you roll out different versions of the code, you get alerted to any unexpected changes and have version-specific generated documentation for what changed. Let’s learn how to write expectations on tables in Trino!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/data-pipelines-production-ready-great-expectations/trino-ge.png" />
      
    </entry>
  
    <entry>
      <title>39: Raft floats on Trino to federate silos</title>
      <link href="https://trino.io/episodes/39.html" rel="alternate" type="text/html" title="39: Raft floats on Trino to federate silos" />
      <published>2022-08-18T00:00:00+00:00</published>
      <updated>2022-08-18T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/39</id>
      <content type="html" xml:base="https://trino.io/episodes/39.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;p&gt;In this episode, we are talking to two engineers from 
&lt;a href=&quot;https://goraft.tech/&quot;&gt;Raft&lt;/a&gt; and discuss how they use Trino to connect data
silos that exist across different departments in various government sectors:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/edwardwmorgan/&quot;&gt;Edward Morgan&lt;/a&gt;, 
Senior Platform Engineer/DevSecOps Manager at Raft&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/steve-morgan-b9bb6642/&quot;&gt;Steve Morgan&lt;/a&gt;, Chief
Data Engineer at Raft&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;register-for-trino-summit-2022&quot;&gt;Register for Trino Summit 2022!&lt;/h2&gt;

&lt;p&gt;Trino Summit 2022 is just around the corner! This will be a hybrid event on
November 10th that will take place in-person at the Commonwealth Club in San 
Francisco, CA and can also be attended remotely!  If you want to present, the 
&lt;a href=&quot;https://sessionize.com/trino-summit-2022/&quot;&gt;call for speakers&lt;/a&gt; is open until
September 15th.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;You can register for the conference&lt;/a&gt;
at any time. We must limit in-person registrations to 250 
attendees, so register soon if you plan on attending in person!&lt;/p&gt;

&lt;h2 id=&quot;releases-392-to-393&quot;&gt;Releases 392 to 393&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-392.html&quot;&gt;Trino 392&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for dynamic filtering with fault-tolerant query execution.&lt;/li&gt;
  &lt;li&gt;Support for correlated subqueries in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; queries.&lt;/li&gt;
  &lt;li&gt;Support for Amazon S3 Select pushdown for JSON files.&lt;/li&gt;
  &lt;li&gt;Support for Avro format in Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Faster queries when filtering by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;__time&lt;/code&gt; column in Druid.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-393.html&quot;&gt;Trino 393&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Improved performance of highly selective &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt; queries.&lt;/li&gt;
  &lt;li&gt;Experimental docker image for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ppc64le&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Dynamic filtering support for various connectors.&lt;/li&gt;
  &lt;li&gt;Support for JSON and bytes type in Pinot.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional highlights worth a mention according to Manfred:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Lots of other improvements on Delta Lake, Hive, and Iceberg connectors.&lt;/li&gt;
  &lt;li&gt;Merge support in a bunch of connectors.&lt;/li&gt;
&lt;li&gt;OAuth 2.0 refresh token fixes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-392.html&quot;&gt;Trino 392&lt;/a&gt;,
and
&lt;a href=&quot;https://trino.io/docs/current/release/release-393.html&quot;&gt;Trino 393&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-trino-at-raft&quot;&gt;Concept of the episode: Trino at Raft&lt;/h2&gt;

&lt;p&gt;Raft provides consulting services and is particularly skilled at DevSecOps. One
particular challenge they face is dealing with fragmented government
infrastructure. In this episode, we dive in to learn how Trino enables Raft to
supply government sector clients with a data fabric solution. Raft takes a
special stance on using and contributing to open source solutions that run well
on the cloud.&lt;/p&gt;

&lt;h3 id=&quot;intro-to-software-factories&quot;&gt;Intro to software factories&lt;/h3&gt;

&lt;blockquote&gt;
  &lt;p&gt;A “software factory” is an organized approach to software development that
provides software design and development teams a repeatable, well-defined path
to create and update software. It results in a robust, compliant, and more
resilient process for delivering applications to production.
– &lt;a href=&quot;https://tanzu.vmware.com/software-factory&quot;&gt;VMWare&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is a push against previous attempts by larger government contractors
who tried to build one-size-fits-all solutions that ultimately failed. The new
wave of government solutions relies on methodologies similar to those of the
software industry, with additional rules and standards around the technologies
they can adopt in the stack.&lt;/p&gt;

&lt;p&gt;Software factories are now common practice for government agencies, as they
can take standardized software stacks that go through rigorous validation to
make sure they meet government standards. One important requirement for these
stacks is that they can be deployed in virtually any environment. A common way
to achieve this is using Kubernetes and containers.&lt;/p&gt;

&lt;h3 id=&quot;standards-and-anatomy-of-a-stack&quot;&gt;Standards and anatomy of a stack&lt;/h3&gt;

&lt;p&gt;With the movement towards standardization, government contractors generally
build their stack using Kubernetes templates. Kubernetes underpins each of these
stacks, while telemetry, monitoring, and policy agents are layered on top.
Raft wanted to provide a “single pane of glass” over the existing
fragmented systems that the Department of Defense (DoD) operates on, so they
began to develop a stack that included Trino as their method to connect data
across various silos.&lt;/p&gt;

&lt;h3 id=&quot;data-fabric-at-raft&quot;&gt;Data Fabric at Raft&lt;/h3&gt;

&lt;p&gt;Data Fabric is an attempt to give government agencies the ability to set up
a data mesh that is backed by Trino. Trino fits well in this narrative, as it
provides SQL-over-everything: data analysts and data scientists only need to
know SQL.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Data Fabric MVP is an end-to-end DataOps capability that can be deployed at the
edge, in the cloud, and in disconnected environments within minutes. It provides
a single control plane for normalizing and combining disparate data lakes, 
platforms, silos, and formats into SQL using Trino for batch data and Apache 
Pinot for user facing streaming analytics.&lt;/p&gt;

  &lt;p&gt;Data Fabric is driven by cloud native policy using Open Policy Agent (OPA) 
integrated with Trino and Kafka to provide row and column level obfuscation. It
provides enterprise data catalog to view data lineage, properties, and data
owners from multiple data platforms. – &lt;a href=&quot;https://datafabric.goraft.tech/&quot;&gt;Raft&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3 id=&quot;security-concerns-around-trino&quot;&gt;Security concerns around Trino&lt;/h3&gt;

&lt;p&gt;A common first question the Raft team gets asked is whether Trino is a high
security concern. The idea that Trino can connect to multiple data sources from
one location raises the fear that individuals may gain access to information at
a higher classification level than they are cleared for. The team has to educate
users on best practices to ensure this problem doesn’t occur: you need a
separate deployment of Data Fabric for each classification level, along with
correctly defined policies in OPA that restrict visibility to information above
a user’s clearance.&lt;/p&gt;

&lt;h3 id=&quot;iron-bank-container-repository&quot;&gt;Iron Bank container repository&lt;/h3&gt;

&lt;p&gt;Iron Bank is a central repository of digitally-signed container images, 
including open-source and commercial off-the-shelf software, hardened to the 
DoD’s exacting specifications. Approved containers in Iron Bank have DoD-wide 
reciprocity across all classifications, accelerating the security approval 
process from months or even years down to weeks.&lt;/p&gt;

&lt;p&gt;To be considered for inclusion into Iron Bank, container images must meet
rigorous DoD software security standards. It is an extensive, continuous,
complicated effort for even the most sophisticated IT teams. Continuously
maintaining and managing hardening pipelines while incorporating evolving DoD
specifications and addressing new vulnerabilities (CVEs) can severely stretch
your resources, even if you have advanced tooling and experience in-house. 
(&lt;a href=&quot;https://oteemo.com/accelerate-to-iron-bank/&quot;&gt;Source&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;The Trino Docker image 
&lt;a href=&quot;https://repo1.dso.mil/dsop?filter=trino&quot;&gt;is available in Iron Bank&lt;/a&gt; and is
maintained by folks at &lt;a href=&quot;https://www.boozallen.com/&quot;&gt;Booz Allen Hamilton&lt;/a&gt;. Their
hard work makes it possible for Trino to be deployed in DoD environments.&lt;/p&gt;

&lt;h2 id=&quot;pull-requests-of-the-episode-pr-13354-add-s3-select-pushdown-for-json-files&quot;&gt;Pull request of the episode: PR 13354: Add S3 Select pushdown for JSON files&lt;/h2&gt;

&lt;p&gt;This &lt;a href=&quot;https://github.com/trinodb/trino/pull/13354&quot;&gt;PR of the episode&lt;/a&gt; was 
contributed by &lt;a href=&quot;https://github.com/preethiratnam&quot;&gt;preethiratnam&lt;/a&gt;. This pull
request enables S3 Select pushdown during a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; operation on JSON files. The 
pushdown logic is restricted to root JSON fields only, similar to CSV. S3 Select
does support nested column filtering on JSON files, but to limit the scope of
this change, that support is planned for a later PR.&lt;/p&gt;

&lt;p&gt;It’s already expensive enough to query JSON files, as you pay a hefty penalty
for deserialization. This at least filters out a lot of rows. Thanks to 
&lt;a href=&quot;https://github.com/arhimondr&quot;&gt;Andrii Rosa &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;arhimondr&lt;/code&gt;&lt;/a&gt; for the review.&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-running-great-expectations-on-a-trino-data-lakehouse-tutorial&quot;&gt;Demo of the episode: Running Great Expectations on a Trino Data Lakehouse Tutorial&lt;/h2&gt;

&lt;p&gt;For this episode’s demo, you’ll need a local Trino coordinator, MinIO instance, 
Hive metastore, and an edge node where various data libraries like Great
Expectations can run. Clone the 
&lt;a href=&quot;https://github.com/bitsondatadev/trino-datalake&quot;&gt;trino-datalake&lt;/a&gt; 
repository and navigate to the root directory in your CLI. Then 
start up the containers using Docker Compose.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-datalake.git

cd trino-datalake

docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The rest of the demo is available in 
&lt;a href=&quot;https://github.com/bitsondatadev/trino-datalake/blob/main/tutorials/expecting-greatness-from-trino.md&quot;&gt;this markdown tutorial&lt;/a&gt;
and is covered in the video demo below.&lt;/p&gt;

&lt;div class=&quot;youtube-video-container&quot;&gt;
  &lt;iframe width=&quot;702&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/h6UYOilESfQ&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;question-of-the-episode-how-can-i-deploy-trino-on-kubernetes-without-using-helm-charts&quot;&gt;Question of the episode: How can I deploy Trino on Kubernetes without using Helm charts?&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trinodb.slack.com/archives/C0305TQ05KL/p1660685654979289&quot;&gt;Full question from Trino Slack&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This user was not able to use Helm due to a restriction at their company. They
needed the raw Kubernetes YAML files to deploy Trino.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Answer:&lt;/em&gt; While Helm offers very nice ways to deploy directly to
a service that understands Helm charts, you can also use Helm on your machine to
generate all the Kubernetes YAML configuration files. This can be done using the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;helm template&lt;/code&gt; command. See more on this in the 
&lt;a href=&quot;https://trino.io/episodes/31.html&quot;&gt;Trinetes episode&lt;/a&gt; that details this command.&lt;/p&gt;
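
&lt;p&gt;As a rough sketch, assuming the Trino community Helm chart from the
&lt;a href=&quot;https://github.com/trinodb/charts&quot;&gt;trinodb/charts&lt;/a&gt; repository, rendering the
raw manifests locally could look like the following. The release name
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;my-trino&lt;/code&gt; and the output directory are placeholders:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;helm repo add trino https://trinodb.github.io/charts

helm template my-trino trino/trino --output-dir ./manifests

kubectl apply --recursive -f ./manifests
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can also inspect or version-control the generated files before applying
them, which is often exactly what restricted environments require.&lt;/p&gt;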

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://datafabric.goraft.tech/&quot;&gt;Raft Data Fabric&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/raft_tech&quot;&gt;Raft Twitter&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/company/raft-tech/&quot;&gt;Raft LinkedIn&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://boards.greenhouse.io/raft&quot;&gt;Raft Jobs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Blogs&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://goraft.tech/2022/08/15/trino-sql-everything.html&quot;&gt;Trino - SQL to rule them all&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.airforce-technology.com/news/raft-wins-usaf-sbir-phase-iii-contract/&quot;&gt;Raft wins USAF SBIR Phase III contract for data centralisation services&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://goraft.tech/blog/&quot;&gt;Raft Blog&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check out the in-person and virtual
&lt;a href=&quot;https://www.meetup.com/pro/trino-community/&quot;&gt;Trino Meetup groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>Happy tenth birthday Trino!</title>
      <link href="https://trino.io/blog/2022/08/08/trino-tenth-birthday.html" rel="alternate" type="text/html" title="Happy tenth birthday Trino!" />
      <published>2022-08-08T00:00:00+00:00</published>
      <updated>2022-08-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/08/08/trino-tenth-birthday</id>
      <content type="html" xml:base="https://trino.io/blog/2022/08/08/trino-tenth-birthday.html">&lt;p&gt;It’s inspiring and mindblowing to reflect on the ten year journey that has
produced the community around Trino. Trino is the community-driven fork from
Presto, the distributed big data SQL query engine created at Facebook in 2012. We
are a community of engineers, scientists, analysts, and visionaries that work in
a fast paced world where the expectations on the time to insights from our
analytics and the scale of the data are ever-increasing. Sometimes words only do
so much justice to encompass a journey like this one, so we created a video to
let you experience it yourself! Enjoy!&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;trinos-first-ten-years-video&quot;&gt;Trino’s first ten years video&lt;/h1&gt;

&lt;iframe src=&quot;https://www.youtube.com/embed/hPD95_-bZZw&quot; width=&quot;800&quot; height=&quot;500&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; 
margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; 
&lt;/iframe&gt;

&lt;p&gt;As you watch the video and think back to the five years Presto and Trino shared,
you begin to appreciate the organic development of the community, and the
excitement around the solution space that the project brought to big data. As a
baseline, Trino offers a faster and more interactive alternative to accessing
data stored in HDFS via Hive. But the project didn’t stop there. Development of
the SPI abstracted metadata and storage access to different
systems, making Trino a suitable engine to query an entire data ecosystem from
one location using ANSI SQL! Since the projects split, development on Trino has
skyrocketed beyond the original project, adding an array of features that
we’ve listed in the &lt;a href=&quot;/blog/2022/08/04/decade-innovation.html&quot;&gt;evolution of the Trino architecture blog post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-tenth-birthday/trajectory.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;To really celebrate this milestone, we wanted to offer some exciting ways for
you to learn more about Trino, and spin up Trino on your own system to play
around with it. We have a list of blogs, project stats, and ways to get involved
below. Starburst is also celebrating by offering free Trino birthday t-shirts
when you 
&lt;a href=&quot;https://www.starburst.io/sweepstakes/?utm_campaign=space-quest&quot;&gt;complete their Space Quest League mission&lt;/a&gt;.
Also don’t forget to attend 
&lt;a href=&quot;/blog/2022/06/30/trino-summit-call-for-speakers.html&quot;&gt;our annual Trino Summit in November&lt;/a&gt;!&lt;/p&gt;

&lt;h1 id=&quot;learn-more-about-trino&quot;&gt;Learn more about Trino&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/p/a5a1088d3114&quot;&gt;Intro to Trino for the Trinewbie&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2022/08/02/leaving-facebook-meta-best-for-trino.html&quot;&gt;Why leaving Facebook/Meta was the best thing we could do for the Trino Community&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2022/08/04/decade-innovation.html&quot;&gt;A decade of query engine innovation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/12/27/announcing-trino.html&quot;&gt;We’re rebranding PrestoSQL to Trino&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/01/01/2019-summary.html&quot;&gt;Summary of features in 2019&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/01/08/2020-review.html&quot;&gt;Summary of features in 2020&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/12/31/trino-2021-a-year-of-growth.html&quot;&gt;Summary of features in 2021&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;getting-started-with-trino&quot;&gt;Getting started with Trino&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/10/20/intro-to-hive-connector.html&quot;&gt;A gentle introduction to the Hive connector&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;Trino on ice I: A gentle introduction to Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2019/07/04/cbo-introduction.html&quot;&gt;Introduction to the Trino cost-based optimizer&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/bitsondatadev/trino-getting-started&quot;&gt;Trino getting started repository&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;community-statistics&quot;&gt;Community statistics&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;28250+ commits 💻 in GitHub&lt;/li&gt;
  &lt;li&gt;5750+ stargazers ⭐ in GitHub&lt;/li&gt;
  &lt;li&gt;7350+ members 👋 in Slack&lt;/li&gt;
  &lt;li&gt;6950+ pull requests merged ✅ in GitHub&lt;/li&gt;
  &lt;li&gt;4000+ issues 📝 created in GitHub&lt;/li&gt;
  &lt;li&gt;3750+ followers 🐦 on Twitter&lt;/li&gt;
  &lt;li&gt;650+ average weekly members 💬 in Slack&lt;/li&gt;
  &lt;li&gt;1050+ subscribers 📺 in YouTube&lt;/li&gt;
  &lt;li&gt;38 Trino Community Broadcast ▶️ episodes&lt;/li&gt;
  &lt;li&gt;264 Presto + Trino 🚀 releases (not including PrestoDB releases since the 
fork)&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;join-our-community&quot;&gt;Join our community&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;Join the &lt;a href=&quot;/slack.html&quot;&gt;Trino Slack workspace&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Watch the &lt;a href=&quot;/broadcast/&quot;&gt;Trino Community Broadcast&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Subscribe to the &lt;a href=&quot;https://www.youtube.com/c/trinodb&quot;&gt;Trino YouTube channel&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Follow us on the &lt;a href=&quot;https://twitter.com/trinodb&quot;&gt;trinodb Twitter account&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Give us a star on the &lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;Trino GitHub repository&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Follow us on the &lt;a href=&quot;https://www.linkedin.com/company/trino-software-foundation&quot;&gt;Trino LinkedIn account&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;trino-summit-2022&quot;&gt;Trino Summit 2022&lt;/h1&gt;

&lt;p&gt;We hope you all join us in celebrating Trino’s birthday today. If you want to 
learn even more, 
&lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;sign up for our hybrid event, Trino Summit, on the 10th of November 2022&lt;/a&gt;.
If you have a talk you’d like to give around Trino, the 
&lt;a href=&quot;https://www.starburst.io/info/trinosummit/#sponsors&quot;&gt;call for speakers&lt;/a&gt; is open
until September 15th.&lt;/p&gt;

&lt;p&gt;Join our community. We look forward to having you!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen, Martin Traverso, Dain Sundstrom, David Phillips, Eric Hwang</name>
        </author>
      

      <summary>It’s inspiring and mindblowing to reflect on the ten year journey that has produced the community around Trino. Trino is the community-driven fork from Presto, the distributed big data SQL query engine created at Facebook in 2012. We are a community of engineers, scientists, analysts, and visionaries that work in a fast paced world where the expectations on the time to insights from our analytics and the scale of the data are ever-increasing. Sometimes words only do so much justice to encompass a journey like this one, so we created a video to let you experience it yourself! Enjoy!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/trino-tenth-birthday/how-it-started-going.png" />
      
    </entry>
  
    <entry>
      <title>A decade of query engine innovation</title>
      <link href="https://trino.io/blog/2022/08/04/decade-innovation.html" rel="alternate" type="text/html" title="A decade of query engine innovation" />
      <published>2022-08-04T00:00:00+00:00</published>
      <updated>2022-08-04T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/08/04/decade-innovation</id>
      <content type="html" xml:base="https://trino.io/blog/2022/08/04/decade-innovation.html">&lt;p&gt;It’s amazing how far we have come! Our massively-parallel processing SQL query
engine, Trino, has really grown up. We have moved beyond just querying object
stores using Hive, beyond just one company using the project, beyond usage in
Silicon Valley, beyond simple SQL &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; statements, and definitely also
beyond our expectations. Let’s have a look at some of the great technical and
architectural changes the project underwent, and how we all benefit from the
&lt;a href=&quot;/blog/2022/08/02/leaving-facebook-meta-best-for-trino.html&quot;&gt;commitment to quality, openness and collaboration&lt;/a&gt;.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;runtime-and-deployment&quot;&gt;Runtime and deployment&lt;/h2&gt;

&lt;p&gt;Starting with how you even install and run Trino, numerous changes came about
in the last decade. We moved from Java 7 to Java 8, then to Java 11, and &lt;a href=&quot;/blog/2022/07/14/trino-updates-to-java-17.html&quot;&gt;only
recently to the latest supported Java LTS release - Java 17&lt;/a&gt;. Each time we
benefited from the innovations in runtime performance as well as the
improved Java language features. With &lt;strong&gt;Java 17&lt;/strong&gt;, we are just starting to take
advantage of many of these improvements.&lt;/p&gt;

&lt;p&gt;When it comes to actually &lt;a href=&quot;https://trino.io/episodes/35.html&quot;&gt;running and deploying
Trino&lt;/a&gt;, the &lt;strong&gt;tarball&lt;/strong&gt; is still a good choice
for simple installation and as a base for other packages. Over time we added
&lt;strong&gt;RPM&lt;/strong&gt; archive support, which is being replaced more and more by Docker
&lt;strong&gt;containers&lt;/strong&gt;. The container images also enable modern deployment on Kubernetes
with &lt;a href=&quot;https://github.com/trinodb/charts&quot;&gt;our Helm chart&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And let us add one last note about deployments. Trino was always designed to
work on large servers. However, the actual growth over a decade in the real world
has been amazing to see. Machine sizes keep growing to hundreds of CPU cores and
closer to a terabyte of memory, and these truly large machines are now running
as clusters with many workers of that size. More and more of these
deployments take advantage of our added support for the &lt;strong&gt;ARM processor
architecture&lt;/strong&gt; and the increasing availability of suitable servers from the
cloud providers.&lt;/p&gt;

&lt;h2 id=&quot;security&quot;&gt;Security&lt;/h2&gt;

&lt;p&gt;What is security, authentication, authorization? None of this
existed in the first releases of Trino. Two years after launch we added the first
simple authentication and authorization support. The days when Kerberos
was critical and you needed to use the Java KeyStore in most deployments are
long gone. The wide adoption of Trino led to improvements such as support for
&lt;a href=&quot;https://trino.io/docs/current/security/internal-communication.html&quot;&gt;automatic certificate creation and TLS for internal
communication&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/security/secrets.html&quot;&gt;secret injection from environment
variables&lt;/a&gt;, and the many
&lt;a href=&quot;https://trino.io/docs/current/security/authentication-types.html&quot;&gt;authentication
types&lt;/a&gt;
starting with LDAP and password file, to the modern OAuth 2.0 and SSO systems.
Trino supports fine-grained access control and &lt;a href=&quot;https://trino.io/docs/current/language/sql-support.html#security-operations&quot;&gt;security management SQL commands
like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GRANT&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REVOKE&lt;/code&gt;&lt;/a&gt;.
You can secure connections from client tools, and use numerous methods to ensure
secured access to your data sources.&lt;/p&gt;

&lt;h2 id=&quot;client-tools-and-integrations&quot;&gt;Client tools and integrations&lt;/h2&gt;

&lt;p&gt;In the very beginning, all you could do was submit a query to the &lt;a href=&quot;https://trino.io/docs/current/develop/client-protocol.html&quot;&gt;client REST
API&lt;/a&gt;. Very quickly
we added the &lt;a href=&quot;https://trino.io/docs/current/installation/cli.html&quot;&gt;Trino CLI&lt;/a&gt;
and the &lt;a href=&quot;https://trino.io/docs/current/installation/jdbc.html&quot;&gt;JDBC driver&lt;/a&gt;. And
while it has continued to be widely used in the community, and gathered great
features such as command-completion and history, different output formats, and
much more, the Trino CLI is not the only tool anymore. The JDBC driver, the
&lt;a href=&quot;https://github.com/trinodb/trino-python-client&quot;&gt;Python client&lt;/a&gt;, the &lt;a href=&quot;https://github.com/trinodb/trino-go-client&quot;&gt;Go
client&lt;/a&gt;, and the ODBC driver from
&lt;a href=&quot;https://starburst.io/&quot;&gt;Starburst&lt;/a&gt;, all expanded the support for different
client tools. You can query Trino in your Java-based IDE, such as IntelliJ
IDEA, or database tool, such as &lt;a href=&quot;https://dbeaver.io/&quot;&gt;DBeaver&lt;/a&gt; or
&lt;a href=&quot;https://www.metabase.com/&quot;&gt;Metabase&lt;/a&gt;. You can take advantage of visualizations
in &lt;a href=&quot;https://superset.apache.org/&quot;&gt;Apache Superset&lt;/a&gt;, or automate with &lt;a href=&quot;https://airflow.apache.org/&quot;&gt;Apache
Airflow&lt;/a&gt;, &lt;a href=&quot;https://www.getdbt.com/&quot;&gt;dbt&lt;/a&gt;, or
&lt;a href=&quot;https://flink.apache.org/&quot;&gt;Apache Flink&lt;/a&gt;. And many commercial tools such as
&lt;a href=&quot;https://www.tableau.com/&quot;&gt;Tableau&lt;/a&gt;, &lt;a href=&quot;https://www.looker.com/&quot;&gt;Looker&lt;/a&gt;,
&lt;a href=&quot;https://powerbi.microsoft.com/&quot;&gt;PowerBI&lt;/a&gt;, or
&lt;a href=&quot;https://www.thoughtspot.com/&quot;&gt;ThoughtSpot&lt;/a&gt; also proudly support Trino users.&lt;/p&gt;

&lt;h2 id=&quot;sql&quot;&gt;SQL&lt;/h2&gt;

&lt;p&gt;All the client tools and integrations rely on the rich SQL support of Trino,
which has grown tremendously. Purely analytics-related support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; and
all its complexities was not enough. Trino gained support for data management to
create schemas and tables, but also views and materialized views. And with that
&lt;a href=&quot;https://trino.io/docs/current/language/sql-support.html#write-operations&quot;&gt;write support we needed &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt;, and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt;&lt;/a&gt;.
That’s all done and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; is next. But the core language features were not
able to satisfy the needs of our users. We added functions for a large variety
of topics ranging from simple string and &lt;a href=&quot;https://trino.io/docs/current/functions/datetime.html&quot;&gt;date
functions&lt;/a&gt; to &lt;a href=&quot;https://trino.io/docs/current/functions/json.html&quot;&gt;JSON
support&lt;/a&gt;, &lt;a href=&quot;https://trino.io/docs/current/functions/geospatial.html&quot;&gt;geospatial
functions&lt;/a&gt;, and many
others.&lt;/p&gt;

&lt;p&gt;From the core language perspective we added newer SQL functionality, such as
&lt;a href=&quot;/blog/2021/05/19/row_pattern_matching.html&quot;&gt;window functions and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; support&lt;/a&gt;. Currently we are on a journey to implement
&lt;a href=&quot;/blog/2022/07/22/polymorphic-table-functions.html&quot;&gt;support for table functions, including polymorphic table functions&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;connectors-and-data-sources&quot;&gt;Connectors and data sources&lt;/h2&gt;

&lt;p&gt;When it comes to the new SQL language features, there are two categories. There
are generic functions and statements that build on top of commonly used
functionality like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt;. These typically work with any connector and therefore
any data source. And then there are SQL language features that need support in
a connector. After all, inserting data into PostgreSQL and into an object storage
system are very different operations. Our community has been hard at work however, and
numerous connectors have gone way beyond simple read-only access.&lt;/p&gt;

&lt;p&gt;Looking at the number of available connectors, innovation has been tremendous.
The original Hive connector with support for HDFS and a Hive Metastore Service,
became a powerhouse of features. Support for object storage systems including
Amazon S3 and compatible systems, Azure Data Lake Storage, and Google Cloud
Storage, was supplemented by support for Amazon Glue as metastore. We also
constantly added support for different file formats in these systems, and
improved performance for ORC, Parquet, Avro, and others.&lt;/p&gt;

&lt;p&gt;The initial idea to support other data sources led to connectors for over a
dozen other databases, including relational systems such as
&lt;a href=&quot;https://www.postgresql.org/&quot;&gt;PostgreSQL&lt;/a&gt;,
&lt;a href=&quot;https://www.oracle.com/database/&quot;&gt;Oracle&lt;/a&gt;, &lt;a href=&quot;https://www.microsoft.com/en-us/sql-server&quot;&gt;SQL
Server&lt;/a&gt;, and many others. We also
gained support for &lt;a href=&quot;https://www.elastic.co/elasticsearch/&quot;&gt;Elasticsearch&lt;/a&gt; and
&lt;a href=&quot;https://www.opensearch.org/&quot;&gt;OpenSearch&lt;/a&gt;, &lt;a href=&quot;https://www.mongodb.com/&quot;&gt;MongoDB&lt;/a&gt;,
&lt;a href=&quot;https://kafka.apache.org/&quot;&gt;Apache Kafka&lt;/a&gt;, and other systems that traditionally
are not available to query with SQL. Trino unlocks completely new use cases for
these systems.&lt;/p&gt;

&lt;p&gt;The wide range of supported systems includes traditional data lakes and data
warehouses. With the emerging new table formats and the related Trino
connectors, our project is a powerful tool to run your lakehouse system. &lt;a href=&quot;https://delta.io/&quot;&gt;Delta
Lake&lt;/a&gt; and &lt;a href=&quot;https://iceberg.apache.org/&quot;&gt;Apache Iceberg&lt;/a&gt;
connectors are already capable of full read and write operations and include
numerous other features. An &lt;a href=&quot;https://hudi.apache.org/&quot;&gt;Apache Hudi&lt;/a&gt; connector is
in the works and coming soon.&lt;/p&gt;

&lt;p&gt;We also have robust and widely used connectors for real-time analytics systems
like &lt;a href=&quot;https://pinot.apache.org/&quot;&gt;Apache Pinot&lt;/a&gt;, &lt;a href=&quot;https://druid.apache.org/&quot;&gt;Apache
Druid&lt;/a&gt; and &lt;a href=&quot;https://clickhouse.com/&quot;&gt;ClickHouse&lt;/a&gt;,
that are constantly improved by the community.&lt;/p&gt;

&lt;h2 id=&quot;query-processing-and-performance&quot;&gt;Query processing and performance&lt;/h2&gt;

&lt;p&gt;Last but not least, these queries also need to be processed. From the start, high
efficiency and low latency were core design goals, and with features like
native compilation the resulting performance surpassed other systems. Over the
years our query analyzer and planner were supplemented by more and more
sophisticated algorithms and features. Connectors learned to retrieve and manage
table statistics, the optimizer was created and morphed into a &lt;a href=&quot;/blog/2019/07/04/cbo-introduction.html&quot;&gt;cost-based
optimizer&lt;/a&gt;, and we added further
improvements that benefit query processing performance. We added dynamic
filtering, &lt;a href=&quot;/blog/2020/06/14/dynamic-partition-pruning.html&quot;&gt;dynamic partition pruning&lt;/a&gt;, predicate pushdown, join pushdown,
aggregate function pushdown and numerous others. Each of these improvements was
also finely tuned, and runs in production with huge workloads providing us more
data on how to improve next.&lt;/p&gt;

&lt;p&gt;One large pivot we recently added was &lt;a href=&quot;/blog/2022/05/05/tardigrade-launch.html&quot;&gt;fault-tolerant query
execution mode&lt;/a&gt;. Query execution
can survive cluster node failures when this feature is enabled. Parts of the
execution can be retried and query processing can proceed. Trino is moving on
from being the best analytics engine to being the best query engine for many more use
cases!&lt;/p&gt;

&lt;h2 id=&quot;looking-forward&quot;&gt;Looking forward&lt;/h2&gt;

&lt;p&gt;As you can see, there is a lot to look back on and celebrate. But while we are
definitely proud of our successes working with the community, we have no time to rest.
There are many more improvements we are working on. Just to tease you a bit, let
us mention that there will be more polymorphic table functions, new
lakehouse connectors and features, more client tools, and maybe even dynamic
configuration of the cluster.&lt;/p&gt;

&lt;p&gt;What would you like to add? Join us to celebrate and innovate towards your
favorite features. And who knows, we might see you at the &lt;a href=&quot;/blog/2022/06/30/trino-summit-call-for-speakers.html&quot;&gt;Trino Summit&lt;/a&gt; in November, or in a
future episode of the &lt;a href=&quot;/broadcast/index.html&quot;&gt;Trino Community Broadcast&lt;/a&gt;.&lt;/p&gt;

      
        <author>
          <name>Manfred Moser, Martin Traverso, Dain Sundstrom, David Phillips</name>
        </author>
      

      <summary>It’s amazing how far we have come! Our massively-parallel processing SQL query engine, Trino, has really grown up. We have moved beyond just querying object stores using Hive, beyond just one company using the project, beyond usage in Silicon Valley, beyond simple SQL SELECT statements, and definitely also beyond our expectations. Let’s have a look at some of the great technical and architectural changes the project underwent, and how we all benefit from the commitment to quality, openness and collaboration.</summary>

      
      
    </entry>
  
    <entry>
      <title>Why leaving Facebook/Meta was the best thing we could do for the Trino Community</title>
      <link href="https://trino.io/blog/2022/08/02/leaving-facebook-meta-best-for-trino.html" rel="alternate" type="text/html" title="Why leaving Facebook/Meta was the best thing we could do for the Trino Community" />
      <published>2022-08-02T00:00:00+00:00</published>
      <updated>2022-08-02T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/08/02/leaving-facebook-meta-best-for-trino</id>
      <content type="html" xml:base="https://trino.io/blog/2022/08/02/leaving-facebook-meta-best-for-trino.html">&lt;p&gt;It might surprise some that our departure from Facebook was one of the simplest 
decisions we’ve ever made. Many posts that discuss leaving a FAANG company focus
on leaving some grand sum of money or prestige of working at the company. For 
us, we were leaving the company where we had launched a project that we knew 
would quickly outgrow the walls of Facebook, and solve a much larger set of 
problems in the analytics domain. At the time we didn’t quite anticipate that 
Presto, a distributed SQL query engine for big data analytics, would be adopted 
around the globe by thousands of companies and an overwhelming number of 
industries. We appreciate Facebook for serving as the launchpad that inspired 
others to adopt Presto. Despite the harmonious beginnings, once the needs of the
community and Facebook no longer aligned, we had to leave, but we’ll get to that
part shortly.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/leaving-facebook-meta-best-for-trino/original-gang.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;people-make-up-communities-not-companies&quot;&gt;People make up communities, not companies&lt;/h2&gt;

&lt;p&gt;When we created Presto, it was clear to us that it needed to be open source.
Presto started in 2012, just before the Facebook IPO. The culture was very
conducive to starting an open source project. At that time, Facebook was working
on Open Compute which ended up disrupting the hardware industry, and we wanted
to achieve a similar impact for the analytics industry with Presto. We lobbied for and
gained approval from the VP of Infrastructure, Jay Parikh, and released 
&lt;a href=&quot;https://web.archive.org/web/20220203224702/https://www.computerworld.com/article/2485668/facebook-goes-open-source-with-query-engine-for-big-data.html&quot;&gt;Presto as an open source project&lt;/a&gt;. It’s something that we wanted to
do from the beginning, because we had worked with open source projects and 
believed that the most successful projects are open source.&lt;/p&gt;

&lt;p&gt;Getting other people and companies involved makes for a healthier project. You
end up not just building something that satisfies your needs, but needs from
everyone else, and in turn, you benefit. We reached out personally to
people from companies like Airbnb, Dropbox, Netflix, and LinkedIn to get them
involved because we wanted to bootstrap a real community. Five people at
Facebook hacking away was not enough. We actually had these companies beta test
Presto, so that when we launched, the problems that they had found were fixed.&lt;/p&gt;

&lt;p&gt;It’s important to understand why that’s beneficial to really grasp our
philosophy behind open source. In reality, when we say we’re getting more
companies involved, that’s true, but more importantly, we’re getting people
involved. Individuals in the tech space are interested in solving technology
problems. Companies are interested in solving problems that benefit their board,
investors, and their customers. It’s incredibly common to see an overlap in the
problems that engineers, analysts, and scientists are interested in solving with
the problems that companies need to solve, but it’s never guaranteed.&lt;/p&gt;

&lt;p&gt;Moreover, the interest of a company is very susceptible to change from company
growth, IPOs, acquisitions, directional pivots, and general political and
cultural changes. As people start to put their time and energy into a project,
their own identity starts to blend with the success of the project. This is much
less the case with corporations. Since corporations include many people, it
only takes a small set of people in the right positions to decide that a project
is no longer aligned with the direction or goals of the company.&lt;/p&gt;

&lt;p&gt;Those of us in the Trino Software Foundation believe that 
&lt;a href=&quot;https://venturebeat.com/2021/08/27/who-owns-open-source-projects-people-or-companies/&quot;&gt;individuals that work on Trino actually make up the community&lt;/a&gt; and not the companies who so graciously allow their employees to
contribute. We view our community as visionaries that want to solve problems and
build systems that last for decades into the future. We don’t allow near-sighted
decisions that may affect the quality of the system, or that may diminish the
value of the application to the greater problem space. Most people do not want
to work on something for years, and then have the company change direction and
throw away all their work.&lt;/p&gt;

&lt;p&gt;To be clear, we’re not saying it’s a bad thing when a company moves in another
direction. That is the nature of business and having corporate involvement can
also be a healthy component of open source. To us, however, the core of what
makes a project long-lasting and beneficial for everyone using the product are
the people who are there building the system and interested in the problem
space. So what happened at Facebook that caused us to leave?&lt;/p&gt;

&lt;h2 id=&quot;why-we-left-facebook&quot;&gt;Why we left Facebook&lt;/h2&gt;

&lt;p&gt;As Presto became central to the infrastructure of prominent projects in Facebook,
it attracted the attention of engineers and managers at Facebook who wanted to 
work on this project. This is a strong sign of success, but some of these folks
did not have the same commitment to the open-source community. This was the
source of much of the conflict, as engaging in open source takes a lot of time
and effort, and we had a strict policy of “no one is special”: everyone’s code
was reviewed, and even Facebook engineers had to earn commit rights. Engineers
at Facebook are strongly motivated to create “memorable” work to advance in the
company, and to them this extra process was just slowing things down. Feedback
from these engineers ultimately culminated in the managers deciding to give
automatic commit rights to any Facebook engineer working on Presto, so that
these engineers could move faster.&lt;/p&gt;

&lt;p&gt;You may think Facebook engineers or managers are the big bad wolf in this
scenario, but they really are not. Engineers at these highly competitive
companies must create memorable work, or they will not get the promotions they
deserve. And if you are a junior engineer and do not get promoted, you get
fired. Corporate leaders also have the right to change how they allocate
resources to work on open-source projects. There’s nothing inherently wrong with
any of this. The problem was changing the commitment we made to keep the
open-source community neutral. It was at that point we knew that we had to
create a fork of the project if we wanted to keep the community’s interest at
the forefront for the project to remain healthy.&lt;/p&gt;

&lt;p&gt;It was also at this point we made our single biggest mistake. We didn’t change
the name away from Presto. It was admittedly hard to walk away from a name we
all knew and loved. We believed that we had set up the project, so that the name
“Presto” was owned by the community and not Facebook. The truth is that once the
community walked out of the project, Facebook was the only one left in Presto
and they became the sole owner. But, the biggest reason this was absolutely the
wrong choice is much simpler; it made the people that stayed at Facebook really
angry. We expected Facebook to do what they really wanted: stop doing the extra
open-source work, fork internally, and leave the community alone. Instead, they
somehow found the motivation to do a lot of work to set up a competing project.
Finally, we spent two additional years continuing to build the Presto name
rather than building the new name and brand. In hindsight, all of this was just
dumb, and we were suffering from our own sunk cost fallacy. So we continued
under the Presto name with the distinguishing suffix of PrestoSQL versus the
original project’s PrestoDB.&lt;/p&gt;

&lt;h2 id=&quot;building-the-trino-community&quot;&gt;Building the Trino community&lt;/h2&gt;

&lt;p&gt;The new PrestoSQL project gave a new home to the existing Presto community. It
provided a project that focused on the open source community and not just the
needs of Facebook. It also gave us time to troubleshoot problems of people who
used Presto. This is what we were doing internally at Facebook but instead we
applied our knowledge of the system towards the community. This was one of the
reasons why leaving Facebook was so beneficial. As we worked closer with
everyone else, we started learning what areas of the project we should focus on
and it turns out that many of the things we were working on at Facebook were
simply not problems that all the other people in the community were facing. This
wasn’t the only benefit to us leaving Facebook, though.&lt;/p&gt;

&lt;p&gt;The hardest part about making a new project successful is user adoption. 
Building great software doesn’t organically build a community. Presto gained 
some of its initial popularity because Facebook used it. We never had to try 
very hard to develop the community initially as the Facebook brand did a great 
job at getting people’s attention. But this community was exclusive to Silicon 
Valley companies. Leaving Facebook acted as a forcing function for us to build 
the community in a classic grassroots way. We went out and started talking to 
people, getting people connected, doing more promotions and events. We were 
pretty motivated after we left. However, all of this is a lot of work for a few
programmers, and while it’s great to see people respond to your work, it takes a
lot out of you. This created the conditions for community members to step up and
become more involved in the new project.&lt;/p&gt;

&lt;p&gt;We saw the pattern repeat when
&lt;a href=&quot;/blog/2020/12/27/announcing-trino.html&quot;&gt;we were forced to rebrand and changed the name to Trino&lt;/a&gt;.
We doubled down again on developing the community, and again participation
accelerated. It’s because of this that we believe the Trino community is stronger
than ever before.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/leaving-facebook-meta-best-for-trino/stars.jpeg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Since the split, the Trino release cadence has accelerated far beyond the pace
we sustained while running Presto. Once brand confusion was settled with the
change to the Trino name, the community numbers skyrocketed and we saw 
&lt;a href=&quot;/blog/2021/12/31/trino-2021-a-year-of-growth.html&quot;&gt;unprecedented growth in metrics like GitHub stars, YouTube subscribers, and Slack members&lt;/a&gt;. 
We have many new community-driven features released in Trino that we will be
discussing in more detail in another blog post coming soon. To name a few, Trino now 
&lt;a href=&quot;/blog/2022/05/05/tardigrade-launch.html&quot;&gt;supports fault-tolerant execution mode&lt;/a&gt;,
&lt;a href=&quot;https://github.com/trinodb/trino/issues/37&quot;&gt;revamped its timestamp support&lt;/a&gt;, 
&lt;a href=&quot;/blog/2020/06/14/dynamic-partition-pruning.html&quot;&gt;dynamic partition pruning&lt;/a&gt;,
&lt;a href=&quot;/blog/2022/07/22/polymorphic-table-functions.html&quot;&gt;polymorphic table functions&lt;/a&gt;,
&lt;a href=&quot;/blog/2021/03/10/introducing-new-window-features.html&quot;&gt;advanced window functions&lt;/a&gt;, 
and much much more!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/leaving-facebook-meta-best-for-trino/trajectory.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;These metrics help confirm our experience from previous open source projects and
from Trino. In the long run, individual-driven open source projects tend to lead
to healthier communities and ecosystems than company-driven open
source projects. We believe that, we practice that, and we are now reaping the
benefits of it as we close the pages of the first decade of this remarkable
project. We can’t begin to express how thankful we are to all of you who
believed in us and have helped grow Trino to what it is today. Also, we do
thank the Facebook leadership, especially Jay Parikh, who gave us the green
light to create and open source Presto from the beginning. We are looking
forward to the twentieth and thirtieth anniversaries as we continue to disrupt
the analytics industry and improve the lives of those who work in it.&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso, Dain Sundstrom, and David Phillips</name>
        </author>
      

      <summary>It might surprise some that our departure from Facebook was one of the simplest decisions we’ve ever made. Many posts that discuss leaving a FAANG company focus on leaving some grand sum of money or prestige of working at the company. For us, we were leaving the company where we had launched a project that we knew would quickly outgrow the walls of Facebook, and solve a much larger set of problems in the analytics domain. At the time we didn’t quite anticipate that Presto, a distributed SQL query engine for big data analytics, would be adopted around the globe by thousands of companies and an overwhelming number of industries. We appreciate Facebook for serving as the launchpad that inspired others to adopt Presto. Despite the harmonious beginnings, once the needs of the community and Facebook no longer aligned, we had to leave, but we’ll get to that part shortly.</summary>

      
      
    </entry>
  
    <entry>
      <title>Diving into polymorphic table functions with Trino</title>
      <link href="https://trino.io/blog/2022/07/22/polymorphic-table-functions.html" rel="alternate" type="text/html" title="Diving into polymorphic table functions with Trino" />
      <published>2022-07-22T00:00:00+00:00</published>
      <updated>2022-07-22T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/07/22/polymorphic-table-functions</id>
      <content type="html" xml:base="https://trino.io/blog/2022/07/22/polymorphic-table-functions.html">&lt;p&gt;In the Trino community, we know that being the coolest query engine is a tough
job. We boldly face the intricacies of the SQL standard to bring you the newest
and most powerful features. Today, we proudly announce that as of release 381,
Trino is on its way to full support for polymorphic table functions (PTFs).&lt;/p&gt;

&lt;p&gt;In this blog post, we are explaining the concept of table functions and 
exploring how they can be leveraged. We also look at what we have already 
implemented, and take a sneak peek into the future.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h3 id=&quot;definition-time&quot;&gt;Definition time&lt;/h3&gt;

&lt;p&gt;There are several kinds of functions you can call in a SQL query: scalar
functions, aggregate functions, and window functions. They might process the
input row by row (scalar) or all at once (aggregate). One thing they have in
common is that they return scalar values. Table functions are different. They
return tables. In a query, they can appear in any place where a table reference
shows up such as a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FROM&lt;/code&gt; clause:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;my_table_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;foo&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can also use table functions in joins:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;my_table_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;bar&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;another_table_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Polymorphic table functions (PTFs) are a subset of table functions where the
schema of the returned table is determined dynamically. The returned table
schema can depend on the arguments you pass to the function.&lt;/p&gt;
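&lt;p&gt;To make that concrete, here is a sketch of two calls to a hypothetical
table function. The function name and the resulting schemas are made up purely
for illustration:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- might return a table with columns (name varchar, price decimal)
SELECT * FROM TABLE(read_file(&apos;products.csv&apos;));

-- the same function might return (id bigint, visited timestamp)
SELECT * FROM TABLE(read_file(&apos;visits.csv&apos;));
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;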

&lt;h3 id=&quot;ok-but-why-are-we-so-excited&quot;&gt;OK, but why are we so excited?&lt;/h3&gt;

&lt;p&gt;We are excited because this feature is a real game changer! Polymorphic table
functions make SQL extensible, provide a framework for processing data in
previously impossible ways, and can act as a bridge between the Trino engine and
external systems or resources you might need for processing your data.
Additionally, polymorphic table functions are standard SQL, and they are very
convenient to use.&lt;/p&gt;

&lt;h3 id=&quot;what-is-available-in-trino-today&quot;&gt;What is available in Trino today?&lt;/h3&gt;

&lt;p&gt;So far, we have added a framework for table functions that are executed by
the connector. Although this is not the full PTF feature yet, we couldn’t wait
to bring it to life. We added query pass-through table functions for JDBC-based
connectors and Elasticsearch. They mostly go by the name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt;, and they take
a single argument: the query text.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;postgresql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;system&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt;
        &lt;span class=&quot;s1&quot;&gt;&apos;SELECT
          name
        FROM
          tpch.nation
        WHERE
          nationkey = 0&apos;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And this will return:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;---------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;ALGERIA&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Something you may not notice from that example is that the entire query you
pass as the “query” argument is executed directly by PostgreSQL. Whatever
connector you’re using, the query argument you pass needs to be written so that
it works on the underlying database. On the flip side, and more excitingly, if
you have a legacy query that relies on non-standard SQL syntax specific to a
database and would be difficult to rewrite for Trino, you can now pass that
entire query down to the connector by wrapping it in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt;
function, skipping the need to migrate it.&lt;/p&gt;

&lt;p&gt;Besides PostgreSQL, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt; table function has equivalent implementations
for Druid, MySQL, Oracle, Redshift, SQL Server, MariaDB, and SingleStore.
Elasticsearch has a similar function called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;raw_query&lt;/code&gt;. You can check out the
&lt;a href=&quot;https://trino.io/docs/current/connector.html&quot;&gt;Trino docs for each supported connector&lt;/a&gt;
for full details.&lt;/p&gt;

&lt;p&gt;But while we’re here, another cool example to showcase is using query
pass-through to take advantage of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MODEL&lt;/code&gt; clause in Oracle:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;SUBSTR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;country&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;country&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;SUBSTR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;product&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;15&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;product&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;nb&quot;&gt;year&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;sales&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;oracle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;system&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;SELECT
        *
      FROM
        sales_view
      MODEL
        RETURN UPDATED ROWS
        MAIN
          simple_model
        PARTITION BY
          country
        MEASURES
          sales
        RULES
          (sales[&apos;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Bounce&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;, 2001] = 1000,
          sales[&apos;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Bounce&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;, 2002] = sales[&apos;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Bounce&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;, 2001] + sales[&apos;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Bounce&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;, 2000],
          sales[&apos;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Y&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Box&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;, 2002] = sales[&apos;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Y&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Box&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;, 2001])
      ORDER BY
        country&apos;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can pass an entire query through to leverage a feature that isn’t a part of
the SQL standard, and with that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MODEL&lt;/code&gt; clause, Oracle can do some fancy
multidimensional array processing for you right then and there, returning the
results as a table back into Trino. We don’t want to get too sidetracked delving
into the specifics of non-Trino tech, so if you want to learn more about what
you can do, check out the connectors you use, and see what cool possibilities
are out there!&lt;/p&gt;

&lt;h2 id=&quot;whats-next&quot;&gt;What’s next?&lt;/h2&gt;

&lt;p&gt;Now that we’ve discussed what PTFs are, how they work in Trino, and what they do
today, it’s useful to look forward to what’s coming next. The next thing we’re
working on is adding the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt; function to BigQuery.&lt;/p&gt;

&lt;h3 id=&quot;big-ideas&quot;&gt;Big ideas&lt;/h3&gt;

&lt;p&gt;Beyond what’s currently planned, there’s a lot that polymorphic table functions
can do for us. One function that engineers and analysts commonly request
in Trino is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PIVOT&lt;/code&gt;. This capability dynamically groups the distinct
values of an input column and turns each value into a set of columns in the
output table. PTFs could enable such a PIVOT-like transformation on data,
which otherwise isn’t included in the standard SQL specification.&lt;/p&gt;
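&lt;p&gt;As a sketch, a PIVOT-style table function could be invoked like the
following. No such function exists in Trino yet, so the name and signature are
purely illustrative:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- hypothetical: turn each distinct value of the quarter column into its own
-- output column, filled with the sum of the sales column
SELECT
  *
FROM
  TABLE(pivot(
    input =&amp;gt; TABLE(quarterly_sales),
    pivot_column =&amp;gt; &apos;quarter&apos;,
    value_column =&amp;gt; &apos;sales&apos;));
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;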

&lt;p&gt;Another exciting possibility is the ability to write scripts that transform or
generate tables in popular languages like Python, Scala, or JavaScript. These
could add even more capabilities that SQL is missing.&lt;/p&gt;

&lt;h3 id=&quot;looking-forward&quot;&gt;Looking forward&lt;/h3&gt;

&lt;p&gt;The journey to full PTF support in Trino has just begun. A dedicated operator
for table functions is the next big thing. Right now, Trino can handle PTFs, but
they must be pushed down to the connector and executed there. The Trino engine
does not yet know how to execute them. With an operator, the Trino engine will
be able to control and handle table function execution, and we will be able to
pass tables as arguments to table functions. This will unlock the full potential
of PTFs in Trino, and empower Trino to solve a new class of problems and expand
its potential for application in many new domains.&lt;/p&gt;

&lt;p&gt;If you have any questions or ideas for table functions that you would find
useful, reach out to us on the &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Trino Slack&lt;/a&gt;, and
we would love to hear your thoughts and feedback. We’ll also be doing a Trino
Community Broadcast on PTFs on July 28th @ 1pm EDT, so tune in then to have your
questions answered live!&lt;/p&gt;

&lt;p&gt;If you want to learn more about how to implement PTFs, we are working on another
blog post for you already.&lt;/p&gt;

&lt;p&gt;Happy querying!&lt;/p&gt;</content>

      
        <author>
          <name>Kasia Findeisen, Brian Olsen, and Cole Bowden</name>
        </author>
      

      <summary>In the Trino community, we know that being the coolest query engine is a tough job. We boldly face the intricacies of the SQL standard to bring you the newest and most powerful features. Today, we proudly announce that as of release 381, Trino is on its way to full support for polymorphic table functions (PTFs). In this blog post, we are explaining the concept of table functions and exploring how they can be leveraged. We also look at what we have already implemented, and take a sneak peek into the future.</summary>

      
      
    </entry>
  
    <entry>
      <title>38: Trino tacks on polymorphic table functions</title>
      <link href="https://trino.io/episodes/38.html" rel="alternate" type="text/html" title="38: Trino tacks on polymorphic table functions" />
      <published>2022-07-21T00:00:00+00:00</published>
      <updated>2022-07-21T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/38</id>
      <content type="html" xml:base="https://trino.io/episodes/38.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;p&gt;In this episode we have the pleasure to chat with a couple familiar faces who
have been hard at work building and understanding the features we’re talking
about today:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/kasiafi&quot;&gt;Kasia Findeisen&lt;/a&gt;, Trino Maintainer&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/mtraverso&quot;&gt;Martin Traverso&lt;/a&gt;, Trino Cocreator and Maintainer&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-387-to-391&quot;&gt;Releases 387 to 391&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-387.html&quot;&gt;Trino 387&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for writing ORC Bloom filters for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varchar&lt;/code&gt; columns.&lt;/li&gt;
  &lt;li&gt;Support for querying Pinot via the gRPC endpoint.&lt;/li&gt;
  &lt;li&gt;Support for predicate pushdown on string columns in Redis.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OPTIMIZE&lt;/code&gt; on Iceberg tables with non-identity partitioning.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-388.html&quot;&gt;Trino 388&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for JSON output in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Improved performance for row data types.&lt;/li&gt;
  &lt;li&gt;Support for OAuth 2.0 refresh tokens.&lt;/li&gt;
  &lt;li&gt;Support for table and column comments in Delta Lake.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-389.html&quot;&gt;Trino 389&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;row&lt;/code&gt; type and aggregation.&lt;/li&gt;
  &lt;li&gt;Faster joins when spilling to disk is disabled.&lt;/li&gt;
  &lt;li&gt;Improved performance when writing non-structural types to Parquet.&lt;/li&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;raw_query&lt;/code&gt; table function for full query pass-through in Elasticsearch.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-390.html&quot;&gt;Trino 390&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for setting comments on views.&lt;/li&gt;
  &lt;li&gt;Improved UNNEST performance.&lt;/li&gt;
  &lt;li&gt;Support for Databricks runtime 10.4 LTS in Delta Lake connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-391.html&quot;&gt;Trino 391&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for AWS Athena partition projection.&lt;/li&gt;
  &lt;li&gt;Faster writing of Parquet data in Iceberg and Delta Lake.&lt;/li&gt;
  &lt;li&gt;Support for reading BigQuery external tables.&lt;/li&gt;
  &lt;li&gt;Support for table and column comments in BigQuery.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional highlights and notes according to Manfred:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2022/07/14/trino-updates-to-java-17.html&quot;&gt;Java 17 arrived as required runtime in 390&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Remove support for Elasticsearch versions below 6.6.0, add testing for OpenSearch 1.1.0.&lt;/li&gt;
  &lt;li&gt;New raw query table function in Elasticsearch can replace old full text search and query pass-through support.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-387.html&quot;&gt;Trino 387&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-388.html&quot;&gt;Trino 388&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-389.html&quot;&gt;Trino 389&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-390.html&quot;&gt;Trino 390&lt;/a&gt;,
and &lt;a href=&quot;https://trino.io/docs/current/release/release-391.html&quot;&gt;Trino 391&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-polymorphic-table-functions&quot;&gt;Concept of the episode: Polymorphic table functions&lt;/h2&gt;

&lt;p&gt;We normally cover a broad variety of topics in the Trino community broadcast,
exploring different technical details, pull requests, and neat things that are
going on in Trino at large. This episode, however, we’re going to be more
focused, only taking a look at a particular piece of functionality that we’re
all very excited about: polymorphic table functions, or PTFs for short. If
you’re unfamiliar with what this means, that can sound like technobabble word
soup, so we can start exploring this with a simple question…&lt;/p&gt;

&lt;h3 id=&quot;what-is-a-table-function&quot;&gt;What is a table function?&lt;/h3&gt;

&lt;p&gt;The easiest answer to this question is that it’s a function which returns a
table. Scalar, aggregate, and window functions all work a little differently,
but ultimately, they all return a single value each time they are invoked. Table
functions are unique in that they return an entire table. This gives them some
interesting properties that we’ll dive into, but it also means that you can only
invoke them in situations where you’d use a full table, such as a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FROM&lt;/code&gt; clause:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;my_table_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;foo&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can also use table functions in joins:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;my_table_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;bar&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;another_table_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And while that’s all neat, it raises the question…&lt;/p&gt;

&lt;h4 id=&quot;what-can-you-do-with-table-functions&quot;&gt;What can you do with table functions?&lt;/h4&gt;

&lt;p&gt;While standard table functions are cool, they have to return a pre-defined
schema, which limits their flexibility. However, they still have some
interesting uses as means of shortening queries or performing multiple
operations at once. If you frequently find yourself selecting from the same
table with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE&lt;/code&gt; clause checking equality to a specific column but with a
different value each time, you could define a table function which takes that
value as a parameter and allows you to skip all the copying and pasting just for
the sake of one line changing. You could take an extremely lengthy sub-query
with multiple joins and abbreviate it to something as short as one of the
examples above, and then use that in other queries. Or, if you want to update a
table, but you also want to insert into another table as part of the same
operation, you could combine those two steps into one table function, ensuring
that users won’t forget the second part of that process.&lt;/p&gt;
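
&lt;p&gt;As a quick hypothetical sketch (the function name and signature here are
invented for illustration, not part of Trino), that first use case could turn a
recurring filter query into a one-liner:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- instead of repeating: SELECT * FROM orders WHERE status = &apos;SHIPPED&apos;
SELECT
    *
FROM
    TABLE(orders_with_status(&apos;SHIPPED&apos;));
&lt;/code&gt;&lt;/pre&gt;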

&lt;p&gt;So table functions are functions that return tables. It really is that simple,
and we’re already two-thirds of the way to understanding what polymorphic table
functions are. And now it’s time to add in that fun ‘polymorphic’ word.&lt;/p&gt;

&lt;h3 id=&quot;what-makes-a-table-function-polymorphic&quot;&gt;What makes a table function polymorphic?&lt;/h3&gt;

&lt;p&gt;A polymorphic table function is a type of table function where the schema of
the returned table is determined dynamically. This means that the returned table
data, including its schema, can be determined by the arguments you pass to the
function. As you might imagine, that makes PTFs far more powerful than an
ordinary, run-of-the-mill table function.&lt;/p&gt;
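
&lt;p&gt;To make that concrete, here is a purely hypothetical invocation (this
function does not exist in Trino) where the argument decides which columns the
returned table has, so the output schema changes from one call to the next:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- returns a table containing only col1
SELECT * FROM TABLE(project(TABLE(my.table), DESCRIPTOR(&quot;col1&quot;)));

-- same function, different argument: returns a table with col2 and col3
SELECT * FROM TABLE(project(TABLE(my.table), DESCRIPTOR(&quot;col2&quot;, &quot;col3&quot;)));
&lt;/code&gt;&lt;/pre&gt;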

&lt;h4 id=&quot;what-can-you-do-with-polymorphic-table-functions&quot;&gt;What can you do with polymorphic table functions?&lt;/h4&gt;

&lt;p&gt;When you’re not determining the schema of the returned table well in advance,
you get the flexibility to do some pretty crazy things. It can be as simple as
adding or removing columns as part of the function, or it can be as complex as
building and returning an entirely new table based on some input data.&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-the-many-ways-you-can-leverage-ptfs&quot;&gt;Demo of the episode: The many ways you can leverage PTFs&lt;/h2&gt;

&lt;p&gt;But we’ve talked enough at a high level about what PTFs are, so now it’s a good
time to look at what PTFs can actually do for you to make your life as a Trino
user easier, better, and more efficient.&lt;/p&gt;

&lt;h3 id=&quot;possible-polymorphic-table-functions&quot;&gt;Possible polymorphic table functions&lt;/h3&gt;

&lt;p&gt;One thing to note - all the examples we’re about to look at are &lt;em&gt;hypothetical&lt;/em&gt;.
We’re working to bring functions similar to these to Trino soon, but there are a
few things left to implement before we get there. For now, these examples are
meant to highlight why we’re implementing PTFs; we’ll look at what you can
currently do with them a little later. When it does come time to implement
these functions, they won’t look exactly as they do here.&lt;/p&gt;

&lt;h4 id=&quot;select-except&quot;&gt;Select except&lt;/h4&gt;

&lt;p&gt;Imagine a table with 10 columns, named col1, col2, col3, etc. If you want to
select all the columns except the first one from that table, you end up with a
query that looks like:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;col2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col9&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col10&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;my&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But that’s long, and it’s a pain to type, and it gets messy, especially if your
column names aren’t extremely short due to being part of a contrived example.
With a simple PTF, you could get the same result with:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;excl_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;data&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;my&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;columns_to_exclude&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;&quot;col1&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now, this isn’t a great PTF, because it’s going to take more time to implement
than it takes to just write out your column names, and at least when we’re using
only 10 columns and short column names, invoking the function takes more writing
than doing it the old-fashioned way. Also, this is going to perform worse than
writing the query the ordinary way. As a rule of thumb, if it can be written
with normal SQL, it will be more performant when done that way. There are plans
to work on optimizing PTFs, but that’s not going to happen soon, so for the time
being, we’re focusing on how they enable things which previously couldn’t
be done at all, rather than making queries look nicer or cleaner.&lt;/p&gt;

&lt;p&gt;All that said, we wanted to include this example because this does a good job at
demonstrating how polymorphic table functions can work and what they can do for
you. But it’s a simple example, and now we can look at some which are a little
more complex and a little more practical.&lt;/p&gt;

&lt;h4 id=&quot;csvreader&quot;&gt;CSVreader&lt;/h4&gt;

&lt;p&gt;If you’ve ever tried to create a table from a CSV file, you know it can be a
painful experience. You have to be very explicit and very diligent, and there’s
a lot of manual cross-checking involved in ensuring that each column aligns
perfectly and is correctly typed for the columns present in the CSV. Enter
polymorphic table functions, here to save the day.&lt;/p&gt;

&lt;p&gt;Remember, this is hypothetical, so by the time we get to implementing something
similar to this in Trino, it will certainly look different. But a table function
like this will be defined on the connector, so all the end user needs to worry
about is what its signature might look like:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;FUNCTION&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CSVreader&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Filename&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;FloatCols&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DEFAULT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;DateCols&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DEFAULT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;RETURNS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;One key thing to note here is the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DESCRIPTOR&lt;/code&gt; type. It is a type that describes
a list of column names, and there will be a function to convert a parameterized
list to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DESCRIPTOR&lt;/code&gt; type. Other than that, everything else here does what
you’d expect - you pass the function the name of the CSV file, the columns which
should be typed as floats, and the columns which should have a date typing. All
unspecified columns will still be handled as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varchar&lt;/code&gt;. Calling the function
might look something like:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;CSVreader&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Filename&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;my_file.csv&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;FloatCols&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;&quot;principle&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;&quot;interest&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;DateCols&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;&quot;due_date&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Given a CSV with this content:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-csv&quot;&gt;docno,name,due_date,principle,interest
123,Alice,01/01/2014,234.56,345.67
234,Bob,01/01/2014,654.32,543.21
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Such a function would return a table that looks like:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;docno&lt;/th&gt;
      &lt;th&gt;name&lt;/th&gt;
      &lt;th&gt;due_date&lt;/th&gt;
      &lt;th&gt;principle&lt;/th&gt;
      &lt;th&gt;interest&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;123&lt;/td&gt;
      &lt;td&gt;Alice&lt;/td&gt;
      &lt;td&gt;2014-01-01&lt;/td&gt;
      &lt;td&gt;234.56&lt;/td&gt;
      &lt;td&gt;345.67&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;234&lt;/td&gt;
      &lt;td&gt;Bob&lt;/td&gt;
      &lt;td&gt;2014-01-01&lt;/td&gt;
      &lt;td&gt;654.32&lt;/td&gt;
      &lt;td&gt;543.21&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;With a well-written PTF, the days of toiling over parsing a CSV into SQL are
over!&lt;/p&gt;

&lt;h4 id=&quot;pivot&quot;&gt;Pivot&lt;/h4&gt;

&lt;p&gt;Pivot is an oft-requested feature which hasn’t been built in Trino because it
isn’t a part of the standard SQL specification. A &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PIVOT&lt;/code&gt; keyword or built-in
function isn’t planned, but with PTFs, we can support &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PIVOT&lt;/code&gt;-like functionality
without needing to deviate from SQL.&lt;/p&gt;

&lt;p&gt;A &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PIVOT&lt;/code&gt; PTF might have the following definition:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;FUNCTION&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Pivot&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Input_table&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PASS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;THROUGH&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ROW&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SEMANTICS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Output_pivot_columns&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Input_pivot_columns1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Input_pivot_columns2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DEFAULT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Input_pivot_columns3&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DEFAULT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Input_pivot_columns4&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DEFAULT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Input_pivot_columns5&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DEFAULT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;RETURNS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But before we look at how you can invoke this, there are a few clauses here
that are worth explaining…&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PASS THROUGH&lt;/code&gt; means that the input data (and all of its rows) will be fully
available in the output. The alternative to this is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NO PASS THROUGH&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WITH ROW SEMANTICS&lt;/code&gt; means that the result will be determined on a row-by-row
basis. The alternative to this is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WITH SET SEMANTICS&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And of course, the function takes some parameters, so a good function author
defines what those parameters do.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;‘Input_table’ is the input table. It can be any table.&lt;/li&gt;
  &lt;li&gt;‘Output_pivot_columns’ is the list of column names to be created in the
pivot table.&lt;/li&gt;
  &lt;li&gt;The ‘Input_pivot_columns’ parameters name the columns to be pivoted into
the output columns. The first one is required, but you can specify additional
groupings. The number of input columns in each group to be pivoted and the
number of output columns must be the same.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now you’ve got a PIVOT function and you understand how to invoke it, so all
you need to do is listen to &lt;a href=&quot;https://youtu.be/8w3wmQAMoxQ?t=82&quot;&gt;Ross from Friends&lt;/a&gt;
and make it happen:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;D&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;D&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;P&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;accttype&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;P&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;acctvalue&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;Pivot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Input_table&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;My&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;Data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;D&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Output_pivot_columns&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;accttype&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;acctvalue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Input_pivot_columns1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;accttype1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;acctvalue1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Input_pivot_columns2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;accttype2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;acctvalue2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;P&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If we presume we have this data in My.Data:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;ID&lt;/th&gt;
      &lt;th&gt;Name&lt;/th&gt;
      &lt;th&gt;accttype1&lt;/th&gt;
      &lt;th&gt;acctvalue1&lt;/th&gt;
      &lt;th&gt;accttype2&lt;/th&gt;
      &lt;th&gt;acctvalue2&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;123&lt;/td&gt;
      &lt;td&gt;Alice&lt;/td&gt;
      &lt;td&gt;external&lt;/td&gt;
      &lt;td&gt;20000&lt;/td&gt;
      &lt;td&gt;internal&lt;/td&gt;
      &lt;td&gt;350&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;234&lt;/td&gt;
      &lt;td&gt;Bob&lt;/td&gt;
      &lt;td&gt;external&lt;/td&gt;
      &lt;td&gt;25000&lt;/td&gt;
      &lt;td&gt;internal&lt;/td&gt;
      &lt;td&gt;120&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The output of that query will be:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;ID&lt;/th&gt;
      &lt;th&gt;Name&lt;/th&gt;
      &lt;th&gt;accttype&lt;/th&gt;
      &lt;th&gt;acctvalue&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;123&lt;/td&gt;
      &lt;td&gt;Alice&lt;/td&gt;
      &lt;td&gt;external&lt;/td&gt;
      &lt;td&gt;20000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;123&lt;/td&gt;
      &lt;td&gt;Alice&lt;/td&gt;
      &lt;td&gt;internal&lt;/td&gt;
      &lt;td&gt;350&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;234&lt;/td&gt;
      &lt;td&gt;Bob&lt;/td&gt;
      &lt;td&gt;external&lt;/td&gt;
      &lt;td&gt;25000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;234&lt;/td&gt;
      &lt;td&gt;Bob&lt;/td&gt;
      &lt;td&gt;internal&lt;/td&gt;
      &lt;td&gt;120&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;You can see the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PASS THROUGH&lt;/code&gt; clause in action when you select D.id and D.name.&lt;/p&gt;

&lt;h4 id=&quot;execr&quot;&gt;ExecR&lt;/h4&gt;

&lt;p&gt;As a bonus cherry on top, and as an example of something very fun that you can
do with PTFs, how about executing an entire script written in R?&lt;/p&gt;

&lt;p&gt;A connector could provide a function with the signature:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;FUNCTION&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ExecR&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Script&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Input_table&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PASS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;THROUGH&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SET&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SEMANTICS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Rowtype&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;RETURNS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The inputs here are the script, which can simply be pasted into the query as
text; the input table, which contains the data for the script to run on; and a
descriptor for the row type, since there’s otherwise no way for the engine to
know the shape of the output after running the R script. It’s worth pointing
out that, contrary to the PIVOT example, this function has &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NO PASS THROUGH&lt;/code&gt;, because
the R script will not have the ability to copy input rows into output rows.&lt;/p&gt;

&lt;p&gt;Invoking this function is relatively straightforward:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;ExecR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Script&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;...&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;Input&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;My&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;Data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Rowtype&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESCRIPTOR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;col1&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;VARCHAR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col2&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;REAL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col3&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;FLOAT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And depending on your script and your data, you can make this as simple or as
extreme as you’d like!&lt;/p&gt;

&lt;h2 id=&quot;pull-request-of-the-episode-pr-12325-support-query-pass-through-for-jdbc-based-connectors&quot;&gt;Pull request of the episode: PR 12325: Support query pass-through for JDBC-based connectors&lt;/h2&gt;

&lt;p&gt;We’ve spent a lot of time talking about hypothetical value that we will be able
to derive from polymorphic table functions sometime down the line, but we should
also pump the brakes a little and take a look at what we &lt;em&gt;already&lt;/em&gt; have in Trino
in terms of polymorphic table functions. This PR, authored by Kasia Findeisen,
was the first code to land in Trino that allowed access to PTFs. It’s just one
particular PTF, but it’s pretty neat, so we can jump into it with a demo and an
explanation for how we’re already changing the game with PTFs.&lt;/p&gt;

&lt;h3 id=&quot;demo-of-the-episode-2-using-connector-specific-features-with-query-pass-through&quot;&gt;Demo of the episode #2: Using connector-specific features with query pass-through&lt;/h3&gt;

&lt;p&gt;Trino sticks to the SQL standard, which means that custom extensions and syntax
aren’t supported. If you’re using a Trino connector where the underlying
database has a neat feature that isn’t a part of the SQL standard, you were
previously unable to take advantage of it, and you knew it wasn’t going
to be added to Trino. But now with query pass-through, you can leverage any of
the cool non-standard extensions that belong to connectors! We’ll look at a
couple of different examples, but keep in mind: because this is pushing an entire
query down to the connector, the possibilities depend on what the
underlying database is capable of.&lt;/p&gt;

&lt;h4 id=&quot;group_concat-in-mysql&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP_CONCAT()&lt;/code&gt; in MySQL&lt;/h4&gt;

&lt;p&gt;In a table where we have employees and their manager ID, but no direct way to
list managers with all their employees, we can push down a query to MySQL and
use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP_CONCAT()&lt;/code&gt; to combine them all into one column with this query:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT
  *
FROM
  TABLE(
    mysql.system.query(
      query =&amp;gt; &apos;SELECT
        manager_id, GROUP_CONCAT(employee_id)
      FROM
        company.employees
      GROUP BY
        manager_id&apos;
    )
  );
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
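To make concrete what that query returns, here is a plain-Python sketch of the same grouping logic; the rows are invented sample data, not from the episode:

```python
from collections import defaultdict

# Invented (manager_id, employee_id) rows standing in for the
# company.employees table referenced in the query above.
rows = [(1, 101), (1, 102), (2, 201), (1, 103), (2, 202)]

# GROUP BY manager_id, then GROUP_CONCAT(employee_id): join every
# grouped value into a single comma-separated string per group.
groups = defaultdict(list)
for manager_id, employee_id in rows:
    groups[manager_id].append(str(employee_id))

result = {m: ",".join(ids) for m, ids in groups.items()}
print(result)  # {1: '101,102,103', 2: '201,202'}
```

The result is one row per manager, with all of that manager's employee IDs collapsed into a single string column.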

&lt;h4 id=&quot;model-clause-in-oracle&quot;&gt;MODEL clause in Oracle&lt;/h4&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MODEL&lt;/code&gt; clause in Oracle is an incredibly powerful way to manipulate and
view data. As it’s not part of the ANSI standard, it’s specific to Oracle, but if you want
to use it, now you can! Through polymorphic table functions, you can generate
and perform sophisticated calculations on multidimensional arrays - try saying
that five times fast. We don’t have the time to explain everything about how
this feature works, but if you want clarification, you can check out
&lt;a href=&quot;https://docs.oracle.com/cd/B19306_01/server.102/b14223/sqlmodel.htm&quot;&gt;the Oracle documentation on MODEL&lt;/a&gt;
and try it out for yourself.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT
  SUBSTR(country, 1, 20) country,
  SUBSTR(product, 1, 15) product,
  year,
  sales
FROM
  TABLE(
    oracle.system.query(
      query =&amp;gt; &apos;SELECT
        *
      FROM
        sales_view
      MODEL
        RETURN UPDATED ROWS
        MAIN
          simple_model
        PARTITION BY
          country
        MEASURES
          sales
        RULES
          (sales[&apos;Bounce&apos;, 2001] = 1000,
          sales[&apos;Bounce&apos;, 2002] = sales[&apos;Bounce&apos;, 2001] + sales[&apos;Bounce&apos;, 2000],
          sales[&apos;Y Box&apos;, 2002] = sales[&apos;Y Box&apos;, 2001])
      ORDER BY
        country&apos;
    )
  );
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
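If the MODEL syntax looks opaque, the RULES section is roughly a set of spreadsheet-style assignments over cells keyed by (product, year). A loose Python analogue of those three rules, with invented starting numbers, might look like:

```python
# Cells keyed by (product, year); the seed values here are invented,
# standing in for whatever sales_view would actually contain.
sales = {
    ("Bounce", 2000): 200.0,
    ("Bounce", 2001): 300.0,
    ("Y Box", 2001): 500.0,
}

# The three RULES from the query above, translated cell by cell:
sales[("Bounce", 2001)] = 1000.0
sales[("Bounce", 2002)] = sales[("Bounce", 2001)] + sales[("Bounce", 2000)]
sales[("Y Box", 2002)] = sales[("Y Box", 2001)]

print(sales[("Bounce", 2002)])  # 1200.0
```

With RETURN UPDATED ROWS, only the cells touched by the rules come back in the result.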

&lt;p&gt;Funnily enough, Oracle also supports polymorphic table functions, so if you
wanted to, you could use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt; function to then invoke a PTF in Oracle,
including any of the hypothetical examples we went into above! PTFs inside of
PTFs are possible! …though probably not the best idea.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-episode-where-are-we-at-and-whats-coming-next&quot;&gt;Question of the episode: Where are we at, and what’s coming next?&lt;/h2&gt;

&lt;p&gt;Right now, there are a few things on the radar for moving forward with PTFs. The
first and simpler task at hand is expanding the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query&lt;/code&gt; function to other
connectors. We started with the JDBC connectors, but we have also landed a
similar function called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;raw_query&lt;/code&gt; for Elasticsearch, are working on a BigQuery
implementation, and there may still be more to come.&lt;/p&gt;

&lt;p&gt;On a broader scope, the reason this was the first PTF that was implemented is
because Trino doesn’t have to do anything to make it work. The next big step in
powering PTFs up is to create an operator and make the engine aware of them, so
that the engine can handle and process PTFs itself, which will open the door to
the wide array of possibilities we explored earlier.&lt;/p&gt;

&lt;p&gt;And finally, once that’s done, we plan on empowering you, the Trino community,
to go out and actually &lt;em&gt;make&lt;/em&gt; some polymorphic table functions. You can already
implement them today, but with some limitations: you can’t use table or
descriptor arguments, and the connector has to perform the execution. But once
the full framework for PTFs has been built, those examples from earlier (and
many others) will still need to be implemented. There is a
&lt;a href=&quot;https://trino.io/docs/current/develop/table-functions.html&quot;&gt;developer guide&lt;/a&gt; on
implementing table functions that exists today, but there are plans to expand
it so that it’s easier to go in and add the PTFs that will make a difference
for you and your workflows.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Check out the in-person and virtual
&lt;a href=&quot;https://www.meetup.com/pro/trino-community/&quot;&gt;Trino Meetup groups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Slowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino updates to Java 17</title>
      <link href="https://trino.io/blog/2022/07/14/trino-updates-to-java-17.html" rel="alternate" type="text/html" title="Trino updates to Java 17" />
      <published>2022-07-14T00:00:00+00:00</published>
      <updated>2022-07-14T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/07/14/trino-updates-to-java-17</id>
      <content type="html" xml:base="https://trino.io/blog/2022/07/14/trino-updates-to-java-17.html">&lt;p&gt;You’ve already read the title, and it’s exciting news - as of Trino version 390,
which releases today, Trino has officially been updated from Java 11 to Java 17.
This has a few implications, the most important of which is that if you aren’t
running the Docker image (which automatically comes with the correct version of
Java) and you’ve been running Trino on Java 16 or older, you’ll need to update
Java to run Trino versions 390 and later. It’s also worth mentioning that newer
versions of Java, such as Java 18 or 19, are not supported - they might work,
but they haven’t been tested or benchmarked - Java 17 is the new, recommended
version for Trino.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;The reason this change is exciting is that using a new and better version of
Java will make Trino better, too! This initial change is an update to the
runtime version, or what the Trino engine uses while it runs. Because the Java
runtime performs slightly better on the whole with this update, you may see
some small, across-the-board performance improvements when switching from Java
11 to Java 17. So when you’ve got the time, we strongly recommend making the
upgrade!&lt;/p&gt;

&lt;p&gt;The plan is to update the build to Java 17 a few weeks from now, which will also
allow us to use Java 17 APIs and the changes to the language in Trino code. With
new language features, there are more tools in the development toolkit, and
it’ll allow us to write cleaner and better code moving forwards.&lt;/p&gt;

&lt;p&gt;This upgrade has been in the works for a while and been a long time coming, so
if you want to learn more about the specifics, one of the best places to check
that out is the Trino Community Broadcast. Updating to Java 17 was the focus of
&lt;a href=&quot;https://trino.io/episodes/36.html&quot;&gt;episode 36&lt;/a&gt;, and we also talked about it
previously in &lt;a href=&quot;https://trino.io/episodes/35.html&quot;&gt;episode 35&lt;/a&gt;. If you want to
check out the code changes that made this happen, you can view
&lt;a href=&quot;https://github.com/trinodb/trino/issues/9876&quot;&gt;the tracking issue on Github&lt;/a&gt; for
more information.&lt;/p&gt;

&lt;p&gt;And finally, we want to give a shoutout to &lt;a href=&quot;https://github.com/wendigo&quot;&gt;Mateusz Gajewski&lt;/a&gt;
for all the hard work in driving this change.&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>You’ve already read the title, and it’s exciting news - as of Trino version 390, which releases today, Trino has officially been updated from Java 11 to Java 17. This has a few implications, the most important of which is that if you aren’t running the Docker image (which automatically comes with the correct version of Java) and you’ve been running Trino on Java 16 or older, you’ll need to update Java to run Trino versions 390 and later. It’s also worth mentioning that newer versions of Java, such as Java 18 or 19, are not supported - they might work, but they haven’t been tested or benchmarked - Java 17 is the new, recommended version for Trino.</summary>

      
      
    </entry>
  
    <entry>
      <title>How to use Airflow with Trino</title>
      <link href="https://trino.io/blog/2022/07/13/how-to-use-airflow-to-schedule-trino-jobs.html" rel="alternate" type="text/html" title="How to use Airflow with Trino" />
      <published>2022-07-13T00:00:00+00:00</published>
      <updated>2022-07-13T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/07/13/how-to-use-airflow-to-schedule-trino-jobs</id>
      <content type="html" xml:base="https://trino.io/blog/2022/07/13/how-to-use-airflow-to-schedule-trino-jobs.html">&lt;p&gt;The recent addition of the &lt;a href=&quot;/docs/current/admin/fault-tolerant-execution.html&quot;&gt;fault-tolerant
execution&lt;/a&gt; architecture,
delivered to Trino by Project Tardigrade, makes the use of Trino for running
your ETL workloads an even more compelling alternative than ever before. We’ve
set up a demo environment for you to easily give it a try in &lt;a href=&quot;https://www.starburst.io/platform/starburst-galaxy/&quot;&gt;Starburst
Galaxy&lt;/a&gt;.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;With Project Tardigrade providing an out-of-the-box solution with advanced
resource-aware task scheduling and granular retries at the task/query level, we still
need a robust tool to schedule and manage workloads themselves. Apache
Airflow is a great choice for this purpose.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://airflow.apache.org/&quot;&gt;Apache Airflow&lt;/a&gt; is a widely used workflow engine that allows you to schedule and
run complex data pipelines. Airflow provides many plug-and-play operators and
hooks to integrate with many third-party services like Trino.&lt;/p&gt;

&lt;p&gt;To get started using Airflow to run data pipelines with Trino you need to
complete the following steps:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Install Apache Airflow 2.10+&lt;/li&gt;
  &lt;li&gt;Install the TrinoHook&lt;/li&gt;
  &lt;li&gt;Create a Trino connection in Airflow&lt;/li&gt;
  &lt;li&gt;Deploy a TrinoOperator&lt;/li&gt;
  &lt;li&gt;Deploy your DAGs&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;installing-apache-airflow-in-docker&quot;&gt;Installing Apache Airflow in Docker&lt;/h2&gt;

&lt;p&gt;The best way to get going, if you don’t already have an Airflow cluster
available, is to run Airflow in a container using Docker Compose. Just be
aware that this is not best practice for a production environment.&lt;/p&gt;

&lt;p&gt;Requirements for the host:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Docker&lt;/li&gt;
  &lt;li&gt;Docker Compose 1.28+&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Step 1) Create a directory named airflow for all our configuration files.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ mkdir airflow
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Step 2) In the airflow directory create three subdirectories called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dags&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;plugins&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;logs&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ cd airflow
$ mkdir dags plugins logs
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Step 3) Download the Airflow Docker Compose YAML file.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ curl -LfO &apos;https://airflow.apache.org/docs/apache-airflow/stable/docker-compose.yaml&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Step 4) Create an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.env&lt;/code&gt; configuration file:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ echo -e &quot;AIRFLOW_UID=$(id -u)&quot; &amp;gt; .env
$ echo &quot;AIRFLOW_GID=0&quot; &amp;gt;&amp;gt; .env 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Step 5) Start the Airflow containers:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;installing-the-trinohook&quot;&gt;Installing the TrinoHook&lt;/h2&gt;

&lt;p&gt;If you are running Airflow in Docker, you need to install the TrinoHook in
all of the Docker containers that use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;apache/airflow:x.x.x&lt;/code&gt; image.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ docker ps 
CONTAINER ID   IMAGE                  PORTS                              NAMES
cffdfaeb757e   apache/airflow:2.3.0   0.0.0.0:8080-&amp;gt;8080/tcp             airflow_airflow-webserver_1
b0e72f479a66   apache/airflow:2.3.0   8080/tcp                           airflow_airflow-worker_1
4cdb11b3e5e3   apache/airflow:2.3.0   8080/tcp                           airflow_airflow-triggerer_1
41d3c3107ddb   apache/airflow:2.3.0   0.0.0.0:5555-&amp;gt;5555/tcp, 8080/tcp   airflow_flower_1
229a11e9cdd3   apache/airflow:2.3.0   8080/tcp                           airflow_airflow-scheduler_1
68160240857d   postgres:13            5432/tcp                           airflow_postgres_1
a96b98da85df   redis:latest           6379/tcp                           airflow_redis_1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To install the TrinoHook, run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pip install apache-airflow-providers-trino&lt;/code&gt; in
the first five containers. Run the following command once for each of those
containers, replacing the container ID with the one from your deployment.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ docker exec -it &amp;lt;container_id&amp;gt; pip install apache-airflow-providers-trino
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once you have done that you need to restart all five containers:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ docker container restart &amp;lt;container_id_1&amp;gt; ... &amp;lt;container_id_5&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;creating-a-trino-connection&quot;&gt;Creating a Trino connection&lt;/h2&gt;

&lt;p&gt;After you have installed the TrinoHook and restarted Airflow, you can create a
connection to your Trino cluster through the Airflow web UI. If you just
installed Airflow, go to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;http://localhost:8080&lt;/code&gt; in your browser and log in.
Unless changed, the default username and password are both &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;airflow&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Go to &lt;strong&gt;Admin&lt;/strong&gt; &amp;gt; &lt;strong&gt;Connections&lt;/strong&gt;.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;75%&quot; src=&quot;/assets/blog/trino-airflow-blog/airflow-connections.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Click on the blue button to &lt;strong&gt;Add a new record&lt;/strong&gt;.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;75%&quot; src=&quot;/assets/blog/trino-airflow-blog/airflow-new-connection.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Select &lt;strong&gt;Trino&lt;/strong&gt; from the &lt;strong&gt;Connection Type&lt;/strong&gt; dropdown and provide the following information:&lt;/p&gt;

&lt;table&gt;
  &lt;tr&gt;
   &lt;td&gt;Connection Id&lt;/td&gt;
   &lt;td&gt;Whatever you want to call your connection.&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;
    Host
   &lt;/td&gt;
   &lt;td&gt;The hostname or IP address of your Trino cluster, e.g., &lt;code&gt;localhost&lt;/code&gt;, &lt;code&gt;10.10.10.1&lt;/code&gt;, or &lt;code&gt;www.mytrino.com&lt;/code&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Schema&lt;/td&gt;
   &lt;td&gt;A schema in your Trino cluster.&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Login&lt;/td&gt;
   &lt;td&gt;The username of the user that Airflow uses to connect to Trino. Best practice would be to create a service account like ‘airflow’. Just understand that this user’s access level is used when executing SQL statements in Trino.&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Password&lt;/td&gt;
   &lt;td&gt;The password of the user that Airflow uses to connect to Trino if authentication is enabled.&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Port&lt;/td&gt;
   &lt;td&gt;The port where the Trino Web UI can be accessed, e.g., &lt;code&gt;8080&lt;/code&gt;, &lt;code&gt;8443&lt;/code&gt;.&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Extra&lt;/td&gt;
   &lt;td&gt;Additional settings, like &lt;code&gt;protocol:https&lt;/code&gt; if using TLS, or &lt;code&gt;verify:false&lt;/code&gt; if you are using a self-signed certificate.&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;Be aware that the test button might not actually return any feedback for Trino connections.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;/assets/blog/trino-airflow-blog/airflow-add-connection.png&quot; /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;deploying-a-trinooperator&quot;&gt;Deploying a TrinoOperator&lt;/h2&gt;

&lt;p&gt;At the time of writing this article there is no TrinoOperator, so you have to
write your own. You can find an implementation in the following section to get you started. This operator allows you to
execute any SQL statement that Trino supports, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SET SESSION&lt;/code&gt;, and others. You can run multiple statements in a single task so
that they are part of a single Trino session.&lt;/p&gt;

&lt;p&gt;To create the TrinoOperator, use your favorite text editor to create a file called
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino_operator.py&lt;/code&gt; with the following code in it, and place it in the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;airflow/plugins&lt;/code&gt; directory you created earlier. Airflow automatically loads the code, and you are ready to start
writing DAGs.&lt;/p&gt;

&lt;p&gt;For those new to Airflow, DAG (Directed Acyclic Graph) is a core Airflow
concept, a collection of tasks with dependencies and relationships that indicate
to Airflow how they should be executed. DAGs are written in Python.&lt;/p&gt;
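As a toy illustration of the concept (plain Python with the standard library, not Airflow code): a DAG is just tasks plus dependency edges, and a valid run order is any topological ordering of them.

```python
from graphlib import TopologicalSorter  # Python 3.9+ standard library

# Hypothetical pipeline: two extract tasks feed a transform, which
# feeds a load. Each key maps a task to the tasks it depends on.
dag = {
    "transform": {"extract_a", "extract_b"},
    "load": {"transform"},
}

order = list(TopologicalSorter(dag).static_order())
# Both extracts run before the transform, and the load runs last.
print(order)
```

Airflow's scheduler does essentially this, plus retries, scheduling intervals, and distribution across workers.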

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;airflow.models.baseoperator&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BaseOperator&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;airflow.utils.decorators&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;apply_defaults&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;airflow.providers.trino.hooks.trino&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TrinoHook&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;typing&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Sequence&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Callable&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Optional&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;handler&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;fetchall&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TrinoCustomHook&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TrinoHook&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;autocommit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bool&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;False&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Optional&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;dict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;handler&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Optional&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Callable&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&quot;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;:sphinx-autoapi-skip:&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&quot;&quot;&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;super&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TrinoHook&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;autocommit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;autocommit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;handler&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;handler&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TrinoOperator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BaseOperator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;template_fields&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Sequence&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,)&lt;/span&gt;

    &lt;span class=&quot;nd&quot;&gt;@apply_defaults&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;trino_conn_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;**&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;kwargs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;nf&quot;&gt;super&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;**&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;kwargs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;trino_conn_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;trino_conn_id&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;task_instance&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;task&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Creating Trino connection&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;hook&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TrinoCustomHook&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;trino_conn_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;trino_conn_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;sql_statements&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;isinstance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql_statements&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql_statements&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;strip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;

            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Executing single sql statement&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
                &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hook&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;get_first&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Executing multiple sql statements&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
                &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hook&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;autocommit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;False&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;isinstance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql_statements&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sql_statement&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sql_statements&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;extend&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql_statement&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;strip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))))&lt;/span&gt;

            &lt;span class=&quot;n&quot;&gt;logging&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Executing multiple sql statements&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hook&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;autocommit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;False&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
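&lt;p&gt;The statement-splitting behavior in the execute method can be sketched in isolation. The following is an illustrative extraction, not part of the operator itself; the function name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;split_statements&lt;/code&gt; is hypothetical, since the operator does this inline:&lt;/p&gt;

```python
# Illustrative sketch of the splitting used in execute() above;
# split_statements is a hypothetical name, the operator inlines this logic.
def split_statements(sql: str) -> list:
    # Split on ';' and drop empty fragments, so a trailing semicolon
    # does not produce an empty statement.
    return list(filter(None, sql.strip().split(';')))

# One statement: the operator takes the hook.get_first() path.
print(split_statements("select count(1) from tpch.tiny.customer"))
# Two statements: the operator takes the hook.run() path.
# Note the fragments are not individually stripped of leading whitespace.
print(split_statements("set time zone 'UTC'; select now();"))
```

&lt;p&gt;Fragments after the first keep their leading space, which Trino accepts without complaint.&lt;/p&gt;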

&lt;h2 id=&quot;deploying-a-dag&quot;&gt;Deploying a DAG&lt;/h2&gt;

&lt;p&gt;Now that you have deployed the TrinoOperator, you can start writing DAGs for
your data pipelines. Let’s write and deploy a simple sample DAG. Just like the
TrinoOperator, DAGs are deployed into the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;airflow/dags&lt;/code&gt;
directory you created earlier.&lt;/p&gt;
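&lt;p&gt;Deployment amounts to placing the Python files in that directory. A minimal shell sketch, assuming the airflow home from earlier and using the file names from this post:&lt;/p&gt;

```shell
# Hypothetical layout sketch: operator and DAG files side by side in airflow/dags
mkdir -p airflow/dags
touch airflow/dags/trino_operator.py airflow/dags/my_first_trino_dag.py
ls airflow/dags
```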

&lt;p&gt;Create a file called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;my_first_trino_dag.py&lt;/code&gt; with the following code, and save it in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;airflow/dags&lt;/code&gt; directory.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pendulum&lt;/span&gt;

&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;airflow&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DAG&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;airflow.operators.python_operator&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PythonOperator&lt;/span&gt;

&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;trino_operator&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TrinoOperator&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;## This method is called by task2 (below) to retrieve and print to the logs the return value of task1
&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;print_command&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;**&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;kwargs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;task_instance&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kwargs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;task_instance&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
        &lt;span class=&quot;nf&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Return Value: &lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;task_instance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;xcom_pull&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;task_ids&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;task_1&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;return_value&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;DAG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;default_args&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;depends_on_past&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;False&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;dag_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;my_first_trino_dag&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;schedule_interval&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;0 8 * * *&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;start_date&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pendulum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;datetime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2022&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tz&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;US/Central&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;catchup&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;False&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;tags&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;example&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dag&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;## Task 1 runs a Trino select statement to count the number of records 
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;## in the tpch.tiny.customer table
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;task1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TrinoOperator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;task_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;task_1&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;trino_conn_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;trino_connection&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;select count(1) from tpch.tiny.customer&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;## Task 2 is a Python Operator that runs the print_command method above 
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;task2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;PythonOperator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;task_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;print_command&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;python_callable&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;print_command&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;provide_context&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;dag&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dag&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;## Task 3 demonstrates how you can use results from previous statements in new SQL statements
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;task3&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TrinoOperator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;task_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;task_3&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;trino_conn_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;trino_connection&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;select { { task_instance.xcom_pull(task_ids=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;task_1&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;,key=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;return_value&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;)[0] } }&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;## Task 4 demonstrates how you can run multiple statements in a single session.  
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;## Best practice is to run a single statement per task; however, statements that change session 
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;## settings must be run in a single task. The set time zone statements in this example will 
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;## not affect any future tasks, but the two now() functions would return timestamps for the 
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;## time zone set before they were run.
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;task4&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TrinoOperator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;task_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;task_4&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;trino_conn_id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;trino_connection&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;set time zone &lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;America/Chicago&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;; select now(); set time zone &lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;UTC&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt; ; select now()&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;## The following syntax determines the dependencies between all the DAG tasks.
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;## Task 1 will have to complete successfully before any other tasks run.
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;## Tasks 3 and 4 won&apos;t run until Task 2 completes.
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;## Tasks 3 and 4 can run in parallel if there are enough worker threads. 
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;task1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;task2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;task3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;task4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
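&lt;p&gt;To see why the templated SQL in task3 works, consider how the Jinja expression resolves once Airflow renders it. This is a plain-Python sketch, not the Airflow API: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xcom_store&lt;/code&gt; and the 1500 row count are hypothetical stand-ins for the real XCom backend and for the value returned by task_1.&lt;/p&gt;

```python
# Hypothetical stand-in for Airflow's XCom store; not the Airflow API.
# The (1500,) tuple mimics the single row returned by hook.get_first().
xcom_store = {('task_1', 'return_value'): (1500,)}

def xcom_pull(task_ids, key='return_value'):
    # Mimics TaskInstance.xcom_pull for this sketch only.
    return xcom_store[(task_ids, key)]

# Airflow renders "select {{ task_instance.xcom_pull(...)[0] }}" into a
# concrete SQL string before the operator runs it:
rendered_sql = "select {}".format(
    xcom_pull(task_ids='task_1', key='return_value')[0])
print(rendered_sql)  # select 1500
```

&lt;p&gt;Trino then executes the fully rendered statement; the templating happens entirely on the Airflow side.&lt;/p&gt;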

&lt;p&gt;Just like the TrinoOperator, DAGs are picked up and compiled by Airflow
automatically. When Airflow fails to compile your DAG, it displays an error
message at the top of the main page where all the DAGs are listed. You can
refresh this page a few times until your DAG is either added to the list or an
error message appears. You can expand the message to see the source of the
error; usually the information provided is enough to understand the issue.&lt;/p&gt;

&lt;p&gt;Once the DAG shows up in your list, you can trigger a manual run using the
play button on the right. I recommend switching to the Graph view, using the
action links on the right, to see how tasks change status as they run.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;75%&quot; src=&quot;/assets/blog/trino-airflow-blog/airflow-dag.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;You can see logs for each task by clicking on the corresponding box and selecting Log from the options at the top.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;60%&quot; src=&quot;/assets/blog/trino-airflow-blog/airflow-task.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Check out the logs for the print_command task to see the return value of the select statement from task_1.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;60%&quot; src=&quot;/assets/blog/trino-airflow-blog/airflow-logs.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;As you can see, output from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;print()&lt;/code&gt; commands can be found in these logs.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Apache Airflow has been around for many years now. It is used by many large
companies in production environments. The open source project has an active
community, and I expect that in the near future we will have an official
TrinoHook with additional out-of-the-box functionality. While there might be a
slight learning curve for new users, I think it is worth it.&lt;/p&gt;

&lt;p&gt;On the Trino side there are some exciting enhancements for &lt;a href=&quot;/docs/current/admin/fault-tolerant-execution.html&quot;&gt;fault-tolerant
execution&lt;/a&gt; on
the roadmap of Project Tardigrade that will make Trino and Airflow an even
better combination.&lt;/p&gt;

&lt;p&gt;Stay tuned.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note from Trino community&lt;/em&gt;: We welcome blog submissions from the community. If
you have blog ideas, send a message in the #dev chat. We will mail you
Trino swag as a token of appreciation for successful submissions. Enter the &lt;a href=&quot;https://join.slack.com/t/trinodb/shared_invite/zt-1aek3l6bn-ZMsvFZJqP1ULx5pU17WP1Q&quot;&gt;Trino
Slack&lt;/a&gt;
and join the conversation in the #project-tardigrade
&lt;a href=&quot;https://join.slack.com/share/enQtMzc3OTczMzkxNDU0OC1mNzEyOWUzNjUyMTgyNDU3ZGJlYTZjYTllYTI1ZmFhMDBlMzYwZWQzOGVkMjhhOGNlMmQ5MWIxM2RmNzZjNWY0&quot;&gt;channel&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://cutt.ly/airflow-reddit&quot;&gt;Discuss on Reddit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://news.ycombinator.com/item?id=32100426&quot;&gt;Discuss On Hacker News&lt;/a&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Willie Valdez</name>
        </author>
      

      <summary>The recent addition of the fault-tolerant execution architecture, delivered to Trino by Project Tardigrade, makes the use of Trino for running your ETL workloads an even more compelling alternative than ever before. We’ve set up a demo environment for you to easily give it a try in Starburst Galaxy.</summary>

      
      
    </entry>
  
    <entry>
      <title>Announcing the 2022 Trino Summit</title>
      <link href="https://trino.io/blog/2022/06/30/trino-summit-call-for-speakers.html" rel="alternate" type="text/html" title="Announcing the 2022 Trino Summit" />
      <published>2022-06-30T00:00:00+00:00</published>
      <updated>2022-06-30T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/06/30/trino-summit-call-for-speakers</id>
      <content type="html" xml:base="https://trino.io/blog/2022/06/30/trino-summit-call-for-speakers.html">&lt;p&gt;We are pleased to announce the upcoming 2022 Trino Summit. The summit is
scheduled as a &lt;em&gt;hybrid&lt;/em&gt; event on the 10th of November 2022, and attendance is
free! You will be able to join us online, or you can make the trip to San
Francisco and meet us at the Commonwealth Club on the downtown waterfront.
Please be aware that spots at the live event are limited, so register soon if
you want to attend. Please also be aware that you need to register regardless of
whether you’ll be joining us in-person or online.&lt;/p&gt;

&lt;div class=&quot;card-deck spacer-30&quot;&gt;
    &lt;a class=&quot;btn btn-pink&quot; href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;
        Register to attend
    &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;Starburst is the lead sponsor for the summit, but they welcome other sponsors to
help make this a successful event for the Trino community. If that interests you
or your employer, you should &lt;a href=&quot;mailto:events@starburst.io&quot;&gt;contact the Starburst team for more information.&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;If you’d like to share your knowledge and information about Trino usage and give
a talk at this year’s Trino Summit, we’re putting out a call for speakers. We
will be accepting submissions from now until September 15th, but we recommend
submitting soon, because slots are filling up fast.&lt;/p&gt;

&lt;p&gt;We’re looking for intermediate to advanced-level talks on a variety of themes.
If you have an interesting story about how you were able to leverage Trino,
found a neat way to extend it with a custom plugin, or swapped to Trino for a
performance win, we’d love to hear about it. We’re excited to expand our speaker
lineup with talks from the broader Trino community. If you’re interested, you
can check out the speaker registration page for more information.&lt;/p&gt;

&lt;p&gt;And of course, we’re looking forward to seeing you there, whether in-person or
online!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Update from 15th September 2022:&lt;/em&gt; The call for speakers is closed. Thank you
for all your submissions.&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>We are pleased to announce the upcoming 2022 Trino Summit. The summit is scheduled as a hybrid event on the 10th of November 2022, and attendance is free! You will be able to join us online, or you can make the trip to San Francisco and meet us at the Commonwealth Club on the downtown waterfront. Please be aware that spots at the live event are limited, so register soon if you want to attend. Please also be aware that you need to register regardless of whether you’ll be joining us in-person or online. Register to attend Starburst is the lead sponsor for the summit, but they welcome other sponsors to help make this a successful event for the Trino community. If that interests you or your employer, you should contact the Starburst team for more information.</summary>

      
      
    </entry>
  
    <entry>
      <title>Using Trino as a batch processing engine</title>
      <link href="https://trino.io/blog/2022/06/24/trino-meetup-extract-trino-load.html" rel="alternate" type="text/html" title="Using Trino as a batch processing engine" />
      <published>2022-06-24T00:00:00+00:00</published>
      <updated>2022-06-24T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/06/24/trino-meetup-extract-trino-load</id>
      <content type="html" xml:base="https://trino.io/blog/2022/06/24/trino-meetup-extract-trino-load.html">&lt;p&gt;This past week, &lt;a href=&quot;https://github.com/arhimondr&quot;&gt;Andrii Rosa&lt;/a&gt; hosted a virtual
Trino meetup on the topic of using Trino as a batch processing engine. You can
view the talk from the meetup embedded below. Andrii dives into the history of
Trino as an engine for batch ETL (extract, transform, load) processing, some
challenges related to that, as well as the new fault-tolerant execution
capabilities being added to Trino and how they improve it for batch ETL use
cases.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;
&lt;iframe width=&quot;560&quot; height=&quot;400&quot; src=&quot;https://www.youtube.com/embed/2Ywqbz4T-Sw?t=1116&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;
&lt;/iframe&gt;
&lt;div class=&quot;spacer-30&quot;&gt;&lt;/div&gt;

&lt;p&gt;Andrii also gives an update on the work in progress with fault-tolerant
execution, where we are today, and what’s planned for the near future. The
meetup wraps up with an attendee Q&amp;amp;A. If you’d like to learn more,
go check out the talk!&lt;/p&gt;</content>

      
        <author>
          <name>Cole Bowden</name>
        </author>
      

      <summary>This past week, Andrii Rosa hosted a virtual Trino meetup on the topic of using Trino as a batch processing engine. You can view the talk from the meetup embedded below. Andrii dives into the history of Trino as an engine for batch ETL (extract, transform, load) processing, some challenges related to that, as well as the new fault-tolerant execution capabilities being added to Trino and how they improve it for batch ETL use cases.</summary>

      
      
    </entry>
  
    <entry>
      <title>37: Trino powers up the community support</title>
      <link href="https://trino.io/episodes/37.html" rel="alternate" type="text/html" title="37: Trino powers up the community support" />
      <published>2022-06-16T00:00:00+00:00</published>
      <updated>2022-06-16T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/37</id>
      <content type="html" xml:base="https://trino.io/episodes/37.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;p&gt;In this episode we have the pleasure of chatting with our colleagues, who now
make the Trino community better every day:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/cole-m-bowden/&quot;&gt;Cole Bowden&lt;/a&gt;, Developer Advocate at Starburst&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/n1neinchnick&quot;&gt;Jan Waś&lt;/a&gt;, Software Engineer at Starburst&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/KostasPardalis&quot;&gt;Kostas Pardalis&lt;/a&gt;, Group Product Manager at Starburst&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/Moni4489&quot;&gt;Monica Miller&lt;/a&gt;, Developer Advocate at Starburst&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-382-to-386&quot;&gt;Releases 382 to 386&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-382.html&quot;&gt;Trino 382&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for reading wildcard tables in the BigQuery connector.&lt;/li&gt;
  &lt;li&gt;Support for adding columns in the Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Support updating Iceberg table partitioning.&lt;/li&gt;
  &lt;li&gt;Improved &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; performance in the MySQL, Oracle, and PostgreSQL connectors.&lt;/li&gt;
  &lt;li&gt;Basic authentication in the Prometheus connector.&lt;/li&gt;
  &lt;li&gt;Exchange spooling on Google Cloud Storage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-383.html&quot;&gt;Trino 383&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json_exists&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json_query&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json_value&lt;/code&gt; functions.&lt;/li&gt;
  &lt;li&gt;Support for table comments in the Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Support IAM roles for exchange spooling on S3.&lt;/li&gt;
  &lt;li&gt;Improved performance for aggregation queries.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-384.html&quot;&gt;Trino 384&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for the new pass-through query table function for Druid, MariaDB, MySQL,
Oracle, PostgreSQL, Redshift, SingleStore, and SQL Server.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-385.html&quot;&gt;Trino 385&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json_array&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json_object&lt;/code&gt; functions.&lt;/li&gt;
  &lt;li&gt;Support for time travel syntax in the Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timestamp(p)&lt;/code&gt; type in MariaDB connector.&lt;/li&gt;
  &lt;li&gt;Performance improvements in Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;
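
&lt;p&gt;As a quick illustration of the new time travel syntax, queries can read an
Iceberg table as of a past snapshot ID or timestamp. This is a minimal sketch;
the table name, snapshot ID, and timestamp below are made-up placeholders:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/**
 * Read a hypothetical Iceberg table as of a specific snapshot ID.
 */
SELECT *
FROM iceberg.logging.logs FOR VERSION AS OF 4256368211521163325;

/**
 * Read the same table as of a point in time.
 */
SELECT *
FROM iceberg.logging.logs FOR TIMESTAMP AS OF TIMESTAMP &apos;2022-06-01 00:00:00 UTC&apos;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The snapshot IDs available for a table can be listed by querying its
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;logs$snapshots&quot;&lt;/code&gt; metadata table.&lt;/p&gt;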

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-386.html&quot;&gt;Trino 386&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved performance for fault-tolerant query execution.&lt;/li&gt;
  &lt;li&gt;Faster queries on Delta Lake.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional highlights worth a mention according to Manfred:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;383 had a regression, don’t use it.&lt;/li&gt;
  &lt;li&gt;As mentioned last time, exchange spooling is now supported on the three major
cloud object storage systems.&lt;/li&gt;
  &lt;li&gt;Query pass-through table function is a massive feature. We are adding this to
other connectors, and more details are coming in a future special episode.&lt;/li&gt;
  &lt;li&gt;Special props to &lt;a href=&quot;https://github.com/kasiafi&quot;&gt;Kasia&lt;/a&gt; for all the new JSON functions.&lt;/li&gt;
  &lt;li&gt;Phoenix 4 support is gone.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-382.html&quot;&gt;Trino 382&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-383.html&quot;&gt;Trino 383&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-384.html&quot;&gt;Trino 384&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-385.html&quot;&gt;Trino 385&lt;/a&gt;,
and
&lt;a href=&quot;https://trino.io/docs/current/release/release-386.html&quot;&gt;Trino 386&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-how-to-strengthen-the-trino-community&quot;&gt;Concept of the episode: How to strengthen the Trino community&lt;/h2&gt;

&lt;p&gt;What is community, and why has this word seen more use around technical projects,
particularly those in the open-source space? There’s really no formal definition
of community in the context of technology. David Spinks, author of the book
“The Business of Belonging”, defines community as:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;A group of people who feel a shared sense of belonging.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For technical projects, this sense of belonging generally comes from the shared
affinity towards a specific product, like Trino, or it could be a brand that
hosts many products, like Google or Microsoft. There’s a lot that could be
discussed here regarding why communities have become an essential ingredient in
a project’s success. The quick answer I like to offer is that projects,
open-source or proprietary, that have strong communities behind them
innovate and grow faster, and are more successful overall.&lt;/p&gt;

&lt;p&gt;As such, the Trino Software Foundation (TSF) recognizes that Trino will only be
as successful as the health of the community that builds, tests, uses, and 
shares it. The activities around building a technical community fall between
engineering, marketing, and customer enablement. A common name that encompasses
the individuals who work in this space is developer relations, or DevRel for
short. The goal of our work with the maintainers, contributors, users, and all
other members of the community is the following:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Grow all aspects of the Trino project and the Trino community to empower
current and future members of the community.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We introduce some new faces who are stewards on our journey to grow the
adoption of our favorite query engine, explain what each of them does, and show
how their work impacts you as a community member! Most importantly, you can
learn how to get involved and help us figure out how best to navigate ideas,
issues, or any other contributions you may have that help Trino be the best
query engine.&lt;/p&gt;

&lt;h3 id=&quot;improving-the-onboarding-and-getting-started-pages&quot;&gt;Improving the onboarding and getting started pages&lt;/h3&gt;

&lt;p&gt;We don’t really have a seamless onboarding experience for new users. Many
members have asked where to get started. One logical place people tend to go
when browsing the front page of the Trino site is the
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;getting started tab&lt;/a&gt;, which is ironically
still on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino.io/download.html&lt;/code&gt; page. This page primarily contains
the latest binary downloads, some community links, and links to books and other
reading material.&lt;/p&gt;

&lt;p&gt;The main thing you don’t really see is much getting started material. A lot of
the material is intermediate level at best. There are not many beginner-level
guides to offer the self-service onboarding many are looking for when they just
want to play around without having to wait for anyone to respond. Brian and
Monica have started some work in this area to make the onboarding simpler.&lt;/p&gt;

&lt;p&gt;A very common piece of self-service getting started material is the
&lt;a href=&quot;https://github.com/bitsondatadev/trino-getting-started&quot;&gt;trino-getting-started&lt;/a&gt;
repo that Brian created to host demonstrations for the broadcast and to show off
new features or connector capabilities. It has been a good way to offer
newcomers a simple environment to get started. However, the only way to find
this repository is to ask someone first. It would be ideal to showcase
getting started materials as part of the default experience of learning about
Trino.&lt;/p&gt;

&lt;p&gt;Monica is now working on building up some demos using SaaS products like
Starburst Galaxy as another method of using Trino without needing to install
Docker or use any of your own hardware to run through some examples.
These options are typically more UI driven and much more approachable for
members of the community who aren’t engineers or administrators.&lt;/p&gt;

&lt;h3 id=&quot;release-process&quot;&gt;Release process&lt;/h3&gt;

&lt;h4 id=&quot;filling-out-a-pull-request&quot;&gt;Filling out a pull request&lt;/h4&gt;

&lt;p&gt;We’ve got a handy PR template that exists for all contributors to use when
they’re submitting a pull request to Trino. Most of it is simple and
self-explanatory. We ask you to describe what’s happening, where the change is
happening, and what type of change it is. These are for the sake of the
reviewers, giving them important context so they understand what’s going on
when they review the code. For simpler changes, it’s not usually necessary to go
into a ton of detail here, but it’s nice to give a little summary for anyone looking at the PR.&lt;/p&gt;

&lt;p&gt;The next steps are what really matter for every single PR that’s going to be
merged - the documentation and release notes for a change. These are about
communicating to our users. Documentation refers to Trino docs, not code
comments. If Trino users need to be told how to use the feature you’re
changing because of how you’re changing it, that means we need to have
documentation for it. The PR template gives the options for how to go about
this, but it’s incredibly helpful to have this filled out. Similarly, we ask
whether or not release notes are necessary for the change, and what release
notes you propose for your change. Generally speaking, if it needs to be
documented, it almost always should have a release note. Even if it isn’t
documented, a release note is often a good idea - things like performance
improvements don’t require our users to change how they use Trino, but they
won’t mind knowing that something has gotten better! The release process
involves heavy editing of release notes, so it’s ok for the suggested note to be
imperfect.&lt;/p&gt;

&lt;h3 id=&quot;what-is-developer-experience-devex&quot;&gt;What is developer experience (DevEx)?&lt;/h3&gt;

&lt;p&gt;Trino is a technology that is built by developers, but also heavily used by 
developers. We want to ensure that the experience of both contributors and users
of Trino is the best possible. To do that, we have to focus on many different
aspects of this experience, from committing code to the CLIs and tools we offer
for debugging queries, and most importantly to building a sustainable community
that can give answers and drive the future of the project. This is what DevEx
means for Trino.&lt;/p&gt;

&lt;h3 id=&quot;community-metrics&quot;&gt;Community metrics&lt;/h3&gt;

&lt;p&gt;A while ago we started gathering metrics related to &lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;the Trino GitHub repository&lt;/a&gt;.
This helped us identify issues like huge CI queue times. Most importantly, we can verify
that the changes we made improved things, and by how much.&lt;/p&gt;

&lt;p&gt;In February this year, the 95th percentile of the CI queue time (not even the
total run time!) was almost 7 hours. Trino uses public GitHub runners, and only
60 jobs can run concurrently. This is a bottleneck because Trino has extensive
test coverage for the core engine, all connectors, and other plugins. Because we
can’t increase the number of runners, we looked into doing impact analysis to
skip tests for modules not impacted by any change in a pull request.&lt;/p&gt;

&lt;p&gt;Since April, the 95th percentile of the CI queue time is under 1 hour, even 
though the number of contributions is at an all-time high.&lt;/p&gt;

&lt;p&gt;We keep track of these selected metrics in reports we create by running queries
using the Trino CLI, saving the results in a markdown file, and publishing them
as static pages using GitHub Pages. The data is gathered using
Trino connectors for the GitHub API and Git repositories. A GitHub Actions
workflow runs on a schedule and spins up a Trino server, so there’s no
infrastructure to maintain, except for a single S3 bucket. All of it is publicly
available in the &lt;a href=&quot;https://github.com/nineinchnick/trino-cicd&quot;&gt;nineinchnick/trino-cicd&lt;/a&gt;
repository, which links to the GitHub Pages site with the reports.&lt;/p&gt;

&lt;p&gt;We continue to add more reports, like tracking flaky tests or pull request 
activity:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://nineinchnick.github.io/trino-cicd/reports/flaky/&quot;&gt;Flaky tests&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://nineinchnick.github.io/trino-cicd/reports/pr/&quot;&gt;Pull request activity&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By being data-driven and transparent, we make sure to provide a good
experience for everyone, and this also helps us figure out where to focus more
resources.&lt;/p&gt;

&lt;p&gt;We’re open to suggestions on what to track and which metrics to report on, so 
feel free to open issues and pull requests in the repository mentioned above, or
start a thread on the Trino Slack.&lt;/p&gt;

&lt;h3 id=&quot;pull-request-triage&quot;&gt;Pull request triage&lt;/h3&gt;

&lt;p&gt;One of the things we’ve been tracking over the last couple of weeks has been the
state of incoming PRs. We want to make sure that
each PR reaches a maintainer, and that they all receive timely feedback after
asking for a review. The goal in looking into this process is to help
streamline and improve the time-to-initial-comment. The pleasant discovery
is that it doesn’t seem like we have a lot of room to improve here. Not
to pat ourselves on the back too heavily, but PRs find their way to maintainers
and get an initial review quite quickly, so there’s little work to be done on
that front.&lt;/p&gt;

&lt;p&gt;Our next exploration is tracking PRs that don’t quickly get
approved and merged, monitoring their life cycle, and making sure follow-up
reviews happen in a timely manner as well. We now know that we are
effective at giving initial feedback on a PR, but we also want to make sure that
these PRs aren’t falling off a cliff or turning into a long, drawn-out process
where each development iteration is slower than the last.&lt;/p&gt;

&lt;h2 id=&quot;pull-requests-of-the-episode-pr-12259-support-updating-iceberg-table-partitioning&quot;&gt;Pull requests of the episode: PR 12259: Support updating Iceberg table partitioning&lt;/h2&gt;

&lt;p&gt;This month’s &lt;a href=&quot;https://github.com/trinodb/trino/issues/12259&quot;&gt;PR of the episode&lt;/a&gt;
was contributed by &lt;a href=&quot;https://github.com/alexjo2144&quot;&gt;alexjo2144&lt;/a&gt;. This feature is
an exciting update on the ability to modify the partition specification of a
table in Iceberg. This is an update since Brian
&lt;a href=&quot;/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html&quot;&gt;wrote about this feature&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;At the time of writing, Trino is able to perform reads from tables that have 
multiple partition spec changes but partition evolution write support does not
yet exist.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This brings us much closer to feature parity with other query engines for
managing Iceberg tables entirely through Trino. Thanks to our friend
&lt;a href=&quot;https://github.com/findinpath&quot;&gt;Marius Grama &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;findinpath&lt;/code&gt;&lt;/a&gt; for the review.&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-iceberg-table-partition-migrations&quot;&gt;Demo of the episode: Iceberg table partition migrations&lt;/h2&gt;

&lt;p&gt;For this episode’s demo, you’ll need a local Trino coordinator, MinIO instance,
and Hive metastore backed by a database. Clone the 
&lt;a href=&quot;https://github.com/bitsondatadev/trino-getting-started&quot;&gt;trino-getting-started&lt;/a&gt; 
repository and navigate to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg/trino-iceberg-minio&lt;/code&gt; directory. Then 
start up the containers using Docker Compose.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-getting-started.git

cd trino-getting-started/iceberg/trino-iceberg-minio

docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This demo is actually very similar to a demo we did in 
&lt;a href=&quot;/episodes/15.html&quot;&gt;episode 15&lt;/a&gt;, except now we get to showcase one of Iceberg’s
most exciting features, partition evolution.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/**
 * Make sure to first create a bucket named &quot;logging&quot; in MinIO before running
 */

CREATE SCHEMA iceberg.logging
WITH (location = &apos;s3a://logging/&apos;);

CREATE TABLE iceberg.logging.logs (
   level varchar NOT NULL,
   event_time timestamp(6) with time zone NOT NULL,
   message varchar NOT NULL,
   call_stack array(varchar)
)
WITH (
   format = &apos;ORC&apos;,
   partitioning = ARRAY[&apos;day(event_time)&apos;]
);

/**
 * Inserting two records. Notice event_time is on the same day but different hours.
 */

INSERT INTO iceberg.logging.logs VALUES 
(
  &apos;ERROR&apos;, 
  timestamp &apos;2021-04-01 12:23:53.383345&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;1 message&apos;,
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
),
(
  &apos;ERROR&apos;, 
  timestamp &apos;2021-04-01 13:36:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;2 message&apos;, 
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
);

SELECT * FROM iceberg.logging.logs;
SELECT * FROM iceberg.logging.&quot;logs$partitions&quot;;

/**
 * Notice one partition was created for both records at the day granularity.
 */

/**
 * Update the partitioning from daily to hourly 🎉
 */
ALTER TABLE iceberg.logging.logs 
SET PROPERTIES partitioning = ARRAY[&apos;hour(event_time)&apos;];

/**
 * Inserting three records. Notice event_time is on the same day but different hours.
 */
INSERT INTO iceberg.logging.logs VALUES 
(
  &apos;ERROR&apos;, 
  timestamp &apos;2021-04-01 15:55:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;3 message&apos;, 
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
), 
(
  &apos;WARN&apos;, 
  timestamp &apos;2021-04-01 15:55:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;4 message&apos;, 
  ARRAY [&apos;bad things could be happening&apos;]
), 
(
  &apos;WARN&apos;, 
  timestamp &apos;2021-04-01 16:55:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;5 message&apos;, 
  ARRAY [&apos;bad things could be happening&apos;]
);

SELECT * FROM iceberg.logging.logs;
SELECT * FROM iceberg.logging.&quot;logs$partitions&quot;;

/**
 * Now there are three partitions:
 * 1) One partition at the day granularity containing our original records.
 * 2) One at the hour granularity for hour 15 containing two new records.
 * 3) One at the hour granularity for hour 16 containing the last new record.
 */

SELECT * FROM iceberg.logging.logs 
WHERE event_time &amp;lt; timestamp &apos;2021-04-01 16:55:23&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;;

/**
 * This query correctly returns 4 records with only the first two partitions
 * being touched. 
 */

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There have been a lot of cool things going into the Iceberg connector these days,
and another exciting one that came out in release 381 was the support for
&lt;a href=&quot;https://github.com/trinodb/trino/pull/12026&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; in Iceberg&lt;/a&gt;. So we’re
going to showcase that:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/**
 * Update
 */
UPDATE
  iceberg.logging.logs
SET
  call_stack = call_stack || &apos;WHALE HELLO THERE!&apos;
WHERE
  lower(level) = &apos;warn&apos;;

DROP TABLE iceberg.logging.logs;

DROP SCHEMA iceberg.logging;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;question-of-the-episode-can-i-force-a-pushdown-join-into-a-connected-data-source&quot;&gt;Question of the episode: Can I force a pushdown join into a connected data source?&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://www.trinoforum.org/t/forcing-push-down-join-into-connected-data-source/177&quot;&gt;Full question from Trino Forum&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Is there a way to “quote” a subquery, to tell the Trino planner to just push
down the query and not bother making a sub plan?&lt;/p&gt;

&lt;p&gt;I have a star schema, with one huge table (&amp;gt;100M rows) and a dimension table
that has static attributes of the huge table.
The dimension table is filtered to create a map that is joined to the huge
table. The result is grouped by a dimension, and finally some of the metrics
from the huge table are aggregated to calculate stats.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Answer:&lt;/em&gt; We’ve recently introduced Polymorphic Table Functions to Trino in 
version 381.&lt;/p&gt;

&lt;p&gt;In version 384, which was just released a few days ago, the query table function
was added in PR 12325.&lt;/p&gt;

&lt;p&gt;For a quick example in MySQL:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino&amp;gt; USE mysql.tiny;
USE
trino:tiny&amp;gt; SELECT * FROM TABLE(system.query(query =&amp;gt; &apos;SELECT 1 a&apos;));
a
---
1
(1 row)

trino:tiny&amp;gt; SELECT * FROM TABLE(system.query(query =&amp;gt; &apos;SELECT @@version&apos;));
@@version
-----------
8.0.29
(1 row)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This runs the command verbatim on the underlying database (not exactly a
pushdown, but a pass-through) and returns the results to Trino as a table.
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT @@version&lt;/code&gt; is MySQL-specific syntax; its output comes back as a table
that Trino can then process further.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>Building A Modern Data Stack for QazAI</title>
      <link href="https://trino.io/blog/2022/06/08/building-a-modern-data-stack-for-qaz-ai.html" rel="alternate" type="text/html" title="Building A Modern Data Stack for QazAI" />
      <published>2022-06-08T00:00:00+00:00</published>
      <updated>2022-06-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/06/08/building-a-modern-data-stack-for-qaz-ai</id>
      <content type="html" xml:base="https://trino.io/blog/2022/06/08/building-a-modern-data-stack-for-qaz-ai.html">&lt;p&gt;At QazAI, we build data lakes as a service for companies.  In the original
architecture, we get raw data in S3, transform the S3 data with Hive, and then
delivered the data to business units via our datamart built on Clickhouse (for optimal delivery speeds). Over time, we were dragged down by the slower speeds and high costs of running Hive, and started shopping for a faster and cheaper open source engine to do our ETL data transformations.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;100%&quot; src=&quot;/assets/blog/qaz-ai-modern-data-stack/old-architecture.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;This diagram shows our existing stack. The big problem to solve was that the
Hadoop cluster was extremely inefficient. This led to slow queries and up
to 10x higher costs.&lt;/p&gt;

&lt;p&gt;Like many others, I was initially drawn to Trino to run analytics over Hive
tables because of its speed, but found many other advantages as well. Key among
them are the following characteristics.&lt;/p&gt;

&lt;h2 id=&quot;speed&quot;&gt;Speed&lt;/h2&gt;

&lt;p&gt;Queries ran 10 to 100 times faster compared to our old stack. It was fantastic,
simply beyond our expectations.&lt;/p&gt;

&lt;h2 id=&quot;standard-sql&quot;&gt;Standard SQL&lt;/h2&gt;

&lt;p&gt;Trino speaks a standard SQL dialect that everyone already knew. Data analysts
loved getting to use a dialect they were already familiar with.&lt;/p&gt;

&lt;h2 id=&quot;federated-analytics&quot;&gt;Federated analytics&lt;/h2&gt;

&lt;p&gt;Trino has the ability to connect with other databases and run federated
queries. After I had connected all the available data sources, I showed the
results to the data analysts. They were simply amazed, and some were shocked
when a join between tables from various databases completed successfully. To
emphasize: this saved days of work. You could join data from other data
sources straight away, avoiding the need to create a staging layer in the data
warehouse.&lt;/p&gt;

&lt;h2 id=&quot;simplicity-of-setup&quot;&gt;Simplicity of setup&lt;/h2&gt;

&lt;p&gt;Trino just works out of the box. This is what makes it great. As open source
users, we’re used to going through a complicated software setup process. But
with Trino, there’s no need to deploy anything else. You simply install packages
from the open source repository, and things work. It’s magical. To top that off,
Trino feels like a commercial product with its detailed documentation and active
Slack community that is willing to help you out on everything.&lt;/p&gt;

&lt;h2 id=&quot;exploring-trino-as-an-option-for-etl&quot;&gt;Exploring Trino as an option for ETL&lt;/h2&gt;

&lt;p&gt;A great number of connectors, standard SQL, high processing speed - all these
advantages raise an obvious question: ‘Why not use Trino for ETL processes as
well?’&lt;/p&gt;

&lt;p&gt;At QazAI, the key blocker to using Trino for ETL was that Trino lacked fault
tolerance. As a result, our pipelines did not have reliable landing times and
required a lot of manual monitoring.&lt;/p&gt;

&lt;p&gt;This is precisely what made Project Tardigrade so exciting for us. Proving that
Trino is indeed a true community-driven project, community members embarked on
Project Tardigrade to bring fault-tolerant execution to Trino. Its main feature
is the ability to divide a query into phases and restart only the failed phases.
We’ve been running tests to explore this: an ETL pipeline on Trino running on 5
bare-metal nodes is 20 times faster than the equivalent ETL on our old stack of
Sqoop, HDFS, Hive, and custom Python scripts.&lt;/p&gt;
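&lt;p&gt;Fault-tolerant execution is switched on through configuration rather than SQL.
Here is a minimal sketch, assuming intermediate data is spooled to a hypothetical
S3 bucket named spooling-bucket; property names can change between Trino
versions, so check the documentation for your release:&lt;/p&gt;

```properties
# config.properties: retry individual tasks of a query when they fail
retry-policy=TASK

# exchange-manager.properties: spool intermediate exchange data durably
exchange-manager.name=filesystem
exchange.base-directories=s3://spooling-bucket
```

&lt;p&gt;With task-level retries, a long ETL query no longer has to restart from scratch
when a single worker fails.&lt;/p&gt;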

&lt;h2 id=&quot;testing-trino-for-etl&quot;&gt;Testing Trino for ETL&lt;/h2&gt;

&lt;p&gt;Let’s play a bit with the well-known DVD rental sample database.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;75%&quot; src=&quot;/assets/blog/qaz-ai-modern-data-stack/rentaldb-schema.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;For instance, we create the database shown above in PostgreSQL and work with the &lt;em&gt;rental&lt;/em&gt; table.&lt;/p&gt;

&lt;p&gt;First, we move the table from PostgreSQL to our warehouse in HDFS and Hive.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dvd_rental&lt;/span&gt;  
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;PARQUET&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; 
	&lt;span class=&quot;n&quot;&gt;rental_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rental_date&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rental_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;inventory_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;return_date&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;return_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;staff_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;staff_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgresqldvd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rental&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now we perform the same operation, but this time into an Iceberg table on S3 with hidden partitioning.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iceberg2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dvd_rental&lt;/span&gt;  
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partitioning&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ARRAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;month(rental_date)&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;bucket(inventory_id, 10)&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;PARQUET&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; 
	&lt;span class=&quot;n&quot;&gt;rental_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rental_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;inventory_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;return_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;staff_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;staff_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgresqldvd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rental&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Great. What if there is a need to enrich the data with the employees’ and
customers’ names? To do this, we move the required tables to the core layer and
then apply denormalization.&lt;/p&gt;

&lt;p&gt;Here we move the dimension tables.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dvd_staff&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;PARQUET&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; 
	&lt;span class=&quot;n&quot;&gt;staff_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;first_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;last_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;address_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;address_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;email&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;store_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;store_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;active&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;username&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;password&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;picture&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgresqldvd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;staff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dvd_customer&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;PARQUET&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; 
	&lt;span class=&quot;n&quot;&gt;customer_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;store_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;store_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;first_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;last_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;email&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;address_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;address_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;activebool&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;create_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;active&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgresqldvd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Let’s join the staff and customer tables to the rental table.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dvd_core_rental&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;PARQUET&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rental_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rental_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;inventory_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;cst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;first_name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;--cast(customer_id as integer) as customer_id,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;cst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last_name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer_lastname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;return_date&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;return_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;stf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;first_name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;staff_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;--cast(staff_id as integer) as staff_id,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;stf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last_name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;staff_lastname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rnt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dvd_rental&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rnt&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LEFT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dvd_customer&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cst&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rnt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer_id&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LEFT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dvd_staff&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stf&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rnt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;staff_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;staff_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If this table is required by data analysts, we can easily move it to the data mart (the ClickHouse layer we use to deliver data to end users).&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;clickhouse&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;default&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rental_analysis_table&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rental_id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rental_date&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;inventory_id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;customer_name&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
	&lt;span class=&quot;n&quot;&gt;customer_lastname&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;return_date&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;staff_name&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;staff_lastname&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;date&lt;/span&gt;   
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;engine&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;MergeTree&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;order_by&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ARRAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;customer_name&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;customer_lastname&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;A simple insert/select query is all it takes.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;clickhouse&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;default&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rental_analysis_table&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dvd_core_rental&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
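&lt;p&gt;After the load, a quick sanity check compares row counts between the Hive
source and the ClickHouse target, using the same catalog and table names as
above:&lt;/p&gt;

```sql
-- Both counts should match if the insert completed successfully.
SELECT
    (SELECT count(*) FROM hive.test.dvd_core_rental) AS source_rows,
    (SELECT count(*) FROM clickhouse.default.rental_analysis_table) AS target_rows
```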

&lt;p&gt;Alternatively, we can move the data mart to ClickHouse directly from PostgreSQL, without intermediate data layers.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;clickhouse&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;default&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rental_analysis_table&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rental_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rental_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;inventory_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;cst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;first_name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
	&lt;span class=&quot;n&quot;&gt;cst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last_name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer_lastname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;cast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;return_date&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;return_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;stf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;first_name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;staff_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
	&lt;span class=&quot;n&quot;&gt;stf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last_name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;staff_lastname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rnt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;last_update&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgresqldvd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rental&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rnt&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LEFT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgresqldvd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cst&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rnt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer_id&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LEFT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgresqldvd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;staff&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stf&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rnt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;staff_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;staff_id&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Great.&lt;/p&gt;

&lt;p&gt;One may object that this sample dataset is small, with only 16,000 rows,
while production ETL mostly runs over huge tables containing millions or
billions of rows. Let’s test that. We work with the &lt;em&gt;tpch&lt;/em&gt; catalog at
scale factor 3000.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;75%&quot; src=&quot;/assets/blog/qaz-ai-modern-data-stack/tpch-schema.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;For testing, we consider three tables: &lt;em&gt;lineitem&lt;/em&gt; (18 billion rows),
&lt;em&gt;orders&lt;/em&gt; (450 million rows) and &lt;em&gt;partsupp&lt;/em&gt; (2.4 billion rows).&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iceberg2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch_sf3000_orders&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- (450 M)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;ORC&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tpch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sf3000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iceberg2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch_sf3000_lineitem&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- (18 B)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;ORC&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tpch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sf3000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lineitem&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iceberg2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch_sf3000_partsupp&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- (2.4 B)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;ORC&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tpch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sf3000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partsupp&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then, we join all three of these tables as shown in the ER diagram.
Let’s make it more challenging by turning off one of the workers mid-query,
which would normally result in a query failure. To enable automatic retries of
failed queries, we set &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;retry-policy=QUERY&lt;/code&gt; in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.properties&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iceberg2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch_sf3000_lineitem_joined&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;ORC&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orderkey&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partkey&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;suppkey&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;linenumber&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;quantity&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;extendedprice&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;discount&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tax&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;returnflag&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;linestatus&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shipdate&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;commitdate&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;receiptdate&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shipinstruct&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shipmode&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;comment&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;psupp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;availqty&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;psupp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;supplycost&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;ord&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shippriority&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;ord&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;totalprice&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iceberg2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch_sf3000_lineitem&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LEFT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iceberg2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch_sf3000_partsupp&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;psupp&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;psupp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partkey&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;suppkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;psupp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;suppkey&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LEFT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iceberg2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch_sf3000_orders&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ord&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;litem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orderkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ord&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orderkey&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The query completed in 4 hours. During processing, worker 22 was turned
off; the query was automatically started over and completed successfully. The
query joined all three tables (&lt;em&gt;the triple join&lt;/em&gt;): 18 billion rows x
2.4 billion rows x 450 million rows.&lt;/p&gt;

&lt;p&gt;This experiment gave us the confidence to move forward with our plans to
rebuild our architecture around Trino, performing analytical and
transformational workloads directly on data in S3, which allows us to remove
HDFS and Hive from these processes.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;100%&quot; src=&quot;/assets/blog/qaz-ai-modern-data-stack/new-architecture.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;As a result, we will achieve faster pipelines.&lt;/p&gt;

&lt;p&gt;A huge thanks to the Trino development team and the Trino community for an
excellent product, which I enjoy using and which lets me go beyond conventional
usage patterns.&lt;/p&gt;

&lt;p&gt;If you are looking for help building your data warehouse, or if you’re
interested in joining us at QazAI, feel free to reach out to me, Baurzhan Kuspayev, on the &lt;a href=&quot;https://join.slack.com/t/trinodb/shared_invite/zt-1aek3l6bn-ZMsvFZJqP1ULx5pU17WP1Q&quot;&gt;Trino Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note from the Trino community&lt;/em&gt;: We welcome blog submissions from the community. If you have blog ideas, please send a message in the #dev channel on the &lt;a href=&quot;https://join.slack.com/t/trinodb/shared_invite/zt-1aek3l6bn-ZMsvFZJqP1ULx5pU17WP1Q&quot;&gt;Trino Slack&lt;/a&gt;. We will mail you Trino swag as a token of appreciation for successful submissions.&lt;/p&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://cutt.ly/qaz-ai-trino-reddit&quot;&gt;Discuss on Reddit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://news.ycombinator.com/item?id=31672725&quot;&gt;Discuss on Hacker News&lt;/a&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Baurzhan Kuspayev</name>
        </author>
      

      <summary>At QazAI, we build data lakes as a service for companies. In the original architecture, we got raw data in S3, transformed the S3 data with Hive, and then delivered the data to business units via our datamart built on ClickHouse (for optimal delivery speeds). Over time, we were dragged down by the slower speeds and high costs of running Hive, and started shopping for a faster and cheaper open source engine to do our ETL data transformations.</summary>

      
      
    </entry>
  
    <entry>
      <title>An opinionated guide to consolidating our data</title>
      <link href="https://trino.io/blog/2022/05/24/an-opinionated-guide-to-consolidating-our-data.html" rel="alternate" type="text/html" title="An opinionated guide to consolidating our data" />
      <published>2022-05-24T00:00:00+00:00</published>
      <updated>2022-05-24T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/05/24/an-opinionated-guide-to-consolidating-our-data</id>
      <content type="html" xml:base="https://trino.io/blog/2022/05/24/an-opinionated-guide-to-consolidating-our-data.html">&lt;h2 id=&quot;maximizing-your-experience-with-zero-choices&quot;&gt;Maximizing your experience with zero choices.&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;I’m publishing this blog post in partnership with the Trino community to go
along with a lightning talk I’m giving for their event, Cinco de Trino. This article
was originally published &lt;a href=&quot;https://abhi-vaidyanatha.medium.com/an-opinionated-guide-to-consolidating-your-data-b09386b2b9b5&quot;&gt;on Abhi’s Medium
site&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“My data is all over the place and attempting to analyze or query it is not
only time consuming and expensive, but also emotionally taxing.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;!--more--&gt;

&lt;p&gt;Maybe you haven’t heard those exact words before, but data consolidation is a
real problem. It is common for organizations to have correlated data stored in
various silos or APIs. Performing consistent operations across these various
data sources requires understanding both architecture and surgery, skills that
you may not have picked up as a data practitioner. If you’re part of the Trino
community and are reading this post, you’ve likely encountered poorly
performing queries due to unconsolidated data.&lt;/p&gt;

&lt;p&gt;In the past, the data engineering world was not graced with the same level of
love and &lt;a href=&quot;https://tailwindcss.com/&quot;&gt;tooling&lt;/a&gt; as other communities, so we were
expected to make do with whatever came our way. In order to perform the wildly
basic task of moving our data around, we were asked to tithe large sums of money
to the closed-source ELT overlords.&lt;/p&gt;

&lt;p&gt;So where does that leave us? Thankfully things have changed, so here’s how you
can move all your data to a central location for free (well, minus the
infrastructure costs) while making few architectural choices.&lt;/p&gt;

&lt;h2 id=&quot;the-tool&quot;&gt;The tool&lt;/h2&gt;
&lt;p&gt;You don’t have too many choices for FOSS ELT/ETL.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://airbyte.com/&quot;&gt;Airbyte&lt;/a&gt; has recently been making waves as the main
contender for open-source ELT. As of writing this article, it’s only been around
for about two years, during which it’s established itself as one of the fastest
growing startups in existence. It requires three terminal commands to deploy and
is managed entirely through a UI, so it’s operable by many. It also supports
syncing your data incrementally, so you don’t need to resync existing data when
you want to sync new data. It is relatively new, so some of the polish that
comes with an established project is not there yet. Think of it like a
precocious child.&lt;/p&gt;

&lt;p&gt;You could use &lt;a href=&quot;https://meltano.com/&quot;&gt;Meltano&lt;/a&gt; to take advantage of the large
&lt;a href=&quot;https://www.singer.io/&quot;&gt;Singer&lt;/a&gt; connector ecosystem, but it’s more complicated
to set up and is more of a holistic ops platform, which may be excessive for
your use case.&lt;/p&gt;

&lt;p&gt;You could also use this esoteric project called KETL that is only available at
this sketchy SourceForge &lt;a href=&quot;https://sourceforge.net/projects/ketl/&quot;&gt;link&lt;/a&gt;. But
maybe don’t do that.&lt;/p&gt;

&lt;p&gt;For consolidating your data, use Airbyte. It’s straightforward to set up,
requires minimal configuration, and has tightly scoped responsibilities.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;https://miro.medium.com/max/640/1*zqLMo7P3o_HG7EJ2E1dbpg.png&quot; /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;the-destination&quot;&gt;The destination&lt;/h2&gt;

&lt;p&gt;Let’s use a data lake. Its unstructured nature leaves us more flexibility,
and we’ll assume that our data has not been processed or filtered yet.&lt;/p&gt;

&lt;p&gt;Data warehouses are more expensive, require more upkeep, and benefit from the
ETL paradigm as opposed to ELT. Airbyte is an ELT tool focused mostly on the EL
bit, which makes it easier to use with unstructured data lakes.&lt;/p&gt;

&lt;p&gt;Additionally, S3 supports query engines such as Trino, which will allow us to
query and analyze our data once it’s been consolidated. Trino also functions as a
powerful data lake transformation engine, so if you’re on the fence due to data
malleability, this might help bring you over.&lt;/p&gt;

&lt;p&gt;We could use Azure Blob Storage or GCS, but for this tutorial, I’ll be keeping
it simple with Amazon S3. If you’ve set up an S3 bucket and IAM, skip the next
paragraph.&lt;/p&gt;

&lt;p&gt;Create an S3 bucket with default settings and grab an access key from IAM. To do
this, head to the top right of the screen in the AWS Management Console where
your account name is shown and then click on &lt;strong&gt;Security Credentials&lt;/strong&gt;. Click
&lt;strong&gt;Create New Access Key&lt;/strong&gt; and save that information for later.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;https://miro.medium.com/max/1202/1*mYeldXLcvi7iPBDZ1GKEug.png&quot; /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;the-deployment&quot;&gt;The deployment&lt;/h2&gt;

&lt;p&gt;Today, we’ll be deploying Airbyte locally on a workstation. Alternatively, you
can deploy it on your own infrastructure, but this requires managing networking
and security, which is unpalatable for a quick demonstration. If you want your
syncs to continue running in perpetuity, you’ll want to deploy Airbyte
externally to your machine. For a guide to deploying Airbyte on EC2 click
&lt;a href=&quot;https://docs.airbyte.com/deploying-airbyte/on-aws-ec2&quot;&gt;here&lt;/a&gt;. For a guide to
deploying Airbyte on Kubernetes, click
&lt;a href=&quot;https://docs.airbyte.com/deploying-airbyte/on-plural&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To begin, install &lt;a href=&quot;https://www.docker.com/products/docker-desktop/&quot;&gt;Docker&lt;/a&gt; and
docker-compose on your workstation.&lt;/p&gt;

&lt;p&gt;Then clone the repository and spin up Airbyte with docker-compose.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:airbytehq/airbyte.git
cd airbyte
docker-compose up
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once you see the following banner, you’re good to go.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;https://miro.medium.com/max/1148/1*7Fg7Vwi5vgkg94SYRuACLQ.png&quot; /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;the-data-sources&quot;&gt;The data sources&lt;/h2&gt;

&lt;p&gt;Head over to localhost:8000 on your machine, complete the sign-up flow, and
you’ll be greeted with an onboarding workflow. We’re going to skip this workflow
to emulate a traditional usage of Airbyte. Click on the Sources tab in the left
sidebar and click on +New Source. This is where we’ll be setting up all of our
disparate data sources.&lt;/p&gt;

&lt;p&gt;Search for your data sources in the drop down and fill out the required
configuration. If you’re having trouble setting up a particular data source,
head to the &lt;a href=&quot;https://docs.airbyte.com/&quot;&gt;Airbyte docs&lt;/a&gt;. There’s a dedicated page
for every connector; for example, this is the &lt;a href=&quot;https://docs.airbyte.com/integrations/sources/google-analytics-v4&quot;&gt;setup
guide&lt;/a&gt; for
the Google Analytics source. If you’re just testing Airbyte out, use the PokeAPI
source, as it lets you sync dummy data with no authentication. If your required
data source doesn’t exist, you can request it
&lt;a href=&quot;https://airbyte.com/connector-requests&quot;&gt;here&lt;/a&gt; or build it yourself by heading
&lt;a href=&quot;https://docs.airbyte.com/connector-development/&quot;&gt;here&lt;/a&gt; (isn’t open-source
great?).&lt;/p&gt;

&lt;p&gt;Once you have all of your data sources set up, it will look something like this.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;https://miro.medium.com/max/1400/1*6_sNtdhFKkSnicyqe2Hhmg.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Now we just need to set up our connection to S3 and we are good to go.&lt;/p&gt;

&lt;h2 id=&quot;the-destination-again&quot;&gt;The destination (again)&lt;/h2&gt;

&lt;p&gt;Head over to the &lt;em&gt;Destinations&lt;/em&gt; tab in the left sidebar and follow the same
process for setting up our connection to S3. Click on &lt;em&gt;+New Destination&lt;/em&gt; and
search for S3. Then fill out the configuration for your bucket. We’ll now use
that access key that we generated earlier!&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;https://miro.medium.com/max/1400/1*24LRs9-dB7l35DgsXU6pqQ.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;For output format, I recommend using Parquet for analytics purposes. It’s a
&lt;a href=&quot;https://www.qubole.com/tech-blog/columnar-format-in-data-lakes-for-dummies/&quot;&gt;columnar storage
format&lt;/a&gt;,
which is optimized for reads. JSON, CSV, and Avro are supported, but will be
less performant on read.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;https://miro.medium.com/max/1400/1*tVw2sbTLYDlHpKB97M7cKg.png&quot; /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;the-connection&quot;&gt;The connection&lt;/h2&gt;

&lt;p&gt;Finally, head over to the &lt;strong&gt;Connections&lt;/strong&gt; tab in the sidebar and click &lt;strong&gt;+New
Connection&lt;/strong&gt;. You will need to repeat this process for each data source that you
have set up. Select any existing source, then pick the S3 destination that you
set up from the drop down. I failed to set up a connection with my GitHub
source, so I navigated to the Airbyte Troubleshooting Discourse and filed an
issue. Response times are really fast there, so I’ll likely be able to resolve
this within a day or two.&lt;/p&gt;

&lt;p&gt;You will then be greeted with the following connection setup page. For most
analytics jobs, syncing more frequently than every 24 hours is expensive and
overkill, so stick with the default. For sources that support it, click on the
sync mode in the streams table to use the &lt;strong&gt;Incremental / Append&lt;/strong&gt; sync mode.
This ensures that every time you sync, Airbyte will check for new data and only
pull in data that you haven’t synced before.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;https://miro.medium.com/max/1400/1*FZyFWtb3P4sqO77p-WZjAw.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Once you hit &lt;strong&gt;Set up connection&lt;/strong&gt;, Airbyte will run your first sync! You can
click into your connection to get access to the sync logs, replication settings,
and transformation settings if supported.&lt;/p&gt;

&lt;p&gt;Checking our S3 bucket, we can see that our data has successfully arrived! If
you’re just testing things out, you’re done.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;https://miro.medium.com/max/1400/1*qrEc7u2hiUUZv4TO5qOv6A.png&quot; /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;the-analysis&quot;&gt;The analysis&lt;/h2&gt;

&lt;p&gt;Now that you’ve set up your data pipelines, if you want to run transformation
jobs, Trino enables that use case well — Lyft, Pinterest, and Shopify have all
done this to great success. There’s also a &lt;a href=&quot;https://github.com/starburstdata/dbt-trino&quot;&gt;dbt-trino
plugin&lt;/a&gt; managed by the folks over at
Starburst. Alternatively, you could also accomplish this using &lt;a href=&quot;https://docs.aws.amazon.com/AmazonS3/latest/userguide/tutorial-s3-object-lambda-uppercase.html&quot;&gt;S3 Object
Lambda&lt;/a&gt;
if you want to stay within the AWS landscape when possible.&lt;/p&gt;

&lt;p&gt;Once your data is in a queryable state, you can now use
&lt;a href=&quot;https://trino.io/docs/current/connector/hive-s3.html&quot;&gt;Trino&lt;/a&gt; or your favorite
query engine to your heart’s content! If you want to get started with querying
these heterogeneous data sources using Trino, here’s a &lt;a href=&quot;https://janakiev.com/blog/presto-trino-s3/&quot;&gt;getting-started
guide&lt;/a&gt; on how to do that. Finally,
join the &lt;a href=&quot;https://airbyte.com/community&quot;&gt;Airbyte&lt;/a&gt; and
&lt;a href=&quot;https://trino.io/community.html&quot;&gt;Trino&lt;/a&gt; communities to find more about how
others are consolidating and querying their data.&lt;/p&gt;</content>

      
        <author>
          <name>Abhi Vaidyanatha</name>
        </author>
      

      <summary>Maximizing your experience with zero choices. I’m publishing this blog post in partnership with the Trino community to go along a lightning talk I’m giving for their event, Cinco de Trino. This article was originally published on Abhi’s Medium site “My data is all over the place and attempting to analyze or query it is not only time consuming and expensive, but also emotionally taxing.”</summary>

      
      
    </entry>
  
    <entry>
      <title>36: Trino plans to jump to Java 17</title>
      <link href="https://trino.io/episodes/36.html" rel="alternate" type="text/html" title="36: Trino plans to jump to Java 17" />
      <published>2022-05-19T00:00:00+00:00</published>
      <updated>2022-05-19T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/36</id>
      <content type="html" xml:base="https://trino.io/episodes/36.html">&lt;h2 id=&quot;releases-379-to-381&quot;&gt;Releases 379 to 381&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-379.html&quot;&gt;Trino 379&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New MariaDB connector&lt;/li&gt;
  &lt;li&gt;Performance improvements for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JOIN&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNION&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Support for Google Cloud Storage in the Delta Lake connector&lt;/li&gt;
  &lt;li&gt;Support for Pinot 0.10&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-380.html&quot;&gt;Trino 380&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Update Cassandra connector to support v5 and v6 protocols.&lt;/li&gt;
  &lt;li&gt;Rename properties controlling Hive view parsing.&lt;/li&gt;
  &lt;li&gt;Allow changing file and table format with the Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Add support for bulk data insertion in SQL Server connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-381.html&quot;&gt;Trino 381&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; in Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Experimental support for table functions.&lt;/li&gt;
  &lt;li&gt;Support for exchange spooling on Azure Blob Storage.&lt;/li&gt;
  &lt;li&gt;Support reading snapshot tables and materialized views in BigQuery connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional highlights worth a mention according to Manfred:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Next is exchange spooling on &lt;a href=&quot;https://github.com/trinodb/trino/pull/12360&quot;&gt;Google Cloud Storage&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Framework for table functions is in place, implementations in connectors are coming.&lt;/li&gt;
  &lt;li&gt;Keeping &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ldap.ssl-trust-certificate&lt;/code&gt; as a legacy config avoids upgrade failures.&lt;/li&gt;
  &lt;li&gt;Introduce the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;least-waste&lt;/code&gt; low memory task killer policy.&lt;/li&gt;
  &lt;li&gt;Disable auto-suggestion in the CLI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-379.html&quot;&gt;Trino 379&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-380.html&quot;&gt;Trino 380&lt;/a&gt;,
and
&lt;a href=&quot;https://trino.io/docs/current/release/release-381.html&quot;&gt;Trino 381&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;cinco-de-trino-recap-blog-post&quot;&gt;Cinco de Trino recap blog post&lt;/h3&gt;

&lt;p&gt;Check out this blog post that details all the cool talks that took place at 
&lt;a href=&quot;/blog/2022/05/17/cinco-de-trino-recap.html&quot;&gt;Cinco de Trino&lt;/a&gt; and
includes video resources. This was a mini version of the Trino Summit, which
will take place later this year.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-episode-will-trino-be-making-a-vectorized-c-version-of-trino-workers&quot;&gt;Question of the episode: Will Trino be making a vectorized C++ version of Trino workers?&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trinodb.slack.com/archives/CFLB9AMBN/p1638450883102500&quot;&gt;Full question from Trino Slack&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Answer:&lt;/em&gt; Writing a C++ worker would require each plugin to be implemented in
C++ as well. However, you don’t need C++ for vectorization. Java already does a
technique called &lt;a href=&quot;https://web.archive.org/web/20211111020334/http://daniel-strecker.com/blog/2020-01-14_auto_vectorization_in_java/&quot;&gt;auto-vectorization&lt;/a&gt;
which we will demonstrate later in the show! Java 17 also introduces the new 
&lt;a href=&quot;https://openjdk.java.net/jeps/414&quot;&gt;Vector API&lt;/a&gt; which unlocks complex usage 
patterns that we can invest in moving forward. However, there’s much more to
making operations fast than just bare metal speed, and that is what we are going
to focus on.&lt;/p&gt;

&lt;p&gt;To demonstrate this, I’d like to use an analogy. Comparing C++ and Java
implementations is like comparing the two fastest men in the world. Usain Bolt
holds the most world records in men’s track to this date, and his teammate Yohan
Blake holds many of the second-place titles. Most of us know Usain Bolt is the
faster of the two, and you may not have known or remembered Yohan’s name before.
Yet here’s something crazy: Yohan has beaten Usain Bolt in a few races. The two
are so close in speed that the difference comes down to milliseconds. The catch
in this analogy is that speed is the only thing that matters in an Olympic race.
However, programming languages and frameworks involve many more tradeoffs.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/36/usain-bolt-yohan-blake.webp&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;The point is, Java is fast and more importantly, it removes a lot of burden
maintaining and scaling out the code. This is conducive to a healthy open-source
project, and lowers the barrier for collaboration. Rather than go against this 
and take on the feat of having to rewrite an entire system in C++, why not lean
into the incredible innovation recent Java features have to offer to improve
performance even more?&lt;/p&gt;

&lt;p&gt;Another important aspect is rather than chasing the fastest bare metal speed,
it’s also incredibly important to dedicate time into ensuring that Trino’s
optimizer is producing the best possible plans to avoid doing unnecessary work.
To continue with the analogy, in a 100m race on a 400m track, imagine we have
Usain and Yohan go head to head. We may expect that Usain will likely win, given
his track record. However, if Usain is given the wrong instructions and runs in
the wrong direction (300m), my bets are that Yohan will win the race.&lt;/p&gt;

&lt;p&gt;In essence, Trino, while still benefiting from bare metal performance
improvements in the JVM, will focus on not wasting time on suboptimal query
plans before or during runtime. So many optimizations are constantly being added
in every release that the result is a work-smarter-not-harder query engine.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-java-17-and-rearchitecting-trino&quot;&gt;Concept of the episode: Java 17 and rearchitecting Trino&lt;/h2&gt;

&lt;p&gt;As Trino prepares to &lt;a href=&quot;https://github.com/trinodb/trino/issues/9876&quot;&gt;update to Java 17&lt;/a&gt;,
we wanted to give a glimpse at what has happened between the current required
JDK version, JDK 11, and future version JDK 17. Both of these versions are
long-term support versions, and in the four years from 11 to 17 
&lt;a href=&quot;https://openjdk.java.net/projects/jdk/17/jeps-since-jdk-11&quot;&gt;a lot of exciting improvements were added&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;java-17-updates&quot;&gt;Java 17 updates&lt;/h3&gt;

&lt;p&gt;Here are some &lt;a href=&quot;https://openjdk.java.net/projects/jdk/17/jeps-since-jdk-11&quot;&gt;updates coming up in Java 17&lt;/a&gt;.&lt;/p&gt;

&lt;h4 id=&quot;performance&quot;&gt;Performance&lt;/h4&gt;

&lt;p&gt;There were several JDK Enhancement Proposals (JEP) that improve performance as
well as many small changes to the JVM:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/339&quot;&gt;JEP 339&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/352&quot;&gt;JEP 352&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/356&quot;&gt;JEP 356&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/387&quot;&gt;JEP 387&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/412&quot;&gt;JEP 412&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Performance is a multifaceted topic that includes factors like throughput, 
latency, memory footprint, startup, ramp-up, pause times, and shutdown time.&lt;/p&gt;

&lt;p&gt;You can use standardized benchmarks like 
&lt;a href=&quot;https://www.spec.org/jbb2015/&quot;&gt;SPECjbb® 2015&lt;/a&gt; to test a Java application in 
most of these performance factors. Aside from the formalized benchmarks, it’s 
interesting to see the Java community come up with microbenchmarks to test 
relative speedups of JVMs on their own applications.
&lt;a href=&quot;https://www.optaplanner.org/blog/2021/09/15/HowMuchFasterIsJava17.html&quot;&gt;This user benchmark&lt;/a&gt;
found an 8.66% improvement in speed when using the G1 garbage collector. They
isolated modules of their application to measure each microbenchmark separately.&lt;/p&gt;

&lt;p&gt;Martin did a similar test late last year, and reported anywhere from 10-15% 
improvement in speed in Java 17 using the G1 garbage collector. This is an 
exciting development and we hope to publish more about this as we get closer to 
updating.&lt;/p&gt;

&lt;h4 id=&quot;garbage-collectors&quot;&gt;Garbage collectors&lt;/h4&gt;

&lt;p&gt;Although garbage collectors are performance enhancements in their own right, 
there are so many exciting changes around garbage collectors between Java 11 and
Java 17 that they earn their own section.&lt;/p&gt;

&lt;p&gt;First, not one but two concurrent garbage collectors have made their way out
of incubation and are ready for use.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/377&quot;&gt;JEP 377: ZGC: A Scalable Low-Latency Garbage Collector&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/379&quot;&gt;JEP 379: Shenandoah: A Low-Pause-Time Garbage Collector&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Aside from that, there are a bunch of big improvements to G1.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/344&quot;&gt;JEP 344: Abortable Mixed Collections for G1&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/345&quot;&gt;JEP 345: NUMA-Aware Memory Allocation for G1&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/346&quot;&gt;JEP 346: Promptly Return Unused Committed Memory from G1&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In a &lt;a href=&quot;https://kstefanj.github.io/2021/11/24/gc-progress-8-17.html&quot;&gt;fantastic writeup and benchmark&lt;/a&gt;
by Stefan Johansson, they ran &lt;a href=&quot;https://www.spec.org/jbb2015/&quot;&gt;SPECjbb® 2015&lt;/a&gt;
to evaluate the improvements of different garbage collectors across the LTS
versions.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/36/throughput.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://kstefanj.github.io/2021/11/24/gc-progress-8-17.html&quot;&gt;Stefan Johansson&apos;s Blog&lt;/a&gt;
&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/36/latency.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://kstefanj.github.io/2021/11/24/gc-progress-8-17.html&quot;&gt;Stefan Johansson&apos;s Blog&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;Pay attention to this chart, as it showcases the advantage of having a 
concurrent garbage collector like ZGC or Shenandoah that doesn’t interfere with
your application code. It’s incredible that 99% of the GC operations only took 
0.1ms. Wild!&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/36/p99-pause.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://kstefanj.github.io/2021/11/24/gc-progress-8-17.html&quot;&gt;Stefan Johansson&apos;s Blog&lt;/a&gt;
&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/36/footprint.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://kstefanj.github.io/2021/11/24/gc-progress-8-17.html&quot;&gt;Stefan Johansson&apos;s Blog&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;Take particular note of the massive improvement of G1. This is especially 
exciting because G1 is recommended for Trino usage. It’s still too early to 
determine whether ZGC or Shenandoah will perform better overall, as it depends
on the context in which the JVM is running. One thing to look forward to is the 
incredible drop in memory footprint over the different versions!&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/36/g1-memory-footprint.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://www.youtube.com/watch?v=0BpY132mKm0&quot;&gt;Java YouTube Channel&lt;/a&gt;
&lt;/p&gt;

&lt;h4 id=&quot;vector-api-2nd-incubator-status&quot;&gt;Vector API (2nd incubator status)&lt;/h4&gt;

&lt;p&gt;One available capability that is still incubating is the 
&lt;a href=&quot;https://openjdk.java.net/jeps/414&quot;&gt;Vector API&lt;/a&gt;. Trino currently takes advantage
of the auto-vectorization that comes for free when the compiler detects a
suitable loop, like this one taken from Daniel Strecker’s
&lt;a href=&quot;https://web.archive.org/web/20211111020334/http://daniel-strecker.com/blog/2020-01-14_auto_vectorization_in_java/&quot;&gt;auto-vectorization blog&lt;/a&gt;:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;cm&quot;&gt;/**
 * Run with this command to show native assembly:&amp;lt;br/&amp;gt;
 * java -XX:+UnlockDiagnosticVMOptions
 * -XX:CompileCommand=print,VectorizationMicroBenchmark.square
 * VectorizationMicroBenchmark
 */&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;VectorizationMicroBenchmark&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;

    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;square&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;];&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// line 11&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1024&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;];&lt;/span&gt;

        &lt;span class=&quot;c1&quot;&gt;// repeatedly invoke the method under test. this&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// causes the JIT compiler to optimize the method&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;square&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Without auto-vectorization, the compiler emits the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;vmulss&lt;/code&gt; (multiply scalar 
single-precision) instruction; with auto-vectorization, it emits &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;vmulps&lt;/code&gt; (multiply packed
single-precision), a SIMD instruction the JIT compiler substituted for us
without manual intervention.&lt;/p&gt;

&lt;p&gt;However, this isn’t always so straightforward to detect. As you can see from the
comments in the example, special criteria need to be met. For this, you can use
the Vector API to directly interface with SIMD and GPU instructions. We will 
show more on this in the demo.&lt;/p&gt;
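To make this more concrete, here is a minimal, hypothetical sketch (not code from Trino itself; the class and method names are invented for illustration) of the same squaring loop written explicitly against the incubating Vector API:

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example: the square() loop from above, written
// explicitly against the incubating jdk.incubator.vector API.
public class VectorSquare {
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    static void square(float[] a) {
        int i = 0;
        // Process full SIMD-width chunks of the array.
        int upperBound = SPECIES.loopBound(a.length);
        for (; i < upperBound; i += SPECIES.length()) {
            FloatVector v = FloatVector.fromArray(SPECIES, a, i);
            v.mul(v).intoArray(a, i); // one SIMD multiply per chunk
        }
        // Scalar tail loop for the leftover elements.
        for (; i < a.length; i++) {
            a[i] = a[i] * a[i];
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f, 5f};
        square(a);
        System.out.println(java.util.Arrays.toString(a)); // [1.0, 4.0, 9.0, 16.0, 25.0]
    }
}
```

Because the API is still incubating in Java 17, compiling and running this requires `--add-modules=jdk.incubator.vector`, just like the demo at the end of this episode.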

&lt;h4 id=&quot;language-features&quot;&gt;Language features&lt;/h4&gt;

&lt;p&gt;Beyond the performance improvements, Java 17 includes some exciting new Java 
language updates and improvements. While some may not consider this as exciting
as performance boosts, language enhancements make it easier to write higher 
quality and maintainable code. This is especially important for an open source 
project that is maintained by many individuals.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;A very useful change for Trino is the new support for 
&lt;a href=&quot;https://openjdk.java.net/jeps/378&quot;&gt;multiline text blocks&lt;/a&gt;. This allows you to 
go from having to write a SQL query represented in a one-dimensional string 
literal like this:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  String query = &quot;SELECT \&quot;emp_id\&quot;, \&quot;last_name\&quot; FROM \&quot;employee\&quot;\n&quot; +
                 &quot;WHERE \&quot;city\&quot; = &apos;Indianapolis&apos;\n&quot; +
                 &quot;ORDER BY \&quot;emp_id\&quot;, \&quot;last_name\&quot;;\n&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;

    &lt;p&gt;to a much more readable two-dimensional string block like this:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  String query = &quot;&quot;&quot;
                 SELECT &quot;emp_id&quot;, &quot;last_name&quot; FROM &quot;employee&quot;
                 WHERE &quot;city&quot; = &apos;Indianapolis&apos;
                 ORDER BY &quot;emp_id&quot;, &quot;last_name&quot;;
                 &quot;&quot;&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The new &lt;a href=&quot;https://openjdk.java.net/jeps/361&quot;&gt;switch expressions&lt;/a&gt; remove the
difficult-to-read syntax of switches that led to many bugs and confusing code
in the past, particularly the ambiguity of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;break;&lt;/code&gt; statement logic:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  switch (day) {
      case MONDAY:
      case FRIDAY:
      case SUNDAY:
          System.out.println(6);
          break;
      case TUESDAY:
          System.out.println(7);
          break;
      case THURSDAY:
      case SATURDAY:
          System.out.println(8);
          break;
      case WEDNESDAY:
          System.out.println(9);
          break;
  }
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;

    &lt;p&gt;is made much easier to reason about using a functional clause to define the
  correct code to execute for a set of labels:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  switch (day) {
      case MONDAY, FRIDAY, SUNDAY -&amp;gt; System.out.println(6);
      case TUESDAY                -&amp;gt; System.out.println(7);
      case THURSDAY, SATURDAY     -&amp;gt; System.out.println(8);
      case WEDNESDAY              -&amp;gt; System.out.println(9);
  }
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Having to cast an object after checking its type has always been an
annoyance to many Java developers. 
&lt;a href=&quot;https://openjdk.java.net/jeps/394&quot;&gt;Pattern Matching for instanceof&lt;/a&gt; makes this
go away. Look at this example you may be familiar with:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  if (obj instanceof String) {
      String s = (String) obj;    // grr...
      ...
  }
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;

    &lt;p&gt;Now imagine you don’t have to have a cast statement for every one of these
  lying around in your codebase:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  if (obj instanceof String s) {
      // Let pattern matching do the work!
      ...
  }
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://openjdk.java.net/jeps/358&quot;&gt;Helpful NullPointerExceptions&lt;/a&gt; are
particularly exciting, as confusing null errors no longer force you to chase
down where in the code they happened. Instead, new information is added to the
exception message that tells you precisely which part of the expression was
null.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
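As a quick, hypothetical illustration of that last point (the class and record names here are invented), with helpful NullPointerExceptions the message names the exact call that produced the null instead of just a line number:

```java
// Hypothetical example showing helpful NullPointerException messages (JEP 358),
// which are on by default in Java 17.
public class NpeDemo {
    record Address(String city) {}
    record Person(Address address) {}

    public static void main(String[] args) {
        Person p = new Person(null); // address is null
        try {
            String city = p.address().city();
            System.out.println(city);
        } catch (NullPointerException e) {
            // The message spells out which part of the chain was null, e.g.:
            //   Cannot invoke "...Address.city()" because the return value
            //   of "...Person.address()" is null
            System.out.println(e.getMessage());
        }
    }
}
```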

&lt;h3 id=&quot;rearchitecting-trino&quot;&gt;Rearchitecting Trino&lt;/h3&gt;

&lt;p&gt;With all these exciting changes, what does this mean for Trino? Let’s first dive 
into the thing that many of our users dread…upgrading.&lt;/p&gt;

&lt;h4 id=&quot;upgrade-to-java-17-when-its-time&quot;&gt;Upgrade to Java 17 (When it’s time)&lt;/h4&gt;

&lt;p&gt;As mentioned before, Java 17 is the current LTS version, following Java 11. Java
17 provides significant improvements that we outlined before. We believe that 
once we update, everyone should be running version 17 to get the best experience
out of Trino. Moving to Java 17 allows us to take advantage of many improvements
to the JDK and the Java language that were introduced since Java 11. There are 
some reasons people say they can’t update.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Updating Java in all the clients and code that calls Trino is tedious.&lt;/p&gt;

    &lt;blockquote&gt;
      &lt;p&gt;Luckily, you only need to update Java on the servers that Trino runs on.
 The client or CLI can still run any version of Java.&lt;/p&gt;
    &lt;/blockquote&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;There are conflicting Java versions on the nodes that Trino servers run on.&lt;/p&gt;

    &lt;blockquote&gt;
      &lt;p&gt;If you are running another application that depends on Java on the same
 node, you shouldn’t be; ideally, Trino runs on its own servers. If there’s a
 smaller application to, for example, monitor Trino, then you should be able to
 install a separate version of Java for it.&lt;/p&gt;
    &lt;/blockquote&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;There is a company policy requiring specific JDKs be installed on all 
 servers.&lt;/p&gt;

    &lt;blockquote&gt;
      &lt;p&gt;You can have side-by-side installs of multiple versions of the JDK and use 
 the appropriate one. You just need to launch Trino with the correct Java
 command. If your company is against using a newer JDK, you can point out the
 arguments above to update the policy to at least include JDK 17.&lt;/p&gt;
    &lt;/blockquote&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h4 id=&quot;iterating-and-improving-trino&quot;&gt;Iterating and improving Trino&lt;/h4&gt;

&lt;p&gt;We’re also in the process of revamping the core execution engine, which 
enables us to implement the following improvements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Perform adaptive evaluation of expressions based on runtime cost.&lt;/li&gt;
  &lt;li&gt;Specialize evaluation for different data encodings (RLE, dictionary, etc.).&lt;/li&gt;
  &lt;li&gt;Implement tighter evaluation loops that make it easier for the VM to vectorize
automatically and generate better machine code.&lt;/li&gt;
  &lt;li&gt;Implement evaluation of certain operations more efficiently by taking 
advantage of SIMD or GPU-based processing.&lt;/li&gt;
  &lt;li&gt;Columnar evaluation.&lt;/li&gt;
&lt;/ul&gt;
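To illustrate why specializing for data encodings pays off, here is a simplified, hypothetical sketch (not Trino's actual block implementation; the types are invented): with a run-length-encoded column, an expression only needs to run once per run rather than once per row.

```java
import java.util.function.LongUnaryOperator;

// Hypothetical sketch of encoding-specialized evaluation: applying f(x)
// over a run-length-encoded column touches each distinct run once.
public class RleEval {
    // An RLE block: values[i] repeats counts[i] times.
    record RleBlock(long[] values, int[] counts) {}

    // Specialized evaluation: one call to f per run, not per row.
    static RleBlock evalRle(RleBlock block, LongUnaryOperator f) {
        long[] out = new long[block.values().length];
        for (int i = 0; i < out.length; i++) {
            out[i] = f.applyAsLong(block.values()[i]);
        }
        return new RleBlock(out, block.counts()); // run lengths are unchanged
    }

    public static void main(String[] args) {
        // Column 5,5,5,9,9 encoded as two runs: f(x)=x*2 is evaluated only twice.
        RleBlock col = new RleBlock(new long[]{5, 9}, new int[]{3, 2});
        RleBlock doubled = evalRle(col, x -> x * 2);
        System.out.println(doubled.values()[0] + " " + doubled.values()[1]); // 10 18
    }
}
```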

&lt;h4 id=&quot;project-hummingbird&quot;&gt;Project Hummingbird&lt;/h4&gt;

&lt;p&gt;Just as we did with the efforts around 
&lt;a href=&quot;/blog/2022/05/05/tardigrade-launch.html&quot;&gt;Project Tardigrade&lt;/a&gt;, we
want to centralize these efforts around a set of motivated community members
under a project with a cool name.&lt;/p&gt;

&lt;p&gt;After some discussion, we would like to announce &lt;em&gt;Project Hummingbird&lt;/em&gt; is the
new banner for the efforts around improving performance and concentrated updates
to the core of Trino.&lt;/p&gt;

&lt;p&gt;We chose hummingbirds as mascots because they are adaptive, light, and fast. 
Hummingbirds are the only birds with the incredible capability to fly in any 
direction. As Trino evolves into a query engine capable of adapting to its 
environment during query runtime, it is akin to these agile and beautiful 
creatures.&lt;/p&gt;

&lt;h4 id=&quot;vectorization-is-not-a-silver-bullet&quot;&gt;Vectorization is not a silver bullet&lt;/h4&gt;

&lt;p&gt;There are many ways to parallelize the operations that we run on the Trino
server. There’s inter-node parallelization, which splits the data to be operated
on across nodes. There’s also intra-node parallelization, which generally refers
to multithreading across a CPU’s cores.&lt;/p&gt;
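As a minimal illustration of intra-node parallelism (a generic Java sketch, not Trino internals), a parallel stream splits a computation across CPU cores via the common fork/join pool:

```java
import java.util.stream.LongStream;

// Generic illustration of intra-node parallelism with parallel streams.
public class ParallelSum {
    public static void main(String[] args) {
        long n = 10_000_000L;

        // Intra-node parallelization: the range is split across CPU cores.
        long parallel = LongStream.rangeClosed(1, n).parallel().sum();

        // Single-threaded baseline for comparison.
        long serial = LongStream.rangeClosed(1, n).sum();

        // Both produce the same result; for small inputs the fork/join
        // coordination overhead can outweigh any parallel speedup.
        System.out.println(parallel == serial); // true
    }
}
```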

&lt;p&gt;As we start to move towards vectorization, we become more hardware 
dependent, and just like with any other hardware setting, your mileage may vary
depending on the limitations of the resources Trino is running on.&lt;/p&gt;

&lt;p&gt;Further, any time parallelization is applied, there is generally some overhead 
to coordinate lookups, shuffling more data across processors, and so on.&lt;/p&gt;

&lt;h2 id=&quot;pull-requests-of-the-episode-pr-4649-disable-jit-byte-code-recompilation-cutoffs-in-default-jvmconfig&quot;&gt;Pull requests of the episode: PR 4649: Disable JIT byte code recompilation cutoffs in default jvm.config&lt;/h2&gt;

&lt;p&gt;This episode’s &lt;a href=&quot;https://github.com/trinodb/trino/pull/4649&quot;&gt;pull request&lt;/a&gt; was
added by &lt;a href=&quot;https://github.com/shubhamtagra&quot;&gt;Shubham Tagra&lt;/a&gt; to raise the JVM’s
JIT recompilation cutoffs so that large methods don’t hit them. If these limits
are hit, the JIT compiler calls an uncommon_trap to deoptimize the code. If the
function is continually retried, continuous deopt or a “deopt storm” can occur,
causing a large CPU loss. The underlying behavior is actually a bug in the JVM,
so this pull request provides a workaround.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-XX:PerMethodRecompilationCutoff=10000
-XX:PerBytecodeRecompilationCutoff=10000
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Multiple companies, from
&lt;a href=&quot;/blog/2021/10/06/jvm-issues-at-comcast.html&quot;&gt;Comcast&lt;/a&gt; to
&lt;a href=&quot;https://shopify.engineering/faster-trino-query-execution-infrastructure&quot;&gt;Shopify&lt;/a&gt;,
had reported “random slowness” issues that were resolved when these JVM
settings were added.&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-fizzbuzz---simd-style&quot;&gt;Demo of the episode: FizzBuzz - SIMD style!&lt;/h2&gt;

&lt;p&gt;Today I’m stealing, no wait, borrowing a project created by our friend
&lt;a href=&quot;https://twitter.com/gunnarmorling&quot;&gt;Gunnar Morling&lt;/a&gt;. It implements the
well-known &lt;a href=&quot;https://www.morling.dev/blog/fizzbuzz-simd-style/&quot;&gt;FizzBuzz&lt;/a&gt; game,
but generates the resulting patterns from the game programmatically,
SIMD-style.&lt;/p&gt;

&lt;p&gt;Make sure you &lt;a href=&quot;https://stackoverflow.com/questions/52524112&quot;&gt;install JDK 17&lt;/a&gt; 
before running this code.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/simd-fizzbuzz.git

mvn clean verify

java --add-modules=jdk.incubator.vector -jar target/benchmarks.jar -f 1 -wi 5 -i 5
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
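&lt;p&gt;To give a feel for the trick the demo exploits, here is a minimal plain-Python
sketch of the same idea (my own illustration, not Gunnar’s actual Java Vector API
code): the fizz/buzz pattern repeats every 15 numbers, so each element can be
resolved with a precomputed lookup instead of branching on every value.&lt;/p&gt;

```python
# Plain-Python sketch of the core trick in SIMD-style FizzBuzz (illustrative
# only, not the actual Java Vector API implementation): the fizz/buzz pattern
# repeats with period 15, so results come from a precomputed lookup applied to
# each element instead of per-element divisibility branching.

# None means "keep the number itself"; strings replace it.
PATTERN = [None, None, "Fizz", None, "Buzz", "Fizz", None, None,
           "Fizz", "Buzz", None, "Fizz", None, None, "FizzBuzz"]

def fizzbuzz(start, count):
    """Return FizzBuzz results for the numbers start..start+count-1."""
    out = []
    for i in range(start, start + count):
        masked = PATTERN[(i - 1) % 15]  # position in the repeating pattern
        out.append(masked if masked is not None else str(i))
    return out

print(fizzbuzz(1, 15))
```

A vectorized implementation applies the same lookup to whole lanes of numbers at
once, which is what the Vector API benchmark above measures.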

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs and Documentation&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://openjdk.java.net/projects/jdk/17/jeps-since-jdk-11&quot;&gt;JEPs in JDK 17 integrated since JDK 11&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://shopify.engineering/faster-trino-query-execution-infrastructure&quot;&gt;Shopify’s Path to a Faster Trino Query Execution: Infrastructure&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Videos&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=yQqBqix7yTA&quot;&gt;Vector API and Record Serialization&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=1JeoNr6-pZw&quot;&gt;The Vector API in JDK 17&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=0BpY132mKm0&quot;&gt;JDK 8 to JDK 18 in Garbage Collection: 10 Releases, 2000+ Enhancements&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=e2lXj_t7ZBc&quot;&gt;Concurrent Garbage collectors: ZGC &amp;amp; Shenandoah&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Releases 379 to 381</summary>

      
      
    </entry>
  
    <entry>
      <title>Cinco de Trino recap: Learn how to build an efficient data lake</title>
      <link href="https://trino.io/blog/2022/05/17/cinco-de-trino-recap.html" rel="alternate" type="text/html" title="Cinco de Trino recap: Learn how to build an efficient data lake" />
      <published>2022-05-17T00:00:00+00:00</published>
      <updated>2022-05-17T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/05/17/cinco-de-trino-recap</id>
      <content type="html" xml:base="https://trino.io/blog/2022/05/17/cinco-de-trino-recap.html">&lt;p&gt;When Trino (formerly PrestoSQL) arrived on the scene almost 10 years ago, it
immediately became known as the much faster alternative to the data warehouse
of big data, Apache Hive. The use cases that you, as the community, have built
have far exceeded anything we had imagined in complexity. Together we’ve made
Trino not only the fastest way to interactively query large data sets, but also
a convenient way to run federated queries across data sources, making moving all
the data optional.&lt;/p&gt;

&lt;p&gt;At Cinco de Trino, we came full circle, back to the next iteration of analytics
architecture: the data lake. This conference offers advice from industry
thought leaders about how to use the best lakehouse tools with Trino to manage
that data complexity. Hear from speakers like Martin Traverso
(Trino), Dain Sundstrom (Trino), James Campbell (Great Expectations), Jeremy
Cohen (dbt Labs), Ryan Blue (Iceberg), Denny Lee (Delta Lake), and Vinoth Chandar
(Hudi). You can watch the talks on-demand on the
&lt;a href=&quot;https://www.youtube.com/playlist?list=PLFnr63che7wYDHjUsmp43THLmAlqPDHlM&quot;&gt;Cinco de Trino playlist&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In this post, I’d like to cover the key items from each talk you won’t want to 
miss.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h3 id=&quot;keynote-trino-as-a-data-lakehouse&quot;&gt;Keynote: Trino as a data lakehouse&lt;/h3&gt;

&lt;p&gt;Trino co-creator Martin Traverso covers where Trino fits into the data lake
and brings you a sneak peek of the future of Trino. Polymorphic table
functions and adaptive query planning are just some of the many exciting
features Martin walks us through.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/gwV3smFiGEg&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;h3 id=&quot;project-tardigrade&quot;&gt;Project Tardigrade&lt;/h3&gt;

&lt;p&gt;If you have one takeaway from the conference, let it be this: there’s a new way
in town to get 60% cost savings on your Trino deployment. Cory Darby walks
through how the fault-tolerant execution architecture has enabled
BlueCat to auto-scale their Trino clusters and run on spot instances, which
yielded massive cost savings. Zebing Lin goes through how this happens behind
the scenes, and how you can run resource-intensive ETL jobs using the failure
recovery delivered by the team behind Project Tardigrade.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/MYBoeB_lQmo&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://trino.io/blog/2022/05/05/tardigrade-launch.html&quot;&gt;Learn more in the Project Tardigrade blog »&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://github.com/bitsondatadev/trino-getting-started/tree/main/kubernetes/tardigrade-eks&quot;&gt;Try Project Tardigrade Yourself »&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;starburst-galaxy-lab&quot;&gt;Starburst Galaxy lab&lt;/h3&gt;

&lt;p&gt;Starburst Galaxy enables you to get Trino up and running without spending
your time on setting up, scaling, and maintaining the infrastructure.
Trino co-creator Dain Sundstrom walks you through a fun-filled lab that
demonstrates how to use Starburst Galaxy, a Trino-as-a-service solution, to
generate &lt;a href=&quot;https://db-engines.com/en/ranking&quot;&gt;database rankings&lt;/a&gt; by ingesting,
cleaning, and analyzing Twitter and Stack Overflow data.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/WQNqqkBd_Jo&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;h3 id=&quot;engineering-data-reliability-with-great-expectations&quot;&gt;Engineering data reliability with Great Expectations&lt;/h3&gt;

&lt;p&gt;Let’s be honest: when we claim to have run “tests” for our data pipelines, we
usually mean we checked that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;input != NULL&lt;/code&gt;, or that the dashboard isn’t broken.
James Campbell showcases the Great Expectations connector for Trino, which is
officially launched as the new way to write
expectations (data quality checks) for your data.&lt;/p&gt;

&lt;p&gt;What excites us the most?&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The ability to take advantage of far more sophisticated data quality tests
than what any of us would write.&lt;/li&gt;
  &lt;li&gt;Having a really awesome UI to manage expectations.&lt;/li&gt;
  &lt;li&gt;The data source view that makes it easy to dynamically test your custom
data quality checks against backends.&lt;/li&gt;
&lt;/ol&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/9HE6LawCHP8&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;h3 id=&quot;bring-your-data-into-your-data-lake-with-airbyte&quot;&gt;Bring your data into your data lake with Airbyte&lt;/h3&gt;

&lt;p&gt;The first step of doing any analytics is bringing your data into the data lake.
Ingestion engines are a game changer for centralizing your data in the data lake.
Until recently, there was no open source software to choose from in this category.
In just 10 minutes, Abhi Vaidyanatha takes us through the journey of taking in
data from various places into your choice of data lake.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/3E0jb4d2p0U&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://abhi-vaidyanatha.medium.com/an-opinionated-guide-to-consolidating-your-data-b09386b2b9b5&quot;&gt;Read Abhi’s article about Airbyte + Trino »&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;transforming-your-data-with-dbt&quot;&gt;Transforming your data with dbt&lt;/h3&gt;

&lt;p&gt;Ever had 300 lines of SQL in front of you, and wasted lots of time sifting
through it to find which part of the code to edit to check for duplicate
customers?&lt;/p&gt;

&lt;p&gt;Imagine having to update a decimal precision used frequently throughout that SQL
statement. What we &amp;lt;3 the most about dbt is that data engineering becomes much
more like software engineering, where you code in a much more modular way. Along
the way, you get many benefits. The ones we love the most? The data lineage graph
and automatic documentation. That’s stuff we always say is important, but never do.&lt;/p&gt;

&lt;p&gt;Even for dbt experts, there’s something new to learn. Jeremy Cohen goes through
the new capabilities Trino brings to dbt, while showcasing cool features like
macros, a flexible alternative to SQL-defined functions.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/UYS75sjTziU&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://github.com/dbt-labs/trino-dbt-tpch-demo&quot;&gt;Check out Jeremy’s demo repo »&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;choosing-the-best-data-lakehouse-format-for-you&quot;&gt;Choosing the best data lakehouse format for you&lt;/h2&gt;

&lt;p&gt;Ever wonder about all the hype with the new table formats? Why is everyone
choosing Iceberg, Delta Lake, or Hudi over Hive? The founders of each of these
modern table formats showcase them and let you be the
judge of which format makes more sense for your architecture. Below are the
highlights:&lt;/p&gt;

&lt;h3 id=&quot;iceberg&quot;&gt;Iceberg&lt;/h3&gt;

&lt;p&gt;Ryan Blue dives into important elements of your data lakehouse architecture that
affect daily operations and slow down developer efficiency. He then covers how
Iceberg is the solution he realized to solve those issues.&lt;/p&gt;

&lt;p&gt;The first special element of Iceberg is that it intentionally breaks
compatibility with the Hive format to bring you features like partition and
schema evolution within the same table. On the surface this may seem trivial,
as we’ve conditioned our minds to accept the limitations of Hive-like formats.&lt;/p&gt;

&lt;p&gt;The second special element is that Iceberg is built on a community-driven
specification that enables anyone to implement the same calls as the Iceberg
library.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/1oXmBbB77ak&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;h3 id=&quot;delta-lake&quot;&gt;Delta Lake&lt;/h3&gt;

&lt;p&gt;90% of the time that our Trino data pipelines break, it’s because someone
committed a bad upstream change. With Delta Lake time travel (coming soon!), you
won’t need to spend a whole day pinpointing that bad change: just travel back in
time and identify which change it was. Denny Lee gives us a compelling
argument for why users desire ACID guarantees in their data lakehouse, and how
Delta Lake solves for that.&lt;/p&gt;

&lt;p&gt;Similar to Iceberg, Delta Lake offers optimistic concurrency, which allows
multiple writers to write to the same Delta Lake table while maintaining ACID
constraints on the data.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/TB9Dxv71LxQ&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;h3 id=&quot;hudi-coming-soon-to-trino&quot;&gt;Hudi [Coming Soon to Trino]&lt;/h3&gt;

&lt;p&gt;The coolest part of the talk? Open up a world of new possibilities with near 
real-time analytics in Trino with Hudi. With Hudi, you get to serve real-time 
production systems, debug live issues, and more.&lt;/p&gt;

&lt;p&gt;Vinoth Chandar showcases the compelling use cases that drove innovation around
Hudi at Uber. He then covers how, in his view, the architectures of data lakes
and lakehouses are starting to merge, and the implications this has for open
versus proprietary architectures.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/r-fF9uqzUdE&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;h3 id=&quot;touch-talk-and-see-your-data-with-tableau&quot;&gt;Touch, talk, and see your data with Tableau&lt;/h3&gt;

&lt;p&gt;Tableau is our favorite data visualization tool, and in this session, Vlad 
Usatin of Tableau shares how to use Tableau to directly visualize your Trino 
data.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/b6kKqNIMvuM&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;p&gt;Thank you to all who attended or viewed; we hope to see you again at our
upcoming events later this year. Continue the conversation in our
&lt;a href=&quot;https://join.slack.com/t/trinodb/shared_invite/zt-18acr4bvr-0DtaCwiLOrv1zetGnV_w~w&quot;&gt;Trino Slack&lt;/a&gt;.&lt;/p&gt;

      
        <author>
          <name>Brian Olsen, Brian Zhan</name>
        </author>
      

      <summary>When Trino (formerly PrestoSQL) arrived on the scene almost 10 years ago, it immediately became known as the much faster alternative to the data warehouse of big data, Apache Hive. The use cases that you, as the community, have built had far exceeded anything we had imagined in complexity. Together we’ve made Trino not only the fastest way to interactively query large data sets, but also a convenient way to run federated queries across data sources to make moving all the data optional. At Cinco de Trino, we came full circle back to the next iteration of analytics architecture with the data lake. This conference offers advice from industry thought leaders about how to use best lakehouse tools with Trino to manage that data complexity. Hear from industry thought leaders like Martin Traverso (Trino), Dain Sundstrom (Trino), James Campbell (Great Expectations), Jeremy Cohen (DBT Labs), Ryan Blue (Iceberg), Denny Lee (Delta Lake), Vinoth Chandar (Hudi). You can watch the talks on-demand on the Cinco de Trino playlist. In this post, I’d like to cover the key items from each talk you won’t want to miss.</summary>

      
      
    </entry>
  
    <entry>
      <title>Project Tardigrade delivers ETL at Trino speeds to early users</title>
      <link href="https://trino.io/blog/2022/05/05/tardigrade-launch.html" rel="alternate" type="text/html" title="Project Tardigrade delivers ETL at Trino speeds to early users" />
      <published>2022-05-05T00:00:00+00:00</published>
      <updated>2022-05-05T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/05/05/tardigrade-launch</id>
      <content type="html" xml:base="https://trino.io/blog/2022/05/05/tardigrade-launch.html">&lt;p&gt;After six months of challenging work on Project Tardigrade, we are ready to
launch. With this project we improved the user experience of running the
resource-intensive queries that are common in the Extract, Transform, Load (ETL)
and batch processing space. It required some significant and fascinating
engineering to get us to the current state. The latest Trino release includes
all the work from Project Tardigrade. Read on to learn how it all works, and
how to enable fault-tolerant execution in Trino.&lt;/p&gt;

&lt;p align=&quot;center&quot; width=&quot;100%&quot;&gt;
    &lt;img width=&quot;50%&quot; src=&quot;/assets/blog/tardigrade-launch/tardigrade-logo.png&quot; /&gt;
&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;what-is-project-tardigrade&quot;&gt;What is Project Tardigrade?&lt;/h2&gt;

&lt;p&gt;What we love most about Trino is that you get fast query speeds, and you can
iterate fast with intuitive error messages, interactive experience, and query
federation.&lt;/p&gt;

&lt;p&gt;One of the big problems that has persisted for a long time is that configuring,
tuning, and managing Trino for long-running ETL workloads is very difficult.
Following are just some of the problems you have to deal with:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Reliable landing times:&lt;/em&gt; Queries that run for hours can fail. Restarting
them from scratch wastes resources and makes it hard for you to meet
your completion time requirements.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Cost-efficient clusters:&lt;/em&gt; Trino queries that need terabytes of distributed
memory require extremely large clusters due to the lack of iterative
execution.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Concurrency:&lt;/em&gt; Multiple independent clients may submit their queries
concurrently. Due to the lack of available resources at a certain moment some
of these queries may need to be killed and restarted from zero after a
while. This makes the landing time even more unpredictable.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://engineering.salesforce.com/how-to-etl-at-petabyte-scale-with-trino-5fe8ac134e36&quot;&gt;Structuring your workload&lt;/a&gt;
to avoid these problems can be done by a team of experts, but that expertise is
not accessible to most Trino users.&lt;/p&gt;

&lt;p&gt;The goal of Project Tardigrade is to provide an “out of the box” solution for the
problems mentioned above. We’ve designed a new
&lt;a href=&quot;https://github.com/trinodb/trino/wiki/Fault-Tolerant-Execution&quot;&gt;fault-tolerant execution architecture&lt;/a&gt;
that allows us to implement advanced resource-aware scheduling with granular
retries.&lt;/p&gt;

&lt;p&gt;Following are some of the benefits and results:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;When your long-running queries experience a failure, they don’t have to start
from scratch.&lt;/li&gt;
  &lt;li&gt;When queries require more memory than currently available in the cluster
they are still able to succeed.&lt;/li&gt;
  &lt;li&gt;When multiple queries are submitted concurrently they are able to share
resources in a fair way, and make steady progress.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino does all the hard work of allocating, configuring, and maintaining query
processing behind the scenes. Instead of spending time tuning Trino clusters to
match your workload requirements, or reorganizing your workload to match your
Trino cluster capabilities, you can spend your time on analytics and delivering
business value. And most importantly, your heart won’t skip a beat when you
wake up in the morning wondering whether that query landed on time.&lt;/p&gt;

&lt;h2 id=&quot;what-did-we-test-so-far&quot;&gt;What did we test so far?&lt;/h2&gt;

&lt;p&gt;Since there’s no publicly available testing query set for ETL use cases, we
handcrafted more than a hundred ETL-like queries based on the
&lt;a href=&quot;https://github.com/trinodb/trino-verifier-queries/tree/main/src/main/resources/queries/tpch/etl&quot;&gt;TPC-H&lt;/a&gt;
and
&lt;a href=&quot;https://github.com/trinodb/trino-verifier-queries/tree/main/src/main/resources/queries/tpcds/etl&quot;&gt;TPC-DS&lt;/a&gt;
datasets.&lt;/p&gt;

&lt;p&gt;To simulate real-world settings, we deployed a cluster of
15 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;m5.8xlarge&lt;/code&gt; nodes
&lt;a href=&quot;https://trino.io/docs/current/admin/fault-tolerant-execution.html&quot;&gt;configured for fault-tolerant execution&lt;/a&gt;
and repeatedly executed thousands of queries over
datasets of different sizes (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;10GB&lt;/code&gt; / &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1TB&lt;/code&gt; / &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;10TB&lt;/code&gt;). The queries were
executed sequentially as well as with concurrency factors of 5, 10, and 20.
Failure recovery capabilities were tested by crashing a random node in the
cluster every couple of minutes while running a live workload.&lt;/p&gt;

&lt;p&gt;To validate new resource management capabilities we submitted all 22
&lt;a href=&quot;https://github.com/trinodb/trino-verifier-queries/tree/main/src/main/resources/queries/tpch/etl&quot;&gt;TPC-H&lt;/a&gt;
based queries simultaneously with fault-tolerant execution enabled and disabled.
With fault-tolerant execution disabled, only two of them succeeded, while the
remaining twenty queries failed with resource-related issues, such as
running out of memory. With fault-tolerant execution enabled, all of the
queries succeeded with no issues.&lt;/p&gt;

&lt;h2 id=&quot;how-do-i-enable-fault-tolerant-execution&quot;&gt;How do I enable fault-tolerant execution?&lt;/h2&gt;

&lt;p&gt;Fault-tolerant execution can only be enabled for an entire cluster.&lt;/p&gt;

&lt;p&gt;In general, we recommend splitting your long-running ETL queries and
short-running interactive workloads to run on different clusters.
This ensures that long-running ETL queries do not impact interactive workloads
and cause a bad user experience. Also note that any short-running,
interactive queries on a fault-tolerant cluster may experience higher latencies
due to the checkpointing mechanism.&lt;/p&gt;

&lt;h3 id=&quot;1-add-an-s3-bucket-for-checkpointing&quot;&gt;1. Add an S3 bucket for checkpointing&lt;/h3&gt;

&lt;p&gt;First you need to create an S3 bucket for spooling. We recommend configuring a
bucket lifecycle rule to automatically expire abandoned objects in the event of
a node crash. You can configure these rules using the
&lt;a href=&quot;https://docs.aws.amazon.com/cli/latest/reference/s3api/put-bucket-lifecycle-configuration.html&quot;&gt;s3api&lt;/a&gt;
CLI, as shown in the tutorial linked below.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;{
    &quot;Rules&quot;: [
        {
            &quot;Expiration&quot;: {
                &quot;Days&quot;: 1
            },
            &quot;ID&quot;: &quot;Expire&quot;,
            &quot;Filter&quot;: {},
            &quot;Status&quot;: &quot;Enabled&quot;,
            &quot;NoncurrentVersionExpiration&quot;: {
                &quot;NoncurrentDays&quot;: 1
            },
            &quot;AbortIncompleteMultipartUpload&quot;: {
                &quot;DaysAfterInitiation&quot;: 1
            }
        }
    ]
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
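&lt;p&gt;If you prefer to script the bucket setup, the same rule can be applied
programmatically. The following is a minimal sketch using boto3, the AWS SDK for
Python; the bucket name is a placeholder, and credentials are assumed to be
available in your environment:&lt;/p&gt;

```python
def expire_after_one_day_rules():
    """Build the lifecycle configuration shown above: expire current objects,
    noncurrent versions, and incomplete multipart uploads after one day."""
    return {
        "Rules": [
            {
                "ID": "Expire",
                "Filter": {},
                "Status": "Enabled",
                "Expiration": {"Days": 1},
                "NoncurrentVersionExpiration": {"NoncurrentDays": 1},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 1},
            }
        ]
    }

if __name__ == "__main__":
    # Requires boto3 and AWS credentials; the bucket name is a placeholder.
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="example-trino-spooling-bucket",
        LifecycleConfiguration=expire_after_one_day_rules(),
    )
```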

&lt;h3 id=&quot;2-configure-the-trino-exchange-manager&quot;&gt;2. Configure the Trino exchange manager&lt;/h3&gt;

&lt;p&gt;Second you need to configure the exchange manager. Add the file
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;exchange-manager.properties&lt;/code&gt; in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc&lt;/code&gt; folder of your Trino installation on
the coordinator and all workers with the following content:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;exchange-manager.name=filesystem
exchange.base-directories=s3://&amp;lt;bucket-name&amp;gt;
exchange.s3.region=us-east-1
exchange.s3.aws-access-key=&amp;lt;access-key&amp;gt;
exchange.s3.aws-secret-key=&amp;lt;secret-key&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;3-enable-task-level-retries&quot;&gt;3. Enable task level retries&lt;/h3&gt;

&lt;p&gt;Lastly, you need to configure and enable task level retries by adding the
following properties to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.properties&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;retry-policy=TASK
query.hash-partition-count=50
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Note: more than 50 partitions is currently not supported by the filesystem
exchange implementation.&lt;/p&gt;

&lt;h3 id=&quot;4-optional-recommended-settings&quot;&gt;4. Optional recommended settings&lt;/h3&gt;

&lt;p&gt;It is also recommended to enable compression to reduce the amount of data spooled
on S3 (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;exchange.compression-enabled=true&lt;/code&gt;) as well as reduce the low memory
killer delay to allow the resource manager to unblock nodes running short on memory
faster (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.low-memory-killer.delay=0s&lt;/code&gt;). Additionally, we recommend enabling
automatic writer scaling to optimize output file size for tables created with
Trino (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scale-writers=true&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;To increase overall throughput and reduce resource-related task retries, we
recommend adjusting the concurrency settings based on the hardware
configuration you have chosen.&lt;/p&gt;

&lt;p&gt;Following are the settings for the hardware used in our testing (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;32&lt;/code&gt; vCPUs,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;128GB&lt;/code&gt; memory and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;10Gbit/s&lt;/code&gt; network):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;task.concurrency=8
task.writer-count=4
fault-tolerant-execution-target-task-input-size=4GB
fault-tolerant-execution-target-task-split-count=64
fault-tolerant-execution-task-memory=5GB
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;By default Trino is configured to wait up to five minutes for a task to recover
before considering it lost and rescheduling it. This timeout
can be increased or reduced as necessary by adjusting the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.remote-task.max-error-duration&lt;/code&gt; configuration property. For example:
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.remote-task.max-error-duration=1m&lt;/code&gt;&lt;/p&gt;

&lt;h2 id=&quot;deploying-on-aws-with-helm-and-kubernetes&quot;&gt;Deploying on AWS with Helm and Kubernetes&lt;/h2&gt;

&lt;p&gt;To test out Tardigrade features, you need at least a cluster with a dedicated
coordinator and two workers for a minimal level of parallelism and performance.
The quickest and easiest way to satisfy the specifications mentioned
above is by using the
&lt;a href=&quot;https://artifacthub.io/packages/helm/trino/trino&quot;&gt;Trino Helm chart&lt;/a&gt; with the
provided &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;values.yml&lt;/code&gt; below, and deploying a cluster to the AWS EKS cloud
service. If you are not familiar with deploying Trino on Kubernetes, we
recommend you take a look at the Trino Community Broadcast episodes covering
&lt;a href=&quot;https://trino.io/episodes/24.html&quot;&gt;local Trino on Kubernetes&lt;/a&gt; and
&lt;a href=&quot;https://trino.io/episodes/31.html&quot;&gt;deploying Trino on EKS&lt;/a&gt;.&lt;/p&gt;

&lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/4isawxYjDnE&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://github.com/bitsondatadev/trino-getting-started/tree/main/kubernetes/tardigrade-eks&quot;&gt;Try Project Tardigrade Yourself »&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;closing-notes&quot;&gt;Closing notes&lt;/h2&gt;

&lt;p&gt;Project Tardigrade has been a great success for us already. We learned a lot
and significantly improved Trino. Now we are ready to share this with
you all, and look forward to fixing anything you find. We really want you to push
the limits, and let us know what you find.&lt;/p&gt;

&lt;p&gt;If running fast batch jobs on the fastest state-of-the-art query engine 
interests you, consider playing around with the tutorial above and giving us 
your feedback. You can reach us on the &lt;a href=&quot;https://bit.ly/3IFlNXy&quot;&gt;#project-tardigrade&lt;/a&gt; 
channel in our &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you would like to write about your experience and results, or become a
contributor, also let us know on the &lt;a href=&quot;https://bit.ly/3IFlNXy&quot;&gt;#project-tardigrade&lt;/a&gt;
channel. We are happy to send you Tardigrade swag as a thank you.&lt;/p&gt;

&lt;p&gt;Thanks for reading and learning with us today. Happy Querying!&lt;/p&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://www.reddit.com/r/dataengineering/comments/uj2aez/etl_at_trino_speeds_and_a_stepbystep_tutorial_on/&quot;&gt;Discuss on Reddit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a class=&quot;btn btn-pink btn-md waves-effect waves-light&quot; href=&quot;https://news.ycombinator.com/item?id=31276058&quot;&gt;Discuss On Hacker News&lt;/a&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Andrii Rosa, Brian Olsen, Brian Zhan, Lukasz Osipiuk, Martin Traverso, Zebing Lin</name>
        </author>
      

      <summary>After six months of challenging work on Project Tardigrade, we are ready to launch. With the project we improved the user experience of running resource intensive queries that are common in the Extract, Transform, Load (ETL) and batch processing space. It required some significant and fascinating engineering to get us to the current status. The latest Trino release includes all the work from Project Tardigrade. Read on to learn how it all works, and how to enable the fault-tolerant execution in Trino.</summary>

      
      
    </entry>
  
    <entry>
      <title>35: Packaging and modernizing Trino</title>
      <link href="https://trino.io/episodes/35.html" rel="alternate" type="text/html" title="35: Packaging and modernizing Trino" />
      <published>2022-04-21T00:00:00+00:00</published>
      <updated>2022-04-21T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/35</id>
      <content type="html" xml:base="https://trino.io/episodes/35.html">&lt;h2 id=&quot;releases-375-to-378&quot;&gt;Releases 375 to 378&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-375.html&quot;&gt;Trino 375&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for table comments in the MySQL connector.&lt;/li&gt;
  &lt;li&gt;Improved predicate pushdown for PostgreSQL.&lt;/li&gt;
  &lt;li&gt;Performance improvements for aggregations with filters.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-376.html&quot;&gt;Trino 376&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Better performance when reading Parquet data.&lt;/li&gt;
  &lt;li&gt;Join pushdown for MySQL.&lt;/li&gt;
  &lt;li&gt;Aggregation pushdown for Oracle.&lt;/li&gt;
  &lt;li&gt;Support table and column comments in ClickHouse connector.&lt;/li&gt;
  &lt;li&gt;Support for adding and deleting schemas in Accumulo connector.&lt;/li&gt;
  &lt;li&gt;Support system truststore in CLI and JDBC driver.&lt;/li&gt;
  &lt;li&gt;Two-way TLS/SSL certificate validation with LDAP authentication.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-377.html&quot;&gt;Trino 377&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for standard SQL &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trim&lt;/code&gt; syntax.&lt;/li&gt;
  &lt;li&gt;Better performance for Glue metastore.&lt;/li&gt;
  &lt;li&gt;Join pushdown for SQL Server connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-378.html&quot;&gt;Trino 378&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;to_base32&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;from_base32&lt;/code&gt; functions.&lt;/li&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;expire_snapshots&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_orphan_files&lt;/code&gt; table procedures for Iceberg.&lt;/li&gt;
  &lt;li&gt;Faster planning of queries with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN&lt;/code&gt; predicates.&lt;/li&gt;
  &lt;li&gt;Faster query planning for Hive, Delta Lake, Iceberg, MySQL, PostgreSQL, and
SQL Server connectors.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional highlights worth a mention according to Manfred:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Generally lots of improvements on Hive, Delta Lake, Iceberg, and main
JDBC-based connectors.&lt;/li&gt;
  &lt;li&gt;Full Iceberg v2 table format support, first for read and later also for write
operations, is getting closer and closer.&lt;/li&gt;
  &lt;li&gt;Table statistics support for PostgreSQL, MySQL, and SQL Server connectors,
including automatic join pushdown.&lt;/li&gt;
  &lt;li&gt;Fix failure of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DISTINCT .. LIMIT&lt;/code&gt; operator when input data is dictionary
encoded.&lt;/li&gt;
  &lt;li&gt;Add new page to display the runtime information of all workers in the cluster
in Web UI.&lt;/li&gt;
  &lt;li&gt;Remove &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;user&lt;/code&gt; property requirement in JDBC driver.&lt;/li&gt;
  &lt;li&gt;Require &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;internal-communication.shared-secret&lt;/code&gt; value with authentication
usage, a breaking change for many users that have not set that secret.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the release notes for
&lt;a href=&quot;https://trino.io/docs/current/release/release-375.html&quot;&gt;Trino 375&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-376.html&quot;&gt;Trino 376&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-377.html&quot;&gt;Trino 377&lt;/a&gt;,
and
&lt;a href=&quot;https://trino.io/docs/current/release/release-378.html&quot;&gt;Trino 378&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-packaging-trino&quot;&gt;Concept of the episode: Packaging Trino&lt;/h2&gt;

&lt;p&gt;To adopt Trino you typically need to run it on a cluster of machines. These can
be bare metal servers, virtual machines, or even containers. The Trino project
provides a few binary packages to allow you to install Trino:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;tarball&lt;/li&gt;
  &lt;li&gt;rpm&lt;/li&gt;
  &lt;li&gt;container image&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of them include a bunch of Java libraries that constitute Trino
and all the plugins. As a result there are only a few requirements. You need a
Linux operating system, since some of the libraries and code require Linux
indirectly, and a Java 11 runtime.&lt;/p&gt;

&lt;p&gt;Beyond that there is just the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bin/launcher&lt;/code&gt; script, which is highly recommended, but
not required. It can be used as a service script or for manually
starting, stopping, and checking the status of Trino, and it only needs Python.&lt;/p&gt;

&lt;h3 id=&quot;tarball&quot;&gt;Tarball&lt;/h3&gt;

&lt;p&gt;The tarball is a gzip-compressed tar archive. For installation you just need to
extract the archive anywhere. It contains the following directory structure.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bin&lt;/code&gt;, the launcher script and related files&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lib&lt;/code&gt;, all globally needed libraries&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;plugins&lt;/code&gt;, connectors and other plugins with their own libraries each in
separate sub-directories&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You need to create the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc&lt;/code&gt; directory with the needed configuration, since the
tarball does not include any defaults, and you cannot start the application
without those.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc/catalog/*.properties&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc/config.properties&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc/jvm.config&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc/log.properties&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc/node.properties&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that all these files are within the created &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc&lt;/code&gt; directory.&lt;/p&gt;
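<p>As a minimal sketch for a single-node test setup, the central configuration
files could look like the following. All values are illustrative and need to be
adjusted for your environment:</p>

&lt;p&gt;As a minimal sketch for a single-node test setup, the central configuration
files could look like the following. All values are illustrative and need to be
adjusted for your environment:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# etc/config.properties
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
discovery.uri=http://localhost:8080

# etc/node.properties
node.environment=test
node.id=trino-node-1
node.data-dir=/var/trino/data
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;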

&lt;h3 id=&quot;rpm&quot;&gt;RPM&lt;/h3&gt;

&lt;p&gt;The RPM archive is suitable for RPM-based Linux distributions, but testing is
not very thorough across different versions and distributions.&lt;/p&gt;

&lt;p&gt;It adapts the tarball content to the Linux file system hierarchy, hooks the
launcher script up as a daemon script, and adds default configuration files. That
allows you to start Trino right after installing the package, as well as on system
restarts.&lt;/p&gt;

&lt;p&gt;Locations used are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/etc/trino&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/var/lib/trino&lt;/code&gt;, and others. These are
configured via the launcher script parameters.&lt;/p&gt;

&lt;p&gt;In a nutshell the RPM adds some convenience, but narrows down the supported
Linux distributions. It still requires Java and Python installation and
management.&lt;/p&gt;

&lt;h3 id=&quot;container-image&quot;&gt;Container image&lt;/h3&gt;

&lt;p&gt;The container image for Trino adds the necessary Linux, Java, and Python, and
adapts Trino to the container setup.&lt;/p&gt;

&lt;p&gt;The container adds even more convenience, since it is ready to use out of the
box. It allows usage on Kubernetes with the help of the &lt;a href=&quot;https://github.com/trinodb/charts&quot;&gt;Helm
charts&lt;/a&gt;, and includes the required operating
system and application parts automatically.&lt;/p&gt;

&lt;h3 id=&quot;customization&quot;&gt;Customization&lt;/h3&gt;

&lt;p&gt;All three packages Trino ships are just defaults. They all require further
configuration to adapt Trino to your specific needs in terms of hardware,
connected data sources, security configuration, and so on. All of this can be
done manually or with many existing tools.&lt;/p&gt;

&lt;p&gt;However, you can also take it a step further and create your own package suited
to your needs. The tarball can be used as the source for any customization to create
your own package. The following is a list of options and scenarios:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Use the tarball, but remove unused plugins.&lt;/li&gt;
  &lt;li&gt;Use the tarball as source to create your own specific package. For example a
deb archive for usage with Ubuntu, or an apk package for Alpine.&lt;/li&gt;
  &lt;li&gt;Create your own RPM similar to &lt;a href=&quot;https://github.com/simpligility/trino-packages&quot;&gt;Manfred’s proof of
concept&lt;/a&gt; that pulls out the
Trino RPM package creation into a separate project.&lt;/li&gt;
  &lt;li&gt;Create your own container image with different base distro, custom set of
plugins, and even with all your configuration baked into the image.&lt;/li&gt;
&lt;/ul&gt;
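&lt;p&gt;For example, a custom container image with your catalog configuration baked in
can be sketched with a short Dockerfile. The image tag and file paths are
examples only:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;FROM trinodb/trino:378
# bake your catalog configuration into the image
COPY catalog/*.properties /etc/trino/catalog/
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;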

&lt;h3 id=&quot;others&quot;&gt;Others&lt;/h3&gt;

&lt;p&gt;You can also use &lt;a href=&quot;https://formulae.brew.sh/formula/trino&quot;&gt;brew on macOS&lt;/a&gt;, but
that is not suitable for production usage. It is more of a convenient way to get a local
Trino for playing around.&lt;/p&gt;
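&lt;p&gt;For example, assuming a working Homebrew setup:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;brew install trino
trino --version
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;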

&lt;h2 id=&quot;additional-topic-of-the-episode-modernizing-trino-with-java-17&quot;&gt;Additional topic of the episode: Modernizing Trino with Java 17&lt;/h2&gt;

&lt;p&gt;Currently Java 11 is required for Trino. Java 17 is the latest and greatest Java
LTS release with lots of good performance, security, and language improvements.
The community has been working hard to make Java 17 support a reality. At this
stage core Trino fully supports Java 17. Starburst Galaxy, for example, already uses
Java 17.&lt;/p&gt;

&lt;p&gt;The maintainers and contributors would like to move to fully support and also
require Java 17 soon. Here is where your input comes in, and we ask that you
let us know your thoughts about questions such as the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Are you looking forward to the new Java 17 language features and other
improvements as a contributor to Trino?&lt;/li&gt;
  &lt;li&gt;Are you already using Java 17 with Trino? In production or just testing?&lt;/li&gt;
  &lt;li&gt;If we require Java 17 in the next months, can you update to use Java 17 with
Trino?&lt;/li&gt;
  &lt;li&gt;If not, what are some of the hurdles?&lt;/li&gt;
  &lt;li&gt;Are you okay with staying at an older release, until you can use Java 17?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let us know on the #dev channel on Trino Slack or ping us directly. You can also
chime in on the &lt;a href=&quot;https://github.com/trinodb/trino/issues/9876&quot;&gt;roadmap issue&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;pull-requests-of-the-episode-worker-stats-in-the-web-ui&quot;&gt;Pull requests of the episode: Worker stats in the Web UI&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/trinodb/trino/issues/11653&quot;&gt;PR of the episode&lt;/a&gt; was
submitted by &lt;a href=&quot;https://github.com/whutpencil&quot;&gt;GitHub user whutpencil&lt;/a&gt;, and adds a
significant new feature to the web UI. It exposes the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;system.runtime.nodes&lt;/code&gt;
information, that is, statistics for each worker, in brand new pages. What a great
effort! Special thanks also go out to &lt;a href=&quot;https://github.com/dedep&quot;&gt;Dawid Adamek
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dedep&lt;/code&gt;&lt;/a&gt; for the review.&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-episode-tarball-installation-and-new-web-ui-feature&quot;&gt;Demo of the episode: Tarball installation and new Web UI feature&lt;/h2&gt;

&lt;p&gt;In the demo of the month Manfred shows a worker installation added to a local
tarball install of a coordinator, and then demos the Web UI with the new feature
from the pull request of the month.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-episode-are-write-operations-in-delta-lake-supported-for-tables-stored-on-hdfs&quot;&gt;Question of the episode: Are write operations in Delta Lake supported for tables stored on HDFS?&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trinodb.slack.com/archives/CGB0QHWSW/p1650331073409229&quot;&gt;Full question from Slack&lt;/a&gt;:
I was trying the Delta Lake connector. I noticed that write operations are
supported for tables stored on Azure ADLS Gen2, S3 and S3-compatible storage.
Does that mean write operations are not supported for tables stored on HDFS?&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Answer:&lt;/em&gt; HDFS is always implicitly supported for data lake connectors. It isn’t
called out because it is assumed.&lt;/p&gt;

&lt;p&gt;The confusion actually came from an error message the user received after
creating a Delta Lake table in Spark. They then tried inserting
a record into the table through IntelliJ IDEA and received the following error
message:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Unsupported target SQL type: -155
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;They thought the problem might be a wrong data type for the birthdate column, and
then used the statement below to insert a record into the table.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;INSERT INTO
  presto.people10m (id, firstname, middlename, lastname, gender, birthdate, ssn, salary)
VALUES (1, &apos;a&apos;, &apos;b&apos;, &apos;c&apos;, &apos;male&apos;, timestamp &apos;1990-01-01 00:00:00 +00:00&apos;, &apos;d&apos;, 10);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;However, they got an error message like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Query 20220419_031201_00015_8qe76 failed:
Cannot write to table in hdfs://masters/presto.db/people10m; hdfs not supported
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This turned out to be an issue in the IntelliJ client, not in Trino.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.starburst.io/info/cinco-de-trino/&quot;&gt;Cinco de Trino&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/events/285087048/&quot;&gt;Constructing an Intelligent Data Trellis from your Data Mesh&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Releases 375 to 378</summary>

      
      
    </entry>
  
    <entry>
      <title>34: A big delta for Trino</title>
      <link href="https://trino.io/episodes/34.html" rel="alternate" type="text/html" title="34: A big delta for Trino" />
      <published>2022-03-17T00:00:00+00:00</published>
      <updated>2022-03-17T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/34</id>
      <content type="html" xml:base="https://trino.io/episodes/34.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;p&gt;In this episode Manfred has the pleasure to chat with two colleagues, who
are working on making Trino better every day:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/claudiusli&quot;&gt;Claudius Li&lt;/a&gt;, Product Manager at Starburst&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/jhlodin&quot;&gt;Joe Lodin&lt;/a&gt;, Information Engineer at Starburst&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Brian is out to add another member to his family!&lt;/p&gt;

&lt;h2 id=&quot;releases-372-373-and-374&quot;&gt;Releases 372, 373, and 374&lt;/h2&gt;

&lt;p&gt;Official highlights from Martin Traverso:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-372.html&quot;&gt;Trino 372&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trim_array&lt;/code&gt; function.&lt;/li&gt;
  &lt;li&gt;Support for reading ZSTD-compressed Avro files.&lt;/li&gt;
  &lt;li&gt;Support for column comments in Iceberg.&lt;/li&gt;
  &lt;li&gt;Support for Kerberos authentication in Kudu connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-373.html&quot;&gt;Trino 373&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New Delta Lake connector.&lt;/li&gt;
  &lt;li&gt;Improved performance of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIKE&lt;/code&gt; when querying Elasticsearch and PostgreSQL.&lt;/li&gt;
  &lt;li&gt;Improved performance when querying partitioned Hive tables.&lt;/li&gt;
  &lt;li&gt;Support access to S3 via HTTP proxy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-374.html&quot;&gt;Trino 374&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Faster &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; queries.&lt;/li&gt;
  &lt;li&gt;Vim/Emacs editing mode for CLI.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE TABLE&lt;/code&gt; in Cassandra connector.&lt;/li&gt;
  &lt;li&gt;Support &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uint&lt;/code&gt; types in ClickHouse.&lt;/li&gt;
  &lt;li&gt;Support for Glue Metastore in Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Add &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE/DROP SCHEMA&lt;/code&gt;, table and column comments in MongoDB.&lt;/li&gt;
  &lt;li&gt;Improved pushdown for PostgreSQL.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional highlights from Manfred:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Timeout configuration for LDAP authentication.&lt;/li&gt;
  &lt;li&gt;Values related to fault-tolerant execution in Web UI.&lt;/li&gt;
  &lt;li&gt;JDBC &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Driver.getProperties&lt;/code&gt; enables more client applications like DBVisualizer.&lt;/li&gt;
  &lt;li&gt;Vi and Emacs editing modes for interactive CLI usage.&lt;/li&gt;
  &lt;li&gt;Performance improvements in PostgreSQL connector.&lt;/li&gt;
  &lt;li&gt;SingleStore JDBC driver usage, end of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;memsql&lt;/code&gt; name.&lt;/li&gt;
  &lt;li&gt;Documentation for the atop connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the
&lt;a href=&quot;https://trino.io/docs/current/release/release-372.html&quot;&gt;Trino 372&lt;/a&gt;,
&lt;a href=&quot;https://trino.io/docs/current/release/release-373.html&quot;&gt;Trino 373&lt;/a&gt;, and
&lt;a href=&quot;https://trino.io/docs/current/release/release-374.html&quot;&gt;Trino 374&lt;/a&gt; release
notes.&lt;/p&gt;

&lt;h2 id=&quot;project-tardigrade-update&quot;&gt;Project Tardigrade update&lt;/h2&gt;

&lt;p&gt;The team around Project Tardigrade joined us in &lt;a href=&quot;./32.html&quot;&gt;episode 32&lt;/a&gt; to talk
about fault tolerant execution of queries in Trino. Now they have posted a
&lt;a href=&quot;/blog/2022/02/16/tardigrade-project-update.html&quot;&gt;status update on our blog&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It looks like things are really coming along well, and Joe has joined the effort
to &lt;a href=&quot;../docs/current/admin/fault-tolerant-execution.html&quot;&gt;create a first user-facing documentation
set&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The team has also posted a status update on the #project-tardigrade Slack
channel. Everything is ready for the community to perform first real world
testing, and help us make this a great feature set for Trino.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-episode-a-new-connector-for-delta-lake-object-storage&quot;&gt;Concept of the episode: A new connector for Delta Lake object storage&lt;/h2&gt;

&lt;p&gt;It is great to have a new connector in Trino, but what does that even mean?
Let’s find out.&lt;/p&gt;

&lt;h3 id=&quot;what-is-a-connector&quot;&gt;What is a connector?&lt;/h3&gt;

&lt;p&gt;Just a quick refresher. Trino allows you to query many different data sources
with SQL statements. You enable that by creating a &lt;em&gt;catalog&lt;/em&gt; that contains the
configuration to connect to a specific &lt;em&gt;data source&lt;/em&gt;. The data source can be a
relational database, a NoSQL database, or an object storage system. A &lt;em&gt;connector&lt;/em&gt; is
the translation layer that maps the concepts in the data source to the Trino
concepts of schemas, tables, rows, columns, data types, and so on. The connector
needs to know how to retrieve the data itself from the data source, and also how to
interact with the metadata.&lt;/p&gt;

&lt;p&gt;Here are some example metadata questions to answer:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;What are the available tables in schema &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xyz&lt;/code&gt;?&lt;/li&gt;
  &lt;li&gt;What columns does table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;abc&lt;/code&gt; have and what are the data types?&lt;/li&gt;
  &lt;li&gt;What file format is used by the storage for table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;efg&lt;/code&gt;?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And some queries about the actual data:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Give me the top 100 rows from table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Give me all files in partition &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; in the directory &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So having a connector for your data source in Trino is a big deal. A connector
unlocks the data to all your SQL analytics powered by Trino, and the underlying
data source doesn’t even have to support SQL.&lt;/p&gt;

&lt;h3 id=&quot;what-is-delta-lake&quot;&gt;What is Delta Lake?&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://delta.io/&quot;&gt;Delta Lake&lt;/a&gt; is an evolution of the Hive/Hadoop object
storage data source. It is an open-source storage format. Data is stored in
files, typically using binary formats such as Parquet or ORC. Metadata is stored
in a Hive Metastore Service (HMS).&lt;/p&gt;

&lt;p&gt;Delta Lake supports ACID transactions, time travel, and many other features that
are lacking in the legacy Hive/Hadoop setup. This combination of traditional
data lake storage with data warehouse features is often called a lake house.&lt;/p&gt;

&lt;h3 id=&quot;history-of-the-new-connector&quot;&gt;History of the new connector&lt;/h3&gt;

&lt;p&gt;Delta Lake is fully open source, and part of the larger enterprise platform for
a lake house offered by &lt;a href=&quot;https://databricks.com/&quot;&gt;Databricks&lt;/a&gt;.
&lt;a href=&quot;https://www.starburst.io/&quot;&gt;Starburst&lt;/a&gt; has supported Delta Lake users with a
connector for &lt;a href=&quot;https://docs.starburst.io/index.html#sep&quot;&gt;Starburst Enterprise&lt;/a&gt;
for nearly two years. To foster further adoption and innovation with the
community, the connector was &lt;a href=&quot;https://docs.starburst.io/blog/2022-03-15-delta-lake.html&quot;&gt;donated to Trino in
version 373&lt;/a&gt; and continues to
be improved.&lt;/p&gt;

&lt;h2 id=&quot;pull-requests-of-the-episode-add-delta-lake-connector-and-documentation&quot;&gt;Pull requests of the episode: Add Delta Lake connector and documentation&lt;/h2&gt;

&lt;p&gt;Over 25 developers helped &lt;a href=&quot;https://github.com/jirassimok&quot;&gt;Jakob&lt;/a&gt; with the effort
to &lt;a href=&quot;https://github.com/trinodb/trino/pull/10897&quot;&gt;open-source the connector&lt;/a&gt;. It
is a heavy lift to migrate such a full-featured connector into Trino. By
comparison the &lt;a href=&quot;https://github.com/trinodb/trino/pull/11229&quot;&gt;documentation was
easy&lt;/a&gt;, but it is very important for
enabling you to use the connector. Well done everyone!&lt;/p&gt;

&lt;p&gt;Let’s have a look at the code in a bit more detail. A couple of key facts:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The Delta Lake connector is just another plugin like all other connectors.&lt;/li&gt;
  &lt;li&gt;This is a feature-rich connector supporting read and write operations.&lt;/li&gt;
  &lt;li&gt;It shares implementation details with Hive and Iceberg connectors such as HMS
access, Parquet and ORC file readers, and so on.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;demo-of-the-episode-delta-lake-connector-in-action&quot;&gt;Demo of the episode: Delta Lake connector in action&lt;/h2&gt;

&lt;p&gt;Now let’s have a look at all this in action. In the demo Claudius uses
docker-compose to start up an HMS as the metastore, MinIO as object storage, and of
course Trino as the query engine.&lt;/p&gt;

&lt;p&gt;If you want to follow along, all resources used for the demo are &lt;a href=&quot;https://github.com/bitsondatadev/trino-getting-started/tree/main/community_tutorials/delta-lake&quot;&gt;available on
our getting started
repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here is the sample catalog &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta.properties&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-properties highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;py&quot;&gt;connector.name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;delta-lake&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;hive.metastore.uri&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;thrift://hive-metastore:9083&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;hive.s3.endpoint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;http://minio:9000&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;hive.s3.aws-access-key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;minio&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;hive.s3.aws-secret-key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;minio123&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;hive.s3.path-style-access&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;true&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;delta.enable-non-concurrent-writes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once everything is up and running we can start playing.&lt;/p&gt;

&lt;p&gt;Verify that the catalog is available:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SHOW&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CATALOGS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Check if there are any schemas:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SHOW&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SCHEMAS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Let’s create a new schema:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SCHEMA&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;location&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;s3a://claudiustestbucket/myschema&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Create a table, insert some records, and then verify:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mytable&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mytable&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;John&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;Jane&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mytable&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Run a query to get more data and insert it into a new table:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myothertable&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mytable&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myothertable&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now for some data manipulation:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;UPDATE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myothertable&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SET&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;Jonathan&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myothertable&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;DELETE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myothertable&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myothertable&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And finally, let’s clean up:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;ALTER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mytable&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;EXECUTE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;optimize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;file_size_threshold&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;10MB&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ANALYZE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myothertable&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;DROP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myothertable&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;DROP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mytable&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;DROP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SCHEMA&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;delta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myschema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As you can see, with Trino and Delta Lake you get full create, read, update,
and delete operations on your lakehouse.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-episode-how-do-i-secure-the-connection-from-a-trino-cluster-to-the-data-source&quot;&gt;Question of the episode: How do I secure the connection from a Trino cluster to the data source?&lt;/h2&gt;

&lt;p&gt;Since we talked about connectors earlier, you already know that the
configuration for accessing a data source is assembled to create a catalog. This
approach uses a properties file in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc/catalog&lt;/code&gt;. For example, let’s look at the
recently updated &lt;a href=&quot;../docs/current/connector/sqlserver.html&quot;&gt;SQL Server connector
documentation&lt;/a&gt;:&lt;/p&gt;

&lt;div class=&quot;language-properties highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;py&quot;&gt;connector.name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;sqlserver&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;connection-url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;jdbc:sqlserver://&amp;lt;host&amp;gt;:&amp;lt;port&amp;gt;;database=&amp;lt;database&amp;gt;;encrypt=false&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;connection-user&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;root&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;connection-password&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;secret&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The connector uses username and password authentication. It connects using the
JDBC driver, which in turn enables TLS by default. A number of other connectors
also use JDBC drivers with username and password authentication, but the details
vary a lot. However, for all of them you can use &lt;a href=&quot;../docs/current/security/secrets.html&quot;&gt;secrets support in
Trino&lt;/a&gt; to reference environment
variables instead of hardcoding passwords.&lt;/p&gt;
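
&lt;p&gt;As a sketch, the same catalog could pull its credentials from environment
variables with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;${ENV:...}&lt;/code&gt; secrets syntax. The variable names
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SQLSERVER_USER&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SQLSERVER_PASSWORD&lt;/code&gt; are just examples, and must be set in the
environment of every node in the cluster:&lt;/p&gt;

&lt;div class=&quot;language-properties highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;connector.name=sqlserver
connection-url=jdbc:sqlserver://&amp;lt;host&amp;gt;:&amp;lt;port&amp;gt;;database=&amp;lt;database&amp;gt;
connection-user=${ENV:SQLSERVER_USER}
connection-password=${ENV:SQLSERVER_PASSWORD}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;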

&lt;p&gt;When it comes to other connectors the details of securing a connection vary even
more. Ultimately the answer to how to secure the connection, and if that is even
possible, is the usual “It depends”. Luckily you can check the documentation for
each connector to find out more and ping us on Slack if you need more help.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.starburst.io/blog/2022-03-15-delta-lake.html&quot;&gt;Starburst donates the Delta Lake connector to Trino&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/events/282794002/&quot;&gt;Operating Trino at Scale at Robinhood&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>33: Trino becomes highly available for high demand</title>
      <link href="https://trino.io/episodes/33.html" rel="alternate" type="text/html" title="33: Trino becomes highly available for high demand" />
      <published>2022-02-17T00:00:00+00:00</published>
      <updated>2022-02-17T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/33</id>
      <content type="html" xml:base="https://trino.io/episodes/33.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Ramesh Bhanan, Vice President at &lt;a href=&quot;https://developer.gs.com/discover/home&quot;&gt;Goldman Sachs&lt;/a&gt;
  (&lt;a href=&quot;https://www.linkedin.com/in/ramesh-bhanan-byndoor/&quot;&gt;@ramesh-bhanan-byndoor&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Sambit Dikshit, Managing Director, Tech Fellow at &lt;a href=&quot;https://developer.gs.com/discover/home&quot;&gt;Goldman Sachs&lt;/a&gt;
  (&lt;a href=&quot;https://www.linkedin.com/in/sambitdixit/&quot;&gt;@sambitdixit&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Siddhant Chadha, Senior Data Engineer at &lt;a href=&quot;https://developer.gs.com/discover/home&quot;&gt;Goldman Sachs&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/siddhant-chadha-838136142/&quot;&gt;@siddhant-chadha&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Suman Baliganahalli Narayan Murthy, Vice President at &lt;a href=&quot;https://developer.gs.com/discover/home&quot;&gt;Goldman Sachs&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/suman-b-n-08-03-1990/&quot;&gt;@suman-b-n&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Sumit Halder, Vice President at &lt;a href=&quot;https://developer.gs.com/discover/home&quot;&gt;Goldman Sachs&lt;/a&gt;
  (&lt;a href=&quot;https://www.linkedin.com/in/sumit-halder-a3732482/&quot;&gt;@sumit-halder&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;releases-369-370-and-371&quot;&gt;Releases 369, 370, and 371&lt;/h2&gt;

&lt;p&gt;Trino 369&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Experimental support for task level retries.&lt;/li&gt;
  &lt;li&gt;Support for groups in OAuth2 claims.&lt;/li&gt;
  &lt;li&gt;Column comments in ClickHouse connector.&lt;/li&gt;
  &lt;li&gt;Write Bloom filters in ORC files.&lt;/li&gt;
  &lt;li&gt;Procedure for optimizing Iceberg tables.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino 370&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add CLI support for ARM64.&lt;/li&gt;
  &lt;li&gt;Improved performance for ORC.&lt;/li&gt;
  &lt;li&gt;Improved performance for map and row types.&lt;/li&gt;
  &lt;li&gt;Reduced latency for OAuth2.0 authentication.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino 371&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for secrets and user group selector in resource group manager.&lt;/li&gt;
  &lt;li&gt;Support AWS role session name in S3 security mapping configuration.&lt;/li&gt;
  &lt;li&gt;Many bug fixes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Notes from Manfred&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add support for using PostgreSQL and Oracle as backend database for resource
groups.&lt;/li&gt;
  &lt;li&gt;Remove &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;spill-order-by&lt;/code&gt;,  &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;spill-window-operator&lt;/code&gt;, and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-total-memory-per-node&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER MATERIALIZED VIEW ... SET PROPERTIES&lt;/code&gt; in the engine.&lt;/li&gt;
  &lt;li&gt;Prevent hanging query execution on failures with phased execution policy.&lt;/li&gt;
  &lt;li&gt;Support for renaming schemas in PostgreSQL and Redshift connectors.&lt;/li&gt;
  &lt;li&gt;Lots of improvements on the ClickHouse connector, thanks Yuya!&lt;/li&gt;
  &lt;li&gt;Update to newer ClickHouse version removed support for Altinity 20.3.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$properties&lt;/code&gt; table and other hidden tables in Iceberg connector, including
docs.&lt;/li&gt;
  &lt;li&gt;Automatically adjust &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ulimit&lt;/code&gt; setting when using the RPM package.&lt;/li&gt;
  &lt;li&gt;Docker images changed to UBI.&lt;/li&gt;
  &lt;li&gt;Remove support/need for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;allow-drop-table&lt;/code&gt; catalog property in JDBC connectors.&lt;/li&gt;
  &lt;li&gt;A bunch of SPI changes.&lt;/li&gt;
  &lt;li&gt;DML with Iceberg connector with fault tolerant mode and more Tardigrade improvements.&lt;/li&gt;
  &lt;li&gt;Drop support for Kudu 1.13.0.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the &lt;a href=&quot;https://trino.io/docs/current/release/release-369.html&quot;&gt;Trino
369&lt;/a&gt;, &lt;a href=&quot;https://trino.io/docs/current/release/release-370.html&quot;&gt;Trino
370&lt;/a&gt;, and &lt;a href=&quot;https://trino.io/docs/current/release/release-371.html&quot;&gt;Trino
371&lt;/a&gt; release notes.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-month-high-availability-with-trino&quot;&gt;Concept of the month: High availability with Trino&lt;/h2&gt;

&lt;p&gt;Goldman Sachs uses Trino to reduce last-mile ETL, and provide a unified way of 
accessing data through federated joins. Making a variety of data sets from 
different sources available in one spot for their data science team was a tall 
order. Data must be quickly accessible to data consumers, and systems like Trino
must be reliable for users to trust this singular access point for their data.&lt;/p&gt;

&lt;p&gt;In order for analysts and data scientists to use these services, they first need
to trust in the system. It was vital to Goldman Sachs that Trino have high 
availability. In the event of any failure, another Trino cluster is available to
process requests.&lt;/p&gt;

&lt;h3 id=&quot;integrating-trino-into-the-goldman-sachs-internal-ecosystem&quot;&gt;Integrating Trino into the Goldman Sachs internal ecosystem&lt;/h3&gt;

&lt;p&gt;Before high availability was a concern, the team had to first integrate Trino to
meet their requirements. This included integrating with internal security 
systems, observability systems, and credential stores. It also meant
adding integration with their governance services that manage cataloguing
services and data discovery engines. Finally, while many of the Trino connectors
that the team intended to use exist, there were many missing features and 
performance enhancements that would lead to a better user experience and more 
adoption. The team has since taken it upon themselves to work on these features
and contribute them back to Trino. We will cover some of these contributions in
the PR segment of this show.&lt;/p&gt;

&lt;h3 id=&quot;achieving-scaling-and-high-availability&quot;&gt;Achieving scaling and high availability&lt;/h3&gt;

&lt;p&gt;Once the team had much of Trino running for some initial use cases, the next
step was to improve support for more simultaneous use cases and highly
concurrent workloads. The team wanted users to trust the system, so as they
scaled, the ability to run blue-green deployments, enable resource isolation,
and keep clusters highly available through failures became much more pertinent.&lt;/p&gt;

&lt;h3 id=&quot;trino-ecosystem-at-goldman-sachs&quot;&gt;Trino ecosystem at Goldman Sachs&lt;/h3&gt;

&lt;p&gt;Here is an overview of the Goldman Sachs ecosystem. It showcases the preexisting
services that needed to connect to Trino, the catalogs supported, and the method
in which Goldman Sachs achieves high availability through supporting multiple
clusters in various groups.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/33/trinoecosystem.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://developer.gs.com/blog/posts/enabling-highly-available-trino-clusters-at-goldman-sachs&quot;&gt;Goldman Sachs Blog&lt;/a&gt;
&lt;/p&gt;

&lt;h3 id=&quot;dynamic-query-routing&quot;&gt;Dynamic query routing&lt;/h3&gt;

&lt;p&gt;To ensure that all the clusters receive an even distribution of traffic, the
team created services that enable dynamic query routing across the different
cluster groups.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/33/trinodynamicqueryrouting.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://developer.gs.com/blog/posts/enabling-highly-available-trino-clusters-at-goldman-sachs&quot;&gt;Goldman Sachs Blog&lt;/a&gt;
&lt;/p&gt;

&lt;h3 id=&quot;query-routing-components&quot;&gt;Query routing components&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.envoyproxy.io/&quot;&gt;Envoy Proxy&lt;/a&gt; - open source edge and service proxy
that provides features such as routing, traffic management, load balancing, 
external authorization, rate limiting, and more.&lt;/li&gt;
&lt;/ul&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/33/trinocontrolplane.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://developer.gs.com/blog/posts/enabling-highly-available-trino-clusters-at-goldman-sachs&quot;&gt;Goldman Sachs Blog&lt;/a&gt;
&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Cluster Groups - a cluster group is a set of Trino clusters that can
be assigned traffic by the router service.&lt;/li&gt;
  &lt;li&gt;Cluster Metadata Service - a service that provides the Envoy routers with all
the cluster-related configurations.&lt;/li&gt;
  &lt;li&gt;Router Service
    &lt;ul&gt;
      &lt;li&gt;Envoy Control Plane - The Envoy Control Plane is an xDS gRPC-based service
that is responsible for providing dynamic configurations to Envoy.&lt;/li&gt;
      &lt;li&gt;Upstream Cluster Selection - Envoy provides HTTP filters to parse and modify
both request and response headers. We use a custom Lua filter to parse the 
request and extract the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x-trino-user&lt;/code&gt; header. Then, we call the router 
service, which returns the upstream cluster address.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
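
&lt;p&gt;A minimal sketch of such a Lua filter, assuming Envoy’s
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;envoy.filters.http.lua&lt;/code&gt; filter and a router service
registered as an upstream cluster named
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;router_service&lt;/code&gt;. The cluster name, path, and response
handling are illustrative, not the exact Goldman Sachs implementation:&lt;/p&gt;

&lt;div class=&quot;language-lua highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Called by Envoy for every request passing through the Lua HTTP filter.
function envoy_on_request(request_handle)
  -- Extract the Trino user set by the client.
  local user = request_handle:headers():get(&quot;x-trino-user&quot;)

  -- Ask the router service which upstream cluster should serve this user.
  local headers, body = request_handle:httpCall(
    &quot;router_service&quot;,
    {
      [&quot;:method&quot;] = &quot;GET&quot;,
      [&quot;:path&quot;] = &quot;/route?user=&quot; .. (user or &quot;&quot;),
      [&quot;:authority&quot;] = &quot;router_service&quot;
    },
    nil,  -- no request body
    500)  -- timeout in milliseconds

  -- Expose the chosen cluster in a header that the Envoy route
  -- configuration matches on to select the upstream Trino cluster.
  request_handle:headers():replace(&quot;x-upstream-cluster&quot;, body)
end
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;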

&lt;h2 id=&quot;pr-of-the-month-pr-8956-add-support-for-external-db-for-schema-management-in-mongodb-connector&quot;&gt;PR of the month: PR 8956 Add support for external db for schema management in MongoDB connector&lt;/h2&gt;

&lt;p&gt;This month’s &lt;a href=&quot;https://github.com/trinodb/trino/pull/8956&quot;&gt;PR of the month&lt;/a&gt; comes
from today’s guest Siddhant to solve &lt;a href=&quot;https://github.com/trinodb/trino/issues/8887&quot;&gt;this issue related to the MongoDB connector&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Siddhant created the issue in response to a common problem that MongoDB
connector users face when they don’t have write capability in the MongoDB system.
Since MongoDB has no implicit schema, Trino uses a schema definition that is
written to a special MongoDB database. This PR enables users without write access
to configure an external location to store their schema, avoiding this issue.&lt;/p&gt;

&lt;p&gt;Thanks, Siddhant, for raising this issue, as it’s one that beginners using
the MongoDB connector commonly face.&lt;/p&gt;

&lt;h2 id=&quot;bonus-pr-of-the-month-pr-8202-metadata-for-alias-in-elasticsearch-connector-only-uses-the-first-mapping&quot;&gt;Bonus PR of the month: PR 8202 Metadata for alias in Elasticsearch connector only uses the first mapping&lt;/h2&gt;

&lt;p&gt;This bonus &lt;a href=&quot;https://github.com/trinodb/trino/pull/8202&quot;&gt;PR of the month&lt;/a&gt; comes
from another one of today’s guests, Suman. It solves multiple issues, meaning 
this feature is in high demand!&lt;/p&gt;

&lt;p&gt;The problem brought up by these issues also has to do with how we map
schemas onto NoSQL databases that don’t implicitly have a schema. In this case
Elasticsearch stores its schema in an object called a mapping. This mapping can
be strict or dynamic for various portions of the document that gets inserted.
The object that correlates to a table in Elasticsearch is called an index. To
keep Elasticsearch fast, multiple indexes are created periodically to support a
given document type, similar to partitioning in a database. In general, these
indexes follow a very common mapping for a given type, but the reality is that
Elasticsearch allows you to vary from the mapping. Trino currently simplifies
the way this is done by only reading the first mapping and assuming that all
indexes and documents follow this schema. This pull request addresses this issue
by scanning a much larger sample of mappings and merging the schemas to handle
any conflicts. It then goes further to cache these merged mappings for a given
amount of time.&lt;/p&gt;

&lt;p&gt;Thanks for all of your continued work on this Suman! It will help a lot!&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-month-trino-fiddle-a-tool-for-easy-online-testing-and-sharing-of-trino-sql-problems-and-their-solutions&quot;&gt;Demo of the month: Trino Fiddle: A tool for easy online testing and sharing of Trino SQL problems and their solutions&lt;/h2&gt;

&lt;p&gt;This month’s demo showcases Trino Fiddle, a tool that Brian adapted from the
&lt;a href=&quot;http://sqlfiddle.com/&quot;&gt;SQL Fiddle&lt;/a&gt; tool. It allows Trino users to share problems
and answer questions that other Trino users are facing.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-month-does-trino-support-carbondata&quot;&gt;Question of the month: Does Trino support CarbonData?&lt;/h2&gt;

&lt;p&gt;This month’s &lt;a href=&quot;https://www.trinoforum.org/t/142&quot;&gt;question of the month&lt;/a&gt; 
comes from &lt;a href=&quot;https://www.trinoforum.org/u/masayyed/summary&quot;&gt;Mahebub Sayyed&lt;/a&gt; on 
Trino Forum. Mahebub asks, “Does Trino support CarbonData?”&lt;/p&gt;

&lt;p&gt;The answer is a little tricky, but it can be done!&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://carbondata.apache.org/&quot;&gt;CarbonData&lt;/a&gt; currently maintains a connector 
called &lt;a href=&quot;https://mvnrepository.com/artifact/org.apache.carbondata/carbondata-presto&quot;&gt;carbondata-presto&lt;/a&gt; 
that works with an older version of Trino, version 333 (an io.prestosql version 
&lt;a href=&quot;https://trino.io/blog/2020/12/27/announcing-trino.html&quot;&gt;before the rename&lt;/a&gt;). 
Someone has already opened &lt;a href=&quot;https://github.com/apache/carbondata/pull/4198&quot;&gt;a PR to update this connector to a current Trino version&lt;/a&gt;, 
but work stalled in the middle of 2021 and it hasn’t made much progress 
recently.&lt;/p&gt;

&lt;p&gt;That being said, you could build and use 
&lt;a href=&quot;https://github.com/czy006/carbondata/tree/trino-358-alpha/integration/trino&quot;&gt;the Trino version of the connector&lt;/a&gt; 
this person was working on, and see if it works for you. If you are running on a 
version of Trino that is older than 351, you should be able to use the existing 
carbondata-presto connector.&lt;/p&gt;

&lt;p&gt;If anyone feels motivated, it would be wonderful if you could help get this 
contributed to the CarbonData project, or even work with them to have it land
in the Trino project!&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs and resources&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://developer.gs.com/blog/posts/enabling-highly-available-trino-clusters-at-goldman-sachs&quot;&gt;Enabling Highly Available Trino Clusters at Goldman Sachs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=-5mlZGjt6H4&quot;&gt;Video: Building a Federated Cost-Effective Highly Efficient Query Platform&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://developer.gs.com/blog/posts&quot;&gt;Goldman Sachs Developer Blog&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.goldmansachs.com/careers/&quot;&gt;Goldman Sachs Careers Page&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/gsdeveloper&quot;&gt;Follow @GSDeveloper on Twitter&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>Tardigrade Project Update</title>
      <link href="https://trino.io/blog/2022/02/16/tardigrade-project-update.html" rel="alternate" type="text/html" title="Tardigrade Project Update" />
      <published>2022-02-16T00:00:00+00:00</published>
      <updated>2022-02-16T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2022/02/16/tardigrade-project-update</id>
      <content type="html" xml:base="https://trino.io/blog/2022/02/16/tardigrade-project-update.html">&lt;p&gt;Over the last couple of months we’ve added support for full query retries, landed experimental support 
for task level retries and provided a proof of concept implementation of a distributed exchange plugin 
(description below). We are still working on improving scheduling algorithms as well as optimizing 
exchange plugin implementation to make the task level retries fully usable.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Here is a quick summary of our progress so far:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Added support for &lt;a href=&quot;https://github.com/trinodb/trino/pull/9361&quot;&gt;automatic query retries&lt;/a&gt;. This functionality 
is ready to use and can be enabled by setting the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;retry_policy=QUERY&lt;/code&gt; session property. Now 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/10507&quot;&gt;it is possible&lt;/a&gt; to enable automatic retries for queries that 
produce more than &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;32MB&lt;/code&gt; of output. Dynamic filtering is now also 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/10274&quot;&gt;fully supported&lt;/a&gt; with automatic query retries enabled.&lt;/li&gt;
  &lt;li&gt;Landed an &lt;a href=&quot;https://github.com/trinodb/trino/pull/9818&quot;&gt;initial set of changes&lt;/a&gt; to support task level retries. 
To enable task level retries, a plugin implementing the 
&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/core/trino-spi/src/main/java/io/trino/spi/exchange/ExchangeManager.java&quot;&gt;ExchangeManager&lt;/a&gt; 
interface has to be installed.&lt;/li&gt;
  &lt;li&gt;Landed a &lt;a href=&quot;https://github.com/trinodb/trino/pull/10823&quot;&gt;proof of concept implementation&lt;/a&gt; of the 
&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/core/trino-spi/src/main/java/io/trino/spi/exchange/ExchangeManager.java&quot;&gt;ExchangeManager&lt;/a&gt; 
interface. The implementation is fully functional, however we are still &lt;a href=&quot;https://github.com/trinodb/trino/issues/11050&quot;&gt;working on optimizing the read path&lt;/a&gt;. 
Also for now, only S3 compatible file systems are supported.&lt;/li&gt;
  &lt;li&gt;Added support for automatic retries in &lt;a href=&quot;https://github.com/trinodb/trino/issues/10252&quot;&gt;Hive&lt;/a&gt; and &lt;a href=&quot;https://github.com/trinodb/trino/pull/10622&quot;&gt;Iceberg&lt;/a&gt;. 
Supporting automatic retries for &lt;a href=&quot;https://github.com/trinodb/trino/issues/10254&quot;&gt;JDBC based connectors&lt;/a&gt; is up for grabs.&lt;/li&gt;
  &lt;li&gt;Implemented &lt;a href=&quot;https://github.com/trinodb/trino/pull/10837&quot;&gt;weight based split assignment&lt;/a&gt; for balanced work distribution between fault tolerant tasks.&lt;/li&gt;
  &lt;li&gt;Working on &lt;a href=&quot;https://github.com/trinodb/trino/pull/11023&quot;&gt;adaptive sizing strategy for intermediate tasks&lt;/a&gt; to minimize scheduling overhead 
while keeping the cost of a single task failure to a minimum.&lt;/li&gt;
  &lt;li&gt;Making progress on introducing an &lt;a href=&quot;https://github.com/trinodb/trino/pull/10432&quot;&gt;advanced memory aware scheduling&lt;/a&gt; that would allow us 
to better support memory intensive queries, improve resource utilization and ensure fair resource allocation between queries.&lt;/li&gt;
  &lt;li&gt;Started working on &lt;a href=&quot;https://github.com/trinodb/trino/issues/9935&quot;&gt;supporting dynamic filtering&lt;/a&gt; for queries with task level retries enabled.&lt;/li&gt;
  &lt;li&gt;Working on &lt;a href=&quot;https://github.com/trinodb/trino/issues/10734&quot;&gt;accommodating failed attempts&lt;/a&gt; in various internal statistics reported by 
the engine (e.g.: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;QueryInfo&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;QueryCompletedEvent&lt;/code&gt;). &lt;a href=&quot;https://github.com/trinodb/trino/issues/10754&quot;&gt;UI changes&lt;/a&gt; will come next.&lt;/li&gt;
&lt;/ul&gt;
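
&lt;p&gt;To make the retry setup above concrete, query level retries are controlled by 
the session property mentioned earlier. Here is a minimal sketch from the Trino 
CLI; the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tpch&lt;/code&gt; catalog is just an assumed example, and exact property 
values may change while the feature is experimental:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- enable automatic retries of the entire query on failure
SET SESSION retry_policy = &apos;QUERY&apos;;

-- subsequent queries in this session are retried automatically if they fail
select count(*) from tpch.tiny.orders;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Task level retries additionally require an exchange manager plugin to be 
installed, as described above.&lt;/p&gt;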

&lt;p&gt;Over the next couple of weeks we are planning to focus on:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/11050&quot;&gt;Optimizing read path for the reference implementation of the exchange plugin&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Landing &lt;a href=&quot;https://github.com/trinodb/trino/pull/10432&quot;&gt;memory aware scheduling for fault tolerant execution&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Landing &lt;a href=&quot;https://github.com/trinodb/trino/pull/11023&quot;&gt;adaptive sizing for intermediate tasks&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/10734&quot;&gt;Accommodating failed attempts into query statistics reporting&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Making progress on &lt;a href=&quot;https://github.com/trinodb/trino/issues/9935&quot;&gt;supporting dynamic filtering&lt;/a&gt; for queries with task level retries enabled&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The current state of development can be tracked by following this &lt;a href=&quot;https://github.com/trinodb/trino/issues/9101&quot;&gt;issue&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Stay tuned!&lt;/p&gt;</content>

      
        <author>
          <name>Andrii Rosa</name>
        </author>
      

      <summary>Over the last couple of months we’ve added support for full query retries, landed experimental support for task level retries and provided a proof of concept implementation of a distributed exchange plugin (description below). We are still working on improving scheduling algorithms as well as optimizing exchange plugin implementation to make the task level retries fully usable.</summary>

      
      
    </entry>
  
    <entry>
      <title>32: Trino Tardigrade: Try, try, and never die</title>
      <link href="https://trino.io/episodes/32.html" rel="alternate" type="text/html" title="32: Trino Tardigrade: Try, try, and never die" />
      <published>2022-01-20T00:00:00+00:00</published>
      <updated>2022-01-20T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/32</id>
      <content type="html" xml:base="https://trino.io/episodes/32.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Andrii Rosa, Software Engineer at &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt;
  (&lt;a href=&quot;https://www.linkedin.com/in/andrii-rosa-79578561/&quot;&gt;@andrii-rosa-79578561&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Brian Zhan, Product Manager at &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt;
  (&lt;a href=&quot;https://twitter.com/brianzhan1&quot;&gt;@brianzhan1&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Lukasz Osipiuk, Software Engineer at &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt;
  (&lt;a href=&quot;https://twitter.com/losipiuk&quot;&gt;@losipiuk&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Martin Traverso, Trino &amp;amp; Presto Co-founder and CTO at &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/mtraverso&quot;&gt;@mtraverso&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Zebing Lin, Software Engineer at &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/linzebing/&quot;&gt;@linzebing&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-summit-2021&quot;&gt;Trino Summit 2021&lt;/h2&gt;

&lt;p&gt;If you missed &lt;a href=&quot;https://www.starburst.io/resources/trino-summit/&quot;&gt;Trino Summit 2021&lt;/a&gt;,
you can watch it on demand, for free!&lt;/p&gt;

&lt;h2 id=&quot;releases-367-and-368&quot;&gt;Releases 367 and 368&lt;/h2&gt;

&lt;p&gt;Martin’s official announcements merged into one:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Lineage tracking for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WITH&lt;/code&gt; clauses and subqueries.&lt;/li&gt;
  &lt;li&gt;Option to hide inaccessible columns in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT *&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flush_metadata_cache()&lt;/code&gt; procedure for the Hive connector.&lt;/li&gt;
  &lt;li&gt;Improve performance of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL&lt;/code&gt; type.&lt;/li&gt;
  &lt;li&gt;File-based access control for the Iceberg connector.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TIME&lt;/code&gt; type in the SingleStore connector.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BINARY&lt;/code&gt; type in the Phoenix connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred’s additional notes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Prevent data loss on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP SCHEMA&lt;/code&gt; in Hive and Iceberg connectors.&lt;/li&gt;
  &lt;li&gt;New default query execution policy &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;phased&lt;/code&gt; brings performance improvements.&lt;/li&gt;
  &lt;li&gt;And finally, numerous smaller improvements around memory management and query
processing for our project Tardigrade.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the &lt;a href=&quot;https://trino.io/docs/current/release/release-367.html&quot;&gt;Trino
367&lt;/a&gt; and &lt;a href=&quot;https://trino.io/docs/current/release/release-368.html&quot;&gt;Trino
368&lt;/a&gt; release notes.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-month-introducing-project-tardigrade&quot;&gt;Concept of the month: Introducing Project Tardigrade&lt;/h2&gt;

&lt;p&gt;Before we jump right into the project, let’s cover some of the history of ETL and
data warehousing to better understand the problems that Tardigrade solves.&lt;/p&gt;

&lt;h3 id=&quot;why-do-people-want-to-do-etl-in-trino&quot;&gt;Why do people want to do ETL in Trino?&lt;/h3&gt;

&lt;p&gt;Trino is used for Extract, Transform, Load (ETL) workloads in many companies,
like Salesforce, Shopify, Slack, and older versions of Trino at Facebook.&lt;/p&gt;

&lt;p&gt;First, the most important thing is query speed. Queries run a lot faster in 
Trino. Open data stack technologies like Hive and Spark retry the query from 
intermediate checkpoints when something fails. However, there’s a performance 
cost to this. Trino has always been focused on delivering query results as 
quickly as possible. Now, Trino performs task-level retries, enabling failure 
recovery where needed for longer-running queries. More on this later 
though.&lt;/p&gt;

&lt;p&gt;Second, most companies have widely dispersed and fragmented data. It’s typical
for most companies to have different storage systems for different use cases.
This only becomes more commonplace when a merger and acquisition happens, and
you have a ton of data stored in yet another location. The acquiring company 
ends up having key information living in a bunch of different places. The net 
result is that the data engineer ends up spending weeks to write that simple 
dashboard. The data scientist trying to understand a trend gets impeded whenever
trying to draw data from a new source and gives up.&lt;/p&gt;

&lt;p&gt;Third, data engineers want to spend their time writing business logic, not 
moving SQL between engines. Unfortunately, this is where they end up spending 
much of their time. Many do their ad-hoc analytics in Trino, because it provides
a far more interactive experience than any other engine. If they don’t just use
Trino, they have a 1,000 line SQL ETL job that they now need to convert into
another dialect. You just need to search “convert Spark Presto SQL Stack 
Overflow” to see the numerous challenges that people face moving between 
engines.&lt;/p&gt;

&lt;p&gt;Whether it’s the optimizations in one engine not working in the other, a UDF in
Trino not existing in Spark, strange differences in the SQL dialect tripping 
people up, or being extremely difficult to debug, these factors always cause a 
delay in completing their tasks. Data engineers are especially paranoid about 
converting SQL correctly. Imagine reporting an incorrect revenue metric 
externally, billing a user of your platform the incorrect amount, or delivering
the wrong content to users due to any of these issues.&lt;/p&gt;

&lt;h3 id=&quot;why-are-people-reluctant-to-do-their-etl-in-trino&quot;&gt;Why are people reluctant to do their ETL in Trino?&lt;/h3&gt;

&lt;p&gt;Before the drive for big data and technologies like Hadoop showed up on the 
scene, systems like Teradata, Netezza, and Oracle were used to run ETL pipelines
in a largely offline manner. If a query failed, you simply had to restart it. 
Vendors would brag about the low failure rate of their systems.&lt;/p&gt;

&lt;p&gt;As Big Data came to the forefront, systems like the &lt;a href=&quot;https://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf&quot;&gt;Google File System&lt;/a&gt;,
that largely inspired the design for the Hadoop Distributed File System, aimed 
to build large distributed systems that supported fault-tolerance. In essence,
faults were expected, and if a node in the system failed, no data would be lost.&lt;/p&gt;

&lt;p&gt;At this same time, compute and storage systems were becoming separate systems. 
Just as storage was built with fault-tolerance, compute systems like MapReduce
that processed and transformed data were also &lt;a href=&quot;https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf&quot;&gt;built with fault tolerance in mind&lt;/a&gt;.
Apache Hive is a syntax and metadata layer that enables generating MapReduce 
jobs without having to write code. Apache Spark came on the analytics scene
by &lt;a href=&quot;https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf&quot;&gt;introducing lineage&lt;/a&gt; 
as a way for engineers to have more control over how and when their datasets
are flushed to disk. This technique, while novel, still took a very pessimistic
view that allowing faults was the worst case scenario to avoid.&lt;/p&gt;

&lt;p&gt;When Trino was created, it was designed with speed in mind. Trino creators 
Martin, Dain, and David chose not to add fault-tolerance to Trino as they
recognized the tradeoff of fast analytics. Due to the nature of the streaming 
exchange in Trino, all tasks are interconnected. A failure of any task results in
a query failure. To support long-running queries, Trino has to be able to 
tolerate task failures.&lt;/p&gt;

&lt;p&gt;Having an all-or-nothing architecture makes it significantly more difficult to 
tolerate faults, regardless of how rare they are. The likelihood of a failure 
grows with the time it takes to complete a query. This risk also increases as 
the resource demands, such as memory requirements of a query, grow. It’s 
impossible to know the exact memory requirements for processing a query upfront.
In addition to increased likelihood of a failure, the impact of failing a long 
running query is much higher, as it often results in a significant waste of time
and resources.&lt;/p&gt;

&lt;p&gt;You may think all-or-nothing is a model destined to fail, especially when 
scaling to petabytes of data. On the contrary, Trino’s predecessor Presto was 
commonly used to execute batch workloads at this scale at Facebook. Even today,
companies like &lt;a href=&quot;https://medium.com/salesforce-engineering/how-to-etl-at-petabyte-scale-with-trino-5fe8ac134e36&quot;&gt;Salesforce&lt;/a&gt;, 
&lt;a href=&quot;https://www.starburst.io/resources/trino-summit/?wchannelid=2ug6mgs5ao&amp;amp;wmediaid=j1eq196a4y&quot;&gt;Doordash&lt;/a&gt;, 
and many others, use Trino at petabyte scale to handle ETL workloads. While it 
is possible to scale Trino to run petabyte-scale ETL pipelines, you really have
to know what you’re doing.&lt;/p&gt;

&lt;p&gt;Resource management is another challenge. Users don’t know exactly what 
resource utilization to expect from a query they submit. It is challenging to 
properly size the cluster and to avoid resource related failures.&lt;/p&gt;

&lt;p&gt;In essence, most people avoid using Trino for ETL because they lack the 
understanding of how to correctly configure Trino at scale.&lt;/p&gt;

&lt;h3 id=&quot;what-are-the-limitations-of-the-current-architecture&quot;&gt;What are the limitations of the current architecture?&lt;/h3&gt;

&lt;p&gt;In the current architecture Trino plans all tasks for processing a specific 
query upfront. These tasks interconnect with one another as the results from
one task are the input for the next. This interdependency is necessary, but 
if any task fails along the way, it breaks the entire chain.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/32/interconnected-tasks.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Data is streamed through the task graph with no intermediate checkpointing. The 
only state of query execution is the internal, volatile state of the operators 
running within tasks.&lt;/p&gt;

&lt;p&gt;As stated before, this architecture has advantages. Most notably high throughput
and low latency. Yet it implies some limitations too. Probably the most natural
one is that it does not allow for granular failure recovery. If one of the tasks
dies, there is no way to restart processing from some intermediate state. The 
only option is to rerun the whole query from the very beginning.&lt;/p&gt;

&lt;p&gt;The other notable limitation is around memory consumption. With static task 
placement we have little control over resource utilization on nodes.&lt;/p&gt;

&lt;p&gt;Finally, the current architecture makes many decisions upfront during query
planning. The engine creates a query plan based on incomplete data using table 
statistics, or blindly, if statistics are not available. After the coordinator 
creates the plan and query processing has started, there aren’t many ways to 
adapt, even though much more information becomes available at runtime. For 
example, we cannot change the number of tasks for a stage. If we observe data 
skew, we can’t move tasks away from the overworked node so that the affected 
tasks have more resources at hand. We cannot change the plan for a subquery if 
we notice that a decision already made is not optimal.&lt;/p&gt;

&lt;h3 id=&quot;trino-engine-improvements-with-project-tardigrade&quot;&gt;Trino engine improvements with Project Tardigrade&lt;/h3&gt;

&lt;p&gt;Project Tardigrade aims to break the all-or-nothing execution barriers. It opens
many new opportunities around resource management, adaptive query optimization,
and failure recovery. We will use a technique called spooling that stores 
intermediate data in an efficient buffering layer at stage boundaries. The 
buffer stores intermediate results for the duration of a query or a stage, 
depending on the context. The project is named after the microscopic &lt;a href=&quot;https://en.wikipedia.org/wiki/Tardigrade&quot;&gt;Tardigrades&lt;/a&gt;
that are the world’s most indestructible creatures, akin to the resiliency we 
are adding to Trino.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/32/tardigrade-logo.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Buffering intermediate results makes it possible to execute queries iteratively.
For example, the engine can process one or several tasks at a time, effectively 
reducing memory pressure, and allow memory intensive queries to succeed without 
a need to expand the cluster. Tardigrade can significantly lower the cost of 
operation, especially when only a small number of queries 
requires more memory than is available.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/32/tardigrade-buffers.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h4 id=&quot;adaptive-planning&quot;&gt;Adaptive planning&lt;/h4&gt;

&lt;p&gt;The engine may also decide to re-optimize the query at stage boundaries. When
the engine buffers the intermediate data, it is possible to get better insight
into the nature of the data as it’s processed and adapt query plans accordingly.
For example, when the cost based optimizer makes a bad decision, because of 
incorrect statistics or estimates, it can pick the wrong type of join, or a 
suboptimal join order. The engine can then suspend the query, re-optimize the 
plan, and resume processing. Additionally, it may allow the engine to discover 
skewed datasets, and change query plans accordingly. This may significantly 
improve efficiency and landing time for workloads that are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JOIN&lt;/code&gt; heavy.&lt;/p&gt;

&lt;h4 id=&quot;resource-management&quot;&gt;Resource management&lt;/h4&gt;

&lt;p&gt;Iterative query processing allows us to be more flexible at resource management.
Resource allocation can be adjusted as the queries run. For example, when a 
cluster is idle, we may allow a single query to utilize all available resources
on a cluster. When more workload kicks in, the resource allocation for the 
initial query can be gradually reduced, and available resources can be granted
to newly submitted workloads. With this model it is also significantly easier to
implement auto scaling. When the submitted workload requires more resources than
currently available in the cluster, the engine can request more nodes. Or the
opposite, if the cluster is underutilized it is easier to return resources when 
there’s no need to wait for slow running tasks. Being able to better manage 
available resources, and adjust the resource pool based on the current workload 
submitted, would make the engine significantly more cost effective.&lt;/p&gt;

&lt;h4 id=&quot;fine-grained-failure-recovery&quot;&gt;Fine-grained failure recovery&lt;/h4&gt;

&lt;p&gt;Last, but not least, with project Tardigrade we are going to provide 
fine-grained failure recovery. The buffering introduced at stage boundaries 
allows for a transparent restart of failed tasks. Fine grained failure recovery
would make completion time for ETL pipelines significantly more predictable. 
Also, it opens the opportunity of running ETL workloads on much cheaper, widely 
available spot instances that can further optimize operational costs.&lt;/p&gt;

&lt;h3 id=&quot;opportunities-that-tardigrade-opens&quot;&gt;Opportunities that Tardigrade opens&lt;/h3&gt;

&lt;p&gt;In summary, in Project Tardigrade we work on the following improvements to Trino:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Predictable query completion times.&lt;/li&gt;
  &lt;li&gt;The ability to scale up or down to match the workload at runtime.&lt;/li&gt;
  &lt;li&gt;Fine grained resource management.&lt;/li&gt;
  &lt;li&gt;Non-homogenous hardware.&lt;/li&gt;
  &lt;li&gt;Adaptive resource limits for tasks.&lt;/li&gt;
  &lt;li&gt;Graceful Shutdown improvement.&lt;/li&gt;
  &lt;li&gt;Cheaper compute costs using spot instances that have weaker availability guarantees.&lt;/li&gt;
  &lt;li&gt;Adaptive query replanning during runtime as context changes.&lt;/li&gt;
  &lt;li&gt;Handling of situations where certain tasks are affected by data skew.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;efficient-exchange-data-buffering-implementation&quot;&gt;Efficient exchange data buffering implementation&lt;/h3&gt;

&lt;p&gt;This all sounds incredible, but it raises the question of how best to implement
these buffers. Enabling task-level retries requires us to store intermediate 
exchange data in a “distributed buffer”. Careful design is needed to minimize 
the impact buffering has on query performance.&lt;/p&gt;

&lt;p&gt;A naive implementation is to use cloud object storage as intermediate storage.
This allows you to scale without maintaining a separate service. This is the 
initial option we are using as a prototype buffer. It is intended as a 
proof-of-concept and should be good enough for small clusters of ten to twenty
nodes. This option can be slow and won’t support high-cardinality exchanges. The
number of files grows quadratically with the number of partitions. Trino then 
has to keep track of the metadata of all these files in order to plan and schedule
which tasks require which files for the query. With a high number of files, 
there is a memory cost to hold that metadata. There is also a penalty for the time
and bandwidth it takes on the network to list them all. This is the well-known 
“many small files” problem in big data.&lt;/p&gt;

&lt;h4 id=&quot;distributed-memory-with-spilling-as-a-buffer&quot;&gt;Distributed memory with spilling as a buffer&lt;/h4&gt;

&lt;p&gt;This solution requires a long-running managed service, but improves performance.
Depending on the design we choose, we can use write-ahead buffers to output data 
belonging to the same partition and provide sequential I/O to downstream tasks.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;70%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/32/buffer-implementation.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-month-task-retries-with-project-tardigrade&quot;&gt;Demo of the month: Task retries with Project Tardigrade&lt;/h2&gt;

&lt;p&gt;In this month’s demo, Zebing showcases task retries using Project Tardigrade 
after throwing his EC2 instance out the window! See what happens next…&lt;/p&gt;

&lt;div class=&quot;youtube-video-container&quot;&gt;
  &lt;iframe width=&quot;702&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/Tnd-QsDCd2Q&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;pr-of-the-month-pr-10319-trino-lineage-fails-for-aliasedrelation&quot;&gt;PR of the month: PR 10319 Trino lineage fails for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AliasedRelation&lt;/code&gt;&lt;/h2&gt;

&lt;p&gt;This month’s &lt;a href=&quot;https://github.com/trinodb/trino/pull/10319&quot;&gt;PR of the month&lt;/a&gt; was
created to resolve &lt;a href=&quot;https://github.com/trinodb/trino/issues/10272&quot;&gt;an issue&lt;/a&gt; 
reported by Lyft Data Infrastructure Engineer, Arup Malakar (&lt;a href=&quot;https://github.com/amalakar&quot;&gt;@amalakar&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Arup reported that Trino lineage fails to capture upstream columns when a join
and a transformation are used together. More generally, this issue applied to any 
column used with a function whose arguments come from an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AliasedRelation&lt;/code&gt;. Starburst
engineer Praveen Krishna (&lt;a href=&quot;https://github.com/Praveen2112&quot;&gt;@Praveen2112&lt;/a&gt;) 
resolved the issue two days later, and with the help of Arup and the Lyft team,
verified that the fix works!&lt;/p&gt;

&lt;p&gt;Thanks to both Arup and Praveen for the fix!&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-month-how-do-you-cast-json-to-varchar-with-trino&quot;&gt;Question of the month: How do you cast JSON to varchar with Trino?&lt;/h2&gt;

&lt;p&gt;This month’s &lt;a href=&quot;https://stackoverflow.com/questions/70701325&quot;&gt;question of the month&lt;/a&gt; 
comes from &lt;a href=&quot;https://stackoverflow.com/users/10924136&quot;&gt;Borislav Blagoev&lt;/a&gt; on Stack
Overflow. He asks, “How do you cast JSON to varchar with Trino?”&lt;/p&gt;

&lt;p&gt;This was answered by &lt;a href=&quot;https://stackoverflow.com/users/2501279&quot;&gt;Guru Stron&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;Use &lt;a href=&quot;https://trino.io/docs/current/functions/json.html#json_format&quot;&gt;json_format&lt;/a&gt;/
&lt;a href=&quot;https://trino.io/docs/current/functions/json.html#json_parse&quot;&gt;json_parse&lt;/a&gt; to handle json object conversions instead of casting:&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;select json_parse(&apos;{&quot;property&quot;: 1}&apos;) objstring_to_json, json_format(json &apos;{&quot;property&quot;: 2}&apos;) jsonobj_to_string
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Output:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;objstring_to_json&lt;/th&gt;
      &lt;th&gt;jsonobj_to_string&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;{&quot;property&quot;:1}&lt;/td&gt;
      &lt;td&gt;{&quot;property&quot;:2}&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
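&lt;p&gt;The reason a plain cast is not recommended: in Trino, casting a JSON string to
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varchar&lt;/code&gt; removes the surrounding quotes, while
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json_format&lt;/code&gt; preserves the JSON text exactly. A small
illustrative query:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;select cast(json &apos;&quot;abc&quot;&apos; as varchar) cast_result, json_format(json &apos;&quot;abc&quot;&apos;) format_result
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The cast should return the bare string &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;abc&lt;/code&gt;,
while &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json_format&lt;/code&gt; keeps the quotes:
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;abc&quot;&lt;/code&gt;.&lt;/p&gt;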

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs and resources&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/salesforce-engineering/how-to-etl-at-petabyte-scale-with-trino-5fe8ac134e36&quot;&gt;How to ETL at Petabyte-Scale with Trino&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino 2021 Wrapped: A Year of Growth</title>
      <link href="https://trino.io/blog/2021/12/31/trino-2021-a-year-of-growth.html" rel="alternate" type="text/html" title="Trino 2021 Wrapped: A Year of Growth" />
      <published>2021-12-31T00:00:00+00:00</published>
      <updated>2021-12-31T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/12/31/trino-2021-a-year-of-growth</id>
      <content type="html" xml:base="https://trino.io/blog/2021/12/31/trino-2021-a-year-of-growth.html">&lt;p&gt;As we reflect on Trino’s journey in 2021, one thing stands out. Compared to 
previous years, growth accelerated even further. Yes, that is what every
year-in-retrospect blog post says, but here it carries special significance.
This week marked the one-year anniversary since the 
project &lt;a href=&quot;https://trino.io/blog/2020/12/27/announcing-trino.html&quot;&gt;dropped the Presto name and moved to the Trino name&lt;/a&gt;.
Immediately after the announcement, the &lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;Trino GitHub repository&lt;/a&gt;
started trending in number of stargazers. Up until this point, the PrestoSQL
GitHub repository had only amassed 1,600 stargazers in the two years since it 
had split from the PrestoDB repository. However, within four months after the 
renaming, the number of stargazers had doubled. GitHub stars, issues, pull 
requests and commits started growing at a new trajectory.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;a href=&quot;https://twitter.com/bitsondatadev/status/1344028682126565381&quot; target=&quot;_blank&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;/assets/blog/2021-review/trending.png&quot; /&gt;
 &lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;At the time of writing, we just hit 4,600 stargazers on GitHub. This means we 
have grown by over 3,000 stargazers in the last year, a 187% increase. While we 
are on the subject, let’s talk about the health of the Trino community.&lt;/p&gt;

&lt;h2 id=&quot;2021-by-the-numbers&quot;&gt;2021 by the numbers&lt;/h2&gt;

&lt;p&gt;Let’s take a look at the Trino project growth by the numbers:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;3679 new commits 💻 in GitHub&lt;/li&gt;
  &lt;li&gt;3015 new stargazers ⭐ in GitHub&lt;/li&gt;
  &lt;li&gt;2450 new members 👋 in Slack&lt;/li&gt;
  &lt;li&gt;1979 pull requests merged ✅ in GitHub&lt;/li&gt;
  &lt;li&gt;1213 issues 📝 created in GitHub&lt;/li&gt;
  &lt;li&gt;988 new followers 🐦 on Twitter&lt;/li&gt;
  &lt;li&gt;525 average weekly members 💬 in Slack&lt;/li&gt;
  &lt;li&gt;491 new subscribers 📺 in YouTube&lt;/li&gt;
  &lt;li&gt;23 Trino Community Broadcast ▶️ episodes&lt;/li&gt;
  &lt;li&gt;17 Trino 🚀 releases&lt;/li&gt;
  &lt;li&gt;13 blog ✍️ posts&lt;/li&gt;
  &lt;li&gt;10 Trino 🍕 meetups&lt;/li&gt;
  &lt;li&gt;1 Trino ⛰️ Summit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Along with the growth we’ve seen on GitHub, &lt;a href=&quot;https://twitter.com/trinodb&quot;&gt;the Trino Twitter&lt;/a&gt; 
account grew its followers by 47% this year. &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;The Trino Slack community&lt;/a&gt;,
where much of the troubleshooting and development discussion happens, saw a
75% growth, nearing 6,000 members. Finally, &lt;a href=&quot;https://www.youtube.com/c/TrinoDB&quot;&gt;the Trino YouTube channel&lt;/a&gt;
has seen an impressive 280% growth in subscribers.&lt;/p&gt;

&lt;p&gt;A lot of the increase on this channel was due to the &lt;a href=&quot;/broadcast/&quot;&gt;Trino Community Broadcast&lt;/a&gt;, 
which brought users and contributors from the community together to cover 23
episodes on the following topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;7 episodes on the Trino ecosystem (dbt, Amundsen, Debezium, Superset)&lt;/li&gt;
  &lt;li&gt;4 episodes on the Trino project (Renaming Trino, Intro to Trino, Trinewbies)&lt;/li&gt;
  &lt;li&gt;4 episodes on Trino connectors (Iceberg, Druid, Pinot)&lt;/li&gt;
  &lt;li&gt;4 episodes on Trino internals (Distributed Hash-Joins, Dynamic Filtering, Views)&lt;/li&gt;
  &lt;li&gt;2 episodes on Trino using Kubernetes (Trinetes series)&lt;/li&gt;
  &lt;li&gt;2 episodes on Trino users (LinkedIn, Resurface)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While stargazers, subscribers, episodes, and followers tell the story of the 
growing awareness of the Trino project with the new name, what about the actual
rate of development on the project?&lt;/p&gt;

&lt;p&gt;At the start of the year, there were 21,924 commits. This year, we pushed 3,679 
commits to the repository, sitting at over 25,600 now. Looking at the graph, this
keeps us pretty consistent with 2020’s throughput.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;75%&quot; src=&quot;/assets/blog/2021-review/commits.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;With the project’s trajectory displayed in numbers, let’s examine the top 
features that landed in Trino this year.&lt;/p&gt;

&lt;h2 id=&quot;features&quot;&gt;Features&lt;/h2&gt;

&lt;p&gt;Here’s a high-level list of the most exciting features that made their way into
Trino in 2021. For details, and to keep up with new releases, check out the &lt;a href=&quot;https://trino.io/docs/current/release.html&quot;&gt;release notes&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;sql-language-improvements&quot;&gt;SQL language improvements&lt;/h3&gt;

&lt;p&gt;SQL language support is crucial for the increasing complexities of queries and 
usage of Trino. In 2021 we added numerous new language features and 
improvements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2021/05/19/row_pattern_matching.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;&lt;/a&gt;,
a feature that allows for complex analysis across multiple rows. To learn more 
about this feature, watch &lt;a href=&quot;/episodes/23.html&quot;&gt;the Community Broadcast show&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/sql/select.html#window-clause&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt;&lt;/a&gt; clause.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2021/03/10/introducing-new-window-features.html#new%20features&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROWS&lt;/code&gt;&lt;/a&gt;
keywords for use within a window function.&lt;/li&gt;
  &lt;li&gt;Time travel support and syntax, like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FOR VERSION AS OF&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FOR TIMESTAMP AS OF&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/sql/update.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt;&lt;/a&gt; is supported.&lt;/li&gt;
  &lt;li&gt;Subquery expressions that return multiple columns. Example: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT x = (VALUES (1, &apos;a&apos;))&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER MATERIALIZED VIEW&lt;/code&gt; … &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RENAME TO&lt;/code&gt; …&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/functions/geospatial.html#from_geojson_geometry&quot;&gt;from_geojson_geometry/to_geojson_geometry&lt;/a&gt; functions.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/functions/ipaddress.html#ip-address-contains&quot;&gt;contains&lt;/a&gt; 
function for checking if a CIDR contains an IP address.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/functions/aggregate.html#listagg&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;listagg&lt;/code&gt;&lt;/a&gt;
function that returns concatenated values separated by a specified separator.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/functions/string.html#soundex&quot;&gt;soundex&lt;/a&gt; function
that checks phonetic similarity of two strings.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/functions/conversion.html#format_number&quot;&gt;format_number&lt;/a&gt; function.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/sql/set-time-zone.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SET TIME ZONE&lt;/code&gt;&lt;/a&gt; to set the
 current time zone for the session.&lt;/li&gt;
  &lt;li&gt;Arbitrary queries in &lt;a href=&quot;https://trino.io/docs/current/sql/show-stats.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW STATS&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CURRENT_CATALOG&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CURRENT_SCHEMA&lt;/code&gt; session functions.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE TABLE&lt;/code&gt; which allows for a more efficient delete.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DENY&lt;/code&gt; statement, which enables you to remove a user’s or group’s access via SQL.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN &amp;lt;catalog&amp;gt;&lt;/code&gt; clause to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE ROLE&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP ROLE&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GRANT ROLE&lt;/code&gt;, 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REVOKE ROLE&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SET ROLE&lt;/code&gt; to specify the target catalog of the statement 
instead of using the current session catalog.&lt;/li&gt;
&lt;/ul&gt;
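&lt;p&gt;As a small taste of these additions, here is an illustrative query using the
new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;listagg&lt;/code&gt; function:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;select listagg(name, &apos;, &apos;) within group (order by name) names
from (values &apos;banana&apos;, &apos;apple&apos;, &apos;cherry&apos;) t(name)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This should return a single row with the value
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;apple, banana, cherry&lt;/code&gt;.&lt;/p&gt;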

&lt;h3 id=&quot;query-processing-improvements&quot;&gt;Query processing improvements&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Added support for automatic query retries (this feature is very experimental
with some limitations for now).&lt;/li&gt;
  &lt;li&gt;Transparent query retries.&lt;/li&gt;
  &lt;li&gt;Updated the behavior of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JSON&lt;/code&gt; cast to produce &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JSON&lt;/code&gt; objects instead
of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JSON&lt;/code&gt; arrays.&lt;/li&gt;
  &lt;li&gt;Column and table lineage tracking in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;QueryCompletedEvent&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
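&lt;p&gt;As an illustration of the changed &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt; to
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JSON&lt;/code&gt; cast, field names now become JSON object keys:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;select cast(cast(row(1, &apos;a&apos;) as row(x integer, y varchar)) as json) row_as_json
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This should now produce the object &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{&quot;x&quot;:1,&quot;y&quot;:&quot;a&quot;}&lt;/code&gt;
rather than the array &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[1,&quot;a&quot;]&lt;/code&gt;.&lt;/p&gt;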

&lt;h2 id=&quot;performance-improvements&quot;&gt;Performance improvements&lt;/h2&gt;

&lt;p&gt;Improved performance for the following operations:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Querying Parquet data for files containing column indexes.&lt;/li&gt;
  &lt;li&gt;Reading dictionary-encoded Parquet files.&lt;/li&gt;
  &lt;li&gt;Queries using &lt;a href=&quot;https://trino.io/docs/current/functions/window.html#rank&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rank()&lt;/code&gt;&lt;/a&gt; window function.&lt;/li&gt;
  &lt;li&gt;Queries using &lt;a href=&quot;https://trino.io/docs/current/functions/aggregate.html#sum&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sum()&lt;/code&gt;&lt;/a&gt;
and &lt;a href=&quot;https://trino.io/docs/current/functions/aggregate.html#avg&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;avg()&lt;/code&gt;&lt;/a&gt; for 
decimal types.&lt;/li&gt;
  &lt;li&gt;Queries using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; with single grouping column.&lt;/li&gt;
  &lt;li&gt;Aggregation on decimal values.&lt;/li&gt;
  &lt;li&gt;Evaluation of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; clause.&lt;/li&gt;
  &lt;li&gt;Computing the product of decimal values with precision larger than 19.&lt;/li&gt;
  &lt;li&gt;Queries that process row or array data.&lt;/li&gt;
  &lt;li&gt;Queries that contain a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DISTINCT&lt;/code&gt; clause.&lt;/li&gt;
  &lt;li&gt;Reduced memory usage and improved performance of joins.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY LIMIT&lt;/code&gt; performance was improved when data was pre-sorted.&lt;/li&gt;
  &lt;li&gt;Node-local dynamic filtering.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;security&quot;&gt;Security&lt;/h2&gt;

&lt;p&gt;Added the following improvements and features relevant for authentication, 
authorization and integration with other security systems:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Automatic configuration of TLS for 
&lt;a href=&quot;https://trino.io/docs/current/security/internal-communication.html&quot;&gt;secure internal communication&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Handling of Server Name Indication (SNI) for multiple TLS certificates.
This removes the need to provision per-worker TLS certificates.&lt;/li&gt;
  &lt;li&gt;Access control for materialized views.&lt;/li&gt;
  &lt;li&gt;OAuth2/OIDC &lt;a href=&quot;https://trino.io/docs/current/security/oauth2.html&quot;&gt;opaque access tokens&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Configuring HTTP proxy for OAuth2 authentication.&lt;/li&gt;
  &lt;li&gt;Configuring &lt;a href=&quot;https://trino.io/docs/current/security/authentication-types.html#multiple-password-authenticators&quot;&gt;multiple password authentication plugins&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Hiding inaccessible columns from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT *&lt;/code&gt; statement.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;data-sources&quot;&gt;Data Sources&lt;/h2&gt;

&lt;h3 id=&quot;bigquery-connector&quot;&gt;BigQuery connector&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Added &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE TABLE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP TABLE&lt;/code&gt; support.&lt;/li&gt;
  &lt;li&gt;Added support for case insensitive name matching for BigQuery views.&lt;/li&gt;
  &lt;li&gt;Support reading &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bignumeric&lt;/code&gt; type whose precision is less than or equal to 
38.&lt;/li&gt;
  &lt;li&gt;Added support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE SCHEMA&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP SCHEMA&lt;/code&gt; statements.&lt;/li&gt;
  &lt;li&gt;Improved support for BigQuery datetime and timestamp types.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;cassandra-connector&quot;&gt;Cassandra connector&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Mapped Cassandra &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uuid&lt;/code&gt; type to Trino &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uuid&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Added support for Cassandra &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tuple&lt;/code&gt; type.&lt;/li&gt;
  &lt;li&gt;Changed minimum number of speculative executions from two to one.&lt;/li&gt;
  &lt;li&gt;Support for reading user-defined types.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;clickhouse-connector&quot;&gt;ClickHouse connector&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Added &lt;a href=&quot;https://trino.io/docs/current/connector/clickhouse.html&quot;&gt;ClickHouse connector&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Improved performance of aggregation queries by computing aggregations within 
ClickHouse. Currently, the following aggregate functions are eligible for
pushdown: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;count&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;min&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;max&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sum&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;avg&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Added support for dropping columns.&lt;/li&gt;
  &lt;li&gt;Map ClickHouse &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UUID&lt;/code&gt; columns as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UUID&lt;/code&gt; type in Trino instead of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VARCHAR&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;hdfs-s3-azure-and-cloud-object-storage-systems&quot;&gt;HDFS, S3, Azure and cloud object storage systems&lt;/h3&gt;

&lt;p&gt;A core use case of Trino is connecting to a data lake with the Hive and
Iceberg connectors. These connectors differ from most, as Trino acts as the sole
query engine rather than delegating queries to another system. Here are some of
the changes made to these connectors:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Enabled Glue statistics to support better query planning when using AWS.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; support for ACID tables&lt;/li&gt;
  &lt;li&gt;A lot of Hive view improvements.&lt;/li&gt;
  &lt;li&gt;Parquet column indexes.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;target_max_file_size&lt;/code&gt; configuration to control the file size of data written
by Trino.&lt;/li&gt;
  &lt;li&gt;Streaming uploads to S3 by default to improve performance and reduce disk usage.&lt;/li&gt;
  &lt;li&gt;Improved performance for tables with small files and partitioned tables.&lt;/li&gt;
  &lt;li&gt;Transparent redirection from a Hive catalog to Iceberg catalog if the table is
an Iceberg table.&lt;/li&gt;
  &lt;li&gt;Updated to Iceberg 0.11.0 behavior for transforms of dates and timestamps
before 1970.&lt;/li&gt;
  &lt;li&gt;Added procedure &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;system.flush_metadata_cache()&lt;/code&gt; to flush metadata caches.&lt;/li&gt;
  &lt;li&gt;Avoid generating splits for empty files.&lt;/li&gt;
  &lt;li&gt;Sped up Iceberg query performance when dynamic filtering can be leveraged.&lt;/li&gt;
  &lt;li&gt;Increased Iceberg performance when reading timestamps from Parquet files.&lt;/li&gt;
  &lt;li&gt;Improved Iceberg performance for queries on nested data through dereference
pushdown.&lt;/li&gt;
  &lt;li&gt;Added support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT OVERWRITE&lt;/code&gt; operations on S3-backed tables.&lt;/li&gt;
  &lt;li&gt;Made the Iceberg &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uuid&lt;/code&gt; type available.&lt;/li&gt;
  &lt;li&gt;Trino views made available in Iceberg.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;elasticsearch-connector&quot;&gt;Elasticsearch connector&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Added support for reading fields as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json&lt;/code&gt; values.&lt;/li&gt;
  &lt;li&gt;Fixed failure when documents contain fields of unsupported types.&lt;/li&gt;
  &lt;li&gt;Added support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scaled_float&lt;/code&gt; type.&lt;/li&gt;
  &lt;li&gt;Added support for assuming an IAM role.&lt;/li&gt;
  &lt;li&gt;Added retry requests with backoff when Elasticsearch is overloaded.&lt;/li&gt;
  &lt;li&gt;Better support for Elastic Cloud.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;mongodb-connector&quot;&gt;MongoDB connector&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Added &lt;a href=&quot;https://trino.io/docs/current/connector/mongodb.html#timestamp_objectid&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timestamp_objectid()&lt;/code&gt;&lt;/a&gt;
function.&lt;/li&gt;
  &lt;li&gt;Enabled &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mongodb.socket-keep-alive&lt;/code&gt; config property by default.&lt;/li&gt;
  &lt;li&gt;Add support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json&lt;/code&gt; type.&lt;/li&gt;
  &lt;li&gt;Support reading MongoDB &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DBRef&lt;/code&gt; type.&lt;/li&gt;
  &lt;li&gt;Allow skipping creation of an index for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_schema&lt;/code&gt; collection, if it 
already exists.&lt;/li&gt;
  &lt;li&gt;Added support to redact the value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mongodb.credentials&lt;/code&gt; in the server log.&lt;/li&gt;
  &lt;li&gt;Added support for dropping columns.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;mysql-connector&quot;&gt;MySQL connector&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Added support for reading and writing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timestamp&lt;/code&gt; values with precision higher
than three.&lt;/li&gt;
  &lt;li&gt;Added support for predicate pushdown on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timestamp&lt;/code&gt; columns.&lt;/li&gt;
  &lt;li&gt;Exclude an internal &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sys&lt;/code&gt; schema from schema listings.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;pinot-connector&quot;&gt;Pinot connector&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Updated Pinot connector to be compatible with versions &amp;gt;= 0.8.0 and drop 
support for older versions.&lt;/li&gt;
  &lt;li&gt;Added support for pushdown of filters on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varbinary&lt;/code&gt; columns to Pinot.&lt;/li&gt;
  &lt;li&gt;Fixed incorrect results for queries that contain aggregations and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN&lt;/code&gt; and 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NOT IN&lt;/code&gt; filters over varchar columns.&lt;/li&gt;
  &lt;li&gt;Fixed failure for queries with filters on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;real&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt; columns having 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+Infinity&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-Infinity&lt;/code&gt; values.&lt;/li&gt;
  &lt;li&gt;Implemented aggregation pushdown.&lt;/li&gt;
  &lt;li&gt;Allowed HTTPS URLs in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pinot.controller-urls&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;phoenix-connector&quot;&gt;Phoenix connector&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Phoenix 5 support was added.&lt;/li&gt;
  &lt;li&gt;Reduced memory usage for some queries.&lt;/li&gt;
  &lt;li&gt;Improved performance by adding ability to parallelize queries within Trino.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;features-added-to-various-connectors&quot;&gt;Features added to various connectors&lt;/h3&gt;

&lt;p&gt;In addition to the above some more features were added that apply to connectors
that use common code. These features improve performance using:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-352.html#mysql-connector&quot;&gt;Statistical aggregate function pushdown &lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-353.html&quot;&gt;TopN pushdown and join pushdown&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-353.html&quot;&gt;Improved planning times by reducing number of connections opened&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-356.html&quot;&gt;Improved performance by improving metadata caching hit rate&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-357.html&quot;&gt;Rule based identifier mapping support&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-360.html&quot;&gt;DELETE, non-transactional inserts and write-batch-size &lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-361.html&quot;&gt;Metadata cache max size&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-365.html&quot;&gt;TRUNCATE TABLE&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-366.html&quot;&gt;Improved handling of Gregorian - Julian switch for date type&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Ensured correctness when pushing down predicates and topN to remote system 
that is case-insensitive or sorts differently from Trino.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;runtime-improvements&quot;&gt;Runtime improvements&lt;/h2&gt;

&lt;p&gt;There are a lot of performance improvements to list from the &lt;a href=&quot;https://trino.io/docs/current/release.html&quot;&gt;release notes&lt;/a&gt;.
Here are a few examples:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved coordinator CPU utilization.&lt;/li&gt;
  &lt;li&gt;Improved query performance by reducing CPU overhead of repartitioning data 
across worker nodes.&lt;/li&gt;
  &lt;li&gt;Reduced graceful shutdown time for worker nodes.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;everything-else&quot;&gt;Everything else&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/admin/event-listeners-http.html&quot;&gt;HTTP Event listener&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Added support for ARM64 in the &lt;a href=&quot;https://hub.docker.com/r/trinodb/trino&quot;&gt;Trino Docker image&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Added &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;clear&lt;/code&gt; command to the Trino CLI to clear the screen.&lt;/li&gt;
  &lt;li&gt;Improved tab completion for the Trino CLI.&lt;/li&gt;
  &lt;li&gt;Custom connector metrics.&lt;/li&gt;
  &lt;li&gt;Fixed many, many, many bugs!&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-summit&quot;&gt;Trino Summit&lt;/h2&gt;

&lt;p&gt;In 2021 we also enjoyed a successful inaugural Trino Summit, hosted by 
Starburst, with well over 500 attendees. There were wonderful talks
given at this event from companies like Doordash, EA, LinkedIn, Netflix, 
Robinhood, StreamNative, and Tabular. If you missed this event, we have the 
&lt;a href=&quot;https://www.starburst.io/resources/trino-summit/&quot;&gt;recordings and slides available&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As a teaser, the event started with Commander Bun Bun playing guitar to AC/DC’s
“Back In Black”.&lt;/p&gt;

&lt;iframe src=&quot;https://www.youtube.com/embed/c_qUp0SGeKE&quot; width=&quot;800&quot; height=&quot;500&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; 
margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; 
&lt;/iframe&gt;

&lt;h2 id=&quot;renaming-from-prestosql-to-trino&quot;&gt;Renaming from PrestoSQL to Trino&lt;/h2&gt;

&lt;p&gt;As mentioned above, we renamed the project this year. What followed was an 
outpouring of support, and some shock, from the larger tech community. Community 
members immediately got to work. The project had to move practically overnight 
from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;io.prestosql&lt;/code&gt; namespace to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;io.trino&lt;/code&gt;, and a 
&lt;a href=&quot;https://trino.io/blog/2021/01/04/migrating-from-prestosql-to-trino.html&quot;&gt;migration blog post&lt;/a&gt;
was published. Because the Linux Foundation moved quickly to enforce the Presto
trademark, users had to adapt just as quickly.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;a href=&quot;https://twitter.com/trinodb/status/1343330429684703232?s=20&quot; target=&quot;_blank&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;100%&quot; src=&quot;/assets/blog/2021-review/tweets.png&quot; /&gt;
 &lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;This &lt;a href=&quot;https://stackoverflow.com/questions/67414714&quot;&gt;confused many in the community&lt;/a&gt;,
especially once the old PrestoSQL accounts were taken down by the
Linux Foundation. The &lt;a href=&quot;https://prestosql.io&quot;&gt;https://prestosql.io&lt;/a&gt; site had broken documentation links,
JDBC urls had to change from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jdbc:presto&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jdbc:trino&lt;/code&gt;, header protocol
names had to be changed from prefix &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;X-Presto-&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;X-Trino-&lt;/code&gt;, and various other
user impacting changes had to be made in the matter of weeks. Even the legacy 
Docker images were removed from the &lt;a href=&quot;https://hub.docker.com/r/prestosql/presto&quot;&gt;prestosql/presto Docker repository&lt;/a&gt;,
causing disruptions for many users who immediately had to upgrade to the 
&lt;a href=&quot;https://hub.docker.com/r/trinodb/trino&quot;&gt;trinodb/trino Docker repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We reached out to multiple projects to have them update their compatibility
for Trino.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/dbeaver/dbeaver/pull/10925&quot;&gt;DBeaver&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/pinterest/querybook/issues/509&quot;&gt;QueryBook&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/Homebrew/homebrew-core/pull/83185&quot;&gt;Homebrew&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/dbt-labs/dbt-presto/issues/39&quot;&gt;dbt&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/dungdm93/sqlalchemy-trino/issues/20&quot;&gt;sqlalchemy&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/sqlpad/sqlpad/pull/974&quot;&gt;sqlpad&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/apache/superset/pull/13105&quot;&gt;Apache Superset&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/getredash/redash/pull/5411&quot;&gt;Redash&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/akullpp/awesome-java/pull/917&quot;&gt;Awesome Java&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/MunGell/awesome-for-beginners/pull/933&quot;&gt;Awesome For Beginners&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/apache/airflow/pull/15187&quot;&gt;Airflow&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/lyft/presto-gateway/issues/134&quot;&gt;trino-gateway&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/metabase/metabase/issues/17532&quot;&gt;Metabase&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;and so much more…&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Despite the breaking changes, once the immediate hurdles were behind us, the
community was not only excited and supportive about the brand change, but
especially enamored with the new mascot. Our adorable bunny was soon
after &lt;a href=&quot;/episodes/10.html&quot;&gt;named Commander Bun Bun by the community&lt;/a&gt;.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;a href=&quot;https://twitter.com/jtannady/status/1346888143459545092&quot; target=&quot;_blank&quot;&gt;
   &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;/assets/blog/2021-review/cbb.png&quot; /&gt;
 &lt;/a&gt;
&lt;/p&gt;

&lt;h2 id=&quot;2022-roadmap-project-tardigrade&quot;&gt;2022 Roadmap: Project Tardigrade&lt;/h2&gt;

&lt;p&gt;One of the interesting developments that came out of Trino Summit was a feature
Trino co-creator, Martin, talked about in &lt;a href=&quot;https://www.starburst.io/resources/trino-summit/?wchannelid=2ug6mgs5ao&amp;amp;wmediaid=o264qw85dj&quot;&gt;the State of Trino presentation&lt;/a&gt;.
He proposed adding granular fault-tolerance and features to improve performance
in the core engine. While Trino has been proven to run batch analytics workloads
at scale, many have avoided long-running batch jobs for fear of a query failure.
The fault-tolerance feature is a first step toward first-class support in Trino
for long-running batch queries at massive scale.&lt;/p&gt;

&lt;p&gt;The granular fault-tolerance is being thoughtfully crafted to maintain the
speed advantage that Trino has over other query engines, while increasing the
resiliency of queries. In other words, rather than restarting an entire query
when it runs out of resources or fails for any other reason, only a subset of
the query is retried. To support this, intermediate stage data is persisted to
replicated RAM or SSD.&lt;/p&gt;

&lt;p&gt;&lt;a title=&quot;Schokraie E, Warnken U, Hotz-Wagenblatt A, Grohme MA, Hengherr S, et al. (2012), CC BY 2.5 &amp;lt;https://creativecommons.org/licenses/by/2.5&amp;gt;, via Wikimedia Commons&quot; href=&quot;https://commons.wikimedia.org/wiki/File:SEM_image_of_Milnesium_tardigradum_in_active_state_-_journal.pone.0045682.g001-2.png&quot;&gt;&lt;img width=&quot;512&quot; alt=&quot;SEM image of Milnesium tardigradum in active state - journal.pone.0045682.g001-2&quot; src=&quot;https://upload.wikimedia.org/wikipedia/commons/thumb/c/cd/SEM_image_of_Milnesium_tardigradum_in_active_state_-_journal.pone.0045682.g001-2.png/512px-SEM_image_of_Milnesium_tardigradum_in_active_state_-_journal.pone.0045682.g001-2.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The project to introduce granular fault-tolerance into Trino is called
Project Tardigrade. It is a focus for many contributors now, and we will
introduce you to the details in the coming months. The project is named after
the microscopic tardigrades, the world’s most indestructible creatures, a nod
to the resiliency we are adding to Trino’s queries. We look forward to telling
you more as features unfold.&lt;/p&gt;

&lt;p&gt;Along with Project Tardigrade comes a series of changes focused on faster
performance in the query engine using columnar evaluation, adaptive planning,
and better scheduling for SIMD and GPU processors. We will also be working on
dynamically resolved functions, MERGE support, time travel queries in data lake
connectors, Java 17, improved caching mechanisms, and much, much more!&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;In summary, living this first year under the banner of Trino was nothing short
of a wild endeavor. Any engineer knows that naming things is hard, and renaming
things is all the more difficult.&lt;/p&gt;

&lt;p&gt;As we head into 2022, we can be certain of one thing. Trino will be reaching
into newer areas of development and breaking norms just as it did as Presto in
previous eras. The addition of native fault-tolerance to a lightning-fast query
engine will bring Trino to a new level of adoption. Keep your eyes peeled for
more about Project Tardigrade.&lt;/p&gt;

&lt;p&gt;Along with Project Tardigrade, we are looking forward to another year filled
with features, issues, and suggestions from our amazing and passionate community.
Thank you all for an incredible year. We can’t wait to see what you all bring in
2022!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen, Martin Traverso, Manfred Moser</name>
        </author>
      

      <summary>As we reflect on Trino’s journey in 2021, one thing stands out. Compared to previous years we have seen even further accelerated, tremendous growth. Yes, this is what all these year-in-retrospect blog posts say, but this has some special significance to it. This week marked the one-year anniversary since the project dropped the Presto name and moved to the Trino name. Immediately after the announcement, the Trino GitHub repository started trending in number of stargazers. Up until this point, the PrestoSQL GitHub repository had only amassed 1,600 stargazers in the two years since it had split from the PrestoDB repository. However, within four months after the renaming, the number of stargazers had doubled. GitHub stars, issues, pull requests and commits started growing at a new trajectory.</summary>

      
      
    </entry>
  
    <entry>
      <title>31: Trinites II: Trino on AWS Kubernetes Service</title>
      <link href="https://trino.io/episodes/31.html" rel="alternate" type="text/html" title="31: Trinites II: Trino on AWS Kubernetes Service" />
      <published>2021-12-16T00:00:00+00:00</published>
      <updated>2021-12-16T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/31</id>
      <content type="html" xml:base="https://trino.io/episodes/31.html">&lt;h2 id=&quot;trino-summit-2021&quot;&gt;Trino Summit 2021&lt;/h2&gt;

&lt;p&gt;If you missed &lt;a href=&quot;https://www.starburst.io/resources/trino-summit/&quot;&gt;Trino Summit 2021&lt;/a&gt;,
you can watch it on demand, for free!&lt;/p&gt;

&lt;h2 id=&quot;releases-365-and-366&quot;&gt;Releases 365 and 366&lt;/h2&gt;

&lt;p&gt;Martin’s official announcement mentioned the following highlights:&lt;/p&gt;

&lt;p&gt;Trino 365&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Aggregations in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE TABLE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Compatibility with Pinot 0.8.0&lt;/li&gt;
  &lt;li&gt;HTTP proxy support for OAuth2 authentication&lt;/li&gt;
  &lt;li&gt;Many improvements to Iceberg connector&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Release notes: &lt;a href=&quot;https://trino.io/docs/current/release/release-365.html&quot;&gt;https://trino.io/docs/current/release/release-365.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Trino 366&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for automatic query retries&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DENY&lt;/code&gt; security rules&lt;/li&gt;
  &lt;li&gt;Performance optimizations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Release notes: &lt;a href=&quot;https://trino.io/docs/current/release/release-366.html&quot;&gt;https://trino.io/docs/current/release/release-366.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Manfred’s additional notes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Cool new SQL like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE TABLE&lt;/code&gt; and support for time travel&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;contains&lt;/code&gt; function for IP check in CIDR&lt;/li&gt;
  &lt;li&gt;Lots of performance and correctness fixes on Hive and Iceberg connectors&lt;/li&gt;
  &lt;li&gt;Drop support for old Pinot versions&lt;/li&gt;
  &lt;li&gt;Support for Hive to Iceberg redirects&lt;/li&gt;
  &lt;li&gt;Automatic TLS for internal communication&lt;/li&gt;
  &lt;li&gt;Support for Java 17&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And a last note, full Java 17 support is becoming a reality.&lt;/p&gt;

&lt;p&gt;More detailed information is available in the &lt;a href=&quot;https://trino.io/docs/current/release/release-365.html&quot;&gt;365&lt;/a&gt;
and &lt;a href=&quot;https://trino.io/docs/current/release/release-366.html&quot;&gt;366&lt;/a&gt; release notes.&lt;/p&gt;

&lt;p&gt;To play around with query retries, you need to set the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;retry_policy&lt;/code&gt; session
property to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;QUERY&lt;/code&gt; with the following command: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SET SESSION retry_policy=QUERY;&lt;/code&gt;&lt;/p&gt;

&lt;h2 id=&quot;log4shell&quot;&gt;Log4Shell&lt;/h2&gt;

&lt;p&gt;There’s a new vulnerability in town that has the potential to affect Java
projects that use some Log4j2 versions. It is called Log4Shell, and it does not
affect Trino. Read &lt;a href=&quot;https://trino.io/blog/2021/12/13/log4shell-does-not-affect-trino.html&quot;&gt;the blog for more details&lt;/a&gt;.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;/assets/episode/31/log4shell.jpeg&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-month-replicasets-deployments-and-services&quot;&gt;Concept of the month: ReplicaSets, Deployments, and Services&lt;/h2&gt;

&lt;p&gt;In &lt;a href=&quot;/episodes/24.html&quot;&gt;the first installment of Trinetes&lt;/a&gt;, we talked about what 
containerization is and why we use it. We covered the difference between tools
like docker-compose and container orchestration systems like Kubernetes (k8s).
Finally, we went over the first k8s object called a &lt;a href=&quot;https://kubernetes.io/docs/concepts/workloads/pods/&quot;&gt;&lt;em&gt;pod&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As a reminder, a pod is the basic unit of deployment in a k8s cluster. In this
episode, we cover how to scale, deploy, and connect these pods. If you are 
missing some context, you should review &lt;a href=&quot;/episodes/24.html&quot;&gt;the first installment of this series&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;replicasets&quot;&gt;ReplicaSets&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Replicas&lt;/em&gt; are one or more instances created from the same pod definition. In k8s,
the object used to manage replication is a &lt;a href=&quot;https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/&quot;&gt;&lt;em&gt;ReplicaSet&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;ReplicaSets provide high availability by managing multiple instances based on a
pod definition in the k8s cluster. Kubernetes automatically brings up replacements
for any pod instances in a ReplicaSet that go down, based on the number of
replicas you specify in the definition.&lt;/p&gt;

&lt;p&gt;Replication also enables load balancing IO traffic over multiple pods. You gain 
the flexibility to scale up or down as traffic increases or decreases without 
any downtime.&lt;/p&gt;

&lt;p&gt;To scale the number of pods in a live ReplicaSet, you can update the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;replicas&lt;/code&gt;
value in the ReplicaSet definition file, then run the following command to
apply it:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl replace -f replicaset-definition.yml
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can also edit the live ReplicaSet without changing the local file:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl edit replicaset &amp;lt;replicaset-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
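&lt;p&gt;For reference, a minimal &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;replicaset-definition.yml&lt;/code&gt; might look like the
following sketch; the names and labels here are only placeholders:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: trino-worker
  labels:
    app: trino
spec:
  replicas: 2
  selector:
    matchLabels:
      app: trino
  template:
    metadata:
      labels:
        app: trino
    spec:
      containers:
        - name: trino-worker
          image: trinodb/trino:latest
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Alternatively, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;kubectl scale replicaset &amp;lt;replicaset-name&amp;gt; --replicas=4&lt;/code&gt; changes
the replica count directly without editing any file.&lt;/p&gt;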

&lt;h3 id=&quot;labels-and-selectors&quot;&gt;Labels and selectors&lt;/h3&gt;

&lt;p&gt;Kubernetes objects have &lt;a href=&quot;https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/&quot;&gt;labels&lt;/a&gt; 
which are just key/value properties used to identify and dynamically group k8s
objects. Labels should be meaningful and relevant to k8s users to easily 
comprehend things like which application, version, component, and environment 
certain objects belong to. Labels are shared across instances, and so they are 
not unique.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors&quot;&gt;Selectors&lt;/a&gt;
specify a grouping of labels used to target a set of objects when deploying or
applying other operations over those objects. For example, a ReplicaSet
identifies the set of pods it manages with its selector. When creating the
ReplicaSet, k8s creates the pods defined in the ReplicaSet’s template. If the
pods crash, k8s brings up new pods and associates the new pods with the
ReplicaSet.&lt;/p&gt;
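&lt;p&gt;As a quick illustration, the same label syntax works on the command line to
filter objects; the label values here assume the labels used by the Trino Helm
chart shown later in this episode:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl get pods -l app=trino,component=worker
kubectl get all --selector release=tcb
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;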

&lt;h3 id=&quot;deployments&quot;&gt;Deployments&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://kubernetes.io/docs/concepts/workloads/controllers/deployment/&quot;&gt;Deployment&lt;/a&gt;
objects allow you to take a ReplicaSet and perform actions on that set,
such as creation, rolling updates, rollbacks, pod updates, and so on.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/31/deployment.png&quot; /&gt;&lt;br /&gt;
 Source: https://www.udemy.com/course/learn-kubernetes/
&lt;/p&gt;

&lt;p&gt;The best way to start making sense of these concepts is to look at the k8s
configuration files. You can render them from the Trino Helm chart with:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;helm template tcb trino/trino --version 0.3.0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Below is the generated deployment configuration,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino/templates/deployment-worker.yaml&lt;/code&gt;, with comments that delineate which
object each section of the configuration defines.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#-------------------------Deployment-----------------------------
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tcb-trino-worker
  labels:
    app: trino
    chart: trino-0.3.0
    release: tcb
    heritage: Helm
    component: worker
spec:
#-------------------------ReplicaSet-----------------------------
  replicas: 2
  selector:
    matchLabels:
      app: trino
      release: tcb
      component: worker
  template:
#----------------------------Pod---------------------------------
    metadata:
      labels:
        app: trino
        release: tcb
        component: worker
    spec:
      volumes:
        - name: config-volume
          configMap:
            name: tcb-trino-worker
        - name: catalog-volume
          configMap:
            name: tcb-trino-catalog
      imagePullSecrets:
        - name: registry-credentials
      containers:
        - name: trino-worker
          image: &quot;trinodb/trino:latest&quot;
          imagePullPolicy: IfNotPresent
          env:
            []
          volumeMounts:
            - mountPath: /etc/trino
              name: config-volume
            - mountPath: /etc/trino/catalog
              name: catalog-volume
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /v1/info
              port: http
          readinessProbe:
            httpGet:
              path: /v1/info
              port: http
          resources:
            {}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;configmap&quot;&gt;ConfigMap&lt;/h3&gt;

&lt;p&gt;You may have noticed that the pods define volumes that refer to an
object called a &lt;a href=&quot;https://kubernetes.io/docs/concepts/configuration/configmap/&quot;&gt;&lt;em&gt;ConfigMap&lt;/em&gt;&lt;/a&gt;.
This is a way to store non-confidential data in the form of key-value pairs.&lt;/p&gt;

&lt;p&gt;ConfigMaps are how the Trino chart loads the &lt;a href=&quot;https://trino.io/docs/current/installation/deployment.html#configuring-trino&quot;&gt;Trino configurations&lt;/a&gt; 
in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/etc/trino&lt;/code&gt; directory on the containers. The ConfigMap file, 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino/templates/configmap-worker.yaml&lt;/code&gt;, defines the files loaded into the 
worker nodes. The only real difference between the worker and coordinator
ConfigMaps is in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.properties&lt;/code&gt; file, which specifies whether the node is a coordinator.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;apiVersion: v1
kind: ConfigMap
metadata:
  name: tcb-trino-worker
  labels:
    app: trino
    chart: trino-0.3.0
    release: tcb
    heritage: Helm
    component: worker
data:
  node.properties: |
    node.environment=production
    node.data-dir=/data/trino
    plugin.dir=/usr/lib/trino/plugin

  jvm.config: |
    -server
    -Xmx8G
    -XX:+UseG1GC
    -XX:G1HeapRegionSize=32M
    -XX:+UseGCOverheadLimit
    -XX:+ExplicitGCInvokesConcurrent
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:+ExitOnOutOfMemoryError
    -Djdk.attach.allowAttachSelf=true
    -XX:-UseBiasedLocking
    -XX:ReservedCodeCacheSize=512M
    -XX:PerMethodRecompilationCutoff=10000
    -XX:PerBytecodeRecompilationCutoff=10000
    -Djdk.nio.maxCachedBufferSize=2000000

  config.properties: |
    coordinator=false
    http-server.http.port=8080
    query.max-memory=4GB
    query.max-memory-per-node=1GB
    query.max-total-memory-per-node=2GB
    memory.heap-headroom-per-node=1GB
    discovery.uri=http://tcb-trino:8080

  log.properties: |
    io.trino=INFO
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The only other ConfigMap defines the &lt;a href=&quot;https://trino.io/docs/current/installation/deployment.html#catalog-properties&quot;&gt;catalog properties files&lt;/a&gt;
in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/etc/trino/catalog&lt;/code&gt; folder. This ConfigMap only defines two catalogs.
They expose the TPC-H and TPC-DS benchmark datasets.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;apiVersion: v1
kind: ConfigMap
metadata:
  name: tcb-trino-catalog
  labels:
    app: trino
    chart: trino-0.3.0
    release: tcb
    heritage: Helm
    role: catalogs
data:
  tpch.properties: |
    connector.name=tpch
    tpch.splits-per-node=4
  tpcds.properties: |
    connector.name=tpcds
    tpcds.splits-per-node=4
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
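&lt;p&gt;Once the cluster is reachable, you can verify these catalogs from the
&lt;a href=&quot;https://trino.io/docs/current/client/cli.html&quot;&gt;Trino CLI&lt;/a&gt;; this sketch assumes a
tunnel to the coordinator on port 8080:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino --server http://localhost:8080 --catalog tpch --schema tiny
trino:tiny&gt; SELECT count(*) FROM nation;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;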

&lt;h3 id=&quot;networking&quot;&gt;Networking&lt;/h3&gt;

&lt;p&gt;Unlike in the Docker world, where containers run directly on the host and can
be exposed from it, pods in a k8s cluster run in a private network.
Kubernetes exposes the internal IP address of the pod via the IP address of the
k8s node and a unique port.&lt;/p&gt;

&lt;p&gt;While these IP addresses can be used to address pods internally, it’s not a
good idea, as they are dynamic and subject to change upon termination and
recreation. Instead, you set up routing that handles addressing via pod name
rather than IP address.&lt;/p&gt;

&lt;p&gt;When you have multiple k8s nodes, you have multiple IP addresses set up for
the nodes. The routing software must be set up to handle the assignment of the
internal networks to each node to avoid conflicts across the cluster. This type
of functionality exists in cloud services, such as Amazon EKS, Google GKE, and
Azure AKS.&lt;/p&gt;

&lt;h3 id=&quot;services&quot;&gt;Services&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://kubernetes.io/docs/concepts/services-networking/service/&quot;&gt;&lt;em&gt;Services&lt;/em&gt;&lt;/a&gt; 
establish connectivity between different pods and can make pods available 
from the external k8s node IP address. This enables loose coupling between 
microservices in applications.&lt;/p&gt;

&lt;p&gt;The above example shows a NodePort service. There are three service types.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;ClusterIP - the service creates a virtual IP inside the cluster to enable 
communication between different services. This service is the default when you
don’t specify a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;type&lt;/code&gt; value under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;spec&lt;/code&gt; in the configuration.&lt;/li&gt;
  &lt;li&gt;NodePort - the service exposes the internal address of a pod using the IP 
address and port of the node it is running on.&lt;/li&gt;
  &lt;li&gt;Load Balancer - the service creates a load balancer for the application in 
supported cloud providers. We won’t cover this one, but it is used when 
we create our cluster in EKS using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;eksctl&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here’s a diagram of the ClusterIP networking between different ReplicaSets.&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/31/clusterip.png&quot; /&gt;&lt;br /&gt;
 Source: https://www.udemy.com/course/learn-kubernetes/
&lt;/p&gt;

&lt;p&gt;NodePorts establish connectivity to a specific ReplicaSet of pod instances. They
cannot provide a generically accessible IP address for services to communicate
with one another.&lt;/p&gt;

&lt;p&gt;In our case, we configure an external IP address for the coordinator.
The Helm chart defines a ClusterIP service to accomplish this. Notice the
selector targets the Trino app, the release label, and only the coordinator 
component, which we know is one node.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: tcb-trino
  labels:
    app: trino
    chart: trino-0.3.0
    release: tcb
    heritage: Helm
spec:
  type: ClusterIP
  ports:
    - port: 8080
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app: trino
    release: tcb
    component: coordinator
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;nodeport&quot;&gt;NodePort&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport&quot;&gt;&lt;em&gt;NodePort&lt;/em&gt;&lt;/a&gt;
service type creates a proxy service that forwards traffic from a specific port
on the node to the pod.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/31/service.png&quot; /&gt;&lt;br /&gt;
 Source: https://www.udemy.com/course/learn-kubernetes/
&lt;/p&gt;

&lt;p&gt;There are three ports when setting up a NodePort.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;TargetPort - the port number on the pod itself, to which the service forwards traffic.&lt;/li&gt;
  &lt;li&gt;Port - the port used by the service.&lt;/li&gt;
  &lt;li&gt;NodePort - the port that is exposed by the worker node and made available 
externally. NodePorts can only be in the range of 30000 - 32767.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The only required port to set is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;port&lt;/code&gt;. By default &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;targetPort&lt;/code&gt; is the 
same as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;port&lt;/code&gt; and nodePort is automatically assigned a free port in the 
allowed range. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ports&lt;/code&gt; is also an array which is why the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-&lt;/code&gt; char is used.&lt;/p&gt;
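&lt;p&gt;To make the three ports concrete, here is a hedged sketch of a NodePort
service for the coordinator; it is not part of the Helm chart, and the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nodePort&lt;/code&gt; value is an arbitrary choice in the allowed range:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: tcb-trino-nodeport
spec:
  type: NodePort
  ports:
    - port: 8080        # port used by the service
      targetPort: 8080  # port on the pod the service forwards to
      nodePort: 30080   # port exposed externally on the k8s node
  selector:
    app: trino
    release: tcb
    component: coordinator
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;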

&lt;h3 id=&quot;amazon-eks-elastic-kubernetes-service&quot;&gt;Amazon EKS (Elastic Kubernetes Service)&lt;/h3&gt;

&lt;p&gt;Amazon EKS is a managed container service to run and scale Kubernetes
applications in the cloud. EKS provides k8s clusters in the cloud without you
having to manage the whole k8s platform yourself. Unlike with
your own k8s cluster, you can’t log into the control plane node in EKS, although
you won’t need to. You can access the workers, which are usually EC2 instances.&lt;/p&gt;

&lt;p&gt;There are &lt;a href=&quot;https://docs.aws.amazon.com/eks/latest/userguide/create-cluster.html&quot;&gt;many steps involved in setting up a Kubernetes cluster&lt;/a&gt; 
on EKS, unless you use a simple command line tool called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;eksctl&lt;/code&gt; that
provisions the cluster for you.&lt;/p&gt;

&lt;h3 id=&quot;eksctl&quot;&gt;eksctl&lt;/h3&gt;

&lt;p&gt;From the &lt;a href=&quot;https://eksctl.io/&quot;&gt;eksctl website&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;eksctl&lt;/code&gt; is a simple CLI tool for creating and managing clusters on EKS - 
Amazon’s managed Kubernetes service for EC2. It is written in Go, uses 
CloudFormation, was created by Weaveworks and it welcomes contributions from 
the community. Create a basic cluster in minutes with just one command.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&quot;demo-of-the-month-deploy-trino-k8s-to-amazon-eks&quot;&gt;Demo of the month: Deploy Trino k8s to Amazon EKS&lt;/h2&gt;

&lt;p&gt;First, you’ll need to install the following tools if you haven’t done so already:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/weaveworks/eksctl&quot;&gt;eksctl&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://kubernetes.io/docs/tasks/tools/&quot;&gt;kubectl&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://helm.sh/docs/intro/install/&quot;&gt;helm&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then you need to add your IAM credentials to the 
&lt;a href=&quot;https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html#cli-configure-files-where&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;~/.aws/credentials&lt;/code&gt; file&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Check the latest k8s version that is available on EKS.
&lt;a href=&quot;https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html&quot;&gt;https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;eksctl create cluster \
 --name tcb-cluster \
 --version 1.21 \
 --region us-east-1 \
 --nodegroup-name k8s-tcb-cluster \
 --node-type t2.large \
 --nodes 2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The command completes in 10 to 15 minutes. This is the first output you
see:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;2021-12-16 01:25:17 [ℹ]  eksctl version 0.76.0
2021-12-16 01:25:17 [ℹ]  using region us-east-1
2021-12-16 01:25:17 [ℹ]  setting availability zones to [us-east-1a us-east-1e]
2021-12-16 01:25:17 [ℹ]  subnets for us-east-1a - public:192.168.0.0/19 private:192.168.64.0/19
2021-12-16 01:25:17 [ℹ]  subnets for us-east-1e - public:192.168.32.0/19 private:192.168.96.0/19
2021-12-16 01:25:17 [ℹ]  nodegroup &quot;k8s-tcb-cluster&quot; will use &quot;&quot; [AmazonLinux2/1.21]
2021-12-16 01:25:17 [ℹ]  using Kubernetes version 1.21
2021-12-16 01:25:17 [ℹ]  creating EKS cluster &quot;tcb-cluster&quot; in &quot;us-east-1&quot; region with managed nodes
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After some time, you notice that two EC2 instances have come up. The final
output of the tool should look like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;2021-12-16 02:00:17 [ℹ]  waiting for at least 2 node(s) to become ready in &quot;k8s-tcb-cluster&quot;
2021-12-16 02:00:17 [ℹ]  nodegroup &quot;k8s-tcb-cluster&quot; has 2 node(s)
2021-12-16 02:00:17 [ℹ]  node &quot;ip-192-168-2-123.ec2.internal&quot; is ready
2021-12-16 02:00:17 [ℹ]  node &quot;ip-192-168-55-167.ec2.internal&quot; is ready
2021-12-16 02:00:18 [ℹ]  kubectl command should work with &quot;~/.kube/config&quot;, try &apos;kubectl get nodes&apos;
2021-12-16 02:00:18 [✔]  EKS cluster &quot;tcb-cluster&quot; in &quot;us-east-1&quot; region is ready
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Take special note that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;eksctl&lt;/code&gt; overwrote your k8s configuration to point to
the EKS cluster instead of a local cluster. To test that you can connect, run:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl get nodes
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You should see two nodes running. From here, everything is simple: to install
Trino, reuse the same Helm chart that we used to deploy Trino locally. Because
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;eksctl&lt;/code&gt; updated your configuration, the exact same command now deploys to EKS.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;helm install tcb trino/trino --version 0.3.0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After you’ve installed the Helm chart, wait a minute or two for the Trino 
service to fully start and run:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl get deployments
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You should see output showing that the coordinator and both workers are available.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
tcb-trino-coordinator   1/1     1            1           67s
tcb-trino-worker        2/2     2            2           67s
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To connect to the cluster, the Helm output gives pretty good instructions on how
to create a tunnel from the cluster to your local laptop.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Get the application URL by running these commands:
  export POD_NAME=$(kubectl get pods --namespace default -l &quot;app=trino,release=tcb,component=coordinator&quot; -o jsonpath=&quot;{.items[0].metadata.name}&quot;)
  echo &quot;Visit http://127.0.0.1:8080 to use your application&quot;
  kubectl port-forward $POD_NAME 8080:8080
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Run that, then go to &lt;a href=&quot;http://127.0.0.1:8080&quot;&gt;http://127.0.0.1:8080&lt;/a&gt;, and you should see the Trino UI.&lt;/p&gt;

&lt;p&gt;To clear out the Helm install, run:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl delete service --all
kubectl delete deployment --all
kubectl delete configmap --all
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To tear down the entire k8s cluster, run:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;eksctl delete cluster --name tcb-cluster --region us-east-1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;pr-of-the-month-pr-8921-support-truncate-table-statement&quot;&gt;PR of the month: PR 8921: Support TRUNCATE TABLE statement&lt;/h2&gt;

&lt;p&gt;This week’s &lt;a href=&quot;https://github.com/trinodb/trino/issues/8921&quot;&gt;PR of the month&lt;/a&gt;
implements &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE TABLE&lt;/code&gt;. This command is very similar to a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; statement,
except that it does not perform deletes on individual rows. This
ends up being a much faster operation than &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt;, as it uses fewer system
and logging resources.&lt;/p&gt;
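
&lt;p&gt;For example, truncating a hypothetical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; table removes all of its rows in a single statement:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;TRUNCATE TABLE orders;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;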

&lt;p&gt;Thanks to Yuya Ebihira for adding the support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TRUNCATE TABLE&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-month-how-do-i-run-systemsync_partition_metadata-with-different-catalogs&quot;&gt;Question of the month: How do I run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;system.sync_partition_metadata&lt;/code&gt; with different catalogs?&lt;/h2&gt;

&lt;p&gt;This week’s &lt;a href=&quot;https://trinodb.slack.com/archives/CFLB9AMBN/p1639094856214800&quot;&gt;question of the month&lt;/a&gt; 
comes from Yu on Slack. Yu asks:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Hi team, in the following system procedure, how can we specify the catalog name?
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;system.sync_partition_metadata(schema_name, table_name, mode, case_sensitive)&lt;/code&gt;
We are using multiple catalogs and we need to call this procedure against 
non-default catalog.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I answered this with a link back to our &lt;a href=&quot;/episodes/5.html&quot;&gt;fifth episode&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;You need to set the catalog either in the jdbc string as I do in the video, or
you need to set the session catalog variable,
&lt;a href=&quot;https://trino.io/docs/current/sql/set-session.html&quot;&gt;https://trino.io/docs/current/sql/set-session.html&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
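
&lt;p&gt;To make that concrete, here is a hypothetical example against a Hive catalog
named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive&lt;/code&gt; with a partitioned &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;web.page_views&lt;/code&gt; table (all names are illustrative). You
can either set the session catalog, or qualify the procedure with the catalog
directly:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;USE hive.web;
CALL system.sync_partition_metadata(&apos;web&apos;, &apos;page_views&apos;, &apos;FULL&apos;);

-- or qualify the procedure with the catalog directly
CALL hive.system.sync_partition_metadata(&apos;web&apos;, &apos;page_views&apos;, &apos;FULL&apos;);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;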

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs and resources&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://normanlimxk.com/2021/12/07/run-trino-presto-on-minikube-on-aws/&quot;&gt;Run Trino/Presto on Minikube on AWS&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/episodes/24.html&quot;&gt;Trinetes I: Trino on Kubernetes TCB episode&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://sbakiu.medium.com/diy-analytics-platform-66638cc6a92f&quot;&gt;DIY Analytics Platform&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=p6xDCz00TxU&quot;&gt;AWS EKS - Create Kubernetes cluster on Amazon EKS: the easy way&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Trino Summit 2021</summary>

      
      
    </entry>
  
    <entry>
      <title>Log4Shell does not affect Trino</title>
      <link href="https://trino.io/blog/2021/12/13/log4shell-does-not-affect-trino.html" rel="alternate" type="text/html" title="Log4Shell does not affect Trino" />
      <published>2021-12-13T00:00:00+00:00</published>
      <updated>2021-12-13T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/12/13/log4shell-does-not-affect-trino</id>
      <content type="html" xml:base="https://trino.io/blog/2021/12/13/log4shell-does-not-affect-trino.html">&lt;p&gt;In the last few days we had a surge of folks in our community reaching out with
concerns over the &lt;a href=&quot;https://www.lunasec.io/docs/blog/log4j-zero-day/&quot;&gt;Log4Shell exploit&lt;/a&gt;
(&lt;a href=&quot;https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-44228&quot;&gt;CVE-2021-44228&lt;/a&gt;),
and we want to inform you that &lt;strong&gt;Trino is not affected&lt;/strong&gt;. Trino does not use log4j
in the core engine or runtime classes. There are some connectors that include 
the log4j dependency from client dependencies, but are either not used or are 
not versions affected by the Log4Shell vulnerability. Regular security reviews, 
including code and dependency analysis, are part of the regular development 
process. As we learn more we will update the code to keep vulnerabilities out of
the code.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;50%&quot; src=&quot;/assets/blog/log4shell/log4shell.jpeg&quot; /&gt;
&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;trino-connectors-with-the-log4j-dependency&quot;&gt;Trino connectors with the Log4j dependency&lt;/h2&gt;

&lt;p&gt;If you do a search in the Trino repository, you’ll notice that a direct
dependency on log4j shows up in two of the connectors, Accumulo
and Elasticsearch.&lt;/p&gt;

&lt;h3 id=&quot;accumulo&quot;&gt;Accumulo&lt;/h3&gt;

&lt;p&gt;The Accumulo connector depends on log4j 1.2.17, which, although not vulnerable
to Log4Shell, has other vulnerabilities. These vulnerabilities do not apply to
how we’ve used the loggers in the connector code. To be clear, despite the small
use of this logger in the Accumulo connector, there is no threat even if
you are using it. We are &lt;a href=&quot;https://github.com/trinodb/trino/issues/8781&quot;&gt;working on removing&lt;/a&gt;
the uses of this log4j library in an upcoming release to avoid any confusion.&lt;/p&gt;

&lt;h3 id=&quot;elasticsearch&quot;&gt;Elasticsearch&lt;/h3&gt;

&lt;p&gt;The Elasticsearch connector did have an affected dependency
&lt;a href=&quot;https://github.com/trinodb/trino/commit/2018a94253d48cfdce283538855ee65950f9be3d&quot;&gt;that was recently removed&lt;/a&gt;.
Log4j was not being used by the connector, so despite the existence of the
dependency in the Elasticsearch connector, there was no direct use of the
vulnerable library.&lt;/p&gt;

&lt;h2 id=&quot;avoiding-future-introduction-of-log4shell&quot;&gt;Avoiding future introduction of Log4Shell&lt;/h2&gt;

&lt;p&gt;We take security seriously on the Trino project, as it provides a single point 
of access to your data sources. We’re taking precautionary measures to protect 
against the vulnerability from creeping its way into future versions. In version
366, we’re removing that dependency and &lt;a href=&quot;https://github.com/trinodb/trino/commit/10ba96c63ed3875d9dcca335e49bc73f5c0a6a8c&quot;&gt;adding a dedicated rule&lt;/a&gt;
to the build process to ban log4j as a direct dependency.&lt;/p&gt;
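
&lt;p&gt;As a rough sketch of what such a ban can look like with the Maven enforcer
plugin (the exact rule in the Trino build may differ):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;lt;bannedDependencies&amp;gt;
  &amp;lt;excludes&amp;gt;
    &amp;lt;exclude&amp;gt;log4j:log4j&amp;lt;/exclude&amp;gt;
    &amp;lt;exclude&amp;gt;org.apache.logging.log4j:log4j-core&amp;lt;/exclude&amp;gt;
  &amp;lt;/excludes&amp;gt;
&amp;lt;/bannedDependencies&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;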

&lt;h2 id=&quot;what-should-you-do&quot;&gt;What should you do?&lt;/h2&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Rest assured that there is no vulnerability in your Trino cluster.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;If you’ve created your own plugin with one of the affected log4j libraries, 
you should upgrade as quickly as possible to 2.15.0 or higher.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;In the coming weeks, upgrade to the 366 release at your convenience.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We know there can be a lot of concern when vulnerabilities come up. We wish you
all the best of luck while you work hard to mitigate the risk of exploits in 
your systems. If you have any questions, reach out on the &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Trino Slack&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>In the last few days we had a surge of folks in our community reaching out with concerns over the Log4Shell exploit (CVE-2021-44228), and we want to inform you that Trino is not affected. Trino does not use log4j in the core engine or runtime classes. There are some connectors that include the log4j dependency from client dependencies, but are either not used or are not versions affected by the Log4Shell vulnerability. Regular security reviews, including code and dependency analysis, are part of the regular development process. As we learn more we will update the code to keep vulnerabilities out of the code.</summary>

      
      
    </entry>
  
    <entry>
      <title>30: Trino and dbt, a hot data mesh</title>
      <link href="https://trino.io/episodes/30.html" rel="alternate" type="text/html" title="30: Trino and dbt, a hot data mesh" />
      <published>2021-11-17T00:00:00+00:00</published>
      <updated>2021-11-17T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/30</id>
      <content type="html" xml:base="https://trino.io/episodes/30.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;José Cabeda, Data Engineer at &lt;a href=&quot;https://www.talkdesk.com&quot;&gt;Talkdesk&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/jecabeda&quot;&gt;@jecabeda&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Przemek Denkiewicz, Cloud Ecosystem Engineer at &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt;
  (&lt;a href=&quot;https://twitter.com/hovaesco&quot;&gt;@hovaesco&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-summit-2021&quot;&gt;Trino Summit 2021&lt;/h2&gt;

&lt;p&gt;If you missed &lt;a href=&quot;https://www.starburst.io/resources/trino-summit/&quot;&gt;Trino Summit 2021&lt;/a&gt;,
you can watch it on demand, for free!&lt;/p&gt;

&lt;h2 id=&quot;release-364&quot;&gt;Release 364&lt;/h2&gt;

&lt;p&gt;Trino 364 shipped on the first of November, just after our last episode. 
Martin’s official announcement mentioned the following highlights:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for dynamic filtering in Iceberg connector&lt;/li&gt;
  &lt;li&gt;Performance improvements when querying small files&lt;/li&gt;
  &lt;li&gt;Procedure to merge small files in Hive tables&lt;/li&gt;
  &lt;li&gt;Support for Cassandra UUID type&lt;/li&gt;
  &lt;li&gt;Support for MemSQL datetime and timestamp types&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred’s additional notes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER MATERIALIZED VIEW ... RENAME TO&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;A whole bunch of performance improvements&lt;/li&gt;
  &lt;li&gt;Elasticsearch connector no longer fails with unsupported types&lt;/li&gt;
  &lt;li&gt;A lot of improvements on Hive and Iceberg connectors&lt;/li&gt;
  &lt;li&gt;Hive connector has optimize procedure now!&lt;/li&gt;
  &lt;li&gt;Parquet and Avro fixes and improvements&lt;/li&gt;
  &lt;li&gt;Web UI performance improvement for long query texts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More detailed information is available in the &lt;a href=&quot;https://trino.io/docs/current/release/release-364.html&quot;&gt;release notes&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-trino-and-dbt-a-hot-data-mesh&quot;&gt;Concept of the week: Trino and dbt, a hot data mesh&lt;/h2&gt;

&lt;p&gt;Data mesh, the buzzword that follows data lakehouse, may feel rather irrelevant
to many. This is especially true for those that just want to move from a Hive
and HDFS cluster, or from a cloud data warehouse, to storing data in object
storage and querying it with Trino.&lt;/p&gt;

&lt;p&gt;While data mesh is certainly in the hype cycle phase, it’s actually not a new
idea and has very sound principles. Many companies have written their own 
software and created organizational policies that align with the strategies 
outlined by the data mesh principles. In essence, these principles aim to make
data management for analytics platforms decentralized. This means decentralizing
the infrastructure and data engineers managing it to different domains (or 
products) within a company.&lt;/p&gt;

&lt;p&gt;What’s really exciting about data mesh is that much of the technology today 
makes these theoretical principles more of a reality without having to invent 
your own services. The author of &lt;a href=&quot;https://martinfowler.com/articles/data-mesh-principles.html&quot;&gt;data mesh&lt;/a&gt;,
Zhamak Dehghani, lays out four principles that characterize a data mesh:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Domain-oriented, decentralized data ownership and architecture&lt;/li&gt;
  &lt;li&gt;Data as a product&lt;/li&gt;
  &lt;li&gt;Self-serve data infrastructure as a platform&lt;/li&gt;
  &lt;li&gt;Federated computational governance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s see what the engineers from Talkdesk are doing to implement their data 
mesh.&lt;/p&gt;

&lt;h3 id=&quot;talkdesk&quot;&gt;Talkdesk&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://www.talkdesk.com&quot;&gt;Talkdesk&lt;/a&gt; is a contact center as a service. Talkdesk
was created at a &lt;a href=&quot;https://www.twilio.com&quot;&gt;Twilio&lt;/a&gt; Hackathon in 2011. They just 
hit a 10 billion dollar valuation. As a fast-growing startup, they are evolving
their product strategy quickly, and regularly have large data sets to
analyze.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/30/talkdesk-scale.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;The Talkdesk product is deployed in cloud infrastructure and provides all the 
infrastructure for operating a call center. Its architecture is heavily 
event-driven. Dealing with realtime events at scale is difficult and requires a 
reactive and flexible architecture.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/30/talkdesk-events.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;The early architecture for the analytics platform followed a traditional
approach using Spark and Fivetran to ingest data into Redshift. It had various
pipelines to update the data for downstream consumption.&lt;/p&gt;

&lt;p&gt;This centralized workflow made communication across data entity management much
simpler as it all exists on the same team. However, scaling caused increased 
backlogs, which delayed analysis and deployments. It also made it difficult to 
handle different use cases like realtime and historical use cases.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/30/talkdesk-architecture.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;The analytics and transactional use cases are varied and overlapping. Live
data typically feeds into stateful databases that update as data arrives;
to analyze data in motion, you need a realtime database. Historical data, in
contrast, keeps multiple copies of different states over time, which enables
trend analysis over longer periods rather than just the present moment. One
challenge Talkdesk faced was building a robust architecture that supports
analyzing live data, with the latest changes as they arrive in the OLTP
databases, while meeting all the analytics use cases.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/30/olap-oltp.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;To enable analytics across the various use cases, Talkdesk integrated Trino into
their workflow to read data across both live and historic data and merge them.
Using Trino enabled reading from live data feeding into their stateful data 
stores, and reads across historic data stores to produce data in the form needed
to support Talkdesk products.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;90%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/30/talkdesk-architecture-2.0.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Trino is also used to hide the complexity of the data platform, and allows
merging data across multiple relational and object stores.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;60%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/30/talkdesk-architecture-2.0-external.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h3 id=&quot;why-dbt&quot;&gt;Why dbt?&lt;/h3&gt;

&lt;p&gt;In &lt;a href=&quot;/episodes/21.html&quot;&gt;episode 21&lt;/a&gt; we discussed using dbt and Trino in detail. As
we mentioned there:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;dbt is a transformation workflow tool that lets teams quickly and 
collaboratively deploy analytics code, following software engineering best 
practices like modularity, CI/CD, testing, and documentation. It enables 
anyone who knows SQL to build production-grade data pipelines.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You can achieve modular, repeatable, and testable units of processing by 
defining various models and definitions for the data pipelines. For example:&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/30/dbt-definition.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;
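
&lt;p&gt;To make this concrete, a dbt model is just a SQL file, optionally with Jinja
templating. A minimal, hypothetical model (all names are illustrative) might
look like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;{{ config(materialized=&apos;table&apos;) }}

SELECT
    agent_id,
    count(*) AS total_calls
FROM {{ ref(&apos;stg_calls&apos;) }}
GROUP BY agent_id
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;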

&lt;p&gt;Using the definitions above, Talkdesk engineers were able to consolidate all
these tasks into a much more simplified graph of operations.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/30/dbt-results.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h3 id=&quot;why-data-mesh&quot;&gt;Why data mesh?&lt;/h3&gt;

&lt;p&gt;While a lot of focus has gone into the technology aspects of data mesh, there is
also a lot to be said about the implications on the data team and 
socio-political policies that come with data mesh. Talkdesk also made structural
changes to their team to improve their data mesh strategy.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/30/talkdesk-data-team.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h3 id=&quot;how-data-mesh-affects-the-everyday-life-of-data-engineers&quot;&gt;How does data mesh affect the everyday life of data engineers?&lt;/h3&gt;

&lt;p&gt;There is a real fear that comes around when management changes business 
policies. It can be hard to tell how these policies trickle down and affect
the engineer’s every day work life. In general, engineers become more entrenched
in different domains rather than trying to manage all domains under one 
architecture. Data engineers are distributed to product teams and specialize
in the domain’s data models. They also have specific knowledge of how to use
the self-service platform to integrate across other teams.&lt;/p&gt;

&lt;h3 id=&quot;comparing-microservices-based-applications-to-the-data-mesh&quot;&gt;Comparing microservices-based applications to the data mesh&lt;/h3&gt;

&lt;p&gt;When we think of a functional system for deploying and managing
microservices-based applications, there are several features that we’ve come to
expect. It is very easy to compare the features of microservices-based
applications to the features of a data mesh, as laid out in the &lt;a href=&quot;https://blog.starburst.io/data-mesh-a-software-engineers-perspective&quot;&gt;Data Mesh: A Software Engineer’s Perspective&lt;/a&gt;
blog post.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-partitioned-table-tests-and-fixed-pr-9757&quot;&gt;PR of the week: Partitioned table tests and fixed PR 9757&lt;/h2&gt;

&lt;p&gt;This week’s &lt;a href=&quot;https://github.com/trinodb/trino/pull/9757&quot;&gt;PR of the week&lt;/a&gt;
is for the Iceberg connector. Release 364 had quite a few improvements for 
Iceberg and handled small issues that could cause query failure in some
scenarios. This PR addressed a query failure when reading a partition on a 
UUID column.&lt;/p&gt;

&lt;p&gt;Thanks to Piotr Findeisen for fixing this and many other bugs, as well as
improving performance in the Iceberg connector!&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-whats-the-difference-between-location-and-external_location&quot;&gt;Question of the week: What’s the difference between &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;location&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;external_location&lt;/code&gt;?&lt;/h2&gt;

&lt;p&gt;This week’s &lt;a href=&quot;https://www.trinoforum.org/t/105&quot;&gt;question of the week&lt;/a&gt; comes from 
Aakash Nand on Slack, later ported to the Trino Forum. Aakash asks:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;When creating a Hive table in Trino, what is the difference between 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;external_location&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;location&lt;/code&gt; . If I have to create external table I have
to use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;external_location&lt;/code&gt; right? What is the difference between these two?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This was answered by Arkadiusz Czajkowski:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Tables created with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;location&lt;/code&gt; are managed tables. You have full control over 
them from their creation to modification. Tables created with 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;external_location&lt;/code&gt; are tables created by third party systems. We just access 
them mostly for read. I would encourage you to use location in your case.&lt;/p&gt;
&lt;/blockquote&gt;
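
&lt;p&gt;A hypothetical example of each with the Hive connector (the catalog, schema,
table names, and path are illustrative; the managed table simply uses the
schema’s default location):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- managed table: Trino controls the data and its location
CREATE TABLE hive.web.page_views (view_time timestamp, user_id bigint)
WITH (format = &apos;ORC&apos;);

-- external table: data written by another system, mostly read-only
CREATE TABLE hive.web.page_views_external (view_time timestamp, user_id bigint)
WITH (format = &apos;ORC&apos;, external_location = &apos;s3://my-bucket/page_views/&apos;);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;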

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs and resources&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/geekculture/trino-dbt-a-match-in-sql-heaven-1df2a3d12b5e&quot;&gt;Trino + dbt = a match made in SQL heaven? Blog&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/episodes/21.html&quot;&gt;Trino + dbt = a match made in SQL heaven? TCB episode&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://martinfowler.com/articles/data-mesh-principles.html&quot;&gt;Data Mesh Principles&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://blog.starburst.io/data-mesh-a-software-engineers-perspective&quot;&gt;Data Mesh: A Software Engineer’s Perspective&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>29: What is Trino and the Hive connector</title>
      <link href="https://trino.io/episodes/29.html" rel="alternate" type="text/html" title="29: What is Trino and the Hive connector" />
      <published>2021-10-28T00:00:00+00:00</published>
      <updated>2021-10-28T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/29</id>
      <content type="html" xml:base="https://trino.io/episodes/29.html">&lt;h2 id=&quot;release-364&quot;&gt;Release 364&lt;/h2&gt;

&lt;p&gt;Release 364 is just around the corner. Here is Manfred’s release preview:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER MATERIALIZED VIEW ... RENAME TO&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;A whole bunch of performance improvements&lt;/li&gt;
  &lt;li&gt;Elasticsearch connector no longer fails if fields with unsupported types exist&lt;/li&gt;
  &lt;li&gt;Hive connector has optimize procedure now!&lt;/li&gt;
  &lt;li&gt;Parquet and Avro fixes and improvements&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-what-is-trino&quot;&gt;Concept of the week: What is Trino?&lt;/h2&gt;

&lt;p&gt;Trino is the project created by Martin Traverso, Dain Sundstrom, David Phillips,
and Eric Hwang in 2012 to replace the 300PB Hive data warehouse at Facebook. The
goal of Trino is to run fast ad-hoc analytics queries over big data file systems
like HDFS and object stores like S3.&lt;/p&gt;

&lt;p&gt;An initially unintended but now characteristic feature of Trino is its ability 
to execute federated queries over various distributed data sources. This
includes, but is not limited to: Accumulo, BigQuery, Apache Cassandra, 
ClickHouse, Druid, Elasticsearch, Google Sheets, Apache Iceberg, Apache Hive, 
JMX, Apache Kafka, Kinesis, Kudu, MongoDB, MySQL, Oracle, Apache Phoenix, 
Apache Pinot, PostgreSQL, Prometheus, Redis, Redshift, SingleStore (MemSQL), 
Microsoft SQL Server.&lt;/p&gt;

&lt;p&gt;How does Trino query across everything from data lakes, SQL, and NoSQL databases
at unprecedented speeds? It helps to start by going over Trino’s architecture:&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/29/1-architecture.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://trino.io/blog/2021/04/21/the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Trino consists of two types of nodes, &lt;em&gt;coordinator&lt;/em&gt; and &lt;em&gt;worker&lt;/em&gt; nodes. The
coordinator plans and schedules the processing of SQL queries, which are
submitted by users directly or through connected SQL reporting tools. The workers
carry out most of the processing, reading data from the source and
performing the various operations within the tasks they are assigned.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/29/2-SPI.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://trino.io/blog/2021/04/21/the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Trino is able to query multiple data sources by exposing a common interface
called the SPI (Service Provider Interface) that enables the core engine to
treat interactions with each data source the same way. Each connector implements
the SPI, which includes exposing metadata, statistics, and data locations, and
establishing one or more connections with the underlying data source.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/29/3-parser-planner.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://trino.io/blog/2021/04/21/the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Many of these interfaces are used in the coordinator during the analysis and 
planning phases. The analyzer, for example, uses the metadata SPI to make sure
the table in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FROM&lt;/code&gt; clause actually exists in the data source.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/29/4-distributed-query-plan.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://trino.io/blog/2021/04/21/the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Once a logical query plan is generated, the coordinator converts it to a
distributed query plan that maps operations into stages containing tasks to be
run on nodes. Stages model the sequence of processing steps as a directed
acyclic graph (DAG).&lt;/p&gt;
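
&lt;p&gt;The idea of stages forming a DAG can be sketched in a few lines of Python
(hypothetical stage names, not Trino’s planner API): two scan stages feed a
join, which feeds an aggregation, and a topological sort yields a valid
scheduling order where every stage runs after its upstream stages.&lt;/p&gt;

```python
# Hedged sketch: stage dependencies of a hypothetical distributed plan.
# Not Trino's planner API -- just the DAG idea using Python's graphlib.
from graphlib import TopologicalSorter

# Each stage lists the upstream stages whose output it consumes.
stage_deps = {
    "join": {"scan_orders", "scan_customers"},
    "aggregate": {"join"},
    "output": {"aggregate"},
}

# static_order() yields stages so that every upstream stage comes first.
schedule = list(TopologicalSorter(stage_deps).static_order())
```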

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/29/5-task-management.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://trino.io/blog/2021/04/21/the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;The coordinator then schedules tasks over the worker nodes as efficiently as 
possible, depending on the physical layout and distribution of the data.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/29/6-splits.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://trino.io/blog/2021/04/21/the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Data is split and distributed across the worker nodes to provide 
inter-node parallelism.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/29/7-parallelism-over-drivers.png&quot; /&gt;&lt;br /&gt;
Source: &lt;a href=&quot;https://trino.io/blog/2021/04/21/the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Once this data arrives at a worker node, it is further divided and processed in
parallel. Workers send the processed data back to the coordinator. Finally, the
coordinator provides the results of the query to the user.&lt;/p&gt;

&lt;h2 id=&quot;pr-8821-add-https-query-event-logger&quot;&gt;PR 8821: Add HTTP/S query event logger&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/8821&quot;&gt;Pull request 8821&lt;/a&gt; enables Trino cluster
owners to log query processing metadata by submitting it to an HTTP endpoint.
This may be used for usage monitoring and alerting, but it can also be used to
extract analytics on cluster usage, such as table and column usage metrics.&lt;/p&gt;

&lt;p&gt;Query events are serialized to JSON and sent to the provided address over HTTP 
or over HTTPS. Configuration allows selecting which events should be included.&lt;/p&gt;

&lt;p&gt;Thanks for the contribution, &lt;a href=&quot;https://github.com/mosiac1&quot;&gt;mosiac1&lt;/a&gt; and others at
Bloomberg!&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/admin/event-listeners-http.html&quot;&gt;Read the docs&lt;/a&gt; 
to learn more about this exciting feature!&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-does-the-hive-connector-depend-on-the-hive-runtime&quot;&gt;Question of the week: Does the Hive connector depend on the Hive runtime?&lt;/h2&gt;

&lt;p&gt;This week’s question covers a lot of the confusion around the &lt;a href=&quot;https://trino.io/docs/current/connector/hive.html&quot;&gt;Hive
connector&lt;/a&gt;. In short, the answer 
is that the Hive runtime is not required. There’s more information available in 
the &lt;a href=&quot;https://trino.io/blog/2020/10/20/intro-to-hive-connector.html&quot;&gt;Intro to the Hive Connector blog&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Videos&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=ZwaVZplVmVA&quot;&gt;An Overview of the Starburst Trino Query Optimizer (Karol Sobczak)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Release 364</summary>

      
      
    </entry>
  
    <entry>
      <title>28: Autoscaling streaming ingestion to Trino with Pravega</title>
      <link href="https://trino.io/episodes/28.html" rel="alternate" type="text/html" title="28: Autoscaling streaming ingestion to Trino with Pravega" />
      <published>2021-10-14T00:00:00+00:00</published>
      <updated>2021-10-14T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/28</id>
      <content type="html" xml:base="https://trino.io/episodes/28.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Derek Moore, Software Senior Principal Engineer at &lt;a href=&quot;https://www.delltechnologies.com/en-us/index.htm&quot;&gt;Dell EMC&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/derekm00r3&quot;&gt;@derekm00r3&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Andrew Robertson, Principal Software Engineer at &lt;a href=&quot;https://www.delltechnologies.com/en-us/index.htm&quot;&gt;Dell EMC&lt;/a&gt;
  (&lt;a href=&quot;https://www.linkedin.com/in/andrew-robertson-986b885/&quot;&gt;@andrew-robertson&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Karan Singh, Software Engineer 2 at &lt;a href=&quot;https://www.delltechnologies.com/en-us/index.htm&quot;&gt;Dell EMC&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/singhkaranrakesh/&quot;&gt;@singhkaranrakesh&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-summit-2021&quot;&gt;Trino Summit 2021&lt;/h2&gt;

&lt;p&gt;Get ready for &lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;Trino Summit&lt;/a&gt;, coming
October 21st and 22nd! This annual Trino community event is where we gather 
practitioners that deploy Trino at scale and share their experiences and best 
practices with the rest of the community. While the planning for this event was 
a bit chaotic due to the pandemic, we have made the final decision to host the 
event virtually for the safety of all the attendees. We look forward to seeing
you there, and can’t wait to share more information in the coming weeks!&lt;/p&gt;

&lt;h2 id=&quot;release-363&quot;&gt;Release 363&lt;/h2&gt;

&lt;p&gt;Official announcement items from Martin:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;New HTTP event listener plugin&lt;/li&gt;
  &lt;li&gt;Insert overwrite for S3-backed tables&lt;/li&gt;
  &lt;li&gt;Support for Elasticsearch &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scaled_float&lt;/code&gt; type&lt;/li&gt;
  &lt;li&gt;Support for Cassandra &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tuple&lt;/code&gt; type&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;time&lt;/code&gt; type in MySQL connector&lt;/li&gt;
  &lt;li&gt;Support for SQLServer &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;datetimeoffset&lt;/code&gt; type&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred’s additional notes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Misc performance and memory usage improvements&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW ROLES&lt;/code&gt; fix&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN ANALYZE&lt;/code&gt; fix for estimate display&lt;/li&gt;
  &lt;li&gt;Numerous improvements for Parquet files in Hive and Iceberg connectors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More info at &lt;a href=&quot;https://trino.io/docs/current/release/release-363.html&quot;&gt;https://trino.io/docs/current/release/release-363.html&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-event-stream-abstractions-and-pravega&quot;&gt;Concept of the week: Event stream abstractions and Pravega&lt;/h2&gt;

&lt;h3 id=&quot;events-and-streams&quot;&gt;Events and streams&lt;/h3&gt;

&lt;p&gt;What is an event? This sounds like a silly question when asked generally. The
answer is less clear when discussing event-driven systems, though. An &lt;strong&gt;event&lt;/strong&gt;
is an action or occurrence that is captured by a sensor or generated by a
source system, and emitted to a sink system. Some examples include user events
from an application, system events in telemetry systems, or sensor events from
monitoring applications.&lt;/p&gt;

&lt;p&gt;What is an event stream? Now knowing what an event is, an &lt;strong&gt;event stream&lt;/strong&gt; is an 
unbounded set of events that are tracked over time.&lt;/p&gt;

&lt;p&gt;In this simple view, an event stream contains a sequential list of events. The
list contains events that have been processed, and some that still need to be 
processed.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/28/event-stream.png&quot; /&gt;&lt;br /&gt;
Cloud Native Computing Foundation Presentation: &lt;a href=&quot;https://www.cncf.io/wp-content/uploads/2020/08/pravega-overview-cncf-apr-2020.pdf&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;This is very different from a more realistic view of event streams, which
considers that events arrive and are processed in parallel. Event load may also
fluctuate, as events can burst around specific occurrences or follow periodic
patterns. While taking event ingest (writes) into consideration, it is also
important to consider event egress (reads) as part of the problem of
representing event streams.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/28/event-stream-realistic.png&quot; /&gt;&lt;br /&gt;
Cloud Native Computing Foundation Presentation: &lt;a href=&quot;https://www.cncf.io/wp-content/uploads/2020/08/pravega-overview-cncf-apr-2020.pdf&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;h3 id=&quot;pravega-and-segments&quot;&gt;Pravega and segments&lt;/h3&gt;

&lt;p&gt;Engineers at Dell Labs wanted to find a better abstraction to solve the
problems they saw in existing event streaming systems. This included how to
address this type of constant shift in scaling, while also addressing the
brittle storage abstractions that event streams use today. The storage
abstraction needs to allow for both real-time and historical analytics. The data
within a particular transaction also needs to be consistent.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/28/segment.png&quot; /&gt;&lt;br /&gt;
Cloud Native Computing Foundation Presentation: &lt;a href=&quot;https://www.cncf.io/wp-content/uploads/2020/08/pravega-overview-cncf-apr-2020.pdf&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Their solution is Pravega. The core of Pravega models streams around a
storage unit called a segment. A &lt;strong&gt;segment&lt;/strong&gt; is an append-only sequence of bytes
(not events/records). This offers greater flexibility and better parallelism
and serialization over streams. Pravega stream writers are then able to write
in parallel, increasing ingest throughput.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/28/autoscale-parallel-segment.png&quot; /&gt;&lt;br /&gt;
Cloud Native Computing Foundation Presentation: &lt;a href=&quot;https://www.cncf.io/wp-content/uploads/2020/08/pravega-overview-cncf-apr-2020.pdf&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;You can use &lt;strong&gt;routing keys&lt;/strong&gt; to map events to particular segments. Pravega
enforces order within a specific key, but does not guarantee ordering of events
across keys. The tradeoff is between strict ordering of events and higher
parallelism with better performance.&lt;/p&gt;
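
&lt;p&gt;As a rough sketch of the idea (illustrative only, not Pravega’s actual
implementation), a routing key can be hashed to a point in the key space
[0, 1), with each segment owning a contiguous range of that space. All events
with the same key then land in the same segment, preserving per-key order:&lt;/p&gt;

```python
# Illustrative sketch only -- not Pravega's real hashing scheme.
import hashlib
from bisect import bisect_right

def key_to_unit_interval(routing_key: str) -> float:
    """Hash a routing key to a deterministic point in [0, 1)."""
    digest = hashlib.sha256(routing_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

# Segment boundaries over the key space [0, 1):
# segment 0 owns [0, 0.5), segment 1 owns [0.5, 0.75), segment 2 owns [0.75, 1).
boundaries = [0.5, 0.75]

def segment_for_key(routing_key: str) -> int:
    """Same key always maps to the same segment, so per-key order is kept."""
    return bisect_right(boundaries, key_to_unit_interval(routing_key))
```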

&lt;p&gt;With segments, you can also scale the number of segments up and down
depending on the workload you’re experiencing. Another compelling capability
this enables is managing transactions in the stream. As writers submit data,
they write to a temporary segment, which is merged into a permanent segment on
commit.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/28/segment-transactions.png&quot; /&gt;&lt;br /&gt;
Cloud Native Computing Foundation Presentation: &lt;a href=&quot;https://www.cncf.io/wp-content/uploads/2020/08/pravega-overview-cncf-apr-2020.pdf&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;
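
&lt;p&gt;A toy model of this transactional behavior (hypothetical code, not the
Pravega client API): events written inside a transaction accumulate in a
temporary segment, and only become visible atomically when the transaction
commits:&lt;/p&gt;

```python
# Toy model of segment transactions -- hypothetical, not the Pravega client API.
class StreamSegment:
    def __init__(self):
        self.committed = []  # events visible to readers

    def begin(self):
        """Start a transaction backed by a fresh temporary segment."""
        return []

    def commit(self, txn):
        """Merge the temporary segment into the permanent one atomically."""
        self.committed.extend(txn)

    def abort(self, txn):
        """Discard the temporary segment; its events are never visible."""
        txn.clear()

seg = StreamSegment()
txn = seg.begin()
txn.append("event-1")
txn.append("event-2")
seg.commit(txn)  # both events become visible together
```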

&lt;p&gt;The following diagram displays autoscaling splits and merges as specific routing
keys become more popular. For a clearer example, say that the routing keys are
hashed geolocation values for a taxi app, mapped between zero and one. As
certain locations become crowded, let’s say because a lot of people are going
home at the end of the work day and many taxis are in the downtown area, the
downtown routing keys can automatically trigger a split. Once rush hour is over
and traffic slows down, these segments are merged again.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/28/segment-split-merge.png&quot; /&gt;&lt;br /&gt;
Pravega Docs: &lt;a href=&quot;https://pravega.io/docs/nightly/pravega-concepts/#elastic-streams-auto-scaling&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;
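
&lt;p&gt;The split and merge of key-space ranges can be sketched as follows
(illustrative only; real Pravega scaling decisions are driven by observed
load on each segment):&lt;/p&gt;

```python
# Illustrative sketch of segment auto-scaling over the key space [0, 1).
def split(ranges, index):
    """Split one segment's key range into two halves (scale up under load)."""
    lo, hi = ranges[index]
    mid = (lo + hi) / 2
    return ranges[:index] + [(lo, mid), (mid, hi)] + ranges[index + 1:]

def merge(ranges, index):
    """Merge two adjacent segments back into one (scale down when load drops)."""
    lo, _ = ranges[index]
    _, hi = ranges[index + 1]
    return ranges[:index] + [(lo, hi)] + ranges[index + 2:]

ranges = [(0.0, 1.0)]
ranges = split(ranges, 0)  # rush hour begins: two segments
ranges = split(ranges, 1)  # downtown keys get hot: three segments
ranges = merge(ranges, 1)  # rush hour ends: back to two segments
```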

&lt;h3 id=&quot;pravega-architecture&quot;&gt;Pravega architecture&lt;/h3&gt;

&lt;p&gt;The Pravega architecture comes with writer groups and reader groups that scale
up and down along with the autoscaling applied to the segments. It consists of
a controller that maintains stream metadata and a segment store that works off
of tier one storage (Apache BookKeeper) and tier two storage (object storage).&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/28/pravega-architecture.png&quot; /&gt;&lt;br /&gt;
Pravega Docs: &lt;a href=&quot;https://pravega.io/docs/nightly/pravega-concepts/#elastic-streams-auto-scaling&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Just like Trino, Pravega aims to build a rich set of connectors to systems
that act as sources and sinks. This includes a connector for Trino.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/28/pravega-connectors.png&quot; /&gt;&lt;br /&gt;
Pravega Docs: &lt;a href=&quot;https://pravega.io/docs/nightly/pravega-concepts/#elastic-streams-auto-scaling&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;h3 id=&quot;pravega-compared-to-other-event-streaming-platforms&quot;&gt;Pravega compared to other event streaming platforms&lt;/h3&gt;

&lt;p&gt;This chart is a very helpful resource to summarize Pravega against other popular
streaming platforms. It comes from the Pravega site, so be sure to check there
for an up-to-date list of these features moving forward.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt;Pravega&lt;/th&gt;
      &lt;th&gt;Kafka&lt;/th&gt;
      &lt;th&gt;Pulsar&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Transactions&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Event streams&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Long-term retention&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Durable by default&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Auto-scaling&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Ingestion of large data (video)&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Efficient at high partition counts&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Consistent state replication&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Key-value tables&lt;/td&gt;
      &lt;td&gt;✅&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Comparison between Pravega, Kafka, and Pulsar: &lt;a href=&quot;https://pravega.io&quot;&gt;Source&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-week-querying-pravega-from-trino&quot;&gt;Demo of the week: Querying Pravega from Trino&lt;/h2&gt;

&lt;p&gt;This week the Pravega team demonstrates an example from their &lt;a href=&quot;https://github.com/pravega/presto-connector/tree/main/getting-started&quot;&gt;getting-started&lt;/a&gt;
tutorial for the Trino connector.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pravega-presto-connector-pr-49&quot;&gt;PR of the week: Pravega presto-connector PR 49&lt;/h2&gt;

&lt;p&gt;This week’s &lt;a href=&quot;https://github.com/pravega/presto-connector/pull/49&quot;&gt;PR of the week&lt;/a&gt;
comes not from the Trino repository but rather from the presto-connector
repository. The Trino portion of the repository was committed by Dell engineer
Karan Singh. As the PR states, this makes Pravega available from Trino along
with the original Presto connector.&lt;/p&gt;

&lt;p&gt;Thanks Karan for adding Trino and Andrew for writing the original Presto-Pravega
connector!&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-what-is-the-point-of-trino-forum-and-what-is-the-relationship-to-trino-slack&quot;&gt;Question of the week: What is the point of Trino Forum and what is the relationship to Trino Slack?&lt;/h2&gt;

&lt;p&gt;Our &lt;a href=&quot;https://www.trinoforum.org/t/what-is-the-point-of-this-forum-and-what-is-the-relationship-to-trino-slack/28&quot;&gt;question of the week&lt;/a&gt;
comes from the new Trino Forum, which Brian and a few others at Starburst
created. Slack is a much more ad-hoc platform for people to work through
problems rather than to search for and find solutions to problems. The Trino
community has accumulated a great amount of knowledge in Slack, but there is no
way for people to find answers unless they have already joined, and none of the
information discussed there can be found by a search engine like Google.&lt;/p&gt;

&lt;p&gt;Further, a lot of the answers are scattered between different conversations, and
this too can be condensed and simplified. I pondered the best way for us to
expose this and thought about adding an FAQ page on trino.io, but this would get
stale quickly and would require a lot of work to maintain at scale without a
crowdsourcing element. Instead, starting a &lt;a href=&quot;https://www.discourse.org&quot;&gt;Discourse forum&lt;/a&gt;
(not to be confused with Discord) acts as a central repository of knowledge and
makes this information easily searchable. The forum is maintained by some of us
at Starburst, but over time we want more moderators from the community (this
happens through merit and consistent use of Discourse trust levels).&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs and resources&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.cncf.io/online-programs/pravega-rethinking-storage-for-streams/&quot;&gt;Pravega: Rethinking Storage For Streams&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>JVM challenges in production</title>
      <link href="https://trino.io/blog/2021/10/06/jvm-issues-at-comcast.html" rel="alternate" type="text/html" title="JVM challenges in production" />
      <published>2021-10-06T00:00:00+00:00</published>
      <updated>2021-10-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/10/06/jvm-issues-at-comcast</id>
      <content type="html" xml:base="https://trino.io/blog/2021/10/06/jvm-issues-at-comcast.html">&lt;p&gt;At Comcast, we have a large on-premise Trino cluster. It enables us to extract
insights from data no matter where it resides, and prepares the company for a
more cloud-centric future. Recently, however, we experienced and overcame
challenges related to the Java virtual machine (JVM). We wanted to share what
we encountered and learned in hopes that it might be useful for the Trino
community.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;jit-recompilation&quot;&gt;JIT recompilation&lt;/h2&gt;

&lt;p&gt;Some users complained that nightly reports were taking far too long to
complete. Queries that ran for six hours made very little progress.&lt;/p&gt;

&lt;p&gt;First, we looked at the queries involved in these nightly reports. We
noticed that all these queries involved two particular tables. In this post,
let’s call them table A and table B.&lt;/p&gt;

&lt;p&gt;Our initial suspicion was that there could be an issue with the table data in
HDFS. Thus, we tried to reproduce the performance problem by using queries that
performed simple scans against these tables.&lt;/p&gt;

&lt;p&gt;We tried a simple table scan with no filters, a range filter on a partitioned
column, and so on. We ran these queries multiple times, and execution times were
consistent. This ruled out a potential problem with HDFS.&lt;/p&gt;

&lt;p&gt;Next, we took a closer look at the portion of the slow-running queries
involving table A, and came up with the simplest possible query that could
demonstrate the problem. We discovered that the following query did not exhibit
the performance problem:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT
 count(a.c1)
FROM
 hive.schema1.A a, hive.schema2.B da
WHERE
 a.day_id = da.date_id
 AND a.day_id BETWEEN &apos;2021-03-22&apos; AND &apos;2021-04-21&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But adding a predicate, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a.c2 = &apos;4 (Success)&apos;&lt;/code&gt;, caused the performance problem
to appear:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT
 count(a.c1)
FROM
 hive.schema1.A a, hive.schema2.B da
WHERE
 a.day_id = da.date_id
 AND a.day_id BETWEEN &apos;2021-03-22&apos; AND &apos;2021-04-21&apos;
 AND a.c2 = &apos;4 (Success)&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We narrowed the problem down to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Scan/Filter/Project&lt;/code&gt; operator using the
output of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN ANALYZE&lt;/code&gt; from Trino. For the query that performed as
expected, this stage had the following CPU stats:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CPU: 2.39h, Scheduled: 4.47h, Input: 17434967615 rows (357.47GB)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For the version of the query with the additional predicate, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a.c2 = &apos;4 (Success)&apos;&lt;/code&gt;,
that exhibited the performance problem, the same stage has the following CPU
stats:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CPU: 3.73d, Scheduled: 48.01d, Input: 17052985227 rows (413.98GB)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This shows that for roughly the equivalent amount of data, Trino used
significantly more CPU (3.73 days versus 2.39 hours, a factor of roughly 37).
Our next step was to determine possible reasons.&lt;/p&gt;

&lt;p&gt;We generated a few &lt;a href=&quot;https://docs.oracle.com/javase/7/docs/technotes/tools/share/jstack.html&quot;&gt;jstack&lt;/a&gt;
and Java flight recorder (JFR) profiles of the Trino Java process from
one of the worker nodes while the scan stage was running. After analyzing these
profiles, we found no obvious problem. Trino performed as expected.&lt;/p&gt;

&lt;p&gt;Next, we looked at the list of tasks in the web UI to see what the distribution
of CPU times for each stage was:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/jvm-issues-at-comcast/web_ui_before.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Some workers have tasks that only use a few minutes of CPU time, and others
have tasks that use up to 2 hours of CPU time! Different query runs would show
this happening on different workers, so it was not a problem with any one
individual worker.&lt;/p&gt;

&lt;p&gt;We discussed this with Starburst engineer &lt;a href=&quot;https://github.com/findepi&quot;&gt;Piotr Findeisen&lt;/a&gt;,
and came to the conclusion that this could potentially be an issue with JVM
code deoptimization. After recompiling a method a certain number of times,
the JVM refuses to do so any more and runs the method in interpreted
mode, which is much slower.&lt;/p&gt;

&lt;p&gt;The evidence for this is what we highlighted above: the CPU used by the
same tasks on different workers varies by a factor of approximately 30. This is
the typical difference between compiled and interpreted code, according to
Piotr’s experience at Starburst.&lt;/p&gt;

&lt;p&gt;The following JVM options were added to the Trino &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jvm.config&lt;/code&gt; file to help
with this issue:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-XX:PerMethodRecompilationCutoff=10000&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-XX:PerBytecodeRecompilationCutoff=10000&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These settings increased the recompilation cutoff limit. They have also been
included in the default &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jvm.config&lt;/code&gt; settings that ship with Trino since the
348 release.&lt;/p&gt;

&lt;p&gt;Since we have been running Trino in production since before these defaults were
added, we did not have these settings in our &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jvm.config&lt;/code&gt;.&lt;/p&gt;

&lt;h3 id=&quot;initial-results&quot;&gt;Initial results&lt;/h3&gt;

&lt;p&gt;Execution time observed with the JVM options in place was 4 minutes and 51
seconds. The CPU stats for the scan/filter/project stage for this query now
look like:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CPU: 3.22h, Scheduled: 7.21h, Input: 17631445897 rows (428.03GB)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The CPU used by individual tasks is much more uniform:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/jvm-issues-at-comcast/web_ui_after.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;code-cache&quot;&gt;Code cache&lt;/h2&gt;

&lt;p&gt;We noticed that the cluster’s overall CPU utilization decreased after the
cluster had been up for a few days, and there would be a few workers where tasks
were running slowly.&lt;/p&gt;

&lt;p&gt;When looking at these workers with slow-running tasks, we found that CPU usage
was very high:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[root@worker-node log]# uptime
 21:36:57 up 20 days, 20:39,  1 user,  load average: 149.92, 152.83, 144.82
[root@worker-node log]#
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We also noticed all these workers had messages like this in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;launcher.log&lt;/code&gt;
file:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[219756.210s][warning][codecache] Try increasing the code heap size using -XX:ProfiledCodeHeapSize=
OpenJDK 64-Bit Server VM warning: CodeHeap &apos;profiled nmethods&apos; is full. Compiler has been disabled.
OpenJDK 64-Bit Server VM warning: Try increasing the code heap size using -XX:ProfiledCodeHeapSize=
CodeHeap &apos;non-profiled nmethods&apos;: size=258436Kb used=235661Kb max_used=257882Kb free=22774Kb
 bounds [0x00007f466f980000, 0x00007f467f5e1000, 0x00007f467f5e1000]
CodeHeap &apos;profiled nmethods&apos;: size=258432Kb used=207330Kb max_used=216383Kb free=51101Kb
 bounds [0x00007f465fd20000, 0x00007f466f980000, 0x00007f466f980000]
CodeHeap &apos;non-nmethods&apos;: size=7420Kb used=1881Kb max_used=3766Kb free=5538Kb
 bounds [0x00007f465f5e1000, 0x00007f465fab1000, 0x00007f465fd20000]
 total_blobs=64220 nmethods=62699 adapters=1432
 compilation: disabled (not enough contiguous free space left)
              stopped_count=4, restarted_count=3
 full_count=3
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once the code cache is full, the JVM won’t compile any additional code until
space is freed.&lt;/p&gt;

&lt;p&gt;We were running with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-XX:ReservedCodeCacheSize&lt;/code&gt; JVM option set to 512M.
To see what’s taking up space in the code cache, we used jcmd:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;jcmd &amp;lt;TRINO_PID&amp;gt; Compiler.CodeHeap_Analytics
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We ran this at various intervals so we could compare how the code cache changed
over time.&lt;/p&gt;
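&lt;p&gt;As a rough aid for those comparisons, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CodeHeap&lt;/code&gt; summary lines can also be
parsed with a short script. The Python sketch below is purely illustrative (the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;parse_code_heaps&lt;/code&gt; helper is our own, not part of Trino or the JDK), and it
assumes the log format shown earlier in this post:&lt;/p&gt;

```python
import re

# Matches summary lines such as:
# CodeHeap 'non-profiled nmethods': size=258436Kb used=235661Kb ...
HEAP_LINE = re.compile(r"CodeHeap '([^']+)': size=(\d+)Kb used=(\d+)Kb")

def parse_code_heaps(log_text):
    """Return a mapping of heap name to utilization fraction (used / size)."""
    heaps = {}
    for name, size, used in HEAP_LINE.findall(log_text):
        heaps[name] = int(used) / int(size)
    return heaps
```

&lt;p&gt;Running this over the warning shown above reports the non-profiled heap at
roughly 91% utilization and the profiled heap at roughly 80%.&lt;/p&gt;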

&lt;p&gt;Thirty of the top 48 non-profiled methods were &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PagesHashStrategy&lt;/code&gt; classes, which are
generated per query. These can’t be removed from the cache until the query
completes, so the amount of cache needed scales with query concurrency. We have
a very busy cluster with significant concurrency at our busiest times.&lt;/p&gt;

&lt;p&gt;Next, we set &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-XX:ReservedCodeCacheSize&lt;/code&gt; to 2G to see how that would help. Since
increasing the size to 2GB, we have not seen the code cache fill while the
cluster has been running. We can also monitor the size of the code cache over
time using JMX. One query that can be used if you have the JMX catalog enabled
on your cluster is:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT
    node,
    regexp_extract(usage, &apos;max=(-?\d*)&apos;, 1) as max,
    regexp_extract(usage, &apos;used=(-?\d*)&apos;, 1) AS used
FROM
  jmx.current.&quot;java.lang:name=codeheap &apos;non-profiled nmethods&apos;,type=memorypool&quot;
ORDER BY used DESC
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;off-heap-memory-usage&quot;&gt;Off heap memory usage&lt;/h2&gt;

&lt;p&gt;One final JVM issue we noticed in our production cluster was that off-heap
memory on some workers grew to be quite large. We allocate approximately 85%
of the physical memory on our workers for the JVM heap. Recently, we received
alerts from our monitoring systems that memory consumption on our workers got
dangerously close to the physical limit on the machines.&lt;/p&gt;

&lt;p&gt;We noticed some memory-related issues from the Alluxio client in the Trino
worker logs on machines generating these high memory alerts. Upon further
investigation, we noticed that Trino was running with the open source version
of the Alluxio client. Trino ships with version 2.4.0 of the Alluxio client. We
are an Alluxio customer and use it in our environment.&lt;/p&gt;

&lt;p&gt;After discussing with Alluxio, they suggested we upgrade to version 2.4.1 of
their Enterprise client which includes a fix for an off-heap memory leak bug.
After upgrading to the Alluxio Enterprise client, the off-heap memory usage
became a lot more stable.&lt;/p&gt;

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;This post outlined some of the JVM issues we encountered while running Trino in
production. We only hit many of these issues in our production environment, and
they were difficult to replicate outside of it. Thus, we wanted to write up our
experience in the hope of helping other Trino users in the future!&lt;/p&gt;</content>

      
        <author>
          <name>Sajumon Joseph, David Leach, Bryan Aller, Pavan Madhineni, Lavanya Ragothaman, Pratap Moturi, Pádraig O&apos;Sullivan (Starburst)</name>
        </author>
      

      <summary>At Comcast, we have a large on-premise Trino cluster. It enables us to extract insights from data no matter where it resides, and prepares the company for a more cloud-centric future. Recently, however, we experienced and overcame challenges related to the Java virtual machine (JVM). We wanted to share what we encountered and learned in hopes that it might be useful for the Trino community.</summary>

      
      
    </entry>
  
    <entry>
      <title>27: Trino gits to wade in the data LakeFS</title>
      <link href="https://trino.io/episodes/27.html" rel="alternate" type="text/html" title="27: Trino gits to wade in the data LakeFS" />
      <published>2021-09-30T00:00:00+00:00</published>
      <updated>2021-09-30T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/27</id>
      <content type="html" xml:base="https://trino.io/episodes/27.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Paul Singman, Developer Advocate at &lt;a href=&quot;https://treeverse.io/&quot;&gt;Treeverse&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/datawhisp&quot;&gt;@datawhisp&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;trino-summit-2021&quot;&gt;Trino Summit 2021&lt;/h2&gt;

&lt;p&gt;Get ready for &lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;Trino Summit&lt;/a&gt;, coming
October 21st and 22nd! This annual Trino community event is where we gather
practitioners who deploy Trino at scale to share their experiences and best
practices with the rest of the community. While the planning for this event was
a bit chaotic due to the pandemic, we have made the final decision to host the 
event virtually for the safety of all the attendees. We look forward to seeing
you there, and can’t wait to share more information in the coming weeks!&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-lakefs-and-git-on-object-storage&quot;&gt;Concept of the week: LakeFS and Git on object storage&lt;/h2&gt;

&lt;p&gt;LakeFS offers git-like semantics over your files in the data lake. Akin to the
versioning you can do on Iceberg, you can also version your data with LakeFS, 
and roll back to previous commits when you make a mistake. LakeFS allows you to 
roll out new features in production or prod-like environments with ease and 
isolation from the real data. Join us as we dive into this awesome new way to 
approach versioning on your data!&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/27/trino-lakefs.jpg&quot; /&gt;&lt;br /&gt;
Why we built LakeFS: &lt;a href=&quot;https://lakefs.io/why-we-built-lakefs-atomic-and-versioned-data-lake-operations/&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;h3 id=&quot;features&quot;&gt;Features&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Exabyte-scale version control&lt;/li&gt;
  &lt;li&gt;Git-like operations: branch, commit, merge, revert&lt;/li&gt;
  &lt;li&gt;Zero-copy branching for frictionless experiments&lt;/li&gt;
  &lt;li&gt;Full reproducibility of data and code&lt;/li&gt;
  &lt;li&gt;Pre-commit/merge hooks for data CI/CD&lt;/li&gt;
  &lt;li&gt;Instantly revert changes to data&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;use-cases&quot;&gt;Use cases&lt;/h3&gt;

&lt;h4 id=&quot;in-development&quot;&gt;In development&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;Experiment - try new tools, upgrade versions, and evaluate code changes in 
isolation. By creating a branch of the data you get an isolated snapshot to run 
experiments over, while others are not exposed. Compare between branches with 
different experiments or to the main branch of the repository to understand a 
change’s impact.&lt;/li&gt;
  &lt;li&gt;Debug - checkout specific commits in a repository’s commit history to 
materialize consistent, historical versions of your data. See the exact state of
your data at the point-in-time of an error to understand its root cause.&lt;/li&gt;
  &lt;li&gt;Collaborate - avoid managing data access at the two extremes of either 
treating your data lake like a shared folder or creating multiple copies of the
data to safely collaborate. Instead, leverage isolated branches managed by 
metadata (not copies of files) to work in parallel.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;during-deployment&quot;&gt;During deployment&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;Version Control - deploy data safely with CI/CD workflows borrowed from 
software engineering best practices. Ingest new data onto an isolated branch, 
perform data validations, then add to production through a merge operation.&lt;/li&gt;
  &lt;li&gt;Test - define pre-merge and pre-commit hooks to run tests that enforce schema 
and validate properties of the data to catch issues before they reach 
production.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;in-production&quot;&gt;In production&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;Roll back - recover from errors by instantly reverting data to a former, 
consistent snapshot of the data lake. Choose any commit in a repository’s commit
 history to revert in one atomic action.&lt;/li&gt;
  &lt;li&gt;Troubleshoot - investigate production errors by starting with a snapshot of 
the inputs to the failed process. Spend less time re-creating the state of 
datasets at the time of failure, and more time finding the solution.&lt;/li&gt;
  &lt;li&gt;Cross-collection consistency - provide consumers multiple synchronized 
collections of data in one atomic, revertible action. Using branches, writers 
provide consistency guarantees across different logical collections - merging to
 the main branch only after all relevant datasets have been created or updated 
 successfully.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Source: &lt;a href=&quot;https://docs.lakefs.io/#use-cases&quot;&gt;https://docs.lakefs.io/#use-cases&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;demo-of-the-week-running-trino-on-lakefs&quot;&gt;Demo of the week: Running Trino on LakeFS&lt;/h2&gt;

&lt;p&gt;In order to run Trino and LakeFS, you need Docker installed on your system with at least 4GB
of memory allocated to Docker.&lt;/p&gt;

&lt;p&gt;Let’s start up the LakeFS instance and the required PostgreSQL instance along 
with the typical Trino containers used with the Hive connector. 
Clone the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino-getting-started&lt;/code&gt; repository and navigate to the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;community_tutorials/lakefs/trino-lakefs-minio/&lt;/code&gt; directory.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-getting-started.git

cd community_tutorials/lakefs/trino-lakefs-minio/

docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once this is done, you can navigate to the following locations to verify that
everything started correctly.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Navigate to &lt;a href=&quot;http://localhost:8000&quot;&gt;http://localhost:8000&lt;/a&gt; to open the LakeFS user interface.&lt;/li&gt;
  &lt;li&gt;Log in with Access Key, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AKIAIOSFODNN7EXAMPLE&lt;/code&gt;, and Secret Access Key, 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Verify that the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;example&lt;/code&gt; repository exists in the UI and open it.&lt;/li&gt;
  &lt;li&gt;The branch &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt; in the repository, found under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;example/main/&lt;/code&gt;, should be 
empty.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once you have verified the repository exists, let’s go ahead and create a schema
under the Trino Hive catalog called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;minio&lt;/code&gt;. This catalog previously pointed
directly to MinIO but is now wrapped by LakeFS to add the git-like layer around
the file storage.&lt;/p&gt;

&lt;p&gt;Name the schema &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny&lt;/code&gt;, as that is the schema we copy from the TPCH data set.
Notice the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;location&lt;/code&gt; property of the schema. It now has a namespace
prefix before the actual &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny/&lt;/code&gt; table directory: first the
repository name, then the branch name. All together this follows the pattern
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;protocol&amp;gt;://&amp;lt;repository&amp;gt;/&amp;lt;branch&amp;gt;/&amp;lt;schema&amp;gt;/&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE SCHEMA minio.tiny
WITH (location = &apos;s3a://example/main/tiny&apos;);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
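&lt;p&gt;To make the location pattern concrete, here is a small, hypothetical Python
helper (not part of LakeFS or Trino) that assembles such a location from its
parts:&lt;/p&gt;

```python
def lakefs_location(repository, branch, schema, protocol="s3a"):
    """Assemble a LakeFS-backed location: protocol://repository/branch/schema."""
    return f"{protocol}://{repository}/{branch}/{schema}"

# The location used in the CREATE SCHEMA statement above:
print(lakefs_location("example", "main", "tiny"))  # s3a://example/main/tiny
```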

&lt;p&gt;Now, create two tables, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt;, by setting &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;external_location&lt;/code&gt;
to the same namespace used in the schema with the table name appended. The queries
retrieve the data from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny&lt;/code&gt; TPCH data set.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE minio.tiny.customer
WITH (
  format = &apos;ORC&apos;,
  external_location = &apos;s3a://example/main/tiny/customer/&apos;
) 
AS SELECT * FROM tpch.tiny.customer;

CREATE TABLE minio.tiny.orders
WITH (
  format = &apos;ORC&apos;,
  external_location = &apos;s3a://example/main/tiny/orders/&apos;
) 
AS SELECT * FROM tpch.tiny.orders;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Verify that you can see the table directories in LakeFS once they exist.
&lt;a href=&quot;http://localhost:8000/repositories/example/objects?ref=main&amp;amp;path=tiny%2F&quot;&gt;http://localhost:8000/repositories/example/objects?ref=main&amp;amp;path=tiny%2F&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Run a query on these two tables using the standard schema pointing to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt;
branch.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT ORDERKEY, ORDERDATE, SHIPPRIORITY
FROM minio.tiny.customer c, minio.tiny.orders o
WHERE MKTSEGMENT = &apos;BUILDING&apos; AND c.CUSTKEY = o.CUSTKEY AND
ORDERDATE &amp;lt; date&apos;1995-03-15&apos;
GROUP BY ORDERKEY, ORDERDATE, SHIPPRIORITY
ORDER BY ORDERDATE;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Open the &lt;a href=&quot;http://localhost:8000/repositories/example/objects?ref=main&quot;&gt;LakeFS UI again&lt;/a&gt; 
and click on the &lt;strong&gt;Unversioned Changes&lt;/strong&gt; tab. Click &lt;strong&gt;Commit Changes&lt;/strong&gt;. Type a 
commit message on the popup and click &lt;strong&gt;Commit Changes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Once the changes are committed on branch &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt;, click on the &lt;strong&gt;Branches&lt;/strong&gt; tab.
Click &lt;strong&gt;Create Branch&lt;/strong&gt;. Name the new branch &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sandbox&lt;/code&gt; to branch off of the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt; branch. Now click &lt;strong&gt;Create&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Although a branch called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sandbox&lt;/code&gt; now exists, it only exists
logically. We need to make Trino aware of it by adding another schema and tables
that point to the new branch. Do this by making a new schema called
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny_sandbox&lt;/code&gt; and changing the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;location&lt;/code&gt; property to point to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sandbox&lt;/code&gt;
branch instead of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt; branch.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE SCHEMA minio.tiny_sandbox
WITH (location = &apos;s3a://example/sandbox/tiny&apos;);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny_sandbox&lt;/code&gt; schema exists, we can copy the table definitions
of the original &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; tables. We got
the schema for free earlier by copying it directly from the TPCH data using the
CTAS statements. We don’t want to use CTAS in this case, as it copies not only
the table definition but also the data. This duplication of data is unnecessary
and is exactly what creating a branch in LakeFS avoids. Instead, we just copy the
table definitions using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW CREATE TABLE&lt;/code&gt; statement.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SHOW CREATE TABLE minio.tiny.customer;
SHOW CREATE TABLE minio.tiny.orders;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Take the output and update the schema to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny_sandbox&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;external_location&lt;/code&gt;
to point to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sandbox&lt;/code&gt; for both tables.&lt;/p&gt;
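&lt;p&gt;If you have many tables, this manual edit can be scripted. The following Python
sketch is hypothetical and simply rewrites the schema name and branch prefix in
the DDL text, assuming the catalog, repository, and branch names used in this
demo:&lt;/p&gt;

```python
def retarget_ddl(ddl, old_schema="tiny", new_schema="tiny_sandbox",
                 old_branch="main", new_branch="sandbox"):
    """Point SHOW CREATE TABLE output at another schema and LakeFS branch."""
    # Rewrite the qualified table name, e.g. minio.tiny.customer
    ddl = ddl.replace(f"minio.{old_schema}.", f"minio.{new_schema}.")
    # Rewrite the branch in the external_location path
    ddl = ddl.replace(f"s3a://example/{old_branch}/", f"s3a://example/{new_branch}/")
    return ddl
```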

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE minio.tiny_sandbox.customer (
   custkey bigint,
   name varchar(25),
   address varchar(40),
   nationkey bigint,
   phone varchar(15),
   acctbal double,
   mktsegment varchar(10),
   comment varchar(117)
)
WITH (
   external_location = &apos;s3a://example/sandbox/tiny/customer&apos;,
   format = &apos;ORC&apos;
);

CREATE TABLE minio.tiny_sandbox.orders (
   orderkey bigint,
   custkey bigint,
   orderstatus varchar(1),
   totalprice double,
   orderdate date,
   orderpriority varchar(15),
   clerk varchar(15),
   shippriority integer,
   comment varchar(79)
)
WITH (
   external_location = &apos;s3a://example/sandbox/tiny/orders&apos;,
   format = &apos;ORC&apos;
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once these table definitions exist, go ahead and run the same query as before,
but updated to use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny_sandbox&lt;/code&gt; schema instead of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny&lt;/code&gt; schema.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT ORDERKEY, ORDERDATE, SHIPPRIORITY
FROM minio.tiny_sandbox.customer c, minio.tiny_sandbox.orders o
WHERE MKTSEGMENT = &apos;BUILDING&apos; AND c.CUSTKEY = o.CUSTKEY AND
ORDERDATE &amp;lt; date&apos;1995-03-15&apos;
ORDER BY ORDERDATE;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;One last bit of functionality we want to test is merging. To
do this, create a table called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineitem&lt;/code&gt; in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sandbox&lt;/code&gt; branch using a CTAS
statement.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE minio.tiny_sandbox.lineitem
WITH (
  format = &apos;ORC&apos;,
  external_location = &apos;s3a://example/sandbox/tiny/lineitem/&apos;
) 
AS SELECT * FROM tpch.tiny.lineitem;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Verify that you can see three table directories in LakeFS including &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineitem&lt;/code&gt; 
in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sandbox&lt;/code&gt; branch.
&lt;a href=&quot;http://localhost:8000/repositories/example/objects?ref=sandbox&amp;amp;path=tiny%2F&quot;&gt;http://localhost:8000/repositories/example/objects?ref=sandbox&amp;amp;path=tiny%2F&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Verify that you do not see &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineitem&lt;/code&gt; in the table directories in LakeFS in the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt; branch.
&lt;a href=&quot;http://localhost:8000/repositories/example/objects?ref=main&amp;amp;path=tiny%2F&quot;&gt;http://localhost:8000/repositories/example/objects?ref=main&amp;amp;path=tiny%2F&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also verify this by running queries against &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineitem&lt;/code&gt; in the schema
pointing to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sandbox&lt;/code&gt; branch; the same queries should fail against the schema
pointing to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt; branch.&lt;/p&gt;

&lt;p&gt;To make the new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineitem&lt;/code&gt; table show up in the main branch, first commit
the new change to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sandbox&lt;/code&gt; by again going to the &lt;strong&gt;Unversioned Changes&lt;/strong&gt; tab.
Click &lt;strong&gt;Commit Changes&lt;/strong&gt;. Type a commit message on the popup and click
&lt;strong&gt;Commit Changes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Once the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineitem&lt;/code&gt; addition is committed, click on the &lt;strong&gt;Compare&lt;/strong&gt; tab. Set the
base branch to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt; and the compared-to branch to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sandbox&lt;/code&gt;. You should see
the addition of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineitem&lt;/code&gt; show up in the diff view. Click &lt;strong&gt;Merge&lt;/strong&gt; and click
&lt;strong&gt;Yes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Once this is merged you should see the table data show up in LakeFS. Verify that
you can see &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineitem&lt;/code&gt; in the table directories in LakeFS in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt; branch.
&lt;a href=&quot;http://localhost:8000/repositories/example/objects?ref=main&amp;amp;path=tiny%2F&quot;&gt;http://localhost:8000/repositories/example/objects?ref=main&amp;amp;path=tiny%2F&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As before, we won’t be able to query this data from Trino until we run the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW CREATE TABLE&lt;/code&gt; from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny_sandbox&lt;/code&gt; schema and use the output to create
the table in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny&lt;/code&gt; schema that is pointing to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-8762-add-query-error-info-to-cluster-overview-page-in-web-ui&quot;&gt;PR of the week: PR 8762 Add query error info to cluster overview page in web UI&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/trinodb/trino/pull/8762&quot;&gt;PR of the week&lt;/a&gt; adds some
really useful context around query failures in the Trino Web UI. This PR was
created by &lt;a href=&quot;https://github.com/posulliv&quot;&gt;Pádraig O’Sullivan&lt;/a&gt;. For many, it can
be frustrating when a query fails and you have to do a lot of digging before you
understand even the type of error that is happening. This PR gives a better
highlight of what failed so that you don’t have to do a lot of investigation
upfront to get a sense of what is happening and where to look next.&lt;/p&gt;

&lt;p&gt;Thank you so much Pádraig!&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-why-are-deletes-so-limited-in-trino&quot;&gt;Question of the week: Why are deletes so limited in Trino?&lt;/h2&gt;

&lt;p&gt;Our &lt;a href=&quot;https://trinodb.slack.com/archives/CGB0QHWSW/p1632775855390300&quot;&gt;question of the week&lt;/a&gt;
comes from Marius Grama on our Trino community Slack. Marius created the 
&lt;a href=&quot;https://github.com/findinpath/dbt-trino-incremental-hive&quot;&gt;dbt-trino&lt;/a&gt; adapter 
and wants to implement &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT OVERWRITE&lt;/code&gt; functionality.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT OVERWRITE&lt;/code&gt; checks whether there are entries in the target table that
also exist in the staging table, and it first deletes the target entries
before inserting the staging entries. Unfortunately, the delete didn’t work for
RDBMS, Hive, or Iceberg. His question is whether this is a limitation of Trino for
all connectors, and how we can approach the “delete” part of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT OVERWRITE&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs and Resources&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://lakefs.io/hive-metastore-why-its-still-here-and-what-can-replace-it/&quot;&gt;Hive Metastore - Why it’s still here and what can replace it&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://lakefs.io/hive-metastore-it-didnt-age-well/&quot;&gt;Hive Metastore - It didn’t age well&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://lakefs.io/hudi-iceberg-and-delta-lake-data-lake-table-formats-compared/&quot;&gt;Hudi, Iceberg, Delta Lake Table Formats Compared&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://lakefs.io/the-docker-everything-bagel-spin-up-a-local-data-stack/&quot;&gt;The Docker Everything Bagel&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>Announcing Trino Summit</title>
      <link href="https://trino.io/blog/2021/09/23/announcing_trino_summit.html" rel="alternate" type="text/html" title="Announcing Trino Summit" />
      <published>2021-09-23T00:00:00+00:00</published>
      <updated>2021-09-23T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/09/23/announcing_trino_summit</id>
      <content type="html" xml:base="https://trino.io/blog/2021/09/23/announcing_trino_summit.html">&lt;p&gt;Greetings Trino nation,&lt;/p&gt;

&lt;p&gt;Get ready for this year’s virtual Trino Summit event! This year’s summit feels a
little different as the name of the event has changed from Presto to Trino. So
this will be the first event of the project hosted &lt;a href=&quot;https://trino.io/blog/2020/12/27/announcing-trino.html&quot;&gt;under the new banner of Trino&lt;/a&gt;.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;This year’s Summit is hosted by Starburst virtually on October 21st and 22nd. We’d originally set the date for September 15th but later realized that it conflicted with Yom Kippur. While we had originally set out to make this event a hybrid format, we had to make the difficult decision of moving the event to fully virtual in light of the growing health concerns around contracting and spreading the delta variant. If you haven’t registered yet, &lt;a href=&quot;http://starburst.io/trinosummit2021&quot;&gt;register here&lt;/a&gt;. If you planned on attending in person, we will still have your registration and you will still be able to attend virtually.&lt;/p&gt;

&lt;p&gt;Get excited for our great lineup of speakers, panels, and presentations! We’re always on the lookout for speakers who are excited to share their Trino experiences.&lt;/p&gt;

&lt;p&gt;We look forward to seeing you there!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Greetings Trino nation, Get ready for this year’s virtual Trino Summit event! This year’s summit feels a little different as the name of the event has changed from Presto to Trino. So this will be the first event of the project hosted under the new banner of Trino.</summary>

      
      
    </entry>
  
    <entry>
      <title>26: Trino discovers data catalogs with Amundsen</title>
      <link href="https://trino.io/episodes/26.html" rel="alternate" type="text/html" title="26: Trino discovers data catalogs with Amundsen" />
      <published>2021-09-16T00:00:00+00:00</published>
      <updated>2021-09-16T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/26</id>
      <content type="html" xml:base="https://trino.io/episodes/26.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;Mark Grover, Co-creator of Amundsen and Founder at &lt;a href=&quot;https://www.stemma.ai/&quot;&gt;Stemma&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/mark_grover&quot;&gt;@mark_grover&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-362&quot;&gt;Release 362&lt;/h2&gt;

&lt;p&gt;Official announcement items from Martin are not yet available since the release 
is not out… but soon.&lt;/p&gt;

&lt;p&gt;Manfred’s notes:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Add new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;listagg&lt;/code&gt; function contributed by Marius&lt;/li&gt;
  &lt;li&gt;Join performance and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DISTINCT&lt;/code&gt; performance improvements&lt;/li&gt;
  &lt;li&gt;SQL security related changes in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER SCHEMA&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Add &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN table&lt;/code&gt; for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE&lt;/code&gt;/&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP&lt;/code&gt;/… &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROLE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Whole bunch of improvements in the BigQuery connector&lt;/li&gt;
  &lt;li&gt;Numerous improvements for Parquet file usage in Hive connector&lt;/li&gt;
  &lt;li&gt;All connector docs now have SQL support section&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-data-discovery-and-amundsen&quot;&gt;Concept of the week: Data discovery and Amundsen&lt;/h2&gt;

&lt;p&gt;Data discovery is a process that aids in the analysis of data where siloed data 
has been centralized, and it is difficult to find data or overlap between
disparate data sets. Many teams have their own view of the world when it comes 
to the data they need, but they commonly need to reason about how their data 
relates to data outside of their domain.&lt;/p&gt;

&lt;p&gt;There are typically questions about who owns what data, which helps identify 
the individuals responsible for maintaining standards. There are also issues 
around documenting the data, and around identifying who to call for help when 
problems come up while using it. Data discovery lets analysts find patterns in 
the data and periodically audit data storage practices. Interesting questions 
also arise around existing policies, and discovery can encourage a system of 
record that acts as a shared front end for those data policies.&lt;/p&gt;

&lt;h3 id=&quot;what-is-amundsen&quot;&gt;What is Amundsen?&lt;/h3&gt;

&lt;p&gt;Amundsen provides data discovery by using ETL processes to scrape metadata from
all of the data sources. It creates a central location to collect all that 
metadata and enables search and other analytics of this metadata. Here’s how the
project describes itself on &lt;a href=&quot;https://www.amundsen.io/amundsen/&quot;&gt;the Amundsen website&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Amundsen is a data discovery and metadata engine for improving the 
productivity of data analysts, data scientists and engineers when interacting
with data. It does that today by indexing data resources (tables, dashboards,
streams, etc.) and powering a page-rank style search based on usage patterns 
(e.g. highly queried tables show up earlier than less queried tables).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Amundsen has an architecture that interacts primarily with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;information_schema&lt;/code&gt;
tables, among other metadata, depending on the data source. In Trino’s case, 
&lt;a href=&quot;https://github.com/amundsen-io/amundsen/blob/main/databuilder/databuilder/extractor/presto_view_metadata_extractor.py&quot;&gt;the extractor used&lt;/a&gt; 
connects directly to the Hive metastore database, for Trino views, since 
they’re stored there. Physical tables use the &lt;a href=&quot;https://github.com/amundsen-io/amundsen/blob/main/databuilder/databuilder/extractor/hive_table_metadata_extractor.py&quot;&gt;HiveTableMetadataExtractor&lt;/a&gt;
to load these tables into Amundsen. This makes sense since the data is stored in
the Hive table format. For non-Hive use cases, you generally want to bypass
using Trino (for now) and directly connect Amundsen to each data source.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/26/amundsen-architecture.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Amundsen includes an ETL framework called &lt;a href=&quot;https://www.amundsen.io/amundsen/databuilder/&quot;&gt;databuilder&lt;/a&gt;
that runs multiple jobs. Jobs contain an ETL task to extract the metadata and 
load it into the two databases that are central to Amundsen, Neo4j and 
Elasticsearch. Neo4j stores the core metadata that is represented on the UI. 
Elasticsearch enables search over the many fields in the metadata. Ingestion via
ETL follows these steps:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Ingest base data to Neo4j.&lt;/li&gt;
  &lt;li&gt;Ingest additional data and decorate Neo4j over base data.&lt;/li&gt;
  &lt;li&gt;Update Elasticsearch index using Neo4j data.&lt;/li&gt;
  &lt;li&gt;Remove stale data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each job contains an ETL task. The task must define an extractor and a loader, 
and optionally a transformer. You can see example configurations for different
extractors on the website, like the &lt;a href=&quot;https://www.amundsen.io/amundsen/databuilder/#hivetablemetadataextractor&quot;&gt;example for the HiveTableMetadataExtractor&lt;/a&gt;.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/26/amundsen-job.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;
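&lt;p&gt;To make the job structure concrete, here is a minimal, self-contained Python 
sketch of the extract/transform/load pattern that a databuilder task composes. 
The class and method names are illustrative stand-ins, not the real Amundsen 
databuilder API:&lt;/p&gt;

```python
# Illustrative sketch of the extract, transform, load pipeline that an
# Amundsen databuilder task composes. All names here are hypothetical
# stand-ins, not actual databuilder classes.

class ListExtractor:
    """Yields one metadata record at a time, like a databuilder extractor."""
    def __init__(self, records):
        self._records = iter(records)

    def extract(self):
        return next(self._records, None)  # None signals exhaustion

class UpperCaseTransformer:
    """Optional step that reshapes records before loading."""
    def transform(self, record):
        return {key: value.upper() for key, value in record.items()}

class MemoryLoader:
    """Collects records, standing in for the Neo4j/Elasticsearch loaders."""
    def __init__(self):
        self.loaded = []

    def load(self, record):
        self.loaded.append(record)

def run_task(extractor, loader, transformer=None):
    """Drains the extractor, optionally transforms, and loads each record."""
    while True:
        record = extractor.extract()
        if record is None:
            break
        if transformer is not None:
            record = transformer.transform(record)
        loader.load(record)

tables = [{"name": "customer"}, {"name": "orders"}]
loader = MemoryLoader()
run_task(ListExtractor(tables), loader, UpperCaseTransformer())
print(loader.loaded)  # [{'name': 'CUSTOMER'}, {'name': 'ORDERS'}]
```

&lt;p&gt;A real databuilder job wires concrete extractors (such as the Hive metastore 
extractor) and loaders into the same shape of task.&lt;/p&gt;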

&lt;p&gt;The metadata is modeled using a graph representation in Neo4j, and optionally
&lt;a href=&quot;https://atlas.apache.org/#/&quot;&gt;Apache Atlas&lt;/a&gt;, to model advanced concepts such as
lineage and other relations.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/26/amundsen-metadata.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;You can learn more about the &lt;a href=&quot;https://www.amundsen.io/amundsen/databuilder/docs/models/&quot;&gt;models in the metadata here&lt;/a&gt;.&lt;/p&gt;

&lt;h4 id=&quot;amundsen-resources&quot;&gt;Amundsen resources&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;Docs: &lt;a href=&quot;https://www.amundsen.io/amundsen/&quot;&gt;https://www.amundsen.io/amundsen/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;GitHub: &lt;a href=&quot;https://github.com/amundsen-io/amundsen&quot;&gt;https://github.com/amundsen-io/amundsen&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;YouTube: &lt;a href=&quot;https://www.youtube.com/playlist?list=PL0UJdxehTNlKnGU_h7k2fzJyvAiufeh1U&quot;&gt;https://www.youtube.com/playlist?list=PL0UJdxehTNlKnGU_h7k2fzJyvAiufeh1U&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Slack: &lt;a href=&quot;https://join.slack.com/t/amundsenworkspace/shared_invite/enQtNTk2ODQ1NDU1NDI0LTc3MzQyZmM0ZGFjNzg5MzY1MzJlZTg4YjQ4YTU0ZmMxYWU2MmVlMzhhY2MzMTc1MDg0MzRjNTA4MzRkMGE0Nzk&quot;&gt;Join&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;amundsen-as-a-subcomponent-to-data-mesh&quot;&gt;Amundsen as a subcomponent to data mesh&lt;/h3&gt;

&lt;p&gt;A new architecture, philosophy, and yes, &lt;a href=&quot;https://www.merriam-webster.com/dictionary/buzzword&quot;&gt;buzzword&lt;/a&gt; 
that is gaining momentum is the &lt;em&gt;data mesh&lt;/em&gt;. While it is certainly not yet 
concretely defined and is still in the research and development phase, data mesh is
gaining a lot of attention as a potential alternative to data lakes and data 
warehouses for analytics solutions.&lt;/p&gt;

&lt;p&gt;Data mesh mirrors the philosophy of microservice architecture. It argues that 
data should be defined and maintained by teams responsible for their business 
domain similar to how the responsibility is delegated at the service layer. 
Since not everyone is going to be a data engineer on the domain team, there must
be some consideration for the architecture of such a platform. The author of 
this paradigm, Zhamak Dehghani, lays out four principles that characterize a data 
mesh. The principles are listed below, with the systems that provide some or all 
of the solution for each principle noted in parentheses.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Domain-oriented decentralized data ownership and architecture (Trino &amp;amp; Amundsen)&lt;/li&gt;
  &lt;li&gt;Data as a product	(Amundsen)&lt;/li&gt;
  &lt;li&gt;Self-serve data infrastructure as a platform (Trino)&lt;/li&gt;
  &lt;li&gt;Federated computational governance (Amundsen to some extent)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;stemma&quot;&gt;Stemma&lt;/h3&gt;

&lt;p&gt;Like with many successful open source projects, there are enterprise products 
that build on and support the open source project. &lt;a href=&quot;https://www.stemma.ai/&quot;&gt;Stemma&lt;/a&gt; 
is the enterprise company that supports Amundsen. It’s founded by Mark and 
others central to the open source project.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-index-trino-views&quot;&gt;PR of the week: Index Trino views&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/amundsen-io/amundsen/commit/4cfc55d311ca7bc9b02df26ece3b4bde5eedecd6#diff-1c6e94c4ea77e16625f97d4e029f5611d3f3b10d428ab6038edc0b931df4243c&quot;&gt;PR (or should we say commit) of the week&lt;/a&gt;, 
adds the original Trino extractor. As mentioned above, this extractor is only
needed for views, as the physical tables exist in Hive and are retrieved with the
Hive extractor.&lt;/p&gt;

&lt;h3 id=&quot;call-to-contribute-to-amundsen&quot;&gt;Call to contribute to Amundsen&lt;/h3&gt;

&lt;p&gt;If you want to help out, you can consider adding the Trino image similar to 
&lt;a href=&quot;https://github.com/amundsen-io/amundsenfrontendlibrary/commit/4e24bfe1c1cd3c6cf568ee1b3e39580686fafbe6&quot;&gt;this commit completed a while back&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;demo-extracting-metadata-from-hive-metastore-and-loading-it-into-amundsen&quot;&gt;Demo: Extracting metadata from Hive metastore and loading it into Amundsen&lt;/h2&gt;

&lt;p&gt;There were technical difficulties on the day of broadcasting the show, so the
demo was moved to its own separate video.&lt;/p&gt;

&lt;div class=&quot;youtube-video-container&quot;&gt;
  &lt;iframe width=&quot;702&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/m-mL00FkWd0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;The steps in this demo are adapted from the &lt;a href=&quot;https://www.amundsen.io/amundsen/installation/&quot;&gt;Amundsen installation page&lt;/a&gt;.
Clone this repository and navigate to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino-getting-started/community_tutorials/amundsen&lt;/code&gt; 
directory. For this demo you need at least 3GB of memory allocated to your 
Docker application.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-getting-started.git

cd community_tutorials/amundsen

docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once all the services are running, clone the Amundsen repository in a separate
terminal. Then navigate to the databuilder folder and install all the 
dependencies:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone --recursive https://github.com/amundsen-io/amundsen.git
cd databuilder
python3 -m venv venv
source venv/bin/activate
pip3 install --upgrade pip
pip3 install -r requirements.txt
python3 setup.py install
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Navigate to MinIO at &lt;a href=&quot;http://localhost:9000&quot;&gt;http://localhost:9000&lt;/a&gt; to create the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny&lt;/code&gt; bucket for the
schema in Trino to map to. In Trino, create a schema and a couple tables in the 
existing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;minio&lt;/code&gt; catalog:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE SCHEMA minio.tiny
WITH (location = &apos;s3a://tiny/&apos;);

CREATE TABLE minio.tiny.customer
WITH (
  format = &apos;ORC&apos;,
  external_location = &apos;s3a://tiny/customer/&apos;
) 
AS SELECT * FROM tpch.tiny.customer;

CREATE TABLE minio.tiny.orders
WITH (
  format = &apos;ORC&apos;,
  external_location = &apos;s3a://tiny/orders/&apos;
) 
AS SELECT * FROM tpch.tiny.orders;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Navigate back to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino-getting-started/community_tutorials/amundsen&lt;/code&gt; directory in the same 
Python virtual environment you just opened.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cd trino-getting-started/community_tutorials/amundsen
python3 assets/scripts/sample_trino_data_loader.py
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;View the Amundsen UI at &lt;a href=&quot;http://localhost:5000&quot;&gt;http://localhost:5000&lt;/a&gt; and try a test search; it 
should return the tables you just created.&lt;/p&gt;

&lt;p&gt;You can verify dummy data has been ingested into Neo4j by visiting &lt;a href=&quot;http://localhost:7474/browser/&quot;&gt;http://localhost:7474/browser/&lt;/a&gt;.
Log in as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;neo4j&lt;/code&gt; with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;test&lt;/code&gt; password and run 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH (n:Table) RETURN n LIMIT 25&lt;/code&gt; in the query box. You should see a few tables.&lt;/p&gt;

&lt;p&gt;If you have any issues, look at some of the &lt;a href=&quot;https://www.amundsen.io/amundsen/installation/#troubleshooting&quot;&gt;troubleshooting steps&lt;/a&gt;
in the Amundsen installation page.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-can-i-add-a-udf-without-restarting-trino&quot;&gt;Question of the week: Can I add a UDF without restarting Trino?&lt;/h2&gt;

&lt;p&gt;This week’s question comes in from Chen Xuying on the Trino community Slack.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Is there any way to register &lt;a href=&quot;https://trino.io/docs/current/develop/functions.html&quot;&gt;a new user defined function (UDF)&lt;/a&gt; 
and needn’t restart coordinator and worker?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Currently, no. In Java, jar files and all the Java code are loaded at start 
time, so in order to load the files on all the worker nodes and the coordinator, 
you need to restart. There are various ways UDFs could be implemented in a 
dynamic way, so we are still looking for suggestions here.&lt;/p&gt;

&lt;p&gt;One option, as Manfred mentions, would be to load JavaScript as a UDF, since Java
can compile JavaScript. This would allow new functions to be added 
without a restart. There may be other ways to achieve this, and we invite you to
contribute your ideas!&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://martinfowler.com/articles/data-mesh-principles.html&quot;&gt;Data Mesh Principles&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://blog.starburst.io/trino-data-governance-and-accelerating-data-science&quot;&gt;Trino, Data Governance, and Accelerating Data Science&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests Mark Grover, Co-creator of Amundsen and Founder at Stemma (@mark_grover). Release 362</summary>

      
      
    </entry>
  
    <entry>
      <title>25: Trino going through changes</title>
      <link href="https://trino.io/episodes/25.html" rel="alternate" type="text/html" title="25: Trino going through changes" />
      <published>2021-09-02T00:00:00+00:00</published>
      <updated>2021-09-02T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/25</id>
      <content type="html" xml:base="https://trino.io/episodes/25.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Ayush Chauhan, Data Platform Engineer at &lt;a href=&quot;https://www.zomato.com/who-we-are&quot;&gt;Zomato&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/ayush-chauhan/&quot;&gt;Ayush Chauhan&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Gunnar Morling, Lead of Debezium and Open source software engineer at &lt;a href=&quot;https://www.redhat.com&quot;&gt;Red Hat&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/gunnarmorling&quot;&gt;@gunnarmorling&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Ashhar Hasan, Software Engineer at &lt;a href=&quot;https://starburst.io/&quot;&gt;Starburst&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/hashhar&quot;&gt;@hashhar&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-361&quot;&gt;Release 361&lt;/h2&gt;

&lt;p&gt;Official announcement items from Martin:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for OAuth2/OIDC opaque access tokens&lt;/li&gt;
  &lt;li&gt;Aggregation pushdown for Pinot&lt;/li&gt;
  &lt;li&gt;Better performance for Parquet files with column indexes&lt;/li&gt;
  &lt;li&gt;Support for reading fields as JSON values in Elasticsearch&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred’s additional notes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Predicate pushdown in Cassandra&lt;/li&gt;
  &lt;li&gt;Metadata cache size limitation in a few connectors&lt;/li&gt;
  &lt;li&gt;Lots of improvements for Hive view support&lt;/li&gt;
  &lt;li&gt;Glue table statistics improvements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More info at &lt;a href=&quot;https://trino.io/docs/current/release/release-361.html&quot;&gt;https://trino.io/docs/current/release/release-361.html&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-change-data-capture&quot;&gt;Concept of the week: Change Data Capture&lt;/h2&gt;

&lt;p&gt;If you know Trino, you know it allows for flexible architectures that include 
many systems with varying use cases they support. We’ve come to accept this 
potpourri of systems as a general modus operandi for most businesses.&lt;/p&gt;

&lt;p&gt;Many times the data gets copied to different systems to accomplish varying use 
cases, from performance and data warehousing to merging cross-cutting data into a 
single store. When copying data between systems, how do these systems stay in 
sync? It’s a critical need especially for Trino to know that the state across 
the data sources we query is valid.&lt;/p&gt;

&lt;p&gt;To answer this, we can use the concept of Change Data Capture (CDC). CDC is a 
powerful concept that considers one or more data sources, called systems of record, 
that store the true state of a system. The systems of record are monitored for
changes, and upon detecting changes, the CDC system propagates them to a 
number of target systems.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/25/cdc.png&quot; /&gt;&lt;br /&gt;
Change Data Capture: &lt;a href=&quot;https://medium.com/event-driven-utopia/a-gentle-introduction-to-event-driven-change-data-capture-683297625f9b&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;
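&lt;p&gt;The core loop described above can be sketched in a few lines of self-contained 
Python: a system of record appends every change to an ordered log, and each target 
replays that log to converge on the same state. This is an illustrative model of 
the CDC concept only, not how Debezium is actually implemented:&lt;/p&gt;

```python
# Minimal illustrative model of change data capture: a system of record
# appends every insert/update/delete to a change log, and target systems
# replay the log in order to converge on the same state. Not Debezium's
# actual design.

class SystemOfRecord:
    def __init__(self):
        self.rows = {}
        self.change_log = []  # ordered stream of change events

    def upsert(self, key, value):
        op = "update" if key in self.rows else "insert"
        self.rows[key] = value
        self.change_log.append((op, key, value))

    def delete(self, key):
        del self.rows[key]
        self.change_log.append(("delete", key, None))

class Target:
    """A downstream copy that applies change events in order."""
    def __init__(self):
        self.rows = {}
        self.position = 0  # offset into the source change log

    def sync(self, source):
        for op, key, value in source.change_log[self.position:]:
            if op == "delete":
                self.rows.pop(key, None)
            else:
                self.rows[key] = value
        self.position = len(source.change_log)

source = SystemOfRecord()
replica = Target()
source.upsert("order-1", "placed")
source.upsert("order-2", "placed")
source.upsert("order-1", "delivered")
source.delete("order-2")
replica.sync(source)
print(replica.rows == source.rows)  # True
```

&lt;p&gt;Because the target tracks its position in the log, repeated syncs only replay 
new events, which is the same idea that lets CDC consumers resume after failures.&lt;/p&gt;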

&lt;h3 id=&quot;debezium-for-cdc&quot;&gt;Debezium for CDC&lt;/h3&gt;

&lt;p&gt;One implementation of CDC that has grown tremendously in popularity since its 
inception is called Debezium. According to &lt;a href=&quot;https://debezium.io&quot;&gt;https://debezium.io&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Debezium is an open-source distributed platform for change data capture. Start
it up, point it at your databases, and your apps can start responding to all 
of the inserts, updates, and deletes that other apps commit to your databases.
Debezium is durable and fast, so your apps can respond quickly and never miss
an event, even when things go wrong.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The common way Debezium is deployed in the wild is using &lt;a href=&quot;https://docs.confluent.io/platform/current/connect/index.html&quot;&gt;Kafka Connect&lt;/a&gt; 
and defining the Debezium source connectors. You can then use the Kafka Connect 
ecosystem to write to different targets downstream.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/25/debezium-architecture.png&quot; /&gt;&lt;br /&gt;
The Debezium architecture with Kafka Connect: &lt;a href=&quot;https://debezium.io/documentation/reference/architecture.html&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Another alternative, if you don’t want to use Kafka, is to use dedicated Debezium
servers to implement CDC and push the logs to the target database downstream 
using Debezium connectors.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/25/debezium-server-architecture.png&quot; /&gt;&lt;br /&gt;
The Debezium standalone server architecture: &lt;a href=&quot;https://debezium.io/documentation/reference/architecture.html&quot;&gt;Source&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;While CDC is the primary focus, Debezium also provides support for more advanced
concepts such as the &lt;a href=&quot;https://debezium.io/documentation/reference/integrations/outbox.html&quot;&gt;outbox pattern support for Quarkus apps&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;debezium--trino-at-zomato&quot;&gt;Debezium + Trino at Zomato&lt;/h3&gt;

&lt;p&gt;Zomato is a technology platform that connects customers, restaurant partners and
delivery partners, serving their multiple needs. Customers use their platform to
search and discover restaurants, read and write customer generated reviews and 
view and upload photos, order food delivery, book a table and make payments 
while dining-out at restaurants. Clearly there’s a lot of data that can flow
through a platform like this. You’ll have both operational databases to support
the applications in this platform, but also need big data stores to store and
analyze all of this data.&lt;/p&gt;

&lt;p&gt;Here is one of the earlier iterations of Zomato’s big data architecture before
they were able to integrate Debezium. Ayush covers some of the pain points they
experienced before implementing CDC.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/25/zomato-before.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Once Zomato implemented CDC, they were able to keep their downstream Iceberg 
stores in sync across multiple operational systems. As a result the analytics 
data is now much more dependable.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/25/zomato-after.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-4140-implement-aggregation-pushdown-in-pinot&quot;&gt;PR of the week: PR 4140 Implement aggregation pushdown in Pinot&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/trinodb/trino/pull/6069&quot;&gt;PR of the week&lt;/a&gt; is actually a
throwback to &lt;a href=&quot;/episodes/13.html&quot;&gt;episode thirteen&lt;/a&gt;, &lt;em&gt;Trino takes a sip of Pinot&lt;/em&gt;,
where our guest &lt;a href=&quot;https://twitter.com/ElonAzoulay&quot;&gt;Elon Azoulay&lt;/a&gt; discussed some of
the upcoming features coming to the Pinot connector. Aggregation pushdown
was on that list, and it just landed in the 361 release!&lt;/p&gt;

&lt;p&gt;This PR implements aggregation pushdown for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;COUNT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AVG&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MIN&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MAX&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SUM&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;COUNT(DISTINCT)&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;approx_distinct&lt;/code&gt;. It is enabled by default and can be 
disabled using the configuration property &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pinot.aggregation-pushdown.enabled&lt;/code&gt; 
or the catalog session property &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aggregation_pushdown_enabled&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;FYI: &lt;a href=&quot;https://github.com/trinodb/trino/pull/9208&quot;&gt;https://github.com/trinodb/trino/pull/9208&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thanks Elon!&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-is-there-an-array-function-that-flattens-a-row-like-1--a-b-c-into-three-rows&quot;&gt;Question of the week: Is there an array function that flattens a row like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1 | [a, b, c]&lt;/code&gt; into three rows?&lt;/h2&gt;

&lt;p&gt;Our &lt;a href=&quot;https://trinodb.slack.com/archives/CFLB9AMBN/p1630241736052500&quot;&gt;question of the week&lt;/a&gt;
comes from Brian Hudson on our Trino community Slack. Brian is dealing with an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ARRAY&lt;/code&gt;
type in one column and an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INTEGER&lt;/code&gt; column in another. This is common when 
processing nested denormalized data. The goal is to take this row, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1 | [a, b, c]&lt;/code&gt;,
and split the array into three rows:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;1 | a
1 | b
1 | c
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Kasia answered this question by using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST&lt;/code&gt; on the array column. The
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST&lt;/code&gt; statement produces a single column with one row per array element, and a 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JOIN&lt;/code&gt; is performed with the original &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INTEGER&lt;/code&gt; column.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;
WITH t(x, y) AS (VALUES (1, ARRAY[&apos;a&apos;, &apos;b&apos;, &apos;c&apos;]))
SELECT x, y_unnested
FROM t
LEFT JOIN UNNEST (t.y) t2(y_unnested) ON true;

trino&amp;gt; WITH t(x, y) AS (VALUES (1, ARRAY[&apos;a&apos;, &apos;b&apos;, &apos;c&apos;]))
     -&amp;gt; SELECT x, y_unnested
     -&amp;gt; FROM t
     -&amp;gt; LEFT JOIN UNNEST (t.y) t2(y_unnested) ON true;
 x | y_unnested
---+------------
 1 | a
 1 | b
 1 | c
(3 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
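&lt;p&gt;As a side note, the same flattening is often written with a
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CROSS JOIN&lt;/code&gt;. A minimal sketch:&lt;/p&gt;

```sql
-- Same flattening, using CROSS JOIN UNNEST
WITH t(x, y) AS (VALUES (1, ARRAY['a', 'b', 'c']))
SELECT x, y_unnested
FROM t
CROSS JOIN UNNEST(t.y) AS t2(y_unnested);
```

&lt;p&gt;The difference shows up with empty arrays: the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LEFT JOIN ... ON true&lt;/code&gt; form keeps the row with a
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NULL&lt;/code&gt; element, while
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CROSS JOIN&lt;/code&gt; drops it.&lt;/p&gt;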

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs and Resources&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/event-driven-utopia/a-gentle-introduction-to-event-driven-change-data-capture-683297625f9b&quot;&gt;A gentle introduction to Event Driven Change Data Capture&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/event-driven-utopia/a-visual-introduction-to-debezium-32563e23c6b8&quot;&gt;A Visual Introduction to Debezium&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://debezium.io/blog/&quot;&gt;Debezium Blog&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://debezium.io/documentation/reference/&quot;&gt;Debezium Docs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/debezium/debezium-examples/&quot;&gt;Debezium Examples&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://debezium.io/documentation/online-resources/&quot;&gt;Debezium Resources&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Videos&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.infoq.com/presentations/data-streaming-kafka-debezium/&quot;&gt;Practical Change Data Streaming Use Cases with Apache Kafka &amp;amp; Debezium&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://speakerdeck.com/gunnarmorling/practical-change-data-streaming-use-cases-with-apache-kafka-and-debezium-qcon-san-francisco-2019&quot;&gt;Slides&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=QYbXDp4Vu-8&quot;&gt;Apache Kafka and Debezium / DevNation Tech Talk&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>24: Trinetes I: Trino on Kubernetes</title>
      <link href="https://trino.io/episodes/24.html" rel="alternate" type="text/html" title="24: Trinetes I: Trino on Kubernetes" />
      <published>2021-08-19T00:00:00+00:00</published>
      <updated>2021-08-19T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/24</id>
      <content type="html" xml:base="https://trino.io/episodes/24.html">&lt;p&gt;This is the first episode in a series where we cover the basics and just enough
advanced Kubernetes features and information to understand how to deploy Trino 
on Kubernetes.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-k8s-architecture-containers-pods-and-kubelets&quot;&gt;Concept of the week: K8s architecture: Containers, Pods, and kubelets&lt;/h2&gt;

&lt;p&gt;For this concept of the week, we want to provide you a minimalistic overview of
what you need to know about Kubernetes to deploy Trino to a cluster.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Why Kubernetes?&lt;/strong&gt; Kubernetes is a container orchestration platform that allows
you to declare how containers should be managed using YAML 
configuration files. This definition can be tricky to understand if you don’t
have the proper context. To make sure nobody is left behind, it is useful to 
cover what containers are:&lt;/p&gt;

    &lt;ul&gt;
      &lt;li&gt;
        &lt;p&gt;The traditional way to deploy an application is to take the compiled 
binary of that application and run it directly on hardware with an operating
system. This works, but the application depends heavily on the underlying
hardware and operating system being functional, and multiple applications must
share the same resources. If one application fails and crashes a shared
resource, it can take down every application on that machine.&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;To remove these dependencies, engineers created virtual machines (VMs), 
using a VM manager called a hypervisor that emulates hardware environments 
to host other operating systems. This is a big step forward, as each 
application can now be isolated, but it comes at a great cost: each virtual
machine hosts an entire operating system, making it resource intensive and slow.&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;Containers are the newest type of deployment. Containers enable a logical
isolation of resources while still physically running on shared resources. 
All resources created in the hardware and operating systems exist on the host
system. The isolation restricts any interference from other processes. 
Containers achieve the goals of virtualization without sacrificing much 
performance or efficiency.&lt;/p&gt;
      &lt;/li&gt;
    &lt;/ul&gt;

    &lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/24/container-evolution.svg&quot; /&gt;&lt;br /&gt;
 Source: https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/
&lt;/p&gt;

    &lt;ul&gt;
      &lt;li&gt;Containerization simplified a trend in service oriented architecture called 
microservices. Microservices deploy loosely coupled and modular applications
rather than all-encompassing monolithic applications. With containers, these
applications can be deployed and scaled up quickly across various virtual and
physical machines without affecting other applications on the same machine. 
This is great, but results in new complexities. Some examples are the need 
for new approaches to monitoring the health of applications, scaling the 
applications as requests grow and diminish, redeploying crashed applications, 
and networking the applications together. In summary, all of these activities
can be considered container orchestration and this is exactly what Kubernetes
solves!&lt;/li&gt;
    &lt;/ul&gt;

    &lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/24/load-balancer.jpeg&quot; /&gt;&lt;br /&gt;
 Source: https://www.slideshare.net/devopsdaysaustin/continuously-delivering-microservices-in-kubernetes-using-jenkins&lt;br /&gt;
 Here we have two services that each sit behind a load balancer provided and mapped by the Kubernetes cluster.
&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Kubernetes components and architecture&lt;/strong&gt;:&lt;/p&gt;

    &lt;ul&gt;
      &lt;li&gt;Node - the physical machine or VM running a kubelet and a container runtime.&lt;/li&gt;
      &lt;li&gt;Control plane - the container orchestration layer that exposes the API and 
interfaces to define, deploy, and manage the lifecycle of containers.&lt;/li&gt;
      &lt;li&gt;Cluster - a set of nodes connected to the same control plane.&lt;/li&gt;
      &lt;li&gt;Pod - a single instance of an application, the smallest object in Kubernetes.&lt;/li&gt;
    &lt;/ul&gt;

    &lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/24/components-of-kubernetes.svg&quot; /&gt;&lt;br /&gt;
 Source: https://kubernetes.io/docs/concepts/overview/components/
&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;kubernetes-control-plane-components&quot;&gt;Kubernetes control plane components:&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;API server that nodes connect to; the front end for users and 
 administrators of the cluster.&lt;/li&gt;
  &lt;li&gt;etcd, a distributed key-value store containing all data used to manage 
 the cluster.&lt;/li&gt;
  &lt;li&gt;Scheduler that distributes work across nodes and assigns newly created 
 containers to nodes.&lt;/li&gt;
  &lt;li&gt;Controllers that are the brains behind orchestration, monitoring for 
 nodes going down, and more.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;kubernetes-worker-node-components&quot;&gt;Kubernetes worker node components:&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;container runtime - underlying runtime used to manage containers&lt;/li&gt;
  &lt;li&gt;kubelet - agent that checks the health and manages the pods running on the node based on the desired state provided in the PodSpec&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;kube-proxy - network proxy that maintains network rules applied to nodes and allows network access between Pods in a cluster&lt;/p&gt;

    &lt;p&gt;You can scale up multiple pods on a single node until the node has no more 
resources, at which time a new node needs to be added and pod instances are 
distributed between the nodes.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;So how does this relate to Trino?&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
  &lt;li&gt;Out of the box, Kubernetes can do these key things for Trino.
    &lt;ul&gt;
      &lt;li&gt;Simple scale up and down (manually tell k8s to start or kill Trino pods).&lt;/li&gt;
      &lt;li&gt;Kubernetes supports failover, meaning that your workers will restart if they die.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Advanced features that could exist but are not currently in open source.
    &lt;ul&gt;
      &lt;li&gt;Auto-scaling via the &lt;a href=&quot;https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/&quot;&gt;Horizontal Pod Autoscaler&lt;/a&gt; 
 and custom metrics.&lt;/li&gt;
      &lt;li&gt;Graceful shutdown hooks that you can add to your cluster to delay 
 termination, avoiding failed calls to a worker that has already shut down.&lt;/li&gt;
    &lt;/ul&gt;
    &lt;p align=&quot;center&quot;&gt;
     &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/24/kubernetes-shutdown.svg&quot; /&gt;&lt;br /&gt;
     Source: https://learnk8s.io/graceful-shutdown
  &lt;/p&gt;
    &lt;p align=&quot;center&quot;&gt;
     &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/24/graceful-shutdown.svg&quot; /&gt;&lt;br /&gt;
     Source: https://learnk8s.io/graceful-shutdown
  &lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
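&lt;p&gt;As a rough sketch of what such a hook can look like, a
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;preStop&lt;/code&gt; hook on a worker pod can ask Trino to
drain in-flight work before the container is killed. The names, endpoint, and
durations below are illustrative assumptions, not the configuration of any
particular chart:&lt;/p&gt;

```yaml
# Hypothetical worker pod spec fragment (illustrative names and values):
# delay termination so in-flight tasks can drain before the container is killed.
spec:
  terminationGracePeriodSeconds: 60   # time allowed before SIGKILL
  containers:
    - name: trino-worker
      lifecycle:
        preStop:
          exec:
            # Put the worker into shutdown mode, then wait for tasks to drain.
            # The endpoint and sleep duration are assumptions for illustration.
            command:
              - sh
              - -c
              - >-
                curl -s -X PUT -H 'Content-Type: application/json'
                -d '"SHUTTING_DOWN"' localhost:8080/v1/info/state;
                sleep 30
```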

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;What the heck are helm charts then?&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
  &lt;li&gt;Helm is a package manager for Kubernetes&lt;/li&gt;
  &lt;li&gt;Removes the need for managing lots of Kubernetes related yaml files&lt;/li&gt;
  &lt;li&gt;Best way to deploy apps to Kubernetes&lt;/li&gt;
  &lt;li&gt;Charts are available for many different applications&lt;/li&gt;
  &lt;li&gt;Helm chart for Trino&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-11-merge-contributor-version-of-k8s-charts-with-the-community-version&quot;&gt;PR of the week: PR 11 Merge contributor version of k8s charts with the community version&lt;/h2&gt;

&lt;p&gt;This week’s &lt;a href=&quot;https://github.com/trinodb/charts/pull/11&quot;&gt;PR of the week&lt;/a&gt; comes 
from a different repo under the trinodb org, &lt;a href=&quot;https://github.com/trinodb/charts&quot;&gt;trinodb/charts&lt;/a&gt;.
This PR comes from contributor &lt;a href=&quot;https://github.com/valeriano-manassero&quot;&gt;Valeriano Manassero&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Valeriano maintains a &lt;a href=&quot;https://github.com/valeriano-manassero/helm-charts/tree/main/valeriano-manassero/trino&quot;&gt;very useful Helm chart&lt;/a&gt;
that predates the Trino org’s own community chart. This pull
request merges some of the useful features Valeriano added to 
his Trino Helm chart so that they can be maintained in the community version.&lt;/p&gt;

&lt;p&gt;Valeriano’s Trino Helm Chart: &lt;a href=&quot;https://artifacthub.io/packages/helm/valeriano-manassero/trino&quot;&gt;https://artifacthub.io/packages/helm/valeriano-manassero/trino&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It hasn’t been merged yet but we are really looking forward to seeing this get
merged in. Thanks Valeriano!&lt;/p&gt;

&lt;h2 id=&quot;demo-running-the-trino-charts-with-kubectl&quot;&gt;Demo: Running the Trino charts with kubectl&lt;/h2&gt;

&lt;p&gt;For this week’s demo, you need to install &lt;a href=&quot;https://kubernetes.io/docs/tasks/tools/&quot;&gt;kubectl&lt;/a&gt;,
&lt;a href=&quot;https://minikube.sigs.k8s.io/docs/start/&quot;&gt;minikube&lt;/a&gt; using the &lt;a href=&quot;https://minikube.sigs.k8s.io/docs/drivers/docker/&quot;&gt;docker driver&lt;/a&gt;,
and &lt;a href=&quot;https://helm.sh/docs/intro/install/&quot;&gt;helm&lt;/a&gt;. You can find the Trino Helm 
chart on ArtifactHub at this URL.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://artifacthub.io/packages/helm/trino/trino&quot;&gt;https://artifacthub.io/packages/helm/trino/trino&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First, start your minikube instance.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;minikube start --driver=docker
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now take a quick look at the state of your k8s cluster.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl get all
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Add the template for the different trino catalogs on coordinators and workers.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl apply -f - &amp;lt;&amp;lt;EOF
# Source: trino/templates/configmap-catalog.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcb-trino-catalog
  labels:
    app: trino
    chart: trino-0.2.0
    release: tcb
    heritage: Helm
    role: catalogs
data:
  tpch.properties: |
    connector.name=tpch
    tpch.splits-per-node=4
  tpcds.properties: |
    connector.name=tpcds
    tpcds.splits-per-node=4
EOF
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Add the template for a single coordinator configuration.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl apply -f - &amp;lt;&amp;lt;EOF
# Source: trino/templates/configmap-coordinator.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcb-trino-coordinator
  labels:
    app: trino
    chart: trino-0.2.0
    release: tcb
    heritage: Helm
    component: coordinator
data:
  node.properties: |
    node.environment=production
    node.data-dir=/data/trino
    plugin.dir=/usr/lib/trino/plugin

  jvm.config: |
    -server
    -Xmx8G
    -XX:+UseG1GC
    -XX:G1HeapRegionSize=32M
    -XX:+UseGCOverheadLimit
    -XX:+ExplicitGCInvokesConcurrent
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:+ExitOnOutOfMemoryError
    -Djdk.attach.allowAttachSelf=true
    -XX:-UseBiasedLocking
    -XX:ReservedCodeCacheSize=512M
    -XX:PerMethodRecompilationCutoff=10000
    -XX:PerBytecodeRecompilationCutoff=10000
    -Djdk.nio.maxCachedBufferSize=2000000

  config.properties: |
    coordinator=true
    node-scheduler.include-coordinator=true
    http-server.http.port=8080
    query.max-memory=4GB
    query.max-memory-per-node=1GB
    query.max-total-memory-per-node=2GB
    memory.heap-headroom-per-node=1GB
    discovery-server.enabled=true
    discovery.uri=http://localhost:8080

  log.properties: |
    io.trino=INFO
EOF
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Add the tcb-trino service definition to run Trino.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl apply -f - &amp;lt;&amp;lt;EOF
# Source: trino/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: tcb-trino
  labels:
    app: trino
    chart: trino-0.2.0
    release: tcb
    heritage: Helm
spec:
  type: ClusterIP
  ports:
    - port: 8080
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app: trino
    release: tcb
    component: coordinator
EOF
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Add the deployment definition for the service.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl apply -f - &amp;lt;&amp;lt;EOF
# Source: trino/templates/deployment-coordinator.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tcb-trino-coordinator
  labels:
    app: trino
    chart: trino-0.2.0
    release: tcb
    heritage: Helm
    component: coordinator
spec:
  selector:
    matchLabels:
      app: trino
      release: tcb
      component: coordinator
  template:
    metadata:
      labels:
        app: trino
        release: tcb
        component: coordinator
    spec:
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
      volumes:
        - name: config-volume
          configMap:
            name: tcb-trino-coordinator
        - name: catalog-volume
          configMap:
            name: tcb-trino-catalog
      imagePullSecrets:
        - name: registry-credentials
      containers:
        - name: trino-coordinator
          image: &quot;trinodb/trino:latest&quot;
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - mountPath: /etc/trino
              name: config-volume
            - mountPath: /etc/trino/catalog
              name: catalog-volume
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /v1/info
              port: http
          readinessProbe:
            httpGet:
              path: /v1/info
              port: http
          resources:
            {}
EOF
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now check the state of the k8s cluster again.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl get all
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Run the following command to expose the URL and port of the service on the local system.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;minikube service tcb-trino --url
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Clean up all the resources.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl delete pod --all
kubectl delete replicaset --all
kubectl delete service tcb-trino
kubectl delete deployment tcb-trino-coordinator
kubectl delete configmap --all
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now you can run the same demo using the helm chart which includes all of these
templates out-of-the-box. First add the trino helm chart, check the templates
that are produced by helm, and run the install.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# HELM DEMO

helm repo add trino https://trinodb.github.io/charts/

helm template tcb trino/trino --version 0.2.0

helm install tcb trino/trino --version 0.2.0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now that it’s installed, run the same command to expose the url of the service.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;minikube service tcb-trino --url
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Clean up all the resources.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;minikube delete
helm repo remove trino
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Trino Summit is moving to 100% virtual: &lt;a href=&quot;https://www.starburst.io/info/trinosummit/&quot;&gt;register here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>This is the first episode in a series where we cover the basics and just enough advanced Kubernetes features and information to understand how to deploy Trino on Kubernetes.</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino on ice IV: Deep dive into Iceberg internals</title>
      <link href="https://trino.io/blog/2021/08/12/deep-dive-into-iceberg-internals.html" rel="alternate" type="text/html" title="Trino on ice IV: Deep dive into Iceberg internals" />
      <published>2021-08-12T00:00:00+00:00</published>
      <updated>2021-08-12T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/08/12/deep-dive-into-iceberg-internals</id>
      <content type="html" xml:base="https://trino.io/blog/2021/08/12/deep-dive-into-iceberg-internals.html">&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;/assets/blog/trino-on-ice/trino-iceberg.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Welcome to the Trino on ice series, covering the details around how the Iceberg
table format works with the Trino query engine. The examples build on each
previous post, so it’s recommended to read the posts sequentially and reference
them as needed later. Here are links to the posts in this series:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;Trino on ice I: A gentle introduction to Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html&quot;&gt;Trino on ice II: In-place table evolution and cloud compatibility with Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/08/12/deep-dive-into-iceberg-internals.html&quot;&gt;Trino on ice IV: Deep dive into Iceberg internals&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So far, this series has covered some very interesting user level concepts of the
Iceberg model, and how you can take advantage of them using the Trino query 
engine. This blog post dives into some implementation details of Iceberg by 
dissecting some of the files that result from various operations carried out
using Trino. To dissect them you use some surgical instrumentation: Trino, Avro
tools, the MinIO client tool, and Iceberg’s core library. Dissecting these files
is useful not only to understand how Iceberg works, but also to aid in
troubleshooting, should you hit issues during ingestion or querying of your
Iceberg table. I like to think of this type of debugging much like a fun game of
Operation, where you’re looking to see what causes the red errors to fly by on
your screen.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-on-ice/operation.gif&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;understanding-iceberg-metadata&quot;&gt;Understanding Iceberg metadata&lt;/h2&gt;

&lt;p&gt;Iceberg can use any compatible metastore, but the Trino Iceberg connector only
supports the Hive metastore and AWS Glue, just like the Hive connector. This is
because there is already a vast amount of testing and support for the Hive
metastore in Trino. Likewise, many Trino use cases that currently run on data
lakes already use the Hive connector, and therefore the Hive metastore. This
makes it the natural leading supported use case, as existing users can easily
migrate from Hive to Iceberg tables. Since the diagram of the Hive connector
architecture gives no indication of which connector is actually executing, it
serves as a diagram for both Hive and Iceberg. The only difference is the
connector used; if you create a table in Hive, you can 
view the same table in Iceberg.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-on-ice/iceberg-metadata.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;To recap the steps taken in the first three posts: the first post created an
events table, and the first two posts ran two insert statements. The first
insert contained three records, while the second insert contained a single
record.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-on-ice/iceberg-snapshot-files.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Up until this point, the state of the files in MinIO hasn’t really been shown,
except for some of the manifest list pointers from the snapshot in the third blog
post. Using the &lt;a href=&quot;https://docs.min.io/minio/baremetal/reference/minio-cli/minio-mc.html&quot;&gt;MinIO client tool&lt;/a&gt;,
you can list files that Iceberg generated through all these operations and then
try to understand what purpose they are serving.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;% mc tree -f local/
local/
└─ iceberg
   └─ logging.db
      └─ events
         ├─ data
         │  ├─ event_time_day=2021-04-01
         │  │  ├─ 51eb1ea6-266b-490f-8bca-c63391f02d10.orc
         │  │  └─ cbcf052d-240d-4881-8a68-2bbc0f7e5233.orc
         │  └─ event_time_day=2021-04-02
         │     └─ b012ec20-bbdd-47f5-89d3-57b9e32ea9eb.orc
         └─ metadata
            ├─ 00000-c5cfaab4-f82f-4351-b2a5-bd0e241f84bc.metadata.json
            ├─ 00001-27c8c2d1-fdbb-429d-9263-3654d818250e.metadata.json
            ├─ 00002-33d69acc-94cb-44bc-b2a1-71120e749d9a.metadata.json
            ├─ 23cc980c-9570-42ed-85cf-8658fda2727d-m0.avro
            ├─ 92382234-a4a6-4a1b-bc9b-24839472c2f6-m0.avro
            ├─ snap-2720489016575682283-1-92382234-a4a6-4a1b-bc9b-24839472c2f6.avro
            ├─ snap-4564366177504223943-1-23cc980c-9570-42ed-85cf-8658fda2727d.avro
            └─ snap-6967685587675910019-1-bcbe9133-c51c-42a9-9c73-f5b745702cb0.avro
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There are a lot of files here, but there are a few patterns you can observe.&lt;/p&gt;

&lt;p&gt;First, the top two directories are named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;metadata&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/&amp;lt;bucket&amp;gt;/&amp;lt;database&amp;gt;/&amp;lt;table&amp;gt;/data/&lt;/code&gt;&lt;br /&gt;
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/&amp;lt;bucket&amp;gt;/&amp;lt;database&amp;gt;/&amp;lt;table&amp;gt;/metadata/&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;As you might expect, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data&lt;/code&gt; contains the actual ORC files split by partition.
This is akin to what you would see in a Hive table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data&lt;/code&gt; directory. What is
really of interest here is the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;metadata&lt;/code&gt; directory. There are specifically
three patterns of files you’ll find here.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/&amp;lt;bucket&amp;gt;/&amp;lt;database&amp;gt;/&amp;lt;table&amp;gt;/metadata/&amp;lt;file-id&amp;gt;.avro&lt;/code&gt;&lt;br /&gt;
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/&amp;lt;bucket&amp;gt;/&amp;lt;database&amp;gt;/&amp;lt;table&amp;gt;/metadata/snap-&amp;lt;snapshot-id&amp;gt;-&amp;lt;version&amp;gt;-&amp;lt;file-id&amp;gt;.avro&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/&amp;lt;bucket&amp;gt;/&amp;lt;database&amp;gt;/&amp;lt;table&amp;gt;/metadata/&amp;lt;version&amp;gt;-&amp;lt;commit-UUID&amp;gt;.metadata.json&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Iceberg maintains a persistent tree structure that manages the snapshots
created for every mutation of the data. This enables not only a concurrency
model that supports serializable isolation, but also cool features like time
travel across a linear progression of snapshots.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-on-ice/iceberg-metastore-files.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This tree structure contains two types of Avro files, manifest lists and
manifest files. Manifest list files contain pointers to various manifest files
and the manifest files themselves point to various data files. This post starts
out by covering these manifest files, and later covers the table metadata files
that are suffixed by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.metadata.json&lt;/code&gt;.&lt;/p&gt;
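&lt;p&gt;To make that traversal concrete, here is a minimal Python sketch. The
dictionaries are toy stand-ins for already-parsed Avro records, and the paths
are made up; only the field names (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;manifest_path&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data_file.file_path&lt;/code&gt;) come from the file formats examined in this post.&lt;/p&gt;

```python
# Sketch: walking Iceberg's persistent tree from one snapshot down to its
# data files. Each manifest-list entry points at a manifest file, and each
# manifest entry points at a data file.

def data_files_for_snapshot(manifest_list, manifests):
    """Collect every data file path reachable from one snapshot."""
    paths = []
    for entry in manifest_list:                            # manifest list level
        for record in manifests[entry["manifest_path"]]:   # manifest level
            paths.append(record["data_file"]["file_path"])
    return paths

# Toy data modeled on the second snapshot in this post (two data files).
manifest_list = [{"manifest_path": "metadata/example-m0.avro"}]
manifests = {
    "metadata/example-m0.avro": [
        {"data_file": {"file_path": "data/event_time_day=2021-04-01/a.orc"}},
        {"data_file": {"file_path": "data/event_time_day=2021-04-02/b.orc"}},
    ],
}

print(data_files_for_snapshot(manifest_list, manifests))
```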

&lt;p&gt;&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;The last blog covered&lt;/a&gt;
the command in Trino that shows the snapshot information that is stored in the
metastore. Here is that command and its output again for your review.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT manifest_list 
FROM iceberg.logging.&quot;events$snapshots&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;snapshots&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;s3a://iceberg/logging.db/events/metadata/snap-6967685587675910019-1-bcbe9133-c51c-42a9-9c73-f5b745702cb0.avro&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;s3a://iceberg/logging.db/events/metadata/snap-2720489016575682283-1-92382234-a4a6-4a1b-bc9b-24839472c2f6.avro&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;s3a://iceberg/logging.db/events/metadata/snap-4564366177504223943-1-23cc980c-9570-42ed-85cf-8658fda2727d.avro&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;You’ll notice that the query returns the Avro files prefixed with
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;snap-&lt;/code&gt;. These files correspond directly to the snapshot
records stored in the metastore. As the diagram above shows, each snapshot
record in the metastore contains the URL of its manifest list Avro file. Avro
files are binary, so they are not something you can just open up in a text
editor to read. Using the
&lt;a href=&quot;https://downloads.apache.org/avro/avro-1.10.2/java/avro-tools-1.10.2.jar&quot;&gt;avro-tools.jar tool&lt;/a&gt;
distributed by the 
&lt;a href=&quot;https://avro.apache.org/docs/current/index.html&quot;&gt;Apache Avro project&lt;/a&gt;,
you can inspect the contents of these files to get a better understanding
of how Iceberg uses them.&lt;/p&gt;

&lt;p&gt;The first snapshot is generated on the creation of the events table. To
investigate the snapshots, first download the Avro files to your local
filesystem; let’s move them to the home directory. Upon inspecting this first
file, you notice that it contains no records: avro-tools prints only a newline,
which the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jq&lt;/code&gt; JSON command line utility then strips when pretty
printing. This snapshot represents the empty state of the table upon
creation.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;% java -jar  ~/Desktop/avro_files/avro-tools-1.10.0.jar tojson ~/snap-6967685587675910019-1-bcbe9133-c51c-42a9-9c73-f5b745702cb0.avro | jq .
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result (empty):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The second snapshot is a little more interesting and actually shows us the 
contents of a manifest list.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;% java -jar  ~/Desktop/avro_files/avro-tools-1.10.0.jar tojson ~/snap-2720489016575682283-1-92382234-a4a6-4a1b-bc9b-24839472c2f6.avro | jq .
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;{
   &quot;manifest_path&quot;:&quot;s3a://iceberg/logging.db/events/metadata/92382234-a4a6-4a1b-bc9b-24839472c2f6-m0.avro&quot;,
   &quot;manifest_length&quot;:6114,
   &quot;partition_spec_id&quot;:0,
   &quot;added_snapshot_id&quot;:{
      &quot;long&quot;:2720489016575682000
   },
   &quot;added_data_files_count&quot;:{
      &quot;int&quot;:2
   },
   &quot;existing_data_files_count&quot;:{
      &quot;int&quot;:0
   },
   &quot;deleted_data_files_count&quot;:{
      &quot;int&quot;:0
   },
   &quot;partitions&quot;:{
      &quot;array&quot;:[
         {
            &quot;contains_null&quot;:false,
            &quot;lower_bound&quot;:{
               &quot;bytes&quot;:&quot;\u001eI\u0000\u0000&quot;
            },
            &quot;upper_bound&quot;:{
               &quot;bytes&quot;:&quot;\u001fI\u0000\u0000&quot;
            }
         }
      ]
   },
   &quot;added_rows_count&quot;:{
      &quot;long&quot;:3
   },
   &quot;existing_rows_count&quot;:{
      &quot;long&quot;:0
   },
   &quot;deleted_rows_count&quot;:{
      &quot;long&quot;:0
   }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To understand each of the values in each of these rows, you can refer to the 
Iceberg 
&lt;a href=&quot;https://iceberg.apache.org/spec/#manifest-lists&quot;&gt;specification in the manifest list file section&lt;/a&gt;.
Instead of covering these exhaustively, let’s focus on a few key fields. Below
are those fields and their definitions according to the specification.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;manifest_path&lt;/code&gt; - Location of the manifest file.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;partition_spec_id&lt;/code&gt; - ID of a partition spec used to write the manifest; must
be listed in table metadata partition-specs.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;added_snapshot_id&lt;/code&gt; - ID of the snapshot where the manifest file was added.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;partitions&lt;/code&gt; - A list of field summaries for each partition field in the spec.
Each field in the list corresponds to a field in the manifest file’s partition
spec.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;added_rows_count&lt;/code&gt; - Number of rows in all files in the manifest that have
status ADDED, when null this is assumed to be non-zero.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As mentioned above, manifest lists hold references to various manifest files.
These manifest paths are the pointers in the persistent tree that tell any
client using Iceberg where to find all of the manifest files associated with a
particular snapshot. To traverse the tree, you iterate over the manifest paths
to locate every manifest file belonging to the snapshot you want to inspect.
The partition spec id tells you which partition specification was used to write
the manifest; the specs themselves are stored in the table metadata in the
metastore. The added snapshot id tells you which snapshot the manifest file is
associated with. Partitions hold high-level partition bound information to make
queries faster: if a query is looking for a particular value, only the manifest
files whose bounds contain that value need to be traversed. Finally, you get a
few metrics, such as the number of changed rows and data files, one of which is
the count of added rows. The first operation inserted three rows and the second
operation inserted one row, so using the row counts you can easily determine
which manifest file belongs to which operation.&lt;/p&gt;
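&lt;p&gt;As an aside, those partition bound bytes are day-transform values: 4-byte
little-endian integers counting days since 1970-01-01, following the
single-value serialization in the Iceberg specification. A quick Python sketch
decodes the bytes shown in the manifest list above:&lt;/p&gt;

```python
from datetime import date, timedelta

def decode_day_bound(raw):
    """Decode a day-transform partition bound: a 4-byte little-endian
    integer counting days since the Unix epoch (1970-01-01)."""
    days = int.from_bytes(raw, "little", signed=True)
    return date(1970, 1, 1) + timedelta(days=days)

# The bounds shown above: "\u001eI\u0000\u0000" and "\u001fI\u0000\u0000".
lower = decode_day_bound(b"\x1eI\x00\x00")  # 0x0000491e = 18718
upper = decode_day_bound(b"\x1fI\x00\x00")  # 0x0000491f = 18719

print(lower, upper)  # prints: 2021-04-01 2021-04-02
```

&lt;p&gt;The decoded dates match the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time_day&lt;/code&gt; partition directories under
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data/&lt;/code&gt;.&lt;/p&gt;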

&lt;p&gt;The following command shows the final snapshot after both operations executed
and filters out only the fields pointed out above.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;% java -jar  ~/Desktop/avro_files/avro-tools-1.10.0.jar tojson ~/snap-4564366177504223943-1-23cc980c-9570-42ed-85cf-8658fda2727d.avro | jq &apos;. | {manifest_path: .manifest_path, partition_spec_id: .partition_spec_id, added_snapshot_id: .added_snapshot_id, partitions: .partitions, added_rows_count: .added_rows_count }&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;{
   &quot;manifest_path&quot;:&quot;s3a://iceberg/logging.db/events/metadata/23cc980c-9570-42ed-85cf-8658fda2727d-m0.avro&quot;,
   &quot;partition_spec_id&quot;:0,
   &quot;added_snapshot_id&quot;:{
      &quot;long&quot;:4564366177504223700
   },
   &quot;partitions&quot;:{
      &quot;array&quot;:[
         {
            &quot;contains_null&quot;:false,
            &quot;lower_bound&quot;:{
               &quot;bytes&quot;:&quot;\u001eI\u0000\u0000&quot;
            },
            &quot;upper_bound&quot;:{
               &quot;bytes&quot;:&quot;\u001eI\u0000\u0000&quot;
            }
         }
      ]
   },
   &quot;added_rows_count&quot;:{
      &quot;long&quot;:1
   }
}
{
   &quot;manifest_path&quot;:&quot;s3a://iceberg/logging.db/events/metadata/92382234-a4a6-4a1b-bc9b-24839472c2f6-m0.avro&quot;,
   &quot;partition_spec_id&quot;:0,
   &quot;added_snapshot_id&quot;:{
      &quot;long&quot;:2720489016575682000
   },
   &quot;partitions&quot;:{
      &quot;array&quot;:[
         {
            &quot;contains_null&quot;:false,
            &quot;lower_bound&quot;:{
               &quot;bytes&quot;:&quot;\u001eI\u0000\u0000&quot;
            },
            &quot;upper_bound&quot;:{
               &quot;bytes&quot;:&quot;\u001fI\u0000\u0000&quot;
            }
         }
      ]
   },
   &quot;added_rows_count&quot;:{
      &quot;long&quot;:3
   }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In the listing of the manifest files related to the last snapshot, you notice
that the first operation, which inserted three rows, is represented by the
manifest file in the second JSON object. You can determine this from the
snapshot id, as well as from the number of rows added in the operation. The
first JSON object corresponds to the last operation, which inserted a single
row. In other words, manifests are listed in reverse commit order, with the
most recent operation first.&lt;/p&gt;
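&lt;p&gt;That matching can be done mechanically. Here is a small Python sketch that
treats the two JSON objects above as parsed dictionaries; the shortened
manifest paths are placeholders:&lt;/p&gt;

```python
# Match each manifest-list entry to an operation by its added row count.
entries = [
    {"manifest_path": "metadata/23cc980c-m0.avro", "added_rows_count": {"long": 1}},
    {"manifest_path": "metadata/92382234-m0.avro", "added_rows_count": {"long": 3}},
]

def manifests_adding_rows(entries, rows):
    """Return the manifest paths whose operation added exactly `rows` rows."""
    return [e["manifest_path"] for e in entries
            if e["added_rows_count"]["long"] == rows]

print(manifests_adding_rows(entries, 3))  # the three-row INSERT's manifest
```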

&lt;p&gt;The next command performs the same kind of listing you ran on the manifest
list, except this time on the manifest files themselves, to expose their
contents. To begin with, run the command to show the contents of the manifest
file associated with the insertion of three rows.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;% java -jar  ~/avro-tools-1.10.0.jar tojson ~/Desktop/avro_files/92382234-a4a6-4a1b-bc9b-24839472c2f6-m0.avro | jq .
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;{
   &quot;status&quot;:1,
   &quot;snapshot_id&quot;:{
      &quot;long&quot;:2720489016575682000
   },
   &quot;data_file&quot;:{
      &quot;file_path&quot;:&quot;s3a://iceberg/logging.db/events/data/event_time_day=2021-04-01/51eb1ea6-266b-490f-8bca-c63391f02d10.orc&quot;,
      &quot;file_format&quot;:&quot;ORC&quot;,
      &quot;partition&quot;:{
         &quot;event_time_day&quot;:{
            &quot;int&quot;:18718
         }
      },
      &quot;record_count&quot;:1,
      &quot;file_size_in_bytes&quot;:870,
      &quot;block_size_in_bytes&quot;:67108864,
      &quot;column_sizes&quot;:null,
      &quot;value_counts&quot;:{
         &quot;array&quot;:[
            {
               &quot;key&quot;:1,
               &quot;value&quot;:1
            },
            {
               &quot;key&quot;:2,
               &quot;value&quot;:1
            },
            {
               &quot;key&quot;:3,
               &quot;value&quot;:1
            },
            {
               &quot;key&quot;:4,
               &quot;value&quot;:1
            }
         ]
      },
      &quot;null_value_counts&quot;:{
         &quot;array&quot;:[
            {
               &quot;key&quot;:1,
               &quot;value&quot;:0
            },
            {
               &quot;key&quot;:2,
               &quot;value&quot;:0
            },
            {
               &quot;key&quot;:3,
               &quot;value&quot;:0
            },
            {
               &quot;key&quot;:4,
               &quot;value&quot;:0
            }
         ]
      },
      &quot;nan_value_counts&quot;:null,
      &quot;lower_bounds&quot;:{
         &quot;array&quot;:[
            {
               &quot;key&quot;:1,
               &quot;value&quot;:&quot;ERROR&quot;
            },
            {
               &quot;key&quot;:3,
               &quot;value&quot;:&quot;Oh noes&quot;
            }
         ]
      },
      &quot;upper_bounds&quot;:{
         &quot;array&quot;:[
            {
               &quot;key&quot;:1,
               &quot;value&quot;:&quot;ERROR&quot;
            },
            {
               &quot;key&quot;:3,
               &quot;value&quot;:&quot;Oh noes&quot;
            }
         ]
      },
      &quot;key_metadata&quot;:null,
      &quot;split_offsets&quot;:null
   }
}
{
   &quot;status&quot;:1,
   &quot;snapshot_id&quot;:{
      &quot;long&quot;:2720489016575682000
   },
   &quot;data_file&quot;:{
      &quot;file_path&quot;:&quot;s3a://iceberg/logging.db/events/data/event_time_day=2021-04-02/b012ec20-bbdd-47f5-89d3-57b9e32ea9eb.orc&quot;,
      &quot;file_format&quot;:&quot;ORC&quot;,
      &quot;partition&quot;:{
         &quot;event_time_day&quot;:{
            &quot;int&quot;:18719
         }
      },
      &quot;record_count&quot;:2,
      &quot;file_size_in_bytes&quot;:1084,
      &quot;block_size_in_bytes&quot;:67108864,
      &quot;column_sizes&quot;:null,
      &quot;value_counts&quot;:{
         &quot;array&quot;:[
            {
               &quot;key&quot;:1,
               &quot;value&quot;:2
            },
            {
               &quot;key&quot;:2,
               &quot;value&quot;:2
            },
            {
               &quot;key&quot;:3,
               &quot;value&quot;:2
            },
            {
               &quot;key&quot;:4,
               &quot;value&quot;:2
            }
         ]
      },
      &quot;null_value_counts&quot;:{
         &quot;array&quot;:[
            {
               &quot;key&quot;:1,
               &quot;value&quot;:0
            },
            {
               &quot;key&quot;:2,
               &quot;value&quot;:0
            },
            {
               &quot;key&quot;:3,
               &quot;value&quot;:0
            },
            {
               &quot;key&quot;:4,
               &quot;value&quot;:0
            }
         ]
      },
      &quot;nan_value_counts&quot;:null,
      &quot;lower_bounds&quot;:{
         &quot;array&quot;:[
            {
               &quot;key&quot;:1,
               &quot;value&quot;:&quot;ERROR&quot;
            },
            {
               &quot;key&quot;:3,
               &quot;value&quot;:&quot;Double oh noes&quot;
            }
         ]
      },
      &quot;upper_bounds&quot;:{
         &quot;array&quot;:[
            {
               &quot;key&quot;:1,
               &quot;value&quot;:&quot;WARN&quot;
            },
            {
               &quot;key&quot;:3,
               &quot;value&quot;:&quot;Maybeh oh noes?&quot;
            }
         ]
      },
      &quot;key_metadata&quot;:null,
      &quot;split_offsets&quot;:null
   }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now this is a very big output, but in summary, there’s really not too much to
these files. As before, there is a 
&lt;a href=&quot;https://iceberg.apache.org/spec/#manifests&quot;&gt;Manifest section in the Iceberg spec&lt;/a&gt;
that details what each of these fields means. Here are the important fields:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;snapshot_id&lt;/code&gt; - Snapshot id where the file was added, or deleted if status is
two. Inherited when null.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data_file&lt;/code&gt; - Field containing metadata about the data files pertaining to the
manifest file, such as file path, partition tuple, metrics, etc…&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data_file.file_path&lt;/code&gt; - Full URI for the file with FS scheme.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data_file.partition&lt;/code&gt; - Partition data tuple, schema based on the partition
spec.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data_file.record_count&lt;/code&gt; - Number of records in the data file.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data_file.*_count&lt;/code&gt; - Multiple fields that map a column id to the
number of values, nulls, or NaNs in the file. These can be used to quickly
filter out unnecessary read operations.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data_file.*_bounds&lt;/code&gt; - Multiple fields that map a column id to the
lower or upper bound of the column, serialized as binary. The lower bound must
be less than or equal to, and the upper bound greater than or equal to, all
non-null, non-NaN values in the column for the file.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each manifest entry contains the partition and the data file that it maps to.
These files are only scanned if the query criteria are met when checking the
counts, bounds, and other statistics recorded in the file. Ideally, only files
that contain data relevant to the query are scanned at all. Information like
the record count may also help the query planner determine splits and other
details. This particular optimization hasn’t been completed yet, as planning
typically happens before the files are traversed. It is still under discussion
and
&lt;a href=&quot;https://youtu.be/ifXpOn0NJWk?t=2132&quot;&gt;is discussed a bit by Iceberg creator Ryan Blue in a recent meetup&lt;/a&gt;.
If this is something you are interested in, keep posted on the Slack channel and
releases as the Trino Iceberg connector progresses in this area.&lt;/p&gt;
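&lt;p&gt;As an illustration of that pruning logic, here is a simplified Python sketch
of an equality check against the string bounds shown above. A real engine also
consults the null and NaN counts and handles truncated bounds; this only shows
the core min/max comparison.&lt;/p&gt;

```python
# Min/max pruning sketch: a data file can only match `column = value` if the
# value falls between the file's lower and upper bounds for that column.

def may_contain(file_stats, column_id, value):
    lower = file_stats["lower_bounds"][column_id]
    upper = file_stats["upper_bounds"][column_id]
    return value >= lower and upper >= value

# Bounds from the two data files above (column id 1 is the `level` column).
file_a = {"lower_bounds": {1: "ERROR"}, "upper_bounds": {1: "ERROR"}}
file_b = {"lower_bounds": {1: "ERROR"}, "upper_bounds": {1: "WARN"}}

# A query for level = 'INFO' can skip file_a entirely.
print(may_contain(file_a, 1, "INFO"), may_contain(file_b, 1, "INFO"))  # prints: False True
```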

&lt;p&gt;As mentioned above, the last set of files that you find in the metadata
directory are suffixed with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.metadata.json&lt;/code&gt;. These files may seem
a bit strange at first, as they are stored in the JSON format rather than Avro.
This is because they are not part of the persistent tree structure; they are
essentially a copy of the table metadata that is stored in the metastore. You
can find the fields for the table metadata listed
&lt;a href=&quot;https://iceberg.apache.org/spec/#table-metadata-fields&quot;&gt;in the Iceberg specification&lt;/a&gt;.
This metadata is typically stored persistently in a metastore, much like the
Hive metastore, but it could be backed by any datastore that supports
&lt;a href=&quot;https://iceberg.apache.org/spec/#metastore-tables&quot;&gt;an atomic swap (check-and-put) operation&lt;/a&gt;,
which Iceberg requires for its optimistic concurrency model.&lt;/p&gt;

&lt;p&gt;The naming of the table metadata includes a table version and UUID: 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;table-version&amp;gt;-&amp;lt;UUID&amp;gt;.metadata.json&lt;/code&gt;. To commit a new metadata version, which
just adds 1 to the current version number, the writer performs these steps:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;It creates a new table metadata file using the current metadata.&lt;/li&gt;
  &lt;li&gt;It writes the new table metadata to a file following the naming with the next
version number.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;It requests the metastore swap the table’s metadata pointer from the old
location to the new location.&lt;/p&gt;

    &lt;ol&gt;
      &lt;li&gt;If the swap succeeds, the commit succeeded. The new file is now the 
 current metadata.&lt;/li&gt;
      &lt;li&gt;If the swap fails, another writer has already committed its own new
 version. The current writer goes back to step 1.&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
&lt;/ol&gt;
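&lt;p&gt;The steps above amount to a compare-and-swap loop. The Python sketch below
uses an in-memory stand-in for the metastore pointer and made-up file names; it
illustrates the protocol, not the connector’s actual implementation.&lt;/p&gt;

```python
# Sketch of Iceberg's optimistic commit: swap the metadata pointer only if it
# still points at the version the new metadata was based on.

class Metastore:
    """In-memory stand-in for a metastore holding one metadata pointer."""
    def __init__(self, location):
        self.metadata_location = location

    def check_and_put(self, expected, new):
        """Atomically swap the pointer if it still equals `expected`."""
        if self.metadata_location == expected:
            self.metadata_location = new
            return True
        return False

def commit(store, write_new_metadata):
    while True:
        base = store.metadata_location      # step 1: read current metadata
        new = write_new_metadata(base)      # step 2: write the next version
        if store.check_and_put(base, new):  # step 3: request the swap
            return new                      # success: new file is current
        # swap failed: another writer won the race; retry from step 1

store = Metastore("00001-aaaa.metadata.json")
committed = commit(store, lambda base: "00002-bbbb.metadata.json")
print(committed)
```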

&lt;p&gt;If you want to see where this is stored in the Hive metastore, you can reference
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TABLE_PARAMS&lt;/code&gt; table. At the time of writing, this is the only method of
using the metastore that is supported by the Trino Iceberg connector.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT PARAM_KEY, PARAM_VALUE
FROM metastore.TABLE_PARAMS;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;PARAM_KEY                &lt;/th&gt;
      &lt;th&gt;PARAM_VALUE                                                                                     &lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;EXTERNAL                 &lt;/td&gt;
      &lt;td&gt;TRUE                                                                                            &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;metadata_location        &lt;/td&gt;
      &lt;td&gt;s3a://iceberg/logging.db/events/metadata/00002-33d69acc-94cb-44bc-b2a1-71120e749d9a.metadata.json&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;numFiles                 &lt;/td&gt;
      &lt;td&gt;2                                                                                               &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;previous_metadata_location&lt;/td&gt;
      &lt;td&gt;s3a://iceberg/logging.db/events/metadata/00001-27c8c2d1-fdbb-429d-9263-3654d818250e.metadata.json&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;table_type               &lt;/td&gt;
      &lt;td&gt;iceberg                                                                                         &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;totalSize                &lt;/td&gt;
      &lt;td&gt;5323                                                                                            &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;transient_lastDdlTime    &lt;/td&gt;
      &lt;td&gt;1622865672                                                                                      &lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;So as you can see, the metastore is saying the current metadata location is the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;00002-33d69acc-94cb-44bc-b2a1-71120e749d9a.metadata.json&lt;/code&gt; file. Now you can
dive in to see the table metadata that is being used by the Iceberg connector.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;% cat ~/Desktop/avro_files/00002-33d69acc-94cb-44bc-b2a1-71120e749d9a.metadata.json
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;{
   &quot;format-version&quot;:1,
   &quot;table-uuid&quot;:&quot;32e3c271-84a9-4be5-9342-2148c878227a&quot;,
   &quot;location&quot;:&quot;s3a://iceberg/logging.db/events&quot;,
   &quot;last-updated-ms&quot;:1622865686323,
   &quot;last-column-id&quot;:5,
   &quot;schema&quot;:{
      &quot;type&quot;:&quot;struct&quot;,
      &quot;fields&quot;:[
         {
            &quot;id&quot;:1,
            &quot;name&quot;:&quot;level&quot;,
            &quot;required&quot;:false,
            &quot;type&quot;:&quot;string&quot;
         },
         {
            &quot;id&quot;:2,
            &quot;name&quot;:&quot;event_time&quot;,
            &quot;required&quot;:false,
            &quot;type&quot;:&quot;timestamp&quot;
         },
         {
            &quot;id&quot;:3,
            &quot;name&quot;:&quot;message&quot;,
            &quot;required&quot;:false,
            &quot;type&quot;:&quot;string&quot;
         },
         {
            &quot;id&quot;:4,
            &quot;name&quot;:&quot;call_stack&quot;,
            &quot;required&quot;:false,
            &quot;type&quot;:{
               &quot;type&quot;:&quot;list&quot;,
               &quot;element-id&quot;:5,
               &quot;element&quot;:&quot;string&quot;,
               &quot;element-required&quot;:false
            }
         }
      ]
   },
   &quot;partition-spec&quot;:[
      {
         &quot;name&quot;:&quot;event_time_day&quot;,
         &quot;transform&quot;:&quot;day&quot;,
         &quot;source-id&quot;:2,
         &quot;field-id&quot;:1000
      }
   ],
   &quot;default-spec-id&quot;:0,
   &quot;partition-specs&quot;:[
      {
         &quot;spec-id&quot;:0,
         &quot;fields&quot;:[
            {
               &quot;name&quot;:&quot;event_time_day&quot;,
               &quot;transform&quot;:&quot;day&quot;,
               &quot;source-id&quot;:2,
               &quot;field-id&quot;:1000
            }
         ]
      }
   ],
   &quot;default-sort-order-id&quot;:0,
   &quot;sort-orders&quot;:[
      {
         &quot;order-id&quot;:0,
         &quot;fields&quot;:[
            
         ]
      }
   ],
   &quot;properties&quot;:{
      &quot;write.format.default&quot;:&quot;ORC&quot;
   },
   &quot;current-snapshot-id&quot;:4564366177504223943,
   &quot;snapshots&quot;:[
      {
         &quot;snapshot-id&quot;:6967685587675910019,
         &quot;timestamp-ms&quot;:1622865672882,
         &quot;summary&quot;:{
            &quot;operation&quot;:&quot;append&quot;,
            &quot;changed-partition-count&quot;:&quot;0&quot;,
            &quot;total-records&quot;:&quot;0&quot;,
            &quot;total-data-files&quot;:&quot;0&quot;,
            &quot;total-delete-files&quot;:&quot;0&quot;,
            &quot;total-position-deletes&quot;:&quot;0&quot;,
            &quot;total-equality-deletes&quot;:&quot;0&quot;
         },
         &quot;manifest-list&quot;:&quot;s3a://iceberg/logging.db/events/metadata/snap-6967685587675910019-1-bcbe9133-c51c-42a9-9c73-f5b745702cb0.avro&quot;
      },
      {
         &quot;snapshot-id&quot;:2720489016575682283,
         &quot;parent-snapshot-id&quot;:6967685587675910019,
         &quot;timestamp-ms&quot;:1622865680419,
         &quot;summary&quot;:{
            &quot;operation&quot;:&quot;append&quot;,
            &quot;added-data-files&quot;:&quot;2&quot;,
            &quot;added-records&quot;:&quot;3&quot;,
            &quot;added-files-size&quot;:&quot;1954&quot;,
            &quot;changed-partition-count&quot;:&quot;2&quot;,
            &quot;total-records&quot;:&quot;3&quot;,
            &quot;total-data-files&quot;:&quot;2&quot;,
            &quot;total-delete-files&quot;:&quot;0&quot;,
            &quot;total-position-deletes&quot;:&quot;0&quot;,
            &quot;total-equality-deletes&quot;:&quot;0&quot;
         },
         &quot;manifest-list&quot;:&quot;s3a://iceberg/logging.db/events/metadata/snap-2720489016575682283-1-92382234-a4a6-4a1b-bc9b-24839472c2f6.avro&quot;
      },
      {
         &quot;snapshot-id&quot;:4564366177504223943,
         &quot;parent-snapshot-id&quot;:2720489016575682283,
         &quot;timestamp-ms&quot;:1622865686278,
         &quot;summary&quot;:{
            &quot;operation&quot;:&quot;append&quot;,
            &quot;added-data-files&quot;:&quot;1&quot;,
            &quot;added-records&quot;:&quot;1&quot;,
            &quot;added-files-size&quot;:&quot;746&quot;,
            &quot;changed-partition-count&quot;:&quot;1&quot;,
            &quot;total-records&quot;:&quot;4&quot;,
            &quot;total-data-files&quot;:&quot;3&quot;,
            &quot;total-delete-files&quot;:&quot;0&quot;,
            &quot;total-position-deletes&quot;:&quot;0&quot;,
            &quot;total-equality-deletes&quot;:&quot;0&quot;
         },
         &quot;manifest-list&quot;:&quot;s3a://iceberg/logging.db/events/metadata/snap-4564366177504223943-1-23cc980c-9570-42ed-85cf-8658fda2727d.avro&quot;
      }
   ],
   &quot;snapshot-log&quot;:[
      {
         &quot;timestamp-ms&quot;:1622865672882,
         &quot;snapshot-id&quot;:6967685587675910019
      },
      {
         &quot;timestamp-ms&quot;:1622865680419,
         &quot;snapshot-id&quot;:2720489016575682283
      },
      {
         &quot;timestamp-ms&quot;:1622865686278,
         &quot;snapshot-id&quot;:4564366177504223943
      }
   ],
   &quot;metadata-log&quot;:[
      {
         &quot;timestamp-ms&quot;:1622865672894,
         &quot;metadata-file&quot;:&quot;s3a://iceberg/logging.db/events/metadata/00000-c5cfaab4-f82f-4351-b2a5-bd0e241f84bc.metadata.json&quot;
      },
      {
         &quot;timestamp-ms&quot;:1622865680524,
         &quot;metadata-file&quot;:&quot;s3a://iceberg/logging.db/events/metadata/00001-27c8c2d1-fdbb-429d-9263-3654d818250e.metadata.json&quot;
      }
   ]
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As you can see, these JSON files can quickly grow as you perform updates on
your table. The file contains pointers to all of the snapshots and manifest
list files, much like the output you got from querying the snapshots table
earlier. A really important piece to note is that the schema is stored here;
this is what Trino uses for validation on inserts and reads. As you may expect,
there is also the root location of the table itself, as well as a unique table
identifier. The final part to note about this file is the partition-spec and
partition-specs fields. The partition-spec field holds the current partition
spec, while partition-specs is an array holding every partition spec that has
existed for this table. As pointed out earlier, different manifest files can
use different partition specs. That wraps up all of the metadata file types you
can expect to see in Iceberg!&lt;/p&gt;
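&lt;p&gt;Putting this together, resolving the current manifest list from the table
metadata is a small lookup. Here is a Python sketch over a copy of the JSON
above, trimmed to the relevant fields:&lt;/p&gt;

```python
import json

# Trimmed copy of the table metadata shown above: just enough to resolve the
# current snapshot to its manifest list.
metadata = json.loads("""
{
  "current-snapshot-id": 4564366177504223943,
  "snapshots": [
    {"snapshot-id": 2720489016575682283,
     "manifest-list": "s3a://iceberg/logging.db/events/metadata/snap-2720489016575682283-1-92382234-a4a6-4a1b-bc9b-24839472c2f6.avro"},
    {"snapshot-id": 4564366177504223943,
     "manifest-list": "s3a://iceberg/logging.db/events/metadata/snap-4564366177504223943-1-23cc980c-9570-42ed-85cf-8658fda2727d.avro"}
  ]
}
""")

def current_manifest_list(meta):
    """Find the manifest list of the table's current snapshot."""
    current = meta["current-snapshot-id"]
    for snap in meta["snapshots"]:
        if snap["snapshot-id"] == current:
            return snap["manifest-list"]
    raise KeyError("current snapshot not found")

print(current_manifest_list(metadata))
```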

&lt;p&gt;This post wraps up the Trino on ice series. Hopefully these blog posts serve as
a helpful introduction to what is expected to become a vital part of the open
data lakehouse stack. What are you waiting for? Come join the fun and help us
implement some of the missing features, or go ahead and try 
&lt;a href=&quot;https://github.com/bitsondatadev/trino-getting-started/tree/main/iceberg/trino-iceberg-minio&quot;&gt;Trino on Ice(berg)&lt;/a&gt;
yourself!&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Welcome to the Trino on ice series, covering the details around how the Iceberg table format works with the Trino query engine. The examples build on each previous post, so it’s recommended to read the posts sequentially and reference them as needed later. Here are links to the posts in this series: Trino on ice I: A gentle introduction to Iceberg Trino on ice II: In-place table evolution and cloud compatibility with Iceberg Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec Trino on ice IV: Deep dive into Iceberg internals So far, this series has covered some very interesting user level concepts of the Iceberg model, and how you can take advantage of them using the Trino query engine. This blog post dives into some implementation details of Iceberg by dissecting some files that result from various operations carried out using Trino. To dissect you must use some surgical instrumentation, namely Trino, Avro tools, the MinIO client tool and Iceberg’s core library. It’s useful to dissect how these files work, not only to help understand how Iceberg works, but also to aid in troubleshooting issues, should you have any issues during ingestion or querying of your Iceberg table. I like to think of this type of debugging much like a fun game of operation, and you’re looking to see what causes the red errors to fly by on your screen.</summary>

      
      
    </entry>
  
    <entry>
      <title>23: Trino looking for patterns</title>
      <link href="https://trino.io/episodes/23.html" rel="alternate" type="text/html" title="23: Trino looking for patterns" />
      <published>2021-08-02T00:00:00+00:00</published>
      <updated>2021-08-02T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/23</id>
      <content type="html" xml:base="https://trino.io/episodes/23.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;Kasia Findeisen, Software Engineer at &lt;a href=&quot;https://starburst.io/&quot;&gt;Starburst&lt;/a&gt;
 (&lt;a href=&quot;https://github.com/kasiafi&quot;&gt;@kasiafi&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-360&quot;&gt;Release 360&lt;/h2&gt;

&lt;p&gt;In the last episode we already got a glimpse of this release. Now it is officially out.&lt;/p&gt;

&lt;p&gt;Official announcement items from Martin:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Automatic configuration of TLS for internal communication.&lt;/li&gt;
  &lt;li&gt;Improved correlated subqueries with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Support for assuming an IAM role in Elasticsearch connector.&lt;/li&gt;
  &lt;li&gt;Support for Trino views in Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred’s additional notes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Documentation for materialized views SQL commands.&lt;/li&gt;
  &lt;li&gt;Partial support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; and batch insert support for various JDBC-based connectors.&lt;/li&gt;
  &lt;li&gt;A bunch of performance and correctness fixes.&lt;/li&gt;
  &lt;li&gt;Numerous improvements in the Iceberg connector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More info at &lt;a href=&quot;https://trino.io/docs/current/release/release-360.html&quot;&gt;https://trino.io/docs/current/release/release-360.html&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-row-pattern-matching-and-match_recognize&quot;&gt;Concept of the week: Row pattern matching and MATCH_RECOGNIZE&lt;/h2&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; syntax was introduced in the SQL:2016
standard. It is a powerful tool for analyzing trends in your data, and Trino
has supported it since
&lt;a href=&quot;https://trino.io/docs/current/release/release-356.html&quot;&gt;version 356&lt;/a&gt;. With
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;, you can define a pattern using the well-known regular
expression syntax and match it against a set of rows. Upon finding a matching row
sequence, you can retrieve all kinds of detailed or summary information about
the match, and pass it on to the subsequent parts of your
query. This is a new level of what a pure SQL statement can do.&lt;/p&gt;

&lt;p&gt;For more details, &lt;a href=&quot;/blog/2021/05/19/row_pattern_matching.html&quot;&gt;this blog post&lt;/a&gt; 
gives you a taste of row pattern matching capabilities, and a quick overview of 
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; syntax.&lt;/p&gt;

&lt;p&gt;Let’s look at an example with data similar to the TPC-H data, with the same
goal as in the blog post: detect a “V”-shape of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;price&lt;/code&gt;
values over time for different customers.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino&amp;gt; WITH orders(customer_id, order_date, price) AS (VALUES
    (&apos;cust_1&apos;, DATE &apos;2020-05-11&apos;, 100),
    (&apos;cust_1&apos;, DATE &apos;2020-05-12&apos;, 200),
    (&apos;cust_2&apos;, DATE &apos;2020-05-13&apos;,   8),
    (&apos;cust_1&apos;, DATE &apos;2020-05-14&apos;, 100),
    (&apos;cust_2&apos;, DATE &apos;2020-05-15&apos;,   4),
    (&apos;cust_1&apos;, DATE &apos;2020-05-16&apos;,  50),
    (&apos;cust_1&apos;, DATE &apos;2020-05-17&apos;, 100),
    (&apos;cust_2&apos;, DATE &apos;2020-05-18&apos;,   6))
SELECT customer_id, start_price, bottom_price, final_price, start_date, final_date
    FROM orders
        MATCH_RECOGNIZE (
            PARTITION BY customer_id
            ORDER BY order_date
            MEASURES
                START.price AS start_price,
                LAST(DOWN.price) AS bottom_price,
                LAST(UP.price) AS final_price,
                START.order_date AS start_date,
                LAST(UP.order_date) AS final_date
            ONE ROW PER MATCH
            AFTER MATCH SKIP PAST LAST ROW
            PATTERN (START DOWN+ UP+)
            DEFINE
                DOWN AS price &amp;lt; PREV(price),
                UP AS price &amp;gt; PREV(price)
            );

 customer_id | start_price | bottom_price | final_price | start_date | final_date
-------------+-------------+--------------+-------------+------------+------------
 cust_1      |         200 |           50 |         100 | 2020-05-12 | 2020-05-17
 cust_2      |           8 |            4 |           6 | 2020-05-13 | 2020-05-18
(2 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Two matches are detected, one for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cust_1&lt;/code&gt;, and one for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cust_2&lt;/code&gt;.&lt;/p&gt;
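To see what the engine is doing conceptually, here is a small Python sketch that mirrors the `PATTERN (START DOWN+ UP+)` logic: partition by customer, order by date, then scan for a strictly falling run followed by a strictly rising run, skipping past each match. The helper name is hypothetical, and this models only the semantics, not Trino's internals:

```python
from collections import defaultdict

# The same rows as in the VALUES clause above.
orders = [
    ("cust_1", "2020-05-11", 100),
    ("cust_1", "2020-05-12", 200),
    ("cust_2", "2020-05-13", 8),
    ("cust_1", "2020-05-14", 100),
    ("cust_2", "2020-05-15", 4),
    ("cust_1", "2020-05-16", 50),
    ("cust_1", "2020-05-17", 100),
    ("cust_2", "2020-05-18", 6),
]

def v_matches(rows):
    """Find non-overlapping START DOWN+ UP+ matches.

    rows: (date, price) tuples ordered by date. Returns tuples of
    (start_price, bottom_price, final_price, start_date, final_date).
    """
    matches, i = [], 0
    while i < len(rows) - 2:
        j = i
        while j + 1 < len(rows) and rows[j + 1][1] < rows[j][1]:
            j += 1                       # DOWN+: strictly falling run
        if j == i:                       # no falling step from this row
            i += 1
            continue
        bottom = j
        while j + 1 < len(rows) and rows[j + 1][1] > rows[j][1]:
            j += 1                       # UP+: strictly rising run
        if j == bottom:                  # fell but never rose again
            i = bottom
            continue
        matches.append((rows[i][1], rows[bottom][1], rows[j][1],
                        rows[i][0], rows[j][0]))
        i = j + 1                        # AFTER MATCH SKIP PAST LAST ROW
    return matches

by_cust = defaultdict(list)              # PARTITION BY customer_id
for cust, date, price in orders:
    by_cust[cust].append((date, price))

for cust in sorted(by_cust):
    print(cust, v_matches(sorted(by_cust[cust])))   # ORDER BY order_date
```

Running this reproduces the two matches from the query output: `(200, 50, 100, '2020-05-12', '2020-05-17')` for `cust_1` and `(8, 4, 6, '2020-05-13', '2020-05-18')` for `cust_2`.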

&lt;p&gt;The matching algorithm was a collaboration between Martin and Kasia. This 
algorithm &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/core/trino-main/src/main/java/io/trino/operator/window/matcher/Matcher.java&quot;&gt;lives in the Matcher class&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;running semantics&lt;/em&gt; is the default in both the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DEFINE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt;
clauses. Note that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FINAL&lt;/code&gt; only applies to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt; clause.&lt;/p&gt;

&lt;p&gt;To sum up, here’s one complex measure expression combining different elements
of the special syntax:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/measure-example.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-8348-document-row-pattern-recognition-in-window&quot;&gt;PR of the week: PR 8348 Document row pattern recognition in window&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/trinodb/trino/pull/8348&quot;&gt;PR of the week&lt;/a&gt; adds 
documentation for applying pattern matching over windows. This is yet another
SQL functionality that Kasia added after getting pattern recognition to work
with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id=&quot;demo-showing-match_recognize-functionality-by-example&quot;&gt;Demo: Showing MATCH_RECOGNIZE functionality by example&lt;/h2&gt;

&lt;p&gt;Here are a few examples that Kasia will be running:&lt;/p&gt;

&lt;p&gt;Demo preview:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The initial query. That’s mostly the same query that’s in the blog post, the 
differences being:
    &lt;ul&gt;
      &lt;li&gt;Usage of a real table instead of a CTE.&lt;/li&gt;
      &lt;li&gt;Additional sort key for consistent ordering.&lt;/li&gt;
      &lt;li&gt;Two more measures.&lt;/li&gt;
    &lt;/ul&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
               FROM orders
                   MATCH_RECOGNIZE (
                       PARTITION BY custkey
                       ORDER BY orderdate, orderkey
                       MEASURES
                           START.totalprice AS start_price,
                           LAST(DOWN.totalprice) AS bottom_price,
                           LAST(UP.totalprice) AS final_price,
                           START.orderdate AS start_date,
                           LAST(UP.orderdate) AS final_date,
                           MATCH_NUMBER() AS match_no,
                           CLASSIFIER() AS classy
                       ONE ROW PER MATCH
                       AFTER MATCH SKIP PAST LAST ROW
                       PATTERN (START DOWN+ UP+)
                       DEFINE
                           DOWN AS totalprice &amp;lt; PREV(totalprice),
                           UP AS totalprice &amp;gt; PREV(totalprice)
                       )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The query returns many results (many matches). Wrap it in a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;count(*)&lt;/code&gt; 
aggregation to check how many there are:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; SELECT count(*) FROM (SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
               FROM orders
                   MATCH_RECOGNIZE (
                       PARTITION BY custkey
                       ORDER BY orderdate, orderkey
                       MEASURES
                           START.totalprice AS start_price,
                           LAST(DOWN.totalprice) AS bottom_price,
                           LAST(UP.totalprice) AS final_price,
                           START.orderdate AS start_date,
                           LAST(UP.orderdate) AS final_date,
                           MATCH_NUMBER() AS match_no,
                           CLASSIFIER() AS classy
                       ONE ROW PER MATCH
                       AFTER MATCH SKIP PAST LAST ROW
                       PATTERN (START DOWN+ UP+)
                       DEFINE
                           DOWN AS totalprice &amp;lt; PREV(totalprice),
                           UP AS totalprice &amp;gt; PREV(totalprice)
                       ))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Modify the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PATTERN&lt;/code&gt; to limit the results. Now searching for a “big V”:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; SELECT count(*) FROM (SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
               FROM orders
                   MATCH_RECOGNIZE (
                       PARTITION BY custkey
                       ORDER BY orderdate, orderkey
                       MEASURES
                           START.totalprice AS start_price,
                           LAST(DOWN.totalprice) AS bottom_price,
                           LAST(UP.totalprice) AS final_price,
                           START.orderdate AS start_date,
                           LAST(UP.orderdate) AS final_date,
                           MATCH_NUMBER() AS match_no,
                           CLASSIFIER() AS classy
                       ONE ROW PER MATCH
                       AFTER MATCH SKIP PAST LAST ROW
                       PATTERN (START DOWN{3,} UP{4,})
                       DEFINE
                           DOWN AS totalprice &amp;lt; PREV(totalprice),
                           UP AS totalprice &amp;gt; PREV(totalprice)
                       ))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Unwrap from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;count(*)&lt;/code&gt; aggregation to see the actual matches:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
               FROM orders
                   MATCH_RECOGNIZE (
                       PARTITION BY custkey
                       ORDER BY orderdate, orderkey
                       MEASURES
                           START.totalprice AS start_price,
                           LAST(DOWN.totalprice) AS bottom_price,
                           LAST(UP.totalprice) AS final_price,
                           START.orderdate AS start_date,
                           LAST(UP.orderdate) AS final_date,
                           MATCH_NUMBER() AS match_no,
                           CLASSIFIER() AS classy
                       ONE ROW PER MATCH
                       AFTER MATCH SKIP PAST LAST ROW
                       PATTERN (START DOWN{3,} UP{4,})
                       DEFINE
                           DOWN AS totalprice &amp;lt; PREV(totalprice),
                           UP AS totalprice &amp;gt; PREV(totalprice)
                       )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Change &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AFTER MATCH SKIP PAST LAST ROW&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AFTER MATCH SKIP TO NEXT ROW&lt;/code&gt; to 
detect overlapping matches:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
               FROM orders
                   MATCH_RECOGNIZE (
                       PARTITION BY custkey
                       ORDER BY orderdate, orderkey
                       MEASURES
                           START.totalprice AS start_price,
                           LAST(DOWN.totalprice) AS bottom_price,
                           LAST(UP.totalprice) AS final_price,
                           START.orderdate AS start_date,
                           LAST(UP.orderdate) AS final_date,
                           MATCH_NUMBER() AS match_no,
                           CLASSIFIER() AS classy
                       ONE ROW PER MATCH
                       AFTER MATCH SKIP TO NEXT ROW
                       PATTERN (START DOWN{3,} UP{4,})
                       DEFINE
                           DOWN AS totalprice &amp;lt; PREV(totalprice),
                           UP AS totalprice &amp;gt; PREV(totalprice)
                       )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Change &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ONE ROW PER MATCH&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALL ROWS PER MATCH&lt;/code&gt; (also, revert the previous 
change). Discuss the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;classy&lt;/code&gt; column and explain the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;running&lt;/code&gt; semantics using the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;final_date&lt;/code&gt; column as an example:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
               FROM orders
                   MATCH_RECOGNIZE (
                       PARTITION BY custkey
                       ORDER BY orderdate, orderkey
                       MEASURES
                           START.totalprice AS start_price,
                           LAST(DOWN.totalprice) AS bottom_price,
                           LAST(UP.totalprice) AS final_price,
                           START.orderdate AS start_date,
                           LAST(UP.orderdate) AS final_date,
                           MATCH_NUMBER() AS match_no,
                           CLASSIFIER() AS classy
                       ALL ROWS PER MATCH
                       AFTER MATCH SKIP PAST LAST ROW
                       PATTERN (START DOWN{3,} UP{4,})
                       DEFINE
                           DOWN AS totalprice &amp;lt; PREV(totalprice),
                           UP AS totalprice &amp;gt; PREV(totalprice)
                       )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Change the semantics of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;final_date&lt;/code&gt; column to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FINAL&lt;/code&gt;:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; SELECT custkey, match_no, start_price, bottom_price, final_price, start_date, final_date, classy
               FROM orders
                   MATCH_RECOGNIZE (
                       PARTITION BY custkey
                       ORDER BY orderdate, orderkey
                       MEASURES
                           START.totalprice AS start_price,
                           LAST(DOWN.totalprice) AS bottom_price,
                           LAST(UP.totalprice) AS final_price,
                           START.orderdate AS start_date,
                           FINAL LAST(UP.orderdate) AS final_date,
                           MATCH_NUMBER() AS match_no,
                           CLASSIFIER() AS classy
                       ALL ROWS PER MATCH
                       AFTER MATCH SKIP PAST LAST ROW
                       PATTERN (START DOWN{3,} UP{4,})
                       DEFINE
                           DOWN AS totalprice &amp;lt; PREV(totalprice),
                           UP AS totalprice &amp;gt; PREV(totalprice)
                       )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
&lt;/ol&gt;
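A useful way to think about the `PATTERN` clause used throughout the demo is as a regular expression over row labels. This hypothetical Python sketch (the price series is made up for illustration) classifies each step of a series as a drop or a rise and then searches for the “big V” shape with Python's re module:

```python
import re

# Made-up price series for illustration only.
prices = [200, 150, 120, 90, 60, 70, 90, 120, 160]

# Label each step: 'D' if the price fell versus the previous row,
# 'U' if it rose, 'F' if it stayed flat.
labels = "".join(
    "D" if b < a else "U" if b > a else "F"
    for a, b in zip(prices, prices[1:])
)

# PATTERN (START DOWN{3,} UP{4,}): the starting row is implicit in the
# first comparison, so the regex only needs the DOWN and UP labels --
# at least three drops followed by at least four rises.
big_v = re.compile(r"D{3,}U{4,}")
print(labels)                             # DDDDUUUU
print(big_v.search(labels) is not None)   # True
```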

&lt;h2 id=&quot;question-of-the-week-how-do-you-tag-a-list-of-rows-with-custom-periodic-rules&quot;&gt;Question of the week: How do you tag a list of rows with custom periodic rules?&lt;/h2&gt;

&lt;p&gt;A StackOverflow user asked how to tag orders in a table that meet a certain 
criterion that relies on periodicity. There are certainly some complicated and
inefficient SQL queries that you could craft to address this. However,
with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; it is now possible to do this while taking advantage of the
efficient matching capabilities that Martin and Kasia have added.&lt;/p&gt;

&lt;p&gt;Here is an example orders table represented as a csv table:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Create_time, Order_id, person_id, variable_a
&apos;2021-06-01&apos;, 1234, 2232, 1
&apos;2021-06-02&apos;, 1235, 2232, 0.6
&apos;2021-06-03&apos;, 1236, 2232, 0.33
&apos;2021-06-04&apos;, 1237, 2232, 0.7
&apos;2021-06-05&apos;, 1238, 2232, 0.6
&apos;2021-06-06&apos;, 1239, 2232, 0.4
&apos;2021-06-07&apos;, 1240, 2232, 0.8
&apos;2021-06-08&apos;, 1241, 2232, 0.7
&apos;2021-06-09&apos;, 1242, 2232, 0.4
&apos;2021-06-10&apos;, 1243, 2232, 0.6
&apos;2021-06-11&apos;, 1244, 2232, 0.7
&apos;2021-06-12&apos;, 1245, 2232, 0.6
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The grace period logic produces the final_hit column according to these 
rules:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;is_hit&lt;/code&gt; column is set to true if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;variable_a&lt;/code&gt; is less than or equal to 0.5.&lt;/li&gt;
  &lt;li&gt;There is a grace period of 4 orders after each hit, so any hit that 
falls within the grace period is ignored. A hit that survives this rule is a
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;final_hit&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Based on this logic, the desired result for the example is:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Create_time, Order_id, person_id, variable_a, is_hit, final_hit
&apos;2021-06-01&apos;, 1234, 2232, 1, NULL, NULL
&apos;2021-06-02&apos;, 1235, 2232, 0.6, NULL, NULL
&apos;2021-06-03&apos;, 1236, 2232, 0.33, true, true
&apos;2021-06-04&apos;, 1237, 2232, 0.7, NULL, NULL
&apos;2021-06-05&apos;, 1238, 2232, 0.6, NULL, NULL
&apos;2021-06-06&apos;, 1239, 2232, 0.4, true, NULL
&apos;2021-06-07&apos;, 1240, 2232, 0.8, NULL, NULL
&apos;2021-06-08&apos;, 1241, 2232, 0.7, NULL, NULL
&apos;2021-06-09&apos;, 1242, 2232, 0.4, true, true
&apos;2021-06-10&apos;, 1243, 2232, 0.6, NULL, NULL
&apos;2021-06-11&apos;, 1244, 2232, 0.7, NULL, NULL
&apos;2021-06-12&apos;, 1245, 2232, 0.6, NULL, NULL
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To accomplish this with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;, you can use the following statement, 
which produces the desired result:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;WITH data(Create_time, Order_id, person_id, variable_a) AS (
    VALUES
      (DATE &apos;2021-06-01&apos;, 1234, 2232, 1),
      (DATE &apos;2021-06-02&apos;, 1235, 2232, 0.6),
      (DATE &apos;2021-06-03&apos;, 1236, 2232, 0.33),
      (DATE &apos;2021-06-04&apos;, 1237, 2232, 0.7),
      (DATE &apos;2021-06-05&apos;, 1238, 2232, 0.6),
      (DATE &apos;2021-06-06&apos;, 1239, 2232, 0.4),
      (DATE &apos;2021-06-07&apos;, 1240, 2232, 0.8),
      (DATE &apos;2021-06-08&apos;, 1241, 2232, 0.7),
      (DATE &apos;2021-06-09&apos;, 1242, 2232, 0.4),
      (DATE &apos;2021-06-10&apos;, 1243, 2232, 0.6),
      (DATE &apos;2021-06-11&apos;, 1244, 2232, 0.7),
      (DATE &apos;2021-06-12&apos;, 1245, 2232, 0.6)
)
SELECT Create_time, Order_id, person_id, variable_a, if(variable_a &amp;lt;= 0.5, true, null) is_hit, final_hit
FROM data
   MATCH_RECOGNIZE (
     PARTITION BY person_id
     ORDER BY Create_time
     MEASURES if(classifier() = &apos;HIT&apos;, true, null) AS final_hit
     ALL ROWS PER MATCH WITH UNMATCHED ROWS
     AFTER MATCH SKIP PAST LAST ROW
     PATTERN (HIT G{,4})
     DEFINE /* G -- grace period */
            HIT AS HIT.variable_a &amp;lt;= 0.5
  )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
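The same grace-period rule can also be sketched procedurally, which makes it easy to check the logic against the desired result above. This is a hypothetical Python helper, not how Trino evaluates the query:

```python
rows = [  # (create_time, order_id, variable_a), ordered by time
    ("2021-06-01", 1234, 1.0),
    ("2021-06-02", 1235, 0.6),
    ("2021-06-03", 1236, 0.33),
    ("2021-06-04", 1237, 0.7),
    ("2021-06-05", 1238, 0.6),
    ("2021-06-06", 1239, 0.4),
    ("2021-06-07", 1240, 0.8),
    ("2021-06-08", 1241, 0.7),
    ("2021-06-09", 1242, 0.4),
    ("2021-06-10", 1243, 0.6),
    ("2021-06-11", 1244, 0.7),
    ("2021-06-12", 1245, 0.6),
]

def tag_final_hits(rows, grace=4):
    """Mark hits (variable_a <= 0.5) and suppress any hit that falls
    within the `grace` orders following a final hit."""
    result, remaining = [], 0
    for _, order_id, variable_a in rows:
        is_hit = variable_a <= 0.5
        if is_hit and remaining == 0:
            final_hit = True
            remaining = grace            # ignore hits in the next 4 orders
        else:
            final_hit = False
            remaining = max(0, remaining - 1)
        # `or None` mirrors the NULLs in the desired result table.
        result.append((order_id, is_hit or None, final_hit or None))
    return result

# Orders 1236 and 1242 become final hits; 1239 is a hit inside the
# grace period of 1236 and is ignored, matching the table above.
print([oid for oid, _, fh in tag_final_hits(rows) if fh])  # [1236, 1242]
```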

&lt;p&gt;Check out &lt;a href=&quot;https://stackoverflow.com/questions/68095763&quot;&gt;Martin and Kasia’s full answer to this question&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests Kasia Findeisen, Software Engineer at Starburst (@kasiafi). Release 360</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec</title>
      <link href="https://trino.io/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html" rel="alternate" type="text/html" title="Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec" />
      <published>2021-07-30T00:00:00+00:00</published>
      <updated>2021-07-30T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/07/30/iceberg-concurrency-snapshots-spec</id>
      <content type="html" xml:base="https://trino.io/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html">&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;/assets/blog/trino-on-ice/trino-iceberg.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Welcome to the Trino on ice series, covering the details around how the Iceberg
table format works with the Trino query engine. The examples build on each
previous post, so it’s recommended to read the posts sequentially and reference
them as needed later. Here are links to the posts in this series:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;Trino on ice I: A gentle introduction to Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html&quot;&gt;Trino on ice II: In-place table evolution and cloud compatibility with Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/08/12/deep-dive-into-iceberg-internals.html&quot;&gt;Trino on ice IV: Deep dive into Iceberg internals&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the last two blog posts, we’ve covered a lot of cool feature improvements of
Iceberg over the Hive model, and introduced the concepts and issues that table
formats address. I recommend you take a look at those if you haven’t yet. This
blog post closes out the overview of Iceberg features by discussing the
concurrency model Iceberg uses to ensure data integrity, how to use snapshots
via Trino, and the
&lt;a href=&quot;https://iceberg.apache.org/spec/&quot;&gt;Iceberg Specification&lt;/a&gt;.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;concurrency-model&quot;&gt;Concurrency Model&lt;/h2&gt;

&lt;p&gt;One of the core issues with the Hive model is that the metadata and the data
files are stored in distinct locations. Having your data and metadata split
up like this is a recipe for disaster when trying to apply updates to both
systems atomically.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-on-ice/iceberg-metadata.png&quot; alt=&quot;Iceberg metadata diagram of runtime, and file storage&quot; /&gt;&lt;/p&gt;

&lt;p&gt;A very common problem with Hive is that if a writing process failed during
insertion, you would often find the data written to file storage while the
metastore writes never occurred. Or conversely, the metastore writes were
successful, but the data failed to finish writing to file storage due to a 
network or file I/O failure. There’s a good 
&lt;a href=&quot;https://trino.io/episodes/5.html&quot;&gt;Trino Community Broadcast episode&lt;/a&gt; that talks
about a function in Trino that exists to resolve these issues by syncing the
metastore and file storage. You can watch 
&lt;a href=&quot;https://www.youtube.com/watch?v=OXyJFZSsX5w&amp;amp;t=2097s&quot;&gt;a simulation of this error&lt;/a&gt;
in that episode.&lt;/p&gt;

&lt;p&gt;Aside from the issues caused by this split state, there are many 
other issues that stem from the file system itself. In the case of HDFS, 
depending on the specific filesystem implementation you are using, you may have
&lt;a href=&quot;https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/introduction.html#Core_Expectations_of_a_Hadoop_Compatible_FileSystem&quot;&gt;different atomicity guarantees for various file systems and their operations&lt;/a&gt;,
such as creating, deleting, and renaming files and directories. HDFS isn’t the
only troublemaker here. Apart from Amazon S3, whose 
&lt;a href=&quot;https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-s3-now-delivers-strong-read-after-write-consistency-automatically-for-all-applications/&quot;&gt;recent announcement brought strong consistency to the S3 service,&lt;/a&gt;
most object storage systems offer only &lt;em&gt;eventual&lt;/em&gt; consistency and may not show
the latest files immediately after writes. Even as storage systems make
progress towards better performance and guarantees, they still
offer no reliable locking mechanism.&lt;/p&gt;

&lt;p&gt;Iceberg addresses all of these issues in a multitude of ways. One of the primary
ways Iceberg introduces transactional guarantees is by storing the metadata in
the same datastore as the data itself. This simplifies handling commit failures
down to rolling back on one system rather than trying to coordinate a rollback
across two systems like in Hive. Writers independently write their metadata and
attempt to perform their operations, needing no coordination with other writers.
The only time the writers coordinate is when they attempt to commit their
operations. To commit, a writer takes a lock on the current snapshot record in a
database. This concurrency model, where writers eagerly do the work upfront, is
called &lt;strong&gt;&lt;em&gt;optimistic concurrency control&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Currently, in Trino, this method still uses the Hive metastore to perform the
lock-and-swap operation necessary to coordinate the final commits. Iceberg 
creator, &lt;a href=&quot;https://www.linkedin.com/in/rdblue/&quot;&gt;Ryan Blue&lt;/a&gt;, 
&lt;a href=&quot;https://youtu.be/-iIY2sOFBRc?t=1351&quot;&gt;covers this lock-and-swap mechanism&lt;/a&gt; and
how the metastore can be replaced with alternate locking methods. In the event
that &lt;a href=&quot;https://iceberg.apache.org/reliability/#concurrent-write-operations&quot;&gt;two writers attempt to commit at the same time&lt;/a&gt;,
the writer that acquires the lock first successfully commits by swapping in its
snapshot as the current snapshot, while the second writer retries applying its
changes. The second writer should have no problem with this, assuming there are
no conflicting changes between the two snapshots.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-on-ice/iceberg-files.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This works similarly to a git workflow where the main branch is the locked
resource, and two developers try to commit their changes at the same time. The
first developer’s changes may conflict with the second developer’s changes. The
second developer is then forced to rebase or merge the first developer’s code
with their changes before committing to the main branch again. The same logic
applies to merging data files. Currently, Iceberg clients use a
&lt;a href=&quot;https://iceberg.apache.org/reliability/#concurrent-write-operations&quot;&gt;copy-on-write mechanism&lt;/a&gt;
that makes a new file out of the merged data in the next snapshot. This enables
accurate time travel and preserves the previous versions of the files. At
the time of writing, upserts via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE INTO&lt;/code&gt; syntax are not supported in Trino,
but 
&lt;a href=&quot;https://github.com/trinodb/trino/issues/7708&quot;&gt;this is in active development&lt;/a&gt;.
&lt;strong&gt;&lt;em&gt;UPDATE:&lt;/em&gt;&lt;/strong&gt; Since the original writing of this post, the 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/7933&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; syntax exists as of version 393&lt;/a&gt;.&lt;/p&gt;
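
&lt;p&gt;As a rough sketch of what an upsert looks like with this syntax, the following
illustrative example merges a staging table into the events table. The
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;events_staging&lt;/code&gt; table and the join and update columns are hypothetical,
and the example assumes Trino 393 or later:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;MERGE INTO iceberg.logging.events t
USING iceberg.logging.events_staging s
ON t.message = s.message
WHEN MATCHED THEN UPDATE SET level = s.level
WHEN NOT MATCHED THEN INSERT (level, message) VALUES (s.level, s.message);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;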

&lt;p&gt;One of the great benefits of tracking each individual change that gets written
to Iceberg is that you are given a view of the data at every point in time. This
enables a really cool feature that I mentioned earlier called &lt;strong&gt;&lt;em&gt;time travel&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;h2 id=&quot;snapshots-and-time-travel&quot;&gt;Snapshots and Time Travel&lt;/h2&gt;

&lt;p&gt;To showcase snapshots, it’s best to go over a few examples drawing from the
event table we 
&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;created in the previous blog posts&lt;/a&gt;.
This time we’ll only be working with the Iceberg table, as this capability is
not available in Hive. Snapshots give you an immutable view of your data at a
given point in time. They are created automatically on every append or removal
of data. One thing to note is that, for now, they do not capture the state of
your metadata.&lt;/p&gt;

&lt;p&gt;Say that you have created your events table and inserted the three initial rows
as we did previously. Let’s look at the data we get back and see how to check
the existing snapshots in Trino:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT level, message
FROM iceberg.logging.events;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;level&lt;/th&gt;
      &lt;th&gt;message&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Double oh noes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;WARN&lt;/td&gt;
      &lt;td&gt;Maybeh oh noes?&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Oh noes&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;To query the snapshots, append the $ operator to the end of the table name,
followed by the name of the hidden table, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;snapshots&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT snapshot_id, parent_id, operation
FROM iceberg.logging.&quot;events$snapshots&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;snapshot_id&lt;/th&gt;
      &lt;th&gt;parent_id&lt;/th&gt;
      &lt;th&gt;operation&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;7620328658793169607&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;append&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;2115743741823353537&lt;/td&gt;
      &lt;td&gt;7620328658793169607&lt;/td&gt;
      &lt;td&gt;append&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Let’s take a look at the manifest list files that are associated with each 
snapshot ID. You can tell which file belongs to which snapshot based on the 
snapshot ID embedded in the filename:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT manifest_list
FROM iceberg.logging.&quot;events$snapshots&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;manifest_list&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;s3a://iceberg/logging.db/events/metadata/snap-7620328658793169607-1-cc857d89-1c07-4087-bdbc-2144a814dae2.avro&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;s3a://iceberg/logging.db/events/metadata/snap-2115743741823353537-1-4cb458be-7152-4e99-8db7-b2dda52c556c.avro&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Now, let’s insert another row to the table:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;INSERT INTO iceberg.logging.events
VALUES
(
&apos;INFO&apos;,
timestamp &apos;2021-04-02 00:00:11.1122222&apos;,
&apos;It is all good&apos;,
ARRAY [&apos;Just updating you!&apos;]
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Let’s check the snapshot table again:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT snapshot_id, parent_id, operation
FROM iceberg.logging.&quot;events$snapshots&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;snapshot_id&lt;/th&gt;
      &lt;th&gt;parent_id&lt;/th&gt;
      &lt;th&gt;operation&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;7620328658793169607&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;append&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;2115743741823353537&lt;/td&gt;
      &lt;td&gt;7620328658793169607&lt;/td&gt;
      &lt;td&gt;append&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;7030511368881343137&lt;/td&gt;
      &lt;td&gt;2115743741823353537&lt;/td&gt;
      &lt;td&gt;append&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Let’s also verify that our row was added:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT level, message
FROM iceberg.logging.events;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;level&lt;/th&gt;
      &lt;th&gt;message&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Oh noes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;INFO&lt;/td&gt;
      &lt;td&gt;It is all good&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Double oh noes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;WARN&lt;/td&gt;
      &lt;td&gt;Maybeh oh noes?&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Since Iceberg is already tracking the list of files added and removed at each
snapshot, it would make sense that you can travel back and forth between these
different views of the data, right? This concept is called time travel.
You specify which snapshot you would like to read from, and you see the view of
the data as of that snapshot. In Trino, you use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@&lt;/code&gt;
operator, followed by the ID of the snapshot you wish to read from:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT level, message
FROM iceberg.logging.&quot;events@2115743741823353537&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;level&lt;/th&gt;
      &lt;th&gt;message&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Double oh noes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;WARN&lt;/td&gt;
      &lt;td&gt;Maybeh oh noes?&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Oh noes&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;If you determine there is some issue with your data, you can always roll back to
the previous state permanently as well. In Trino, there is a procedure called
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rollback_to_snapshot&lt;/code&gt; that moves the table state to another snapshot:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CALL system.rollback_to_snapshot(‘logging’, ‘events’, 2115743741823353537);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now that we have rolled back, observe what happens when we query the events
table with:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT level, message
FROM iceberg.logging.events;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;level&lt;/th&gt;
      &lt;th&gt;message&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Double oh noes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;WARN&lt;/td&gt;
      &lt;td&gt;Maybeh oh noes?&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Oh noes&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Notice the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INFO&lt;/code&gt; row is still missing even though we query the table without
specifying a snapshot ID. Just because we rolled back doesn’t mean we’ve lost
the snapshot we rolled back from. In fact, we can roll forward, or as I like to
call it, 
&lt;a href=&quot;https://en.wikipedia.org/wiki/Back_to_the_Future&quot;&gt;back to the future&lt;/a&gt;! In
Trino, you use the same procedure call, but with a successor of the current
snapshot:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CALL system.rollback_to_snapshot(‘logging’, ‘events’, 7030511368881343137)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And now we should be able to query the table again and see the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INFO&lt;/code&gt; row 
return:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT level, message
FROM iceberg.logging.events;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;level&lt;/th&gt;
      &lt;th&gt;message&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Oh noes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;INFO&lt;/td&gt;
      &lt;td&gt;It is all good&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Double oh noes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;WARN&lt;/td&gt;
      &lt;td&gt;Maybeh oh noes?&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;As expected, the INFO row returns when you roll back to the future.&lt;/p&gt;

&lt;p&gt;Snapshots not only provide a level of immutability that is key to working with
eventually consistent storage, but also give you a rich set of features to
version and move between different versions of your data, much like a git repository.&lt;/p&gt;

&lt;h2 id=&quot;iceberg-specification&quot;&gt;Iceberg Specification&lt;/h2&gt;

&lt;p&gt;Perhaps saving the best for last, the benefit of using Iceberg is the community
that surrounds it, and the support you receive. It can be daunting to have to
choose a project that replaces something so core to your architecture. While
Hive has so many drawbacks, one of the things keeping many companies locked in
is the fear of the unknown. How do you know which table format to choose? Are
there unknown data corruption issues you’re about to take on? What if this
doesn’t scale like it promises on the label? It is worth noting that 
&lt;a href=&quot;https://lakefs.io/hudi-iceberg-and-delta-lake-data-lake-table-formats-compared/&quot;&gt;alternative table formats are also emerging in this space&lt;/a&gt; 
and we encourage you to investigate them for your own use cases. When I sat down
with Iceberg creator Ryan Blue to 
&lt;a href=&quot;https://www.twitch.tv/videos/989098630&quot;&gt;compare Iceberg to other table formats&lt;/a&gt;, 
he claimed the community’s greatest strength is its ability to look forward.
The community intentionally broke compatibility with Hive to enable a
richer level of features. Unlike Hive, the Iceberg project explained its
thinking in a spec.&lt;/p&gt;

&lt;p&gt;The strongest argument I can see for Iceberg is that it has a 
&lt;a href=&quot;https://iceberg.apache.org/spec/&quot;&gt;specification&lt;/a&gt;. This is something that has
largely been missing from Hive and shows a real maturity in how the Iceberg
community has approached the issue. On the Trino project, we think standards are
important. We adhere to many of them ourselves, such as the ANSI SQL syntax, and
exposing the client through a JDBC connection. By creating a standard around
the table format, you’re no longer tied to any particular technology, not even Iceberg
itself. You are adhering to a standard that will hopefully become the de facto
standard over a decade or two, much like Hive did. Having the standard in clear
writing invites multiple communities to the table and brings even more use 
cases. Doing so improves the standards and therefore the technologies that
implement them.&lt;/p&gt;

&lt;p&gt;The previous three blog posts of this series covered the features and massive
benefits of using this novel table format. The following post dives deeper into
how Iceberg achieves some of this functionality, with an overview of some of the
internals and metadata layouts. In the meantime, feel
free to try 
&lt;a href=&quot;https://github.com/bitsondatadev/trino-getting-started/tree/main/iceberg/trino-iceberg-minio&quot;&gt;Trino on Ice(berg)&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Welcome to the Trino on ice series, covering the details around how the Iceberg table format works with the Trino query engine. The examples build on each previous post, so it’s recommended to read the posts sequentially and reference them as needed later. Here are links to the posts in this series: Trino on ice I: A gentle introduction to Iceberg Trino on ice II: In-place table evolution and cloud compatibility with Iceberg Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec Trino on ice IV: Deep dive into Iceberg internals In the last two blog posts, we’ve covered a lot of cool feature improvements of Iceberg over the Hive model. I recommend you take a look at those if you haven’t yet. We introduced concepts and issues that table formats address. This blog closes up the overview of Iceberg features by discussing the concurrency model Iceberg uses to ensure data integrity, how to use snapshots via Trino, and the Iceberg Specification.</summary>

      
      
    </entry>
  
    <entry>
      <title>22: TrinkedIn: LinkedIn gets a Trino promotion</title>
      <link href="https://trino.io/episodes/22.html" rel="alternate" type="text/html" title="22: TrinkedIn: LinkedIn gets a Trino promotion" />
      <published>2021-07-22T00:00:00+00:00</published>
      <updated>2021-07-22T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/22</id>
      <content type="html" xml:base="https://trino.io/episodes/22.html">&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/22/cbb-linkedin.png&quot; /&gt;&lt;br /&gt;
Commander Bun Bun, landing the job!
&lt;/p&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Akshay Rai, Staff Software Engineer at &lt;a href=&quot;https://www.linkedin.com/&quot;&gt;LinkedIn&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/akshayrai09/&quot;&gt;@akshayrai09&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Jithesh Rajan, Staff Software Engineer at &lt;a href=&quot;https://www.linkedin.com/&quot;&gt;LinkedIn&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/jithesh-tr-a3185b20/&quot;&gt;@jithesh-tr-a3185b20&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Laura Chen, Staff Software Engineer at &lt;a href=&quot;https://www.linkedin.com/&quot;&gt;LinkedIn&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/laura-yu-chen-3a75413/&quot;&gt;@laura-yu-chen-3a75413&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Pratham Desai, Software Engineer at &lt;a href=&quot;https://www.linkedin.com/&quot;&gt;LinkedIn&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/pratham-desai/&quot;&gt;@pratham-desai&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Raju Nalli, Staff Site Reliability Engineer at &lt;a href=&quot;https://www.linkedin.com/&quot;&gt;LinkedIn&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/rajunalli/&quot;&gt;@rajunalli&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;upcoming-release-and-trino-summit&quot;&gt;Upcoming release and Trino Summit&lt;/h2&gt;

&lt;h3 id=&quot;sneak-peek-items-for-360&quot;&gt;Sneak peek items for 360&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Automatic cluster internal TLS&lt;/li&gt;
  &lt;li&gt;Views support in Iceberg connector&lt;/li&gt;
  &lt;li&gt;Documentation for materialized views SQL commands&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; and batch insert support for various JDBC-based connectors&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;trino-summit-2021&quot;&gt;Trino Summit 2021&lt;/h3&gt;

&lt;p&gt;Get excited for this year’s &lt;a href=&quot;https://blog.starburst.io/announcing-trino-summit-2021&quot;&gt;Trino Summit&lt;/a&gt;
hosted by &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt;. 
&lt;a href=&quot;https://www.starburst.io/info/trino-summit-call-for-papers/&quot;&gt;Registration and call for papers&lt;/a&gt;
is now open!&lt;/p&gt;

&lt;h3 id=&quot;linkedin-is-hiring&quot;&gt;LinkedIn is hiring!&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/jobs/view/2402727250/?alternateChannel=search&amp;amp;refId=VRDXEQNgS2gxtpsJaHPXjQ%3D%3D&amp;amp;trackingId=0GzsJkrXWYt6qHWSUHTvCg%3D%3D&amp;amp;trk=d_flagship3_search_srp_jobs&quot;&gt;Software Engineer - Big Data Platform&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/jobs/view/2291645936/?eBP=CwEAAAF6y0tYtsROpAG7XxMEhLVgpq2rSMwpNv28Q_j06PdFsD_s11eFyh-sIv2rxm_Y8zN-p755Gts-ElMlR6XvK2hOMp3JMnxFPzOnZvvZnv_-oHaBslitgtWzsmJy7_f7BKljmgAUtfinG9WCp1Bpi574HZEBJwAsjzKx-89NUdnIBj_SBIPHES_G2RNqoKp5eZ4c0k7YaVJSuZJTyi2K6KoKJ7njT65FEOWvmS9S80ysbINbXjX_WSz71RNAugEpqIgE9-gB1MhW8tQ9z72jQhbjXMqSuUaYS43zFaP8ImXhjTrhbopTxyxTIN9yst6tvlcPo_T5RNAaf_0e8x_km2SGdw&amp;amp;recommendedFlavor=IN_NETWORK&amp;amp;refId=VRDXEQNgS2gxtpsJaHPXjQ%3D%3D&amp;amp;trackingId=5Qo2D07i3Wl%2FVhGeAvLtew%3D%3D&amp;amp;trk=flagship3_search_srp_jobs&quot;&gt;Senior Software Engineer - Big Data Platform&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-trino-at-linkedin&quot;&gt;Concept of the week: Trino at LinkedIn&lt;/h2&gt;

&lt;p&gt;The LinkedIn team covers the concept of the week in &lt;a href=&quot;https://www.youtube.com/watch?v=vlc84xB-Hfs&amp;amp;t=955s&quot;&gt;this section&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-digging-into-join-queries&quot;&gt;PR of the week: Digging into join queries&lt;/h2&gt;

&lt;p&gt;Today our PR of the week is from the future 🔮! 
&lt;a href=&quot;https://github.com/jitheshtr/trino/issues/1&quot;&gt;LinkedIn is currently investigating the issue&lt;/a&gt;.
This gives us a chance to talk about the research aspects that go into a PR.&lt;/p&gt;

&lt;p&gt;Consider a view &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;V&lt;/code&gt; that performs a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNION ALL&lt;/code&gt; over an old table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;O&lt;/code&gt; and a new 
migrated table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;N&lt;/code&gt;. For &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;datepartition&lt;/code&gt; values older than &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;D&lt;/code&gt; (say 2021-06-05), 
data is read from table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;O&lt;/code&gt;, while for dates equal to or greater than &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;D&lt;/code&gt;,
data from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;N&lt;/code&gt; is used.&lt;/p&gt;
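
&lt;p&gt;A sketch of such a view definition, with illustrative table names and date
handling, might look like the following:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE VIEW V AS
SELECT * FROM O WHERE substring(datepartition, 1, 10) &amp;lt; &apos;2021-06-05&apos;
UNION ALL
SELECT * FROM N WHERE substring(datepartition, 1, 10) &amp;gt;= &apos;2021-06-05&apos;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;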

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/22/view-old-new-tables.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;The query in question is:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT * FROM V
WHERE x IN (SELECT x2 FROM Z)
AND cast(substring(datepartition,1,10) as date) &amp;gt;= date(&apos;2021-06-08&apos;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here, table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Z&lt;/code&gt; has stats available and contains only 17 rows, while the 
data from view &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;V&lt;/code&gt; (which comes entirely from underlying table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;N&lt;/code&gt; for this query) 
has, say, billions of rows.&lt;/p&gt;

&lt;p&gt;This query took about 39 seconds to run before our upgrade 
(PrestoSQL 333). After the upgrade (Trino 352), the runtime increased to 
approximately 35 minutes.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-how-can-i-query-the-hive-views-from-trino&quot;&gt;Question of the week: How can I query the Hive views from Trino?&lt;/h2&gt;

&lt;p&gt;We actually covered the answer in &lt;a href=&quot;/episodes/18.html&quot;&gt;episode 18&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You can use the &lt;a href=&quot;https://engineering.linkedin.com/blog/2020/coral&quot;&gt;Coral&lt;/a&gt; 
project, which translates between different SQL dialects. For example, 
it processes HiveQL statements and converts them to an internal representation using
&lt;a href=&quot;https://calcite.apache.org/&quot;&gt;Apache Calcite&lt;/a&gt;. It then converts the internal
representation to Trino SQL. See &lt;a href=&quot;/docs/current/connector/hive.html#hive-views&quot;&gt;the docs&lt;/a&gt;
for more details.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/18/coral.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;This diagram shows the creation of a Hive view, then shows the sequence of events 
when Trino reads that view.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/18/hive-view-sequence.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://engineering.linkedin.com/blog/2020/coral&quot;&gt;https://engineering.linkedin.com/blog/2020/coral&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://engineering.linkedin.com/blog/2021/from-daily-dashboards-to-enterprise-grade-data-pipelines&quot;&gt;https://engineering.linkedin.com/blog/2021/from-daily-dashboards-to-enterprise-grade-data-pipelines&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://engineering.linkedin.com/blog/2018/11/using-translatable-portable-UDFs&quot;&gt;https://engineering.linkedin.com/blog/2018/11/using-translatable-portable-UDFs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://engineering.linkedin.com/blog/2021/fastingest-low-latency-gobblin&quot;&gt;https://engineering.linkedin.com/blog/2021/fastingest-low-latency-gobblin&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;News&lt;/p&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Commander Bun Bun, landing the job!</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino on ice II: In-place table evolution and cloud compatibility with Iceberg</title>
      <link href="https://trino.io/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html" rel="alternate" type="text/html" title="Trino on ice II: In-place table evolution and cloud compatibility with Iceberg" />
      <published>2021-07-12T00:00:00+00:00</published>
      <updated>2021-07-12T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg</id>
      <content type="html" xml:base="https://trino.io/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html">&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;/assets/blog/trino-on-ice/trino-iceberg.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Welcome to the Trino on ice series, covering the details around how the Iceberg
table format works with the Trino query engine. The examples build on each
previous post, so it’s recommended to read the posts sequentially and reference
them as needed later. Here are links to the posts in this series:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;Trino on ice I: A gentle introduction to Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html&quot;&gt;Trino on ice II: In-place table evolution and cloud compatibility with Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/08/12/deep-dive-into-iceberg-internals.html&quot;&gt;Trino on ice IV: Deep dive into Iceberg internals&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;The first post&lt;/a&gt; 
covered how Iceberg is a table format and not a file format. It demonstrated the
benefits of hidden partitioning in Iceberg in contrast to exposed partitioning 
in Hive. There really is no such thing as “exposed partitioning.” I just thought
that sounded better than not-hidden partitioning. If any of that wasn’t clear, I
recommend either that you stop reading now, or go back to the first post before 
starting this one. This post discusses evolution. No, the post isn’t covering 
Darwinian or Pokémon evolution, but in-place table evolution!&lt;/p&gt;

&lt;!--more--&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/blog/trino-on-ice/evolution.gif&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;You may find it a little odd that I am getting excited over tables evolving 
in-place, but as mentioned in the last post, if you have experience performing 
table evolution in Hive, discovering that Iceberg supports partition evolution 
and schema evolution will make you as happy as Ash Ketchum when Charmander 
evolved into Charmeleon. That is, until Charmeleon started treating Ash like a jerk
after the evolution from Charmander. Hopefully, you won’t face the same issue 
when your tables evolve.&lt;/p&gt;

&lt;p&gt;Another important aspect covered here is how Iceberg is developed with cloud
storage in mind. Hive and other data lake technologies were developed with file
systems as their primary storage layer. File systems are still a very common layer
today, but as more companies adopt object storage, those table formats have not 
adapted to the needs of object stores. Let’s dive in!&lt;/p&gt;

&lt;h2 id=&quot;partition-specification-evolution&quot;&gt;Partition Specification evolution&lt;/h2&gt;

&lt;p&gt;In Iceberg, you can update the partition specification, shortened to 
partition spec, on a live table. You do not need to perform a table 
migration as you do in Hive. In Hive, partition specs don’t explicitly exist 
because they are tightly coupled with the creation of the Hive table. This means 
that if you ever need to change the granularity of your data partitions at any point,
you need to create an entirely new table and move all the data to the new 
partition granularity you desire. No pressure on choosing the right granularity
or anything!&lt;/p&gt;

&lt;p&gt;In Iceberg, you’re not required to choose the perfect partition spec 
upfront: you can have multiple partition specs in the same table and query
across partitions of different sizes. How great is that! This means that if 
you’re initially partitioning your data by month, and later you decide to move 
to a daily partitioning spec due to growing ingest from all your new 
customers, you can do so with no migration and query over the table with no 
issue.&lt;/p&gt;

&lt;p&gt;This is conveyed pretty succinctly in this graphic from the Iceberg 
documentation. Through the end of 2008, partitioning occurs at a monthly 
granularity, and starting in 2009, it moves to a daily granularity. When a query 
pulls data from December 14, 2008 through January 13, 2009, the entire month of 
December gets scanned due to the monthly partitioning, but for the dates in 
January, only the first 13 days are scanned to answer the query.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/blog/trino-on-ice/partition-spec-evolution.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;At the time of writing, Trino is able to perform reads from tables that have 
multiple partition spec changes but partition evolution write support does not 
yet exist. &lt;a href=&quot;https://github.com/trinodb/trino/issues/7580&quot;&gt;There are efforts to add this support in the near future&lt;/a&gt;.&lt;/p&gt;
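
&lt;p&gt;A query spanning both partition specs needs no special handling. As a hedged 
sketch, assuming an Iceberg table shaped like the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg.logging.events&lt;/code&gt; table used later in this post, with data in both 
granularities:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Iceberg plans monthly partitions for the 2008 range and
-- daily partitions for the 2009 range within the same scan
SELECT count(*)
FROM iceberg.logging.events
WHERE event_time BETWEEN TIMESTAMP &apos;2008-12-14 00:00:00&apos;
                     AND TIMESTAMP &apos;2009-01-13 23:59:59&apos;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;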

&lt;h2 id=&quot;schema-evolution&quot;&gt;Schema evolution&lt;/h2&gt;

&lt;p&gt;Iceberg also handles schema evolution much more elegantly than Hive. In Hive, 
adding columns works well enough, as data inserted before the schema change 
just reports null for that column. For formats that use column names, like ORC 
and Parquet, deletes are also straightforward for Hive, as it simply ignores 
fields that are no longer part of the table. For unstructured files like CSV 
that rely on the position of the column, deletes still cause issues, as 
deleting one column shifts the rest of the columns. Renames pose an 
issue for all formats in Hive, as data written prior to the rename is not 
rewritten to use the new field name. This effectively works the same as if you deleted 
the old field and added a new column with the new name. This lack of support for
schema evolution across various file types in Hive requires memorizing
the formats underneath various tables, and is very susceptible to user
error if someone executes one of the unsupported operations on the wrong table.&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
  &lt;tr&gt;
    &lt;th colspan=&quot;4&quot;&gt;Hive 2.2.0 schema evolution based on file type and operation.&lt;/th&gt;
  &lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;/td&gt;
    &lt;td&gt;Add&lt;/td&gt;
    &lt;td&gt;Delete&lt;/td&gt;
    &lt;td&gt;Rename&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;CSV/TSV&lt;/td&gt;
    &lt;td&gt;✅&lt;/td&gt;
    &lt;td&gt;❌&lt;/td&gt;
    &lt;td&gt;❌&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;JSON&lt;/td&gt;
    &lt;td&gt;✅&lt;/td&gt;
    &lt;td&gt;✅&lt;/td&gt;
    &lt;td&gt;❌&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;ORC/Parquet/Avro&lt;/td&gt;
    &lt;td&gt;✅&lt;/td&gt;
    &lt;td&gt;✅&lt;/td&gt;
    &lt;td&gt;❌&lt;/td&gt;
  &lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Currently in Iceberg, schemaless position-based data formats such as CSV and TSV
are not supported, though there are &lt;a href=&quot;https://github.com/apache/iceberg/issues/118&quot;&gt;some discussions on adding limited support 
for them&lt;/a&gt;. This would be useful from
a reading standpoint: loading data from CSV into an Iceberg format with all
the guarantees that Iceberg offers.&lt;/p&gt;

&lt;p&gt;While JSON doesn’t rely on positional data, it does have an explicit dependency
on names. This means that if I remove a text column named 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;severity&lt;/code&gt; from a JSON table, and later add a new int column called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;severity&lt;/code&gt;, I 
encounter an error when deserializing the older JSON files that still contain 
string values for that name. Even worse would be if the new 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;severity&lt;/code&gt; column you add has the same type as the original but a semantically 
different meaning. This results in old rows containing values that are 
unknowingly from a different domain, which can lead to wrong analytics. After 
all, someone who adds the new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;severity&lt;/code&gt; column might not even be aware of the 
old &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;severity&lt;/code&gt; column if it was dropped quite some time ago.&lt;/p&gt;

&lt;p&gt;ORC, Parquet, and Avro do not suffer from these issues, as each of these 
formats keeps a schema internal to the file itself and tracks 
changes to the columns through IDs rather than name values or position. Iceberg
uses these unique column IDs to also keep track of the columns as changes are 
applied.&lt;/p&gt;

&lt;p&gt;In general, Iceberg can only allow this small set of file formats due to the 
&lt;a href=&quot;https://iceberg.apache.org/evolution/#correctness&quot;&gt;correctness guarantees&lt;/a&gt; it 
provides. In Trino, you can add, delete, or rename columns using the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER TABLE&lt;/code&gt; command. Here’s an example that continues from the table created 
in the last post, into which three rows were inserted. The DDL statement looked like this.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE iceberg.logging.events (
  level VARCHAR,
  event_time TIMESTAMP(6), 
  message VARCHAR,
  call_stack ARRAY(VARCHAR)
) WITH (
  format = &apos;ORC&apos;,
  partitioning = ARRAY[&apos;day(event_time)&apos;]
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here is an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER TABLE&lt;/code&gt; sequence that adds a new column named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;severity&lt;/code&gt;, 
inserts a row with a value for the new column, renames the column, and queries the 
data.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ALTER TABLE iceberg.logging.events ADD COLUMN severity INTEGER; 

INSERT INTO iceberg.logging.events VALUES 
(
  &apos;INFO&apos;, 
  timestamp 
  &apos;2021-04-01 19:59:59.999999&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;es muy bueno&apos;, 
  ARRAY [&apos;It is all normal&apos;], 
  1
);

ALTER TABLE iceberg.logging.events RENAME COLUMN severity TO priority;

SELECT level, message, priority
FROM iceberg.logging.events;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;level&lt;/th&gt;
      &lt;th&gt;message&lt;/th&gt;
      &lt;th&gt;priority&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Double oh noes&lt;/td&gt;
      &lt;td&gt;NULL&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;WARN&lt;/td&gt;
      &lt;td&gt;Maybeh oh noes?&lt;/td&gt;
      &lt;td&gt;NULL&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;Oh noes&lt;/td&gt;
      &lt;td&gt;NULL&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;INFO&lt;/td&gt;
      &lt;td&gt;es muy bueno&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ALTER TABLE iceberg.logging.events 
DROP COLUMN priority;

SHOW CREATE TABLE iceberg.logging.events;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE iceberg.logging.events (
   level varchar,
   event_time timestamp(6),
   message varchar,
   call_stack array(varchar)
)
WITH (
   format = &apos;ORC&apos;,
   partitioning = ARRAY[&apos;day(event_time)&apos;]
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Notice that neither the priority nor the severity column is present in the schema.
As noted in the table above, Hive renames cause issues for all file formats. Yet
in Iceberg, performing all these operations causes no issues with the table and
underlying data.&lt;/p&gt;

&lt;h2 id=&quot;cloud-storage-compatibility&quot;&gt;Cloud storage compatibility&lt;/h2&gt;

&lt;p&gt;Not all developers consider or are aware of the performance implications of 
using Hive over a cloud object storage solution like S3 or Azure Blob Storage. 
One thing to remember is that Hive was developed with the Hadoop Distributed 
File System (HDFS) in mind. HDFS is a filesystem and is particularly well suited
to listing files, because they are stored in a 
contiguous manner. When Hive stores data associated with a table, it assumes 
there is a contiguous layout underneath it and performs list operations that are
expensive on cloud storage systems.&lt;/p&gt;

&lt;p&gt;The common cloud storage systems are typically object stores that do not lay out
the files in a contiguous manner based on paths. Therefore, it becomes very 
expensive to list out all the files in a particular path. Yet, these list 
operations are executed for every partition that could be included in a query, 
even if only a single row in a single file out of thousands of files 
needs to be retrieved to answer the query. Even ignoring the performance costs
for a minute, object stores may also pose issues for Hive due to eventual 
consistency. Inserting and deleting can cause inconsistent results for readers, 
if the files you end up reading are out of date.&lt;/p&gt;

&lt;p&gt;Iceberg avoids all of these issues by tracking the data at the file level, 
rather than the partition level. By tracking the files, Iceberg only accesses 
the files containing data relevant to the query, as opposed to accessing files 
in the same partition looking for the few files that are relevant to the query. 
Further, this allows Iceberg to control for the inconsistency issue in 
cloud-based file systems by using a locking mechanism at the file level. Compare 
the Hive layout versus the Iceberg layout in the image below. As you can see, 
Iceberg makes no assumptions about the data being contiguous or 
not. It simply builds a persistent tree using the snapshot (S) location stored 
in the metadata, which points to the manifest list (ML), which points to 
manifests containing partitions (P). Finally, these manifest files contain the 
file (F) locations and stats that can quickly be used to prune data, versus 
needing to do a list operation and scanning all the files.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;/assets/blog/trino-on-ice/cloud-file-layout.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Referencing the picture above, if you were to run a query where the result set 
only contains rows from file F1, Hive would require a list operation and 
scanning files F2 and F3. In Iceberg, file metadata exists in the manifest 
file, P1, which would have a range on the predicate field that prunes out files 
F2 and F3, and only scans file F1. This example only shows a couple of files, 
but imagine storage that scales up to thousands of files! Listing becomes 
expensive when files are not stored contiguously. Having this 
flexibility in the logical layout is essential to increasing query performance. 
This is especially true on cloud object stores.&lt;/p&gt;
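
&lt;p&gt;You can inspect this file-level bookkeeping from Trino itself. As a hedged 
sketch, assuming the Iceberg connector exposes its metadata tables and using the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;events&lt;/code&gt; table from earlier:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- each row describes one data file tracked by the current snapshot,
-- including the stats Iceberg uses to prune files at planning time
SELECT file_path, record_count, file_size_in_bytes
FROM iceberg.logging.&quot;events$files&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;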

&lt;p&gt;If you want to play around with Iceberg using Trino, check out the 
&lt;a href=&quot;https://trino.io/docs/current/connector/iceberg.html&quot;&gt;Trino Iceberg docs&lt;/a&gt;. 
To avoid issues like the eventual consistency issue, as well as other problems 
of trying to sync operations across systems, Iceberg provides optimistic 
concurrency support, which is covered in more detail in
&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;the next post&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Welcome to the Trino on ice series, covering the details around how the Iceberg table format works with the Trino query engine. The examples build on each previous post, so it’s recommended to read the posts sequentially and reference them as needed later. Here are links to the posts in this series: Trino on ice I: A gentle introduction to Iceberg Trino on ice II: In-place table evolution and cloud compatibility with Iceberg Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec Trino on ice IV: Deep dive into Iceberg internals The first post covered how Iceberg is a table format and not a file format. It demonstrated the benefits of hidden partitioning in Iceberg in contrast to exposed partitioning in Hive. There really is no such thing as “exposed partitioning.” I just thought that sounded better than not-hidden partitioning. If any of that wasn’t clear, I recommend either that you stop reading now, or go back to the first post before starting this one. This post discusses evolution. No, the post isn’t covering Darwinian or Pokémon evolution, but in-place table evolution!</summary>

      
      
    </entry>
  
    <entry>
      <title>21: Trino + dbt = a match made in SQL heaven?</title>
      <link href="https://trino.io/episodes/21.html" rel="alternate" type="text/html" title="21: Trino + dbt = a match made in SQL heaven?" />
      <published>2021-07-08T00:00:00+00:00</published>
      <updated>2021-07-08T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/21</id>
      <content type="html" xml:base="https://trino.io/episodes/21.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Amy Chen, Partner Solutions Architect at &lt;a href=&quot;https://www.getdbt.com/&quot;&gt;dbt Labs (formerly Fishtown Analytics)&lt;/a&gt;
 (&lt;a href=&quot;https://www.linkedin.com/in/yuanamychen/&quot;&gt;@yuanamychen&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Victor Coustenoble, Solutions Architect at &lt;a href=&quot;https://www.starburst.io/&quot;&gt;Starburst&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/victorcouste&quot;&gt;@victorcouste&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-359&quot;&gt;Release 359&lt;/h2&gt;

&lt;p&gt;Martin:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Row pattern recognition for window functions&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SET TIME ZONE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timestamp(n)&lt;/code&gt; with precision higher than 3 in MySQL&lt;/li&gt;
  &lt;li&gt;ARM64-compatible docker image&lt;/li&gt;
  &lt;li&gt;Support for granting &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; privilege&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SET TIME ZONE&lt;/code&gt; is a feature from our guest Marius from last time!&lt;/li&gt;
  &lt;li&gt;ARM64 compatible docker image as well as already existing tar.gz and rpm means usage of Graviton and other ARM64 processors is now available also for Kubernetes users, there are significant cost/performance benefits, try it out&lt;/li&gt;
  &lt;li&gt;wow .. this time it took a whole month from 358 to 359&lt;/li&gt;
  &lt;li&gt;breaking change - need Java 11.0.11&lt;/li&gt;
  &lt;li&gt;more materialized view stuff, and I am working on docs!&lt;/li&gt;
  &lt;li&gt;Fix handling of multiple LDAP user bind patterns - for those of us in larger orgs..&lt;/li&gt;
  &lt;li&gt;network logging in CLI&lt;/li&gt;
  &lt;li&gt;rename &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;connector.name&lt;/code&gt; from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive-hadoop2&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More info at &lt;a href=&quot;https://trino.io/docs/current/release/release-359.html&quot;&gt;https://trino.io/docs/current/release/release-359.html&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-can-dbt-connect-to-different-databases-in-the-same-project&quot;&gt;Question of the week: Can dbt connect to different databases in the same project?&lt;/h2&gt;

&lt;p&gt;This week we are going a little out of order from our usual sequence on the
show, since the question really gets to the heart of the concept of the week. We’ll 
cover the question first, then jump into the concept.&lt;/p&gt;

&lt;p&gt;This question was asked on &lt;a href=&quot;https://stackoverflow.com/questions/63002171&quot;&gt;StackOverflow&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;It seems dbt only works for a single database. If my data is in a different 
database, will that still work? For example, if my datalake is using delta, 
but I want to run dbt using Redshift, would dbt still work for this case?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Our guest Victor replied:&lt;/p&gt;

&lt;p&gt;You can use Trino with dbt to connect to multiple databases in the same project.&lt;/p&gt;

&lt;p&gt;The GitHub example project &lt;a href=&quot;https://github.com/victorcouste/trino-dbt-demo&quot;&gt;https://github.com/victorcouste/trino-dbt-demo&lt;/a&gt; 
contains a fully working setup that you can replicate and adapt to your needs.&lt;/p&gt;
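
&lt;p&gt;The core idea is that a dbt model compiled against Trino can join across 
catalogs freely. Here is a hedged sketch of such a model, with hypothetical 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;postgresql&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive&lt;/code&gt; catalogs and made-up table names:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- models/customer_orders.sql (hypothetical model)
-- Trino resolves each fully qualified name to its own data source
select c.name, count(*) as order_count
from postgresql.public.customers c
join hive.sales.orders o
  on o.customer_id = c.id
group by c.name
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;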

&lt;h2 id=&quot;concept-of-the-week&quot;&gt;Concept of the week:&lt;/h2&gt;

&lt;h3 id=&quot;what-is-dbt&quot;&gt;What is dbt?&lt;/h3&gt;

&lt;p&gt;dbt is a transformation workflow tool that lets teams quickly and collaboratively 
deploy analytics code, following software engineering best practices like 
modularity, CI/CD, testing, and documentation. It enables anyone who knows SQL 
to build production-grade data pipelines.&lt;/p&gt;

&lt;p&gt;When referring to dbt, it can mean two slightly different things. dbt Core is 
the open source framework that provides the SQL compiler and framework to manage
your SQL workflow. You can interact with it via a command line interface. In 
addition, dbt Labs offers the fully managed SaaS product dbt Cloud. You can use 
it to handle all of your dbt projects from development to deployment in a single 
browser-based tool. It provides useful features like a full IDE to develop and 
test code, orchestration, logging, and alerting. At the moment, dbt Cloud is not
available for Trino users.&lt;/p&gt;

&lt;p&gt;The framework allows you to check the quality of results, document the lineage, 
manage the changes/versions in the SQL scripts and orchestrate the queries, like
a CI/CD framework but for your data. dbt is not an extract and load tool. The 
focus is on transforming what is already in your data warehouse/data lake.&lt;/p&gt;

&lt;p&gt;Check out these links to learn more:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.getdbt.com/&quot;&gt;https://www.getdbt.com/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.getdbt.com/docs/introduction&quot;&gt;https://docs.getdbt.com/docs/introduction&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;goals-of-dbt-and-how-that-differs-from-trino&quot;&gt;Goals of dbt and how that differs from Trino&lt;/h3&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/21/dbt-trino-architecture.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Trino is the execution SQL engine and dbt is the framework to manage your SQL 
statements. dbt won’t execute the SQL itself, rather it pushes all of the 
compute down to the SQL engine. This SQL engine can be Trino, or an engine 
included in the data source like the database itself. Using Trino as the SQL 
execution engine allows you to use the same SQL dialect for all connected data 
sources. This includes data sources that natively do not support SQL like object
storage systems, Kafka, Elasticsearch, and many others.&lt;/p&gt;

&lt;h3 id=&quot;transformation-vs-ad-hoc-joins&quot;&gt;Transformation vs ad-hoc joins&lt;/h3&gt;

&lt;p&gt;Transformations done by dbt are in general used to clean and prepare data for 
analytics purposes. dbt is often used to go from raw data to ready-to-use 
data for reporting and analysis, creating database objects like tables or 
views to be consumed by business users and analytics tools.&lt;/p&gt;

&lt;p&gt;On the other hand, even if Trino can also execute SQL to create tables and 
views, these SQL queries are not managed, just executed. Unlike dbt, Trino 
doesn’t have a framework to version, audit, document, and orchestrate SQL 
scripts and their execution. Trino is more often used to execute SQL SELECT 
statements generated by users or BI tools to analyze data in an interactive way.&lt;/p&gt;

&lt;h3 id=&quot;cases-for-why-you-need-both&quot;&gt;Cases for why you need both&lt;/h3&gt;

&lt;p&gt;Trino and dbt are complementary when you need to access different sources from
a single SQL query, or when you need to run SQL queries with good performance on
object storage systems like S3, GCS, ADLS, or HDFS.&lt;/p&gt;

&lt;p&gt;This is where Trino can complement dbt, as dbt can only access a single data 
warehouse connection in a SQL query. In dbt alone, there is no way to query multiple 
storage systems at the same time.&lt;/p&gt;

&lt;p&gt;Trino is recognized for great performance with object storage and data lake 
processing, and with dbt it can transform and prepare data at scale. Trino also 
allows you to run dbt on a traditional, on-premises data warehouse, where 
normally dbt only runs on a modern cloud data warehouse like Snowflake, 
BigQuery, or Redshift.&lt;/p&gt;

&lt;h3 id=&quot;dbt-basics&quot;&gt;dbt basics&lt;/h3&gt;

&lt;p&gt;dbt Labs offers a &lt;a href=&quot;https://docs.getdbt.com/tutorial/setting-up&quot;&gt;good tutorial&lt;/a&gt;
which covers the fundamental topics of dbt for you to learn:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Project: A directory of SQL and YAML files defined with a single project file.&lt;/li&gt;
  &lt;li&gt;Models: A model is a single SQL file where you define your transformations to create a table or a view.&lt;/li&gt;
  &lt;li&gt;Profile: To define connections to your data sources.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then you have other resources like seeds, macros, tests, sources, snapshots.&lt;/p&gt;
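
&lt;p&gt;To give a feel for the profile piece, here is a minimal sketch of a 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;profiles.yml&lt;/code&gt; entry for the Presto/Trino adapter, assuming a local, 
unauthenticated Trino coordinator; the project name, host, catalog, and schema are placeholders:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino_project:
  target: dev
  outputs:
    dev:
      type: presto        # adapter type
      method: none        # no authentication
      user: admin
      host: localhost
      port: 8080
      database: hive      # Trino catalog
      schema: default
      threads: 1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;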

&lt;h2 id=&quot;demo-querying-trino-from-a-dbt-project&quot;&gt;Demo: Querying Trino from a dbt project&lt;/h2&gt;

&lt;p&gt;Victor shows us a demo from 
&lt;a href=&quot;https://medium.com/geekculture/trino-dbt-a-match-in-sql-heaven-1df2a3d12b5e&quot;&gt;his blog post that inspired this episode&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you looked at the code, you may have noticed that it used an adapter 
called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dbt-presto&lt;/code&gt;. This adapter derives from the outdated Presto naming and is
still there for interaction with legacy Presto clusters. Although it can work,
it uses an outdated Python client to interact with Trino, and there is an open
&lt;a href=&quot;https://github.com/dbt-labs/dbt-presto/issues/39&quot;&gt;issue to create an official &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dbt-trino&lt;/code&gt; adapter&lt;/a&gt; 
that uses the updated &lt;a href=&quot;https://github.com/trinodb/trino-python-client&quot;&gt;trino-python-client&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to help with this, reach out on the issue itself and join the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;#db-presto-trino&lt;/code&gt; channel on the dbt Slack. 
&lt;a href=&quot;https://community.getdbt.com/&quot;&gt;https://community.getdbt.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After the show &lt;a href=&quot;https://twitter.com/findinpath&quot;&gt;Marius Grama&lt;/a&gt;, started &lt;a href=&quot;https://github.com/findinpath/dbt-trino&quot;&gt;work on
dbt-trino in his own repository&lt;/a&gt;.
Thanks for the quick turnaround Marius!&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-8283-externalised-destination-table-cache-expiry-duration-for-bigquery-connector&quot;&gt;PR of the week: PR 8283 Externalised destination table cache expiry duration for BigQuery Connector&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/trinodb/trino/pull/8283&quot;&gt;PR of the week&lt;/a&gt; was committed 
by Ayush Bilala (&lt;a href=&quot;https://twitter.com/ayushbilala&quot;&gt;Twitter&lt;/a&gt;, &lt;a href=&quot;https://www.linkedin.com/in/ayush-bilala/&quot;&gt;LinkedIn&lt;/a&gt;), a Staff Software Engineer at
Walmart Global Tech.&lt;/p&gt;

&lt;p&gt;This fixes &lt;a href=&quot;https://github.com/trinodb/trino/issues/8236&quot;&gt;issue 8236&lt;/a&gt; by adding
a new configuration property for the BigQuery connector, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bigquery.views-cache-ttl&lt;/code&gt;, 
to allow configuring the cache expiration for BigQuery views.&lt;/p&gt;
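
&lt;p&gt;In practice this is one more line in the BigQuery catalog properties file; a 
hedged sketch with a placeholder project ID, where only 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bigquery.views-cache-ttl&lt;/code&gt; comes from this PR:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# etc/catalog/bigquery.properties
connector.name=bigquery
bigquery.project-id=example-project
# expire cached BigQuery view definitions after 10 minutes
bigquery.views-cache-ttl=10m
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;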

&lt;p&gt;Thanks Ayush!&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;News&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The “frog” book has been &lt;a href=&quot;https://item.jd.com/10028492426649.html&quot;&gt;translated to Chinese&lt;/a&gt;!
 Keep your eyes peeled for the rebrand into Trino for the translation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;Advanced SQL Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/30/training-query-tuning.html&quot;&gt;Query Tuning Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/13/training-security.html&quot;&gt;Security Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/27/training-performance.html&quot;&gt;Performance and Tuning Training&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests</summary>

      
      
    </entry>
  
    <entry>
      <title>20: Trino for the Trinewbie</title>
      <link href="https://trino.io/episodes/20.html" rel="alternate" type="text/html" title="20: Trino for the Trinewbie" />
      <published>2021-06-23T00:00:00+00:00</published>
      <updated>2021-06-23T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/20</id>
      <content type="html" xml:base="https://trino.io/episodes/20.html">&lt;script async=&quot;&quot; defer=&quot;&quot; src=&quot;https://buttons.github.io/buttons.js&quot;&gt;&lt;/script&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Marius Grama, Data Engineer at &lt;a href=&quot;https://www.willhaben.at/&quot;&gt;willhaben internet service GmbH &amp;amp; Co KG&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/findinpath&quot;&gt;@findinpath&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-trino-for-the-trinewbie&quot;&gt;Concept of the week: Trino for the Trinewbie&lt;/h2&gt;

&lt;p&gt;One of the best and easiest ways to get an understanding of Trino and how to
use it is the book Trino: The Definitive Guide. The next three sections have a few 
excerpts from the book, which does an incredible job of introducing the space 
Trino is in. If you would like to read the book in its entirety, Starburst 
offers &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the digital copy for free&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;the-problems-with-big-data&quot;&gt;The Problems with Big Data&lt;/h3&gt;

&lt;p&gt;Everybody is capturing more and more data from device metrics, user behavior
tracking, business transactions, location data, software and system testing 
procedures and workflows, and much more. The insights gained from understanding
that data and working with it can make or break the success of any initiative,
or even a company.&lt;/p&gt;

&lt;p&gt;At the same time, the diversity of storage mechanisms available for data has 
exploded: relational databases, NoSQL databases, document databases, key-value 
stores, object storage systems, and so on. Many of them are necessary in today’s
organizations, and it is no longer possible to use just one of them.&lt;/p&gt;

&lt;h3 id=&quot;what-is-trino&quot;&gt;What is Trino?&lt;/h3&gt;

&lt;p&gt;Trino is not a database with storage; rather, it simply queries data where it 
lives. When using Trino, storage and compute are decoupled and can be scaled 
independently. Trino represents the compute layer, whereas the underlying data 
sources represent the storage layer.&lt;/p&gt;

&lt;p&gt;This allows Trino to scale up and down its compute resources for query 
processing, based on analytics demand to access this data. There is no need to 
move your data, and provision compute and storage to the exact needs of the 
current queries, or change that regularly, based on your changing query needs.&lt;/p&gt;

&lt;p&gt;Trino can scale the query power by scaling the compute cluster dynamically, and 
the data can be queried right where it lives in the data source. This 
characteristic allows you to greatly optimize your hardware resource needs and 
therefore reduce cost.&lt;/p&gt;

&lt;h3 id=&quot;sql-on-anything&quot;&gt;SQL-on-Anything&lt;/h3&gt;

&lt;p&gt;Trino was initially designed to query data from HDFS. And it can do that very 
efficiently, as you learn later. But that is not where it ends. On the contrary,
Trino is a query engine that can query data from object storage, relational
database management systems (RDBMSs), NoSQL databases, and other systems.&lt;/p&gt;

&lt;p&gt;Trino queries data where it lives and does not require a migration of data to a 
single location. So Trino allows you to query data in HDFS and other distributed
object storage systems. It allows you to query RDBMSs and other data sources. As
such, it can really query data wherever it lives and therefore be a replacement
to the traditional, expensive, and heavy extract, transform, and load (ETL) 
processes. Or at a minimum, it can help you with them and lighten the load. So 
Trino is clearly not just another SQL-on-Hadoop solution.&lt;/p&gt;

&lt;p&gt;Object storage systems include Amazon Web Services (AWS) Simple Storage Service
(S3), Microsoft Azure Blob Storage, Google Cloud Storage, and S3-compatible 
storage such as MinIO and Ceph. Trino can query traditional RDBMSs such as 
Microsoft SQL Server, PostgreSQL, MySQL, Oracle, Teradata, and Amazon Redshift. 
Trino can also query NoSQL systems such as Apache Cassandra, Apache Kafka, 
MongoDB, or Elasticsearch. Trino can query virtually anything and is truly a 
SQL-on-Anything system.&lt;/p&gt;

&lt;p&gt;For users, this means that suddenly they no longer have to rely on specific 
query languages or tools to interact with the data in those specific systems.
They can simply leverage Trino and their existing SQL skills and their 
well-understood analytics, dashboarding, and reporting tools. These tools, 
built on top of SQL, allow analysis of those additional data sets, which 
are otherwise locked in separate systems. Users can even use Trino to query 
across different systems with the SQL they know.&lt;/p&gt;
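&lt;p&gt;As a short illustrative sketch, a single query can join data across two
catalogs with plain SQL. The catalog, schema, and table names below are made up
for this example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT o.order_id, c.name
FROM postgresql.public.orders o
JOIN hive.default.customers c ON o.customer_id = c.customer_id;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;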

&lt;h3 id=&quot;contributing-to-trino&quot;&gt;Contributing to Trino&lt;/h3&gt;

&lt;p&gt;In this episode, Marius Grama discusses his journey with Trino: joining the
community, his first impressions and experiences, and what led him to make 
sixteen commits over the last three months. We also ask him where he thinks we 
could improve to make the onboarding experience better.&lt;/p&gt;

&lt;p&gt;In the Trino project there are four &lt;a href=&quot;/development/roles.html&quot;&gt;roles&lt;/a&gt;.
You can immediately become a participant or reviewer. To be a contributor, you
need to follow some steps that are covered later in the episode. Likewise, for
maintainers, there is a path to becoming a maintainer that is discussed in 
detail on the roles page.&lt;/p&gt;

&lt;h4 id=&quot;participants&quot;&gt;Participants&lt;/h4&gt;

&lt;blockquote&gt;
  &lt;p&gt;Participants are those who show up and join in discussions about the project. 
Users, developers, and administrators can all be participants, as can 
literally anyone who has the time, energy, and passion to become involved. 
Participants suggest improvements and new features. They report bugs, 
regressions, performance issues, and so on. They work to make Trino better for
everyone.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4 id=&quot;contributors&quot;&gt;Contributors&lt;/h4&gt;

&lt;p&gt;Today’s episode covers the process that a contributor goes through to make a
code change, but simply put:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;A contributor submits code changes to Trino.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4 id=&quot;reviewers&quot;&gt;Reviewers&lt;/h4&gt;

&lt;blockquote&gt;
  &lt;p&gt;A reviewer reads a proposed change to Trino, and assesses how well the change 
aligns with the Trino vision and guidelines. This includes everything from 
high level project vision to low level code style. Everyone is invited and 
encouraged to review others’ contributions – you don’t need to be a maintainer
for that.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4 id=&quot;maintainers&quot;&gt;Maintainers&lt;/h4&gt;

&lt;blockquote&gt;
  &lt;p&gt;A maintainer is responsible for checking in code only after ensuring it has 
been reviewed thoroughly and aligns with the Trino vision and guidelines. In 
addition to merging code, a maintainer actively participates in discussions 
and reviews. Being a maintainer does not grant additional rights in the 
project to make changes, set direction, or anything else that does not align 
with the direction of the project. Instead, a maintainer is expected to bring
these to the project participants as needed to gain consensus. The maintainer
role is for an individual, so if a maintainer changes employers, the role is 
retained. However, if a maintainer is no longer actively involved in the 
project, their maintainer status will be reviewed.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There is &lt;a href=&quot;https://cwiki.apache.org/confluence/display/Hive/BecomingACommitter&quot;&gt;a writeup on the Apache Hive process to become a committer.&lt;/a&gt;
For context, a committer is equivalent to a maintainer in Trino. This writeup
aligns precisely with the Trino philosophy. Here are a few good quotes from that
article:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Contributors often ask Hive PMC members the question, “What do I need to do in
order to become a committer?” The simple (though frustrating) answer to this 
question is, “If you want to become a committer, behave like a committer.” If 
you follow this advice, then rest assured that the PMC will notice, and 
committership will seek you out rather than the other way around.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;It should go without saying, but here it is anyway: your participation in the 
project should be a natural part of your work with Hive; if you find yourself 
undertaking tasks “so that you can become a committer”, then you’re doing it 
wrong, young padawan. This is particularly true if your motivations for 
wanting to become a committer are primarily negative or self-centered.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-8135-set-default-time-zone-for-the-current-session&quot;&gt;PR of the week: PR 8135 Set default time zone for the current session&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/trinodb/trino/pull/8135&quot;&gt;PR of the week&lt;/a&gt;, was committed 
by today’s guest, &lt;a href=&quot;https://twitter.com/findinpath&quot;&gt;Marius Grama&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This fixes &lt;a href=&quot;https://github.com/trinodb/trino/issues/8112&quot;&gt;issue 8112&lt;/a&gt; by adding
support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SET TIME ZONE&lt;/code&gt; statement. The specified time zone is 
stored as a session property and has lower precedence than the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sql.forced-session-time-zone&lt;/code&gt; setting.&lt;/p&gt;
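&lt;p&gt;As a quick sketch of how the statement is used in a session (the time zone
value is just an example):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SET TIME ZONE &apos;America/Los_Angeles&apos;;
SELECT current_timezone();
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;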

&lt;p&gt;Thanks Marius!&lt;/p&gt;

&lt;h2 id=&quot;demo-contributing-to-trino&quot;&gt;Demo: Contributing to Trino&lt;/h2&gt;

&lt;p&gt;Here is the video that goes into detail on the steps below on how to contribute
code to Trino!&lt;/p&gt;

&lt;div class=&quot;youtube-video-container&quot;&gt;
  &lt;iframe width=&quot;702&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/gAqYkR2oGgM&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Download an IDE.&lt;/p&gt;

    &lt;p&gt;First, you need to have an integrated development environment (IDE) to run 
 the code. We recommend &lt;a href=&quot;https://www.jetbrains.com/idea/download/&quot;&gt;IntelliJ Community Edition&lt;/a&gt;
 as it is the standard that is used by developers across the project. Of 
 course, you may use any IDE you like, but there may be issues that others 
 may not be able to help with as readily.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Install Git.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://git-scm.com/&quot;&gt;Git&lt;/a&gt; is distributed version control software 
 used to collaborate on code with other users. You must 
 &lt;a href=&quot;https://git-scm.com/book/en/v2/Getting-Started-Installing-Git&quot;&gt;install Git&lt;/a&gt;
 in order to contribute to the project.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Install Docker.&lt;/p&gt;

    &lt;p&gt;The Trino testing framework runs Trino and other databases it connects to on
 Docker, a tool that runs different services in isolation using containers.&lt;br /&gt;
 Go ahead and &lt;a href=&quot;https://docs.docker.com/engine/install/&quot;&gt;install Docker&lt;/a&gt; on 
 your system.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Create and configure your GitHub account.&lt;/p&gt;

    &lt;p&gt;GitHub is a free Git repository hosting service, and a central point of collaboration
 for the Trino project. If you haven’t done so, please 
 &lt;a href=&quot;https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration&quot;&gt;create and configure your GitHub account&lt;/a&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Make a fork of the Trino repository on GitHub.&lt;/p&gt;

    &lt;p&gt;Navigate to &lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;the Trino repository&lt;/a&gt; and 
 click the “fork” button. Or you can just click it here: &lt;a class=&quot;github-button&quot; href=&quot;https://github.com/trinodb/trino/fork&quot; data-icon=&quot;octicon-repo-forked&quot; data-size=&quot;large&quot;&gt;Fork&lt;/a&gt;.&lt;/p&gt;

    &lt;p&gt;You want to create a fork so that you can save your work without needing the
 special privileges it takes to commit code back to the Trino repository. 
 This way, you can upload (also called a “push” in Git) your code to your 
 fork and later open a pull request into the main Trino repository.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Clone your fork of the Trino repository to your computer and import it into IntelliJ.&lt;/p&gt;

    &lt;p&gt;Execute the following clone command in your terminal:&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; git clone git@github.com:&amp;lt;your_username&amp;gt;/trino.git
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;

    &lt;p&gt;Open the &lt;a href=&quot;https://www.jetbrains.com/help/idea/maven-support.html#maven_import_project_start&quot;&gt;Trino project in IntelliJ&lt;/a&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Add the Airlift code style checks to IntelliJ.&lt;/p&gt;

    &lt;p&gt;There are many unspoken rules to code style and formatting in any project. 
Trino is no exception. To make life simpler for contributors and reviewers, 
there is a &lt;a href=&quot;https://raw.githubusercontent.com/airlift/codestyle/master/IntelliJIdea2019/Airlift.xml&quot;&gt;Trino code style definition&lt;/a&gt; 
that &lt;a href=&quot;https://www.jetbrains.com/help/idea/copying-code-style-settings.html&quot;&gt;you can import into IntelliJ&lt;/a&gt; 
so that the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Reformat Code&lt;/code&gt; action formats code in the desired style of the project.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Build the project.&lt;/p&gt;

    &lt;p&gt;One of the greatest resources in Trino history is &lt;a href=&quot;https://gist.github.com/findepi/04c96f0f60dcc95329f569bb0c44a0cd&quot;&gt;this cheat sheet&lt;/a&gt;
created by &lt;a href=&quot;https://twitter.com/findepi&quot;&gt;Piotr Findeisen&lt;/a&gt;. I use it for some
of the commands, but the most important is the “fast” build command at the top.
In your terminal, make sure you are in the root directory of the Trino project, 
and run the following command.&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;./mvnw -pl &apos;!:trino-server-rpm,!:trino-docs,!:trino-proxy,!:trino-verifier,!:trino-benchto-benchmarks&apos; clean install \
-T 2C -nsu \
-DskipTests \
-Dmaven.javadoc.skip=true \
-Dmaven.source.skip=true \
-Dair.check.skip-all=true
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;

    &lt;p&gt;This builds all the modules of the project necessary to run almost everything
in Trino. The build excludes some modules, runs the compiler on multiple 
threads, and skips the tests, Javadoc generation, and the Airlift code style 
checks. If you would like to run the code style checks on a specific module 
(e.g. trino-elasticsearch), you can run the following command.&lt;/p&gt;

    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;./mvnw -pl &apos;:trino-elasticsearch&apos; clean install \
-T 2C -nsu \
-DskipTests \
-Dmaven.javadoc.skip=true \
-Dmaven.source.skip=true 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Sign the CLA.&lt;/p&gt;

    &lt;p&gt;Sign the &lt;a href=&quot;https://github.com/trinodb/cla/blob/master/Trino%20Foundation%20Individual%20CLA.pdf&quot;&gt;contributor license agreement (CLA)&lt;/a&gt; 
 to agree that all of your code you commit to the project is subject to the 
 Apache License 2.0. Once you sign the agreement, scan and submit the form to
 &lt;a href=&quot;mailto:cla@trino.io&quot;&gt;cla@trino.io&lt;/a&gt;. This email gets checked every few days,
 and you can check if your name has been added to the &lt;a href=&quot;https://github.com/trinodb/cla/blob/master/contributors&quot;&gt;contributors&lt;/a&gt;
 list.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;At this point you can look for an &lt;a href=&quot;https://github.com/trinodb/trino/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22&quot;&gt;issue labeled “good first issue”&lt;/a&gt;.
This label identifies issues that we think are more approachable for developers 
who aren’t as familiar with the Trino repository yet.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;One final thing before you move on to the contribution process. Before you
start jumping in and changing the code, you’ll also want to create a dedicated
branch for your changes. A branch in Git keeps all the changes you make 
isolated in a separate line of work. If something goes wrong, or you need to 
compare with an older branch, you can do so. The default branch may either be
named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;master&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt;. See &lt;a href=&quot;https://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging&quot;&gt;more on branching in git&lt;/a&gt;.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To make a branch for your feature, you can run the following command:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git checkout -b my-feature-branch
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Finally, follow the remaining steps in the &lt;a href=&quot;https://trino.io/development/process.html&quot;&gt;contribution process page&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-how-do-i-remove-nulls-from-an-array-in-trino&quot;&gt;Question of the week: How do I remove nulls from an array in Trino?&lt;/h2&gt;

&lt;p&gt;A &lt;a href=&quot;https://stackoverflow.com/questions/66162776&quot;&gt;question posted to StackOverflow&lt;/a&gt; 
asked the following question:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;I’m extracting data from a json column in Trino and getting the output in an 
array like this &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[&apos;AL&apos;, NULL, &apos;NEW&apos;]&lt;/code&gt;. The problem is I need to remove the null since
the array has to be mapped to another array. I tried several options but no luck.
How can I remove the null and get only &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[&apos;AL&apos;, &apos;NEW&apos;]&lt;/code&gt; without unnesting?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href=&quot;https://twitter.com/findepi&quot;&gt;Piotr Findeisen&lt;/a&gt; replied:&lt;/p&gt;

&lt;p&gt;You can use &lt;a href=&quot;https://trino.io/docs/current/functions/array.html#filter&quot;&gt;filter()&lt;/a&gt;
for this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino&amp;gt; SELECT filter(ARRAY[&apos;AL&apos;, NULL,&apos;NEW&apos;], e -&amp;gt; e IS NOT NULL);
   _col0
-----------
 [AL, NEW]
(1 row)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;News&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The “frog” book has been &lt;a href=&quot;https://item.jd.com/10028492426649.html&quot;&gt;translated to Chinese&lt;/a&gt;!
 Keep your eyes peeled for the rebrand into Trino for the translation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;Advanced SQL Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/30/training-query-tuning.html&quot;&gt;Query Tuning Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/13/training-security.html&quot;&gt;Security Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/27/training-performance.html&quot;&gt;Performance and Tuning Training&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary></summary>

      
      
    </entry>
  
    <entry>
      <title>19: Data Ingestion to Iceberg and Trino</title>
      <link href="https://trino.io/episodes/19.html" rel="alternate" type="text/html" title="19: Data Ingestion to Iceberg and Trino" />
      <published>2021-06-10T00:00:00+00:00</published>
      <updated>2021-06-10T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/19</id>
      <content type="html" xml:base="https://trino.io/episodes/19.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;Cory Darby, Principal Software Developer at &lt;a href=&quot;https://bluecatnetworks.com/&quot;&gt;BlueCat&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/ckdarby&quot;&gt;@ckdarby&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-358&quot;&gt;Release 358&lt;/h2&gt;

&lt;p&gt;Martin:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW STATS&lt;/code&gt; support for arbitrary queries.&lt;/li&gt;
  &lt;li&gt;Performance improvements for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY ... LIMIT&lt;/code&gt; queries on sorted data.&lt;/li&gt;
  &lt;li&gt;Support for Hive views containing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LATERAL VIEW&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Reduced graceful shutdown time&lt;/li&gt;
  &lt;li&gt;A bunch of performance and correctness fixes&lt;/li&gt;
  &lt;li&gt;Removed support for the legacy JDBC URL prefix &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jdbc:presto:&lt;/code&gt; in the driver&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More info at &lt;a href=&quot;https://trino.io/docs/current/release/release-358.html&quot;&gt;https://trino.io/docs/current/release/release-358.html&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;release-357&quot;&gt;Release 357&lt;/h2&gt;

&lt;p&gt;Martin:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Support for subquery expressions that produce multiple columns.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CURRENT_CATALOG&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CURRENT_SCHEMA&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Aggregation pushdown for ClickHouse connector.&lt;/li&gt;
  &lt;li&gt;Rule support for identifier mapping in various connectors.&lt;/li&gt;
  &lt;li&gt;New &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;format_number&lt;/code&gt; function.&lt;/li&gt;
  &lt;li&gt;Cast row types as JSON objects.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Print dynamic filters summary in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN ANALYZE&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Fix trusted cert usage for OAuth&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;clear&lt;/code&gt; command in CLI&lt;/li&gt;
  &lt;li&gt;Numerous smaller connector changes - check your favourite connector&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More at &lt;a href=&quot;https://trino.io/docs/current/release/release-357.html&quot;&gt;https://trino.io/docs/current/release/release-357.html&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-ingesting-into-iceberg-with-pulsar-and-flink-at-bluecat&quot;&gt;Concept of the week: Ingesting into Iceberg with Pulsar and Flink at BlueCat&lt;/h2&gt;

&lt;p&gt;Here are Cory’s slides that you can use to follow along while listening to the 
podcast.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/5KsmZMJtSOoxFx&quot; width=&quot;800&quot; height=&quot;650&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; 
margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; 
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-1905-add-format_number-function&quot;&gt;PR of the week: PR 1905 Add format_number function&lt;/h2&gt;

&lt;p&gt;The
&lt;a href=&quot;https://github.com/trinodb/trino/pull/1905&quot;&gt;PR of the week&lt;/a&gt; is a simple but
useful PR by maintainer &lt;a href=&quot;https://twitter.com/ebyhr&quot;&gt;Yuya Ebihara&lt;/a&gt;.
It fixes &lt;a href=&quot;https://github.com/trinodb/trino/issues/1878&quot;&gt;issue 1878&lt;/a&gt; by adding a
function that formats very large numbers returned from a query in a truncated
form with a magnitude suffix (B for billion, M for million, K for thousand, 
and so on). Rather than reuse the CLI’s 
&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/client/trino-cli/src/main/java/io/trino/cli/FormatUtils.java&quot;&gt;FormatUtils&lt;/a&gt;
class, which missed various cases, he created 
&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/core/trino-main/src/main/java/io/trino/operator/scalar/FormatNumberFunction.java&quot;&gt;his own implementation&lt;/a&gt; 
that covers those cases. Thanks Yuya!&lt;/p&gt;

&lt;h2 id=&quot;demo-showing-the-format_number-functionality&quot;&gt;Demo: Showing the format_number functionality&lt;/h2&gt;

&lt;p&gt;Here are the examples we ran in the show.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT format_number(DOUBLE &apos;1234.5&apos;);

SELECT format_number(DOUBLE &apos;-9223372036854775808&apos;);

SELECT format_number(DOUBLE &apos;9223372036854775807&apos;);

SELECT format_number(REAL &apos;-999&apos;);

SELECT format_number(REAL &apos;999&apos;);

SELECT format_number(DECIMAL &apos;-1000&apos;);

SELECT format_number(DECIMAL &apos;1000&apos;);

SELECT format_number(999999999);

SELECT format_number(1000000000);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;question-of-the-week-how-do-i-search-nested-objects-in-elasticsearch-from-trino&quot;&gt;Question of the week: How do I search nested objects in Elasticsearch from Trino?&lt;/h2&gt;

&lt;p&gt;A &lt;a href=&quot;https://stackoverflow.com/questions/67667313&quot;&gt;question posted to StackOverflow&lt;/a&gt; 
asked how to search nested objects using the Elasticsearch connector.&lt;/p&gt;

&lt;p&gt;Trino maps a &lt;a href=&quot;https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nested&lt;/code&gt;&lt;/a&gt; 
object type to a &lt;a href=&quot;https://trino.io/docs/current/language/types.html#row&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt;&lt;/a&gt;
the same way that it maps a standard 
&lt;a href=&quot;https://www.elastic.co/guide/en/elasticsearch/reference/current/object.html&quot;&gt;object&lt;/a&gt; 
type during a read. The nested designation itself serves no purpose to Trino 
since it only determines how the object is stored in Elasticsearch.&lt;/p&gt;
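&lt;p&gt;For example, assuming a hypothetical index with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nested&lt;/code&gt; field 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;address&lt;/code&gt; that has a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;city&lt;/code&gt; property (the names are made up for this
sketch), its fields can be dereferenced like any other &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT address.city
FROM elasticsearch.default.users;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;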

&lt;p&gt;Check out &lt;a href=&quot;https://stackoverflow.com/a/67843697/2023810&quot;&gt;Brian’s full answer to this question&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;News&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The “frog” book has been &lt;a href=&quot;https://item.jd.com/10028492426649.html&quot;&gt;translated to Chinese&lt;/a&gt;!
 Keep your eyes peeled for the rebrand into Trino for the translation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Blogs&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/adobetech/iceberg-at-adobe-88cf1950e866&quot;&gt;Iceberg at Adobe&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;Trino on ice I: A gentle introduction to Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html&quot;&gt;Trino on ice II: In-place table evolution and cloud compatibility with Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/08/12/deep-dive-into-iceberg-internals.html&quot;&gt;Trino on ice IV: Deep dive into Iceberg internals&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Videos&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Trino Meetup: &lt;a href=&quot;https://www.youtube.com/watch?v=ifXpOn0NJWk&quot;&gt;Apache Iceberg: A table format for data lakes with unforeseen use cases&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;Advanced SQL Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/30/training-query-tuning.html&quot;&gt;Query Tuning Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/13/training-security.html&quot;&gt;Security Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/27/training-performance.html&quot;&gt;Performance and Tuning Training&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Mega Man 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests Cory Darby, Principal Software Developer at BlueCat (@ckdarby) Release 358</summary>

      
      
    </entry>
  
    <entry>
      <title>18: Trino enjoying the view</title>
      <link href="https://trino.io/episodes/18.html" rel="alternate" type="text/html" title="18: Trino enjoying the view" />
      <published>2021-05-20T00:00:00+00:00</published>
      <updated>2021-05-20T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/18</id>
      <content type="html" xml:base="https://trino.io/episodes/18.html">&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/18/trino-view.png&quot; /&gt;&lt;br /&gt;
Commander Bun Bun enjoying the views...
&lt;/p&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;Anjali Norwood, Senior Open Source Software Engineer at Netflix 
 (&lt;a href=&quot;https://www.linkedin.com/in/anjali-norwood-9521a16/&quot;&gt;@AnjaliNorwood&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-trino-views-hive-views-and-materialized-views&quot;&gt;Concept of the week: Trino views, Hive views, and materialized views&lt;/h2&gt;

&lt;p&gt;Before diving into views, it can be helpful to take a step back and consider a 
well-understood abstraction, like tables, to understand the purpose of a view.
Tables organize data in a vertical orientation, referred to as columns, and
represent instances of the data in a horizontal orientation, referred to as rows.
See the following &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; tables from the TPC-H dataset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;customer table&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;custkey&lt;/th&gt;
      &lt;th&gt;name&lt;/th&gt;
      &lt;th&gt;nationkey&lt;/th&gt;
      &lt;th&gt;acctbal&lt;/th&gt;
      &lt;th&gt;mktsegment&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;376&lt;/td&gt;
      &lt;td&gt;Customer#000000376&lt;/td&gt;
      &lt;td&gt;16&lt;/td&gt;
      &lt;td&gt;4231.45&lt;/td&gt;
      &lt;td&gt;AUTOMOBILE&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;377&lt;/td&gt;
      &lt;td&gt;Customer#000000377&lt;/td&gt;
      &lt;td&gt;23&lt;/td&gt;
      &lt;td&gt;1043.72&lt;/td&gt;
      &lt;td&gt;MACHINERY&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;378&lt;/td&gt;
      &lt;td&gt;Customer#000000378&lt;/td&gt;
      &lt;td&gt;22&lt;/td&gt;
      &lt;td&gt;5718.05&lt;/td&gt;
      &lt;td&gt;BUILDING&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;orders table&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;orderkey&lt;/th&gt;
      &lt;th&gt;custkey&lt;/th&gt;
      &lt;th&gt;orderstatus&lt;/th&gt;
      &lt;th&gt;totalprice&lt;/th&gt;
      &lt;th&gt;orderdate&lt;/th&gt;
      &lt;th&gt;orderpriority&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;376&lt;/td&gt;
      &lt;td&gt;O&lt;/td&gt;
      &lt;td&gt;172799.49&lt;/td&gt;
      &lt;td&gt;1996-01-02&lt;/td&gt;
      &lt;td&gt;5-LOW&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;376&lt;/td&gt;
      &lt;td&gt;O&lt;/td&gt;
      &lt;td&gt;38426.09&lt;/td&gt;
      &lt;td&gt;1996-12-01&lt;/td&gt;
      &lt;td&gt;1-URGENT&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;3&lt;/td&gt;
      &lt;td&gt;377&lt;/td&gt;
      &lt;td&gt;F&lt;/td&gt;
      &lt;td&gt;205654.3&lt;/td&gt;
      &lt;td&gt;1993-10-14&lt;/td&gt;
      &lt;td&gt;5-LOW&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The columns have a schema that enforces particular data types in particular 
columns and prevents insertion of invalid data into the table by throwing
an exception. This becomes extremely useful when reading and processing the data,
as there is a clear set of operations that can run on certain columns based on 
their type. This information is also useful when deserializing result sets into
various in-memory abstractions. Here is an example of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer&lt;/code&gt; table 
schema:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;customer table schema&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE customer (
   custkey bigint,
   name varchar(25),
   address varchar(40),
   nationkey bigint,
   phone varchar(15),
   acctbal double,
   mktsegment varchar(10),
   comment varchar(117)
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;views-and-materialized-views&quot;&gt;Views and materialized views&lt;/h3&gt;

&lt;p&gt;A view is structured much like a table: it has columns, rows, and a schema.
What then do views offer over tables? Views offer a way to encapsulate complex
SQL statements. For example, take this SQL query that runs over the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; tables
defined before.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT 
 c.custkey, 
 name, 
 nationkey, 
 mktsegment, 
 sumtotalprice, 
 openstatuscount, 
 failedstatuscount, 
 partialstatuscount
FROM 
 customer c 
 JOIN (
  SELECT 
   custkey, 
   SUM(totalprice) AS sumtotalprice, 
   COUNT_IF(orderstatus = &apos;O&apos;) AS openstatuscount,
   COUNT_IF(orderstatus = &apos;F&apos;) AS failedstatuscount, 
   COUNT_IF(orderstatus = &apos;P&apos;) AS partialstatuscount
  FROM orders
  GROUP BY custkey
 ) o
 ON c.custkey = o.custkey;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This query aggregates the orders table grouped by customer, and then joins the
aggregated result with the customer table on
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;custkey&lt;/code&gt;.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;custkey&lt;/th&gt;
      &lt;th&gt;name&lt;/th&gt;
      &lt;th&gt;nationkey&lt;/th&gt;
      &lt;th&gt;mktsegment&lt;/th&gt;
      &lt;th&gt;sumtotalprice&lt;/th&gt;
      &lt;th&gt;openstatuscount&lt;/th&gt;
      &lt;th&gt;failedstatuscount&lt;/th&gt;
      &lt;th&gt;partialstatuscount&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;376&lt;/td&gt;
      &lt;td&gt;Customer#000000376&lt;/td&gt;
      &lt;td&gt;16&lt;/td&gt;
      &lt;td&gt;AUTOMOBILE&lt;/td&gt;
      &lt;td&gt;1600696.4700000002&lt;/td&gt;
      &lt;td&gt;3&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;377&lt;/td&gt;
      &lt;td&gt;Customer#000000377&lt;/td&gt;
      &lt;td&gt;23&lt;/td&gt;
      &lt;td&gt;MACHINERY&lt;/td&gt;
      &lt;td&gt;803271.9400000001&lt;/td&gt;
      &lt;td&gt;3&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;379&lt;/td&gt;
      &lt;td&gt;Customer#000000379&lt;/td&gt;
      &lt;td&gt;7&lt;/td&gt;
      &lt;td&gt;AUTOMOBILE&lt;/td&gt;
      &lt;td&gt;3155009.54&lt;/td&gt;
      &lt;td&gt;7&lt;/td&gt;
      &lt;td&gt;11&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;From here, there are many ways you could further evaluate the resulting data. 
You could filter to see which market segment is spending the most on your
products. You could also look at which nations have the most failed orders to
evaluate where shipping lines may need to be improved. The table above, which 
results from the example query, is a good intermediate state of the data that 
can be reused for many future evaluations. Instead of defining a new table, you
can create a view on this data that encapsulates the complex SQL used to
calculate it. This is done using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE VIEW&lt;/code&gt; 
statement.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE VIEW customer_orders_view AS 
&amp;lt;complex SQL query above&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now, when you want to run any further analysis on this intermediate dataset, you
simply refer to the view instead of rewriting the full statement. As
mentioned, this view also has a schema and is treated much like a table when the
query engine does its planning. In this way it is also easier to map the data to
the application logic by enabling different shapes of the same data. It should
be made clear that these views are read-only and do not allow inserts, updates,
or deletes.&lt;/p&gt;
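
&lt;p&gt;As a minimal sketch, assuming the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer_orders_view&lt;/code&gt; defined above, any further analysis can reference the view like a table:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/* total spend per market segment, using the view instead of the full query */
SELECT mktsegment, SUM(sumtotalprice) AS segmentspend
FROM customer_orders_view
GROUP BY mktsegment
ORDER BY segmentspend DESC;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;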

&lt;p&gt;Another reason why you would want to create a view is to control read access to the
data. The view definition determines which columns and rows are filtered out and
which are returned when users query the view. The authorization 
of a user is tied to the view and its content, and that can significantly 
differ from the complete data in the underlying tables. For example, views 
can exclude sensitive data like social security numbers, birth dates, credit 
card numbers, and many other facts.&lt;/p&gt;

&lt;p&gt;When creating a view, there are two security modes that determine which user 
runs the queries defined in the view at query runtime. You can either run the
view query as the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DEFINER&lt;/code&gt;, which runs
it as the user that created the view, or as the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INVOKER&lt;/code&gt;, which
runs it as the user that is running the outer query of
the view. The default mode is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DEFINER&lt;/code&gt;. See more 
&lt;a href=&quot;https://trino.io/docs/current/sql/create-view.html#security&quot;&gt;in the security section of the create view documentation&lt;/a&gt;.&lt;/p&gt;
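
&lt;p&gt;As a brief sketch of the syntax, the mode can be set explicitly in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE VIEW&lt;/code&gt; statement (the view name here is illustrative):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/* run the view query with the permissions of the querying user */
CREATE VIEW customer_names
SECURITY INVOKER
AS SELECT custkey, name FROM customer;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;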

&lt;p&gt;There are two types of views: materialized and logical views. The view defined
above is the standard logical view that gets expanded into its definition. 
Logical views do not provide any performance benefit, since the data is not 
stored and is instead computed at query time. Materialized views persist the 
view data upon view creation by storing the query results.&lt;/p&gt;

&lt;p&gt;Materialized views make overall queries much faster to run, as part of the query
has already been computed. One issue with materialized views is that the data 
may become outdated and out of sync with the underlying table data. To keep the 
data between the tables and materialized view in sync, you have to refresh the 
view. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REFRESH MATERIALIZED VIEW&lt;/code&gt; command handles 
this operation, and can be called manually or scheduled to run 
periodically.&lt;/p&gt;
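
&lt;p&gt;As a minimal sketch, assuming a connector that supports materialized views, such as the Iceberg connector, and using a hypothetical view name:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/* persist the aggregated results once */
CREATE MATERIALIZED VIEW customer_order_totals AS
SELECT custkey, SUM(totalprice) AS sumtotalprice
FROM orders
GROUP BY custkey;

/* later, after the base table changes, bring the stored data back in sync */
REFRESH MATERIALIZED VIEW customer_order_totals;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;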

&lt;h3 id=&quot;trino-views-so-many-views-so-little-time&quot;&gt;Trino views: So many views, so little time&lt;/h3&gt;

&lt;p&gt;View handling in Trino depends on the connector. In general, most connectors
expose views to Trino as if they are another set of tables available for Trino
to query. The main exceptions are the Hive and Iceberg connectors. The 
table below lists the view support currently possible with the Hive and 
Iceberg connectors.&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
  &lt;tr&gt;
    &lt;th colspan=&quot;2&quot;&gt;&lt;/th&gt;
    &lt;th&gt;Logical&lt;/th&gt;
    &lt;th&gt;Materialized&lt;/th&gt;
  &lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
  &lt;tr&gt;
    &lt;td rowspan=&quot;2&quot;&gt;Trino Created View&lt;/td&gt;
    &lt;td&gt;Hive Connector&lt;/td&gt;
    &lt;td&gt;✅&lt;/td&gt;
    &lt;td&gt;❌&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Iceberg Connector&lt;/td&gt;
    &lt;td&gt;✅ (Edit: &lt;a href=&quot;https://github.com/trinodb/trino/pull/8540&quot;&gt;PR 8540&lt;/a&gt;)&lt;/td&gt;
    &lt;td&gt;✅&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td colspan=&quot;2&quot;&gt;Hive Created View&lt;/td&gt;
    &lt;td&gt;✅ (read-only)&lt;/td&gt;
    &lt;td&gt;✅ (read-only)&lt;/td&gt;
  &lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;You’ll notice that materialized views cannot be created through the Hive
connector in Trino. You will get the following exception:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Caused by: java.sql.SQLException: Query failed (#...): 
This connector does not support creating materialized views.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Also, you cannot create logical views in Iceberg, and you will get the following
exception:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Caused by: java.sql.SQLException: Query failed (#...): 
This connector does not support creating views.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 id=&quot;trino-reads-hive-views&quot;&gt;Trino reads Hive views&lt;/h4&gt;

&lt;p&gt;Before Trino there was Hive. Trino is a replacement for the Hive runtime for 
many users, and it is very useful for these users to also be able to read data 
from Hive views in Trino. Trino always aims to be compatible with as many Hive abstractions
as possible, to make migrating from Hive to Trino as painless as possible. 
So Trino supports reading data from Hive views, though it doesn’t support 
updates on these views. You have to update these views through Hive, and ideally
you will gradually migrate these views to Trino over time. Trino also supports
reading Hive materialized views, though Trino reads these views as just another 
Hive table, since they are stored similarly to standard Hive tables. Since
Hive views are defined in HiveQL, the view definitions need to be translated to
Trino SQL syntax. This is done using LinkedIn’s Coral library.&lt;/p&gt;

&lt;h4 id=&quot;coral-the-unifier-of-the-bee-and-the-bunny&quot;&gt;Coral: the unifier of the bee and the bunny&lt;/h4&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/linkedin/coral&quot;&gt;Coral&lt;/a&gt; is a project that translates 
views between different SQL dialects. It can process HiveQL 
statements and convert them to an internal representation using
&lt;a href=&quot;https://calcite.apache.org/&quot;&gt;Apache Calcite&lt;/a&gt;. It then converts the internal
representation to Trino SQL.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/18/coral.png&quot; /&gt;
&lt;/p&gt;

&lt;h4 id=&quot;trino-reading-hive-view-sequence-diagrams&quot;&gt;Trino reading Hive view sequence diagrams&lt;/h4&gt;

&lt;p&gt;In both of these sequence diagrams, notice that the first actions are to create
a Hive view. The view is created and maintained by the Hive system, and it is 
impossible to create or update a similar view in Trino.&lt;/p&gt;

&lt;p&gt;This diagram shows the creation of a Hive view, then shows the sequence of events 
when Trino reads that view.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/18/hive-view-sequence.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;This diagram shows the creation of a Hive materialized view, then shows the 
sequence of events when Trino reads the materialized view.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/18/hive-materialized-view-sequence.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h4 id=&quot;trino-native-view-sequence-diagrams&quot;&gt;Trino native view sequence diagrams&lt;/h4&gt;

&lt;p&gt;This diagram shows the sequence diagram for a Trino view that is created using 
the Hive Connector.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/18/trino-view-hive-connector-sequence.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;This diagram shows the sequence diagram for a materialized Trino view that is 
created using the Iceberg Connector.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/18/trino-materialized-view-iceberg-connector-sequence.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h3 id=&quot;iceberg-materialized-view-refresh-currently-only-full-refresh-in-iceberg-connector&quot;&gt;Iceberg materialized view refresh (currently only full refresh in Iceberg connector)&lt;/h3&gt;

&lt;p&gt;Ideally, as the tables underlying a materialized view change, the materialized
view should be automatically and incrementally updated so that its results 
stay in sync with the latest data.&lt;/p&gt;

&lt;p&gt;Automatically keeping materialized views fresh can be tricky from a resource 
management point of view, since the computation to materialize the 
view can be expensive. Trino currently does not support automatic refresh of 
materialized views. It instead supports the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REFRESH MATERIALIZED VIEW&lt;/code&gt; command 
that the user can issue to ensure that the materialized view is fresh.&lt;/p&gt;

&lt;p&gt;As part of executing the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REFRESH MATERIALIZED VIEW&lt;/code&gt; command in Trino, existing
data in the materialized view is dropped and new data is inserted if there are 
any changes to the base data. If the base data has not changed at all, the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REFRESH MATERIALIZED VIEW&lt;/code&gt; command is a no-op.&lt;/p&gt;

&lt;p&gt;What happens if the user issues a query against the materialized view, and the 
materialized view is not fresh? Trino detects that the materialized view is 
stale, expands the materialized view definition much like a logical view, 
and executes that SQL statement against the base tables.&lt;/p&gt;

&lt;p&gt;Incremental or delta refresh of materialized views is a more efficient way of
keeping the materialized view in sync with the base data. An incremental refresh 
means only the parts of the data that need to be updated in a materialized view 
are updated. The rest of the data is left untouched. For example, say you have a base
table, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sales&lt;/code&gt;, partitioned on a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date&lt;/code&gt; column, where the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sales&lt;/code&gt; table only gets 
data inserted for the current day. If the materialized view is also partitioned on 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date&lt;/code&gt;, a new partition can be added and data inserted for that day. 
Data for previous days and months is still fresh and can be left untouched. 
This is something on Netflix’s roadmap. One form of incremental refresh of the 
materialized view is a partition-level refresh; another is a more 
granular row-level refresh using functionality similar to the SQL &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; 
statement.&lt;/p&gt;
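
&lt;p&gt;As a hypothetical sketch of partition alignment with the Iceberg connector, which supports a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;partitioning&lt;/code&gt; property on materialized views (the table and column names here are illustrative):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/* partition the stored view data the same way as the base table,
   so a refresh could in principle touch only new partitions */
CREATE MATERIALIZED VIEW daily_sales
WITH (partitioning = ARRAY[&apos;sale_date&apos;]) AS
SELECT sale_date, SUM(amount) AS total_amount
FROM sales
GROUP BY sale_date;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;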

&lt;h3 id=&quot;support-in-trino-and-at-netflix&quot;&gt;Support in Trino and at Netflix&lt;/h3&gt;

&lt;h4 id=&quot;netflix-materialized-views&quot;&gt;Netflix materialized views&lt;/h4&gt;

&lt;p&gt;The main reason Netflix is interested in materialized views is to give analysts 
an easy way to compute and materialize their frequently used queries and keep 
the results refreshed without relying on an ETL pipeline to create and maintain 
those result sets. Some materialized views are as simple as queries that project
columns and apply filters, selecting data for a time range or for a test-id. 
Others are more complex, performing multi-level joins and aggregations.&lt;/p&gt;

&lt;h4 id=&quot;netflix-materialized-view-cross-compatibility-extension&quot;&gt;Netflix materialized view cross compatibility extension&lt;/h4&gt;

&lt;p&gt;Materialized views, much like logical views, are compatible across Trino and 
Spark, the two main engines used at Netflix. Spark is used at Netflix for ETL, 
and for creating and populating tables. Trino is the most popular engine with 
analysts and developers for ad hoc and experimental queries as well as audits.&lt;/p&gt;

&lt;p&gt;Trino is also used for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE TABLE AS SELECT&lt;/code&gt; (CTAS) in some use cases. Both 
the engines access data from tables using Iceberg and Hive connectors where data
is stored in S3. Netflix built upon the Trino logical views to create common 
views that are accessible from both Spark and Trino. The difference between the 
Trino logical views and Netflix common views is that the metadata is stored in 
the Hive metastore for Trino logical views, while common views store their 
metadata in JSON format in S3.&lt;/p&gt;

&lt;p&gt;A view object in the Hive metastore points to the S3 location of the metadata. It 
tracks the evolution of the view definition in the form of versions, so that you 
can potentially revert a view to an older version. The main benefit of common 
views is interoperability between Spark and Trino: views can be created, replaced, 
queried, and dropped from either engine, and support can be expanded to other 
engines. Netflix supports common views through both the Hive and Iceberg 
connectors.&lt;/p&gt;

&lt;p&gt;Currently, common views support SQL syntax common to both Spark and Trino. This 
support can be expanded in the future using LinkedIn’s Coral project, such that 
engine-specific syntax and semantics can be translated and interpreted by 
another engine. Netflix materialized views are an extension of Trino 
materialized views to make them interoperable between Spark and Trino. The only
difference between Trino and Netflix materialized views is where the metadata is
stored, very similar to Trino and Netflix logical views.&lt;/p&gt;

&lt;h3 id=&quot;roadmap&quot;&gt;Roadmap&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Netflix is looking into caching query results using materialized views and 
  the memory connector.&lt;/li&gt;
  &lt;li&gt;Incremental refresh ideas.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-4832-add-iceberg-support-for-materialized-views&quot;&gt;PR of the week: PR 4832 Add Iceberg support for materialized views&lt;/h2&gt;

&lt;p&gt;Our guest, Anjali, is the author of this week’s 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/4832&quot;&gt;PR of the week&lt;/a&gt;, which adds Iceberg
support for materialized views. Thanks Anjali!&lt;/p&gt;

&lt;p&gt;Honorable PR mentions:&lt;/p&gt;

&lt;p&gt;In order for the PR of the week to work, Anjali 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/3283&quot;&gt;added syntax support&lt;/a&gt; for Trino 
materialized views with the commands &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE MATERIALIZED VIEW&lt;/code&gt;, 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REFRESH MATERIALIZED VIEW&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP MATERIALIZED VIEW&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Before any of this was done, user &lt;a href=&quot;https://github.com/laurachenyu&quot;&gt;laurachenyu&lt;/a&gt; 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/4661&quot;&gt;integrated Coral with Trino to enable querying Hive views&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;demo-showing-the-different-views-in-trino&quot;&gt;Demo: Showing the different views in Trino&lt;/h2&gt;

&lt;p&gt;In Trino, create some Hive tables in a Hive catalog named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hdfs&lt;/code&gt; that represents
the underlying storage Trino writes to.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE SCHEMA hdfs.tiny
WITH (location = &apos;/tiny/&apos;);

CREATE TABLE hdfs.tiny.customer
WITH (
  format = &apos;ORC&apos;,
  external_location = &apos;/tiny/customer/&apos;
) 
AS SELECT * FROM tpch.tiny.customer;

CREATE TABLE hdfs.tiny.orders
WITH (
  format = &apos;ORC&apos;,
  external_location = &apos;/tiny/orders/&apos;
) 
AS SELECT * FROM tpch.tiny.orders;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now, create a logical Hive view (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive_view&lt;/code&gt;), and a materialized Hive view
(&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive_materialized_view&lt;/code&gt;) from the Hive CLI.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;USE tiny;

CREATE VIEW hive_view AS 
SELECT c.custkey, c.name, nationkey, mktsegment, orderstatus, totalprice, orderpriority, orderdate 
FROM customer c JOIN orders o ON c.custkey = o.custkey;

CREATE MATERIALIZED VIEW hive_materialized_view AS
SELECT c.custkey, c.name, nationkey, mktsegment, orderstatus, totalprice, orderpriority, orderdate 
FROM customer c JOIN orders o ON c.custkey = o.custkey;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As you create the views, you can check their state in the Hive metastore.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT t.TBL_NAME, t.TBL_TYPE, t.VIEW_EXPANDED_TEXT, t.VIEW_ORIGINAL_TEXT 
FROM DBS d
 JOIN TBLS t ON d.DB_ID = t.DB_ID
WHERE d.NAME = &apos;tiny&apos;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once the Hive views exist, switch back to Trino to create Trino views and 
query everything.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE VIEW hdfs.tiny.trino_view AS 
SELECT c.custkey, c.name, nationkey, mktsegment, orderstatus, totalprice, orderpriority, orderdate 
FROM hdfs.tiny.customer c JOIN hdfs.tiny.orders o ON c.custkey = o.custkey;

/* Fails: Caused by: java.sql.SQLException: Query failed (#20210516_032433_00002_6syuw): 
This connector does not support creating materialized views */
CREATE MATERIALIZED VIEW hdfs.tiny.trino_materialized_view AS 
SELECT c.custkey, c.name, nationkey, mktsegment, orderstatus, totalprice, orderpriority, orderdate 
FROM hdfs.tiny.customer c JOIN hdfs.tiny.orders o ON c.custkey = o.custkey;

/* Fails: Caused by: java.sql.SQLException: Query failed (#20210516_101856_00009_ihjur): 
This connector does not support creating views */
CREATE VIEW iceberg.tiny.iceberg_view AS 
SELECT c.custkey, c.name, nationkey, mktsegment, orderstatus, totalprice, orderpriority, orderdate 
FROM hdfs.tiny.customer c JOIN hdfs.tiny.orders o ON c.custkey = o.custkey;

CREATE MATERIALIZED VIEW iceberg.tiny.iceberg_materialized_view AS 
SELECT c.custkey, c.name, nationkey, mktsegment, orderstatus, totalprice, orderpriority, orderdate 
FROM hdfs.tiny.customer c JOIN hdfs.tiny.orders o ON c.custkey = o.custkey;

/* 
This REFRESH call failed during the show due to the fact that I created the 
materialized Trino view in the Iceberg (`iceberg`) catalog using tables from the
Hive (`hdfs`) catalog. I should have created the materialized view using the
iceberg catalog:

CREATE MATERIALIZED VIEW iceberg.tiny.iceberg_materialized_view AS 
SELECT c.custkey, c.name, nationkey, mktsegment, orderstatus, totalprice, orderpriority, orderdate 
FROM iceberg.tiny.customer c JOIN iceberg.tiny.orders o ON c.custkey = o.custkey;
*/
REFRESH MATERIALIZED VIEW iceberg.tiny.iceberg_materialized_view;

/* query tables */

SELECT * FROM hdfs.tiny.customer LIMIT 3;

SELECT * FROM hdfs.tiny.orders LIMIT 3;

/* query views */

SELECT * FROM hdfs.tiny.trino_view LIMIT 3;

SELECT * FROM hdfs.tiny.hive_view LIMIT 3;

SELECT * FROM hdfs.tiny.hive_materialized_view LIMIT 3;

SELECT * FROM iceberg.tiny.iceberg_materialized_view LIMIT 3;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;question-of-the-week-are-jdbc-drivers-backwards-compatible-with-older-trino-versions&quot;&gt;Question of the week: Are JDBC drivers backwards compatible with older Trino versions?&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Full question:&lt;/strong&gt; Are JDBC drivers backwards compatible with older Trino 
versions? I’m trying to install the 354 driver on a multi-tenanted Tableau 
server where there might be older Trino versions in play. Do I need to upgrade 
my Trino clients right away when upgrading my server to Trino version from 
&amp;lt;=350 to &amp;gt;350?&lt;/p&gt;

&lt;p&gt;For this particular user’s case, the answer is that they won’t need to upgrade 
their clients, assuming they are on Trino servers. If their server versions are
PrestoSQL version &amp;lt;= 350, then they will need to hold off on upgrading to a Trino
client.&lt;/p&gt;

&lt;p&gt;Trino’s JDBC drivers typically maintain compatibility with older server versions
(and vice versa). However, the project was renamed from PrestoSQL to Trino 
starting with version 351, and as a consequence, JDBC drivers with version &amp;gt;= 351 are
not compatible with servers with version &amp;lt;= 350. More details at:
&lt;a href=&quot;https://trino.io/blog/2021/01/04/migrating-from-prestosql-to-trino.html&quot;&gt;https://trino.io/blog/2021/01/04/migrating-from-prestosql-to-trino.html&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In short, you can have a PrestoSQL client with a Trino server, but you can’t 
have a Trino client with a PrestoSQL server.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Join us for an awesome event on May 26th as Iceberg creator Ryan Blue dives 
 into some interesting and less conventional use cases of Apache Iceberg.
 &lt;a href=&quot;https://www.meetup.com/trino-americas/events/278103777/&quot;&gt;Trino Americas meetup&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://engineering.linkedin.com/blog/2020/coral&quot;&gt;https://engineering.linkedin.com/blog/2020/coral&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Videos&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.arcadiadata.com/lp/tech-talk-on-join-optimization/&quot;&gt;https://www.arcadiadata.com/lp/tech-talk-on-join-optimization/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;Advanced SQL Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/30/training-query-tuning.html&quot;&gt;Query Tuning Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/13/training-security.html&quot;&gt;Security Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/27/training-performance.html&quot;&gt;Performance and Tuning Training&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Commander Bun Bun enjoying the views...</summary>

      
      
    </entry>
  
    <entry>
      <title>Row pattern recognition with MATCH_RECOGNIZE</title>
      <link href="https://trino.io/blog/2021/05/19/row_pattern_matching.html" rel="alternate" type="text/html" title="Row pattern recognition with MATCH_RECOGNIZE" />
      <published>2021-05-19T00:00:00+00:00</published>
      <updated>2021-05-19T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/05/19/row_pattern_matching</id>
      <content type="html" xml:base="https://trino.io/blog/2021/05/19/row_pattern_matching.html">&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; syntax was introduced in the SQL:2016
specification. It is a super powerful tool for analyzing trends in your data. We are
proud to announce that Trino supports this great feature as of
&lt;a href=&quot;https://trino.io/docs/current/release/release-356.html&quot;&gt;version 356&lt;/a&gt;. With
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;, you can define a pattern using the well-known regular
expression syntax, and match it to a set of rows. Upon finding a matching row
sequence, you can retrieve all kinds of detailed or summary information about
the match, and pass it on to be processed by the subsequent parts of your
query. This is a new level of what a pure SQL statement can do.&lt;/p&gt;

&lt;p&gt;This blog post gives you a taste of row pattern matching capabilities, and a
quick overview of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; syntax.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;a-regular-expression-and-a-table-a-fruitful-relationship&quot;&gt;A regular expression and a table: a fruitful relationship&lt;/h2&gt;

&lt;p&gt;The regex matching we all know is about searching for patterns in character
strings. But how does a regex match a sequence of rows? Certainly, a row of
data is a more complex structure than a character. And so, row pattern matching
is more expressive than regex matching in text. Unlike characters, which have
fixed positions in a string, rows aren’t assigned up-front to
pattern components. This is where the additional level of complexity comes
from: whether the row is an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt;, is conditional. It is revealed as
the pattern matching goes forward. It depends on the data in the row, but also
on the context of the current match and even on the match number. Also, the same
row can be mapped to different labels in different matches.&lt;/p&gt;

&lt;p&gt;Consider this simple example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PATTERN: A B+ C D?
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;First, let’s match it to the string &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;ABBCEE&quot;&lt;/code&gt;. There is exactly one way to
match it: the prefix &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;ABBC&quot;&lt;/code&gt; is a match.&lt;/p&gt;

&lt;p&gt;Now, let’s see what it takes to match a pattern to rows of a table.
Consider the table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;numbers&lt;/code&gt; with a single column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;number&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/table-numbers.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You need &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;defining conditions&lt;/code&gt; to specify how the rows of the table can be
mapped to pattern components &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;D&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;DEFINE:
    A &amp;lt;- true (matches every row)
    B &amp;lt;- number is greater than previous number
    C &amp;lt;- number is lower or equal to A
    D &amp;lt;- matches every row, but only in the first match;
         otherwise doesn&apos;t match any row
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As you can see, the conditions can refer to other pattern components (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt;
 depends on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt;), or the sequential match number (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;D&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;When searching for a match, the engine goes row by row, and assigns labels
according to the pattern. Every time the pattern shows the next component
(label) to be matched, the defining condition of that component is evaluated
for the current row in the context of the partial match.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/first-match.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;After finding a match, you can step one row forward and search for another one.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/second-match.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;So far, two matches have been found in the same set of rows. Interestingly, a row
that was labeled as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B&lt;/code&gt; in the first match, became &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt; in the second match.
Let’s try to find another match.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/third-match.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;time-to-get-more-technical&quot;&gt;Time to get more technical&lt;/h2&gt;

&lt;p&gt;…and use some real &lt;s&gt;life&lt;/s&gt; money examples.&lt;/p&gt;

&lt;p&gt;In the preceding examples, the pattern consisted of components &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt;
and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;D&lt;/code&gt;. They were chosen this way to capture the analogy between pattern
matching in a string and pattern matching in a set of rows. According to the
SQL specification, row pattern components can be named with arbitrary
identifiers, as long as they are compliant with the SQL identifier semantics,
so you don’t need to limit yourself to single-letter names, and instead you can
use more verbose labels.&lt;/p&gt;

&lt;p&gt;Officially, the pattern components, or labels, are called the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;primary pattern
variables&lt;/code&gt;. They are the basic components of the row pattern. Consider the
following example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PATTERN( START DOWN+ UP+ )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There are three primary pattern variables: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;START&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DOWN&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UP&lt;/code&gt;. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+&lt;/code&gt; is
the “one or more” quantifier you know from the regex syntax. Intuitively, this
pattern should match a sequence of rows which are first “decreasing”, and then
“increasing”. You need to inform the engine how it should map rows to the
variables. In other words, you need to define what the “decreasing” and
“increasing” rows are:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;DEFINE DOWN AS price &amp;lt; PREV(price),
       UP AS price &amp;gt; PREV(price)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now it’s clear that “decreasing” and “increasing” is about the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;price&lt;/code&gt; values.
There is no defining condition for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;START&lt;/code&gt; variable, which informs the
engine that the match can start anywhere.&lt;/p&gt;

&lt;p&gt;The preceding example shows the two key clauses of row pattern recognition:
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PATTERN&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DEFINE&lt;/code&gt;. Let’s see what other keywords there are in the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; clause.&lt;/p&gt;

&lt;h2 id=&quot;syntax-overview&quot;&gt;Syntax overview&lt;/h2&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; syntax is long and rich enough to capture everything that
a pattern matching tool needs, and all the options which let you easily toggle
your matching strategies.&lt;/p&gt;

&lt;p&gt;Technically, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; is part of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FROM&lt;/code&gt; clause:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT ...
    FROM some_table
        MATCH_RECOGNIZE (
          [ PARTITION BY column [, ...] ]
          [ ORDER BY column [, ...] ]
          [ MEASURES measure_definition [, ...] ]
          [ rows_per_match ]
          [ AFTER MATCH skip_to ]
          PATTERN ( row_pattern )
          [ SUBSET subset_definition [, ...] ]
          DEFINE variable_definition [, ...]
          )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; can be used in the query as one of the stages of processing
data. You can &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; from its results or even stream them into another
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PATTERN&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DEFINE&lt;/code&gt; clauses are the heart of row pattern recognition.
They are also the only two required subclauses of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;. They were
touched upon in the previous section.&lt;/p&gt;

&lt;p&gt;The pattern syntax is close to regular expression syntax. It also supports some
extensions specific to row pattern recognition. They are explained in
&lt;a href=&quot;#pattern-syntax&quot;&gt;Row pattern syntax&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PARTITION BY&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clauses are similar to those in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt;
syntax. They help you structure the input data. You can use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PARTITION BY&lt;/code&gt; to
break up your data into independent chunks. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; is useful to establish 
the order of rows before searching for the pattern. Typically, you want to
analyze series of events over time, so ordering by date is a good choice.&lt;/p&gt;
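
&lt;p&gt;For example, with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; data used later in this post, the two
clauses could look like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PARTITION BY customer_id
ORDER BY order_date
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Each customer’s orders are then searched for the pattern independently, in date
order.&lt;/p&gt;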

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/partition-by-order-by.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt; clause, you can specify what information you need about every
match that is found. For example, if you’re interested in the order date,
the lowest value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;price&lt;/code&gt;, and the sequential number of the match, this is the
way to retrieve them:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;MEASURES order_date AS date,
         LAST(DOWN.price) AS bottom_price,
         MATCH_NUMBER() AS match_no
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bottom_price&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;match_no&lt;/code&gt; are exposed by the pattern recognition
clause as output columns.&lt;/p&gt;
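
&lt;p&gt;Since the measures become output columns, the enclosing query can select them
like any other column. Here is a minimal sketch, with the remaining
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; subclauses elided:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT date, bottom_price, match_no
    FROM orders
        MATCH_RECOGNIZE (
          ...
          MEASURES order_date AS date,
                   LAST(DOWN.price) AS bottom_price,
                   MATCH_NUMBER() AS match_no
          ...
          )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;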

&lt;p&gt;The expressions in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DEFINE&lt;/code&gt; clauses allow you to combine the
input data with the information about the matched pattern. They support many
extensions and special constructs to help you get the most out of your data, both
when defining the pattern, and retrieving useful information after a successful
match. The special keyword &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LAST&lt;/code&gt; is one example. For the full list of the
magic spells, check &lt;a href=&quot;#expressions&quot;&gt;Expressions for special tasks&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; clause has two useful toggles. The first of them lets you
choose whether the output includes all rows of the match, or a single-row
summary. For all rows, specify &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALL ROWS PER MATCH&lt;/code&gt;. For a single row, choose
the default &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ONE ROW PER MATCH&lt;/code&gt;. There are also sub-options available, enabling
different handling of empty matches and unmatched rows.&lt;/p&gt;
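
&lt;p&gt;For example, to output every row of every match, and additionally pass
unmatched rows through to the output, you can combine the option with one of its
sub-options:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ALL ROWS PER MATCH WITH UNMATCHED ROWS
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;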

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/rows-per-match.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Another toggle is the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AFTER MATCH SKIP&lt;/code&gt; clause. It allows you to specify where
the row pattern matching resumes after finding a match. The default option is
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AFTER MATCH SKIP PAST LAST ROW&lt;/code&gt;, but you can also skip to the next row or to a
specific position in the match based on the matched pattern variables.&lt;/p&gt;
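
&lt;p&gt;A few sketches of the available options:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;AFTER MATCH SKIP PAST LAST ROW   -- the default
AFTER MATCH SKIP TO NEXT ROW     -- resume at the second row of the match
AFTER MATCH SKIP TO LAST DOWN    -- resume at the last row labeled DOWN
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;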

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/after-match-skip.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SUBSET&lt;/code&gt; clause is where the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;union pattern variables&lt;/code&gt; are defined. They
are a concise way to refer to a group of primary pattern variables:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SUBSET U = (DOWN, UP)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The following expression returns the value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;price&lt;/code&gt; from the last row
matched either to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DOWN&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UP&lt;/code&gt; primary variable:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;LAST(U.price)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;-row-pattern-syntax&quot;&gt;&lt;a name=&quot;pattern-syntax&quot;&gt;&lt;/a&gt; Row pattern syntax&lt;/h2&gt;

&lt;p&gt;The basic element of a row pattern is the primary pattern variable. Other syntax
components include:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Concatenation&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;A B C
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Alternation&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;A | B | C
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Permutation&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PERMUTE(A, B, C)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Grouping&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;(A B C)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Partition start anchor&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;^
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Partition end anchor&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Empty pattern&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;()
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Exclusion syntax&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;{- row_pattern -}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Exclusion syntax is useful in combination with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALL ROWS PER MATCH&lt;/code&gt; option.
If you find some sections of the match uninteresting, you can wrap them in the
exclusion, and they are dropped from the output.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/exclusion.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quantifiers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Row pattern syntax supports all kinds of quantifiers: the basic ones &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;*&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;?&lt;/code&gt;, and others, which let you specify the exact number of repetitions, or the
accepted range: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{n}&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{n, m}&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{n,}&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{,n}&lt;/code&gt;. Make sure you don’t confuse
those:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{n}&lt;/code&gt; is for exactly n repetitions,&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{n,}&lt;/code&gt; is equal to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{n, ∞}&lt;/code&gt;,&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{,n}&lt;/code&gt; is equal to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{0, n}&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
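
&lt;p&gt;For example, the following pattern matches exactly three &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt; rows,
two or more &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B&lt;/code&gt; rows, and up to five &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt; rows:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PATTERN ( A{3} B{2,} C{,5} )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;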

&lt;p&gt;Quantifiers are greedy by default. This means that they prefer a higher number
of repetitions over a lower one. If you want it the other way, you can change a
quantifier to reluctant by appending &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;?&lt;/code&gt; immediately after it. So, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(pattern)?&lt;/code&gt;
prefers a single match of the pattern, while &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(pattern)??&lt;/code&gt; would rather omit
the pattern altogether.&lt;/p&gt;
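
&lt;p&gt;For example, in the following pattern the reluctant quantifier makes the
engine prefer matching as few &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B&lt;/code&gt; rows as possible before moving on to
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PATTERN ( A B+? C )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;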

&lt;h3 id=&quot;match-preference&quot;&gt;Match preference&lt;/h3&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; is supposed to produce at most one match starting from a
specific row. If there are more matches available, the winner is chosen based
on the order of preference. The greedy and reluctant quantifiers are one
example of preference. Other pattern components have their own rules:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;pattern alternation prefers the left-hand components to the right-hand ones.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;pattern permutation is equivalent to alternation of all permutations of its
components. If multiple matches are possible, the match is chosen based on the
lexicographical order established by the order of components in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PERMUTE&lt;/code&gt;
list. For &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PERMUTE(A, B, C)&lt;/code&gt;, the preference of options goes as follows:
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A B C&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A C B&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B A C&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B C A&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C A B&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C B A&lt;/code&gt;.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
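
&lt;p&gt;For example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PERMUTE(A, B)&lt;/code&gt; behaves like an alternation of both
orderings, with the left-to-right order preferred:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PERMUTE(A, B)
-- behaves like:
(A B | B A)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;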

&lt;h2 id=&quot;-expressions-for-special-tasks&quot;&gt;&lt;a name=&quot;expressions&quot;&gt;&lt;/a&gt; Expressions for special tasks&lt;/h2&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; clause provides special expression syntax, available in
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DEFINE&lt;/code&gt; clauses. Its purpose is to combine the input data
with the information about the match. The syntax includes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Pattern variable references&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They allow referring to certain components of the match, for example
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DOWN.price&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UP.order_date&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Logical navigation operations: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LAST&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FIRST&lt;/code&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They allow you to navigate over the rows of a match based on the pattern
variables assigned to them. For example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LAST(DOWN.price, 3)&lt;/code&gt; navigates to the
last row labeled as “DOWN”, goes three occurrences of the “DOWN” label
backwards, and gets the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;price&lt;/code&gt; value from that row. The default offset is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0&lt;/code&gt;:
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LAST(DOWN.price)&lt;/code&gt; gets the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;price&lt;/code&gt; value from the last row labeled as “DOWN”.
If the logical navigation goes beyond the match bounds, the operation returns
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt;.&lt;/p&gt;
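
&lt;p&gt;To recap, here are a few sketches of logical navigation:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;FIRST(DOWN.price)      -- price from the first row labeled DOWN
LAST(DOWN.price)       -- price from the last row labeled DOWN
LAST(DOWN.price, 3)    -- price from three DOWN rows before the last one
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;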

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Physical navigation operations: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PREV&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NEXT&lt;/code&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They let you navigate over the rows of the partition by a specified offset.
Physical navigations use logical navigations as the starting point. For
example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NEXT(DOWN.price, 5)&lt;/code&gt; first navigates to the last row labeled as
“DOWN”. Starting from there, it goes five rows forward and gets the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;price&lt;/code&gt;
value from that row. In the preceding example, the logical navigation &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LAST&lt;/code&gt; is
implicit, but you can specify the nested logical navigation explicitly, for
example &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NEXT(FIRST(DOWN.price, 4), 5)&lt;/code&gt;. The default offset is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1&lt;/code&gt;, which means
that the physical navigations by default go one row backwards, or one row
forward.&lt;/p&gt;
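
&lt;p&gt;A few sketches of physical navigation:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PREV(price)                      -- price from one row back
NEXT(DOWN.price, 5)              -- price from five rows after the last DOWN row
NEXT(FIRST(DOWN.price, 4), 5)    -- the nested logical navigation made explicit
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;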

&lt;p&gt;The physical navigation can retrieve values beyond the match bounds. It gives
you great flexibility. For example, the defining conditions of pattern
variables can peek at the values ahead. Also, when computing row pattern
measures, you can refer to the wider context of the match.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CLASSIFIER&lt;/code&gt; function&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It returns the primary pattern variable associated with the row.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_NUMBER&lt;/code&gt; function&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It returns the sequential number of the match within the partition.&lt;/p&gt;
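
&lt;p&gt;Both functions are typically used in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt; clause, for
example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;MEASURES CLASSIFIER() AS matched_label,
         MATCH_NUMBER() AS match_no
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;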

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RUNNING&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FINAL&lt;/code&gt; keywords&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The expressions in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DEFINE&lt;/code&gt; clause are evaluated when the pattern matching
is in progress. At each step, the engine only knows a part of the match. This
is the &lt;em&gt;running semantics&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The expressions of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt; clause are evaluated when the match is
complete. The engine can see the whole match from the position of the final
row. This is the &lt;em&gt;final semantics&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;However, with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALL ROWS PER MATCH&lt;/code&gt; option, when the match result is
processed row by row, you can choose either approach to compute the measures.
To do that, you can specify the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RUNNING&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FINAL&lt;/code&gt; keyword before the logical
navigation operation, for example &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RUNNING LAST(DOWN.price)&lt;/code&gt; or
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FINAL LAST(DOWN.price)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;running semantics&lt;/em&gt; is the default both in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DEFINE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt;
clauses. Note that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FINAL&lt;/code&gt; only applies to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt; clause.&lt;/p&gt;
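&lt;p&gt;As a sketch, reusing the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DOWN&lt;/code&gt; variable from the earlier examples, both
semantics can be requested side by side in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MEASURES&lt;/code&gt; clause:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;MEASURES
    -- running semantics: price at the last DOWN row seen so far
    RUNNING LAST(DOWN.price) AS bottom_so_far,
    -- final semantics: price at the last DOWN row of the complete match
    FINAL LAST(DOWN.price) AS bottom_of_match
ALL ROWS PER MATCH
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;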

&lt;p&gt;To sum up, here’s one complex measure expression combining different elements
of the special syntax:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/match-recognize/measure-example.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;trino-cli-show-off-time&quot;&gt;Trino CLI show-off time!&lt;/h2&gt;

&lt;p&gt;Now, let’s see the whole machinery come to life. This is the same example data
that we used before, and the same goal: detect a “V”-shape of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;price&lt;/code&gt;
values over time for different customers.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino&amp;gt; WITH orders(customer_id, order_date, price) AS (VALUES
    (&apos;cust_1&apos;, DATE &apos;2020-05-11&apos;, 100),
    (&apos;cust_1&apos;, DATE &apos;2020-05-12&apos;, 200),
    (&apos;cust_2&apos;, DATE &apos;2020-05-13&apos;,   8),
    (&apos;cust_1&apos;, DATE &apos;2020-05-14&apos;, 100),
    (&apos;cust_2&apos;, DATE &apos;2020-05-15&apos;,   4),
    (&apos;cust_1&apos;, DATE &apos;2020-05-16&apos;,  50),
    (&apos;cust_1&apos;, DATE &apos;2020-05-17&apos;, 100),
    (&apos;cust_2&apos;, DATE &apos;2020-05-18&apos;,   6))
SELECT customer_id, start_price, bottom_price, final_price, start_date, final_date
    FROM orders
        MATCH_RECOGNIZE (
            PARTITION BY customer_id
            ORDER BY order_date
            MEASURES
                START.price AS start_price,
                LAST(DOWN.price) AS bottom_price,
                LAST(UP.price) AS final_price,
                START.order_date AS start_date,
                LAST(UP.order_date) AS final_date
            ONE ROW PER MATCH
            AFTER MATCH SKIP PAST LAST ROW
            PATTERN (START DOWN+ UP+)
            DEFINE
                DOWN AS price &amp;lt; PREV(price),
                UP AS price &amp;gt; PREV(price)
            );

 customer_id | start_price | bottom_price | final_price | start_date | final_date
-------------+-------------+--------------+-------------+------------+------------
 cust_1      |         200 |           50 |         100 | 2020-05-12 | 2020-05-17
 cust_2      |           8 |            4 |           6 | 2020-05-13 | 2020-05-18
(2 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Two matches are detected, one for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cust_1&lt;/code&gt;, and one for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cust_2&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id=&quot;empty-matches-explained&quot;&gt;Empty matches explained&lt;/h2&gt;

&lt;p&gt;An empty match is a legitimate result of row pattern recognition. There are
different pattern constructs that can result in an empty match. The empty
pattern syntax &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;()&lt;/code&gt; is the trivial one. An empty match can also result from
quantification, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A*&lt;/code&gt;, or from alternation, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A | ()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;An empty match does not consume any input rows, but like every match, it is
associated with a row, called the &lt;em&gt;starting row&lt;/em&gt;. That is the row at which the
pattern matching started. Note that if the pattern allows an empty match, it
guarantees that no rows remain unmatched. Also, an empty match, like every
non-empty match, gets a sequential number, which can be retrieved by the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_NUMBER&lt;/code&gt; function.&lt;/p&gt;
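&lt;p&gt;For instance, here is a sketch of a pattern that allows empty matches, again
reusing the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DOWN&lt;/code&gt; variable from the earlier examples:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;MEASURES MATCH_NUMBER() AS match_no  -- empty matches are numbered too
ONE ROW PER MATCH
PATTERN (DOWN*)                      -- DOWN* can match zero rows
DEFINE DOWN AS price &amp;lt; PREV(price)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;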

&lt;p&gt;Depending on your use case, you can consider empty matches informative or just
see them as a leftover of the algorithm.&lt;/p&gt;

&lt;p&gt;There’s one more thing linked to empty matches. Some patterns have the
dangerous potential of looping endlessly over a piece that doesn’t consume any
rows. It doesn’t have to be as explicit as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;()*&lt;/code&gt;. There are complex patterns
that don’t show their looping potential at first glance. We handled them
carefully so that you never have to waste your time on looping queries.&lt;/p&gt;

&lt;h2 id=&quot;in-a-few-words-whats-so-cool-about-row-pattern-matching&quot;&gt;In a few words, what’s so cool about row pattern matching?&lt;/h2&gt;

&lt;p&gt;From the SQL viewpoint, you can think of row pattern matching as extended
window functions. Window functions allow you to capture some dependencies in
rows of data based on their relative position or value. Row pattern matching
allows you to detect arbitrarily complicated dependencies, based not only on
the input values but also on the details of the actual match and on the match
number.&lt;/p&gt;

&lt;p&gt;Before the introduction of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt;, you had to feed your data to
external tools to reason about trends and patterns. Now, you can achieve it
directly in your query, and even build your query upon the pattern recognition
clause to further process the match results.&lt;/p&gt;

&lt;p&gt;Row pattern matching is typically used:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;in trade applications for tracking trends or identifying customers with
specific behavioral patterns,&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;in shipping applications for tracking packages through all possible valid
paths,&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;in financial applications for detecting unusual incidents, which might signal
fraud.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What’s your use case?&lt;/p&gt;

&lt;p&gt;I hope you enjoy Trino’s new feature. Refer to
&lt;a href=&quot;https://trino.io/docs/current/sql/match-recognize.html&quot;&gt;Trino docs&lt;/a&gt; for even
more details, examples and usage tips. &lt;a href=&quot;/slack.html&quot;&gt;Please &lt;strong&gt;do&lt;/strong&gt; reach out to us with any
questions or issues&lt;/a&gt;. We plan to support row pattern matching in
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt; clause soon, so stay tuned!&lt;/p&gt;</content>

      
        <author>
          <name>Kasia Findeisen (kasiafi)</name>
        </author>
      

      <summary>The MATCH_RECOGNIZE syntax was introduced in the latest SQL specification of 2016. It is a super powerful tool for analyzing trends in your data. We are proud to announce that Trino supports this great feature since version 356. With MATCH_RECOGNIZE, you can define a pattern using the well-known regular expression syntax, and match it to a set of rows. Upon finding a matching row sequence, you can retrieve all kinds of detailed or summary information about the match, and pass it on to be processed by the subsequent parts of your query. This is a new level of what a pure SQL statement can do. This blog post gives you a taste of row pattern matching capabilities, and a quick overview of the MATCH_RECOGNIZE syntax.</summary>

      
      
    </entry>
  
    <entry>
      <title>17: Trino connector resurfaces API calls</title>
      <link href="https://trino.io/episodes/17.html" rel="alternate" type="text/html" title="17: Trino connector resurfaces API calls" />
      <published>2021-05-13T00:00:00+00:00</published>
      <updated>2021-05-13T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/17</id>
      <content type="html" xml:base="https://trino.io/episodes/17.html">&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/17/trino-resurface.png&quot; /&gt;&lt;br /&gt;
Commander Bun Bun is diving deep to find anomalies!
&lt;/p&gt;

&lt;h2 id=&quot;resurface-links&quot;&gt;Resurface links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://resurface.io/&quot;&gt;Resurface site&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/resurfaceio&quot;&gt;Resurface GitHub&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://resurface.io/slack&quot;&gt;Resurface Slack&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;Rob Dickinson, Co-founder and CEO of &lt;a href=&quot;https://resurface.io/&quot;&gt;Resurface&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/robfromboulder&quot;&gt;@robfromboulder&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Martin Traverso, creator of Trino/Presto, and CTO at 
 &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt; (&lt;a href=&quot;https://twitter.com/mtraverso&quot;&gt;@mtraverso&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-resurface-and-the-resurface-connector&quot;&gt;Concept of the week: Resurface and the Resurface connector&lt;/h2&gt;

&lt;h3 id=&quot;what-is-resurface&quot;&gt;What is Resurface?&lt;/h3&gt;
&lt;p&gt;Resurface is an API system of record, which is a fancy way of saying that 
Resurface is a purpose-built database for API requests and responses. Like a 
weblog or access log, but on steroids because Resurface runs on Trino.&lt;/p&gt;

&lt;p&gt;Why do you need a system of record for your APIs? Because otherwise you’re 
guessing about how your APIs are used and attacked, and guessing doesn’t feel 
good. Resurface helps your DevOps and security teams instantly find API 
failures, slowdowns, and attacks – easily, responsibly, and at scale.&lt;/p&gt;

&lt;h3 id=&quot;how-resurface-differs-from-logs--metrics&quot;&gt;How Resurface differs from logs &amp;amp; metrics&lt;/h3&gt;
&lt;p&gt;You probably use system monitoring tools, which tell you about what’s happening 
on your systems. What code is running, what code is slow, and what error codes 
are returned. That’s all great — but it still leaves a big gap between the 
system-level events you can see, and what your API consumers actually see.&lt;/p&gt;

&lt;p&gt;Resurface helps you fill this gap with your own API system of record. Now your 
customers, your DevOps team, and your security team all have the same view of 
every transaction, because there is a record of the requests and responses.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/17/resurface-tcb1.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;The other obvious way to compare Resurface against other tools is to look at the
data model. System monitoring gives you time-series metrics, or timestamped log
messages with a severity and detail string. Resurface gives you all the request
and response data fields, including headers and payloads, in a schema where all
of those fields are discrete and searchable. Plus it adds a bunch of helpful
virtual and computed columns.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/17/resurface-tcb2.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h3 id=&quot;the-indexing-problem&quot;&gt;The indexing problem&lt;/h3&gt;

&lt;p&gt;Resurface has a very descriptive data model, but there’s a problem here – how
to partition and index this data for efficient searching. Partitioning based on
time is the obvious starting point, but within a time range, what then? Index
everything?&lt;/p&gt;

&lt;p&gt;Most databases work best when only a subset of the columns is constrained at
once – but in their case, they have strong reasons for wanting to use all
columns at once. A system monitoring tool might give you a count of “500 codes”
– but they want to detect silent failures, like malformed JSON payloads or
airline tickets selling for less than twenty dollars. That means looking at the
URL, content type, other headers, and payloads, all at the same time.&lt;/p&gt;

&lt;p&gt;They also want to classify kinds of API consumers by their behaviors – are they
using or attacking your API? To classify those behaviors, they again look at
the URL, content type, and payloads. If they can query for the yellow region
below, they find lost revenue that they can recover.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/17/resurface-tcb4.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;Now you might be thinking – maybe the best solution is to do all this 
processing when the API calls are captured, but then how would you identify a 
new zero-day failure or attack? The definition of “responses failed” and 
“threats” needs to be changeable without having to reprocess any data, which 
really favors query-time processing.&lt;/p&gt;

&lt;p&gt;The example below is pretty much as simple as this gets. I struggled to find 
one of these queries that actually fits in a reasonable amount of space.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/17/resurface-tcb5.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;So how to build a database that does these kinds of queries in reasonable time?&lt;/p&gt;

&lt;h3 id=&quot;the-resurface-connector&quot;&gt;The Resurface connector&lt;/h3&gt;

&lt;p&gt;The first prototype actually used the Trino memory connector, which gave them 
the kind of query performance that they were looking for, but wasn’t shippable 
(for obvious reasons).&lt;/p&gt;

&lt;p&gt;Then they tried Redis as a replacement in-memory database, but the problem is
that every query would pull all the data in Redis over the network. Not cool.&lt;/p&gt;

&lt;p&gt;Trino allows you to move the queries closer to the data, and so that’s what they
did. They took inspiration from the “local file” connector, where the connector
reads directly from the filesystem instead of over the network.&lt;/p&gt;

&lt;p&gt;Then the question was, what file format to use?  They tried JSON, CSV, Protocol
Buffers, and ultimately found the fastest and simplest approach was just to
write a simple binary file format that requires no real parsing. When these
files fit in memory, their connector can process SQL queries at 4GB/sec per core. 
The connector was easy to write because they’re just mapping between fields in 
the binary files and the columns exposed to Trino. They built the first version
of their connector in a weekend!&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/17/resurface-tcb3.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h3 id=&quot;why-not-just-use-avro&quot;&gt;Why not just use Avro?&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;Simple requirements – basic versioning, no secondary objects, limited data 
 types&lt;/li&gt;
  &lt;li&gt;Zero-allocation reader for fast linear scan – one memcpy per physical column&lt;/li&gt;
  &lt;li&gt;Connector can report null/not-null without type conversion&lt;/li&gt;
  &lt;li&gt;Connector defers type conversion until getXXX() method&lt;/li&gt;
  &lt;li&gt;getSlice() just wraps an existing buffer (zero allocation)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most of these optimizations were realized by working backwards from the Trino 
connector API to get the best linear scan performance imaginable.&lt;/p&gt;

&lt;h3 id=&quot;combining-api-calls-with-other-data&quot;&gt;Combining API calls with other data&lt;/h3&gt;

&lt;p&gt;Now they can deliver API call data out to all the different kinds of SQL clients 
out there, and they’re also able to combine API call data with data stored in 
other databases.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/17/resurface-tcb6.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;This is really exciting because your Resurface database plays nicely with all 
your other databases that are bridged together with Trino. That means that 
actual API traffic can be brought into your customer data mart, or combined 
with data from any other systems, in real time!&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-4022-add-soundex-function&quot;&gt;PR of the week: PR 4022 Add Soundex function&lt;/h2&gt;

&lt;p&gt;A big shoutout to &lt;a href=&quot;https://github.com/tooptoop4&quot;&gt;tooptoop4&lt;/a&gt; for their contribution to this week’s
&lt;a href=&quot;https://github.com/trinodb/trino/pull/4022&quot;&gt;PR of the week&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This PR adds the &lt;a href=&quot;https://en.wikipedia.org/wiki/Soundex&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;soundex()&lt;/code&gt; function&lt;/a&gt;, 
which is a phonetic function. These functions show up in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE&lt;/code&gt; clause of a
query to find words that sound similar. There are a few examples in the demo
below.&lt;/p&gt;

&lt;p&gt;Thanks for this awesome contribution!&lt;/p&gt;

&lt;h2 id=&quot;demo-using-the-soundex-function&quot;&gt;Demo: Using the soundex function&lt;/h2&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;
SELECT * 
FROM (
  VALUES 
  (1, &apos;Bri&apos;), 
  (2, &apos;Bree&apos;), 
  (3, &apos;Bryan&apos;), 
  (4, &apos;Brian&apos;), 
  (5, &apos;Briann&apos;), 
  (6, &apos;Brianna&apos;), 
  (7, &apos;Briannas&apos;),
  (8, &apos;Bri Jan&apos;),  
  (9, &apos;Bri Yan&apos;),  
  (10, &apos;Bob&apos;)
) names(id, name)
WHERE soundex(name) = soundex(&apos;Brian&apos;);

# Results:
# |id |name   |
# |---|-------|
# |3  |Bryan  |
# |4  |Brian  |
# |5  |Briann |
# |6  |Brianna|
# |9  |Bri Yan|

SELECT * 
FROM (
  VALUES 
  (1, &apos;Man&apos;), 
  (2, &apos;Fred&apos;), 
  (3, &apos;Manfred&apos;), 
  (4, &apos;Can fed&apos;), 
  (5, &apos;Tan bed&apos;), 
  (6, &apos;Man Fred&apos;), 
  (7, &apos;Man dread&apos;), 
  (8, &apos;Bob&apos;)
) names(id, name)
WHERE soundex(name) = soundex(&apos;Manfred&apos;);

# Results:
# |id |name    |
# |---|--------|
# |3  |Manfred |
# |6  |Man Fred|

SELECT * 
FROM (
  VALUES 
  (1, &apos;Martin&apos;), 
  (2, &apos;Mar teen&apos;), 
  (3, &apos;Mar tin&apos;), 
  (4, &apos;Marteen&apos;), 
  (5, &apos;Mart in&apos;)
) names(id, name)
WHERE soundex(name) = soundex(&apos;Martin&apos;);

# Results:
# |id |name    |
# |---|--------|
# |1  |Martin  |
# |2  |Mar teen|
# |3  |Mar tin |
# |4  |Marteen |
# |5  |Mart in |

SELECT * 
FROM (
  VALUES 
  (1, &apos;Robert&apos;), 
  (2, &apos;Rob&apos;), 
  (3, &apos;Bob&apos;), 
  (4, &apos;Bobert&apos;), 
  (5, &apos;Bobby&apos;)
) names(id, name)
WHERE soundex(name) = soundex(&apos;Rob&apos;);

# Results:
# |id |name|
# |---|----|
# |2  |Rob |


SELECT * 
FROM (
  VALUES 
  (1, &apos;Christ&apos;), 
  (2, &apos;Christeen&apos;), 
  (3, &apos;Christian&apos;), 
  (4, &apos;Christine&apos;), 
  (5, &apos;Chris&apos;), 
  (6, &apos;Kristine&apos;)
) names(id, name)
WHERE soundex(name) = soundex(&apos;Christine&apos;);

# Results:
# |id |name     |
# |---|---------|
# |1  |Christ   |
# |2  |Christeen|
# |3  |Christian|
# |4  |Christine|

# What the results actually return

SELECT name, soundex(name)
FROM (
  VALUES 
  (1, &apos;Christ&apos;), 
  (2, &apos;Christeen&apos;), 
  (3, &apos;Christian&apos;), 
  (4, &apos;Christine&apos;), 
  (5, &apos;Chris&apos;), 
  (6, &apos;Kristine&apos;), 
  (6, &apos;Christine&apos;)
) names(id, name);

# Results:
# |name     |_col1|
# |---------|-----|
# |Christ   |C623 |
# |Christeen|C623 |
# |Christian|C623 |
# |Christine|C623 |
# |Chris    |C620 |
# |Kristine |K623 |

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;question-of-the-week-how-to-export-query-results-into-a-file-eg-ctas-but-into-a-single-file&quot;&gt;Question of the week: How to export query results into a file (e.g. CTAS, but into a single file)?&lt;/h2&gt;

&lt;p&gt;This is possible using the &lt;a href=&quot;https://trino.io/docs/current/client/cli.html&quot;&gt;Trino CLI&lt;/a&gt;’s
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--execute&lt;/code&gt; option in conjunction with the redirect operator (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt;&lt;/code&gt;). You may also
use other options, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--output-format&lt;/code&gt;, to specify the format of the data
going to the file (e.g. if you want CSV, TSV, JSON, headers, etc.).&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Output format for batch mode [ALIGNED, VERTICAL, TSV, TSV_HEADER, CSV, 
CSV_HEADER, CSV_UNQUOTED, CSV_HEADER_UNQUOTED, JSON, NULL] (default: CSV)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here is an example of the command you would run using the CLI executable &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino --execute &quot;select * from tpch.sf1.customer limit 5&quot; \
--server http://localhost:8080 \
--output-format CSV_HEADER &amp;gt; customer.csv
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If you’re running Trino in Docker, here is an example command to run this in a
temporary Trino container.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker run --rm -ti \
    --network=trino-hdfs3_trino-network \
    --name export-trino-data \
    trinodb/trino:latest \
    trino --execute &quot;select * from tpch.sf1.customer limit 5&quot; \
    --server http://trino-coordinator:8080 \
    --output-format CSV_HEADER &amp;gt; customer.csv
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If you have a very complex query that takes up multiple lines, or you don’t 
want to spend half of your day escaping quotations, you can put your SQL into a
file and reference the query using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-f&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--file&lt;/code&gt; option. The command
above could then be written as:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino --file query.sql \
--server http://localhost:8080 \
--output-format CSV_HEADER &amp;gt; customer.csv
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This command, along with the following &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.sql&lt;/code&gt; file, produces an equivalent result:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;select * 
from tpch.sf1.customer 
limit 5;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Finally, one last trick is to stage the data using the memory connector and
then export it. The &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;Trino Definitive Guide&lt;/a&gt; 
has an example of adding the Iris data set into memory connector storage with the CLI.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Apache Iceberg: A table format for data lakes with unforeseen use cases
    &lt;ul&gt;
      &lt;li&gt;Americas meetup&lt;/li&gt;
      &lt;li&gt;May 26th, 2021 @ 5:30p EDT&lt;/li&gt;
      &lt;li&gt;Link: &lt;a href=&quot;https://www.meetup.com/trino-americas/events/278103777/&quot;&gt;https://www.meetup.com/trino-americas/events/278103777/&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Trino Summit
    &lt;ul&gt;
      &lt;li&gt;Hybrid event&lt;/li&gt;
      &lt;li&gt;September 15th, 2021&lt;/li&gt;
      &lt;li&gt;Link: &lt;a href=&quot;http://starburst.io/trinosummit2021&quot;&gt;http://starburst.io/trinosummit2021&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://resurface.io/blog/why-we-love-trino&quot;&gt;https://resurface.io/blog/why-we-love-trino&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://resurface.io/blog/what-is-api-observability&quot;&gt;https://resurface.io/blog/what-is-api-observability&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://resurface.io/blog/forking-open-source&quot;&gt;https://resurface.io/blog/forking-open-source&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;Advanced SQL Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/30/training-query-tuning.html&quot;&gt;Query Tuning Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/13/training-security.html&quot;&gt;Security Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/27/training-performance.html&quot;&gt;Performance and Tuning Training&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Commander Bun Bun is diving deep to find anomalies!</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino on ice I: A gentle introduction To Iceberg</title>
      <link href="https://trino.io/blog/2021/05/03/a-gentle-introduction-to-iceberg.html" rel="alternate" type="text/html" title="Trino on ice I: A gentle introduction To Iceberg" />
      <published>2021-05-03T00:00:00+00:00</published>
      <updated>2021-05-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/05/03/a-gentle-introduction-to-iceberg</id>
      <content type="html" xml:base="https://trino.io/blog/2021/05/03/a-gentle-introduction-to-iceberg.html">&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;100%&quot; height=&quot;100%&quot; src=&quot;/assets/blog/trino-on-ice/trino-iceberg.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Welcome to the Trino on ice series, covering the details around how the Iceberg
table format works with the Trino query engine. The examples build on each
previous post, so it’s recommended to read the posts sequentially and reference
them as needed later. Here are links to the posts in this series:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;Trino on ice I: A gentle introduction to Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html&quot;&gt;Trino on ice II: In-place table evolution and cloud compatibility with Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/08/12/deep-dive-into-iceberg-internals.html&quot;&gt;Trino on ice IV: Deep dive into Iceberg internals&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Back in the &lt;a href=&quot;/blog/2020/10/20/intro-to-hive-connector.html&quot;&gt;Gentle introduction to the Hive connector&lt;/a&gt; 
blog post, I discussed a commonly misunderstood architecture and uses of the 
Trino Hive connector. In short, while some may think the name indicates Trino 
makes a call to a running Hive instance, the Hive connector does not use the 
Hive runtime to answer queries. Instead, the connector is named Hive connector 
because it relies on Hive conventions and implementation details from the Hadoop
ecosystem - the invisible Hive specification.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;I call this specification invisible because it doesn’t exist. It lives in the 
Hive code and the minds of those who developed it. This makes it very 
difficult for anybody else who has to integrate with any distributed object 
storage that uses Hive, since they have to rely on reverse engineering and 
keeping up with the changes. The way you interact with Hive changes based on 
&lt;a href=&quot;https://medium.com/hashmapinc/four-steps-for-migrating-from-hive-2-x-to-3-x-e85a8363a18&quot;&gt;which version of Hive or Hadoop&lt;/a&gt; 
you are running. It also varies if you are running in the cloud or over an object store.
Spark has even &lt;a href=&quot;https://spark.apache.org/docs/2.4.4/sql-migration-guide-hive-compatibility.html&quot;&gt;modified the Hive spec&lt;/a&gt;
in some ways to fit the Hive model to their use cases. It’s a big mess that data 
engineers have put up with for years. Yet despite the confusion and the 
disorganization caused by Hive’s many unwritten assumptions, the Hive connector 
is the most popular connector in use for Trino. Virtually every big data query 
engine uses the Hive model today in some form. As a result, it is used by 
numerous companies to store and access data in their data lakes.&lt;/p&gt;

&lt;p&gt;So how did something with no specification become so ubiquitous in data lakes? 
Hive was first in the large object storage and big data world as part of Hadoop.
Hadoop became popular through strong marketing as the answer to the flood of 
data that came with the Web 2.0 boom. Of course, Hive didn’t
get everything wrong. In fact, without Hive, and the fact that it is open 
source, there may not have been a unified specification at all. Despite the many
hours data engineers have spent bashing their heads against the wall with all 
the unintended consequences of Hive, it still served a very useful purpose.&lt;/p&gt;

&lt;p&gt;So why did I just rant about Hive for so long if I’m here to tell you about 
&lt;a href=&quot;https://iceberg.apache.org/&quot;&gt;Apache Iceberg&lt;/a&gt;? It’s impossible for a teenager 
growing up today to truly appreciate music streaming services without knowing 
what it was like to have an iPod with limited storage, or listening to a 
scratched burnt CD that skips, or flipping your tape or record to side-B. Just 
as anyone born before the turn of the millennium really appreciates streaming 
services, you too will appreciate Iceberg once you’ve learned the intricacies 
of managing a data lake built on Hive and Hadoop.&lt;/p&gt;

&lt;p&gt;If you haven’t used Hive before, this blog post outlines just a few of the 
pain points that come from this data warehousing software to give you proper 
context. If you have already lived through these headaches, this post acts as a 
guide to moving from Hive to Iceberg. 
This post is the first in a series of blog posts discussing Apache Iceberg in 
great detail, through the lens of the Trino query engine user. If you’re not 
aware of Trino (formerly PrestoSQL) yet, it is the project that houses the 
founding Presto community after the 
&lt;a href=&quot;https://trino.io/blog/2020/12/27/announcing-trino.html&quot;&gt;founders of Presto left Facebook&lt;/a&gt;.
This and the next couple of posts discuss the Iceberg specification and all
the features Iceberg has to offer, many times in comparison with Hive.&lt;/p&gt;

&lt;p&gt;Before jumping into the comparisons, what is Iceberg exactly? The first thing to
understand is that Iceberg is not a file format, but a table format. That 
statement alone may not make the distinction clear, but the function of a table 
format becomes clearer as you see the improvements Iceberg brings over the Hive 
table standard. Iceberg doesn’t replace file formats like ORC and Parquet,
but is the layer between the query engine and the data. Iceberg maps and indexes
the files in order to provide a higher level abstraction that handles the 
relational table format for data lakes. You will understand more about table 
formats through examples in this series.&lt;/p&gt;
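&lt;p&gt;To make the layering concrete, here is an illustrative sketch of what an 
Iceberg table can look like on object storage. The bucket name and file names 
are made up; what matters is the split between a metadata layer and a data 
layer:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;s3://example-bucket/logging/events/
  metadata/                (the table format: schema, partition spec, snapshots)
    v1.metadata.json
    snap-123.avro
  data/                    (the file format: ORC or Parquet files with the rows)
    data-file-1.orc
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;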

&lt;h2 id=&quot;hidden-partitions&quot;&gt;Hidden Partitions&lt;/h2&gt;

&lt;h3 id=&quot;hive-partitions&quot;&gt;Hive Partitions&lt;/h3&gt;

&lt;p&gt;Since most developers and users interact with the table format via the query 
language, a noticeable difference is the flexibility you have while creating a 
partitioned table. Assume you are trying to create a table for tracking events 
occurring in your system. You run both sets of SQL commands from Trino, using 
the Hive and Iceberg connectors, which are designated by the catalog name 
(i.e. a catalog name starting with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.&lt;/code&gt; uses the Hive connector, while the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg.&lt;/code&gt; catalog uses the Iceberg connector). To begin with, the first DDL 
statement attempts to create an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;events&lt;/code&gt; table in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;logging&lt;/code&gt; schema in the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive&lt;/code&gt; catalog, which is configured to use the Hive connector. The statement also 
partitions the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;events&lt;/code&gt; table on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; field, which is a
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TIMESTAMP&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE hive.logging.events (
  level VARCHAR,
  event_time TIMESTAMP,
  message VARCHAR,
  call_stack ARRAY(VARCHAR)
) WITH (
  format = &apos;ORC&apos;,
  partitioned_by = ARRAY[&apos;event_time&apos;]
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Running this in Trino using the Hive connector produces the following error message.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Partition keys must be the last columns in the table and in the same order as the table properties: [event_time]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The Hive DDL is very dependent on ordering for columns and specifically 
partition columns. Partition fields must be located in the final column 
positions and in the order of partitioning in the DDL statement. The next 
statement attempts to create the same table, but now with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; field 
moved to the last column position.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE hive.logging.events (
  level VARCHAR,
  message VARCHAR,
  call_stack ARRAY(VARCHAR),
  event_time TIMESTAMP
) WITH (
  format = &apos;ORC&apos;,
  partitioned_by = ARRAY[&apos;event_time&apos;]
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This time, the DDL command works successfully, but you likely don’t want to
partition your data on the plain timestamp. This results in a separate file for 
each distinct timestamp value in your table (likely almost one file per event). 
In Hive, there’s no native way to indicate the time granularity at which you 
want to partition. The workaround in Hive is to create a new 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VARCHAR&lt;/code&gt; column, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time_day&lt;/code&gt;, derived from the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; column, to hold the date partition value.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE hive.logging.events (
  level VARCHAR,
  event_time TIMESTAMP,
  message VARCHAR,
  call_stack ARRAY(VARCHAR),
  event_time_day VARCHAR
) WITH (
  format = &apos;ORC&apos;,
  partitioned_by = ARRAY[&apos;event_time_day&apos;]
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This method wastes space by adding a new column to your table. Even worse,
it puts the burden of knowledge on the user to include this new column for 
writing data. It is then necessary to use that separate column for any read 
access to take advantage of the performance gains from the partitioning.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;INSERT INTO hive.logging.events
VALUES
(
  &apos;ERROR&apos;,
  timestamp &apos;2021-04-01 12:00:00.000001&apos;,
  &apos;Oh noes&apos;, 
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;], 
  &apos;2021-04-01&apos;
),
(
  &apos;ERROR&apos;,
  timestamp &apos;2021-04-02 15:55:55.555555&apos;,
  &apos;Double oh noes&apos;,
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;],
  &apos;2021-04-02&apos;
),
(
  &apos;WARN&apos;, 
  timestamp &apos;2021-04-02 00:00:11.1122222&apos;,
  &apos;Maybeh oh noes?&apos;,
  ARRAY [&apos;Bad things could be happening??&apos;], 
  &apos;2021-04-02&apos;
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Notice that the partition value in the last column, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&apos;2021-04-01&apos;&lt;/code&gt;, has to match the date of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TIMESTAMP&lt;/code&gt; 
during insertion. Hive performs no validation to make sure this happens, 
because it only requires a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VARCHAR&lt;/code&gt; and partitions on whatever 
distinct values it receives.&lt;/p&gt;
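&lt;p&gt;For example, Hive happily accepts the following insert even though the 
partition value contradicts the timestamp. This is a hypothetical statement for 
illustration only; the values are made up:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;INSERT INTO hive.logging.events
VALUES
(
  &apos;INFO&apos;,
  timestamp &apos;2021-04-03 09:00:00.000000&apos;,
  &apos;Wrong day&apos;,
  ARRAY [&apos;No stack trace&apos;],
  &apos;2021-04-01&apos; -- does not match the timestamp, but Hive accepts it
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Queries that filter on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time_day&lt;/code&gt; would then silently return the wrong rows.&lt;/p&gt;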

&lt;p&gt;On the other hand, if a user runs the following query:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT *
FROM hive.logging.events
WHERE event_time &amp;lt; timestamp &apos;2021-04-02&apos;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;they get the correct results back, but have to scan all the data in the table:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;level&lt;/th&gt;
      &lt;th&gt;event_time&lt;/th&gt;
      &lt;th&gt;message&lt;/th&gt;
      &lt;th&gt;call_stack&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;2021-04-01 12:00:00&lt;/td&gt;
      &lt;td&gt;Oh noes&lt;/td&gt;
      &lt;td&gt;Exception in thread “main” java.lang.NullPointerException&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;This happens because the user forgot to include the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time_day &amp;lt; &apos;2021-04-02&apos;&lt;/code&gt; predicate in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE&lt;/code&gt; 
clause. Omitting it eliminates all the benefits that led us to create the 
partition in the first place, and yet users of these tables frequently miss it.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT *
FROM hive.logging.events
WHERE event_time &amp;lt; timestamp &apos;2021-04-02&apos; 
AND event_time_day &amp;lt; &apos;2021-04-02&apos;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;level&lt;/th&gt;
      &lt;th&gt;event_time&lt;/th&gt;
      &lt;th&gt;message&lt;/th&gt;
      &lt;th&gt;call_stack&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;2021-04-01 12:00:00&lt;/td&gt;
      &lt;td&gt;Oh noes&lt;/td&gt;
      &lt;td&gt;Exception in thread “main” java.lang.NullPointerException&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h3 id=&quot;iceberg-partitions&quot;&gt;Iceberg Partitions&lt;/h3&gt;

&lt;p&gt;The following DDL statement illustrates how these issues are handled in Iceberg
via the Trino Iceberg connector.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE TABLE iceberg.logging.events (
  level VARCHAR,
  event_time TIMESTAMP(6),
  message VARCHAR,
  call_stack ARRAY(VARCHAR)
) WITH (
  partitioning = ARRAY[&apos;day(event_time)&apos;]
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Take note of a few things. First, the partition on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; 
column is defined without having to move the column to the last position. There 
is also no need to create a separate field to handle the daily partition on the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; field. The &lt;em&gt;&lt;strong&gt;partition specification&lt;/strong&gt;&lt;/em&gt; is maintained internally
by Iceberg, and neither the user nor the reader of this table needs to know 
anything about the partition specification to take advantage of it. This concept
is called &lt;em&gt;&lt;strong&gt;hidden partitioning&lt;/strong&gt;&lt;/em&gt;, where only the table creator/maintainer 
has to know the &lt;em&gt;&lt;strong&gt;partitioning specification&lt;/strong&gt;&lt;/em&gt;. Here is what the insert 
statements look like now:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;INSERT INTO iceberg.logging.events
VALUES
(
  &apos;ERROR&apos;,
  timestamp &apos;2021-04-01 12:00:00.000001&apos;,
  &apos;Oh noes&apos;, 
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
),
(
  &apos;ERROR&apos;,
  timestamp &apos;2021-04-02 15:55:55.555555&apos;,
  &apos;Double oh noes&apos;,
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
),
(
  &apos;WARN&apos;, 
  timestamp &apos;2021-04-02 00:00:11.1122222&apos;,
  &apos;Maybeh oh noes?&apos;,
  ARRAY [&apos;Bad things could be happening??&apos;]
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VARCHAR&lt;/code&gt; dates are no longer needed. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; field is 
internally converted to the proper partition value for each row. Also,
notice that the same query that ran in Hive returns the same results. The big 
difference is that it doesn’t require an extra clause to filter on the 
partition in addition to filtering the results.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT *
FROM iceberg.logging.events
WHERE event_time &amp;lt; timestamp &apos;2021-04-02&apos;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;level&lt;/th&gt;
      &lt;th&gt;event_time&lt;/th&gt;
      &lt;th&gt;message&lt;/th&gt;
      &lt;th&gt;call_stack&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ERROR&lt;/td&gt;
      &lt;td&gt;2021-04-01 12:00:00&lt;/td&gt;
      &lt;td&gt;Oh noes&lt;/td&gt;
      &lt;td&gt;Exception in thread “main” java.lang.NullPointerException&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
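&lt;p&gt;If you want to peek at the partitions Iceberg maintains behind the scenes, 
the Trino Iceberg connector exposes them through the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$partitions&lt;/code&gt; 
metadata table. The exact output columns depend on your Trino version, so treat 
this as a sketch:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT *
FROM iceberg.logging.&quot;events$partitions&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;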

&lt;p&gt;So hopefully that gives you a glimpse into what a table format and specification
are, and why Iceberg is such a wonderful improvement over the existing and 
outdated method of storing your data in your data lake. While this post covers
a lot of aspects of Iceberg’s capabilities, this is just the tip of the Iceberg…&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/blog/trino-on-ice/see_myself_out.gif&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;If you want to play around with Iceberg using Trino, check out the 
&lt;a href=&quot;https://trino.io/docs/current/connector/iceberg.html&quot;&gt;Trino Iceberg docs&lt;/a&gt;.
The next post covers how table evolution works in Iceberg, as well as how 
Iceberg is an improved storage format for cloud storage.&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Welcome to the Trino on ice series, covering the details around how the Iceberg table format works with the Trino query engine. The examples build on each previous post, so it’s recommended to read the posts sequentially and reference them as needed later. Here are links to the posts in this series: Trino on ice I: A gentle introduction to Iceberg Trino on ice II: In-place table evolution and cloud compatibility with Iceberg Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec Trino on ice IV: Deep dive into Iceberg internals Back in the Gentle introduction to the Hive connector blog post, I discussed a commonly misunderstood architecture and uses of the Trino Hive connector. In short, while some may think the name indicates Trino makes a call to a running Hive instance, the Hive connector does not use the Hive runtime to answer queries. Instead, the connector is named Hive connector because it relies on Hive conventions and implementation details from the Hadoop ecosystem - the invisible Hive specification.</summary>

      
      
    </entry>
  
    <entry>
      <title>16: Make data fluid with Apache Druid</title>
      <link href="https://trino.io/episodes/16.html" rel="alternate" type="text/html" title="16: Make data fluid with Apache Druid" />
      <published>2021-04-29T00:00:00+00:00</published>
      <updated>2021-04-29T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/16</id>
      <content type="html" xml:base="https://trino.io/episodes/16.html">&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/16/trino-druid.png&quot; /&gt;&lt;br /&gt;
Commander Bun Bun the speedy druid!
&lt;/p&gt;

&lt;h2 id=&quot;druid-links&quot;&gt;Druid links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://druid.apache.org/&quot;&gt;Apache Druid&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://druid.apache.org/community/&quot;&gt;Apache Druid Community&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.druidforum.org/&quot;&gt;Druid Forum&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;Samarth Jain, Software Engineer at Netflix 
 (&lt;a href=&quot;https://www.linkedin.com/in/samarthjain11/&quot;&gt;@samarthjain11&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Parth Brahmbhatt, Senior Software Engineer at Netflix 
 (&lt;a href=&quot;https://twitter.com/brahmbhattparth/&quot;&gt;@brahmbhattparth&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Rachel Pedreschi, VP Community and Developer Relations at 
 &lt;a href=&quot;https://imply.io/&quot;&gt;Imply&lt;/a&gt; (&lt;a href=&quot;https://twitter.com/rachelpedreschi&quot;&gt;@rachelpedreschi&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-356&quot;&gt;Release 356&lt;/h2&gt;

&lt;p&gt;Release notes discussed: &lt;a href=&quot;https://trino.io/docs/current/release/release-356.html&quot;&gt;https://trino.io/docs/current/release/release-356.html&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;General:
    &lt;ul&gt;
      &lt;li&gt;MATCH_RECOGNIZE clause support, used to detect patterns in a set of rows 
within a single query&lt;/li&gt;
      &lt;li&gt;soundex function&lt;/li&gt;
      &lt;li&gt;Property to limit planning time (and improved behavior about cancel during 
planning)&lt;/li&gt;
      &lt;li&gt;A bunch of performance improvements around pushdown (and start of docs for 
pushdowns)&lt;/li&gt;
      &lt;li&gt;Misc improvements around materialized views support&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;JDBC driver - OAuth2 token caching in memory&lt;/li&gt;
  &lt;li&gt;BigQuery - create and drop schema&lt;/li&gt;
  &lt;li&gt;Hive - Parquet, ORC and Azure ADL improvements&lt;/li&gt;
  &lt;li&gt;Iceberg - SHOW TABLES even when tables created elsewhere&lt;/li&gt;
  &lt;li&gt;Kafka - SSL support&lt;/li&gt;
  &lt;li&gt;Metadata caching improvements for a bunch of connectors&lt;/li&gt;
  &lt;li&gt;SPI: couple of changes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-apache-druid-and-realtime-analytics&quot;&gt;Concept of the week: Apache Druid and realtime analytics&lt;/h2&gt;

&lt;p&gt;This week covers Apache Druid, a modern, real-time OLAP database. Joining us is 
the head of developer relations at Imply, the company that creates an enterprise
version of Druid, to cover what Druid is and the use cases it solves.&lt;/p&gt;

&lt;p&gt;Here are the slides that Rachel uses in the show:&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/1fKHCGSRJwUjB7&quot; width=&quot;800&quot; height=&quot;650&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; 
margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; 
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h3 id=&quot;druid-architecture&quot;&gt;Druid Architecture&lt;/h3&gt;

&lt;p&gt;Druid has several process types:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Coordinator&lt;/strong&gt; processes manage data availability on the cluster.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Overlord&lt;/strong&gt; processes control the assignment of data ingestion workloads.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Broker&lt;/strong&gt; processes handle queries from external clients.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Router&lt;/strong&gt; processes are optional processes that can route requests to Brokers, Coordinators, and Overlords.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Historical&lt;/strong&gt; processes store queryable data.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;MiddleManager&lt;/strong&gt; processes are responsible for ingesting data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/16/druid-architecture.png&quot; /&gt;&lt;br /&gt;
The Druid architecture.
&lt;/p&gt;

&lt;p&gt;Druid processes can be deployed any way you like, but for ease of deployment we 
suggest organizing them into three server types: Master, Query, and Data.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Master: Runs Coordinator and Overlord processes, manages data availability and ingestion.&lt;/li&gt;
  &lt;li&gt;Query: Runs Broker and optional Router processes, handles queries from external clients.&lt;/li&gt;
  &lt;li&gt;Data: Runs Historical and MiddleManager processes, executes ingestion workloads and stores all queryable data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Source: &lt;a href=&quot;https://druid.apache.org/docs/latest/design/architecture.html&quot;&gt;https://druid.apache.org/docs/latest/design/architecture.html&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-3522-add-druid-connector&quot;&gt;PR of the week: PR 3522 Add Druid connector&lt;/h2&gt;

&lt;p&gt;Our guest, Samarth, is the author of this week’s 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/3522&quot;&gt;PR of the week&lt;/a&gt;. 
&lt;a href=&quot;https://twitter.com/puneetjaiswal&quot;&gt;Puneet Jaiswal&lt;/a&gt; is the engineer who 
first started work on adding a Druid connector. Later, Samarth picked up the torch and 
the Trino Druid connector became available in 
&lt;a href=&quot;/docs/current/release/release-337.html&quot;&gt;release 337&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;An honorable mention goes to our other guest, Parth, for doing some 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/3697&quot;&gt;preliminary work&lt;/a&gt; that enabled 
aggregation pushdown in the SPI. This enabled the use of the Druid connector to
actually scale well with the completion of PR 4313 (see future work below).&lt;/p&gt;

&lt;p&gt;A &lt;a href=&quot;https://github.com/trinodb/trino/pull/3881&quot;&gt;third honorable PR&lt;/a&gt;, 
completed by &lt;a href=&quot;https://twitter.com/findepi&quot;&gt;@findepi&lt;/a&gt;, added 
pushdown to the JDBC client, which appeared in release 337 along with the Druid 
connector.&lt;/p&gt;

&lt;p&gt;It is incredible to see the number of hands that various features and connectors
pass through to get to the final release.&lt;/p&gt;

&lt;h3 id=&quot;future-work&quot;&gt;Future work:&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/4249&quot;&gt;SPI and optimizer rule for connectors that can support complete topN (PR 4249)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/4313&quot;&gt;Implement aggregate pushdown for Druid (PR 4313)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/4554&quot;&gt;Optimizer rule to support aggregate pushdown with grouping sets (PR 4554)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;demo-using-the-druid-web-ui-to-create-an-ingestion-spec-querying-via-trino&quot;&gt;Demo: Using the Druid Web UI to create an ingestion spec querying via Trino&lt;/h2&gt;

&lt;p&gt;Let’s start up the Druid cluster along with the required Zookeeper and 
PostgreSQL instance. Clone this repository and navigate to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino-druid&lt;/code&gt;
directory.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-getting-started.git

cd community_tutorials/druid/trino-druid

docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To run a batch ingestion, navigate to the Druid Web UI at 
&lt;a href=&quot;http://localhost:8888&quot;&gt;http://localhost:8888&lt;/a&gt; once it has finished 
starting up. Click the “Load data” button, choose “Example data”, and follow the 
prompts to create the native batch ingestion spec. Once the spec is created, run 
the job and ingest the data. More information can be found here: 
&lt;a href=&quot;https://druid.apache.org/docs/latest/tutorials/index.html&quot;&gt;https://druid.apache.org/docs/latest/tutorials/index.html&lt;/a&gt;&lt;/p&gt;
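&lt;p&gt;For reference, the Trino side of this setup is just a catalog properties file 
that points the Druid connector at the broker’s Avatica JDBC endpoint. The file 
location, hostname, and port below are assumptions based on a typical 
docker-compose setup; check the repository for the actual values:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# etc/catalog/druid.properties (assumed location)
connector.name=druid
connection-url=jdbc:avatica:remote:url=http://broker:8082/druid/v2/sql/avatica/
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;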

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/16/druid-console.png&quot; /&gt;&lt;br /&gt;
The Druid console.
&lt;/p&gt;

&lt;p&gt;Once Druid completes the task, open up a Trino connection and validate that the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;druid&lt;/code&gt; catalog exists.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker exec -it trino-druid_trino-coordinator_1 trino

trino&amp;gt; SHOW CATALOGS;

 Catalog 
---------
 druid   
 system  
 tpcds   
 tpch    
(4 rows)

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now show the tables under the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;druid.druid&lt;/code&gt; schema.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino&amp;gt; SHOW TABLES IN druid.druid;
   Table   
-----------
 wikipedia 
(1 row)

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Run a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW CREATE TABLE&lt;/code&gt;  to see the column definitions.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino&amp;gt; SHOW CREATE TABLE druid.druid.wikipedia;
             Create Table             
--------------------------------------
 CREATE TABLE druid.druid.wikipedia ( 
    __time timestamp(3) NOT NULL,     
    added bigint NOT NULL,            
    channel varchar,                  
    cityname varchar,                 
    comment varchar,                  
    commentlength bigint NOT NULL,    
    countryisocode varchar,           
    countryname varchar,              
    deleted bigint NOT NULL,          
    delta bigint NOT NULL,            
    deltabucket bigint NOT NULL,      
    diffurl varchar,                  
    flags varchar,                    
    isanonymous varchar,              
    isminor varchar,                  
    isnew varchar,                    
    isrobot varchar,                  
    isunpatrolled varchar,            
    metrocode varchar,                
    namespace varchar,                
    page varchar,                     
    regionisocode varchar,            
    regionname varchar,               
    user varchar                      
 )                                    
(1 row)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Finally, query the first 5 rows of data showing the user and how much they added.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trino&amp;gt; SELECT user, added FROM druid.druid.wikipedia LIMIT 5;
      user       | added 
-----------------+-------
 Lsjbot          |    31 
 ワーナー成増    |   125 
 181.230.118.178 |     2 
 JasonAQuest     |     0 
 Kolega2357      |     0 
(5 rows)

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
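&lt;p&gt;Druid is built for aggregations, so a more representative query groups and 
sums rather than scanning raw rows. This is an illustrative query; the results 
depend on the sample data you ingested:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT channel, sum(added) AS total_added
FROM druid.druid.wikipedia
GROUP BY channel
ORDER BY total_added DESC
LIMIT 5;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;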

&lt;h2 id=&quot;question-of-the-week-why-doesnt-the-druid-connector-use-the-native-json-over-http-calls&quot;&gt;Question of the week: Why doesn’t the Druid connector use the native json over http calls?&lt;/h2&gt;

&lt;p&gt;To answer this question, I’m going to quote Samarth and Parth from 
&lt;a href=&quot;https://trinodb.slack.com/archives/CHD6386E4/p1589311502029000?thread_ts=1586167749.002500&amp;amp;cid=CHD6386E4&quot;&gt;this super long but enlightening thread&lt;/a&gt;
on the subject.&lt;/p&gt;

&lt;h3 id=&quot;samarths-take&quot;&gt;Samarth’s take:&lt;/h3&gt;

&lt;p&gt;Pro JDBC:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Going forward, Druid SQL is going to be the de-facto way of accessing Druid 
 data, with native JSON queries being more of an advanced use case. A 
 benefit of going down the SQL route is that we can take advantage of all the 
 changes made in the Druid SQL optimizer land, like using vectorized query 
 processing when possible, when to use a TopN vs group by query type, etc. If we 
 were to hit historicals directly, which don’t support SQL querying, we 
 potentially won’t take advantage of such optimizations unless we keep
 porting/applying them to the trino-druid connector, which may not always be 
 possible.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;If we end up letting a Trino node act as a Druid broker (which is what 
 would happen, I assume, when you let a Trino node do the final merging), then 
 you would need to allocate similar kinds of resources (direct memory buffers, 
 etc.) to all the Trino worker nodes as a Druid broker, which may not be ideal.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;This is not necessarily a limitation but adds complexity - with your proposed 
 implementation, the Trino cluster will need to maintain state about what Druid
 segments are hosted on what data nodes (middle managers and historicals). The 
 Druid broker already maintains that state and having to replicate and store all
 that state on the Trino coordinator will demand more resources out of it.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;To your point on the SCAN query overwhelming the broker - that shouldn’t be the 
 case, as Druid’s scan query type streams results through the broker instead of 
 materializing all of them in memory. See: &lt;a href=&quot;https://druid.apache.org/docs/latest/querying/scan-query.html&quot;&gt;https://druid.apache.org/docs/latest/querying/scan-query.html&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Pro HTTP:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;One use case where directly hitting the historicals may help is when the 
 group by key space is large (like a group by on a UUID-like column). For a very 
 large data set, a Druid broker can get overwhelmed when performing the giant 
 merge. By hitting historicals directly, we can let historicals do the first-level 
 merge, followed by multiple Trino workers doing the second-level merge. I am 
 not sure if solving for this limited use case is worth going the HTTP native
 query route, though. IMHO, Druid generally isn’t built for pulling lots of 
 data out of it. You can do it, but whether you want to push that work down to 
 the Druid cluster or let Trino directly pull it down for you is debatable.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I would advocate for going the Druid SQL route, at least for the initial version 
of the connector. This would provide a solution for the majority of the use 
cases that Druid is generally used for (OLAP-style queries over pre-aggregated 
data). In the next version of the connector, we could possibly focus on adding a
new mode that makes native JSON queries directly to the Druid historicals and 
middle managers instead of submitting SQL queries to the broker.&lt;/p&gt;

&lt;h3 id=&quot;parths-take&quot;&gt;Parth’s take:&lt;/h3&gt;

&lt;p&gt;Our general take is that Druid is designed as an OLAP cube, so it is really fast
when it comes to aggregate queries over reasonable-cardinality dimensions, and 
it will not work well for use cases that treat it like a regular data 
warehouse and try to do pure select scans with filters. The primary reasons 
most of our users would look to Trino’s Druid connector are:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;To be able to join already aggregated data in Druid to some other datastore 
 in our warehouse.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;To gain access to Druid through tooling that doesn’t inherently have good 
 support for it, for dashboarding use cases (think Tableau).&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Even if we wanted to support the use cases that Druid is not designed for in a 
more efficient manner by going through historicals directly, it has other 
implications. We are now talking about partial aggregation pushdown, which is 
more complicated IMO than our current approach of complete pushdown. We could 
take the approach that others have taken and incrementally add a mode to the 
Druid connector to either use JDBC or go directly to historicals, 
but I really don’t think it’s a good idea to block the current development in 
hopes of a more efficient future version, especially when this is just an 
implementation detail that we can switch anytime without breaking any user 
queries.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Trino Summit:
&lt;a href=&quot;http://starburst.io/trinosummit2021&quot;&gt;http://starburst.io/trinosummit2021&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://netflixtechblog.com/how-netflix-uses-druid-for-real-time-insights-to-ensure-a-high-quality-experience-19e1e8568d06&quot;&gt;https://netflixtechblog.com/how-netflix-uses-druid-for-real-time-insights-to-ensure-a-high-quality-experience-19e1e8568d06&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://imply.io/post/apache-druid-joins&quot;&gt;https://imply.io/post/apache-druid-joins&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/gumgum-tech/optimized-real-time-analytics-using-spark-streaming-and-apache-druid-d872a86ed99d&quot;&gt;https://medium.com/gumgum-tech/optimized-real-time-analytics-using-spark-streaming-and-apache-druid-d872a86ed99d&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.inovex.de/blog/a-close-look-at-the-workings-of-apache-druid/&quot;&gt;https://www.inovex.de/blog/a-close-look-at-the-workings-of-apache-druid/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://leventov.medium.com/comparison-of-the-open-source-olap-systems-for-big-data-clickhouse-druid-and-pinot-8e042a5ed1c7&quot;&gt;https://leventov.medium.com/comparison-of-the-open-source-olap-systems-for-big-data-clickhouse-druid-and-pinot-8e042a5ed1c7&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Videos&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=mf8Hb0coI6o&quot;&gt;https://www.youtube.com/watch?v=mf8Hb0coI6o&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Kxbzr7UP1dI&amp;amp;t=1274s&quot;&gt;https://www.youtube.com/watch?v=Kxbzr7UP1dI&amp;amp;t=1274s&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=QNmSXMQ-gY4&quot;&gt;https://www.youtube.com/watch?v=QNmSXMQ-gY4&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;Advanced SQL Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/30/training-query-tuning.html&quot;&gt;Query Tuning Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/13/training-security.html&quot;&gt;Security Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/27/training-performance.html&quot;&gt;Performance and Tuning Training&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Commander Bun Bun the speedy druid!</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino: The Definitive Guide</title>
      <link href="https://trino.io/blog/2021/04/21/the-definitive-guide.html" rel="alternate" type="text/html" title="Trino: The Definitive Guide" />
      <published>2021-04-21T00:00:00+00:00</published>
      <updated>2021-04-21T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/04/21/the-definitive-guide</id>
      <content type="html" xml:base="https://trino.io/blog/2021/04/21/the-definitive-guide.html">&lt;p&gt;Just over a year ago we &lt;a href=&quot;https://trino.io/blog/2020/04/11/the-definitive-guide.html&quot;&gt;announced the availability of the first book about
Trino&lt;/a&gt; - our
definitive guide. Back then the project was still called Presto, and the rename
with the end of 2020 was a good reason for us to give the book a refresh.&lt;/p&gt;

&lt;p&gt;Today, we are happy to announce that a new edition now titled &lt;strong&gt;Trino: The
Definitive Guide&lt;/strong&gt; is available.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;get-a-free-copy-of-trino-the-definitive-guide-from-starburst-now&quot;&gt;&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;Get a free copy of Trino: The Definitive Guide&lt;/a&gt; from &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt; now!&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;!--more--&gt;

&lt;p&gt;&lt;img src=&quot;/assets/ttdg-cover.png&quot; align=&quot;right&quot; style=&quot;float: right; margin-left: 20px; margin-bottom: 20px; width: 100%; max-width: 350px;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The new edition of the book from O’Reilly is available in digital formats
as well as physical copies. You can find more information about the book on &lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;our
permanent page about it&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The book is now updated to Trino release 354 for all filenames, installation
methods, commands, names, and properties. We also addressed all problems found
by our readers and reported to us.&lt;/p&gt;

&lt;p&gt;Our major supporter, &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt;, allowed us to work
on the book and bring it across the finish line again. You can get a
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;free digital copy from Starburst&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;So what are you waiting for? Go get a copy, check out the &lt;a href=&quot;https://github.com/trinodb/trino-the-definitive-guide&quot;&gt;updated example code
repository&lt;/a&gt;,
provide feedback and contact us on &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Looking forward to it all!&lt;/p&gt;

&lt;p&gt;Matt, Manfred and Martin&lt;/p&gt;</content>

      
        <author>
          <name>Matt Fuller, Manfred Moser and Martin Traverso</name>
        </author>
      

      <summary>Just over a year ago we announced the availability of the first book about Trino - our definitive guide. Back then the project was still called Presto, and the rename with the end of 2020 was a good reason for us to give the book a refresh. Today, we are happy to announce that a new edition now titled Trino: The Definitive Guide is available. Get a free copy of Trino: The Definitive Guide from Starburst now!</summary>

      
      
    </entry>
  
    <entry>
      <title>15: Iceberg right ahead!</title>
      <link href="https://trino.io/episodes/15.html" rel="alternate" type="text/html" title="15: Iceberg right ahead!" />
      <published>2021-04-15T00:00:00+00:00</published>
      <updated>2021-04-15T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/15</id>
      <content type="html" xml:base="https://trino.io/episodes/15.html">&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/15/trino-iceberg.png&quot; /&gt;&lt;br /&gt;
Looks like Commander Bun Bun is safe on this Iceberg&lt;br /&gt;
&lt;a href=&quot;https://joshdata.me/iceberger.html&quot;&gt;https://joshdata.me/iceberger.html&lt;/a&gt;
&lt;/p&gt;

&lt;h2 id=&quot;iceberg-links&quot;&gt;Iceberg links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://iceberg.apache.org/&quot;&gt;Apache Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://iceberg.apache.org/community/&quot;&gt;Apache Iceberg Community&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;Ryan Blue, creator of Iceberg, and Senior Software Engineer at 
 Netflix (&lt;a href=&quot;https://github.com/rdblue&quot;&gt;@rdblue&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;David Phillips, creator of Trino/Presto, and CTO at 
 &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt; (&lt;a href=&quot;https://twitter.com/electrum32&quot;&gt;@electrum32&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-355&quot;&gt;Release 355&lt;/h2&gt;

&lt;p&gt;Release notes discussed: &lt;a href=&quot;https://trino.io/docs/current/release/release-355.html&quot;&gt;https://trino.io/docs/current/release/release-355.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Martin’s list:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Multiple password authentication plugins&lt;/li&gt;
  &lt;li&gt;Column and table lineage reporting in query events&lt;/li&gt;
  &lt;li&gt;Improved planning performance for queries against Phoenix or SQL Server&lt;/li&gt;
  &lt;li&gt;Improved performance for ORDER BY … LIMIT queries against Phoenix&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred’s notes:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Security overview and TLS pages and authentication types&lt;/li&gt;
  &lt;li&gt;Reiterate multiple authentication providers (ldap1, ldap2, password)&lt;/li&gt;
  &lt;li&gt;Improved parallelism when the table bucket count is small compared to the number of nodes.&lt;/li&gt;
  &lt;li&gt;Include information about Spill to disk in EXPLAIN ANALYZE&lt;/li&gt;
  &lt;li&gt;Unixtime function changes&lt;/li&gt;
  &lt;li&gt;Hive view support improvements&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-apache-iceberg-and-the-iceberg-spec&quot;&gt;Concept of the week: Apache Iceberg and the Iceberg spec&lt;/h2&gt;

&lt;h3 id=&quot;interview-with-ryan-blue&quot;&gt;Interview with Ryan Blue&lt;/h3&gt;

&lt;p&gt;In &lt;a href=&quot;/episodes/14.html&quot;&gt;the previous episode&lt;/a&gt;, we covered the 
differences between the Iceberg table format, and the Hive table format from a 
technical standpoint in the context of Trino. We highly recommend watching it
before this episode. In this episode we ask Ryan about the origins of Apache 
Iceberg and why he started the project. We cover some details of the 
&lt;a href=&quot;https://iceberg.apache.org/spec/&quot;&gt;Iceberg specification&lt;/a&gt; which is a nice change
from the ad-hoc specification that people adhere to when using Hive tables. Then
Ryan dives into several amazing use cases showing how Netflix and others use Iceberg.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-7233-fix-queries-on-tables-without-snapshot-id&quot;&gt;PR of the week: PR 7233 Fix queries on tables without snapshot id&lt;/h2&gt;

&lt;p&gt;This week’s &lt;a href=&quot;https://github.com/trinodb/trino/pull/7233&quot;&gt;PR of the week&lt;/a&gt; was 
submitted by one of the Trino maintainers,
&lt;a href=&quot;https://twitter.com/desai_pratham&quot;&gt;Pratham Desai&lt;/a&gt;. Pratham is a Software 
Engineer at LinkedIn who commits a lot of time to the Trino community, helping
out on the Slack channel, contributing code, and doing PR reviews. Thank you for
all you do, Pratham!&lt;/p&gt;

&lt;p&gt;Had Brian known about this PR, he wouldn’t have had the issue he did with 
reading the empty snapshot created with the Iceberg Java API and would have been 
able to read and insert into the table just fine. If you come across this issue,
we introduced this feature in 
&lt;a href=&quot;/docs/current/release/release-344.html&quot;&gt;release 344&lt;/a&gt;!&lt;/p&gt;

&lt;h3 id=&quot;another-future-development-for-the-trino-iceberg-connector&quot;&gt;Another future development for the Trino Iceberg connector&lt;/h3&gt;

&lt;p&gt;Along with the future developments we discussed in the previous episode, another
core Iceberg functionality that we want to add in Trino is support for
&lt;a href=&quot;https://github.com/trinodb/trino/issues/7580&quot;&gt;partition migration&lt;/a&gt;. We also 
discussed future support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; capabilities for the Iceberg 
connector.&lt;/p&gt;

&lt;h2 id=&quot;demo-creating-tables-with-iceberg-and-reading-the-data-in-trino&quot;&gt;Demo: Creating tables with Iceberg and reading the data in Trino&lt;/h2&gt;

&lt;p&gt;For this week’s demo, we continue to use the Iceberg Java API to create a table.
You also have the option to use Trino, Spark, or other engines to ingest and query the
data, but I wanted to use the vanilla Iceberg APIs to experience the API and
hopefully solidify my learning of Iceberg concepts in the process. Make sure you
follow the instructions in the repository if you don’t have Docker or Java
installed.&lt;/p&gt;

&lt;p&gt;Let’s start up a local Trino coordinator and Hive metastore. Clone this 
repository and navigate to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg/trino-iceberg-minio&lt;/code&gt; directory. Then
start up the containers using Docker Compose.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-getting-started.git

cd iceberg/trino-iceberg-minio

docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In your favorite IDE, open the files under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg/iceberg-java&lt;/code&gt; into your
project and run the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IcebergMain&lt;/code&gt; class.&lt;/p&gt;

&lt;p&gt;This class creates the logging schema and a logging table if they don’t already
exist. Once you run this code, you can verify that the table exists in the 
metastore under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TABLE_PARAMS&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Now we transition from the Java API to running queries over Iceberg using Trino.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/**
 * This is the equivalent of running IcebergMain in the iceberg-java project.
 * Go ahead and inspect the java code you can use to interact with Iceberg
 * tables and metadata.
 */
CREATE TABLE iceberg.logging.logs (
   level varchar NOT NULL,
   event_time timestamp(6) with time zone NOT NULL,
   message varchar NOT NULL,
   call_stack array(varchar)
)
WITH (
   format = &apos;ORC&apos;,
   partitioning = ARRAY[&apos;hour(event_time)&apos;,&apos;level&apos;]
)

/**
 * Read From Trino
 */

SELECT * FROM iceberg.logging.logs;

/**
 * Write data from Trino and check data and snapshots
 */

INSERT INTO iceberg.logging.logs VALUES 
(
  &apos;ERROR&apos;, 
  timestamp &apos;2021-04-01&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;Oh noes&apos;,
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
);

SELECT * FROM iceberg.logging.logs;

SELECT * FROM iceberg.logging.&quot;logs$snapshots&quot;;

/**
 * Write more data from Trino and check data and snapshots
 */
INSERT INTO iceberg.logging.logs 
VALUES 
(
  &apos;ERROR&apos;, 
  timestamp &apos;2021-04-01&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;Oh noes&apos;, 
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
), 
(
  &apos;ERROR&apos;, 
  timestamp &apos;2021-04-01 15:55:23.383345&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;Double oh noes&apos;, 
  ARRAY [&apos;Exception in thread &quot;main&quot; java.lang.NullPointerException&apos;]
), 
(
  &apos;WARN&apos;, 
  timestamp &apos;2021-04-01 15:55:23.383345&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;Maybeh oh noes?&apos;, 
  ARRAY [&apos;bad things could be happening&apos;]
);

 
SELECT * FROM iceberg.logging.logs;

SELECT * FROM iceberg.logging.&quot;logs$snapshots&quot;;

/**
 * Read data from an old snapshot (Time travel)
 */

SELECT * FROM iceberg.logging.&quot;logs@2806470637437034115&quot;;

/**
 * Add new column, notice there is no snapshots of the metadata
 */

ALTER TABLE iceberg.logging.logs ADD COLUMN severity INTEGER;

SHOW CREATE TABLE iceberg.logging.logs;

SELECT * FROM iceberg.logging.logs;

SELECT * FROM iceberg.logging.&quot;logs$snapshots&quot;;

/**
 * Insert new data with new column
 */

INSERT INTO iceberg.logging.logs VALUES 
(
  &apos;INFO&apos;, 
  timestamp &apos;2021-04-01 19:59:59.999999&apos; AT TIME ZONE &apos;America/Los_Angeles&apos;, 
  &apos;es muy bueno&apos;, 
  ARRAY [&apos;It is all normal&apos;], 
  1
);

SELECT * FROM iceberg.logging.logs;

SELECT * FROM iceberg.logging.&quot;logs$snapshots&quot;;

/**
 * Rename column and drop column
 */

ALTER TABLE iceberg.logging.logs RENAME COLUMN severity TO priority;

SHOW CREATE TABLE iceberg.logging.logs;

SELECT * FROM iceberg.logging.logs;

ALTER TABLE iceberg.logging.logs DROP COLUMN priority;

SHOW CREATE TABLE iceberg.logging.logs;

SELECT * FROM iceberg.logging.logs;

SELECT * FROM iceberg.logging.&quot;logs$snapshots&quot;;

/**
 * Travel back to previous snapshots
 */

SELECT * FROM iceberg.logging.logs;

SELECT * FROM iceberg.logging.&quot;logs$snapshots&quot;;

SELECT * FROM iceberg.logging.&quot;logs@&amp;lt;insert-earlier-snapshot&amp;gt;&quot;;

CALL system.rollback_to_snapshot(&apos;logging&apos;, &apos;logs&apos;, &amp;lt;insert-earlier-snapshot&amp;gt;)

/**
 * Back to the future snapshot
 */

SELECT * FROM iceberg.logging.&quot;logs$snapshots&quot;;

SELECT * FROM iceberg.logging.&quot;logs@&amp;lt;insert-latest-snapshot&amp;gt;&quot;;

CALL system.rollback_to_snapshot(&apos;logging&apos;, &apos;logs&apos;, &amp;lt;insert-latest-snapshot&amp;gt;)

SELECT * FROM iceberg.logging.logs;

SELECT * FROM iceberg.logging.&quot;logs$partitions&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;question-of-the-week-what-do-i-do-to-restart-the-test-pipeline-if-it-fails-on-me&quot;&gt;Question of the week: What do I do to restart the test pipeline if it fails on me?&lt;/h2&gt;

&lt;p&gt;When developing with Trino, there is an automated build that acts as 
verification of any PR. It is powered by a GitHub actions definition and runs 
all the tests in Trino when developers add new code. Sometimes tests unrelated to
the changes in your PR fail, which makes your PR appear unmergeable even though 
the failure is actually unrelated to your changes.&lt;/p&gt;

&lt;p&gt;Developers are aware of these flaky tests, and need a mechanism to resubmit 
their PR and rerun the tests. There is unfortunately no way to enable users to 
rerun tests through GitHub without write permissions to the Trino repository, so
you have to do a dummy commit.&lt;/p&gt;

&lt;p&gt;This can easily be done using this one-line hack 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;git commit --amend --no-edit &amp;amp;&amp;amp; git push -f&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The good news is, we have gone through some extensive lengths to identify flaky
tests in the last year. These test failures are much rarer now, and we are 
constantly improving the build stability as an ongoing effort.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;h3 id=&quot;wtd-portland&quot;&gt;WTD Portland&lt;/h3&gt;

&lt;p&gt;Interested in supporting the Trino project, but don’t know where to start? A 
good place to start, with a lower barrier to entry, is adding to the 
documentation. We will be supporting the 
&lt;a href=&quot;https://trino.io/blog/2021/04/14/wtd-writing-day.html&quot;&gt;writing day&lt;/a&gt; at the
Write the Docs (WTD) Portland conference this April! Join us to learn how to get involved!&lt;/p&gt;

&lt;h3 id=&quot;virtual-trino-meetups&quot;&gt;Virtual Trino meetups&lt;/h3&gt;

&lt;p&gt;Come join us for the inaugural Virtual Trino meetup on April 21st in the virtual
meetup group in your region! See &lt;a href=&quot;./community.html&quot;&gt;the community page&lt;/a&gt; for more
details.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/events/277246268/&quot;&gt;Trino Americas meetup&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/events/277246173/&quot;&gt;Trino EMEA meetup&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/events/277246078/&quot;&gt;Trino APAC meetup&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At these meetups, the four Trino/Presto founders will be updating everyone on 
the state of Trino. We’ll discuss the rebrand, talk about the recent features, 
and discuss the trajectory of the project. Then we will host a hangout and an
ask me anything (AMA) session. Hope to see you all there!&lt;/p&gt;

&lt;p&gt;Blogs&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;Trino on ice I: A gentle introduction to Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html&quot;&gt;Trino on ice II: In-place table evolution and cloud compatibility with Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/08/12/deep-dive-into-iceberg-internals.html&quot;&gt;Trino on ice IV: Deep dive into Iceberg internals&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/expedia-group-tech/a-short-introduction-to-apache-iceberg-d34f628b6799&quot;&gt;https://medium.com/expedia-group-tech/a-short-introduction-to-apache-iceberg-d34f628b6799&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://engineering.linkedin.com/blog/2021/fastingest-low-latency-gobblin&quot;&gt;https://engineering.linkedin.com/blog/2021/fastingest-low-latency-gobblin&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/adobetech/iceberg-at-adobe-88cf1950e866&quot;&gt;https://medium.com/adobetech/iceberg-at-adobe-88cf1950e866&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/adobetech/high-throughput-ingestion-with-iceberg-ccf7877a413f&quot;&gt;https://medium.com/adobetech/high-throughput-ingestion-with-iceberg-ccf7877a413f&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/adobetech/taking-query-optimizations-to-the-next-level-with-iceberg-6c968b83cd6f&quot;&gt;https://medium.com/adobetech/taking-query-optimizations-to-the-next-level-with-iceberg-6c968b83cd6f&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://thenewstack.io/apache-iceberg-a-different-table-design-for-big-data/&quot;&gt;https://thenewstack.io/apache-iceberg-a-different-table-design-for-big-data/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Videos&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=mf8Hb0coI6o&quot;&gt;https://www.youtube.com/watch?v=mf8Hb0coI6o&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Kxbzr7UP1dI&amp;amp;t=1274s&quot;&gt;https://www.youtube.com/watch?v=Kxbzr7UP1dI&amp;amp;t=1274s&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=QNmSXMQ-gY4&quot;&gt;https://www.youtube.com/watch?v=QNmSXMQ-gY4&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;Virtual Trino Americas&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;Virtual Trino EMEA&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;Virtual Trino APAC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;Trino Boston&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;Trino NYC&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;Trino San Francisco&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;Trino Los Angeles&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;Trino Chicago&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;Advanced SQL Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/30/training-query-tuning.html&quot;&gt;Query Tuning Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/13/training-security.html&quot;&gt;Security Training&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/27/training-performance.html&quot;&gt;Performance and Tuning Training&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Looks like Commander Bun Bun is safe on this Iceberg https://joshdata.me/iceberger.html</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino at Writing Day</title>
      <link href="https://trino.io/blog/2021/04/14/wtd-writing-day.html" rel="alternate" type="text/html" title="Trino at Writing Day" />
      <published>2021-04-14T00:00:00+00:00</published>
      <updated>2021-04-14T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/04/14/wtd-writing-day</id>
      <content type="html" xml:base="https://trino.io/blog/2021/04/14/wtd-writing-day.html">&lt;p&gt;First time Trino blogger, long time lurker on the Trino slack. My name is 
&lt;a href=&quot;https://twitter.com/ZelWms&quot;&gt;Rose Williams&lt;/a&gt; and I’m an open source docs enthusiast! 
I’ve had the pleasure of contributing to this community for the past few months. 
Recently I’ve been working with &lt;a href=&quot;https://twitter.com/bitsondatadev&quot;&gt;Brian Olsen&lt;/a&gt;, our fearless 
developer advocate, as well as some of our other Trino doc contributors, to get 
Trino ready for the Write the Docs &lt;a href=&quot;https://www.writethedocs.org/conf/portland/2021/writing-day/&quot;&gt;Writing Day&lt;/a&gt; open source event!&lt;/p&gt;

&lt;p&gt;If you’re not familiar with &lt;a href=&quot;https://www.writethedocs.org&quot;&gt;Write the Docs&lt;/a&gt;, it’s
a global community of people who care about documentation.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“We consider everyone who cares about communication, documentation, and their
users to be a member of our community. This can be programmers, tech writers,
developer advocates, customer support, marketers, and anyone else who wants
people to have great experiences with software.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href=&quot;https://www.writethedocs.org/conf/portland/2021/writing-day/&quot;&gt;Writing Day&lt;/a&gt; is
the first day of their upcoming virtual documentation conference, &lt;a href=&quot;https://www.writethedocs.org/conf/portland/2021/&quot;&gt;Write the
Docs Portland (PST)&lt;/a&gt; April
25-27, 2021. The goal of Writing Day is to get a bunch of interesting people in
a room together and introduce them to cool open source projects that they can
onboard and contribute to.&lt;/p&gt;

&lt;p&gt;Writing Day is open to all conference attendees and several Trino enthusiasts are
attending as mentors. Leading up to the conference, we’re focused on identifying
docs issues that are ideal for first time contributors. If you’re a regular
Trino contributor, you might notice that we’re going through and tagging items
as “good first issue” and “docs” - we’ll be using those tags to create an 
&lt;a href=&quot;https://github.com/trinodb/trino/issues?q=is%3Aopen+label%3Adocs+label%3A%22good+first+issue%22&quot;&gt;issues filter&lt;/a&gt; 
for the event. We’re also doing some work on the Trino docs readme to
help folks onboard faster.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.writethedocs.org/conf/portland/2021/tickets/&quot;&gt;Snag a ticket&lt;/a&gt; if
you’re interested in participating, we hope to see you there! Our goal is to
continue curating good first issues for future writers and developers.&lt;/p&gt;

&lt;p&gt;Join the new &lt;a href=&quot;https://trinodb.slack.com/messages/C01TEP0HJTH&quot;&gt;#documentation channel&lt;/a&gt; 
on the &lt;a href=&quot;./slack.html&quot;&gt;Trino slack&lt;/a&gt; and 
&lt;a href=&quot;https://github.com/trinodb/trino/stargazers&quot;&gt;favorite the Trino project&lt;/a&gt; on GitHub.&lt;/p&gt;

&lt;p&gt;If you’re interested in learning more about &lt;a href=&quot;https://www.writethedocs.org&quot;&gt;Write the Docs&lt;/a&gt; 
or &lt;a href=&quot;https://www.writethedocs.org/conf/portland/2021/writing-day/&quot;&gt;Writing Day&lt;/a&gt;, 
feel free to reach out to me (&lt;a href=&quot;https://twitter.com/ZelWms&quot;&gt;Rose Williams&lt;/a&gt;), 
&lt;a href=&quot;https://twitter.com/bitsondatadev&quot;&gt;Brian Olsen&lt;/a&gt;, or 
&lt;a href=&quot;https://twitter.com/mosabua&quot;&gt;Manfred Moser&lt;/a&gt; on twitter or the &lt;a href=&quot;./slack.html&quot;&gt;Trino slack&lt;/a&gt;. You 
can also check out the Write the Docs &lt;a href=&quot;https://www.writethedocs.org/slack/&quot;&gt;slack community&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you have an open source project that you’re interested in bringing to Writing
Day, chat with me, &lt;a href=&quot;https://twitter.com/ZelWms&quot;&gt;Rose Williams&lt;/a&gt;, on twitter or 
on the Trino or Write the Doc slack communities.&lt;/p&gt;</content>

      
        <author>
          <name>Rose Williams (she/her)</name>
        </author>
      

      <summary>First time Trino blogger, long time lurker on the Trino slack. My name is Rose Williams and I’m an open source docs enthusiast! I’ve had the pleasure of contributing to this community for the past few months. Recently I’ve been working with Brian Olsen, our fearless developer advocate, as well as some of our other Trino doc contributors, to get Trino ready for the Write the Docs Writing Day open source event!</summary>

      
      
    </entry>
  
    <entry>
      <title>14: Iceberg: March of the Trinos</title>
      <link href="https://trino.io/episodes/14.html" rel="alternate" type="text/html" title="14: Iceberg: March of the Trinos" />
      <published>2021-04-01T00:00:00+00:00</published>
      <updated>2021-04-01T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/14</id>
      <content type="html" xml:base="https://trino.io/episodes/14.html">&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/14/trino-penguin.png&quot; /&gt;&lt;br /&gt;
March of the Trinos! Be careful Commander Bun Bun! That Iceberg doesn&apos;t look stable!&lt;br /&gt;
&lt;a href=&quot;https://joshdata.me/iceberger.html&quot;&gt;https://joshdata.me/iceberger.html&lt;/a&gt;
&lt;/p&gt;

&lt;h2 id=&quot;iceberg-links&quot;&gt;Iceberg links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://iceberg.apache.org/&quot;&gt;Apache Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://iceberg.apache.org/community/&quot;&gt;Apache Iceberg Community&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;David Phillips, creator of Trino/Presto, and CTO at 
 &lt;a href=&quot;https://starburst.io&quot;&gt;Starburst&lt;/a&gt; (&lt;a href=&quot;https://twitter.com/electrum32&quot;&gt;@electrum32&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-354&quot;&gt;Release 354&lt;/h2&gt;

&lt;p&gt;Release notes discussed: &lt;a href=&quot;https://trino.io/docs/current/release/release-354.html&quot;&gt;https://trino.io/docs/current/release/release-354.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Martin’s list:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Support for OAuth 2.0 in CLI&lt;/li&gt;
  &lt;li&gt;Support for MemSQL 3.2&lt;/li&gt;
  &lt;li&gt;Pushdown of ORDER BY … LIMIT for MemSQL, MySQL and SQL Server connectors&lt;/li&gt;
  &lt;li&gt;Support for time(p) in SQL Server&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred’s notes:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;LEFT, RIGHT and FULL JOIN&lt;/li&gt;
  &lt;li&gt;Preferred write partitioning on by default (needs statistics)&lt;/li&gt;
  &lt;li&gt;Small but useful fix on Elasticsearch (single value array)&lt;/li&gt;
  &lt;li&gt;Hive connector&lt;/li&gt;
  &lt;li&gt;Fix ACID table DELETE and UPDATE - critical fix is in! Boom!&lt;/li&gt;
  &lt;li&gt;Avro format improvement&lt;/li&gt;
  &lt;li&gt;CSV and Glue metadata improvement&lt;/li&gt;
  &lt;li&gt;Iceberg - date and timestamp improvement&lt;/li&gt;
  &lt;li&gt;CREATE SCHEMA fixes  in MySQL, PostgreSQL, Redshift and SQL Server&lt;/li&gt;
  &lt;li&gt;Bunch of other fixes in those connectors&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-apache-iceberg-and-the-table-format&quot;&gt;Concept of the week: Apache Iceberg and the table format&lt;/h2&gt;

&lt;h3 id=&quot;the-hive-table-format&quot;&gt;The Hive table format&lt;/h3&gt;

&lt;p&gt;For the last decade or so, the only option big data professionals had for
querying their data was, in some shape or form, the Hive model. The Hive model
is very simple, but it enabled running queries over files in a distributed file
system.&lt;/p&gt;

&lt;p&gt;To accomplish this, Hive uses a metastore service which stores and manages
metadata. For Hive and Trino, this metadata acts as a pointer to the files
containing the data, contains the file format, and has the column structure and
types. This enabled Hive to query the correct files and data within those files
for a SQL query. For more information on Hive’s architecture, read the
&lt;a href=&quot;/blog/2020/10/20/intro-to-hive-connector.html&quot;&gt;Gentle Introduction to Hive&lt;/a&gt;
blog. After the initial model gained adoption, Hive added other features such as
partitioning. It uses the directory structures of the filesystems to split the 
files of data partitioned on a special column into different directories. We 
talk about this in more depth &lt;a href=&quot;/episodes/5.html&quot;&gt;a few episodes back&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The Hive model solved some of the initial issues facing engineers in big data,
but it has quite a few problems of its own. It is very rigid and cannot adapt
as your requirements change. For example, if you start partitioning your data
by date, segmented by month, that table is stuck with that partitioning forever.
The only way to change it is to create a new table with the new partition
values, and migrate all of your data from the old table to the new one. At
common data sizes, such a migration is often a long process, sometimes even
impossible. Another issue stems from the separation between data stored in the
metastore and data stored in the file system: many problems in Hive are caused
by the metastore getting out of sync with the files. A third, but not final,
issue is that metastore operations like listing files are time-consuming on
more modern object storage.&lt;/p&gt;

&lt;p&gt;As all these problems amassed over the years, it became clear that something
needed to be done. In the last few years, a few candidate table formats have
come to the forefront of data engineering trends: Apache Iceberg, Apache Hudi,
and Databricks’ proprietary Delta Lake. The goal of these systems is to
modernize the old Hive data structure. For Trino, Iceberg is particularly
promising thanks to features like schema versioning support and hidden
partitioning. Let’s talk about some of these features in detail.&lt;/p&gt;

&lt;h3 id=&quot;the-iceberg-table-format&quot;&gt;The Iceberg table format&lt;/h3&gt;

&lt;p&gt;Iceberg is a new table format developed at Netflix that aims to replace older
table formats like Hive, adding better flexibility as the schema evolves, atomic
operations, speed, and overall dependability. To be clear, it’s not a new file
format, as it still uses ORC, Parquet, and Avro, but a table format. Netflix
donated Iceberg to the Apache Software Foundation and it is now a top-level
project!&lt;/p&gt;

&lt;p&gt;Iceberg stores the data on disk just like Hive, but instead of a central
metastore it stores the metadata in manifest files on disk along with the data
itself. These &lt;em&gt;manifest files&lt;/em&gt; are Avro files that contain table metadata and
list a subset of data files. &lt;em&gt;Manifest lists&lt;/em&gt; are a special type of manifest
file that point to other manifest files. &lt;em&gt;Snapshots&lt;/em&gt; contain a manifest list
that points to all the manifest files that belong to the snapshot. Another huge
difference from Hive is that the manifest files track table data at the file
level, as opposed to the directory level that Hive uses. By doing so, Iceberg
avoids having to list all files in a directory, which is a very common and
expensive operation.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
  &lt;img align=&quot;center&quot; width=&quot;60%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/14/iceberg-metadata.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;By tracking files this way, we not only get better performance from object
storage, it also enables serializable isolation. This addresses the lack of
consistency between the metadata and file state experienced in Hive.&lt;/p&gt;
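&lt;p&gt;In Trino, you can inspect this metadata yourself through the hidden metadata
tables that the Iceberg connector exposes. As a rough sketch, assuming a
hypothetical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;events&lt;/code&gt; table in an
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg.logging&lt;/code&gt; schema (the exact
columns vary by Trino version):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- List the manifest files tracked by the current snapshot
SELECT * FROM iceberg.logging.&quot;events$manifests&quot;;

-- List the individual data files, with per-file record counts
SELECT file_path, record_count FROM iceberg.logging.&quot;events$files&quot;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;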

&lt;p&gt;One of the greater advantages of Iceberg over Hive is in-place table
evolution. You can add, drop, rename, reorder, or update a column without any
expensive refactoring of tables or moving data around, and with no adverse
effects on your data or metadata.&lt;/p&gt;

&lt;p&gt;Partition evolution and hidden partitions are particularly invaluable. In
Iceberg, the &lt;em&gt;partition spec&lt;/em&gt; describes how to partition data in a table, and
consists of a list of source columns and transforms. Once the spec is created,
it generates a partition tuple that is applied uniformly to the files created
with that spec. Unlike Hive, which requires you to compute and write a special
column that acts as the partition value, Iceberg stores partition values
unmodified. Here’s an example partition spec generated with the Java API.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PartitionSpec spec = PartitionSpec.builderFor(schema)
        .hour(&quot;event_time&quot;)
        .identity(&quot;level&quot;)
        .build();
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This example creates an hourly partition on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;event_time&lt;/code&gt; field and
uses the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;identity()&lt;/code&gt; transform to generate another level of partitioning
on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;level&lt;/code&gt; field. If, at a later time, you decide you are getting too many
small files because your partitions are too small, you can update the
partition spec and Iceberg starts writing new files according to the updated
spec. Again, this is all without creating a new table or moving data around,
and all queries continue to return correct results. This kind of evolution is a
problem with Hive.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
  &lt;img align=&quot;center&quot; width=&quot;60%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/14/partition-spec-evolution.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;
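&lt;p&gt;For comparison, a similar layout can be declared from Trino at table creation
time with the Iceberg connector’s partitioning table property. This is only a
sketch with a hypothetical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;events&lt;/code&gt; table; check your Trino version for the
supported partitioning transforms:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Hourly partitions on event_time, plus identity partitioning on level
CREATE TABLE iceberg.logging.events (
    event_time timestamp(6),
    level varchar,
    message varchar
)
WITH (partitioning = ARRAY['hour(event_time)', 'level']);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;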

&lt;p&gt;If all that isn’t enough, you can also do time travel and version rollback with
Iceberg. As we mentioned above, Iceberg keeps track of various snapshots of
your data over time through manifest files. As long as you keep those older
snapshots around, the files associated with them stick around as well. This
allows you to move back to previous views of the data, which is useful for
testing, recovery, and many other purposes. Just as you can time travel, you
can make the trip permanent by rolling back any unintended changes and deleting
the undesired snapshot.&lt;/p&gt;
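&lt;p&gt;From Trino, time travel starts with finding a snapshot to go back to. Here is
a hedged sketch against a hypothetical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;events&lt;/code&gt; table; the snapshot ID is made
up, and the rollback procedure’s availability depends on your Trino version:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Inspect the snapshots Iceberg has retained for the table
SELECT snapshot_id, committed_at
FROM iceberg.logging.&quot;events$snapshots&quot;;

-- Roll the table back to an earlier snapshot
CALL iceberg.system.rollback_to_snapshot('logging', 'events', 2873264784572345678);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;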

&lt;p&gt;Iceberg is also able to offer fast scan planning by filtering out the metadata
files that are irrelevant to the scan. Using the partition spec, Iceberg
compares the partition value ranges recorded in the metadata against the query
to skip manifest files that cannot contain matching data. Then, while
processing the remaining manifest files, Iceberg filters data files by the
query predicates on partition values, and applies column statistics to help
prune out files that don’t match. Iceberg can also process manifests in
parallel to speed things up as a final measure.&lt;/p&gt;

&lt;p&gt;Saving the best for last: Iceberg is a community standard and has
&lt;a href=&quot;https://iceberg.apache.org/spec/&quot;&gt;a full written specification&lt;/a&gt;, which is a nice
change from Hive, whose ad-hoc, unwritten specification people adhere to in
varying ways. There have been many issues over the years due to the different
ways that unwritten specification gets interpreted. A written spec not only
enables people to understand how to use Iceberg, but documents how others can
implement the same features in entirely different systems. Let’s save a deep
dive on the spec for the next episode, when we bring on Ryan Blue, creator of
Iceberg, to dig into these details.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-1067-add-iceberg-connector&quot;&gt;PR of the week: PR 1067 Add Iceberg connector&lt;/h2&gt;

&lt;p&gt;A huge shoutout goes to &lt;a href=&quot;https://github.com/Parth-Brahmbhatt&quot;&gt;Parth Brahmbhatt&lt;/a&gt;,
a Senior Software Engineer at Netflix, who created this week’s
&lt;a href=&quot;https://github.com/trinodb/trino/pull/1067&quot;&gt;PR of the week&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/docs/current/release/release-318.html&quot;&gt;Release 318&lt;/a&gt;
introduced this code, which supported querying Apache Iceberg tables from
Trino. While the code existed, the Iceberg connector wasn’t officially
released or documented until a little over a year later, in
&lt;a href=&quot;/docs/current/release/release-341.html&quot;&gt;release 341&lt;/a&gt;, once the connector reached
maturity.&lt;/p&gt;

&lt;h3 id=&quot;future-development-for-the-trino-iceberg-connector&quot;&gt;Future development for the Trino Iceberg connector&lt;/h3&gt;

&lt;p&gt;Still, there are some strange artifacts that we’re facing in the connector
today. For example, if you create a table with the Iceberg Java API, &lt;a href=&quot;https://github.com/apache/iceberg/blob/996ed979f396f2c7cc12ca824a3fe758f2c486ce/hive/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java#L222&quot;&gt;it creates
Iceberg tables with &amp;lt;table_type, ICEBERG&amp;gt;&lt;/a&gt;
but Trino &lt;a href=&quot;https://github.com/prestosql/presto/blob/master/presto-iceberg/src/main/java/io/prestosql/plugin/iceberg/HiveTableOperations.java#L190&quot;&gt;creates and reads tables with &amp;lt;table_type, iceberg&amp;gt;&lt;/a&gt;.
See &lt;a href=&quot;https://github.com/trinodb/trino/issues/1592&quot;&gt;Issue 1592&lt;/a&gt; for status and
details. In general, we can track some of the broader changes being made
to &lt;a href=&quot;https://github.com/trinodb/trino/issues/1324&quot;&gt;the Iceberg connector here&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;demo-creating-tables-with-iceberg-and-reading-the-data-in-trino&quot;&gt;Demo: Creating tables with Iceberg and reading the data in Trino&lt;/h2&gt;

&lt;p&gt;For this week’s demo, I wanted to play around with the Iceberg Java API directly.
You also have the option to use Trino, Spark, or other engines to ingest and
query the data, but I wanted to use the vanilla Iceberg APIs to experience them
and hopefully solidify my learning of Iceberg concepts in the process. Make sure
you follow the instructions in the repository if you don’t have Docker or Java
installed.&lt;/p&gt;

&lt;p&gt;Let’s start up a local Trino coordinator and Hive metastore. Clone this 
repository and navigate to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg/trino-iceberg-minio&lt;/code&gt; directory. Then
start up the containers using Docker Compose.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-getting-started.git

cd iceberg/trino-iceberg-minio

docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In your favorite IDE, import the files under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iceberg/iceberg-java&lt;/code&gt; into your
project and run the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IcebergMain&lt;/code&gt; class.&lt;/p&gt;

&lt;p&gt;This class creates a logging table if it doesn’t exist. Once you run this code,
you can check that the table exists in the metastore under TABLE_PARAMS. But if
you run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW TABLES IN iceberg.logging;&lt;/code&gt; you’ll notice that
the table doesn’t show up, due to &lt;a href=&quot;https://github.com/trinodb/trino/issues/1592&quot;&gt;the issue we discussed above&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let’s update the TABLE_PARAMS entry in the metastore db and then query the table
again.&lt;/p&gt;
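&lt;p&gt;As a sketch of that workaround, this statement runs directly against the
metastore database, not through Trino. The column names follow the standard
Hive metastore schema; in practice you would also scope the update to the
affected table’s TBL_ID:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Lowercase the table_type parameter so Trino recognizes the table
UPDATE TABLE_PARAMS
SET PARAM_VALUE = 'iceberg'
WHERE PARAM_KEY = 'table_type'
  AND PARAM_VALUE = 'ICEBERG';
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;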

&lt;h2 id=&quot;question-of-the-week-why-does-trino-still-depend-on-the-hive-metastore-if-metadata-for-iceberg-saves-to-the-filesystem&quot;&gt;Question of the week: Why does Trino still depend on the Hive metastore if metadata for Iceberg saves to the filesystem?&lt;/h2&gt;

&lt;p&gt;We kept the metastore because many existing tests for the Hive connector are
built around it, and we want to give the Iceberg connector ample time to
mature before we migrate entirely away from the metastore. We also made the
metastore the initial method of use in Iceberg because most developers would
initially be migrating from an existing Hive catalog, and we wanted this
transition to use existing, tested components.&lt;/p&gt;

&lt;p&gt;Currently, the metastore isn’t used the same way as in Hive. Trino stores a
pointer to the top-level metadata manifest file location, along with other
statistics about the table, in the TABLE_PARAMS table of the metastore. There
is a &lt;a href=&quot;https://github.com/trinodb/trino/pull/6977&quot;&gt;pull request created by Jack Ye&lt;/a&gt;
to remove the requirement to use the Hive metastore when using
Iceberg with Trino.&lt;/p&gt;

&lt;h2 id=&quot;tip-of-the-iceberg&quot;&gt;Tip of the Iceberg&lt;/h2&gt;

&lt;p&gt;One last bit of fun with Iceberg. Let’s do a little experiment called “Will
the iceberg tip?”:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Go to &lt;a href=&quot;https://iceberg.apache.org/&quot;&gt;https://iceberg.apache.org/&lt;/a&gt; and take a look at the logo.&lt;/li&gt;
  &lt;li&gt;Now go to &lt;a href=&quot;https://joshdata.me/iceberger.html&quot;&gt;https://joshdata.me/iceberger.html&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Draw the Apache Iceberg logo and see what happens.&lt;/li&gt;
  &lt;li&gt;Now draw the iceberg in the image above that Commander Bun Bun is on.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When drawing the iceberg like the image with Commander Bun Bun, the iceberg tips
over. Careful Commander Bun Bun! It looks like the Apache logo wins! Shout out 
to &lt;a href=&quot;https://twitter.com/JoshData&quot;&gt;Joshua Tauberer&lt;/a&gt; for the web page. Shout out 
to &lt;a href=&quot;https://twitter.com/GlacialMeg&quot;&gt;Megan Thompson-Munson&lt;/a&gt; for the tweet that 
started the page. Shout out to 
&lt;a href=&quot;https://www.linkedin.com/in/bartonwright/&quot;&gt;Barton Wright&lt;/a&gt; from Manfred’s team 
of writers for being the geek to find this. Shout out to 
&lt;a href=&quot;https://twitter.com/aliLoney&quot;&gt;Ali&lt;/a&gt; for being a good sport and setting Commander
Bun Bun on the iceberg.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Come join us for the inaugural Virtual Trino meetup on April 21st in the virtual
meetup group in your region!&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/events/277246268/&quot;&gt;Americas meetup&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/events/277246173/&quot;&gt;EMEA meetup&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/events/277246078/&quot;&gt;APAC meetup&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At this meetup, the four Trino/Presto founders will be updating everyone on the
state of Trino. We’ll discuss the rebrand, talk about the recent features, and 
discuss the trajectory of the project. Then we will host a hangout and AMA. Hope
to see you all there!&lt;/p&gt;

&lt;p&gt;Blogs&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/05/03/a-gentle-introduction-to-iceberg.html&quot;&gt;Trino on ice I: A gentle introduction to Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/12/in-place-table-evolution-and-cloud-compatibility-with-iceberg.html&quot;&gt;Trino on ice II: In-place table evolution and cloud compatibility with Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/07/30/iceberg-concurrency-snapshots-spec.html&quot;&gt;Trino on ice III: Iceberg concurrency model, snapshots, and the Iceberg spec&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2021/08/12/deep-dive-into-iceberg-internals.html&quot;&gt;Trino on ice IV: Deep dive into Iceberg internals&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/expedia-group-tech/a-short-introduction-to-apache-iceberg-d34f628b6799&quot;&gt;https://medium.com/expedia-group-tech/a-short-introduction-to-apache-iceberg-d34f628b6799&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://engineering.linkedin.com/blog/2021/fastingest-low-latency-gobblin&quot;&gt;https://engineering.linkedin.com/blog/2021/fastingest-low-latency-gobblin&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/adobetech/iceberg-at-adobe-88cf1950e866&quot;&gt;https://medium.com/adobetech/iceberg-at-adobe-88cf1950e866&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/adobetech/high-throughput-ingestion-with-iceberg-ccf7877a413f&quot;&gt;https://medium.com/adobetech/high-throughput-ingestion-with-iceberg-ccf7877a413f&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/adobetech/taking-query-optimizations-to-the-next-level-with-iceberg-6c968b83cd6f&quot;&gt;https://medium.com/adobetech/taking-query-optimizations-to-the-next-level-with-iceberg-6c968b83cd6f&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://thenewstack.io/apache-iceberg-a-different-table-design-for-big-data/&quot;&gt;https://thenewstack.io/apache-iceberg-a-different-table-design-for-big-data/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Videos&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=mf8Hb0coI6o&quot;&gt;https://www.youtube.com/watch?v=mf8Hb0coI6o&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Kxbzr7UP1dI&amp;amp;t=1274s&quot;&gt;https://www.youtube.com/watch?v=Kxbzr7UP1dI&amp;amp;t=1274s&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=QNmSXMQ-gY4&quot;&gt;https://www.youtube.com/watch?v=QNmSXMQ-gY4&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup Groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;https://www.meetup.com/trino-americas/&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;https://www.meetup.com/trino-emea/&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-apac/&quot;&gt;https://www.meetup.com/trino-apac/&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;https://www.meetup.com/trino-boston/&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;https://www.meetup.com/trino-nyc/&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;https://www.meetup.com/trino-san-francisco/&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;https://www.meetup.com/trino-los-angeles/&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West (US)
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;https://www.meetup.com/trino-chicago/&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from
O’Reilly. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>March of the Trinos! Be careful Commander Bun Bun! That Iceberg doesn&apos;t look stable! https://joshdata.me/iceberger.html</summary>

      
      
    </entry>
  
    <entry>
      <title>13: Trino takes a sip of Pinot!</title>
      <link href="https://trino.io/episodes/13.html" rel="alternate" type="text/html" title="13: Trino takes a sip of Pinot!" />
      <published>2021-03-18T00:00:00+00:00</published>
      <updated>2021-03-18T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/13</id>
      <content type="html" xml:base="https://trino.io/episodes/13.html">&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/13/trinot.png&quot; /&gt;&lt;br /&gt;
Commander Bun Bun loves sippin&apos; on Pinot after a hard day of data exploration!
&lt;/p&gt;

&lt;h2 id=&quot;pinot-links&quot;&gt;Pinot links&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://communityinviter.com/apps/apache-pinot/apache-pinot&quot;&gt;Apache Pinot Slack&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/apache-pinot/events/275991991/&quot;&gt;Pinot Meetup&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Xiang Fu, project management chair and committer at &lt;a href=&quot;https://pinot.apache.org/&quot;&gt;Apache Pinot&lt;/a&gt;
  and co-founder of stealth mode startup (&lt;a href=&quot;https://twitter.com/xiangfu0&quot;&gt;@xiangfu0&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Elon Azoulay, software engineer at stealth mode startup (&lt;a href=&quot;https://twitter.com/ElonAzoulay&quot;&gt;@ElonAzoulay&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-353&quot;&gt;Release 353&lt;/h2&gt;

&lt;p&gt;Release notes discussed: &lt;a href=&quot;https://trino.io/docs/current/release/release-353.html&quot;&gt;https://trino.io/docs/current/release/release-353.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Martin’s list:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;New ClickHouse connector&lt;/li&gt;
  &lt;li&gt;Support for correlated subqueries involving UNNEST&lt;/li&gt;
  &lt;li&gt;CREATE/DROP TABLE in BigQuery connector&lt;/li&gt;
  &lt;li&gt;Reading and writing column stats in Glue Metastore&lt;/li&gt;
  &lt;li&gt;Support for Apache Phoenix 5.1&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred’s notes:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;New geometry functions&lt;/li&gt;
  &lt;li&gt;A whole bunch of correctness and performance improvements&lt;/li&gt;
  &lt;li&gt;Env var (and hence secrets) support for RPM-based installs&lt;/li&gt;
  &lt;li&gt;Hive - performance for bucketed table inserts&lt;/li&gt;
  &lt;li&gt;Kafka - schema registry improvements&lt;/li&gt;
  &lt;li&gt;Experimental join pushdown in a bunch of JDBC connectors&lt;/li&gt;
  &lt;li&gt;Also a bunch of fixes on JDBC connectors&lt;/li&gt;
  &lt;li&gt;Quite a list of changes on the SPI - ensure to check if you have a plugin&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-data-cubes-and-apache-pinot&quot;&gt;Concept of the week: Data cubes and Apache Pinot&lt;/h2&gt;

&lt;p&gt;Before diving into Pinot, I think it’s worthwhile to discuss some theoretical
background to motivate some of the use cases Pinot solves for. We cover the 
concept of data cubes and how they are used in traditional data warehousing to 
speed up queries and minimize unnecessary work on your OLAP system.&lt;/p&gt;

&lt;h3 id=&quot;data-cubes-and-molap-multi-dimensional-online-analytics-processing&quot;&gt;Data cubes and MOLAP (Multi-dimensional online analytics processing)&lt;/h3&gt;

&lt;p&gt;In data analytics, there are many access patterns that tend to repeat themselves
over and over again. It is very common to need to split and merge data based on 
the date and time values. Or perhaps you ask a lot of questions based on a 
specific customer, or even a specific product. Answering these questions 
typically involves aggregation of data like sums, averages, counts, etc… 
Wouldn’t it make sense to cache some of these intermediary results?&lt;/p&gt;

&lt;p&gt;A common way to visualize the columns that are frequently bucketed into values
or ranges of values is to show them as a cube that is sliced up into smaller
dimensions. This idea derives from the traditional form of OLAP,
multi-dimensional OLAP (MOLAP).&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/13/data_cube.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;This cube represents cached data aggregations grouped by commonly used
dimensions. For example, the displayed cube would be the pre-aggregation of
the following query:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT part, store, customer, COUNT(*)
FROM cube_table
GROUP BY part, store, customer
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If we want to get the data for a particular customer, we can take a “slice” of
that cube by specifying a particular customer. The following query returns the
green square above from our cube.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT part, store, COUNT(*)
FROM cube_table
WHERE customer = &apos;Bob&apos;
GROUP BY part, store
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now what if we want to flatten one of the dimensions? While this can be managed
with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; as before, depending on the system this may ignore any cached
data and scan over all the rows. For this, SQL reserves a special set of
keywords around cubes. We won’t dive into that in depth now, but for our current
goal of flattening a dimension, we can use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROLLUP&lt;/code&gt;. Using the keyword &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROLLUP&lt;/code&gt;
indicates to the underlying system that you intend to aggregate over the 
pre-materialized data rather than scan over all rows to compute again. This
gives you the total count of parts per store using the counts of the data cube.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT part, store, COUNT(*)
FROM cube_table
GROUP BY ROLLUP (part, store)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
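
&lt;p&gt;To see what &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROLLUP&lt;/code&gt; adds to the result, note that it also emits subtotal
rows where the rolled-up column is NULL. For a small, made-up data set of two
parts across two stores, the query above would return something like:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; part  | store | _col2
-------+-------+-------
 bolt  | s1    |     2
 bolt  | s2    |     1
 screw | s1    |     3
 bolt  | NULL  |     3
 screw | NULL  |     3
 NULL  | NULL  |     6
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The rows with NULL in the store column are the per-part totals across all
stores, and the final all-NULL row is the grand total.&lt;/p&gt;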

&lt;p&gt;Now, although we used simple counts, you can precompute a lot of other aggregate
data like sums, min, max, percentile, etc… These can service commonly run
queries without requiring a new computation every time. That is the goal of
MOLAP and data cubes.&lt;/p&gt;

&lt;h3 id=&quot;apache-pinot&quot;&gt;Apache Pinot&lt;/h3&gt;

&lt;p&gt;Now let’s move on to Apache Pinot. It is a realtime distributed OLAP datastore, 
designed to answer OLAP queries with low latency. Although many of those words
overlap with the Trino description, the key differentiators are realtime and
low latency. Trino performs batch processing and is not a realtime system,
whereas Pinot is great for ingesting data in batch or as a stream. The other
key phrase, low latency, could technically apply to both Pinot and Trino, but
in the context of realtime subsecond latency, Trino is slow compared to Pinot.
This is due to the specialized indexes that Pinot uses to store the data, which
we cover shortly. Another big distinction is that Trino does not store any data
itself: it is purely a query engine. Xiang has a really great summary slide
that shows the strengths of each system and why they work so well
together.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/13/latency_flexibility.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;p&gt;While Trino is not as fast as Pinot, it is able to handle a broader set of
use cases like performing broad joins over open data formats in data lakes. 
This is what motivated work on the Trino Pinot connector. You can have the speed
of Pinot, while having the flexibility of Trino.&lt;/p&gt;

&lt;p&gt;Now that you understand the common use case for Pinot, it’s important to know 
the main goals of Pinot.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;One primary goal is to keep the response times of aggregation queries
  predictable, regardless of how many requests Pinot handles. As it scales
  you won’t see a degradation of performance. This is achieved by Pinot’s
  custom indices and storage formats.
    &lt;p align=&quot;center&quot;&gt;
    &lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/13/data_value.jpeg&quot; /&gt;&lt;br /&gt;
 &lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Another goal of Pinot is to revive the value of data from a historical
  context. Data reaches a point in its lifecycle where it becomes less
  valuable as it ages. While all data can add some value no matter its age,
  there’s a tradeoff in scanning many rows to glean information from
  antiquated data. Pinot aims to remove this tradeoff: most questions around
  historical data are asked in aggregate, and those aggregates can be
  summarized and queried at a low cost.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;The final goal is to manage dimension explosion. One of the difficulties
  with managing a system that caches all this historic data is handling
  dimension explosion that occurs when you cache every possible combination of
  data. Above we showed a three-dimensional cube, but Pinot can handle a much
  larger number of dimensions. However, just because you can, doesn’t mean you
  should. Pinot has a lot of smarts around using the data, and some good
  defaults to determine the maximum number of buckets per dimension. This helps
  contain an exploding cache while maintaining fast results.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;pinot-architecture&quot;&gt;Pinot architecture&lt;/h3&gt;

&lt;p&gt;Now that we’ve covered Pinot theory and goals, let’s take a quick look at the
architecture.&lt;/p&gt;

&lt;p&gt;A &lt;a href=&quot;https://docs.pinot.apache.org/basics/components/cluster&quot;&gt;Pinot cluster&lt;/a&gt; 
consists of a &lt;a href=&quot;https://docs.pinot.apache.org/basics/components/controller&quot;&gt;controller&lt;/a&gt;, 
&lt;a href=&quot;https://docs.pinot.apache.org/basics/components/broker&quot;&gt;broker&lt;/a&gt;, 
&lt;a href=&quot;https://docs.pinot.apache.org/basics/components/server&quot;&gt;server&lt;/a&gt;, and
optionally a &lt;a href=&quot;https://docs.pinot.apache.org/basics/components/minion&quot;&gt;minion&lt;/a&gt;
to purge data.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/13/pinot_architecture.svg&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-2028-add-pinot-connector&quot;&gt;PR of the week: PR 2028 Add Pinot connector&lt;/h2&gt;

&lt;p&gt;Our guest on the show today, Elon Azoulay, is the author of 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/2028&quot;&gt;this PR&lt;/a&gt;, so we can ask him all
about it now.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/13/trino_pinot_connector.png&quot; /&gt;&lt;br /&gt;
&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/connector/pinot.html#configuration&quot;&gt;Basic configuration (Pinot controller url, Pinot segment limit)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;2 ways to connect to Pinot - broker and server, and their tradeoffs 
 (i.e. segment limit for server)&lt;/li&gt;
  &lt;li&gt;Talk about broker passthrough queries, i.e. select * from “select … from
  pinot_table …”&lt;/li&gt;
  &lt;li&gt;The server segment limit that we eventually want to eliminate, and broker query parsing
    &lt;ul&gt;
      &lt;li&gt;How to crash the Pinot server.&lt;/li&gt;
      &lt;li&gt;Streaming server alternative&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;future-pinot-features-in-trino&quot;&gt;Future Pinot features in Trino&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/6069&quot;&gt;Aggregation pushdown (PR 6069)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p align=&quot;center&quot;&gt;
 &lt;img align=&quot;center&quot; width=&quot;60%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/13/aggregation_pushdown.png&quot; /&gt;&lt;br /&gt;
 &lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/7162&quot;&gt;Pinot insert (PR 7162)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/7164&quot;&gt;Pinot create table (PR 7164)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/7160&quot;&gt;Pinot drop table (PR 7160)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/7163&quot;&gt;Pinot 6 (PR 7163)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Pinot filter clause parsing (see question of the week below)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;demo-pinot-batch-insertion-and-query-using-trino-pinot-connector&quot;&gt;Demo: Pinot batch insertion and query using Trino Pinot connector&lt;/h2&gt;

&lt;p&gt;To put this PR to the test, we set up a Pinot cluster using Docker Compose.&lt;/p&gt;

&lt;p&gt;To load the data, we’re going to use a simple batch import, but you can also 
&lt;a href=&quot;https://docs.pinot.apache.org/basics/data-import/upsert&quot;&gt;insert the data in a stream&lt;/a&gt;
using &lt;a href=&quot;https://kafka.apache.org/&quot;&gt;Kafka&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let’s start up the Pinot cluster along with the required Zookeeper and Kafka
broker. Clone this repository and navigate to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;community_tutorials/pinot/trino-pinot&lt;/code&gt; directory.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone git@github.com:bitsondatadev/trino-getting-started.git

cd trino-getting-started/community_tutorials/pinot/trino-pinot

docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To do a batch insert, we stage a CSV file for Pinot to read the data from.
Create a directory under a local temp folder, write the file there, and then
submit it to Pinot.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;mkdir -p /tmp/pinot-quick-start/rawdata

echo &quot;studentID,firstName,lastName,gender,subject,score,timestampInEpoch
200,Lucy,Smith,Female,Maths,3.8,1570863600000
200,Lucy,Smith,Female,English,3.5,1571036400000
201,Bob,King,Male,Maths,3.2,1571900400000
202,Nick,Young,Male,Physics,3.6,1572418800000&quot; &amp;gt; /tmp/pinot-quick-start/rawdata/transcript.csv
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In order for Pinot to understand the CSV data, we must provide it with a 
&lt;a href=&quot;https://docs.pinot.apache.org/configuration-reference/schema&quot;&gt;schema&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;echo &quot;{
    \&quot;schemaName\&quot;: \&quot;transcript\&quot;,
    \&quot;dimensionFieldSpecs\&quot;: [
      {
        \&quot;name\&quot;: \&quot;studentID\&quot;,
        \&quot;dataType\&quot;: \&quot;INT\&quot;
      },
      {
        \&quot;name\&quot;: \&quot;firstName\&quot;,
        \&quot;dataType\&quot;: \&quot;STRING\&quot;
      },
      {
        \&quot;name\&quot;: \&quot;lastName\&quot;,
        \&quot;dataType\&quot;: \&quot;STRING\&quot;
      },
      {
        \&quot;name\&quot;: \&quot;gender\&quot;,
        \&quot;dataType\&quot;: \&quot;STRING\&quot;
      },
      {
        \&quot;name\&quot;: \&quot;subject\&quot;,
        \&quot;dataType\&quot;: \&quot;STRING\&quot;
      }
    ],
    \&quot;metricFieldSpecs\&quot;: [
      {
        \&quot;name\&quot;: \&quot;score\&quot;,
        \&quot;dataType\&quot;: \&quot;FLOAT\&quot;
      }
    ],
    \&quot;dateTimeFieldSpecs\&quot;: [{
      \&quot;name\&quot;: \&quot;timestampInEpoch\&quot;,
      \&quot;dataType\&quot;: \&quot;LONG\&quot;,
      \&quot;format\&quot; : \&quot;1:MILLISECONDS:EPOCH\&quot;,
      \&quot;granularity\&quot;: \&quot;1:MILLISECONDS\&quot;
    }]
}&quot; &amp;gt; /tmp/pinot-quick-start/transcript-schema.json
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now we are almost ready to create the &lt;a href=&quot;https://docs.pinot.apache.org/basics/components/table&quot;&gt;table&lt;/a&gt;. 
Instead of adding table configurations as part of the SQL command, Pinot enables
you to store table configurations as a file. This is a nice option that
decouples the DDL, which makes for simpler scripting in batch setups.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;echo &quot;{
    \&quot;tableName\&quot;: \&quot;transcript\&quot;,
    \&quot;segmentsConfig\&quot; : {
      \&quot;timeColumnName\&quot;: \&quot;timestampInEpoch\&quot;,
      \&quot;timeType\&quot;: \&quot;MILLISECONDS\&quot;,
      \&quot;replication\&quot; : \&quot;1\&quot;,
      \&quot;schemaName\&quot; : \&quot;transcript\&quot;
    },
    \&quot;tableIndexConfig\&quot; : {
      \&quot;invertedIndexColumns\&quot; : [],
      \&quot;loadMode\&quot;  : \&quot;MMAP\&quot;
    },
    \&quot;tenants\&quot; : {
      \&quot;broker\&quot;:\&quot;DefaultTenant\&quot;,
      \&quot;server\&quot;:\&quot;DefaultTenant\&quot;
    },
    \&quot;tableType\&quot;:\&quot;OFFLINE\&quot;,
    \&quot;metadata\&quot;: {}
}&quot; &amp;gt; /tmp/pinot-quick-start/transcript-table-offline.json
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once you have created these three files and verified that the Docker containers
are running, you can run the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Add Table&lt;/code&gt; command:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker run --rm -ti \
    --network=trino-pinot_trino-network \
    -v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
    --name pinot-batch-table-creation \
    apachepinot/pinot:latest AddTable \
    -schemaFile /tmp/pinot-quick-start/transcript-schema.json \
    -tableConfigFile /tmp/pinot-quick-start/transcript-table-offline.json \
    -controllerHost pinot-controller \
    -controllerPort 9000 -exec
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now that the table exists, we can see it in the 
&lt;a href=&quot;http://localhost:9000/#/tables&quot;&gt;Pinot web UI&lt;/a&gt;. Let’s insert some data using a 
batch job specification:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;echo &quot;executionFrameworkSpec:
  name: &apos;standalone&apos;
  segmentGenerationJobRunnerClassName: &apos;org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner&apos;
  segmentTarPushJobRunnerClassName: &apos;org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner&apos;
  segmentUriPushJobRunnerClassName: &apos;org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner&apos;
jobType: SegmentCreationAndTarPush
inputDirURI: &apos;/tmp/pinot-quick-start/rawdata/&apos;
includeFileNamePattern: &apos;glob:**/*.csv&apos;
outputDirURI: &apos;/tmp/pinot-quick-start/segments/&apos;
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: &apos;csv&apos;
  className: &apos;org.apache.pinot.plugin.inputformat.csv.CSVRecordReader&apos;
  configClassName: &apos;org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig&apos;
tableSpec:
  tableName: &apos;transcript&apos;
  schemaURI: &apos;http://pinot-controller:9000/tables/transcript/schema&apos;
  tableConfigURI: &apos;http://pinot-controller:9000/tables/transcript&apos;
pinotClusterSpecs:
  - controllerURI: &apos;http://pinot-controller:9000&apos;&quot; &amp;gt; /tmp/pinot-quick-start/docker-job-spec.yml
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now launch the batch job with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LaunchDataIngestionJob&lt;/code&gt; task.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker run --rm -ti \
    --network=trino-pinot_trino-network \
    -v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
    --name pinot-data-ingestion-job \
    apachepinot/pinot:latest LaunchDataIngestionJob \
    -jobSpecFile /tmp/pinot-quick-start/docker-job-spec.yml
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
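
&lt;p&gt;With the segment pushed, we can complete the round trip by querying the table
from Trino. The catalog name below assumes you configured the Pinot connector
in a catalog file named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pinot.properties&lt;/code&gt;; adjust it to match your setup.
The connector exposes Pinot tables under the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;default&lt;/code&gt; schema.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT firstName, lastName, AVG(score) AS avg_score
FROM pinot.default.transcript
GROUP BY firstName, lastName;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;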

&lt;p&gt;We modified this demo from the tutorials available on the Pinot website:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.pinot.apache.org/basics/getting-started/pushing-your-data-to-pinot&quot;&gt;https://docs.pinot.apache.org/basics/getting-started/pushing-your-data-to-pinot&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.pinot.apache.org/basics/getting-started/running-pinot-in-docker&quot;&gt;https://docs.pinot.apache.org/basics/getting-started/running-pinot-in-docker&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;question-of-the-week-why-does-my-passthrough-query-not-work-in-the-pinot-connector&quot;&gt;Question of the week: Why does my passthrough query not work in the Pinot connector?&lt;/h2&gt;

&lt;p&gt;The passthrough queries may be failing due to uppercase constants that need to
be wrapped in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPPER()&lt;/code&gt;. For example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&apos;FOO&apos;&lt;/code&gt; in this query would be 
rendered as all lowercase once it is passed to Pinot:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT * 
FROM &quot;SELECT col1, col2, COUNT(*) FROM pinot_table WHERE col2 = &apos;FOO&apos; GROUP BY col1, col2&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The fix is to pass &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&apos;FOO&apos;&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPPER()&lt;/code&gt; in the passthrough query.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT * 
FROM &quot;SELECT col1, col2, COUNT(*) FROM pinot_table WHERE col2 = UPPER(&apos;FOO&apos;) GROUP BY col1, col2&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It could also be due to parsing of functions in filters. A workaround is to put
the filter outside of the double quotes, which can work in some cases. Column
and table names can be mixed case, as the connector will auto resolve them, but
mixed-case constants would not work with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;upper()&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT * 
FROM &quot;SELECT col1, col2, COUNT(*) FROM pinot_table WHERE col2 = &apos;Foo&apos; GROUP BY col1, col2&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The filter can be hoisted into the outer query:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT * 
FROM &quot;SELECT col1, col2, COUNT(*) FROM pinot_table GROUP BY col1, col2&quot; WHERE col2 = &apos;Foo&apos;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There is ongoing work to improve this parsing: 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/7161&quot;&gt;Pinot filter clause parsing (PR 7161)&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/apache-pinot-developer-blog/real-time-analytics-with-presto-and-apache-pinot-part-i-cc672caea307&quot;&gt;https://medium.com/apache-pinot-developer-blog/real-time-analytics-with-presto-and-apache-pinot-part-i-cc672caea307&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/apache-pinot-developer-blog/real-time-analytics-with-presto-and-apache-pinot-part-ii-3d09ff937713&quot;&gt;https://medium.com/apache-pinot-developer-blog/real-time-analytics-with-presto-and-apache-pinot-part-ii-3d09ff937713&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/apache-pinot-developer-blog/exploring-olap-on-kubernetes-with-apache-pinot-32f12233dc0b&quot;&gt;https://medium.com/apache-pinot-developer-blog/exploring-olap-on-kubernetes-with-apache-pinot-32f12233dc0b&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/apache-pinot-developer-blog/building-a-climate-dashboard-with-apache-pinot-and-superset-d3ee8cb7941d&quot;&gt;https://medium.com/apache-pinot-developer-blog/building-a-climate-dashboard-with-apache-pinot-and-superset-d3ee8cb7941d&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/apache-pinot-developer-blog&quot;&gt;https://medium.com/apache-pinot-developer-blog&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://leventov.medium.com/comparison-of-the-open-source-olap-systems-for-big-data-clickhouse-druid-and-pinot-8e042a5ed1c7&quot;&gt;https://leventov.medium.com/comparison-of-the-open-source-olap-systems-for-big-data-clickhouse-druid-and-pinot-8e042a5ed1c7&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trino Meetup Groups&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Virtual
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-americas/&quot;&gt;https://www.meetup.com/trino-americas/&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-emea/&quot;&gt;https://www.meetup.com/trino-emea/&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;Trino APAC - Coming Soon&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;East Coast
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-boston/&quot;&gt;https://www.meetup.com/trino-boston/&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-nyc/&quot;&gt;https://www.meetup.com/trino-nyc/&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;West Coast
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-san-francisco/&quot;&gt;https://www.meetup.com/trino-san-francisco/&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-los-angeles/&quot;&gt;https://www.meetup.com/trino-los-angeles/&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Mid West
    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/trino-chicago/&quot;&gt;https://www.meetup.com/trino-chicago/&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Commander Bun Bun loves sippin&apos; on Pinot after a hard day of data exploration!</summary>

      
      
    </entry>
  
    <entry>
      <title>Introducing new window features</title>
      <link href="https://trino.io/blog/2021/03/10/introducing-new-window-features.html" rel="alternate" type="text/html" title="Introducing new window features" />
      <published>2021-03-10T00:00:00+00:00</published>
      <updated>2021-03-10T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/03/10/introducing-new-window-features</id>
      <content type="html" xml:base="https://trino.io/blog/2021/03/10/introducing-new-window-features.html">&lt;p&gt;In Trino, we are thrilled to get feedback and feature requests from our
fantastic community, and we’re tirelessly motivated to meet the expectations!
The SQL specification is another source of inspiration. From time to time, we
go through those encrypted scrolls to give you a new feature that you didn’t
even know you needed!&lt;/p&gt;

&lt;p&gt;Recently, there was a push in Trino to extend support for window functions.
In this post, we explain the complexities of window functions and describe a
couple of our recent additions. If “window” doesn’t sound familiar, read on.
Already a window expert? Skip to &lt;a href=&quot;#new features&quot;&gt;what’s new&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A window is the structure you run your window function &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OVER&lt;/code&gt;. It has three
components:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;partitioning&lt;/li&gt;
  &lt;li&gt;ordering&lt;/li&gt;
  &lt;li&gt;frame&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You use partitioning to break your input data into independent chunks. Ordering
sorts the rows within each partition. And the frame is a kind of “sliding
window”. For every processed row, the frame encloses a certain portion of the
sorted partition. Your window function processes this portion and yields the
result for the row.&lt;/p&gt;

&lt;p&gt;A “running average” is one simple example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT avg(totalprice) OVER (
    PARTITION BY custkey
    ORDER BY orderdate
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM orders
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For a particular customer identified by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;custkey&lt;/code&gt;, it sorts their orders by
date and computes a sequence of average prices since the beginning up to each
consecutive entry. The window frame for a row includes all rows from the start
up to and including that row.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/window-features/running-average.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;According to standard SQL, there are three ways to specify the frame. The first way
is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROWS&lt;/code&gt; (like in the example). With &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROWS&lt;/code&gt;, you can specify frame bounds by a
physical offset from the current row. While &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW&lt;/code&gt; means “between the beginning of the partition and the current
row”, you can also specify precisely where the frame starts and ends, for
example with: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROWS BETWEEN 10 PRECEDING AND 5 FOLLOWING&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE&lt;/code&gt; is a more complicated way of defining frame on ordered data. It does
not rely on physical offset (in rows), but on logical offset (in value). That
is, the frame includes rows where the value is within a certain range from the
value in the current row.&lt;/p&gt;

&lt;p&gt;Until recently, Trino only supported &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE&lt;/code&gt; in limited cases.
You could use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE UNBOUNDED PRECEDING&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CURRENT ROW&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNBOUNDED
FOLLOWING&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNBOUNDED PRECEDING&lt;/code&gt; includes all rows since the partition start,&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNBOUNDED FOLLOWING&lt;/code&gt; includes all rows until the partition end,&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CURRENT ROW&lt;/code&gt; is trickier. It includes all rows where values of the sort key
are the same as in the current row. We call them a &lt;em&gt;peer group&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;
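
&lt;p&gt;The peer group behavior is easy to see with a frame that is exactly &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CURRENT ROW&lt;/code&gt;
on both ends: with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE&lt;/code&gt;, it pulls in every row that ties with the current
row on the sort key. A small sketch with made-up data:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT score, count(*) OVER (
    ORDER BY score
    RANGE BETWEEN CURRENT ROW AND CURRENT ROW) AS peers
FROM (VALUES 10, 16, 16, 18) AS t(score);

 score | peers
-------+-------
    10 |     1
    16 |     2
    16 |     2
    18 |     1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;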

&lt;p&gt;It’s time to introduce the first new feature:&lt;/p&gt;

&lt;h2 id=&quot;-full-support-for-frame-type-range&quot;&gt;&lt;a name=&quot;new features&quot;&gt;&lt;/a&gt; Full support for frame type RANGE&lt;/h2&gt;

&lt;p&gt;Since &lt;a href=&quot;https://trino.io/docs/current/release/release-346.html&quot;&gt;version 346&lt;/a&gt;, it is
possible to specify &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE&lt;/code&gt; with an offset value. The frame includes all rows
whose value is within this range from the current row.&lt;/p&gt;

&lt;p&gt;Let’s modify our example:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT avg(totalprice) OVER (
    PARTITION BY custkey
    ORDER BY orderdate
    RANGE BETWEEN interval &apos;1&apos; month PRECEDING AND CURRENT ROW)
FROM orders
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now, for every row, we get the average price from the preceding month. Note that
the offset &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;interval &apos;1&apos; month&lt;/code&gt; applies to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orderdate&lt;/code&gt;, which is the sorting
column.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/window-features/running-average-range.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Of course, we don’t have to order by date. The sorting column can be of any
numeric or date/time type, and the offset must be compatible. Also, the offset
doesn’t have to be a literal. It can come from another column of the table or,
more generally, be any expression, as long as the type matches.&lt;/p&gt;
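
&lt;p&gt;As a sketch of that last point, the offset can come from a column, so each row
defines its own frame size. The table and its &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;max_delay&lt;/code&gt; interval column here
are hypothetical:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT avg(totalprice) OVER (
    PARTITION BY custkey
    ORDER BY orderdate
    RANGE BETWEEN max_delay PRECEDING AND CURRENT ROW)
FROM orders_with_delay
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;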

&lt;p&gt;A frame of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE&lt;/code&gt; does not quite fit in the abstraction of a “sliding
window”. Frames can be bigger or smaller depending not only on the offset
values but also on the actual input data. A long series of similar entries can
produce a huge frame, while a gap in input values can result in an empty frame.&lt;/p&gt;

&lt;p&gt;For illustration, imagine a group of students, and the results of some test they
took. Our table has two columns: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;student_id&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;result&lt;/code&gt;, which is the number
of points. For each student, let’s find how many students did better by 1 to 2
points:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;WITH students_results(student_id, result) AS (VALUES
    (&apos;student_1&apos;, 17),
    (&apos;student_2&apos;, 16),
    (&apos;student_3&apos;, 18),
    (&apos;student_4&apos;, 18),
    (&apos;student_5&apos;, 10),
    (&apos;student_6&apos;, 20),
    (&apos;student_7&apos;, 16))
SELECT
    student_id,
    result,
    count(*) OVER (
        ORDER BY result
        RANGE BETWEEN 1 FOLLOWING AND 2 FOLLOWING) AS close_better_scores_count
FROM students_results;

 student_id | result | close_better_scores_count
------------+--------+---------------------------
 student_5  |     10 |                         0
 student_7  |     16 |                         3
 student_2  |     16 |                         3
 student_1  |     17 |                         2
 student_3  |     18 |                         1
 student_4  |     18 |                         1
 student_6  |     20 |                         0
(7 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Note that the frame does not contain the current row. For a particular student,
it only includes students with better results, and not themselves. For the
unfortunate &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;student_5&lt;/code&gt;, there are no students with similar test results. The
frame is also empty for the lucky &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;student_6&lt;/code&gt; who scored the most points.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/window-features/students-range.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Besides &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROWS&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE&lt;/code&gt;, there is another way to specify the frame on
ordered data. And yes, Trino supports this mechanism! Let me introduce the
second of our recent additions:&lt;/p&gt;

&lt;h2 id=&quot;support-for-frame-type-groups&quot;&gt;Support for frame type GROUPS&lt;/h2&gt;

&lt;p&gt;This feature, added in
&lt;a href=&quot;https://trino.io/docs/current/release/release-346.html&quot;&gt;version 346&lt;/a&gt;, allows you to
include or exclude the whole &lt;em&gt;peer groups&lt;/em&gt; of rows in ordered data.&lt;/p&gt;

&lt;p&gt;For illustration, let’s consider again the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;students_results&lt;/code&gt; table. For each
student, let’s find the gap between their result and the result of a student (or
students) who did slightly better.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;WITH students_results(student_id, result) AS (VALUES
    (&apos;student_1&apos;, 17),
    (&apos;student_2&apos;, 16),
    (&apos;student_3&apos;, 18),
    (&apos;student_4&apos;, 18),
    (&apos;student_5&apos;, 10),
    (&apos;student_6&apos;, 20),
    (&apos;student_7&apos;, 16))
SELECT
    student_id,
    result,
    max(result) OVER (
        ORDER BY result
        GROUPS BETWEEN CURRENT ROW AND 1 FOLLOWING) - result AS gap_till_better_score
FROM students_results;

 student_id | result | gap_till_better_score
------------+--------+-----------------------
 student_5  |     10 |                     6
 student_7  |     16 |                     1
 student_2  |     16 |                     1
 student_1  |     17 |                     1
 student_3  |     18 |                     2
 student_4  |     18 |                     2
 student_6  |     20 |                     0
(7 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The window function for each student returns the closest better result. The
frame of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUPS&lt;/code&gt; used here includes all entries equal to the current
entry in terms of points (that is, the student’s &lt;em&gt;peer group&lt;/em&gt;), and the next
group.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/window-features/students-groups.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In frames of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUPS&lt;/code&gt;, like in other frame types, the offset doesn’t have
to be constant. It can be any expression, as long as its type is exact numeric
with scale 0. Simply put, we can skip any integer number of groups.&lt;/p&gt;
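&lt;p&gt;For instance, the number of following peer groups could itself come from a
column. A minimal sketch, with a made-up table and a column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n&lt;/code&gt; used as the offset:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;WITH t(id, result, n) AS (VALUES
    (&apos;a&apos;, 10, 1),
    (&apos;b&apos;, 16, 2),
    (&apos;c&apos;, 16, 1))
SELECT
    id,
    count(*) OVER (
        ORDER BY result
        GROUPS BETWEEN CURRENT ROW AND n FOLLOWING) AS cnt
FROM t
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;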

&lt;h3 id=&quot;under-the-covers&quot;&gt;Under the covers&lt;/h3&gt;

&lt;p&gt;How do we find the frame bounds efficiently? With &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROWS&lt;/code&gt; it’s easy:
we only need to skip a determined number of rows forward or backward.&lt;/p&gt;

&lt;p&gt;With &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE&lt;/code&gt;, we need to examine the actual values to see if they fall within
the given range. Our approach is optimized for the case where the offset values
are constant for all rows. Our solution involves caching frame bounds computed
for the preceding row, and using them as the starting point to find frame
bounds for the current row. Ideally, we never have to move the frame bounds
back as we process subsequent rows. In such a case, the amortized cost of frame
bound calculations per row is constant.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/window-features/sliding-frame-range.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Our strategy for determining frame bounds for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUPS&lt;/code&gt; is similar. We cache the
frame bounds computed for the preceding row and use them as the starting point
for the current row. If the frame offset is constant, frame bounds slide from
one peer group to another every time the processed row leaves one peer group and
enters the next one.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/window-features/sliding-frame-groups.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;support-for-window-clause&quot;&gt;Support for WINDOW clause&lt;/h2&gt;

&lt;p&gt;As all the preceding examples show, a window function is a big chunk of syntax.
What if we wanted to use several window functions over the same window? Say, we
need an average price and a total price from the preceding month. And the top
price. Does it have to look like the query below?&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT
    avg(totalprice) OVER (
        PARTITION BY custkey 
        ORDER BY orderdate
        RANGE BETWEEN interval &apos;1&apos; month PRECEDING AND CURRENT ROW),
    sum(totalprice) OVER (
        PARTITION BY custkey 
        ORDER BY orderdate
        RANGE BETWEEN interval &apos;1&apos; month PRECEDING AND CURRENT ROW),
    max(totalprice) OVER (
        PARTITION BY custkey 
        ORDER BY orderdate
        RANGE BETWEEN interval &apos;1&apos; month PRECEDING AND CURRENT ROW)
FROM orders
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Well, no more. Starting with
&lt;a href=&quot;https://trino.io/docs/current/release/release-352.html&quot;&gt;Trino 352&lt;/a&gt;, you can
predefine a window specification, and then use it or redefine it wherever you
need. This is thanks to the third of our new additions: support for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt;
clause.&lt;/p&gt;

&lt;p&gt;Technically speaking, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt; clause is part of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FROM&lt;/code&gt; clause:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT …
    FROM …
        WHERE …
        GROUP BY …
        HAVING …
        WINDOW …
ORDER BY …
OFFSET …
LIMIT / FETCH …
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt; clause, you can define any number of named windows. Then you
can simply refer to them by their names in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; list or an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt;
clause.&lt;/p&gt;

&lt;p&gt;Let’s check how the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt; clause helps with our example query:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT 
	avg(totalprice) OVER w,
	sum(totalprice) OVER w,
	max(totalprice) OVER w
FROM orders
WINDOW w AS (
    PARTITION BY custkey
    ORDER BY orderdate
    RANGE BETWEEN interval &apos;1&apos; month PRECEDING AND CURRENT ROW)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To be even more concise, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt; clause allows you to define more
specialized windows from existing window definitions:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;WINDOW 
	w1 AS (PARTITION BY custkey),
	w2 AS (w1 ORDER BY orderdate),
	w3 AS (w2 RANGE BETWEEN interval &apos;1&apos; month PRECEDING AND CURRENT ROW)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Alternatively, you can define the window only partially and then complete it
where it’s used:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT 
	avg(totalprice) OVER (w ROWS BETWEEN 10 PRECEDING AND CURRENT ROW) AS recent_average,
	sum(totalprice) OVER (w ROWS BETWEEN CURRENT ROW AND 10 FOLLOWING) AS next_buys
FROM orders
    WINDOW w AS (PARTITION BY custkey ORDER BY orderdate)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There are some ANSI rules, though, that you need to follow when redefining windows:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PARTITION BY&lt;/code&gt; is only allowed in the base definition,&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; can only be specified once in the named windows reference chain,&lt;/li&gt;
  &lt;li&gt;the frame can only be specified in the final definition.&lt;/li&gt;
&lt;/ul&gt;
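&lt;p&gt;Put together, a chain that follows all three rules looks like this (an
illustrative sketch against the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; table):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT max(totalprice) OVER w3
FROM orders
WINDOW
    w1 AS (PARTITION BY custkey),
    w2 AS (w1 ORDER BY orderdate),
    w3 AS (w2 RANGE BETWEEN interval &apos;1&apos; month PRECEDING AND CURRENT ROW)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;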

&lt;p&gt;In case you’re wondering, there’s no need to worry if some predefined windows are
eventually unused. Unused windows do not affect the efficiency of your query
execution. Partitioning, sorting and frame bound computations are costly
operations. That’s why we made sure that unused window parts do not appear in
the query plan.&lt;/p&gt;

&lt;p&gt;There’s one last detail about the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt; clause that needs clarification. The
columns referenced in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt; clause are columns of the input table. In the
following example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;country_code&lt;/code&gt; is clearly a column of the table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;countries&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;... FROM countries WINDOW w AS (ORDER BY country_code)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Obvious enough. So why am I telling you this?&lt;/p&gt;

&lt;p&gt;Window functions can be used in two different clauses of a query, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt;. With the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clause, there is a rule that column references
used there refer to the output table rather than the input table. Consider this
query:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;WITH countries(country_code) AS (VALUES &apos;pol&apos;, &apos;CAN&apos;, &apos;USA&apos;)
SELECT upper(country_code) AS country_code
    FROM countries
    WINDOW w AS (ORDER BY country_code)
ORDER BY row_number() OVER w
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Window &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;w&lt;/code&gt; is used in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clause. So, does the window’s ordering use
the original &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;country_code&lt;/code&gt; column from the input table, or does it “see” the
uppercased &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;country_code&lt;/code&gt; from the output table?&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/window-features/country-code.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The SQL spec is clear about it: a column reference in the named window always
refers to the original column, no matter where you use this window. In the
example, the result is ordered according to the original values: lowercase &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pol&lt;/code&gt;
after uppercase &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;USA&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/window-features/country-code-result.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As expected:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; country_code
--------------
 CAN
 USA
 POL
(3 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And here the story ends. Thanks for your attention! I hope you enjoy Trino’s
new superpowers. In case of questions or issues — &lt;a href=&quot;/slack.html&quot;&gt;you
know where to find us&lt;/a&gt;. More goodies are on the way, so stay tuned! How
about regex matching on tables?&lt;/p&gt;</content>

      
        <author>
          <name>Kasia Findeisen (kasiafi)</name>
        </author>
      

      <summary>In Trino, we are thrilled to get feedback and feature requests from our fantastic community, and we’re tirelessly motivated to meet the expectations! The SQL specification is another source of inspiration. From time to time, we go through those encrypted scrolls to give you a new feature that you didn’t even know you needed!</summary>

      
      
    </entry>
  
    <entry>
      <title>12: Trino gets super visual with Apache Superset!</title>
      <link href="https://trino.io/episodes/12.html" rel="alternate" type="text/html" title="12: Trino gets super visual with Apache Superset!" />
      <published>2021-03-04T00:00:00+00:00</published>
      <updated>2021-03-04T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/12</id>
      <content type="html" xml:base="https://trino.io/episodes/12.html">&lt;h2 id=&quot;guests&quot;&gt;Guests&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Srini Kadamati, Developer Advocate at &lt;a href=&quot;https://preset.io/&quot;&gt;Preset&lt;/a&gt;
 (&lt;a href=&quot;https://twitter.com/SriniKadamati&quot;&gt;@SriniKadamati&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Dr. Beto Dealmeida, Staff Engineer at &lt;a href=&quot;https://preset.io/&quot;&gt;Preset&lt;/a&gt; (&lt;a href=&quot;https://twitter.com/dealmeida&quot;&gt;@dealmeida&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-353--almost&quot;&gt;Release 353 – Almost&lt;/h2&gt;

&lt;p&gt;353 is right around the corner. Last show we said this would be a small release.
While there was a correctness issue we resolved, there didn’t seem to be as much
demand to get it out as quickly as we initially thought, so we decided to
continue adding more features to 353. It should be coming out shortly!&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-trino-clients-python-and-apache-superset&quot;&gt;Concept of the week: Trino clients, Python, and Apache Superset&lt;/h2&gt;

&lt;p&gt;What is the general data flow from a connected data source?&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trino workers request data from the data source with a specific connector&lt;/li&gt;
  &lt;li&gt;Workers process data and send it to the coordinator&lt;/li&gt;
  &lt;li&gt;Coordinator does final processing&lt;/li&gt;
  &lt;li&gt;Supplies the data via an HTTP / REST stream to the requestor&lt;/li&gt;
  &lt;li&gt;Requestor is a “client” such as the JDBC driver or the Trino CLI&lt;/li&gt;
  &lt;li&gt;Client translates data further and provides to application (Java application
using JDBC driver) or user interface/directly to user (output in CLI)&lt;/li&gt;
  &lt;li&gt;User views part of data and scrolls down&lt;/li&gt;
  &lt;li&gt;Client requests more data from coordinator via HTTP / REST (and see above)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What clients are provided by Trino project?&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/client/jdbc.html&quot;&gt;JDBC driver&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/client/cli.html&quot;&gt;Trino CLI&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-python-client&quot;&gt;Trino Python client&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino-go-client&quot;&gt;Trino Go client&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What other clients are there?&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.starburst.io/data-consumer/clients/odbc.html&quot;&gt;ODBC driver from Starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/ecosystem/client.html&quot;&gt;Various other clients&lt;/a&gt; from the open source community
    &lt;ul&gt;
      &lt;li&gt;R&lt;/li&gt;
      &lt;li&gt;NodeJS/Javascript&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What happens in the Python world?&lt;/p&gt;

&lt;p&gt;Disclaimer: I am not a Pythonista or Pythoneer.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;DB-API 2.0
    &lt;ul&gt;
      &lt;li&gt;PEP 249 &lt;a href=&quot;https://www.python.org/dev/peps/pep-0249/&quot;&gt;https://www.python.org/dev/peps/pep-0249/&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;Python standard library&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;trino-python-client
    &lt;ul&gt;
      &lt;li&gt;Wraps complexity of Trino HTTP / REST&lt;/li&gt;
      &lt;li&gt;Supports authentication and such&lt;/li&gt;
      &lt;li&gt;Provides DB API endpoints / implementation&lt;/li&gt;
      &lt;li&gt;Preferred method to query Trino&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.sqlalchemy.org/&quot;&gt;SQLAlchemy&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;SQL toolkit&lt;/li&gt;
      &lt;li&gt;ORM mapper&lt;/li&gt;
      &lt;li&gt;Widely used, e.g. in Apache Superset&lt;/li&gt;
      &lt;li&gt;Supports dialects&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;PyHive
    &lt;ul&gt;
      &lt;li&gt;Not really a SQL wrapper&lt;/li&gt;
      &lt;li&gt;Aimed at Hive QL&lt;/li&gt;
      &lt;li&gt;Only kind of useful for Trino, limited compatibility&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;JDBC driver (Java !) and PySpark
    &lt;ul&gt;
      &lt;li&gt;Possible, but a hack really&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;PyJDBC
    &lt;ul&gt;
      &lt;li&gt;Wraps DB API around any JDBC driver&lt;/li&gt;
      &lt;li&gt;Kind of a hack since it goes through JDBC to HTTP, while the Trino Python
client does the same more directly&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;PyODBC
    &lt;ul&gt;
      &lt;li&gt;Similar hack to PyJDBC&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Potentially also possible to talk to via HTTP directly
    &lt;ul&gt;
      &lt;li&gt;That’s like reimplementing the trino-python-client&lt;/li&gt;
      &lt;li&gt;Also see question of the week later&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Beyond that, it will vary from application to application.&lt;/p&gt;

&lt;p&gt;Let’s find out from our guests how this hangs together in Apache Superset, since
it is using Python.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-superset-pr-13105-feat-first-step-native-support-trino&quot;&gt;PR of the week: Superset PR 13105 feat: first step native support Trino&lt;/h2&gt;

&lt;p&gt;This week’s pull request, &lt;a href=&quot;https://github.com/apache/superset/pull/13105&quot;&gt;https://github.com/apache/superset/pull/13105&lt;/a&gt;, was
graciously added by &lt;a href=&quot;https://github.com/dungdm93&quot;&gt;dungdm93&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The first thing we need to understand about this addition is the concept of a
database engine in Superset. A database engine handles a lot of the custom
interactions between various databases and maps them to the interface that 
Superset understands. If certain concepts are missing in a certain database, 
like time granularity or SQL syntax, the database engine for that database
indicates to Superset that this is not available. As a result, the option does 
not show in Superset, or a concise error message is reported. By default, 
database engines use the &lt;a href=&quot;https://github.com/apache/superset/blob/master/superset/db_engine_specs/base.py&quot;&gt;base.py&lt;/a&gt;
methods, but each engine, like Trino, adds its custom mappings with a specific
engine implementation,
&lt;a href=&quot;https://github.com/apache/superset/blob/master/superset/db_engine_specs/trino.py&quot;&gt;trino.py&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The pull request adds a few basic custom changes to enable Trino usage with 
Superset. One change ensures that complex timestamps from Trino are truncated to
a format that Superset is able to support during time aggregation operations.&lt;/p&gt;

&lt;p&gt;This opens a vast amount of functionality for using Trino and Superset. We 
wanted to feature this because it goes to show how a small code change, even
one that is not in the Trino repository, can have a vast effect on those
using Superset and Trino.&lt;/p&gt;

&lt;p&gt;Thank you so much to &lt;a href=&quot;https://github.com/dungdm93&quot;&gt;dungdm93&lt;/a&gt; for making this
change and further linking Trino into a fantastic project like &lt;a href=&quot;https://superset.apache.org/&quot;&gt;Apache
 Superset&lt;/a&gt;!&lt;/p&gt;

&lt;h2 id=&quot;demo-superset-querying-trino-to-create-visualization-dashboard&quot;&gt;Demo: Superset querying Trino to create visualization dashboard&lt;/h2&gt;

&lt;p&gt;To put this PR to the test, we need to connect Apache Superset to Trino as our
datasource.&lt;/p&gt;

&lt;p&gt;First, you need to follow &lt;a href=&quot;https://superset.apache.org/docs/installation/installing-superset-using-docker-compose&quot;&gt;these instructions&lt;/a&gt;
to install Docker (if you don’t already have it installed), and then clone the 
Superset repository:&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$ git clone https://github.com/apache/superset.git&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Next, you need to set up the database driver for Trino. Navigate to the root
directory of the local Superset repository you just downloaded and run the
following.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;echo &quot;sqlalchemy-trino&quot; &amp;gt;&amp;gt; ./docker/requirements-local.txt&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This tells the Superset scripts to install the sqlalchemy-trino library upon
startup. We found the name on &lt;a href=&quot;https://superset.apache.org/docs/databases/trino&quot;&gt;the Trino driver page&lt;/a&gt;,
which documents the driver and the connection string format. If you were
to install these directly on a Superset node, you would refer to &lt;a href=&quot;https://superset.apache.org/docs/databases/installing-database-drivers&quot;&gt;this database
 drivers page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Now make sure you’re in the root folder of the repo, and run the following
command to start up Superset.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker-compose -f docker-compose-non-dev.yml up&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;After Superset is running, you need to start Trino as well. We did so using a
separate docker-compose app.&lt;/p&gt;

&lt;p&gt;As soon as this is done, you can navigate to Superset’s homepage &lt;a href=&quot;http://localhost:8088&quot;&gt;http://localhost:8088&lt;/a&gt;
and scroll to the &lt;strong&gt;Data&lt;/strong&gt; &amp;gt; &lt;strong&gt;Databases&lt;/strong&gt; menu.&lt;/p&gt;

&lt;p&gt;Click the &lt;strong&gt;+Database&lt;/strong&gt; button.&lt;/p&gt;

&lt;p&gt;Set Name to “Trino” and URI to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino://trino@host.docker.internal:8080&lt;/code&gt;
and click &lt;strong&gt;Add&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you want to allow CTAS, CVAS, or DML operations, you’ll want to edit
the Database you just created, click on the &lt;strong&gt;SQL LAB SETTINGS&lt;/strong&gt; tab,
and select the operations you want to allow.&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/12/connection_settings.png&quot; /&gt;&lt;br /&gt;
 Connection settings that allows for creation/manipulation of tables.
&lt;/p&gt;

&lt;p&gt;You should be able to verify the connection under &lt;strong&gt;SQL Lab&lt;/strong&gt; &amp;gt; &lt;strong&gt;SQL Editor&lt;/strong&gt; by running a SELECT
query.&lt;/p&gt;

&lt;p&gt;We cover adding charts and creating a dashboard in the show. Some blogs from
&lt;a href=&quot;https://preset.io/&quot;&gt;Preset&lt;/a&gt; cover a lot of this workflow
in great detail; find them linked below! Here’s a taste of what we
created in Superset with some &lt;a href=&quot;https://transtats.bts.gov/Fields.asp?gnoyr_VQ=FGJ&quot;&gt;BTS On-Time : Reporting Carrier On-Time
 Performance (1987-present)&lt;/a&gt;
and &lt;a href=&quot;https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Public-Use-Data/vbim-akqf&quot;&gt;Covid Cases&lt;/a&gt; 
reported by the CDC.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/12/covid_flights_data.png&quot; /&gt;&lt;br /&gt;
 COVID-19 and flights data dashboard!
&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-how-do-i-use-the-trino-rest-api&quot;&gt;Question of the week: How do I use the Trino REST api?&lt;/h2&gt;

&lt;p&gt;I want to just use the REST API of Trino. Where is the documentation? How do I do that?&lt;/p&gt;

&lt;h3 id=&quot;the-short-answer&quot;&gt;The short answer:&lt;/h3&gt;

&lt;p&gt;Don’t do that. Use a Trino client instead.&lt;/p&gt;

&lt;h3 id=&quot;the-long-answer&quot;&gt;The long answer:&lt;/h3&gt;

&lt;p&gt;The typical desired use case for the REST API is to run a query and get 
the result. However, that part of the API is not really a traditional REST API 
(HTTP POST, HTTP GET); that model just doesn’t work for returning large datasets.
Instead, it is a long-lived connection with a constant stream of data and interaction
between the client and Trino.&lt;/p&gt;

&lt;p&gt;The clients take care of all this complexity and provide it through a standard API for
the various platforms (JDBC, …). Use the clients!&lt;/p&gt;

&lt;p&gt;And if there is no client, or the existing client is not good enough, create an
open source one or contribute improvements.&lt;/p&gt;

&lt;h3 id=&quot;the-exception&quot;&gt;The exception:&lt;/h3&gt;

&lt;p&gt;There are other simple, pure REST API endpoints that you can use just straight
out of the box. Try &lt;a href=&quot;http://localhost:8080/v1/info&quot;&gt;http://localhost:8080/v1/info&lt;/a&gt; or
&lt;a href=&quot;http://localhost:8080/v1/status&quot;&gt;http://localhost:8080/v1/status&lt;/a&gt;.
You could use those for a liveness/readiness probe in k8s or for a cluster status
display. By the way, the Web UI uses those and others.&lt;/p&gt;

&lt;h3 id=&quot;last-note&quot;&gt;Last note&lt;/h3&gt;

&lt;p&gt;If you really can’t help yourself, here are some docs.
&lt;a href=&quot;https://github.com/trinodb/trino/wiki/HTTP-Protocol&quot;&gt;https://github.com/trinodb/trino/wiki/HTTP-Protocol&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://preset.io/blog/2021-03-03-druid-prophet-pt1/&quot;&gt;https://preset.io/blog/2021-03-03-druid-prophet-pt1/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://preset.io/blog/2021-02-11-superset-geodata/&quot;&gt;https://preset.io/blog/2021-02-11-superset-geodata/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://preset.io/blog/2021-01-18-superset-1-0/&quot;&gt;https://preset.io/blog/2021-01-18-superset-1-0/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://preset.io/blog/2021-1-18-recap-2020/&quot;&gt;https://preset.io/blog/2021-1-18-recap-2020/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://preset.io/blog/2020-09-22-slack-dashboard/&quot;&gt;https://preset.io/blog/2020-09-22-slack-dashboard/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://preset.io/blog/2020-10-02-slack-dashboard-part-2/&quot;&gt;https://preset.io/blog/2020-10-02-slack-dashboard-part-2/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://preset.io/blog/2020-10-08-bigquery-superset-part-2/&quot;&gt;https://preset.io/blog/2020-10-08-bigquery-superset-part-2/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Guests Srini Kadamati, Developer Advocate at Preset (@SriniKadamati) Dr. Beto Dealmeida, Staff Engineer at Preset (@dealmeida)</summary>

      
      
    </entry>
  
    <entry>
      <title>11: Dynamic filtering and dynamic partition pruning</title>
      <link href="https://trino.io/episodes/11.html" rel="alternate" type="text/html" title="11: Dynamic filtering and dynamic partition pruning" />
      <published>2021-02-18T00:00:00+00:00</published>
      <updated>2021-02-18T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/11</id>
      <content type="html" xml:base="https://trino.io/episodes/11.html">&lt;h2 id=&quot;release-352&quot;&gt;Release 352&lt;/h2&gt;

&lt;p&gt;Release notes discussed: &lt;a href=&quot;https://trino.io/docs/current/release/release-352.html&quot;&gt;https://trino.io/docs/current/release/release-352.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;No new release to discuss yet, except that 353 is around the corner to fix
a low-impact correctness issue that surfaced in 352:
&lt;a href=&quot;https://github.com/trinodb/trino/pull/6895&quot;&gt;https://github.com/trinodb/trino/pull/6895&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-dynamic-filtering&quot;&gt;Concept of the week: Dynamic filtering&lt;/h2&gt;

&lt;p&gt;We’ve covered a lot on the Trino Community Broadcast to build our way up
to this pretty big subject called dynamic filtering. If
you haven’t seen episodes five through nine, you may want to go back and watch
those for some context for this episode. Episode eight diverted to the
Trino rebrand, so we won’t discuss that one. To recap:&lt;/p&gt;

&lt;p&gt;In &lt;a href=&quot;/episodes/5.html&quot;&gt;episode five&lt;/a&gt;, we spoke about Hive partitions.
To save you time when you run a query, Hive stores data under
directories named by the values of the data written underneath that directory.
Take this directory structure for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; table partitioned by the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orderdate&lt;/code&gt; field:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;orders
├── orderdate=1992-01-01
│   ├── orders_1992-01-01_1.orc
│   ├── orders_1992-01-01_2.orc
│   ├── orders_1992-01-01_3.orc
│   └── ...
├── orderdate=1992-01-02
│   └── ...
├── orderdate=1992-01-03
│   └── ...
└── ...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When querying for data under January 1st, 1992, according to the Hive model,
query engines like Hive and Trino will only scan ORC files under the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders/orderdate=1992-01-01&lt;/code&gt; directory. The idea is to avoid scanning
unnecessary data by grouping rows based on a field commonly used in a query.&lt;/p&gt;
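<p>The pruning idea can be sketched in a few lines of Java. This is a toy illustration, not Trino code: <code>listPartitions</code> is a hypothetical stand-in for a real metastore listing, and the directory names mirror the example above. Only directories whose partition value matches the predicate are kept, so only their files are ever scanned.</p>

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionPruning {
    // Hypothetical stand-in for a metastore listing of partition directories.
    static List<String> listPartitions() {
        return List.of(
                "orders/orderdate=1992-01-01",
                "orders/orderdate=1992-01-02",
                "orders/orderdate=1992-01-03");
    }

    // Keep only directories whose partition value matches the predicate,
    // so the files under all other directories are never read.
    static List<String> prune(String orderdate) {
        List<String> kept = new ArrayList<>();
        for (String dir : listPartitions()) {
            if (dir.endsWith("orderdate=" + orderdate)) {
                kept.add(dir);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(prune("1992-01-01"));
    }
}
```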

&lt;p&gt;In episodes &lt;a href=&quot;/episodes/6.html&quot;&gt;six&lt;/a&gt; and &lt;a href=&quot;/episodes/7.html&quot;&gt;seven&lt;/a&gt;,
we discussed how a query is represented internally in Trino once
you submit your SQL query. First, the parser converts SQL to an abstract syntax
tree (AST) format. Then the planner generates a different tree structure called
the intermediate representation (IR) that contains nodes representing the steps
that need to be performed to answer the query. The leaves of the tree
execute first, and each parent node depends on its children
completing before it can start. Finally, the planner and
cost-based optimizer (CBO) run various transformations on the IR to optimize the query
plan until it is ready to be executed. In short, the planner and CBO
generate and optimize the plan by running optimization rules. Refer to chapter 
four in 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;
pg. 50 for more information.&lt;/p&gt;

&lt;p&gt;In episode &lt;a href=&quot;/episodes/9.html&quot;&gt;nine&lt;/a&gt;, we discussed how hash-joins work,
first drawing a nested-loop analogy to how joins work. We then discussed why
it is advantageous to read the inner table into memory to avoid a lot of extra
disk reads. Since the goal is to fit an entire table into memory, you want
to make sure the table that is built in memory is the smaller of the
two tables. This smaller table is called the build table. The table that gets
streamed is called the probe table. Hash-joins are a common mechanism for
executing joins in a distributed and parallel fashion.&lt;/p&gt;

&lt;p&gt;Another pair of terms akin to build and probe tables is dimension and
fact table, respectively. This nomenclature comes from the &lt;a href=&quot;https://en.wikipedia.org/wiki/Star_schema&quot;&gt;star schema&lt;/a&gt;
used in data warehousing. Typically, large tables called fact tables
live at the center of the schema. These tables have many foreign
keys and a number of quantitative or measurable columns describing an event or instance.
The foreign keys connect these big fact tables to smaller dimension tables that,
when joined, provide human-readable context to enrich the records in the fact
table. The schema ends up looking like a star with the fact table at the center.
In essence, when someone describes a fact table,
they are saying it is a bigger table that is likely to end up on the probe
side of a join, whereas a dimension table is more likely a candidate to fit into memory
on the build side of a join.&lt;/p&gt;

&lt;p&gt;So let’s get on to dynamic filtering, shall we? First, let’s cover a few
concepts behind dynamic filtering, then compare some variations of this concept.&lt;/p&gt;

&lt;p&gt;Dynamic filtering takes advantage of joins between big fact tables and smaller
dimension tables. What makes this filtering different from other types of
filtering is that the smaller build table, loaded at query
time, is used to generate a list of values that exist in the join column between the
build table and probe table. We know that only values that match these criteria
are going to be returned from the probe side, so we can use this dynamically
generated list as a pushdown predicate on the join column of the probe side.
This means we are still scanning this data, but only sending the subset that
answers the query. We can look at &lt;a href=&quot;/blog/2019/06/30/dynamic-filtering.html&quot;&gt;the blog written for the original local
 dynamic filtering implementation&lt;/a&gt;
by Roman Zeyde for more insights on the original implementation for dynamic
filtering before Raunaq’s changes.&lt;/p&gt;
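<p>A minimal toy sketch of the local dynamic filtering idea, not the actual Trino implementation: collect the distinct join keys from the already-filtered build side, then use that set as an extra predicate while scanning the probe side, so non-matching rows are dropped before the join runs.</p>

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LocalDynamicFilter {
    // Collect the distinct join keys seen on the (small) build side.
    static Set<Integer> buildFilter(int[] buildKeys) {
        Set<Integer> filter = new HashSet<>();
        for (int key : buildKeys) {
            filter.add(key);
        }
        return filter;
    }

    // Apply the dynamically built filter while scanning the probe side:
    // only rows whose join key appears on the build side survive.
    static List<Integer> scanProbe(int[] probeKeys, Set<Integer> filter) {
        List<Integer> matched = new ArrayList<>();
        for (int key : probeKeys) {
            if (filter.contains(key)) {
                matched.add(key);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        Set<Integer> filter = buildFilter(new int[] {1, 2, 3, 4});
        // Only 2 and 4 from the probe side can possibly join.
        System.out.println(scanProbe(new int[] {2, 4, 6, 8, 10, 12}, filter));
    }
}
```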

&lt;p&gt;Local dynamic filtering is definitely beneficial as it allows skipping 
unnecessary stripes or row-groups in the ORC or Parquet reader. However, it
works only for broadcast joins, and its effectiveness depends upon the 
selectivity of the min and max indices maintained in ORC or Parquet files. What
if we could prune entire partitions from the query execution based on dynamic
filters? In the next iteration of dynamic filtering, called dynamic partition 
pruning, we do just that. We take advantage of the partitioned layout of Hive 
tables to avoid generating splits on partitions that won’t exist in the final
query result. The coordinator can identify partitions for pruning based on the
dynamic filters sent to it from the workers processing the build side of the join.
This only works if the query contains a join condition on a column that the
probe table is partitioned on.&lt;/p&gt;
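<p>A toy sketch of the coordinator-side step, again not the real implementation: given the probe table’s partition layout and the dynamic filter collected from the build side, partitions whose keys are absent from the filter produce no splits at all.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DynamicPartitionPruning {
    // The coordinator receives the dynamic filter (distinct build-side join
    // keys) and skips whole partitions of the probe table, so no splits are
    // ever generated for them.
    static List<String> pruneSplits(Map<Integer, List<String>> partitions, Set<Integer> dynamicFilter) {
        List<String> splits = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> partition : partitions.entrySet()) {
            if (dynamicFilter.contains(partition.getKey())) {
                splits.addAll(partition.getValue());
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        // Probe table partitioned on the join column: partition key -> files.
        Map<Integer, List<String>> partitions = Map.of(
                1, List.of("part=1/a.orc"),
                2, List.of("part=2/b.orc"),
                3, List.of("part=3/c.orc"));
        // Dynamic filter says only key 2 exists on the build side.
        System.out.println(pruneSplits(partitions, Set.of(2)));
    }
}
```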

&lt;p&gt;With that basic understanding, let’s move on to the PR that implements dynamic
partition pruning!&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-1072-implement-dynamic-partition-pruning&quot;&gt;PR of the week: PR 1072 Implement dynamic partition pruning&lt;/h2&gt;

&lt;p&gt;In this week’s pull request &lt;a href=&quot;https://github.com/trinodb/trino/pull/1072&quot;&gt;https://github.com/trinodb/trino/pull/1072&lt;/a&gt; we
return with Raunaq Morarka and Karol Sobczak. This PR effectively brings in the 
second iteration of dynamic filtering, dynamic partition pruning, where instead
of relying on local dynamic filtering we collect dynamic filters from the
workers in the coordinator and prune out extra splits that aren’t needed with
the partition layout of the probe side table. A query like the following,
seen in &lt;a href=&quot;/blog/2020/06/14/dynamic-partition-pruning.html&quot;&gt;Raunaq’s blog about dynamic partition pruning&lt;/a&gt;,
shows that if we partition &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sales&lt;/code&gt; on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;item_id&lt;/code&gt; we can take
advantage of this information by sending it to the coordinator.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT COUNT(*) FROM 
sales JOIN items ON sales.item_id = items.id
WHERE items.price &amp;gt; 1000;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Below we show how the execution of this would look in a distributed manner if
you partitioned the sales table on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;item_id&lt;/code&gt;. This is a visual reference for
those listening in on the podcast:&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
1:
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/11/Dynamic
 Filtering1.png&quot; /&gt;&lt;br /&gt;
 Query is sent to the coordinator to be parsed, analyzed, and planned.
&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
2:
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/11/Dynamic
 Filtering2.png&quot; /&gt;&lt;br /&gt;
 All workers get a subset of the items (build) table and each worker filters
 out items with price &amp;gt; 1000.
&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
3:
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/11/Dynamic
 Filtering3.png&quot; /&gt;&lt;br /&gt;
 All workers create dynamic filter for their item subset and send it to the 
 coordinator.
&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
4:
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/11/Dynamic
 Filtering4.png&quot; /&gt;&lt;br /&gt;
 Coordinator uses dynamic filter list to prune out splits and partitions that
 do not overlap with the DF and submits splits to run on workers.
&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
5:
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/11/Dynamic
 Filtering5.png&quot; /&gt;&lt;br /&gt;
 Workers run splits over the sales (probe) table.
&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
6:
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/11/Dynamic
 Filtering6.png&quot; /&gt;&lt;br /&gt;
 Workers return final rows to be assembled into the final result on the
 coordinator.
&lt;/p&gt;

&lt;h2 id=&quot;pr-demo-pr-1072-implement-dynamic-partition-pruning&quot;&gt;PR Demo: PR 1072 Implement dynamic partition pruning&lt;/h2&gt;

&lt;p&gt;For this PR demo, we have set up one r5.4xlarge coordinator and four r5.4xlarge
workers in a cluster, with an sf100-scale TPC-DS dataset. We will run some of
the TPC-DS queries and perhaps a few others.&lt;/p&gt;

&lt;p&gt;The first of the TPC-DS queries we run through is &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/testing/trino-product-tests/src/main/resources/sql-tests/testcases/tpcds/q54.sql&quot;&gt;query 54&lt;/a&gt;.
For this query, we use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive&lt;/code&gt; catalog pointing to AWS S3, with AWS Glue
as our metastore. We initially disable dynamic filtering, then compare against
the times when dynamic filtering is enabled. Without dynamic filtering the
query runs in about 92 seconds, while with dynamic filtering it runs in 42
seconds. We see similar findings for the semijoin we execute below and discuss
some implications of how the planner actually optimizes the semijoin into an
inner join.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/* turn dynamic filtering on or off to compare */
SET SESSION enable_dynamic_filtering=false;

SELECT ss_sold_date_sk, COUNT(*) FROM store_sales WHERE ss_sold_date_sk IN (
  SELECT ws_sold_date_sk FROM (
    SELECT ws_sold_date_sk, COUNT(*) FROM web_sales GROUP BY 1 ORDER BY 2 LIMIT 100
  )
)
GROUP BY 1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/codex/how-to-build-a-modern-data-lake-with-minio-db0455eec053&quot;&gt;https://medium.com/codex/how-to-build-a-modern-data-lake-with-minio-db0455eec053&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/quintoandar-tech-blog/building-a-sql-engine-infrastructure-at-quintoandar-73540e136c4e&quot;&gt;https://medium.com/quintoandar-tech-blog/building-a-sql-engine-infrastructure-at-quintoandar-73540e136c4e&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/codex/modern-data-platform-using-open-source-technologies-212ba8273eab&quot;&gt;https://medium.com/codex/modern-data-platform-using-open-source-technologies-212ba8273eab&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Big Data Technology Warsaw Summit - Workshop Feb 23 - 24 &lt;a href=&quot;https://bigdatatechwarsaw.eu/agenda/&quot;&gt;https://bigdatatechwarsaw.eu/agenda/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Big Data Technology Warsaw Summit - Conference Feb 25 - 26 &lt;a href=&quot;https://bigdatatechwarsaw.eu/agenda/&quot;&gt;https://bigdatatechwarsaw.eu/agenda/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Past Events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Starburst Datanova - on demand &lt;a href=&quot;https://www.starburst.io/info/datanova/&quot;&gt;https://www.starburst.io/info/datanova/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Release 352</summary>

      
      
    </entry>
  
    <entry>
      <title>10: Naming the bunny!</title>
      <link href="https://trino.io/episodes/10.html" rel="alternate" type="text/html" title="10: Naming the bunny!" />
      <published>2021-02-04T00:00:00+00:00</published>
      <updated>2021-02-04T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/10</id>
      <content type="html" xml:base="https://trino.io/episodes/10.html">&lt;h2 id=&quot;release-352&quot;&gt;Release 352&lt;/h2&gt;
&lt;p&gt;Release notes discussed: &lt;a href=&quot;https://trino.io/docs/current/release/release-352.html&quot;&gt;https://trino.io/docs/current/release/release-352.html&lt;/a&gt;.
At the time of recording, 352 was not out yet. We will discuss a few of the
changes coming down the pipeline to look forward to!&lt;/p&gt;

&lt;h2 id=&quot;naming-our-new-bunny&quot;&gt;Naming our new bunny!&lt;/h2&gt;
&lt;p&gt;That’s right, you submitted your names, and we are happy to announce the top
candidates. The final name will be chosen by a community poll.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/trino-og.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;The running names are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Lepi: short for Lepus, the constellation under Orion that is in the shape of
 a bunny and said to be chased by Orion or Orion’s dogs. They cannot catch it
  because the bunny is fast &lt;a href=&quot;https://en.wikipedia.org/wiki/Lepus_(constellation)&quot;&gt;https://en.wikipedia.org/wiki/Lepus_(constellation)&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Neut: an early name used informally by community members to refer to
 the bunny before it had a real name. This name, a portmanteau when
  combined with Trino (Neut-Trino), became popular among a few members.&lt;/li&gt;
  &lt;li&gt;Nu: a math symbol, with a similar prefix use of Nu + Trino to refer to
  the neutrino origins. In particle physics, nu also represents any of the three
  kinds of neutrino.&lt;/li&gt;
  &lt;li&gt;Commander Bun Bun: a name suggested by a community member’s child who loves 
 the bunny!&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html&quot;&gt;https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&quot;&gt;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&quot;&gt;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&quot;&gt;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Feb 9 - Feb 10 &lt;a href=&quot;http://starburstdata.com/datanova&quot;&gt;http://starburstdata.com/datanova&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Presto® Summit Series - Real world usage&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://trino.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Podcasts:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;Presto with Martin Traverso, Dain Sundstrom and David Phillips&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;Simplify Your Data Architecture With The Presto Distributed SQL Engine&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5980279&quot;&gt;How Open Source Presto Unlocks a Single Point of Access to Data&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5656471&quot;&gt;The Data Access Struggle is Real&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2020/02/07/presto-with-justin-borgman/&quot;&gt;Presto with Justin Borgman&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/3923864&quot;&gt;The infrastructure renaissance and how it will power the modernization of analytics platforms&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2018/05/24/ubers-data-platform-with-zhenxiao-luo/&quot;&gt;Uber’s Data Platform with Zhenxiao Luo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Release 352 Release Notes discussed: https://trino.io/docs/current/release/release-352.html At the time of recording 352 was not out yet. We will discuss a few of the changes coming down the pipeline to look forward to!</summary>

      
      
    </entry>
  
    <entry>
      <title>9: Distributed hash-joins, and how to migrate to Trino</title>
      <link href="https://trino.io/episodes/9.html" rel="alternate" type="text/html" title="9: Distributed hash-joins, and how to migrate to Trino" />
      <published>2021-01-21T00:00:00+00:00</published>
      <updated>2021-01-21T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/9</id>
      <content type="html" xml:base="https://trino.io/episodes/9.html">&lt;script type=&quot;text/x-mathjax-config&quot;&gt;
  MathJax.Hub.Config({
    tex2jax: {
      inlineMath: [ [&apos;$&apos;,&apos;$&apos;], [&quot;\\(&quot;,&quot;\\)&quot;] ],
      processEscapes: true
    }
  });
&lt;/script&gt;

&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML&quot;&gt;
&lt;/script&gt;

&lt;h2 id=&quot;release-351&quot;&gt;Release 351&lt;/h2&gt;
&lt;p&gt;Release Notes discussed: &lt;a href=&quot;https://trino.io/docs/current/release/release-351.html&quot;&gt;https://trino.io/docs/current/release/release-351.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This release was really all about renaming everything from a client perspective
to use Trino instead of Presto. Manfred covers all the work that was done
for the release.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week-how-do-i-migrate-from-presto-releases-earlier-than-350-to-trino-releases-351&quot;&gt;Question of the week: How do I migrate from Presto releases earlier than 350 to Trino release 351?&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/blog/2021/01/04/migrating-from-prestosql-to-trino.html&quot;&gt;https://trino.io/blog/2021/01/04/migrating-from-prestosql-to-trino.html&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;concept-of-the-week-distributed-hash-join&quot;&gt;Concept of the week: Distributed Hash-join&lt;/h2&gt;
&lt;p&gt;Joins are one of the most useful and powerful operations performed by databases.
There are many approaches to joining data. Various types of indices can
facilitate joins. The order in which a join gets executed can vary depending
on the geographic distribution of the data, the selectivity of the query (the
fewer rows a query returns, the higher its selectivity), and the
information available from indexes and table statistics, all of which inform an
execution engine how to plan a query. One thing that stays consistent across
virtually every query engine in the world is that joins occur over two tables
at a time, no matter how many tables exist in the query. Some joins may occur
in parallel, but any given join only involves two tables.&lt;/p&gt;

&lt;p&gt;If you wrote a simple program that did what a join does, it might look something
like a nested loop:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;public class CartesianProductNestedLoop {
    public static void main(String[] args) {
        int[] outerTable = {2, 4, 6, 8, 10, 12};
        int[] innerTable = {1, 2, 3, 4};

        for (int o : outerTable) {
            for (int i : innerTable) {
                System.out.println(o + &quot;, &quot; + i);
            }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Since there is no predicate, such as something you would see in a WHERE clause,
the join returns the cartesian product of these two tables. It is also useful
to portray these joins in relational algebra. For example, the join above is
written as $O \times I$ where $O$ is the outer table and $I$ is the inner table.
$\times$ indicates that the join we are using is the cartesian product, as we
see below. Another useful way to view this is to visualize the join as a graph.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;33%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/9/cartesian.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE: When using relational algebra or using a graph to represent a join, it
is convention that the table in the outer loop of this join is always shown on
the left. This distinction becomes important as you will see below.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here is the output from the cartesian product join above.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;2, 1
2, 2
2, 3
2, 4
4, 1
4, 2
4, 3
4, 4
6, 1
6, 2
6, 3
6, 4
8, 1
8, 2
8, 3
8, 4
10, 1
10, 2
10, 3
10, 4
12, 1
12, 2
12, 3
12, 4
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Notice also that we are treating these tables the same: since we have to read
each of the values to print out the cartesian product, it doesn’t yet make a
difference which table is the inner table and which is the outer. We could
swap the inner and outer tables and still get the same performance of
$O(n^2)$.&lt;/p&gt;

&lt;p&gt;Now, what if you did have some criteria that filtered out some of the rows
returned from this product? Since it is quite common to join tables by an id,
the most common criterion for a join is that the values are equal, since values
in rows with matching ids are related. Initially, we can get away with just
adding an if statement, print when true, and be done with it. Let’s
do that.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;public class NaturalJoinNestedLoop {
    public static void main(String[] args) {
        int[] outerTable = {2, 4, 6, 8, 10, 12};
        int[] innerTable = {1, 2, 3, 4};

        for (int o : outerTable) {
            for (int i : innerTable) {
                if(o == i){
                    System.out.println(o + &quot;, &quot; + i);
                }
            }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Let’s assume that the integers in these tables are values of a column called id 
in both tables that uniquely identifies a row in each table. When you have a
commonly named column like this, the operation of joining based on columns that
share the same name is a natural join. In relational algebra it is denoted with
a little bowtie, for example, $O \bowtie I$. We could also use the equi-join
notation that specifies the exact join columns: $O \bowtie_{O.id = I.id} I$. The
graph looks about the same as before; only the operation we
are performing changes.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;33%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/9/natural_join.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Now we only get the output of two rows as we should expect.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;2, 2
4, 4
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;One important aspect that gets glossed over in this simple example is that
the data is small and in memory, whereas a database initially has to retrieve the
data from disk. Random access to memory is roughly
&lt;a href=&quot;https://queue.acm.org/detail.cfm?id=1563874&quot;&gt;100,000 times faster than random access to disk&lt;/a&gt;.
That being said, it’s really important to consider that reading the
values over and over again is a quadratic exercise, with each read multiplied by
that 100,000-fold penalty when it comes from disk.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/9/disk_vs_mem.jpg&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;It would be better if we could read one table into memory once, and reuse those 
values as we scan over the data of the other table. There is a common name for 
each of these tables. Trino first reads the inner table into memory, to avoid having
to read it once per row of the outer table. We call this the build
table, as the first scan builds the table in memory. Trino then streams 
the rows from the outer table and performs the join against the build table. We
call this the probe table.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import java.util.*;

public class BuildProbeLoops {
    public static void main(String[] args) {
        int[] probeTable = {2, 4, 6, 8, 10, 12};
        int[] buildTable = {1, 2, 3, 4};
        Map&amp;lt;Integer, Integer&amp;gt; buildTableCache = new HashMap&amp;lt;&amp;gt;();

        // build phase: scan the build table once and cache it by hash
        for (int row : buildTable) {
            // in this example the row is just the join column
            int hash = row;

            buildTableCache.put(hash, row);
        }

        // probe phase: stream the probe table and look up matches in the cache
        for (int row : probeTable) {
            // in this example the row is just the join column
            int hash = row;

            Integer buildRow = buildTableCache.get(hash);
            if (buildRow != null) {
                System.out.println(buildRow + &quot;, &quot; + row);
            }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;While it may seem redundant to do all of this extra work for such a simple
example, this approach saves minutes to hours when you are reading from disk and
the data is big enough. The runtime complexity has now dropped from $O(n^2)$ to 
just a linear runtime of $O(n)$. The relational algebra for this join is still
$P \bowtie B$, where $P$ is the probe table and $B$ is the build table. Notice 
that the relational algebra hasn’t changed; we just now specify that we build on
the inner table and probe with the outer table.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;33%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/9/natural_join2.png&quot; /&gt;
&lt;/p&gt;
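&lt;p&gt;To make that concrete, here is a small sketch comparing how many row reads
each approach performs. The table sizes are made-up numbers for the arithmetic,
not from the example above:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;public class JoinReadCost {
    public static void main(String[] args) {
        long outerRows = 1_000_000;
        long innerRows = 100_000;

        // naive nested loops: reread the inner table for every outer row
        long nestedLoopReads = outerRows * innerRows; // 100,000,000,000 reads

        // build once, then probe: read each table exactly once
        long hashJoinReads = outerRows + innerRows;   // 1,100,000 reads

        System.out.println(nestedLoopReads / hashJoinReads); // roughly 90,909 times fewer reads
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When each of those avoided reads is a slow random access from disk, the
savings dominate the total query time.&lt;/p&gt;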

&lt;p&gt;One thing to consider is the size of each table: if we are fitting one of the
tables into memory, it’s probably best to choose the smaller table as
the build table. Hopefully this helps you understand why we distinguish
between a build and a probe table. This will help in our discussions about
query optimization and dynamic filtering on the next
show.&lt;/p&gt;
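&lt;p&gt;A minimal sketch of that choice follows. The table sizes here are
hypothetical; in practice Trino makes this decision using table statistics and
the cost-based optimizer:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;public class BuildSideChoice {
    public static void main(String[] args) {
        int[] left = new int[1_000_000]; // large table
        int[] right = new int[1_000];    // small table

        // build on the smaller side so the in-memory hash table stays small
        int[] build = left.length &amp;lt;= right.length ? left : right;
        int[] probe = (build == left) ? right : left;

        System.out.println(&quot;build rows: &quot; + build.length);
        System.out.println(&quot;probe rows: &quot; + probe.length);
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Picking the wrong side means a larger hash table in memory, which is why
accurate statistics matter so much for join planning.&lt;/p&gt;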

&lt;p&gt;Another interesting subtopic that we won’t get into today is &lt;a href=&quot;http://www.oaktable.net/content/right-deep-left-deep-and-bushy-joins&quot;&gt;left-deep and right-deep plans&lt;/a&gt;.
Since we now know that the probe table is always on the left and the build table
is on the right, the shape of our query plan matters. Consider the difference between
these two trees.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img width=&quot;33%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/9/left_deep.png&quot; /&gt;
&lt;img width=&quot;33%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/9/right_deep.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Left-deep versus right-deep trees have big implications for the speed of a
query, but that is a bit tangential for our talk today. Let’s finally move on to
hash-joins!&lt;/p&gt;

&lt;p&gt;In Trino, a hash-join is the common algorithm that is used to join tables. In
fact, the last snippet of code is really all that is involved in implementing a 
hash-join. So in explaining probe and build, we have already covered how the
algorithm works conceptually.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;50%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/9/tables.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;The big difference is that Trino implements a distributed hash-join using two
types of parallelism.&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Joined tables are distributed over the worker nodes to achieve inter-node
  parallelism. Instead of the hash value simply being used to match with other
  rows, it is also used to route rows to specific Trino worker nodes. Each
  worker then processes the rows that meet the equijoin criteria for its set of
  ids.&lt;/li&gt;
  &lt;li&gt;Within a node, workers can use the hash to further distribute the rows
  across multiple threads. This intra-node parallelism allows for a single
  thread per hash partition.&lt;/li&gt;
  &lt;li&gt;Finally, once all of these threads have determined which rows pass
  the join criteria, the probe side begins to emit rows in larger batches,
  which can quickly be thrown out or kept based on which partitions exist on a 
  given worker.&lt;/li&gt;
&lt;/ol&gt;
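&lt;p&gt;The routing in the first two steps can be sketched as follows. The worker
and thread counts are made up for illustration, and this is not Trino’s actual
partitioning code:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;public class HashPartitioning {
    public static void main(String[] args) {
        int workers = 3;
        int threadsPerWorker = 4;
        int[] joinKeys = {2, 4, 6, 8, 10, 12};

        for (int key : joinKeys) {
            int hash = Integer.hashCode(key);
            // inter-node parallelism: the hash routes the row to a worker node
            int worker = Math.floorMod(hash, workers);
            // intra-node parallelism: the hash also picks a thread on that worker
            int thread = Math.floorMod(hash / workers, threadsPerWorker);
            System.out.println(&quot;key &quot; + key + &quot; goes to worker &quot; + worker + &quot;, thread &quot; + thread);
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The key property is that equal join keys always hash to the same worker and
thread, so matching build and probe rows end up in the same place.&lt;/p&gt;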

&lt;p align=&quot;center&quot;&gt;
&lt;img align=&quot;center&quot; width=&quot;75%&quot; height=&quot;100%&quot; src=&quot;/assets/episode/9/parallelism.png&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Great resources on this topic, from which some of the examples above derive:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Hash_join&quot;&gt;https://en.wikipedia.org/wiki/Hash_join&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.mathcs.emory.edu/~cheung/Courses/554/Syllabus/5-query-opt/join-order2.html&quot;&gt;http://www.mathcs.emory.edu/~cheung/Courses/554/Syllabus/5-query-opt/join-order2.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&quot;&gt;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;how-to-contribute-documentation-and-testimonials&quot;&gt;How to contribute documentation and testimonials&lt;/h2&gt;
&lt;p&gt;Instead of a PR of the week, Manfred discusses some notes on how to contribute 
documentation and testimonials.&lt;/p&gt;

&lt;p&gt;If you want to show us some 💕, please &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/.github/star.png&quot;&gt;give us a ⭐ on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html&quot;&gt;https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&quot;&gt;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&quot;&gt;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&quot;&gt;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Feb 9 - Feb 10 &lt;a href=&quot;http://starburstdata.com/datanova&quot;&gt;http://starburstdata.com/datanova&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Presto® Summit Series - Real world usage&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://trino.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Podcasts:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;Presto with Martin Traverso, Dain Sundstrom and David Phillips&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;Simplify Your Data Architecture With The Presto Distributed SQL Engine&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5980279&quot;&gt;How Open Source Presto Unlocks a Single Point of Access to Data&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5656471&quot;&gt;The Data Access Struggle is Real&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2020/02/07/presto-with-justin-borgman/&quot;&gt;Presto with Justin Borgman&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/3923864&quot;&gt;The infrastructure renaissance and how it will power the modernization of analytics platforms&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2018/05/24/ubers-data-platform-with-zhenxiao-luo/&quot;&gt;Uber’s Data Platform with Zhenxiao Luo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary></summary>

      
      
    </entry>
  
    <entry>
      <title>8: Trino: A ludicrously fast query engine: past, present, and future</title>
      <link href="https://trino.io/episodes/8.html" rel="alternate" type="text/html" title="8: Trino: A ludicrously fast query engine: past, present, and future" />
      <published>2021-01-11T00:00:00+00:00</published>
      <updated>2021-01-11T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/8</id>
      <content type="html" xml:base="https://trino.io/episodes/8.html">&lt;h2 id=&quot;in-this-episode&quot;&gt;In this episode…&lt;/h2&gt;

&lt;p&gt;Well, we’re back, and no longer waving the Presto® flag like we did before. If 
you haven’t heard, Presto® SQL is now Trino
(&lt;a href=&quot;https://trino.io/blog/2020/12/27/announcing-trino.html&quot;&gt;read more about that here&lt;/a&gt;).
In this episode, we sit down with the four original creators of Presto® and
discuss in more detail the journey that led us to our current trajectory with
the Presto® SQL project, and why it is now being renamed to Trino. We also
discuss how this affects those that are using Trino. If you are developing on
Trino and still use the old namespace, check out the 
&lt;a href=&quot;https://trino.io/blog/2021/01/04/migrating-from-prestosql-to-trino.html&quot;&gt;guide to migrate here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We also discuss the differences between the two projects. There are actually a
lot of them two years after the split, and we recommend looking at the
&lt;a href=&quot;https://trino.io/blog/2020/01/01/2019-summary.html&quot;&gt;blog we wrote at the end of 2019&lt;/a&gt;.
Keep your eyes peeled for the blog we are writing to summarize the changes
in 2020!&lt;/p&gt;

&lt;p&gt;Finally, we cover some sneak peeks at the roadmap for Trino in 2021.&lt;/p&gt;

&lt;p&gt;If you want to show us some 💕, please &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/.github/star.png&quot;&gt;give us a ⭐ on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html&quot;&gt;https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&quot;&gt;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&quot;&gt;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&quot;&gt;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Feb 9 - Feb 10 &lt;a href=&quot;http://starburstdata.com/datanova&quot;&gt;http://starburstdata.com/datanova&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Presto® Summit Series - Real world usage&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://trino.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Podcasts:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;Presto with Martin Traverso, Dain Sundstrom and David Phillips&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;Simplify Your Data Architecture With The Presto Distributed SQL Engine&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5980279&quot;&gt;How Open Source Presto Unlocks a Single Point of Access to Data&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5656471&quot;&gt;The Data Access Struggle is Real&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2020/02/07/presto-with-justin-borgman/&quot;&gt;Presto with Justin Borgman&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/3923864&quot;&gt;The infrastructure renaissance and how it will power the modernization of analytics platforms&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2018/05/24/ubers-data-platform-with-zhenxiao-luo/&quot;&gt;Uber’s Data Platform with Zhenxiao Luo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Trino, check out the definitive guide from 
O’Reilly. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>In this episode…</summary>

      
      
    </entry>
  
    <entry>
      <title>Trino in 2020 - An amazing year in review</title>
      <link href="https://trino.io/blog/2021/01/08/2020-review.html" rel="alternate" type="text/html" title="Trino in 2020 - An amazing year in review" />
      <published>2021-01-08T00:00:00+00:00</published>
      <updated>2021-01-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/01/08/2020-review</id>
      <content type="html" xml:base="https://trino.io/blog/2021/01/08/2020-review.html">&lt;p&gt;&lt;strong&gt;Wow!&lt;/strong&gt; If you had to sum up what happened in the last year in this
great community, &lt;strong&gt;wow&lt;/strong&gt; would be it. It is truly awe-inspiring to be part of
this incredible journey of Trino. Oh yeah, on that note. Our community and
project &lt;a href=&quot;/blog/2020/12/27/announcing-trino.html&quot;&gt;chose the new name Trino&lt;/a&gt;,
to be able to continue to innovate and develop freely as a community of peers.
Presto® and Presto® SQL are a thing of the past.&lt;/p&gt;

&lt;p&gt;Now that is out of the way, let’s dive right in and see what all our community
members across the globe have created with us!&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;&lt;a href=&quot;/blog/2020/01/01/2019-summary.html&quot;&gt;2019 was a big year for us&lt;/a&gt;, but check
out how 2020 eclipsed even that!&lt;/p&gt;
&lt;h2 id=&quot;by-the-numbers&quot;&gt;By the numbers&lt;/h2&gt;

&lt;p&gt;Even the size and growth of &lt;a href=&quot;/slack.html&quot;&gt;our community on Slack&lt;/a&gt; is impressive:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Started in January 2020 with ~1600 members and 280 weekly active&lt;/li&gt;
  &lt;li&gt;Over 3200 members by December 2020&lt;/li&gt;
  &lt;li&gt;560 members active weekly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The innovation and change of &lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;the source code on GitHub&lt;/a&gt; is a result of the hard work of the community:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Over 4000 commits merged&lt;/li&gt;
  &lt;li&gt;More than 2800 pull requests received&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/release.html#releases-2020&quot;&gt;23 releases&lt;/a&gt;, basically one every two
weeks!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As you can see, much of the excitement around the name change has quickly
increased the number of stars we have on GitHub. While some of this certainly
stems from an initial buzz around a shiny new name, we also believe that this
name change has brought clarity to the community. Trino is an improved version,
supported by the founders and creators of Presto®, along with the major
contributors.&lt;/p&gt;

&lt;p&gt;And if you have not done so already, make sure to &lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;star the
repository&lt;/a&gt; and &lt;a href=&quot;/slack.html&quot;&gt;join us on slack&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;features-and-code&quot;&gt;Features and code&lt;/h2&gt;

&lt;p&gt;While everything mentioned is already exciting, the true work is visible in the
new features and improvements in Trino. It is a long list, but read on. You
won’t want to miss anything.&lt;/p&gt;

&lt;h3 id=&quot;improvements-to-ansi-sql-support&quot;&gt;Improvements to ANSI SQL support&lt;/h3&gt;

&lt;p&gt;A core feature of Trino is the ability to use the same standard SQL for any
connected data source. These improvements empower all users.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Variable-precision temporal types, with precision down to picoseconds
(10&lt;sup&gt;−12&lt;/sup&gt;s). This is a very important feature for time-critical
systems such as financial transaction processing&lt;/li&gt;
  &lt;li&gt;Correct, and now SQL specification compliant timestamp semantics, making
migration of SQL statements from other compliant systems such as many RDBMSs
easier&lt;/li&gt;
  &lt;li&gt;Implicit coercions for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; clause&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RANGE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUPS&lt;/code&gt;-based window frames&lt;/li&gt;
  &lt;li&gt;More support for various shapes of correlated subqueries&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INTERSECT ALL&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXCEPT ALL&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Parameter support in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt; clause&lt;/li&gt;
  &lt;li&gt;Experimental support for &lt;a href=&quot;/docs/current/sql/select.html?highlight=recursive#with-recursive-clause&quot;&gt;recursive queries&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Enforcement of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NOT NULL&lt;/code&gt; constraints when inserting data&lt;/li&gt;
  &lt;li&gt;Quantified comparisons (e.g., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt; ALL (...)&lt;/code&gt;) in aggregation queries&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;other-query-improvements&quot;&gt;Other query improvements&lt;/h3&gt;

&lt;p&gt;A number of other features were added to make querying your data sources with
Trino even more powerful:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/language/types.html#t-digest&quot;&gt;T-digest data type&lt;/a&gt; and functions
for approximate quantile computations&lt;/li&gt;
  &lt;li&gt;Support for setting and reading column comments&lt;/li&gt;
  &lt;li&gt;Numerous new functions including &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;concat_ws()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;regexp_count()&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;regexp_position()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;contains_sequence()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;murmur3()&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;from_unixtime_nanos()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;from_iso8601_timestamp_nanos()&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;human_readable_seconds()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bitwise&lt;/code&gt; operations, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;luhn_check()&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;approx_most_frequent()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;translate()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;starts_with()&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;performance&quot;&gt;Performance&lt;/h2&gt;

&lt;p&gt;Trino is already &lt;a href=&quot;/index.html&quot;&gt;ludicrously fast&lt;/a&gt;. But then again, even faster is
better, so we worked on that:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Improved pushdown of complex operations into connectors, including
&lt;a href=&quot;/docs/current/optimizer/pushdown.html&quot;&gt;aggregation pushdown&lt;/a&gt; and TopN
pushdown.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/06/14/dynamic-partition-pruning.html&quot;&gt;Dynamic filtering and partition pruning&lt;/a&gt;, which can improve performance of
highly selective joins manyfold.&lt;/li&gt;
  &lt;li&gt;Cost-based decisions for queries containing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN &amp;lt;subquery&amp;gt;&lt;/code&gt; in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE&lt;/code&gt; clause.&lt;/li&gt;
  &lt;li&gt;Performance improvements for information_schema queries, which benefit third-party BI
tools that need to inspect table metadata, for example DBeaver, DataGrip,
Power BI, Tableau, Looker, and others.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/14/dereference-pushdown.html&quot;&gt;Faster queries on nested data in Parquet and ORC&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Faster and more accurate &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;approx_percentile&lt;/code&gt;, based on t-digest data structure.&lt;/li&gt;
  &lt;li&gt;Support of Bloom filters in ORC.&lt;/li&gt;
  &lt;li&gt;Experimental, optimized Parquet writer.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;security&quot;&gt;Security&lt;/h2&gt;

&lt;p&gt;The more data you access with Trino, the more it becomes critical to secure it.
With that in mind we added a lot of improvements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The &lt;a href=&quot;/docs/current/admin/web-interface.html&quot;&gt;Web UI&lt;/a&gt; now requires
authentication. Various actions such as viewing query details, killing
queries, etc., are protected with authorization checks based on the identity
of the user. Additionally, the UI now supports OAuth2 for user identification.&lt;/li&gt;
  &lt;li&gt;External and internal APIs are now properly secured with authentication and
authorization checks. Importantly, this fixes a &lt;a href=&quot;https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-15087&quot;&gt;CVE reported
vulnerability&lt;/a&gt;
that affects all older versions of Presto®.&lt;/li&gt;
  &lt;li&gt;A &lt;a href=&quot;/docs/current/security/secrets.html&quot;&gt;new mechanism to externalize secrets in configuration
 files&lt;/a&gt; that makes it easier to integrate
 with third-party secret managers and deployment tools.&lt;/li&gt;
  &lt;li&gt;Support for JSON Web Key (JWK) authentication and &lt;a href=&quot;/docs/current/develop/certificate-authenticator.html&quot;&gt;pluggable certificate
authenticators&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;A new &lt;a href=&quot;/docs/current/security/salesforce.html&quot;&gt;Salesforce authenticator&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;The query engine and access control SPIs now support injecting row filters and
column masks.&lt;/li&gt;
  &lt;li&gt;New syntax for managing permissions (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GRANT/REVOKE&lt;/code&gt; on schema,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALTER TABLE/SCHEMA/VIEW ... SET AUTHORIZATION&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;data-sources&quot;&gt;Data sources&lt;/h2&gt;

&lt;p&gt;Trino empowers you to use one platform to access all data sources. Connectors
enable this and we added numerous new connectors:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/iceberg.html&quot;&gt;Iceberg&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/prometheus.html&quot;&gt;Prometheus&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/oracle.html&quot;&gt;Oracle&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/pinot.html&quot;&gt;Pinot&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/druid.html&quot;&gt;Druid&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/bigquery.html&quot;&gt;BigQuery&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/memsql.html&quot;&gt;MemSQL&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All other connectors received a large host of improvements. Let’s just look at
two popular connectors:&lt;/p&gt;

&lt;h3 id=&quot;hive-connector-for-hdfs-s3-azure-and-cloud-object-storage-systems&quot;&gt;Hive connector for HDFS, S3, Azure and cloud object storage systems&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Support for complex Hive views, which allows integration with Hive and simplifies
migration from Hive&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/06/01/hive-acid.html&quot;&gt;ACID transactional tables&lt;/a&gt; with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt;
and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; support&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/hive-caching.html&quot;&gt;Built-in storage caching&lt;/a&gt; and
support for &lt;a href=&quot;/docs/current/connector/hive-alluxio.html&quot;&gt;external caching with
Alluxio&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;New procedures: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;system.drop_stats()&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;register_partition()&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unregister_partition()&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Support for &lt;a href=&quot;/docs/current/connector/hive-azure.html&quot;&gt;Azure object storage&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Support for &lt;a href=&quot;/docs/current/connector/hive-s3.html&quot;&gt;S3 encrypted files, flexible S3 security mappings and
Intelligent-Tiering S3 storage&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;elasticsearch-connector&quot;&gt;Elasticsearch connector&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;/docs/current/connector/elasticsearch.html&quot;&gt;Elasticsearch connector&lt;/a&gt;
received numerous powerful improvements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Password authentication&lt;/li&gt;
  &lt;li&gt;Support for index aliases&lt;/li&gt;
  &lt;li&gt;Support for array types, Nested, and IP type&lt;/li&gt;
  &lt;li&gt;Support for Elasticsearch 7.x&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;runtime-improvements&quot;&gt;Runtime improvements&lt;/h2&gt;

&lt;p&gt;Operating and maintaining a Trino cluster takes a significant amount of
resources, so any work to improve the runtime has a significant positive
impact:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/installation/deployment.html#java-runtime-environment&quot;&gt;Requirement to use Java
11&lt;/a&gt;, with
better GC performance, overall performance, and improved container
support&lt;/li&gt;
  &lt;li&gt;Support for ARM64-based processors to run Trino&lt;/li&gt;
  &lt;li&gt;Support for minimum number of workers before query starts, useful for
implementing autoscaling&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/06/25/data-integrity-protection.html&quot;&gt;Data integrity checks for network transfers&lt;/a&gt; to prevent data corruption during
processing&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;everything-else&quot;&gt;Everything else&lt;/h2&gt;

&lt;p&gt;There is so much more to capture, and you really would have to read all the
&lt;a href=&quot;/docs/current/release.html#releases-2020&quot;&gt;release notes&lt;/a&gt; in detail to know it
all. To save you from that, here are a few more noteworthy changes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Experimental support for materialized views in Iceberg connector&lt;/li&gt;
  &lt;li&gt;JDBC driver backward compatibility tests&lt;/li&gt;
  &lt;li&gt;Support for multiple event listeners&lt;/li&gt;
  &lt;li&gt;Python client support for executing queries with parameters&lt;/li&gt;
  &lt;li&gt;New look and navigation for the &lt;a href=&quot;/docs/current/index.html&quot;&gt;documentation&lt;/a&gt;, and
lots of new content&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;community-resources-and-events&quot;&gt;Community resources and events&lt;/h2&gt;

&lt;p&gt;Beyond the raw code and helping each other, the community collaborated on other
helpful resources like books and in-depth video tutorials.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/mattsfuller&quot;&gt;Matt&lt;/a&gt;, &lt;a href=&quot;https://github.com/mosabua&quot;&gt;Manfred&lt;/a&gt;,
and &lt;a href=&quot;https://github.com/martint&quot;&gt;Martin&lt;/a&gt; published the book &lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;Trino: The
Definitive Guide&lt;/a&gt; with O’Reilly. Over 5000
readers took advantage of the &lt;a href=&quot;/blog/2020/04/11/the-definitive-guide.html&quot;&gt;free digital copy&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Brian and Manfred launched the live streaming event &lt;a href=&quot;/broadcast/index.html&quot;&gt;Trino Community
Broadcast&lt;/a&gt;, and grew their audience and back catalog to
include some very useful material. If you have not seen it yet, go and &lt;a href=&quot;/broadcast/episodes.html&quot;&gt;watch
some old episodes&lt;/a&gt; and join us in the next ones.&lt;/p&gt;

&lt;p&gt;We also had a number of other online events and presentations, with direct
participation of our community members:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;A &lt;a href=&quot;/blog/2020/11/21/a-report-about-presto-conference-tokyo-2020.html&quot;&gt;dedicated conference event&lt;/a&gt;
for the community in Japan was very successful.&lt;/li&gt;
  &lt;li&gt;The &lt;a href=&quot;/blog/2020/09/28/argentina-big-data-meetup.html&quot;&gt;Argentina Big Data Meetup&lt;/a&gt; had a large audience from the
community in South America.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A series of virtual events around the project started with a roadmap and
overview meeting and included a number of real-world use case examples at scale:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/05/15/state-of-presto.html&quot;&gt;State of Trino&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;Trino at Pinterest&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;Trino Migration at ARM Treasure Data&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;Trino at Zuora&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Another series of training classes with the project founders was hugely
successful. It includes very valuable content for any Trino user, from beginners
to experts, that you should not miss:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;Advanced SQL in Trino with David&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/07/30/training-query-tuning.html&quot;&gt;Understanding and Tuning Trino Query Processing with Martin&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/13/training-security.html&quot;&gt;Securing Trino with Dain&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2020/08/27/training-performance.html&quot;&gt;Configuring and Tuning Trino with Dain&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;2020 was a wild ride for us all. Trino and the Trino community definitely
emerged as a winner, and we are looking forward to a very bright future with you
all.&lt;/p&gt;

&lt;p&gt;Several efforts are already underway and very promising:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Optimized Parquet reader, on par with ORC reader support&lt;/li&gt;
  &lt;li&gt;Support for SQL &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MERGE&lt;/code&gt; statements&lt;/li&gt;
  &lt;li&gt;OAuth2 support for JDBC&lt;/li&gt;
  &lt;li&gt;Support for SQL &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WINDOW&lt;/code&gt; clause and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MATCH_RECOGNIZE&lt;/code&gt; usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’re starting the new year with a shiny new name, a cute little bunny, and a
very vibrant community. The future is looking great for Trino!&lt;/p&gt;

&lt;p&gt;Don’t miss out on all the benefits of Trino. Join us &lt;a href=&quot;/slack.html&quot;&gt;on
Slack&lt;/a&gt; to get started!&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso, Manfred Moser, Brian Olsen</name>
        </author>
      

      <summary>Wow! If you would have to sum up what happened in the last year in this great community, wow would be it. It is truly awe-inspiring to be part of this incredible journey of Trino. Oh yeah, on that note. Our community and project chose the new name Trino, to be able to continue to innovate and develop freely as a community of peers. Presto® and Presto® SQL are a thing of the past. Now that is out of the way, let’s dive right in and see what all our community members across the globe have created with us!</summary>

      
      
    </entry>
  
    <entry>
      <title>Migrating from PrestoSQL to Trino</title>
      <link href="https://trino.io/blog/2021/01/04/migrating-from-prestosql-to-trino.html" rel="alternate" type="text/html" title="Migrating from PrestoSQL to Trino" />
      <published>2021-01-04T00:00:00+00:00</published>
      <updated>2021-01-04T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2021/01/04/migrating-from-prestosql-to-trino</id>
      <content type="html" xml:base="https://trino.io/blog/2021/01/04/migrating-from-prestosql-to-trino.html">&lt;p&gt;As we previously announced, we’re
&lt;a href=&quot;/blog/2020/12/27/announcing-trino.html&quot;&gt;rebranding Presto SQL as Trino&lt;/a&gt;.
Now comes the hard part: migrating to the new version of the software.
We just released the first version,
&lt;a href=&quot;/docs/current/release/release-351.html&quot;&gt;Trino 351&lt;/a&gt;,
which uses the name Trino everywhere, both internally and externally.
Unfortunately, there are some unavoidable compatibility aspects that
administrators of Trino need to know about. We hope this post makes the
transition as smooth as possible.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;things-that-havent-changed&quot;&gt;Things that haven’t changed&lt;/h1&gt;

&lt;p&gt;Let’s start with the good news. For end users running queries against Trino,
everything should be the same. There are no changes to the SQL language,
SQL functions, session properties, etc.&lt;/p&gt;

&lt;p&gt;Users now see &lt;em&gt;Trino&lt;/em&gt; in error messages and a different logo in the web UI,
and error stack traces have a different package name, but otherwise they
won’t know that anything has changed. All of their views, reports,
or other stored queries will work as before.&lt;/p&gt;

&lt;p&gt;Similarly for administrators, except for a few things noted in the
&lt;a href=&quot;/docs/current/release/release-351.html&quot;&gt;Trino 351 release notes&lt;/a&gt;,
all the configuration properties are the same.&lt;/p&gt;

&lt;h1 id=&quot;client-protocol-compatiblity&quot;&gt;Client protocol compatibility&lt;/h1&gt;

&lt;p&gt;The client protocol is how clients, such as the
&lt;a href=&quot;/docs/current/client/cli.html&quot;&gt;CLI&lt;/a&gt; or
&lt;a href=&quot;/docs/current/client/jdbc.html&quot;&gt;JDBC driver&lt;/a&gt;,
talk to Trino. It uses standard HTTP as the underlying communications
protocol, with some custom HTTP headers to communicate values
to and from Trino. Unfortunately, those header names started with
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;X-Presto-&lt;/code&gt; and thus had to be changed to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;X-Trino-&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The Trino CLI and JDBC driver send the new headers, so they are
&lt;strong&gt;only compatible with Trino versions 351 and newer&lt;/strong&gt;. Users should
wait to upgrade the CLI or JDBC driver until the Trino servers they
talk to have been upgraded.&lt;/p&gt;

&lt;p&gt;Out of the box, the Trino server does not work with older clients.
However, in order to support a graceful transition, you can allow the
server to support older clients by adding a configuration property:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;protocol.v1.alternate-header-name=Presto
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;We recommend using version 350 of the CLI and JDBC driver as the transition version&lt;/strong&gt;.
It has all the newest features such as variable precision timestamps,
has been tested with a range of older server versions, and is the last
version to support older servers.&lt;/p&gt;

&lt;h1 id=&quot;jdbc-driver&quot;&gt;JDBC driver&lt;/h1&gt;

&lt;p&gt;The URL prefix for the JDBC driver now starts with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jdbc:trino:&lt;/code&gt; instead
of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jdbc:presto:&lt;/code&gt;. This means that any client applications using the
JDBC driver need to update their connection configuration. The old
prefix is still supported, but will be removed in a future release.&lt;/p&gt;
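
&lt;p&gt;For example, a connection URL using the new prefix looks like this (the
host, port, catalog, and schema are placeholders):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;jdbc:trino://example.net:8080/hive/default
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;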

&lt;p&gt;The class name of the driver is now &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;io.trino.jdbc.TrinoDriver&lt;/code&gt;. This is
of no concern to most users, as the driver is normally accessed via the
standard JDBC auto-discovery mechanism based on the URL. As with the URL prefix,
the old name is still supported, but will be removed in a future release.&lt;/p&gt;

&lt;h1 id=&quot;server-rpm&quot;&gt;Server RPM&lt;/h1&gt;

&lt;p&gt;The name of the RPM has changed, so it is treated as a different RPM, and
thus you cannot simply upgrade from the old version to the new version.
All of the directories for the RPM that contained the name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;presto&lt;/code&gt; now
use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino&lt;/code&gt; instead. You likely want to uninstall the old RPM, rename
the config and log directories, then install the new RPM.&lt;/p&gt;

&lt;h1 id=&quot;docker-image&quot;&gt;Docker image&lt;/h1&gt;

&lt;p&gt;The &lt;a href=&quot;https://hub.docker.com/r/trinodb/trino&quot;&gt;Trino Docker image&lt;/a&gt; is now
published as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trinodb/trino&lt;/code&gt;. The supported configuration directory is
now &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/etc/trino&lt;/code&gt;. The CLI is now named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino&lt;/code&gt; instead of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;presto&lt;/code&gt;.&lt;/p&gt;
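
&lt;p&gt;As a minimal sketch, starting the new image and mounting a local
configuration directory looks like this (the local path is a placeholder):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker run --name trino -d -p 8080:8080 \
  --volume $PWD/etc:/etc/trino trinodb/trino
docker exec -it trino trino
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The second command opens the renamed &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino&lt;/code&gt; CLI inside the container.&lt;/p&gt;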

&lt;h1 id=&quot;jmx-mbean-naming&quot;&gt;JMX MBean naming&lt;/h1&gt;

&lt;p&gt;Trino runs on the JVM, which has the JMX framework as a standard way to expose
system and application metrics. Trino exposes a huge number of JMX metrics for
administrators to monitor their clusters. You might be using these metrics
via your monitoring system, or perhaps you are accessing them in SQL via the
Trino &lt;a href=&quot;/docs/current/connector/jmx.html&quot;&gt;JMX connector&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The metrics for Trino server now start with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino&lt;/code&gt; instead of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;presto&lt;/code&gt;. You
might need to update this name in your monitoring system, or you can revert
to the old name:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;jmx.base-name=presto
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Similarly, the metrics for the Elasticsearch, Hive, Iceberg, Raptor, and Thrift
connectors now start with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino.plugin&lt;/code&gt; instead of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;presto.plugin&lt;/code&gt;. Again,
you might need to update these names in your monitoring system, or you can
revert to the old name. For example, for the Hive connector:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;jmx.base-name=presto.plugin.hive
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;thrift-connector&quot;&gt;Thrift connector&lt;/h1&gt;

&lt;p&gt;The &lt;a href=&quot;/docs/current/connector/thrift.html&quot;&gt;Thrift connector&lt;/a&gt; had many
&lt;a href=&quot;/docs/current/release/release-351.html#thrift-connector-changes&quot;&gt;backwards incompatible changes&lt;/a&gt;
to both the Thrift service interface and the configuration properties. You need
to update all of your implementations of the Thrift service used by the connector.&lt;/p&gt;

&lt;h1 id=&quot;spi&quot;&gt;SPI&lt;/h1&gt;

&lt;p&gt;If you have any custom plugins for Trino, such as connectors or functions,
these need to be updated. The package name is now &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;io.trino.spi&lt;/code&gt;, and a
few classes were renamed:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PrestoException&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TrinoException&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PrestoPrincipal&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TrinoPrincipal&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PrestoWarning&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TrinoWarning&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are no functional changes, so all you should need to do is update
your imports and rename the references to the above class names.&lt;/p&gt;

&lt;h1 id=&quot;migration-guide&quot;&gt;Migration guide&lt;/h1&gt;

&lt;p&gt;Now that you understand what is different and what you need to change,
you can start thinking about the list of steps needed to perform the
migration. The following is a rough plan for upgrading your environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Prepare to deploy the new version&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Let users know the name is changing, so they are not surprised by the logo changes in the UI.&lt;/li&gt;
  &lt;li&gt;Make sure that users are using recent client versions. Ideally, upgrade them all to
version 350, as mentioned above. You can check the HTTP request logs for the coordinator
to see what client versions are in use.&lt;/li&gt;
  &lt;li&gt;Update your server configuration with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;protocol.v1.alternate-header-name=Presto&lt;/code&gt;
to allow supporting all of your existing Presto clients.&lt;/li&gt;
  &lt;li&gt;If you are using the RPM, have a plan to deal with the new RPM name
and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino&lt;/code&gt; directory names.&lt;/li&gt;
  &lt;li&gt;If you are using Docker, use the new image name, make sure your configuration will
be mounted using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino&lt;/code&gt; path name, and remember that the CLI is now named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Update any custom plugins to use the new SPI.&lt;/li&gt;
  &lt;li&gt;Check if you have anything using JMX to monitor your clusters, and decide if you will
update them to the new names or set a Trino config to revert to the old names.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Upgrade your servers to Trino 351+&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Upgrade development and staging servers.&lt;/li&gt;
  &lt;li&gt;Upgrade production servers. If you have multiple clusters, you can do them one
at a time, and verify everything is working before moving on to the next one.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Upgrade clients&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Upgrade all clients, including the CLI, JDBC driver, Python client, etc., to their Trino versions.&lt;/li&gt;
  &lt;li&gt;Update any applications using JDBC to use the new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jdbc:trino:&lt;/code&gt; connection URL prefix.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Cleanup&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Remove the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;protocol.v1.alternate-header-name&lt;/code&gt; configuration property.&lt;/li&gt;
  &lt;li&gt;If you configured Trino to use the old JMX names, convert your monitoring system
to use the new JMX names and remove the fallback configs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;getting-help&quot;&gt;Getting help&lt;/h1&gt;

&lt;p&gt;We’re here to help! If you run into any issues while upgrading, or have any
questions or concerns, &lt;a href=&quot;/slack.html&quot;&gt;ask on Slack&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>David Phillips, Dain Sundstrom</name>
        </author>
      

      <summary>As we previously announced, we’re rebranding Presto SQL as Trino. Now comes the hard part: migrating to the new version of the software. We just released the first version, Trino 351, which uses the name Trino everywhere, both internally and externally. Unfortunately, there are some unavoidable compatibility aspects that administrators of Trino need to know about. We hope this post makes the transition as smooth as possible.</summary>

      
      
    </entry>
  
    <entry>
      <title>We’re rebranding PrestoSQL as Trino</title>
      <link href="https://trino.io/blog/2020/12/27/announcing-trino.html" rel="alternate" type="text/html" title="We’re rebranding PrestoSQL as Trino" />
      <published>2020-12-27T00:00:00+00:00</published>
      <updated>2020-12-27T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/12/27/announcing-trino</id>
      <content type="html" xml:base="https://trino.io/blog/2020/12/27/announcing-trino.html">&lt;p&gt;We’re rebranding PrestoSQL as Trino. The software and the community you have come to love and depend on aren’t 
going anywhere; we are simply renaming. &lt;strong&gt;Trino is the new name for PrestoSQL&lt;/strong&gt;, the project supported by the founders 
and creators of Presto® along with the major contributors – just under a shiny new name. And now you can find us here:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;GitHub: &lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;https://github.com/trinodb/trino&lt;/a&gt;. Please give it a &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/.github/star.png&quot;&gt;star&lt;/a&gt;!&lt;/li&gt;
  &lt;li&gt;Twitter: &lt;a href=&quot;https://twitter.com/trinodb&quot;&gt;@trinodb&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Slack: &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;https://trino.io/slack.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn why we’re doing this, read on…&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;In 2012, Dain, David and Martin joined the Facebook data infrastructure team. Together with Eric Hwang, we created 
Presto® to address the problems of low latency interactive analytics over Facebook’s massive Hadoop data warehouse. 
One of our non-negotiable conditions was for Presto® to be an open source project. Open source is in our DNA - we had 
all used and participated in open source projects to various degrees in the past, and we recognized the power of open 
communities and developers coming together to build successful software that can stand the test of time.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-announcement/team.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Over the next six years, we worked hard to build a healthy open source community and ecosystem around the project. We 
worked with developers and users all over the world and welcomed them into the Presto® community. Presto® was on a path 
of increasing growth and success, in large part because of the contributions from developers across many fields and all 
over the world.&lt;/p&gt;

&lt;p&gt;Unfortunately in 2018, it became clear that Facebook management wanted to have tighter control over the project and its 
future. This culminated with their decision to grant Facebook developers commit rights on the project without any prior 
experience in Presto®. We strongly believe that this kind of decision is not compatible with having a healthy, open 
community. Moreover, they made this decision by fiat without engaging the Presto® community. As a matter of principle, 
we had no choice but to leave Facebook in order to focus on making sure Presto® continued to be a successful project 
with an open, collaborative and independent community. In reality, the choice was easy.&lt;/p&gt;

&lt;p&gt;We started the Presto Software Foundation in January 2019 as an independent entity to oversee the development of the 
software and community, continuing the meritocratic system that had been in place over the previous 6 years. The community 
quickly consolidated under this new home. We intentionally stayed unemployed over the next 10 months to focus on expanding 
and strengthening the community by working directly with major users and contributors, as well as reaching out to a wider 
group of users and developers across the globe. This resulted in new use cases and an injection of energy, making the 
project more vibrant than ever before as even more new users and developers became engaged. But, don’t take our word for 
it, let the data speak for itself:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/trino-announcement/commits.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Months after this consolidation, Facebook decided to create a competing community using The Linux Foundation®. As a first 
action, Facebook applied for a trademark on Presto®. This was a surprising, norm-breaking move because up until that point, 
the Presto® name had been used without constraints by commercial and non-commercial products for over 6 years. In September 
of 2019, Facebook established the Presto Foundation at The Linux Foundation®, and immediately began working to enforce this 
new trademark. We spent the better part of the last year trying to agree to terms with Facebook and The Linux Foundation 
that would not negatively impact the community, but unfortunately we were unable to do so. The end result is that we must 
now change the name in a short period of time, with little ability to minimize user disruption.&lt;/p&gt;

&lt;p&gt;On a personal note, and as the founders who named the project Presto® in the first place, this is an incredibly sad and 
disappointing turn of events. And while we will always have fondness for the name Presto®, we have come to accept that a 
name is just a name. To be frank, we’re tired of this endless distraction, and we intend to focus on what matters most 
and what we are best at doing – building high quality software everyone can rely on and fostering a healthy community 
of users and developers that build it and support it. We’re not going anywhere – we’re the same people, the same amazing 
software, under a new name: Trino.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you love this project, you already love Trino. ❤️&lt;/strong&gt;&lt;/p&gt;

&lt;html&gt;
&lt;p style=&quot;font-size:0.8em&quot;&gt;Facebook is a registered trademark of Facebook Inc.  The Linux Foundation and Presto are trademarks of The Linux Foundation.&lt;/p&gt;
&lt;/html&gt;</content>

      
        <author>
          <name>Martin Traverso, Dain Sundstrom, David Phillips</name>
        </author>
      

      <summary>We’re rebranding PrestoSQL as Trino. The software and the community you have come to love and depend on aren’t going anywhere, we are simply renaming. Trino is the new name for PrestoSQL, the project supported by the founders and creators of Presto® along with the major contributors – just under a shiny new name. And now you can find us here: GitHub: https://github.com/trinodb/trino. Please give it a star! Twitter: @trinodb Slack: https://trino.io/slack.html If you want to learn why we’re doing this, read on…</summary>

      
      
    </entry>
  
    <entry>
      <title>7: Cost Based Optimizer, Decorrelate subqueries, and does Presto make my RDBMS faster?</title>
      <link href="https://trino.io/episodes/7.html" rel="alternate" type="text/html" title="7: Cost Based Optimizer, Decorrelate subqueries, and does Presto make my RDBMS faster?" />
      <published>2020-11-30T00:00:00+00:00</published>
      <updated>2020-11-30T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/7</id>
      <content type="html" xml:base="https://trino.io/episodes/7.html">&lt;h2 id=&quot;release-348&quot;&gt;Release 348&lt;/h2&gt;
&lt;p&gt;Release Notes discussed: &lt;a href=&quot;https://prestosql.io/docs/current/release/release-348.html&quot;&gt;https://prestosql.io/docs/current/release/release-348.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Martin’s announcement:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Support for OAuth2 authorization in Web UI&lt;/li&gt;
  &lt;li&gt;Support for S3 streaming uploads&lt;/li&gt;
  &lt;li&gt;Support for DISTINCT aggregations in correlated subqueries&lt;/li&gt;
  &lt;li&gt;Performance improvement for ORDER BY … LIMIT queries&lt;/li&gt;
  &lt;li&gt;Many improvements and bug fixes to JDBC driver&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manfred’s observations:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;SHOW STATS to play around with&lt;/li&gt;
  &lt;li&gt;Switch for Hive view translation: off, legacy, or the new Coral system&lt;/li&gt;
  &lt;li&gt;A bunch of other Hive connector improvements&lt;/li&gt;
  &lt;li&gt;Iceberg on GCP and Azure&lt;/li&gt;
  &lt;li&gt;Small SPI changes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-cost-based-optimizer&quot;&gt;Concept of the week: Cost Based Optimizer&lt;/h2&gt;
&lt;p&gt;We’re continuing our series covering some fundamental topics that build up to
dynamic filtering! This week we’re discussing the cost-based optimizer with
Presto co-creator &lt;a href=&quot;https://twitter.com/mtraverso&quot;&gt;Martin Traverso&lt;/a&gt;!&lt;/p&gt;

&lt;h3 id=&quot;parseranalyzer&quot;&gt;Parser/Analyzer&lt;/h3&gt;

&lt;p&gt;To recap, in &lt;a href=&quot;6.html&quot;&gt;episode 6&lt;/a&gt; we discussed a little bit about the various
forms a query takes from submission to the coordinator, to actually being
executed. We discussed how the parser generates an abstract syntax tree (AST)
and how the analyzer validates the SQL, checking function signatures and making
sure the tables and columns being referenced actually exist.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/7/ast.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Here’s an example of an abstract syntax tree from last week’s episode for the query
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT * FROM (VALUES 1) t(a) WHERE a = 1 OR 1 = a OR a = 1;&lt;/code&gt;.&lt;/p&gt;

&lt;h3 id=&quot;planner&quot;&gt;Planner&lt;/h3&gt;

&lt;p&gt;The next phase we discussed was the planner. Internally, the planner and
optimizer overlap substantially, but you can think of the planner as the early
part of the planning phase that generates the logical plan, which over several
optimization iterations becomes an optimized distributed plan. The planner
generates a new tree data structure called the plan IR (intermediate
representation) that contains nodes representing the steps that need to be
performed in order to answer the query. The leaves of the tree get executed
first, and each parent node depends on its children completing before it can
start.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/7/logical.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Here’s an example of a logical plan tree using the same query from the AST
above. Since this query isn’t pulling from a data source, the distributed
plan is equivalent to the logical plan.&lt;/p&gt;

&lt;h3 id=&quot;cost-based-optimizer-cbo&quot;&gt;Cost-Based Optimizer (CBO)&lt;/h3&gt;

&lt;p&gt;In the cost-based optimizer phase, various rules are applied to the plan IR
that gradually optimize the structure into the final distributed plan that is
then executed. To do this, the optimizer retrieves statistical metadata about
the tables and their data. This information includes table row counts, column
data sizes, column low/high values, distinct column value counts, and the
percentage of null values in each column. Using rules that leverage these
statistics, the optimizer improves the query structure, for example by choosing
a degree of parallelism that matches the number of workers and data sources.&lt;/p&gt;
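
&lt;p&gt;You can inspect these statistics yourself. As a sketch, for a table in your
catalog (the table name here is a placeholder):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SHOW STATS FOR nation;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The output includes a row per column with data size, distinct value count,
null fraction, and low/high values, plus a summary row with the table row
count.&lt;/p&gt;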

&lt;p&gt;If you want to jump into the code, start at 
&lt;a href=&quot;https://github.com/prestosql/presto/blob/348/presto-main/src/main/java/io/prestosql/sql/planner/LogicalPlanner.java#L188&quot;&gt;the entry point&lt;/a&gt;
for the planner/optimizer and the initial planning starts on 
&lt;a href=&quot;https://github.com/prestosql/presto/blob/348/presto-main/src/main/java/io/prestosql/sql/planner/LogicalPlanner.java#L200&quot;&gt;this line&lt;/a&gt;. 
This loop is where the &lt;a href=&quot;https://github.com/prestosql/presto/blob/348/presto-main/src/main/java/io/prestosql/sql/planner/LogicalPlanner.java#L205&quot;&gt;actual optimization&lt;/a&gt;
occurs. So if you are interested, maybe grab a brandy 🥃 and take some time to
set your debugger at these points and watch the optimizer do its thing!&lt;/p&gt;

&lt;p&gt;Refer to chapter 4 in 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;
pg. 50.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-1415-decorrelate-subqueries-with-limit-or-topn&quot;&gt;PR of the week: PR 1415 Decorrelate subqueries with Limit or TopN&lt;/h2&gt;

&lt;p&gt;This week’s pull request, &lt;a href=&quot;https://github.com/prestosql/presto/pull/1415&quot;&gt;https://github.com/prestosql/presto/pull/1415&lt;/a&gt;,
was done by Presto contributor and Starburst engineer &lt;a href=&quot;https://github.com/kasiafi&quot;&gt;kasiafi&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Before we can jump into this PR, let’s discuss what a subquery is, and further,
what a correlated subquery is. A subquery is a nested query that runs within
another query, typically embedded within a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE&lt;/code&gt; clause or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; statement.
Take this query for example:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;table&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;table2&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;60&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;In this example, we have a standard non-correlated subquery that runs on
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;table2&lt;/code&gt;. It is not correlated because it has no dependencies
on the parent query running on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;table&lt;/code&gt;. This enables the
SQL engine to run the subquery first and then use its results when running the
parent query. A correlated query, by contrast, has at least one criterion in the
nested query that depends on the parent, which requires the nested query to be
executed for each row of the parent query.
Take a look at this correlated query:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;table&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;table2&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;60&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;In this example, the subquery runs in the context of each row in order to
evaluate the value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t1.b&lt;/code&gt;. Running the subquery for every row of the
parent query is certainly not ideal when it is not required, which is why
subquery decorrelation is a common optimization technique whenever an
equivalent non-correlated subquery exists for a given correlated subquery.&lt;/p&gt;
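&lt;p&gt;To make decorrelation concrete, here is a rough sketch (using a hypothetical
orders table, not an example from the show) of how a correlated aggregate
subquery can be rewritten into an equivalent join against a pre-aggregated
inner query:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Correlated: the subquery conceptually runs once per row of o
SELECT o.id
FROM orders o
WHERE o.total &amp;gt; (SELECT avg(o2.total) FROM orders o2 WHERE o2.customer = o.customer);

-- Decorrelated equivalent: aggregate once, then join
SELECT o.id
FROM orders o
JOIN (
   SELECT customer, avg(total) AS avg_total
   FROM orders
   GROUP BY customer
) a ON o.customer = a.customer
WHERE o.total &amp;gt; a.avg_total;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;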

&lt;p&gt;This pull request adds a rule that enables Presto to decorrelate subqueries
containing a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt; clause or an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; + &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt; combination (i.e.
TopN). The common trick during decorrelation is to turn the query into one that
can process the results from the inner table in one shot: the results of
executing the subquery for every row are flattened into a single stream of rows
before execution.&lt;/p&gt;

&lt;p&gt;This change also applies to a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LATERAL&lt;/code&gt; join, which behaves much like a nested
subquery, except that it acts as a table and can return multiple rows instead of
just a single one.&lt;/p&gt;
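&lt;p&gt;To get a feel for what flattening a correlated &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt; means, here is an
illustrative rewrite in plain SQL (a sketch only; the actual plan the optimizer
produces is an internal detail). A window function numbers the inner rows per
correlation key, and the join keeps only the first N rows per key:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Correlated form: scalar subquery with LIMIT 2
SELECT (
   SELECT t.a
   FROM (VALUES 1, 2, 3) t(a)
   WHERE t.a = t2.b
   LIMIT 2
) FROM (VALUES 1) t2(b);

-- Flattened form: row_number() bounds the rows per correlation key
SELECT s.a
FROM (VALUES 1) t2(b)
LEFT JOIN (
   SELECT a, row_number() OVER (PARTITION BY a) AS rn
   FROM (VALUES 1, 2, 3) t(a)
) s ON s.a = t2.b AND s.rn &amp;lt;= 2;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;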

&lt;h2 id=&quot;pr-demo-pr-1415-decorrelate-subqueries-with-limit-or-topn&quot;&gt;PR Demo: PR 1415 Decorrelate subqueries with Limit or TopN&lt;/h2&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;###&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Fails&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Returns&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;more&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;than&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;one&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;row&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;subquery&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;This&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;actually&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fails&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;during&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;execution&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;during&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;planning&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;optimizing&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;where&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;the&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;other&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;two&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;below&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fail&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Limit&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;correlated&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;non&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;equality&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;predicate&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;the&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;subquery&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TopN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;correlated&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;non&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;equality&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;predicate&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;the&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;subquery&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;After the show Kasia pointed out that the failing queries were not all failing
for the same reason. The first failing query above actually gets planned and
executed, but the exception occurs during execution. The others fail during the
planning and optimization phase because they could not be decorrelated, due to
the issues I outline in the comments above.&lt;/p&gt;

&lt;h2 id=&quot;question-of-the-week&quot;&gt;Question of the week:&lt;/h2&gt;

&lt;p&gt;In this week’s question, we answer: Will running Presto on top of my relational
database make processing faster?&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;I have been going over the docs of PrestoSQL and it seems to fit some of my 
requirements. I am little concerned about the resources needed to run Presto 
in production. Because the size of my prod data is between 3-5GB and there
will be very minimal data growth. Is Presto suitable for such a small 
data size?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The idea that Presto is fast often gets conflated with the idea that
Presto is a good fit for all use cases. It is important to understand that
Presto is a) not a database, b) not designed for OLTP workloads, and c) built
to handle data at the scale of terabytes to petabytes with distributed queries.
Since Presto uses a connector framework, it also has the added benefit of running
federated queries against any data source that can return data in a
columnar representation.&lt;/p&gt;

&lt;p&gt;For relatively small data sets, you should first try using your relational
database directly. Database indexes are very effective when you are not in the
big data world, and if you give your SQL server, say, 10 GB of memory, the
workload should run fully in memory and thus be fast.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;
&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://prestosql.io/blog/2019/05/21/optimizing-the-casts-away.html&quot;&gt;https://prestosql.io/blog/2019/05/21/optimizing-the-casts-away.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&quot;&gt;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&quot;&gt;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&quot;&gt;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Dec 16 &lt;a href=&quot;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Feb 9 - Feb 10 &lt;a href=&quot;http://starburstdata.com/datanova&quot;&gt;http://starburstdata.com/datanova&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://prestosql.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://prestosql.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://prestosql.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://prestosql.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://prestosql.io/blog/2020/08/13/training-security.html&quot;&gt;https://prestosql.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://prestosql.io/blog/2020/08/27/training-performance.html&quot;&gt;https://prestosql.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Presto Summit Series - Real world usage&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://prestosql.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://prestosql.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://prestosql.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://prestosql.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://prestosql.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://prestosql.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://prestosql.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://prestosql.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Podcasts:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;Presto with Martin Traverso, Dain Sundstrom and David Phillips&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;Simplify Your Data Architecture With The Presto Distributed SQL Engine&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5980279&quot;&gt;How Open Source Presto Unlocks a Single Point of Access to Data&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5656471&quot;&gt;The Data Access Struggle is Real&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2020/02/07/presto-with-justin-borgman/&quot;&gt;Presto with Justin Borgman&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/3923864&quot;&gt;The infrastructure renaissance and how it will power the modernization of analytics platforms&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2018/05/24/ubers-data-platform-with-zhenxiao-luo/&quot;&gt;Uber’s Data Platform with Zhenxiao Luo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Presto yourself, you should check out the
O’Reilly book Trino: The Definitive Guide. You can download
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Release 348 Release Notes discussed: https://prestosql.io/docs/current/release/release-348.html</summary>

      
      
    </entry>
  
    <entry>
      <title>6: Query Planning, Remove duplicate predicates, and Memory settings</title>
      <link href="https://trino.io/episodes/6.html" rel="alternate" type="text/html" title="6: Query Planning, Remove duplicate predicates, and Memory settings" />
      <published>2020-11-30T00:00:00+00:00</published>
      <updated>2020-11-30T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/6</id>
      <content type="html" xml:base="https://trino.io/episodes/6.html">&lt;h2 id=&quot;release-347&quot;&gt;Release 347&lt;/h2&gt;

&lt;p&gt;We discuss the Trino 347 release notes:
&lt;a href=&quot;https://trino.io/docs/current/release/release-347.html&quot;&gt;https://trino.io/docs/current/release/release-347.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Official release announcement from Martin Traverso:&lt;/p&gt;

&lt;p&gt;We’re happy to announce the release of Presto 347! This version includes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for EXCEPT ALL and INTERSECT ALL&lt;/li&gt;
  &lt;li&gt;New syntax for changing the owner of a view&lt;/li&gt;
  &lt;li&gt;Performance improvements when inserting data into Hive tables&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Notes from Manfred:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;contains_sequence function for arrays.&lt;/li&gt;
  &lt;li&gt;CentOS 8 in the Docker image.&lt;/li&gt;
  &lt;li&gt;Kudu gets dynamic filtering.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;concept-of-the-week-query-planning&quot;&gt;Concept of the week: Query planning&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;All happening on coordinator in cluster.&lt;/li&gt;
  &lt;li&gt;Before a query can be planned, the coordinator receives a SQL query and
passes it to a parser.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Parser/Analyzer&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The parser parses the SQL query into an AST (abstract syntax tree).&lt;/li&gt;
  &lt;li&gt;Then the analyzer checks that the SQL is valid, including functions and such.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Planner/Optimizer&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Request metadata about structure from catalogs.
    &lt;ul&gt;
      &lt;li&gt;Do the tables and columns exist?&lt;/li&gt;
      &lt;li&gt;What data types are used?&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Request metadata about content (table stats, data location).&lt;/li&gt;
  &lt;li&gt;Create logical plan
    &lt;ul&gt;
      &lt;li&gt;Are function parameters using right data types?&lt;/li&gt;
      &lt;li&gt;What catalogs/schema/tables/columns need to be accessed?&lt;/li&gt;
      &lt;li&gt;Are joins using compatible field data types?&lt;/li&gt;
      &lt;li&gt;Optimize
        &lt;ul&gt;
          &lt;li&gt;Eliminate redundant conditions.&lt;/li&gt;
          &lt;li&gt;Figure out the best order of operations.&lt;/li&gt;
          &lt;li&gt;Decide on filtering early.&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Create distributed plan (More on this in the next episode!)
    &lt;ul&gt;
      &lt;li&gt;Break logical plan up.&lt;/li&gt;
      &lt;li&gt;Adapt to parallel access by multiple workers to data source.&lt;/li&gt;
      &lt;li&gt;Break up operations so workers aggregate and process data from other workers.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt; to learn what is planned.
Also refer to chapter 4 in 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;
pg. 50.&lt;/p&gt;

&lt;h2 id=&quot;pr-of-the-week-pr-730-remove-duplicate-predicates&quot;&gt;PR of the week: PR 730 Remove duplicate predicates&lt;/h2&gt;

&lt;p&gt;This week’s pull request, &lt;a href=&quot;https://github.com/trinodb/trino/pull/730&quot;&gt;https://github.com/trinodb/trino/pull/730&lt;/a&gt;, 
came from one of the co-creators, &lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso&lt;/a&gt;.
It removes duplicate predicates in logical binary expressions
(AND, OR) and canonicalizes commutative arithmetic expressions and comparisons
to handle a larger number of variants. Canonicalize is a big word, but all it
means is that if there are multiple representations of the same logic or
data, they are simplified to an agreed-upon normal form.&lt;/p&gt;

&lt;p&gt;For example, the statement &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;COALESCE(a * (2 * 3), 1 - 1)&lt;/code&gt; is 
equivalent to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;COALESCE(6 * a, 0)&lt;/code&gt;, as the expression 2 * 3 can
be simplified to a constant integer.&lt;/p&gt;

&lt;p&gt;This is an example of a logical plan optimization, because we are optimizing
the query itself. It differs from the distributed plan in that we are not yet
determining how or where the plan will be distributed and run, and it does not
apply further optimizations handled by the cost-based optimizer, such as
predicate pushdown. We’ll talk about that step more in the next episode. For
now, let’s cover a few examples.&lt;/p&gt;

&lt;h2 id=&quot;demo-pr-730-remove-duplicate-predicates&quot;&gt;Demo: PR 730 Remove duplicate predicates&lt;/h2&gt;
&lt;p&gt;The EXPLAIN format used is &lt;a href=&quot;https://graphviz.org/&quot;&gt;graphviz&lt;/a&gt;. The
online tool used during the show is &lt;a href=&quot;http://viz-js.com/&quot;&gt;Viz.js&lt;/a&gt;. You can paste
the output of your EXPLAIN queries into it to visualize the query as a tree.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;EXPLAIN&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;FORMAT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GRAPHVIZ&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;TYPE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;LOGICAL&lt;/span&gt;
 &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OR&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OR&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;EXPLAIN&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;FORMAT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GRAPHVIZ&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;TYPE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;LOGICAL&lt;/span&gt;
 &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OR&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OR&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; 

&lt;span class=&quot;k&quot;&gt;EXPLAIN&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;FORMAT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GRAPHVIZ&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;TYPE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DISTRIBUTED&lt;/span&gt;
 &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tpch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tiny&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;  

&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tpch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tiny&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt; 
  &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tpch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tiny&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt; 
  &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h2 id=&quot;question-of-the-week-how-should-i-allocate-memory-properties&quot;&gt;Question of the week: How should I allocate memory properties?&lt;/h2&gt;

&lt;p&gt;In this week’s question, we answer:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;How should I allocate memory properties? CPU : 16Core  MEM:64GB&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Before answering this, we should clarify a few things about how memory works.&lt;/p&gt;

&lt;h3 id=&quot;user-memory&quot;&gt;User memory&lt;/h3&gt;
&lt;p&gt;Memory whose usage the user can reason about from the query itself:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Input Data&lt;/li&gt;
  &lt;li&gt;Hash tables used during execution&lt;/li&gt;
  &lt;li&gt;Sorting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Settings&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-memory-per-node&lt;/code&gt;&lt;/strong&gt; - maximum amount of user memory that a query
is allowed to use on a given worker.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-memory&lt;/code&gt;&lt;/strong&gt; (without the -per-node at the end) - This config caps 
the amount of user memory used by a single query over all worker nodes in your 
cluster.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;system-memory&quot;&gt;System memory&lt;/h3&gt;
&lt;p&gt;Memory needed for Presto’s internal operations:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Shuffle buffers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;NOTE: There are no dedicated settings for this memory, as it is implicitly
determined by the user and total memory settings. Use these formulas to calculate system memory:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;max system memory per node = &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-total-memory-per-node&lt;/code&gt;&lt;/strong&gt; - 
 &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-memory-per-node&lt;/code&gt;&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;max system memory = &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-total-memory&lt;/code&gt;&lt;/strong&gt; - &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-memory&lt;/code&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
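
&lt;p&gt;As a purely illustrative sketch (these property values are made up, not
recommendations), the subtraction works like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# config.properties (hypothetical values)
query.max-memory-per-node=10GB
query.max-total-memory-per-node=12GB

# implied maximum system memory per node:
# 12GB - 10GB = 2GB
&lt;/code&gt;&lt;/pre&gt;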

&lt;h3 id=&quot;total-memory&quot;&gt;Total memory&lt;/h3&gt;
&lt;p&gt;Total Memory = System + User, but there are only properties for total and
user memory.&lt;/p&gt;

&lt;p&gt;Settings&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-total-memory-per-node&lt;/code&gt;&lt;/strong&gt; - maximum amount of total memory that a
  query is allowed to use on a given worker.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-total-memory&lt;/code&gt;&lt;/strong&gt; (without the -per-node at the end) - This config 
caps the total memory used by a single query over all worker nodes in your
cluster.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;heap-headroom&quot;&gt;Heap headroom&lt;/h3&gt;
&lt;p&gt;The final setting I would like to cover is the 
&lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;memory.heap-headroom-per-node&lt;/code&gt;&lt;/strong&gt;. This config sets aside memory for the
JVM heap for allocations that are not tracked by Presto. You can typically go
with the default on this setting which is 30% of the JVM’s max heap size 
(-Xmx setting).&lt;/p&gt;

&lt;h3 id=&quot;jvm-heap-memory--xmx-setting&quot;&gt;JVM heap memory (-Xmx setting)&lt;/h3&gt;
&lt;p&gt;Presto is a Java application, which means it runs on the JVM. None of
these memory settings mean anything until the JVM that Presto runs on has
sufficient memory set aside. So how do I know I am allocating sufficient
memory based on my settings?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query.max-total-memory-per-node&lt;/code&gt;&lt;/strong&gt; + &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;memory.heap-headroom-per-node&lt;/code&gt;&lt;/strong&gt; &amp;lt; 
 &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-Xmx&lt;/code&gt; setting (Java heap)&lt;/strong&gt;&lt;/p&gt;
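
&lt;p&gt;As a rough, hypothetical sketch only (the numbers are illustrative, not a
recommendation), a configuration satisfying this inequality on a 64GB node
might look like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# jvm.config (hypothetical)
-Xmx48G

# config.properties (hypothetical)
query.max-memory-per-node=24GB
query.max-total-memory-per-node=32GB

# heap headroom defaults to 30% of -Xmx, about 14.4GB here
# check: 32GB + 14.4GB = 46.4GB &amp;lt; 48GB heap
&lt;/code&gt;&lt;/pre&gt;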

&lt;p&gt;&lt;img src=&quot;/assets/episode/6/memory_pools.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Dain covers the proportions well and in detail in the recent training videos. 
Here’s a snippet of what he recommends.&lt;/p&gt;

&lt;iframe width=&quot;1058&quot; height=&quot;595&quot; src=&quot;https://www.youtube.com/embed/Pu80FkBRP-k?start=2569&amp;amp;end=2674&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; 
encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;
&lt;/iframe&gt;

&lt;p&gt;All in all, try to estimate the amount of memory needed by your max anticipated
query load, and if possible try to get even more than your estimate. Once Presto
is discovered by users, they will start to use it even more and demands on the
system will grow.&lt;/p&gt;

&lt;h2 id=&quot;events-news-and-various-links&quot;&gt;Events, news, and various links&lt;/h2&gt;
&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html&quot;&gt;https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&quot;&gt;https://towardsdatascience.com/statistics-in-spark-sql-explained-22ec389bf71b&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&quot;&gt;https://www.infoworld.com/article/3597971/on-premises-data-warehouses-are-dead.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&quot;&gt;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Dec 8 &lt;a href=&quot;https://www.meetup.com/Warsaw-Data-Engineering/events/274939817/&quot;&gt;https://www.meetup.com/Warsaw-Data-Engineering/events/274939817/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 9 &lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 10 &lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 10 &lt;a href=&quot;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 16 &lt;a href=&quot;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Presto Summit Series - Real world usage&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://trino.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Podcasts:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;Presto with Martin Traverso, Dain Sundstrom and David Phillips&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;Simplify Your Data Architecture With The Presto Distributed SQL Engine&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5980279&quot;&gt;How Open Source Presto Unlocks a Single Point of Access to Data&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/5656471&quot;&gt;The Data Access Struggle is Real&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2020/02/07/presto-with-justin-borgman/&quot;&gt;Presto with Justin Borgman&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://redhatx.buzzsprout.com/755519/3923864&quot;&gt;The infrastructure renaissance and how it will power the modernization of analytics platforms&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://softwareengineeringdaily.com/2018/05/24/ubers-data-platform-with-zhenxiao-luo/&quot;&gt;Uber’s Data Platform with Zhenxiao Luo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Presto yourself, check out the 
O’Reilly book Trino: The Definitive Guide. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Release 347</summary>

      
      
    </entry>
  
    <entry>
      <title>A Report about Presto Conference Tokyo 2020 Online</title>
      <link href="https://trino.io/blog/2020/11/21/a-report-about-presto-conference-tokyo-2020.html" rel="alternate" type="text/html" title="A Report about Presto Conference Tokyo 2020 Online" />
      <published>2020-11-21T00:00:00+00:00</published>
      <updated>2020-11-21T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/11/21/a-report-about-presto-conference-tokyo-2020</id>
<content type="html" xml:base="https://trino.io/blog/2020/11/21/a-report-about-presto-conference-tokyo-2020.html">&lt;p&gt;On Nov 11th, 2020, the Japan Presto Community held its second Presto Conference, 
welcoming Martin Traverso and Brian Olsen.
The conference was hosted on YouTube Live.
This article summarizes the conference, aiming to share the great talks.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;presto-community-updates&quot;&gt;Presto Community Updates&lt;/h1&gt;

&lt;p&gt;First, Martin introduced the most recent Presto updates, 
covering changes and enhancements achieved through community activity.
Attendees also learned about several new features that will be available soon.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Update / Merge (https://github.com/prestosql/presto/issues/3325)&lt;/li&gt;
  &lt;li&gt;Materialized Views (https://github.com/prestosql/presto/pull/3283)&lt;/li&gt;
  &lt;li&gt;Dynamically resolved functions&lt;/li&gt;
  &lt;li&gt;Optimized Parquet reader&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In addition, during the Q&amp;amp;A, he suggested that new developers who want to contribute to PrestoSQL 
check the “good first issue” tag on GitHub. The tag is a good first step for a newcomer to contribute. 
Ref. &lt;a href=&quot;https://github.com/prestosql/presto/labels/good%20first%20issue&quot;&gt;link&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/NxDBBEA67Ws&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h1 id=&quot;presto-community---how-to-get-involved&quot;&gt;Presto Community - How to get involved&lt;/h1&gt;

&lt;p&gt;To help attendees get familiar with the Presto community, Martin provided a guide to getting involved. 
He shared his team’s principles for the Presto community and talked about their education strategy for new Presto users.
I would like to quote the principles here.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;We are passionate about open source&lt;/li&gt;
  &lt;li&gt;We help others be successful with what we create&lt;/li&gt;
  &lt;li&gt;We create robust long-lasting software&lt;/li&gt;
  &lt;li&gt;We are egalitarian (nobody is more important than the other)&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;support-presto-as-a-feature-of-saas&quot;&gt;Support Presto as a feature of SaaS&lt;/h1&gt;

&lt;p&gt;Then, Satoru Kamikaseda, Technical Support Engineer at Treasure Data, provided an overview of how Treasure Data supports Presto in their service. 
Presto is heavily used to support many enterprise use cases for their customer data platform, 
and it is becoming the hub component processing high-throughput workloads from many kinds of clients, such as Spark, ODBC, and JDBC.&lt;/p&gt;

&lt;p&gt;He described statistics about Presto queries on their platform and how they support each case. 
In those stats, one third of the support work is investigating job failures and query results, one third is helping clients with their SQL, 
and the rest is notifications to clients and performance investigations. 
His talk should be useful for any SaaS company that provides a query engine to its clients, to learn how difficult it is to support a distributed query engine.&lt;/p&gt;

&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/GR6e3dfKKJ8w4c&quot; width=&quot;595&quot; height=&quot;485&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; &lt;/iframe&gt;
&lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/SatoruKamikaseda/support-presto-as-a-feature-of-saas&quot; title=&quot;Support Presto as a feature of SaaS&quot; target=&quot;_blank&quot;&gt;Support Presto as a feature of SaaS&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;https://www.slideshare.net/SatoruKamikaseda&quot; target=&quot;_blank&quot;&gt;SatoruKamikaseda&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;

&lt;h1 id=&quot;how-to-use-presto-with-aws-efficiently&quot;&gt;How to use Presto with AWS efficiently&lt;/h1&gt;

&lt;p&gt;We could learn how to use Presto with AWS, including Presto on EMR, Presto on EC2, Presto via Athena, and AWS Glue.
Noritaka Sekiyama, Sr. Big Data Architect at Amazon Web Services Japan, also shared a comparison of Presto on AWS (EC2, EMR, Athena). 
If you are new to Presto, his talk gives you insight into choosing your first Presto environment.&lt;/p&gt;

&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/kWzJ1XqR96A9di&quot; width=&quot;595&quot; height=&quot;485&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; &lt;/iframe&gt;
&lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/ssuserca76a5/aws-presto&quot; title=&quot;AWS で Presto を徹底的に使いこなすワザ&quot; target=&quot;_blank&quot;&gt;AWS で Presto を徹底的に使いこなすワザ&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;https://www.slideshare.net/ssuserca76a5&quot; target=&quot;_blank&quot;&gt;Noritaka Sekiyama&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;

&lt;h1 id=&quot;presto--line-2020&quot;&gt;Presto @ LINE 2020&lt;/h1&gt;

&lt;p&gt;LINE is the biggest company providing a mobile communication tool in Japan (think WhatsApp in Japan). Yuya Ebihara, one of the Presto maintainers, 
showed us how they have improved Presto on their platform since they presented at &lt;a href=&quot;/blog/2019/07/11/report-for-presto-conference-tokyo.html&quot;&gt;the previous conference&lt;/a&gt;. 
Their Presto usage has increased significantly since 2019: the number of Presto workers grew from 100 to 300, and daily queries rose from 20,000 to 50,000. 
We could learn how they upgraded Presto from 314 to 339 and how they resolved issues during the upgrade.&lt;/p&gt;

&lt;iframe src=&quot;https://docs.google.com/presentation/d/e/2PACX-1vS2QdQjhLsiSuVdWlEmT23ixqoZXkRrKKMRGa1hrZHg65OpcH18RpzARotOMYvIBSwP57lPPAHkUQOx/embed&quot; frameborder=&quot;0&quot; width=&quot;595&quot; height=&quot;485&quot; allowfullscreen=&quot;true&quot; mozallowfullscreen=&quot;true&quot; webkitallowfullscreen=&quot;true&quot;&gt;&lt;/iframe&gt;

&lt;h1 id=&quot;dive-into-amazon-athena---serverless-presto-2020&quot;&gt;Dive into Amazon Athena - Serverless Presto, 2020&lt;/h1&gt;

&lt;p&gt;Makoto Kawamura, Solution Architect at Amazon Web Services Japan, 
introduced the latest features of Amazon Athena along with performance tuning tips. It should be helpful for developers tied to AWS who want to explore Amazon Athena.&lt;/p&gt;

&lt;div style=&quot;width: 90%&quot;&gt;&lt;script async=&quot;&quot; class=&quot;speakerdeck-embed&quot; data-id=&quot;92a399aad5344df197279cd4195d9464&quot; data-ratio=&quot;1.77777777777778&quot; src=&quot;//speakerdeck.com/assets/embed.js&quot;&gt;&lt;/script&gt;&lt;/div&gt;

&lt;h1 id=&quot;presto-cassandra-connector-hack-at-repro&quot;&gt;Presto Cassandra Connector Hack at Repro&lt;/h1&gt;

&lt;p&gt;Repro provides a customer engagement platform that enables companies to personalize their communications with the right message at the right time, driving better retention and lifetime value. 
They use Presto as the segmentation backend in their service, building lists of audiences that match certain conditions.&lt;/p&gt;

&lt;p&gt;Takeshi Arabiki gave us an in-depth presentation on the modifications they made to the Presto Cassandra connector to stabilize and improve Presto’s performance, 
in addition to covering how Repro uses Presto.
His talk covers a wide range of topics, from investigating the bottleneck to resolving it.&lt;/p&gt;

&lt;script async=&quot;&quot; class=&quot;speakerdeck-embed&quot; data-id=&quot;9289d942805a4bf2be908cf42a122a29&quot; data-ratio=&quot;1.77777777777778&quot; src=&quot;//speakerdeck.com/assets/embed.js&quot;&gt;&lt;/script&gt;

&lt;h1 id=&quot;testing-distributed-query-engine-as-a-service&quot;&gt;Testing Distributed Query Engine as a Service&lt;/h1&gt;

&lt;p&gt;Finally, Naoki Takezoe from Treasure Data talked about their challenges with upgrading Presto and 
how hard it is to migrate a variety of workloads while keeping performance stable. 
In a production-scale environment running multiple clients, testing is one of the big challenges. 
He showed how they simulate their client workloads with a query simulator they developed, to cover various corner cases and to verify data correctness.&lt;/p&gt;

&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/yCrep8qbYUzNzh&quot; width=&quot;595&quot; height=&quot;485&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; &lt;/iframe&gt;
&lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/takezoe/testing-distributed-query-engine-as-a-service&quot; title=&quot;Testing Distributed Query Engine as a Service&quot; target=&quot;_blank&quot;&gt;Testing Distributed Query Engine as a Service&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;https://www.slideshare.net/takezoe&quot; target=&quot;_blank&quot;&gt;takezoe&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;

&lt;h1 id=&quot;wrap-up&quot;&gt;Wrap Up&lt;/h1&gt;

&lt;p&gt;This conference was the first online Presto conference in Tokyo. 
Unfortunately, we couldn’t discuss things with the community developers and creators face-to-face. We hope we’ll get such a great opportunity in the near future.
Even so, it was a great time, with many presentations from community members and a lot to learn from their wonderful experience.
During the conference, the average number of YouTube Live viewers was over 100, 
and total attendance was around 180 people. 
The previous conference had 89 attendees, so I think the number of Presto developers and users in Japan is gradually increasing. 
We really appreciate the developers and creators in the community. Thank you so much for coming to the conference, and see you next time!&lt;/p&gt;

&lt;h1 id=&quot;youtube-live-link&quot;&gt;Youtube Live link&lt;/h1&gt;

&lt;p&gt;The event was mainly conducted in Japanese.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://youtu.be/NxDBBEA67Ws&quot;&gt;Presto Conference Tokyo 2020 Online&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content>

      
        <author>
          <name>Toru Takahashi, Treasure Data</name>
        </author>
      

      <summary>On Nov 11th, 2020, the Japan Presto Community held the 2nd Presto Conference, welcoming Martin Traverso and Brian Olsen. The conference was hosted on YouTube Live. This article is a summary of the conference, aiming to share their great talks.</summary>

      
      
    </entry>
  
    <entry>
      <title>5: Hive Partitions, sync_partition_metadata, and Query Exceeded Max Columns!</title>
      <link href="https://trino.io/episodes/5.html" rel="alternate" type="text/html" title="5: Hive Partitions, sync_partition_metadata, and Query Exceeded Max Columns!" />
      <published>2020-11-19T00:00:00+00:00</published>
      <updated>2020-11-19T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/5</id>
      <content type="html" xml:base="https://trino.io/episodes/5.html">&lt;p&gt;In this week’s concept, Manfred discusses Hive Partitioning.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Concept from RDBMS systems implemented in HDFS&lt;/li&gt;
  &lt;li&gt;Normally just multiple files in a directory per table&lt;/li&gt;
  &lt;li&gt;Lots of different file formats, but always one directory&lt;/li&gt;
  &lt;li&gt;Partitioning creates nested directories&lt;/li&gt;
  &lt;li&gt;Needs to be set up at start of table creation&lt;/li&gt;
  &lt;li&gt;CTAS query&lt;/li&gt;
  &lt;li&gt;Uses WITH (partitioned_by = ARRAY[&apos;date&apos;])&lt;/li&gt;
  &lt;li&gt;Results in tablename/date=2020-11-19&lt;/li&gt;
  &lt;li&gt;Can also nest deeper with WITH (partitioned_by = ARRAY[&apos;date&apos;, &apos;countrycode&apos;])&lt;/li&gt;
  &lt;li&gt;Can greatly enhance performance&lt;/li&gt;
  &lt;li&gt;Optimizer can determine what directories to read based on field&lt;/li&gt;
  &lt;li&gt;Especially useful when fields are used in WHERE clauses&lt;/li&gt;
  &lt;li&gt;Also useful for historic data management over time, such as moving data out
to archive, deleting data, replacing data with aggregates, or just
  running compaction on subsets&lt;/li&gt;
  &lt;li&gt;Presto can use DELETE on partitions using DELETE FROM table WHERE date=value&lt;/li&gt;
  &lt;li&gt;Also possible to create empty partitions upfront with CALL system.create_empty_partition&lt;/li&gt;
&lt;/ul&gt;
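
&lt;p&gt;As a quick sketch of the CTAS form mentioned above, a partitioned table can also be created and populated in a single statement. The source table name here is only a hypothetical placeholder:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Hypothetical example: partition columns must come last in the SELECT list
CREATE TABLE minio.part.orders_by_date
WITH (
  format = &apos;ORC&apos;,
  partitioned_by = ARRAY[&apos;dt&apos;]
)
AS SELECT id, name, dt
FROM minio.part.source_orders;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;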

&lt;p&gt;See here for more details: &lt;a href=&quot;https://www.educba.com/partitioning-in-hive/&quot;&gt;https://www.educba.com/partitioning-in-hive/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This week’s pull request, &lt;a href=&quot;https://github.com/trinodb/trino/pull/223&quot;&gt;https://github.com/trinodb/trino/pull/223&lt;/a&gt;, 
came from contributor &lt;a href=&quot;https://github.com/luohao&quot;&gt;Hao Luo&lt;/a&gt;. This procedure
is similar to Hive’s &lt;a href=&quot;https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)&quot;&gt;MSCK REPAIR TABLE&lt;/a&gt;:
if it finds a Hive partition directory that exists in the filesystem but has
no partition entry in the metastore, it adds the entry to the
metastore. If there is an entry in the metastore but the partition was deleted
from the filesystem, it removes the metastore entry. You can find
more information about &lt;a href=&quot;https://trino.io/docs/current/connector/hive.html#procedures&quot;&gt;this procedure in the documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here are the commands and SQL I ran on Presto during the show:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SHOW&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CATALOGS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;SHOW&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SCHEMAS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SHOW&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TABLES&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;IN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SCHEMA&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;location&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;s3a://part/&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- Create a table with no partitions&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;no_part&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;ORC&apos;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;no_part&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-1&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-18&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-2&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-18&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-3&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-19&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-4&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-19&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-5&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-6&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;ORC&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;partitioned_by&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ARRAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;dt&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-1&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-18&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-2&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-18&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-3&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-19&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-4&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-19&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-5&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-6&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;no_part&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
 
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;DELETE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020-11-18&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;


&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- Make sure you are using the minio catalog (a renamed Hive catalog)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CALL&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;system&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sync_partition_metadata&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;part&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;orders&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;ADD&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CALL&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;system&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sync_partition_metadata&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;part&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;orders&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;DROP&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CALL&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;system&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sync_partition_metadata&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;part&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;orders&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;FULL&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

 &lt;span class=&quot;c1&quot;&gt;-- Create a table with multi partitions&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;multi_part&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;year&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;month&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;day&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;ORC&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;partitioned_by&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ARRAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;year&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;month&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;day&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;minio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;multi_part&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-1&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;18&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-2&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;18&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-3&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;19&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-4&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;19&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-5&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-6&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2020&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-7&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2019&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;18&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-8&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2019&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;01&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;18&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-9&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2019&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;19&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-10&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2019&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;01&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;19&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;11&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2019&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;11&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; 
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;12&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part-12&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;2019&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;01&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;20&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;We ran some queries against the metastore database. It’s a complicated model, so 
here is a database diagram showing the different tables and their relations in
the metastore.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/episode/5/hive_metastore_database_diagram.png&quot; alt=&quot;&quot; /&gt;
This diagram was generated by niftimusmaximus on 
&lt;a href=&quot;https://analyticsanvil.wordpress.com/2016/08/21/useful-queries-for-the-hive-metastore/&quot;&gt;The Analytics Anvil&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;MariaDB (metastore database)&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;USE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;metastore_db&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- show database&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DBS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- show tables given a database&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DBS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TBLS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NAME&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;part&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- show location and input format of the table given database/table names&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SD_ID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;INPUT_FORMAT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;LOCATION&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SERDE_ID&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DBS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TBLS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SDS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SD_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SD_ID&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TBL_NAME&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;orders&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NAME&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;part&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- show (de)serializer format of the table given database/table names&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SERDE_ID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NAME&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SLIB&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DBS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TBLS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SDS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SD_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SD_ID&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SERDES&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sd&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SERDE_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SERDE_ID&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TBL_NAME&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;orders&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NAME&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;part&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- show columns of the table given database/table names&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DBS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TBLS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SDS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SD_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SD_ID&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;COLUMNS_V2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CD_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CD_ID&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TBL_NAME&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;orders&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NAME&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;part&apos;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;by&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CD_ID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INTEGER_IDX&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- show partitions of the table given database/table names&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;LOCATION&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DBS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TBLS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_ID&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PARTITIONS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TBL_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TBL_ID&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SDS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SD_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SD_ID&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TBL_NAME&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;orders&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NAME&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;part&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
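
&lt;p&gt;Along the same lines, the partition key values behind each partition can be
listed as well. This is a sketch assuming the standard metastore schema
(&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PARTITION_KEY_VALS&lt;/code&gt;); exact table names can vary between Hive versions:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- show partition key values of the table given database/table names
SELECT p.PART_NAME, pkv.INTEGER_IDX, pkv.PART_KEY_VAL
FROM DBS d
 JOIN TBLS t ON d.DB_ID = t.DB_ID
 JOIN PARTITIONS p ON t.TBL_ID = p.TBL_ID
 JOIN PARTITION_KEY_VALS pkv ON p.PART_ID = pkv.PART_ID
WHERE t.TBL_NAME = &apos;orders&apos; AND d.NAME=&apos;part&apos;
ORDER BY p.PART_NAME, pkv.INTEGER_IDX;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;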

&lt;p&gt;In this week’s question, we answer:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;Why am I getting, “Query exceeded maximum columns. Please reduce the number 
of columns referenced and re-run the query.”?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;I’m running this query to check for duplicates. My table has approx. 650
columns and I get this error.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT *, COUNT(1) 
FROM tbl 
GROUP BY * 
HAVING COUNT(1) &amp;gt; 1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;and getting a stack trace like this:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;io.prestosql.spi.PrestoException: Compiler failed
	at io.prestosql.sql.planner.LocalExecutionPlanner$Visitor.visitScanFilterAndProject(LocalExecutionPlanner.java:1306)
	at io.prestosql.sql.planner.LocalExecutionPlanner$Visitor.visitProject(LocalExecutionPlanner.java:1185)
	at io.prestosql.sql.planner.LocalExecutionPlanner$Visitor.visitProject(LocalExecutionPlanner.java:705)
	at io.prestosql.sql.planner.plan.ProjectNode.accept(ProjectNode.java:82)
	at io.prestosql.sql.planner.LocalExecutionPlanner$Visitor.visitAggregation(LocalExecutionPlanner.java:1119)
	at io.prestosql.sql.planner.LocalExecutionPlanner$Visitor.visitAggregation(LocalExecutionPlanner.java:705)
	at io.prestosql.sql.planner.plan.AggregationNode.accept(AggregationNode.java:204)
	at io.prestosql.sql.planner.LocalExecutionPlanner.plan(LocalExecutionPlanner.java:461)
	at io.prestosql.sql.planner.LocalExecutionPlanner.plan(LocalExecutionPlanner.java:432)
	at io.prestosql.execution.SqlTaskExecutionFactory.create(SqlTaskExecutionFactory.java:75)
	at io.prestosql.execution.SqlTask.updateTask(SqlTask.java:382)
	at io.prestosql.execution.SqlTaskManager.updateTask(SqlTaskManager.java:383)
	at io.prestosql.server.TaskResource.createOrUpdateTask(TaskResource.java:128)
	at jdk.internal.reflect.GeneratedMethodAccessor480.invoke(Unknown Source)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The throwable that causes this error, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MethodTooLargeException&lt;/code&gt;, comes from the ASM
library (&lt;a href=&quot;https://asm.ow2.io/&quot;&gt;https://asm.ow2.io/&lt;/a&gt;), which throws it when asked to create a
method with more bytecode than the JVM specification allows.&lt;/p&gt;

&lt;p&gt;Presto generates code to handle the given query, and here the generated code
is too large. Since the amount of generated code is proportional to the number
of columns referenced, we rewrap the exception in something more meaningful to
the user.&lt;/p&gt;

&lt;p&gt;The general strategy is to reduce the number of columns that you reference.&lt;/p&gt;

&lt;p&gt;The trade-off is that removing columns removes information from the query. In
the duplicate-check example above, you won’t be able to discard false-positive
duplicate matches, but the narrower query may still be good enough to help
narrow the search space. As always, it depends…&lt;/p&gt;
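
&lt;p&gt;As a sketch of that strategy (the column names here are hypothetical),
grouping on a handful of likely-identifying columns keeps the generated
bytecode small while still narrowing the search for duplicates:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- group on a few likely-identifying columns instead of all ~650
SELECT id, created_at, customer_id, COUNT(1) AS copies
FROM tbl
GROUP BY id, created_at, customer_id
HAVING COUNT(1) &amp;gt; 1;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;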

&lt;p&gt;To learn more about the JVM limit (a single method’s bytecode is capped at
65535 bytes), search for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;code_length&lt;/code&gt; in the Java Virtual Machine
specification:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.7.3&quot;&gt;SE8&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.7.3&quot;&gt;SE11&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Special thanks to &lt;a href=&quot;https://github.com/hashhar&quot;&gt;Ashhar Hasan&lt;/a&gt; for asking this 
question and providing some useful context!&lt;/p&gt;

&lt;p&gt;Release Notes discussed:
&lt;a href=&quot;https://trino.io/docs/current/release/release-346.html&quot;&gt;https://trino.io/docs/current/release/release-346.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Manfred’s Training - SQL at any scale
&lt;a href=&quot;https://www.simpligility.com/2020/10/join-me-for-presto-first-steps/&quot;&gt;https://www.simpligility.com/2020/10/join-me-for-presto-first-steps/&lt;/a&gt;
&lt;a href=&quot;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&quot;&gt;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Blogs&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2020/05/presto-sql-for-newbies.html&quot;&gt;https://www.javahelps.com/2020/05/presto-sql-for-newbies.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2020/04/setup-presto-sql-development-environment.html&quot;&gt;https://www.javahelps.com/2020/04/setup-presto-sql-development-environment.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2019/11/presto-sql-types-of-joins.html&quot;&gt;https://www.javahelps.com/2019/11/presto-sql-types-of-joins.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&quot;&gt;https://www.javahelps.com/2019/11/presto-sql-join-algorithms.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medium.com/analytics-vidhya/deploying-starburst-enterprise-presto-on-googles-kubernetes-engine-with-storage-and-postgres-72483b10ab62&quot;&gt;https://medium.com/analytics-vidhya/deploying-starburst-enterprise-presto-on-googles-kubernetes-engine-with-storage-and-postgres-72483b10ab62&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Nov 19 Presto Tokyo Conference - Japanese &lt;a href=&quot;https://techplay.jp/event/795265&quot;&gt;https://techplay.jp/event/795265&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Nov 24 EMEA - Polish &lt;a href=&quot;https://www.meetup.com/Warsaw-Data-Engineering/events/274666392/&quot;&gt;https://www.meetup.com/Warsaw-Data-Engineering/events/274666392/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 2 &lt;a href=&quot;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 3 EMEA &lt;a href=&quot;https://www.starburstdata.com/introduction-to-presto/&quot;&gt;https://www.starburstdata.com/introduction-to-presto/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 9 &lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 10 &lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 10 &lt;a href=&quot;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 16 &lt;a href=&quot;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Presto Summit Series - Real world usage&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://trino.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Recent Podcasts:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;https://www.contributor.fyi/presto&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Presto yourself, you should check out the 
O’Reilly book Trino: The Definitive Guide. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>In this week’s concept, Manfred discusses Hive Partitioning. Concept from RDBMS systems implemented in HDFS Normally just multiple files in a directory per table Lots of different file formats, but always one directory Partitioning creates nested directories Needs to be set up at start of table creation CTAS query Uses WITH ( partitioned_by = ARRAY[‘date’]) Results in tablename/date=2020-11-19 Can also nest deeper WITH ( partitioned_by = ARRAY[‘date’, ‘countrycode’]) Can greatly enhance performance Optimizer can determine what directories to read based on field Especially useful when fields are used in WHERE clauses Also useful for historic data management over time such as moving data out to archive, deleting data, or replacing data with aggregates, or even just running compaction on subsets Presto can use DELETE on partitions using DELETE FROM table WHERE date=value Also possible to create empty partitions upfront CALL system.create_empty_partition See here for more details: https://www.educba.com/partitioning-in-hive/</summary>

      
      
    </entry>
  
    <entry>
      <title>4: Presto on ACID, row-level INSERT/DELETE, and why JDK11?</title>
      <link href="https://trino.io/episodes/4.html" rel="alternate" type="text/html" title="4: Presto on ACID, row-level INSERT/DELETE, and why JDK11?" />
      <published>2020-11-04T00:00:00+00:00</published>
      <updated>2020-11-04T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/4</id>
      <content type="html" xml:base="https://trino.io/episodes/4.html">&lt;p&gt;In this week’s concept, Manfred discusses ACID in general, CAP theorem, 
HDFS and Hive before ACID, and now ORC ACID and similar support.&lt;/p&gt;

&lt;p&gt;ACID &lt;a href=&quot;https://en.wikipedia.org/wiki/ACID&quot;&gt;https://en.wikipedia.org/wiki/ACID&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Atomicity - a transaction completely succeeds or completely fails; there
 are no partial results, so no inconsistent relationships are left dangling.
 The database remains in a consistent state.&lt;/li&gt;
  &lt;li&gt;Consistency - database content always adheres to defined rules (key
 constraints).&lt;/li&gt;
  &lt;li&gt;Isolation - transactions are isolated from each other and can run in
  parallel with the same result as if they ran sequentially.&lt;/li&gt;
  &lt;li&gt;Durability - no data is lost after transaction completion.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ACID used to be a crucial criterion for a “serious” relational database system.&lt;/p&gt;

&lt;p&gt;Then came big data and the CAP theorem. &lt;a href=&quot;https://en.wikipedia.org/wiki/CAP_theorem&quot;&gt;https://en.wikipedia.org/wiki/CAP_theorem&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Consistency&lt;/li&gt;
  &lt;li&gt;Availability&lt;/li&gt;
  &lt;li&gt;Partition tolerance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This week’s pull request, &lt;a href=&quot;https://github.com/trinodb/trino/pull/5402&quot;&gt;https://github.com/trinodb/trino/pull/5402&lt;/a&gt;,
comes from contributor &lt;a href=&quot;https://github.com/djsstarburst&quot;&gt;David Stryker&lt;/a&gt;. David
covers some interesting aspects of working on this pull request. The commit
adds support for row-level INSERT and DELETE for Hive ACID tables, along with
product tests that verify row-level INSERT and DELETE work where they are allowed.&lt;/p&gt;

&lt;p&gt;Here is the SQL that we ran in the INSERT/DELETE demo:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/*
  Ran against Presto
*/
SHOW SCHEMAS IN minio;
SHOW TABLES IN minio.acid;

CREATE SCHEMA minio.acid
WITH (location = &apos;s3a://acid/&apos;);


CREATE TABLE minio.acid.test (a int, b int)
WITH (
   format=&apos;ORC&apos;,
   transactional=true
);

INSERT INTO minio.acid.test VALUES (10, 10), (20, 20);

SELECT * FROM minio.acid.test;

DELETE FROM minio.acid.test WHERE a = 10;

/*
  Ran against Hive
*/

SHOW DATABASES;

SELECT * FROM acid.test;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;David also mentioned &lt;a href=&quot;http://shzhangji.com/blog/2019/06/10/understanding-hive-acid-transactional-table/&quot;&gt;this blog&lt;/a&gt;
to better understand the Hive ACID model.&lt;/p&gt;

&lt;p&gt;In this week’s question we answer, “Why is Java 11 needed in the newer
versions of Presto, and how do I get an older version of Presto? I need
release 328, the latest on Java 8, since Java 11 isn’t available for me to use.”&lt;/p&gt;

&lt;p&gt;Presto uses Java 11 because it is the next LTS version of Java after 8.
Java 11 provides significant performance and stability improvements, so we
believe everyone should be running that version to get the best experience out
of Presto. Moving to Java 11 also allows us to take advantage of the many
improvements to the JDK and the Java language introduced since Java 8.&lt;/p&gt;

&lt;p&gt;For older versions, you can download the server from Maven Central and read the matching older documentation:
&lt;a href=&quot;https://repo.maven.apache.org/maven2/io/prestosql/presto-server/&quot;&gt;https://repo.maven.apache.org/maven2/io/prestosql/presto-server/&lt;/a&gt;
&lt;a href=&quot;https://trino.io/docs/328/&quot;&gt;https://trino.io/docs/328/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One thing to point out is that only the server requires JDK 11; the client
can run on JDK 8. One reason you might be forced to run Presto on JDK 8 is that
the server shares a machine with another service that runs JDK 8. We do not
recommend that setup, as it degrades the performance of your cluster and can
cause other issues when Presto is fighting for resources.&lt;/p&gt;

&lt;p&gt;Another possibility is that a company policy requires specific JDKs to be
installed on all servers. You can have side-by-side installs of multiple JDK
versions and use the appropriate one; you just need to launch Presto with the
correct java command. If your company is against using a newer JDK, you can
point to the arguments above to get the policy updated to at least include
JDK 11.&lt;/p&gt;
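
&lt;p&gt;As a minimal sketch of such a side-by-side setup (the install paths below are
illustrative and vary by environment), you can point the Presto launcher at a
JDK 11 install while the system default stays on JDK 8:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# the system default java can stay on JDK 8
java -version

# point Presto at a JDK 11 install before starting the launcher
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk
export PATH=$JAVA_HOME/bin:$PATH
/opt/presto/bin/launcher start
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;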

&lt;p&gt;Release Notes discussed:
&lt;a href=&quot;https://trino.io/docs/current/release/release-345.html&quot;&gt;https://trino.io/docs/current/release/release-345.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Manfred’s Training - SQL at any scale
&lt;a href=&quot;https://www.simpligility.com/2020/10/join-me-for-presto-first-steps/&quot;&gt;https://www.simpligility.com/2020/10/join-me-for-presto-first-steps/&lt;/a&gt;
&lt;a href=&quot;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&quot;&gt;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Blogs&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-creating-a-single-point-of-access-to-multiple-postgres-servers-using-starburst-presto&quot;&gt;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-creating-a-single-point-of-access-to-multiple-postgres-servers-using-starburst-presto&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-unlock-data-in-postgres-servers-to-query-it-with-other-data-sources-like-hive-kafka-other-dbmss-and-more&quot;&gt;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-unlock-data-in-postgres-servers-to-query-it-with-other-data-sources-like-hive-kafka-other-dbmss-and-more&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://blog.bigdataboutique.com/2020/09/presto-meets-elasticsearch-our-elasticsearch-connector-for-presto-video-mbywtm&quot;&gt;https://blog.bigdataboutique.com/2020/09/presto-meets-elasticsearch-our-elasticsearch-connector-for-presto-video-mbywtm&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Nov 12 Webinar: &lt;a href=&quot;https://www.starburstdata.com/webinar-lower-cdw-costs-starburst&quot;&gt;https://www.starburstdata.com/webinar-lower-cdw-costs-starburst&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Nov 17 &lt;a href=&quot;https://databricks.com/session_eu20/presto-fast-sql-on-anything-including-delta-lake-snowflake-elasticsearch-and-more&quot;&gt;https://databricks.com/session_eu20/presto-fast-sql-on-anything-including-delta-lake-snowflake-elasticsearch-and-more&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Nov 19 &lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-detroit-mi/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-detroit-mi/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 2 &lt;a href=&quot;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 9 &lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 10 &lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 10 &lt;a href=&quot;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Dec 16 &lt;a href=&quot;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin (now with timestamps!):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Presto Summit Series - Real world usage&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://trino.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Recent Podcasts:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;https://www.contributor.fyi/presto&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Presto yourself, you should check out the 
O’Reilly book Trino: The Definitive Guide. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>In this week’s concept, Manfred discusses ACID in general, CAP theorem, HDFS and Hive before ACID, and now ORC ACID and similar support.</summary>

      
      
    </entry>
  
    <entry>
      <title>3: Running two Presto distributions and Kafka headers as Presto columns</title>
      <link href="https://trino.io/episodes/3.html" rel="alternate" type="text/html" title="3: Running two Presto distributions and Kafka headers as Presto columns" />
      <published>2020-10-22T00:00:00+00:00</published>
      <updated>2020-10-22T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/3</id>
      <content type="html" xml:base="https://trino.io/episodes/3.html">&lt;p&gt;In this week’s concept, Manfred discusses what an SPI (service provider
interface) is, and covers the connector architecture for Presto connectors,
Starburst connectors, and custom connectors.&lt;/p&gt;

&lt;p&gt;This week’s pull request, &lt;a href=&quot;https://github.com/trinodb/trino/pull/4462&quot;&gt;https://github.com/trinodb/trino/pull/4462&lt;/a&gt;,
comes from user &lt;a href=&quot;https://github.com/0xE282B0&quot;&gt;Sven Pfennig&lt;/a&gt;. Sven works for
&lt;a href=&quot;https://syncier.com&quot;&gt;Syncier GmbH&lt;/a&gt;, and as part of his role there he gets to contribute
to open source projects such as Presto. Thanks Sven! We jump into a quick setup
of a Kafka broker using the
&lt;a href=&quot;https://kafka.apache.org/quickstart&quot;&gt;Kafka quickstart tutorial&lt;/a&gt;, and I use the
&lt;a href=&quot;https://github.com/edenhill/kafkacat&quot;&gt;kafkacat tool&lt;/a&gt; to show off the Kafka header
support that Sven has provided us and discuss why this is beneficial.&lt;/p&gt;

&lt;p&gt;Here’s the crazy SELECT statement I used to decode the binary values to UTF-8
text of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;foo&lt;/code&gt; column:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT 
   _message, 
   reduce(element_at(_headers,&apos;foo&apos;), &apos;&apos;, (s, c) -&amp;gt; s || from_utf8(c), s -&amp;gt; s) AS foo 
FROM kafka.default.pcb 
WHERE contains(map_keys(_headers), &apos;foo&apos;);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;An alternative tutorial that uses the TPC dataset is available on the website:
&lt;a href=&quot;https://trino.io/docs/current/connector/kafka-tutorial.html&quot;&gt;https://trino.io/docs/current/connector/kafka-tutorial.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This week’s question was accidentally cut off, as I had mapped my Shift + R key
to toggle streaming/recording, which cut the broadcast when I typed the R in
FROM.&lt;/p&gt;

&lt;p&gt;Release Notes discussed:
&lt;a href=&quot;https://trino.io/docs/current/release/release-344.html&quot;&gt;https://trino.io/docs/current/release/release-344.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Manfred’s Training - SQL at any scale
&lt;a href=&quot;https://www.simpligility.com/2020/10/join-me-for-presto-first-steps/&quot;&gt;https://www.simpligility.com/2020/10/join-me-for-presto-first-steps/&lt;/a&gt;
&lt;a href=&quot;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&quot;&gt;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Blogs&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-creating-a-single-point-of-access-to-multiple-postgres-servers-using-starburst-presto&quot;&gt;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-creating-a-single-point-of-access-to-multiple-postgres-servers-using-starburst-presto&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-unlock-data-in-postgres-servers-to-query-it-with-other-data-sources-like-hive-kafka-other-dbmss-and-more&quot;&gt;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-unlock-data-in-postgres-servers-to-query-it-with-other-data-sources-like-hive-kafka-other-dbmss-and-more&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://blog.bigdataboutique.com/2020/09/presto-meets-elasticsearch-our-elasticsearch-connector-for-presto-video-mbywtm&quot;&gt;https://blog.bigdataboutique.com/2020/09/presto-meets-elasticsearch-our-elasticsearch-connector-for-presto-video-mbywtm&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/BigDataATL/events/273435961/&quot;&gt;https://www.meetup.com/BigDataATL/events/273435961/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://edw2020chicago.dataversity.net/&quot;&gt;https://edw2020chicago.dataversity.net/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-portland-or-2/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-portland-or-2/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-minneapolis-mn-2/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-minneapolis-mn-2/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-jacksonville-fl/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-jacksonville-fl/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-detroit-mi/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-detroit-mi/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin:
&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Presto Summit Series - Real world usage
&lt;a href=&quot;https://trino.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://trino.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Recent Podcasts:
&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;https://www.contributor.fyi/presto&lt;/a&gt;
&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want to learn more about Presto yourself, you should check out the 
O’Reilly book Trino: The Definitive Guide. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>In this week’s concept, Manfred discusses what an SPI (service provider interface) is and covers the connector architecture of Presto, Starburst, and Custom.</summary>

      
      
    </entry>
  
    <entry>
      <title>Announcing Presto Conference Tokyo 2020</title>
      <link href="https://trino.io/blog/2020/10/21/announcing-presto-conference-tokyo-2020.html" rel="alternate" type="text/html" title="Announcing Presto Conference Tokyo 2020" />
      <published>2020-10-21T00:00:00+00:00</published>
      <updated>2020-10-21T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/10/21/announcing-presto-conference-tokyo-2020</id>
      <content type="html" xml:base="https://trino.io/blog/2020/10/21/announcing-presto-conference-tokyo-2020.html">&lt;p&gt;Last year, &lt;a href=&quot;/blog/2019/07/11/report-for-presto-conference-tokyo.html&quot;&gt;Presto Conference Tokyo 2019&lt;/a&gt; 
was held in Japan with Martin Traverso, Dain Sundstrom and David Phillips, 
the founders of the Presto Software Foundation.&lt;/p&gt;

&lt;p&gt;This year, the event is online-only. Presto Conference 
Tokyo 2020 is happening on the 20th of November. 
You can &lt;a href=&quot;https://techplay.jp/event/795265&quot;&gt;find out details and register right now&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;The event includes six sessions from Treasure Data, Amazon Web Services 
Japan, Repro and LINE, as well as open sessions with Martin and Brian Olsen, 
a Developer Advocate at Starburst Data.
This is a valuable opportunity to hear from engineers who are actually using 
Presto. It has something for those who are using Presto for data engineering
and those who don’t use Presto yet but are interested in it.&lt;/p&gt;

&lt;!--more--&gt;</content>

      
        <author>
          <name>Yuya Ebihara, LINE</name>
        </author>
      

      <summary>Last year, Presto Conference Tokyo 2019 was held in Japan with Martin Traverso, Dain Sundstrom and David Phillips, the founders of the Presto Software Foundation. This year, the event is online-only. Presto Conference Tokyo 2020 is happening on the 20th of November. You can find out details and register right now! The event includes six sessions from Treasure Data, Amazon Web Services Japan, Repro and LINE, as well as open sessions with Martin and Brian Olsen, a Developer Advocate at Starburst Data. This is a valuable opportunity to hear from engineers who are actually using Presto. It has something for those who are using Presto for data engineering and those who don’t use Presto yet but are interested in it.</summary>

      
      
    </entry>
  
    <entry>
      <title>A gentle introduction to the Hive connector</title>
      <link href="https://trino.io/blog/2020/10/20/intro-to-hive-connector.html" rel="alternate" type="text/html" title="A gentle introduction to the Hive connector" />
      <published>2020-10-20T00:00:00+00:00</published>
      <updated>2020-10-20T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/10/20/intro-to-hive-connector</id>
      <content type="html" xml:base="https://trino.io/blog/2020/10/20/intro-to-hive-connector.html">&lt;p&gt;TL;DR: The Hive connector is what you use in Trino for reading data from object
storage that is organized according to the rules laid out by Hive, without using
the Hive runtime code.&lt;/p&gt;

&lt;p&gt;One of the most confusing aspects when starting Trino is the Hive connector. 
Typically, you seek out the use of Trino when you experience an intensely slow
query turnaround from your existing Hadoop, Spark, or Hive infrastructure. In
fact, the genesis of Trino, formerly known as Presto, came about due to these 
slow Hive query conditions at Facebook back in 2012.&lt;/p&gt;

&lt;p&gt;So when you learn that Trino has a Hive connector,
it can be rather confusing since you moved to Trino to circumvent the slowness
of your current Hive cluster. Another common source of confusion is when you
want to query your data from your cloud object storage, such as AWS S3, MinIO, 
and Google Cloud Storage. This too uses the Hive connector. If that 
confuses you, don’t worry, you are not alone. This blog aims to explain this
commonly confusing nomenclature.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;hive-architecture&quot;&gt;Hive architecture&lt;/h1&gt;

&lt;p&gt;To understand the origins and inner workings of Trino’s Hive connector, you
first need to know a few high level components of the Hive architecture.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/intro-to-hive-connector/hive.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You can simplify the Hive architecture to four components:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The runtime&lt;/em&gt; contains the logic of the query engine that translates the
SQL-like Hive Query Language (HQL) into MapReduce jobs that run over files stored 
in the filesystem.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The storage&lt;/em&gt; component is simply that: it stores files in various formats,
along with index structures to recall those files. The file formats range from
simple ones such as JSON and CSV to columnar formats like ORC
and Parquet. Traditionally, Hive runs on top of the Hadoop Distributed
Filesystem (HDFS). As cloud-based options became more prevalent, object storage
services like Amazon S3, Azure Blob Storage, Google Cloud Storage, and others
increasingly replaced HDFS as the storage component.&lt;/p&gt;

&lt;p&gt;In order for Hive to process these files, it must have a mapping
from SQL tables in &lt;em&gt;the runtime&lt;/em&gt; to files and directories in &lt;em&gt;the storage&lt;/em&gt;
component. To accomplish this, Hive uses the Hive Metastore Service (HMS), 
often shortened to &lt;em&gt;the metastore&lt;/em&gt;, to manage metadata about the files, such
as table columns, file locations, and file formats.&lt;/p&gt;
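
&lt;p&gt;To make that mapping concrete, here is a sketch of a Hive table declared with an
explicit storage location and file format; the metastore records exactly this kind
of information. The table and bucket names are made up for illustration:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- The metastore tracks the columns, the file format, and the
-- storage location for this table
CREATE EXTERNAL TABLE orders (
    orderkey BIGINT,
    orderdate DATE
)
STORED AS ORC
LOCATION &apos;s3a://example-bucket/orders/&apos;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;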

&lt;p&gt;The last component not included in the image is Hive’s &lt;em&gt;data organization
specification&lt;/em&gt;. This specification is documented only in the Hive code itself,
and has been reverse engineered by other systems, like Trino, that need to
remain compatible with it.&lt;/p&gt;

&lt;p&gt;Trino reuses all of these components except for &lt;em&gt;the runtime&lt;/em&gt;. This is the same
approach most compute engines, such as Spark, Drill, and Impala, take when
dealing with data in object stores. When you think of the Hive
connector, you should think of a connector that is capable of reading data
organized by the unwritten Hive specification.&lt;/p&gt;

&lt;h3 id=&quot;trino-runtime-replaces-hive-runtime&quot;&gt;Trino runtime replaces Hive runtime&lt;/h3&gt;

&lt;p&gt;In the early days of big data systems, many expected query turnaround to take a 
long time due to the high volume of unstructured data in ETL workloads. The
primary goal in early iterations of these systems was simply throughput over
large volumes of data while maintaining fault-tolerance. Now, more businesses
want to run fast interactive queries over their big data instead of running jobs
that take hours and produce possibly undesirable results. Many companies have
petabytes of data and metadata in their data warehouse. Data in storage is
cumbersome to move and the data in the metastore takes a long time to repopulate
in other formats. Since only the runtime that executes Hive queries needs
replacement, Trino utilizes the existing metastore metadata and the
files residing in storage, and the Trino runtime simply takes over the role the
Hive runtime played in analyzing the data.&lt;/p&gt;

&lt;h1 id=&quot;trino-architecture&quot;&gt;Trino Architecture&lt;/h1&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/intro-to-hive-connector/trino.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;the-hive-connector-nomenclature&quot;&gt;The Hive connector nomenclature&lt;/h3&gt;

&lt;p&gt;Notice that the only change in the Trino architecture is &lt;em&gt;the runtime&lt;/em&gt;. The
HMS still exists, along with &lt;em&gt;the storage&lt;/em&gt;. This is not by accident. This design
addresses a common problem faced by many companies: it simplifies the
migration from Hive to Trino. Regardless of &lt;em&gt;the storage&lt;/em&gt; component
used, &lt;em&gt;the runtime&lt;/em&gt; makes use of the HMS, and that is why this connector is
called the Hive connector.&lt;/p&gt;

&lt;p&gt;Where the confusion tends to come from is when you search for a connector
from the context of the storage system you want to query. You may not even be 
aware that &lt;em&gt;the metastore&lt;/em&gt; is a necessity, or that it exists at all. Typically, you
look for an S3 connector, a GCS connector, or a MinIO connector. All you need is
the Hive connector and the HMS to manage the metadata of the objects in your storage.&lt;/p&gt;
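
&lt;p&gt;As a sketch, a catalog properties file wiring the Hive connector to an HMS and
S3-compatible storage looks something like the following. The host names and
credentials here are placeholders, and the exact connector name and property set
depend on your version:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;connector.name=hive-hadoop2
hive.metastore.uri=thrift://hive-metastore:9083
hive.s3.endpoint=http://minio:9000
hive.s3.path-style-access=true
hive.s3.aws-access-key=minio
hive.s3.aws-secret-key=minio123
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;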

&lt;h3 id=&quot;the-hive-metastore-service&quot;&gt;The Hive Metastore Service&lt;/h3&gt;

&lt;p&gt;The HMS is the only Hive process used in the entire Trino ecosystem when using
the Hive connector. The HMS is actually a simple service with a binary API using
&lt;a href=&quot;https://thrift.apache.org/&quot;&gt;the Thrift protocol&lt;/a&gt;. This service makes updates to
the metadata, which is stored in an RDBMS such as PostgreSQL, MySQL, or MariaDB. There
are also compatible replacements for the HMS, such as AWS Glue, that act as
drop-in substitutes.&lt;/p&gt;

&lt;h3 id=&quot;getting-started-with-the-hive-connector-on-trino&quot;&gt;Getting started with the Hive Connector on Trino&lt;/h3&gt;

&lt;p&gt;To drive this point home, I created a tutorial that showcases using Trino and
looking at the metadata it produces. In the following scenario, the Docker 
environment contains four containers:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;trino&lt;/code&gt; - &lt;em&gt;the runtime&lt;/em&gt; in this scenario that replaces Hive.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;minio&lt;/code&gt; - &lt;em&gt;the storage&lt;/em&gt; is an open-source cloud object storage.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive-metastore&lt;/code&gt; -  &lt;em&gt;the metastore&lt;/em&gt; service instance.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mariadb&lt;/code&gt; - the database that &lt;em&gt;the metastore&lt;/em&gt; uses to store the metadata.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can play around with the system and optionally view the configurations. The
scenario asks you to run a query to populate data in MinIO and then see the
resulting metadata populated in MariaDB by the HMS. The next step asks you to
run queries over the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mariadb&lt;/code&gt; database, which holds the generated
metadata from &lt;em&gt;the metastore&lt;/em&gt;.&lt;/p&gt;
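
&lt;p&gt;If you want to poke at that metadata directly, a query along these lines against
the metastore database lists each table together with its storage location. The
database name is whatever your HMS was configured with; the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TBLS&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SDS&lt;/code&gt; tables are part of the
standard HMS schema:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- TBLS holds one row per table; SDS holds the storage descriptor,
-- including the file location and input format
SELECT t.TBL_NAME, s.LOCATION, s.INPUT_FORMAT
FROM TBLS t
JOIN SDS s ON t.SD_ID = s.SD_ID;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;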

&lt;p&gt;If you have any questions or run into any issues with the example, you can find
us on &lt;a href=&quot;/slack.html&quot;&gt;slack&lt;/a&gt; on the #dev or #general channels.&lt;/p&gt;

&lt;p&gt;Have fun!&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/bitsondatadev/trino-getting-started/tree/main/hive/trino-minio&quot; target=&quot;_blank&quot;&gt;
&lt;img src=&quot;/assets/blog/intro-to-hive-connector/intro-to-hive.jpeg&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>TL;DR: The Hive connector is what you use in Trino for reading data from object storage that is organized according to the rules laid out by Hive, without using the Hive runtime code. One of the most confusing aspects when starting Trino is the Hive connector. Typically, you seek out the use of Trino when you experience an intensely slow query turnaround from your existing Hadoop, Spark, or Hive infrastructure. In fact, the genesis of Trino, formerly known as Presto, came about due to these slow Hive query conditions at Facebook back in 2012. So when you learn that Trino has a Hive connector, it can be rather confusing since you moved to Trino to circumvent the slowness of your current Hive cluster. Another common source of confusion is when you want to query your data from your cloud object storage, such as AWS S3, MinIO, and Google Cloud Storage. This too uses the Hive connector. If that confuses you, don’t worry, you are not alone. This blog aims to explain this commonly confusing nomenclature.</summary>

      
      
    </entry>
  
    <entry>
      <title>2: Kubernetes, arrays on Elasticsearch, and security breaks the UI</title>
      <link href="https://trino.io/episodes/2.html" rel="alternate" type="text/html" title="2: Kubernetes, arrays on Elasticsearch, and security breaks the UI" />
      <published>2020-10-07T00:00:00+00:00</published>
      <updated>2020-10-07T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/2</id>
      <content type="html" xml:base="https://trino.io/episodes/2.html">&lt;p&gt;This week we had a bit of a technical issue between Zoom and OBS, so some 
editing was done to remove a portion of the broadcast, which mainly cuts out us 
covering the releases. We circle back and give a small summary, but unfortunately
we lost the majority of that part of the conversation.&lt;/p&gt;

&lt;p&gt;In this week’s concept, we cover a general overview of Kubernetes and how
Kubernetes is used when deploying and scaling up Presto. We
also dive into how this is being used at our guest Cory Darby’s company,
BlueCat.&lt;/p&gt;

&lt;p&gt;This week’s pull request segment covers
&lt;a href=&quot;https://github.com/trinodb/trino/pull/2462&quot;&gt;https://github.com/trinodb/trino/pull/2462&lt;/a&gt;, which closes ticket
&lt;a href=&quot;https://github.com/trinodb/trino/issues/2441&quot;&gt;https://github.com/trinodb/trino/issues/2441&lt;/a&gt;. This was actually a PR Brian
submitted some months ago. He dives a bit into 
&lt;a href=&quot;https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html&quot;&gt;Elasticsearch mappings&lt;/a&gt; 
and how Elasticsearch models its data. He then covers how this motivated the 
pull request, which addresses the need for explicit mappings declaring which 
Elasticsearch fields are array types versus scalars.&lt;/p&gt;

&lt;p&gt;In this week’s question, we answer, “Why does the web UI say ‘disabled’?” This 
typically comes from a security setup issue, and there’s another similar issue
when you are using a proxy that we cover as a bonus.&lt;/p&gt;

&lt;p&gt;Release Notes discussed:
&lt;a href=&quot;https://trino.io/docs/current/release/release-342.html&quot;&gt;https://trino.io/docs/current/release/release-342.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/docs/current/release/release-343.html&quot;&gt;https://trino.io/docs/current/release/release-343.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Manfred’s Training - SQL at any scale
&lt;a href=&quot;https://www.simpligility.com/2020/10/join-me-for-presto-first-steps/&quot;&gt;https://www.simpligility.com/2020/10/join-me-for-presto-first-steps/&lt;/a&gt;
&lt;a href=&quot;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&quot;&gt;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Blogs&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-creating-a-single-point-of-access-to-multiple-postgres-servers-using-starburst-presto&quot;&gt;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-creating-a-single-point-of-access-to-multiple-postgres-servers-using-starburst-presto&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-unlock-data-in-postgres-servers-to-query-it-with-other-data-sources-like-hive-kafka-other-dbmss-and-more&quot;&gt;https://postgresconf.org/conferences/postgres-webinar-series/program/proposals/live-demo-unlock-data-in-postgres-servers-to-query-it-with-other-data-sources-like-hive-kafka-other-dbmss-and-more&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://medium.com/@joshua_robinson/presto-and-fast-object-putting-backups-to-use-for-devops-and-machine-learning-s3-46876eef4ffa&quot;&gt;https://medium.com/@joshua_robinson/presto-and-fast-object-putting-backups-to-use-for-devops-and-machine-learning-s3-46876eef4ffa&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/BigDataATL/events/273435961/&quot;&gt;https://www.meetup.com/BigDataATL/events/273435961/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://edw2020chicago.dataversity.net/&quot;&gt;https://edw2020chicago.dataversity.net/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-portland-or-2/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-portland-or-2/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-minneapolis-mn-2/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-minneapolis-mn-2/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-jacksonville-fl/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-jacksonville-fl/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-detroit-mi/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-detroit-mi/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin:
&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Presto Summit Series - Real world usage
&lt;a href=&quot;https://trino.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://trino.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;
&lt;a href=&quot;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Recent Podcasts:
&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;https://www.contributor.fyi/presto&lt;/a&gt;
&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want to learn more about Presto yourself, you should check out the 
O’Reilly book Trino: The Definitive Guide. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>This week we had a bit of a technical issue between Zoom and OBS, so some editing was done to remove a portion of the broadcast, which mainly cuts out us covering the releases. We circle back and give a small summary, but unfortunately we lost the majority of that part of the conversation.</summary>

      
      
    </entry>
  
    <entry>
      <title>Launching Presto First Steps training</title>
      <link href="https://trino.io/blog/2020/10/07/presto-first-steps.html" rel="alternate" type="text/html" title="Launching Presto First Steps training" />
      <published>2020-10-07T00:00:00+00:00</published>
      <updated>2020-10-07T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/10/07/presto-first-steps</id>
      <content type="html" xml:base="https://trino.io/blog/2020/10/07/presto-first-steps.html">&lt;p&gt;Writing the book &lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;Trino: The Definitive
Guide&lt;/a&gt; with Matt and Martin earlier this
year, and then publishing it with &lt;a href=&quot;https://www.oreilly.com/&quot;&gt;O’Reilly&lt;/a&gt; was a
great experience and has been a great success. Lots of readers took advantage of
getting a &lt;a href=&quot;/blog/2020/04/11/the-definitive-guide.html&quot;&gt;free digital copy of the book from Starburst&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Now it is time to follow up with a training class. I am pleased to let you know
that you can join me for three hours of
&lt;a href=&quot;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&quot;&gt;Presto First Steps&lt;/a&gt;
in November.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;The new course is aimed at beginners with Presto who want to accelerate their
initial understanding and adoption. You quickly ramp up to install and configure
Presto, use the CLI, and learn how to query connected data sources with SQL. The
class is completely interactive, and I look forward to many of you joining me
and bringing lots of great questions.&lt;/p&gt;

&lt;p&gt;The class includes three interactive training exercises on
&lt;a href=&quot;https://katacoda.com/&quot;&gt;Katacoda&lt;/a&gt;. They allow you to get hands-on experience
with Presto immediately. Lots of useful tips and tricks are covered in my
material, and of course I plan to run a bunch of additional demos. You can find
more details about the content of the class on &lt;a href=&quot;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&quot;&gt;the registration
page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Don’t miss out and make sure you &lt;a href=&quot;https://learning.oreilly.com/live-training/courses/presto-first-steps/0636920462859/&quot;&gt;reserve your ticket
now&lt;/a&gt;!&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Writing the book Trino: The Definitive Guide with Matt and Martin earlier this year, and then publishing it with O’Reilly was a great experience and has been a great success. Lots of readers took advantage of getting a free digital copy of the book from Starburst. Now it is time to follow up with a training class. I am pleased to let you know that you can join me for three hours of Presto First Steps in November.</summary>

      
      
    </entry>
  
    <entry>
      <title>Hello I&apos;m Brian, Presto Developer Advocate</title>
      <link href="https://trino.io/blog/2020/10/01/intro-developer-advocate.html" rel="alternate" type="text/html" title="Hello I&apos;m Brian, Presto Developer Advocate" />
      <published>2020-10-01T00:00:00+00:00</published>
      <updated>2020-10-01T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/10/01/intro-developer-advocate</id>
      <content type="html" xml:base="https://trino.io/blog/2020/10/01/intro-developer-advocate.html">&lt;p&gt;Hello, Presto nation!&lt;/p&gt;

&lt;p&gt;My name is Brian, and I’m a new developer advocate working at Starburst. Let me 
give you a little background on how I got here, and cover how my role can help
the Presto community.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/developer-advocate/brian.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;My career in computation and databases started in the military. As luck would
have it, I worked on a big data team as my first job out of college! I was in a
Hive shop that dealt with the typical outdated runtime and slow query
turnaround. Eventually, our architect introduced us to Presto as an alternative.
I worked with him to start testing and moving our existing use cases built on
Hive to use Presto. We also used Elasticsearch and had a few cases that needed
to perform joins and unions over the datasets in both Elasticsearch and Hive.
There were a few use cases that were not going to immediately be transferable
without some modification to the Presto Elasticsearch connector.&lt;/p&gt;

&lt;h2 id=&quot;joining-the-presto-community&quot;&gt;Joining the Presto community&lt;/h2&gt;

&lt;p&gt;The first modification was &lt;a href=&quot;https://github.com/trinodb/trino/issues/2441&quot;&gt;adding support for Elasticsearch array 
types&lt;/a&gt;, and the second was, 
&lt;a href=&quot;https://github.com/trinodb/trino/issues/754&quot;&gt;support for nested types&lt;/a&gt;. My 
first interaction with the Presto community was incredible! As a serial
open-source attempter, I always wanted to get invested in an open-source
project. I had started pull requests in various projects. Sometimes I ran into 
unpleasant maintainers; in other cases the rules were daunting or too confusing
to start with. I created a pull request only to have it sit there with no
communication as to why it wasn’t accepted or even looked at. However, when I
first joined &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt;, I searched to see if there was already a
discussion about array types in the history. I ran into &lt;a href=&quot;https://trinodb.slack.com/archives/CP1MUNEUX/p1570064139005900&quot;&gt;a discussion between 
Dain and Martin about this 
issue&lt;/a&gt;. I
conversed with Martin, who was incredibly polite and willing to take time to 
discuss how this should be implemented.&lt;/p&gt;

&lt;h2 id=&quot;contributing&quot;&gt;Contributing&lt;/h2&gt;

&lt;p&gt;When I actually pulled the code, I saw how well written and maintained it was
compared to many open-source projects I had seen in the past. I made a few
changes, wrote a test around my use case, and signed a CLA agreement. After a
couple of weeks, my pull request was merged and I had finally contributed to an
open-source project. After that interaction, and seeing the code, I wanted to do
more. I really saw something special with this community.&lt;/p&gt;

&lt;p&gt;While many Presto contributors are doing amazing work contributing code, I
noticed there were some holes in other areas of the community that needed to be
filled. I started answering questions on Slack, LinkedIn, and Twitter, and I
planned out a Udemy course for Presto. The &lt;a href=&quot;https://youtu.be/RPaG0Gu2I6c&quot;&gt;initial 
video&lt;/a&gt; I piloted is about tuning the memory
configuration of Presto.&lt;/p&gt;

&lt;h2 id=&quot;becoming-a-developer-advocate&quot;&gt;Becoming a developer advocate&lt;/h2&gt;

&lt;p&gt;Around this time I got into contact with some folks at Starburst about joining 
them to work with the community and Presto full-time! As I joined, we hadn’t
figured out what my exact role was at Starburst. Eventually, we decided I would
best serve as a developer advocate. What I’ve come to find is that this role
aims to do exactly what I set out to do before I joined. As a developer
advocate, I serve the community and act as a liaison between Starburst and the
Presto community. Up until this time, that responsibility has been unofficially
shared by many of the maintainers of Presto. I am here to simply take some of
that responsibility from them and focus all of my efforts on community growth
and health.&lt;/p&gt;

&lt;p&gt;The health of a community is difficult to define and is generally
subject to various signals that we can observe. These signals include an
increase in helpful interactions within the community, new members joining the
community, members who are actively engaging in the community, diversity of the
community, and more. If we start by focusing on making the community successful,
the success of the project will follow. We keep in mind the goal that co-creator
David Phillips mentions:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;This is the type of project that we look at Postgres as the inspiration. 
Postgres started in the eighties, it became a SQL system in the nineties, and
it’s still in active use and active development today. We say we want Presto
to have the same kind of history. - David Phillips&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&quot;next-steps&quot;&gt;Next Steps&lt;/h2&gt;

&lt;p&gt;My first goal is to create a larger set of free learning materials that expand
upon my initial goals when planning for my Udemy course. I recently started a
show with Manfred Moser called the Presto Community Broadcast. The show landing 
page is &lt;a href=&quot;/broadcast.html&quot;&gt;here&lt;/a&gt; and contains all the information about the show
schedule and where to find new and old episodes. This helps as we can use any
relevant material we create on this show for future teaching or blogs. We want
these live sessions to be interactive, and look forward to your feedback to
understand if our efforts are actually helping, or if you have ideas to improve
the show. This show, along with blogs, documentation, and interactive tutorials,
is how I initially intend to answer some common questions that are received
through our &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt; and &lt;a href=&quot;https://stackoverflow.com/questions/tagged/presto&quot;&gt;Stack 
Overflow&lt;/a&gt; channels. Another
goal of adding these materials is to attract new members to the community. Not
all the material may be super relevant to the existing members of the community,
but this makes the community much more viable for newer members.&lt;/p&gt;

&lt;p&gt;Outside of providing new learning materials, your feedback helps us to
understand common problems and allows us to fix them. This feedback will aid us
in focusing on issues that are commonly voiced within the community but somehow
get lost in translation. This could mean improving the Presto code itself,
making the documentation better, or addressing common confusion, even if the
confusion comes from a force outside of the Presto community.&lt;/p&gt;

&lt;p&gt;For example, I recently &lt;a href=&quot;https://bitsondata.dev/what-is-benchmarketing-and-why-is-it-bad/&quot;&gt;wrote a 
blog&lt;/a&gt; about
some shady benchmarketing practices that were painting Presto in a bad light. 
The goal here was to make fun of the wildly bogus claims brought against Presto 
and the community. What better way to do that than to write a nerdy Justin
Bieber parody?&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/FSy8V-R0_Zw&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;While I have hopefully convinced you all of my mission here, I can’t accomplish
any of this in a vacuum. The whole point of my work starts and ends with all of
you. I look forward to speaking with you all, and one day, post COVID-19, meeting
you at meetups and conferences. For now, virtual meetups and the Presto Community
Broadcast are a great start. If you have ideas or want to reach out to introduce
yourself, you can find me on 
&lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt; or &lt;a href=&quot;https://twitter.com/bitsondatadev&quot;&gt;Twitter&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thanks for reading this and being a part of this community. One last thing to
tell you about myself, I’m a sucker for cheesy sign-offs so…&lt;/p&gt;

&lt;p&gt;&lt;em&gt;For fast data at resto, Presto is the besto!&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Hello, Presto nation! My name is Brian, and I’m a new developer advocate working at Starburst. Let me give you a little background on how I got here, and cover how my role can help the Presto community.</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto at Argentina Big Data Meetup 2020-09-23</title>
      <link href="https://trino.io/blog/2020/09/28/argentina-big-data-meetup.html" rel="alternate" type="text/html" title="Presto at Argentina Big Data Meetup 2020-09-23" />
      <published>2020-09-28T00:00:00+00:00</published>
      <updated>2020-09-28T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/09/28/argentina-big-data-meetup</id>
      <content type="html" xml:base="https://trino.io/blog/2020/09/28/argentina-big-data-meetup.html">&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/IkjNcW7cS2w&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;Martin made a guest appearance at the 
&lt;a href=&quot;https://www.meetup.com/Argentina-Big-Data-Meetup/&quot;&gt;Argentina Big Data Meetup&lt;/a&gt;
(online). In the first hour, Martin talks about Presto’s past, present, and
future. This includes the history from Facebook to Starburst, some context for
early architectural decisions, as well as why Presto was open-sourced.
Finally, Martin covers recent changes along with some upcoming changes on the
roadmap.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/blog/argentina-big-data-meetup/Presto%20-%20Big%20Data%20Meetup%20Argentina%202020-09-23.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The next hour is an interesting talk given by Federico Palladoro covering his
company, Jampp’s, migration strategy from EMR Presto to Docker using Nomad vs
Kubernetes.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/assets/blog/argentina-big-data-meetup/Big%20Data%20Meetup_%20Presto%20on%20Docker.pdf&quot;&gt;Slides&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These presentations are in Spanish.&lt;/p&gt;

&lt;!--more--&gt;</content>

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Martin made a guest appearance at the Argentina Big Data Meetup (online). In the first hour, Martin talks about Presto’s past, present, and future. This includes the history from Facebook to Starburst, some context for early architectural decisions, as well as why Presto was open-sourced. Finally, Martin covers recent changes along with some upcoming changes on the roadmap. Slides The next hour is an interesting talk given by Federico Palladoro covering his company, Jampp’s, migration strategy from EMR Presto to Docker using Nomad vs Kubernetes. Slides These presentations are in Spanish.</summary>

      
      
    </entry>
  
    <entry>
      <title>1: What is Presto, WITH RECURSIVE, and Hive connector</title>
      <link href="https://trino.io/episodes/1.html" rel="alternate" type="text/html" title="1: What is Presto, WITH RECURSIVE, and Hive connector" />
      <published>2020-09-24T00:00:00+00:00</published>
      <updated>2020-09-24T00:00:00+00:00</updated>
      <id>https://trino.io/episodes/1</id>
      <content type="html" xml:base="https://trino.io/episodes/1.html">&lt;p&gt;Today’s concept covers a big overview of what Presto is for those that are new
to Presto. For mor information about Presto, check out the following resources:
&lt;a href=&quot;/&quot;&gt;Website&lt;/a&gt;
&lt;a href=&quot;https://trino.io/docs/current/&quot;&gt;Documentation&lt;/a&gt;
Download the &lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;Free Presto O’Reilly Book&lt;/a&gt;
Learn &lt;a href=&quot;/development/&quot;&gt;how to contribute&lt;/a&gt;
Join our community on the &lt;a href=&quot;/slack.html&quot;&gt;Slack channel&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this episode we covered &lt;a href=&quot;https://github.com/trinodb/trino/pull/5163&quot;&gt;pull request 5163&lt;/a&gt;,
which is actually just a documentation update for the existing experimental
WITH RECURSIVE feature. The extended development of
this feature is still being tracked and documented in 
&lt;a href=&quot;https://github.com/trinodb/trino/issues/1122&quot;&gt;issue 1122&lt;/a&gt;. As with many 
problems in recursion, the solution space typically increases exponentially, so
the feature can easily be misused and cause problems. We run the 
query and discuss it, as well as some of the things that can go wrong. Check out
the pull request to see more of the documentation that was added around it.&lt;/p&gt;

&lt;p&gt;In the question of the week, we covered a lot of the confusion around the
&lt;a href=&quot;https://trino.io/docs/current/connector/hive.html&quot;&gt;Hive connector&lt;/a&gt;. Feel free to 
try out the Katacoda example I created, which will be nested within an 
&lt;a href=&quot;blog/2020/10/20/intro-to-hive-connector.html&quot;&gt;intro to the Hive connector blog&lt;/a&gt;.
This is running on a non-paid Katacoda account, so resources are scarce at times
and it may take a while to load. Nevertheless, the information written around it
will help you quickly have a Presto environment to play with.&lt;/p&gt;

&lt;p&gt;Release Notes discussed:
&lt;a href=&quot;https://trino.io/docs/current/release/release-341.html&quot;&gt;https://trino.io/docs/current/release/release-341.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Upcoming events&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.meetup.com/BigDataATL/events/273435961/&quot;&gt;https://www.meetup.com/BigDataATL/events/273435961/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://edw2020chicago.dataversity.net/&quot;&gt;https://edw2020chicago.dataversity.net/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-portland-or-2/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-portland-or-2/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-minneapolis-mn-2/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-minneapolis-mn-2/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-jacksonville-fl/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-jacksonville-fl/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/san-francisco/2020-san-francisco-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-detroit-mi/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-detroit-mi/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/atlanta/2020-atlanta-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-providence-ri/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&quot;&gt;https://techtalksummits.com/event/virtual-commercial-it-denver-co/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&quot;&gt;https://www.evanta.com/cdo/boston/2020-boston-cdo-virtual-executive-summit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Latest training from David, Dain, and Martin:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/15/training-advanced-sql.html&quot;&gt;https://trino.io/blog/2020/07/15/training-advanced-sql.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/30/training-query-tuning.html&quot;&gt;https://trino.io/blog/2020/07/30/training-query-tuning.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/13/training-security.html&quot;&gt;https://trino.io/blog/2020/08/13/training-security.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/08/27/training-performance.html&quot;&gt;https://trino.io/blog/2020/08/27/training-performance.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Presto Summit Series - Real world usage:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/05/15/state-of-presto.html&quot;&gt;https://trino.io/blog/2020/05/15/state-of-presto.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;https://trino.io/blog/2020/06/16/presto-summit-zuora.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;https://trino.io/blog/2020/07/06/presto-summit-arm-td.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&quot;&gt;https://trino.io/blog/2020/07/22/presto-summit-pinterest.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Recent podcasts:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.contributor.fyi/presto&quot;&gt;https://www.contributor.fyi/presto&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&quot;&gt;https://www.dataengineeringpodcast.com/presto-distributed-sql-episode-149/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about Presto yourself, you should check out the 
O’Reilly book Trino: The Definitive Guide. You can download 
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;the free PDF&lt;/a&gt; or 
buy the book online.&lt;/p&gt;

&lt;p&gt;Music for the show is from the &lt;a href=&quot;https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp&quot;&gt;Megaman 6 Game Play album by Krzysztof 
Słowikowski&lt;/a&gt;.&lt;/p&gt;</content>

      

      <summary>Today’s concept covers a big overview of what Presto is for those who are new to Presto. For more information about Presto, check out the following resources: Website Documentation Download the Free Presto O’Reilly Book Learn how to contribute Join our community on the Slack channel</summary>

      
      
    </entry>
  
    <entry>
      <title>Read support for original files of Hive transactional tables in Presto</title>
      <link href="https://trino.io/blog/2020/09/23/hive-acid-original-files.html" rel="alternate" type="text/html" title="Read support for original files of Hive transactional tables in Presto" />
      <published>2020-09-23T00:00:00+00:00</published>
      <updated>2020-09-23T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/09/23/hive-acid-original-files</id>
      <content type="html" xml:base="https://trino.io/blog/2020/09/23/hive-acid-original-files.html">&lt;p&gt;In &lt;a href=&quot;https://trino.io/docs/current/release/release-331.html&quot;&gt;Presto 331&lt;/a&gt;,
read support for Hive transactional tables was introduced. It works well if a
user creates a new Hive transactional table and reads it from Presto. However,
if an existing table is converted to a Hive transactional table, Presto would
fail to read data from such a table because read support for original files was
missing. Original files are those files in a Hive transactional table that
existed before the table was converted into a Hive transactional table.
Until version 340, Presto expected all files in a Hive transactional table to be
in Hive ACID format. Users would have to perform a major compaction to convert
original files into ACID files (i.e. base files) in such tables. This is not
always possible as the original flat table (table in non-ACID format) could be
huge and converting all the existing data into ACID format can be very
expensive.&lt;/p&gt;

&lt;p&gt;This blog is an extension of the blog &lt;a href=&quot;/blog/2020/06/01/hive-acid.html&quot;&gt;Hive ACID and transactional tables’
support in Presto&lt;/a&gt;. It first describes
original files and then goes into details of read support for such files that
was added in Presto 340.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;what-are-the-original-files&quot;&gt;What are the original files?&lt;/h1&gt;

&lt;p&gt;Files present in non-transactional ORC tables have the standard ORC schema. When
a flat table is converted into a transactional table, existing files are not
converted into Hive ACID format. Files in a transactional table that are
not in Hive ACID format are called original files. These files are named
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;000000_X&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;000000_X_copy_Y&lt;/code&gt;. They don’t have ACID columns, and their
schema differs as follows:&lt;/p&gt;

&lt;p&gt;Table Schema&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;n_nationkey : int,
n_name : string,
n_regionkey : int,
n_comment : string
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Original File Schema&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;struct {
    n_nationkey : int,
    n_name : string,
    n_regionkey : int,
    n_comment : string
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Delta File Schema&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;struct {
    operation : int,
    originalTransaction : bigint,
    bucket : int,
    rowId : bigint,
    currentTransaction : bigint,
    row : struct {
        n_nationkey : int,
        n_name : string,
        n_regionkey : int,
        n_comment : string
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Before Presto 340, Presto used to fail the query if it reads from a Hive
transactional table having original files.&lt;/p&gt;

&lt;h1 id=&quot;update-and-delete-support-on-original-files&quot;&gt;Update and delete support on original files&lt;/h1&gt;

&lt;p&gt;Hive achieves updates/deletes on a row in original files by synthetically
generating ACID columns for those files. Presto follows the same mechanism
of synthetically generating ACID columns, as discussed below.&lt;/p&gt;

&lt;h2 id=&quot;acid-column-generation-on-original-files&quot;&gt;ACID column generation on original files&lt;/h2&gt;

&lt;p&gt;Files in Hive ACID format have 5 ACID columns, but we need only 3 of them, i.e.
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;originalTransactionId&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bucketId&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rowId&lt;/code&gt;, to uniquely identify a row. In
this section, we will see how these 3 columns are synthetically generated for
original files.&lt;/p&gt;

&lt;h3 id=&quot;original-transaction-id&quot;&gt;Original transaction ID&lt;/h3&gt;

&lt;p&gt;An original transaction ID is the write ID when a record is first created. For
original files, the original transaction ID is always 0.&lt;/p&gt;

&lt;h3 id=&quot;bucket-id&quot;&gt;Bucket ID&lt;/h3&gt;

&lt;p&gt;Bucket ID is retrieved from the original file name. For the original file
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0000ABC_DEF&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0000ABC_DEF_copy_G&lt;/code&gt;, the bucket ID will be &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ABC&lt;/code&gt;.&lt;/p&gt;
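&lt;p&gt;As an illustration, the bucket ID extraction can be sketched like this. This is a hypothetical helper for exposition, not the actual Presto code:&lt;/p&gt;

```python
import re

def bucket_id(file_name):
    """Extract the synthetic bucket ID from an original file name.

    Original file names look like "000000_0" or "000000_0_copy_1";
    the leading digits before the first underscore encode the bucket ID.
    """
    match = re.match(r"(\d+)_\d+(?:_copy_\d+)?$", file_name)
    if match is None:
        raise ValueError(f"not an original file name: {file_name}")
    return int(match.group(1))

# bucket_id("000000_0") -> 0
# bucket_id("000123_0_copy_2") -> 123
```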

&lt;h3 id=&quot;row-id&quot;&gt;Row ID&lt;/h3&gt;

&lt;p&gt;To calculate the row ID, Presto first computes the total row count of all the
original files that come before the current one in lexicographical order.
The global row ID is then the sum of that value and the local row ID within
the current original file.&lt;/p&gt;

&lt;p&gt;Here is an example calculating the global row ID of the 3rd row of an original
file &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;000000_0_copy_2&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;000000_0            -&amp;gt; 	X1 Rows (returned by ORC footer field numberOfRows)

000000_0_copy_1     -&amp;gt; 	X2 Rows (returned by ORC footer field numberOfRows)

000000_0_copy_2     -&amp;gt;	[ Row 0 ]
                        [ Row 1 ]
                        [ Row 2 ]   &amp;lt;- Local Row ID (returned by filePosition in OrcRecordReader) = 2
                                       Global Row ID = (X1+X2+2)
                        [ Row 3 ]

000000_0_copy_3     -&amp;gt;  X4 Rows
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
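&lt;p&gt;The same calculation can be sketched in a few lines. The sketch assumes the per-file row counts have already been read from the ORC footers; the function and variable names are illustrative, not Presto’s:&lt;/p&gt;

```python
def global_row_id(row_counts, current_file, local_row_id):
    """Compute the synthetic global row ID for a row in an original file.

    row_counts maps each original file name in the same bucket to its row
    count (the ORC footer field numberOfRows).
    """
    # Sum the row counts of all original files that come before the
    # current one in lexicographical order ...
    preceding = sum(count for name, count in row_counts.items()
                    if name < current_file)
    # ... then add the local row ID within the current file.
    return preceding + local_row_id

counts = {"000000_0": 4, "000000_0_copy_1": 3, "000000_0_copy_2": 5}
# 3rd row (local row ID 2) of 000000_0_copy_2: 4 + 3 + 2 = 9
```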

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Additional computations are required to generate row IDs
while reading original files; therefore, reading them is slower than reading
ACID-format files in a transactional table.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once Presto has the 3 ACID columns for a row, it can check for an update/delete
on it. Delete deltas, written by Hive for original files, have row IDs generated
by the same strategy as discussed above. Hence, the same logic of filtering out
deleted rows as discussed in &lt;a href=&quot;/blog/2020/06/01/hive-acid.html&quot;&gt;Hive ACID and transactional tables’ support in Presto
&lt;/a&gt; works with the original files too.&lt;/p&gt;
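&lt;p&gt;Conceptually, filtering out deleted rows boils down to a key lookup on the three synthetic ACID columns. A minimal sketch, with rows represented as plain tuples rather than Presto’s internal pages:&lt;/p&gt;

```python
def filter_deleted(rows, delete_deltas):
    """Drop rows whose ACID key appears in the delete deltas.

    rows: iterable of (originalTransaction, bucketId, rowId, data) tuples.
    delete_deltas: iterable of (originalTransaction, bucketId, rowId) keys
    read from the delete_delta files.
    """
    deleted = set(delete_deltas)
    return [row for row in rows if row[:3] not in deleted]

rows = [(0, 0, 0, "ALGERIA"), (0, 0, 1, "ARGENTINA"), (0, 0, 2, "BRAZIL")]
deletes = [(0, 0, 1)]
# filter_deleted(rows, deletes) keeps the ALGERIA and BRAZIL rows
```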

&lt;h1 id=&quot;changes-in-presto-to-support-reading-original-files&quot;&gt;Changes in Presto to support reading original files&lt;/h1&gt;

&lt;p&gt;Presto’s split generation logic and ORC reader were modified to add read
support for original files. The following changes were made at the coordinator
and worker level:&lt;/p&gt;

&lt;h2 id=&quot;split-generation&quot;&gt;Split generation&lt;/h2&gt;

&lt;p&gt;We use a new class named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AcidInfo&lt;/code&gt; to store the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OriginalFiles&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DeleteDeltaFiles&lt;/code&gt; for a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;HiveSplit&lt;/code&gt;.
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BackgroundSplitLoader.loadPartitions&lt;/code&gt; is called in an executor to create
splits for each partition. In addition to the steps mentioned in the blog
&lt;a href=&quot;/blog/2020/06/01/hive-acid.html&quot;&gt;Hive ACID and transactional tables’ support in
Presto&lt;/a&gt;, Presto does the following:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Original files and the ACID subdirectories (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;base&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt;) are
discovered by listing the partition location with the Hive &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AcidUtils&lt;/code&gt; helper class.&lt;/li&gt;
  &lt;li&gt;A registry for delete deltas, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DeleteDeltaInfo&lt;/code&gt;, is created with minimal
information from which the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; path can be constructed by the workers.&lt;/li&gt;
  &lt;li&gt;A registry for original files, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OriginalFileInfo&lt;/code&gt;, is created with
information such as the file name, size, and bucket ID.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AcidInfo.Builder&lt;/code&gt; keeps a map
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AcidInfo.Builder.bucketIdToOriginalFileInfoMap&lt;/code&gt; of bucket ID to the list of
original files belonging to the same bucket.&lt;/li&gt;
  &lt;li&gt;Hive splits are created for each original file and for the base and delta
directories. Each Hive split carries an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AcidInfo&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;For an original file split, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AcidInfo&lt;/code&gt; has:&lt;/p&gt;

    &lt;ol&gt;
      &lt;li&gt;&lt;strong&gt;Bucket ID:&lt;/strong&gt; Bucket ID of the original file.&lt;/li&gt;
      &lt;li&gt;&lt;strong&gt;OriginalFilesList:&lt;/strong&gt; List of all the original files belonging to
 the same bucket, calculated from
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AcidInfo.Builder.bucketIdToOriginalFileInfoMap&lt;/code&gt;.&lt;/li&gt;
      &lt;li&gt;&lt;strong&gt;DeleteDeltaFilesList:&lt;/strong&gt; List of delete deltas.&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;For a base/delta file split, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AcidInfo&lt;/code&gt; has:&lt;/p&gt;

    &lt;ol&gt;
      &lt;li&gt;&lt;strong&gt;DeleteDeltaFilesList:&lt;/strong&gt; List of delete deltas.&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
&lt;/ol&gt;
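&lt;p&gt;The bucket-to-files map from step 4 can be pictured with a small sketch. The grouping mirrors the description above, but the code itself is illustrative, not Presto’s:&lt;/p&gt;

```python
from collections import defaultdict

def group_original_files_by_bucket(original_files):
    """Group original files by the bucket ID parsed from their names,
    mimicking AcidInfo.Builder.bucketIdToOriginalFileInfoMap.

    original_files: iterable of (file_name, file_size) pairs.
    """
    by_bucket = defaultdict(list)
    for name, size in original_files:
        bucket = int(name.split("_")[0])  # leading digits encode the bucket
        by_bucket[bucket].append((name, size))
    return dict(by_bucket)

files = [("000000_0", 1024), ("000000_0_copy_1", 512), ("000001_0", 2048)]
# group_original_files_by_bucket(files) groups the first two files under
# bucket 0 and the third under bucket 1
```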

&lt;h2 id=&quot;reading-hive-original-files-data-in-workers&quot;&gt;Reading Hive original files data in workers&lt;/h2&gt;

&lt;p&gt;Hive splits generated during the split generation phase make their way to worker
nodes where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcPageSourceFactory&lt;/code&gt; is used to create &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PageSource&lt;/code&gt; for
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TableScan&lt;/code&gt; operator. In addition to the steps mentioned in blog &lt;a href=&quot;/blog/2020/06/01/hive-acid.html&quot;&gt;Hive ACID and
transactional tables’ support in Presto&lt;/a&gt;
, Presto does the following:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcDeletedRows&lt;/code&gt; is created for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; locations, if any.&lt;/li&gt;
  &lt;li&gt;For an original file split, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcPageSourceFactory&lt;/code&gt; fetches &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;originalFilesList&lt;/code&gt;
from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AcidInfo&lt;/code&gt; and calculates &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;originalFileRowId&lt;/code&gt; by calling
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OriginalFilesUtils.getPrecedingRowCount&lt;/code&gt; and sends this information to
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcPageSource&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcPageSource&lt;/code&gt; returns rows from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcRecordReader&lt;/code&gt; which are not present in
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcDeletedRows&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
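&lt;p&gt;The row numbering described above can be sketched in a few lines. The following is a simplified, hypothetical Python model, not Presto’s actual Java implementation: it assumes each original file exposes its row count, and it computes the starting row ID of a file as the sum of the row counts of the original files that precede it, mirroring the role of OriginalFilesUtils.getPrecedingRowCount. OrcDeletedRows is modeled as a plain set of deleted row IDs.&lt;/p&gt;

```python
# Simplified, hypothetical model of global row IDs for "original files".
# Presto's real implementation reads row counts from the ORC footers of the
# original files; here each file name simply maps to its row count.

def preceding_row_count(original_files, target):
    """Sum of row counts of the original files that sort before target
    (stand-in for OriginalFilesUtils.getPrecedingRowCount)."""
    total = 0
    for name in sorted(original_files):
        if name == target:
            break
        total += original_files[name]
    return total

def visible_rows(rows, start_row_id, deleted_row_ids):
    """Yield (global_row_id, row) pairs, skipping deleted rows
    (mirrors OrcPageSource consulting OrcDeletedRows)."""
    for offset, row in enumerate(rows):
        row_id = start_row_id + offset
        if row_id not in deleted_row_ids:
            yield row_id, row

# Three original files in one bucket, with their row counts.
files = {"000000_0": 3, "000000_0_copy_1": 2, "000000_0_copy_2": 4}
start = preceding_row_count(files, "000000_0_copy_1")
assert start == 3  # the 3 rows of 000000_0 come first

# Rows of 000000_0_copy_1 get global IDs 3 and 4; ID 4 was deleted.
assert list(visible_rows(["r1", "r2"], start, {4})) == [(3, "r1")]
```

&lt;p&gt;Caching the per-file row counts at the query level would avoid recomputing these sums for every split, which is the essence of the optimization discussed in the follow-up section.&lt;/p&gt;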

&lt;h1 id=&quot;follow-up&quot;&gt;Follow up&lt;/h1&gt;

&lt;p&gt;For an original file split, the current implementation may take quadratic time
in the worst case to calculate the global row ID, because it reads row counts from
the original files’ footers. It could be optimized by keeping a query-level cache
on worker nodes, or by precomputing global row IDs in the coordinator during split
computation.&lt;/p&gt;

&lt;h1 id=&quot;acknowledgements&quot;&gt;Acknowledgements&lt;/h1&gt;

&lt;p&gt;I would like to express my gratitude to everyone who helped me throughout
the development of this feature. Thank you
&lt;a href=&quot;https://in.linkedin.com/in/shubham-tagra-267a5838&quot;&gt;Shubham Singh&lt;/a&gt; for
the brainstorming sessions and for providing continuous guidance on Presto Hive ACID.
Thank you &lt;a href=&quot;https://www.linkedin.com/in/piotrfindeisen/&quot;&gt;Piotr Findeisen&lt;/a&gt; for
helping me further refine the code with insightful code reviews.&lt;/p&gt;</content>

      
        <author>
          <name>Harmandeep Singh, Qubole</name>
        </author>
      

      <summary>In Presto 331, read support for Hive transactional tables was introduced. It works well if a user creates a new Hive transactional table and reads it from Presto. However, if an existing table is converted to a Hive transactional table, Presto would fail to read data from such a table because read support for original files was missing. Original files are those files in a Hive transactional table that existed before the table was converted into a Hive transactional table. Until version 340, Presto expected all files in a Hive transactional table to be in Hive ACID format. Users would have to perform a major compaction to convert original files into ACID files (i.e. base files) in such tables. This is not always possible as the original flat table (table in non-ACID format) could be huge and converting all the existing data into ACID format can be very expensive. This blog is an extension of the blog Hive ACID and transactional tables’ support in Presto. It first describes original files and then goes into the details of read support for such files that was added in Presto 340.</summary>

      
      
    </entry>
  
    <entry>
      <title>Configuring and Tuning Presto Performance with Dain</title>
      <link href="https://trino.io/blog/2020/08/27/training-performance.html" rel="alternate" type="text/html" title="Configuring and Tuning Presto Performance with Dain" />
      <published>2020-08-27T00:00:00+00:00</published>
      <updated>2020-08-27T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/08/27/training-performance</id>
      <content type="html" xml:base="https://trino.io/blog/2020/08/27/training-performance.html">&lt;p&gt;With the help of &lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;David’s training about advanced SQL&lt;/a&gt;, you composed a number of useful queries.
You gained valuable insights from the resulting data. However, these complex
queries take time to run. If only you could make them run faster. I think we
have just what you need:&lt;/p&gt;

&lt;p&gt;Join us for a free webinar &lt;strong&gt;Understanding and Tuning Presto Query Processing&lt;/strong&gt;
with Dain Sundstrom.&lt;/p&gt;

&lt;p&gt;Update:&lt;/p&gt;

&lt;p&gt;We did it again! Joined by over 120 eager students, we discussed all sorts of
aspects of sizing and tuning your Presto cluster. Yet again we received so many
questions that we went over our planned time budget. The material covered is
crucial to run a Presto deployment successfully in production, so make sure you
check out the recording and the slide deck:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/wp-content/uploads/2020/09/Presto-Training-Series-Configuring-Tuning-Presto-Performance.pdf&quot;&gt;Download the slides&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/Pu80FkBRP-k&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;In our new &lt;a href=&quot;https://bit.ly/2NO26Cm&quot;&gt;Presto Training Series&lt;/a&gt; we give Presto users
an opportunity to learn advanced skills from the co-creators of Presto –
&lt;a href=&quot;https://github.com/electrum&quot;&gt;David Phillips&lt;/a&gt;, 
&lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso&lt;/a&gt; and 
&lt;a href=&quot;https://github.com/dain&quot;&gt;Dain Sundstrom&lt;/a&gt;. Beyond the basics, each of the four 
training sessions covers critical topics for scaling Presto to more users and
use cases.&lt;/p&gt;

&lt;p&gt;This training session is geared towards helping users tune and size their Presto
deployment for optimal performance. Delivered by Dain Sundstrom, this session
covers the following topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Cluster configuration and node sizing&lt;/li&gt;
  &lt;li&gt;Memory configuration and management&lt;/li&gt;
  &lt;li&gt;Improving task concurrency and worker scheduling&lt;/li&gt;
  &lt;li&gt;Tuning your JVM configuration&lt;/li&gt;
  &lt;li&gt;Investigating queries for join order and other criteria&lt;/li&gt;
  &lt;li&gt;Tuning the cost-based optimizer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Date: Wednesday, 9 September 2020&lt;/p&gt;

&lt;p&gt;Time: 10am PDT (San Francisco), 1pm EDT (New York), 6pm BST (London), 5pm UTC&lt;/p&gt;

&lt;p&gt;Duration: 2h&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;register-now&quot;&gt;&lt;a href=&quot;https://bit.ly/38kt5ih&quot;&gt;Register now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;We look forward to many Presto users joining us.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>With the help of David’s training about advanced SQL, you composed a number of useful queries. You gained valuable insights from the resulting data. However, these complex queries take time to run. If only you could make them run faster. I think we have just what you need: Join us for a free webinar Understanding and Tuning Presto Query Processing with Dain Sundstrom. Update: We did it again! Joined by over 120 eager students, we discussed all sorts of aspects of sizing and tuning your Presto cluster. Yet again we received so many questions that we went over our planned time budget. The material covered is crucial to run a Presto deployment successfully in production, so make sure you check out the recording and the slide deck: Download the slides</summary>

      
      
    </entry>
  
    <entry>
      <title>Faster Queries on Nested Data</title>
      <link href="https://trino.io/blog/2020/08/14/dereference-pushdown.html" rel="alternate" type="text/html" title="Faster Queries on Nested Data" />
      <published>2020-08-14T00:00:00+00:00</published>
      <updated>2020-08-14T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/08/14/dereference-pushdown</id>
      <content type="html" xml:base="https://trino.io/blog/2020/08/14/dereference-pushdown.html">&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-334.html&quot;&gt;Presto 334&lt;/a&gt;
adds significant performance improvements for queries
accessing nested fields inside struct columns. They have been optimized through
the pushdown of dereference expressions. With this feature, the query execution
prunes structural data eagerly, extracting the necessary fields.&lt;/p&gt;

&lt;h1 id=&quot;motivation&quot;&gt;Motivation&lt;/h1&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RowType&lt;/code&gt; is a built-in data type of Presto, storing the in-memory
representation of commonly used nested data types of the connectors, e.g. the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;STRUCT&lt;/code&gt; type in Hive. Datasets often contain wide and deeply nested structural
columns, i.e. a struct column may have hundreds of fields, with the fields
themselves being nested.&lt;/p&gt;

&lt;p&gt;Although such &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RowType&lt;/code&gt; columns can contain plenty of data, most
analytical queries access just a few fields from them. Without dereference
pushdown, Presto scans the whole column and shuffles all that data around
before projecting the necessary fields. This suboptimal execution causes higher
CPU usage, memory usage, and query latencies than necessary. The
unnecessary operations get even more expensive with wider or deeper structs and
more complex query plans.&lt;/p&gt;

&lt;p&gt;LinkedIn’s data ecosystem makes heavy use of nested columns. It is common to
have 2-3 levels of nesting, and up to 50 fields in most of our tracking tables.
Because of the query execution inefficiency for nested fields, ETL pipelines
were set up at LinkedIn to copy the nested columns as a set of top-level columns
corresponding to subfields. This step added overhead in our ingestion process
and delayed data availability for analytics. It also caused ORC schemas to be
inconsistent with the rest of the infrastructure, making it harder to migrate
from existing flows on row-oriented formats.&lt;/p&gt;

&lt;p&gt;Similarly, Lyft’s schemas make heavy use of nested data to decompose a ride
into its routes, riders, segments, modes, and geo-coordinates. Prior to the
performance improvements, analytical queries would either need to be run on
clusters with very long timeouts, or the data would have to be flattened before
being analyzed, adding an extra ETL step. Not only would this be costly, it
would also cause the original schema to diverge in our data warehouse, making it
more difficult for data scientists to understand.&lt;/p&gt;

&lt;p&gt;The dereference pushdown optimization in Presto is having a massive impact on
the ingestion story at both LinkedIn and Lyft. Nested data is now being made
available faster for consumption with a consistency of structure across all
stores, while maintaining performance parity for analytical queries.&lt;/p&gt;

&lt;h1 id=&quot;example&quot;&gt;Example&lt;/h1&gt;

&lt;p&gt;Say we have a Hive table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jobs&lt;/code&gt;, with a struct-typed column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;job_info&lt;/code&gt; in the
schema. The column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;job_info&lt;/code&gt; is wide and deeply nested, i.e. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW(company
varchar, requirements ROW(skills array(...), education ROW(...), salary ...) ,
...)&lt;/code&gt;. Most queries would access a small percentage of data from this struct
using the dereference projection (the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.&lt;/code&gt; operation). Consider the query &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Q&lt;/code&gt;
below.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;A&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;appid&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;J&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;job_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;company&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;applications&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;A&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;jobs&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;J&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;A&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;jobid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;J&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;jobid&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It should suffice to scan only the single field &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;company&lt;/code&gt; from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;J.job_info&lt;/code&gt; to
execute this query. But without dereference pushdown, Presto scans and
shuffles everything from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;job_info&lt;/code&gt;, only to project a single field at the end.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/dereference-pushdown/original_plan.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h1 id=&quot;solution-pushdown-of-dereference-expressions&quot;&gt;Solution: Pushdown of Dereference Expressions&lt;/h1&gt;

&lt;p&gt;With dereference pushdown, Presto optimizes queries by extracting only the required
fields from a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt; as early as possible. This is achieved by modifying the
query plan through a set of optimizers, and can be broadly divided into two
parts.&lt;/p&gt;

&lt;p&gt;First, dereference projections are extracted in the query plan and pushed as
close to the table scan as possible. This happens independently of the
connector. Second, there is a further improvement for Hive tables. The
Hive connector and ORC/Parquet readers have been optimized to scan only the
required subfield columns.&lt;/p&gt;

&lt;p&gt;Pushdown of predicates on the subfields is also a crucial optimization. For
example, if a query has filters on subfields (e.g. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a.b &amp;gt; 5&lt;/code&gt;), they should be
utilized by the ORC/Parquet readers while scanning files. The pushdown helps with
the pruning of files, stripes, and row groups based on column-level statistics.
This optimization is achieved as a byproduct of the two optimizations above.&lt;/p&gt;

&lt;p&gt;With the dereference pushdown, queries observe significant performance gains in
terms of CPU/memory usage and query runtime, roughly proportional to the
relative size of nested columns compared to the accessed fields.&lt;/p&gt;

&lt;h2 id=&quot;pushdown-in-query-plan&quot;&gt;Pushdown in Query Plan&lt;/h2&gt;

&lt;p&gt;The goal here is to execute dereference projections as early as possible. This
usually means performing them right after the table scans.&lt;/p&gt;

&lt;p&gt;A projection operation that performs dereferencing on input symbols (i.e.
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;job_info.company&lt;/code&gt;) reduces the amount of data going up the plan tree. Pushing
dereference projections down means that we are pruning data early. It reduces
the amount of data being processed and shuffled in query execution. For the
example query &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Q&lt;/code&gt;, the query plan looks like the following when dereference
pushdown is enabled.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/dereference-pushdown/transformed_plan.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The projection &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;job_info.company&lt;/code&gt; now directly follows the scan of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jobs&lt;/code&gt; table,
avoiding the propagation of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;job_info&lt;/code&gt; through the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Limit&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Join&lt;/code&gt; nodes. Note
that all of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;job_info&lt;/code&gt; is still being scanned, and pruning it in the reader
requires connector-dependent optimizations.&lt;/p&gt;
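&lt;p&gt;This plan rewrite can be illustrated with a toy model. The following is a hypothetical Python sketch, not Presto’s actual optimizer API: a projection that only dereferences a field is swapped below the Limit node, so that only the needed subfield flows up the plan tree.&lt;/p&gt;

```python
# Toy plan rewrite illustrating dereference pushdown (a sketch of the idea,
# not Presto's real optimizer API). A Project that only dereferences a field
# is pushed below the Limit, so only the subfield flows up the plan tree.

class Node:
    def __init__(self, kind, child=None, **attrs):
        self.kind, self.child, self.attrs = kind, child, attrs

def push_dereference_below_limit(plan):
    """If plan is Project(Limit(x)), rewrite it to Limit(Project(x))."""
    if plan.kind == "Project" and plan.child and plan.child.kind == "Limit":
        limit = plan.child
        plan.child = limit.child     # Project now wraps the table scan
        limit.child = plan           # Limit sits on top of the projection
        return limit
    return plan

scan = Node("TableScan", table="jobs")
plan = Node("Project", Node("Limit", scan, count=100), expr="job_info.company")

optimized = push_dereference_below_limit(plan)
assert optimized.kind == "Limit"
assert optimized.child.kind == "Project"      # projection moved down
assert optimized.child.child is scan          # it now sits on the scan
```

&lt;p&gt;Pushing a pure projection below a Limit (or a Join) is always safe, since it changes only which columns flow upward, not which rows.&lt;/p&gt;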

&lt;h2 id=&quot;pushdown-in-the-hive-connector&quot;&gt;Pushdown in the Hive Connector&lt;/h2&gt;

&lt;p&gt;In columnar formats like ORC and Parquet, the data is laid out in a columnar
fashion even for subfields. If we have a column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;STRUCT(f1, f2, f3)&lt;/code&gt;, the
subfields &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;f1&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;f2&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;f3&lt;/code&gt; are stored as independent columns. An optimized
query engine should only scan the required fields through its ORC reader,
skipping the rest. This optimization has been added for the Hive connector.&lt;/p&gt;
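&lt;p&gt;The columnar layout of struct subfields can be modeled in a few lines of Python. This is a conceptual sketch, not the ORC or Parquet format itself: each subfield lives in its own column of values, so reading one subfield never touches the data of the others.&lt;/p&gt;

```python
# Conceptual model of columnar struct storage: a STRUCT(f1, f2, f3) column is
# physically stored as three independent columns, one per subfield.

struct_column = {
    "f1": [1, 2, 3],
    "f2": ["a", "b", "c"],
    "f3": [1.5, 2.5, 3.5],
}

def scan_subfields(column, wanted):
    """Read only the requested subfield columns, skipping the rest."""
    return {name: values for name, values in column.items() if name in wanted}

result = scan_subfields(struct_column, {"f1"})
assert result == {"f1": [1, 2, 3]}  # f2 and f3 were never read
```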

&lt;p&gt;Dereference projections above a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TableScanNode&lt;/code&gt; are pushed down in the Hive
connector as “virtual” (or “projected”) columns. The query plan is modified to
refer to these new columns. For the query &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Q&lt;/code&gt;, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jobs&lt;/code&gt; table would be scanned
differently with this optimization, as shown below. The projection is now
embedded in the Hive connector. Here, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;job_info#company&lt;/code&gt; can be thought of as
a virtual column representing the subfield &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;job_info.company&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/dereference-pushdown/connector_pushdown.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The Hive connector handles the projections before returning columns to Presto’s
engine. It provides the required virtual columns to the format-specific readers.
The ORC and Parquet readers optimize their scans based on the required subfields,
increasing their read throughput. Subfield pruning is not possible for
row-oriented format readers (e.g. AVRO). For those, the Hive connector adapts
the output to project the required fields.&lt;/p&gt;

&lt;h2 id=&quot;pushdown-of-predicates-on-subfields&quot;&gt;Pushdown of Predicates on Subfields&lt;/h2&gt;

&lt;p&gt;Columnar formats store per-column statistics in the data files, which can be
used by the readers for filtering. For example, if a query contains the filter &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y = 5&lt;/code&gt; for a
top-level column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt;, Presto’s ORC reader can skip ORC stripes and files by
looking at the upper and lower bounds for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; in the statistics.&lt;/p&gt;
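&lt;p&gt;Stripe pruning with min/max statistics can be sketched as follows. This is a simplified, hypothetical Python model of the idea, not Presto’s ORC reader: a stripe is skipped whenever the filter value falls outside that stripe’s recorded minimum and maximum for the column.&lt;/p&gt;

```python
# Simplified stripe pruning with per-stripe min/max statistics (a sketch of
# the idea, not Presto's ORC reader).

def stripes_to_read(stripe_stats, value):
    """Keep the stripes whose min/max range for the column may contain value."""
    return [i for i, (lo, hi) in enumerate(stripe_stats)
            if value >= lo and hi >= value]

# Per-stripe (min, max) statistics for column y, with the filter y = 5.
stats = [(0, 3), (4, 9), (10, 20)]
assert stripes_to_read(stats, 5) == [1]  # only the middle stripe can match
```

&lt;p&gt;The same check applied to the statistics of a virtual subfield column is what makes reader-level pruning on a constraint like x#f1 = 5 possible.&lt;/p&gt;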

&lt;p&gt;The same concept of predicate-based pruning can work for filters involving
subfields, since statistics are also stored for subfield columns, i.e.
Presto’s ORC/Parquet reader should be able to filter based on a constraint like
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x.f1 = 5&lt;/code&gt; for more optimal scans. Good news! In the final optimized plan,
predicates on a subfield are pushed down to the Hive connector as a constraint
on the corresponding virtual column, and later used for optimizing the scan.
The complete logic is too involved to explain here, but it can be illustrated
with the following example.&lt;/p&gt;

&lt;p&gt;Given an initial plan with a predicate on a dereferenced field (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x.f1 = 5&lt;/code&gt;), a
chain of optimizers transforms it into a more optimal plan with reader-level
predicates. In the future, the same optimization will be added to the Parquet
reader.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/dereference-pushdown/predicate_pushdown.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In the final plan, the Hive connector knows to scan the column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; and the subfield
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x.f1&lt;/code&gt;. It also takes advantage of the “virtual” column constraint &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x#f1 = 5&lt;/code&gt;
for reader-level pruning.&lt;/p&gt;

&lt;h2 id=&quot;performance-improvement&quot;&gt;Performance Improvement&lt;/h2&gt;

&lt;p&gt;Dereference pushdown improves performance for queries accessing nested fields
in multiple ways. First, it increases the read throughput for table scans,
reducing CPU time. The pruning of fields during the scan also means less
data to process for all downstream operators and tasks. So the early
projections result in more efficient execution for any operations that involve
a shuffle or copy of data. Moreover, for ORC/Parquet, the read performance
improves in the case of selective filters on subfields.&lt;/p&gt;

&lt;p&gt;Below are some experimental results on a production dataset at LinkedIn which
contains 3 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;STRUCT&lt;/code&gt; columns, having ~20-30 small subfields in each. The
example queries used in the analysis access only a few subfields. The queries
have been listed by their approximate query shape for the sake of brevity. The
plots compare CPU usage, peak memory usage and averaged query wall time.&lt;/p&gt;

&lt;table&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;img src=&quot;/assets/blog/dereference-pushdown/cpu_perf.png&quot; alt=&quot;&quot; /&gt;&lt;/td&gt;
      &lt;td&gt;&lt;img src=&quot;/assets/blog/dereference-pushdown/memory_perf.png&quot; alt=&quot;&quot; /&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/dereference-pushdown/runtime_perf.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;CPU usage and peak memory usage show orders-of-magnitude improvement in the
presence of dereference pushdown. Query wall times also reduce considerably,
and this improvement is more drastic for the relatively complex &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JOIN&lt;/code&gt; query,
as expected.&lt;/p&gt;

&lt;p&gt;Please note that these are not benchmarks! The performance improvement you’ll
see will vary depending on how many columns are contained in your nested data
versus how many you’ve referenced. At Lyft we saw improvements of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;50x&lt;/code&gt; for some
queries!&lt;/p&gt;

&lt;h2 id=&quot;future-work&quot;&gt;Future Work&lt;/h2&gt;

&lt;p&gt;The pushdown of dereference expressions can be extended to arrays, i.e.
dereference operations applied after unnesting an array should also get pushed
down to the readers. For example, using our jobs table from before, our
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jobs.job_info&lt;/code&gt; structure may contain a repeating structure such as
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;required_skills&lt;/code&gt;. With the following query, the entire &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;required_skills&lt;/code&gt;
structure would be read even though only a small part of it is being referenced.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;S&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;description&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;jobs&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;J&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CROSS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UNNEST&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;job_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;required_skills&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;S&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;S&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;years_of_experience&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The work for this improvement is being tracked in &lt;a href=&quot;https://github.com/trinodb/trino/issues/3925&quot;&gt;this issue&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Similar to the Hive connector, connector-level dereference pushdown can be
extended to other connectors supporting nested types.&lt;/p&gt;

&lt;p&gt;Another future improvement will be the pushdown of predicates on subfields for
data stored in Parquet format. Although the pruning of nested fields occurs
with Parquet, the predicates are not yet pushed down into the reader.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Pushing down dereference operations in the query provides massive performance
gains, especially while operating on large structs. At LinkedIn and Lyft, this
feature has shown great impact for analytical queries on nested datasets.&lt;/p&gt;

&lt;p&gt;We’re excited for the Presto community to try it out. Feel free to dig into
&lt;a href=&quot;https://github.com/trinodb/trino/issues/1953&quot;&gt;this github issue&lt;/a&gt; for
technical details. Please reach out to us on &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt; for further
discussions or reporting issues.&lt;/p&gt;</content>

      
        <author>
          <name>Pratham Desai (LinkedIn), James Taylor (Lyft)</name>
        </author>
      

      <summary>Presto 334 adds significant performance improvements for queries accessing nested fields inside struct columns. They have been optimized through the pushdown of dereference expressions. With this feature, the query execution prunes structural data eagerly, extracting the necessary fields.</summary>

      
      
    </entry>
  
    <entry>
      <title>Securing Presto with Dain</title>
      <link href="https://trino.io/blog/2020/08/13/training-security.html" rel="alternate" type="text/html" title="Securing Presto with Dain" />
      <published>2020-08-13T00:00:00+00:00</published>
      <updated>2020-08-13T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/08/13/training-security</id>
      <content type="html" xml:base="https://trino.io/blog/2020/08/13/training-security.html">&lt;p&gt;All the useful and fast-running queries you created with the knowledge from
&lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;David’s training about advanced SQL&lt;/a&gt; and &lt;a href=&quot;/blog/2020/07/30/training-query-tuning.html&quot;&gt;Martin’s training about query
tuning&lt;/a&gt; have created a problem. You
now have lots of users on your Presto cluster who want to access all sorts of
different data sources and have different privileges, and corporate security has
asked about your plans. How about you tap into some help from Dain:&lt;/p&gt;

&lt;p&gt;Join us for a free webinar &lt;strong&gt;Securing Presto&lt;/strong&gt; with Dain Sundstrom.&lt;/p&gt;

&lt;p&gt;Update:&lt;/p&gt;

&lt;p&gt;What a great training session! Dain captured the audience and lots of questions
were covered beyond all the great material from the slides. Everything is now
available for your convenience:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/wp-content/uploads/2020/08/Presto-Training-Securing-Presto.pdf&quot;&gt;Download the slides&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/KiMyRc3PSh0&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;In our new &lt;a href=&quot;https://bit.ly/2NO26Cm&quot;&gt;Presto Training Series&lt;/a&gt; we give Presto users
an opportunity to learn advanced skills from the co-creators of Presto –
&lt;a href=&quot;https://github.com/electrum&quot;&gt;David Phillips&lt;/a&gt;, 
&lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso&lt;/a&gt; and 
&lt;a href=&quot;https://github.com/dain&quot;&gt;Dain Sundstrom&lt;/a&gt;. Beyond the basics, each of the four 
training sessions covers critical topics for scaling Presto to more users and
use cases.&lt;/p&gt;

&lt;p&gt;In this training session Dain teaches you how to securely deploy Presto at
scale. We cover how to secure Presto itself, access to Presto, and access to
your underlying data. This session covers the following topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Presto authentication, including password &amp;amp; LDAP Authentication&lt;/li&gt;
  &lt;li&gt;Authorization to access your data sources&lt;/li&gt;
  &lt;li&gt;Encryption including Presto client-to-coordinator communication&lt;/li&gt;
  &lt;li&gt;Secure communication in the cluster&lt;/li&gt;
  &lt;li&gt;Support for Kerberos&lt;/li&gt;
  &lt;li&gt;Secrets usage for configuration files including catalogs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Date: Wednesday, 26 August 2020&lt;/p&gt;

&lt;p&gt;Time: 10am PDT (San Francisco), 1pm EDT (New York), 6pm BST (London), 5pm UTC&lt;/p&gt;

&lt;p&gt;Duration: 2h&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;register-now&quot;&gt;&lt;a href=&quot;https://bit.ly/3ioQu7c&quot;&gt;Register now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;We look forward to many Presto users joining us.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>All the useful and fast-running queries you created with the knowledge from David’s training about advanced SQL and Martin’s training about query tuning have created a problem. You now have lots of users on your Presto cluster who want to access all sorts of different data sources and have different privileges, and corporate security has asked about your plans. How about you tap into some help from Dain: Join us for a free webinar Securing Presto with Dain Sundstrom. Update: What a great training session! Dain captured the audience and lots of questions were covered beyond all the great material from the slides. Everything is now available for your convenience: Download the slides</summary>

      
      
    </entry>
  
    <entry>
      <title>Happy Eighth Birthday Presto!</title>
      <link href="https://trino.io/blog/2020/08/08/presto-eighth-birthday.html" rel="alternate" type="text/html" title="Happy Eighth Birthday Presto!" />
      <published>2020-08-08T00:00:00+00:00</published>
      <updated>2020-08-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/08/08/presto-eighth-birthday</id>
      <content type="html" xml:base="https://trino.io/blog/2020/08/08/presto-eighth-birthday.html">&lt;p&gt;Today, Presto turned eight years old! As Presto co-creator
Dain Sundstrom points out, there’s a reason why the eighth birthday is a
little special:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://twitter.com/daindumb/status/1292296395219595264&quot; target=&quot;_blank&quot;&gt;
&lt;img src=&quot;/assets/blog/presto-eighth-birthday/dain-tweet.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Even though Presto is a relatively young project, countless consumers,
developers, and business personnel have felt its impact. It’s pretty clear
that a lot has happened with this project since its inception eight years
ago. Recently, the Presto project hit a stunning twenty thousand commits:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://twitter.com/mtraverso/status/1289036458670448641&quot; target=&quot;_blank&quot;&gt;
&lt;img src=&quot;/assets/blog/presto-eighth-birthday/martin-tweet.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It makes you ponder how Presto became so successful in such a short amount of
time. Should the credit be given to the four founders who brought Presto to
life? Perhaps the supporting companies that provided the conditions that
called for such innovation? Or was it the community built around Presto since
its inception that has enabled this radical success?&lt;/p&gt;

&lt;p&gt;In my mind, it’s a combination of these conditions but with a special
emphasis on the latter. Without the founders’ dedication to designing Presto
for speed and extensibility and putting emphasis on a welcoming and
inclusive open-source community we wouldn’t have seen Presto outside the
walls of Facebook. Without companies like Facebook, Teradata, Netflix, and
Treasure Data that acted as a catalyst to this change, we wouldn’t have the initial
use cases that tested Presto’s scalable design and shined a light on Presto
to bring the awareness to the masses. Finally, without the passionate community
of developers who took an interest in giving back their time and efforts, 
Presto wouldn’t be anywhere near as robust or flexible as it is today. Now 
Presto has reached an unprecedented level of maturity and helped many
developers, scientists, and analysts find the answers they were looking for. 
It speaks volumes about just how special the project really is.&lt;/p&gt;

&lt;p&gt;This community of developers is really special in that the barrier to entry
for developers new to OSS (open source software) is really low. Speaking from
personal experience as a serial OSS attempter, when I joined I noticed everyone
treating each other with respect, a willingness to teach, and a deliberate
openness to new ideas. I interfaced with engineers working at Starburst, the
founders of Presto, and many passionate developers like myself who knew a thing
or two about the project and were so helpful to me. This was unlike other
experiences I had in the past, where joining an open source community felt like
an elite club that only existing members had access to. To me, this
inclusiveness is why the Presto community is thriving.&lt;/p&gt;

&lt;p&gt;The Presto community is most vibrant in &lt;a href=&quot;/slack.html&quot;&gt;the Slack channel&lt;/a&gt;. Here users and
developers ask questions about installing and using Presto, discuss
bug fixes or design changes, or sometimes just share great experiences or
news related to Presto. This Slack channel has recently grown to 2300 users
with around 500 active users at any given time.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://twitter.com/prestosql/status/1278393800092643328&quot; target=&quot;_blank&quot;&gt;
&lt;img src=&quot;/assets/blog/presto-eighth-birthday/presto-tweet.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To celebrate Presto really means to celebrate this community, and while we
can’t thank every individual who has contributed, we want to thank just a
handful of you for your hard work. Thanks to these engineers for their
contributions to the Presto project!&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/ebyhr&quot;&gt;ebyhr&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/kasiafi&quot;&gt;kasiafi&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/Praveen2112&quot;&gt;Praveen2112&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/phd3&quot;&gt;phd3&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/lxynov&quot;&gt;lxynov&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/pettyjamesm&quot;&gt;pettyjamesm&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/Lewuathe&quot;&gt;Lewuathe&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/raunaqmorarka&quot;&gt;raunaqmorarka&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/elonazoulay&quot;&gt;elonazoulay&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/luohao&quot;&gt;luohao&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While linking you to a blog post may not be a satisfactory thank you, the
gratitude is perhaps best &lt;a href=&quot;https://groups.google.com/g/presto-users/c/647v2ckRyGA&quot;&gt;stated on the presto-users&lt;/a&gt; Google group by co-creator Martin Traverso:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;When Dain, David, Eric and I started the project that many
years ago, we had the goal to make it open source and build a community
around it. What we never imagined was how far it would go, how widely it
would be adopted across the entire world, and how many amazing people we
would meet and get a chance to work with along the way.&lt;/p&gt;

  &lt;p&gt;Congratulations to everyone who played a part in that journey. It’s been a
great ride so far. Here’s to another 8 years!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Thanks to everyone who has contributed to Presto, and congratulations to the
founders for starting such an amazing project. Together let’s make Presto the
most useful analytics tool yet!&lt;/p&gt;

      
        <author>
          <name>Brian Olsen</name>
        </author>
      

      <summary>Today, Presto turned eight years old! As Presto co-creator Dain Sundstrom points out, there’s a reason why the eighth birthday is a little special:</summary>

      
      
    </entry>
  
    <entry>
      <title>Understanding and Tuning Presto Query Processing with Martin</title>
      <link href="https://trino.io/blog/2020/07/30/training-query-tuning.html" rel="alternate" type="text/html" title="Understanding and Tuning Presto Query Processing with Martin" />
      <published>2020-07-30T00:00:00+00:00</published>
      <updated>2020-07-30T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/07/30/training-query-tuning</id>
      <content type="html" xml:base="https://trino.io/blog/2020/07/30/training-query-tuning.html">&lt;p&gt;With the help of &lt;a href=&quot;/blog/2020/07/15/training-advanced-sql.html&quot;&gt;David’s training about advanced SQL&lt;/a&gt; you composed a number of useful queries.
You gained valuable insights from the resulting data. However, these complex
queries take time to run. If only you could make them run faster. I think we
have just what you need coming up.&lt;/p&gt;

&lt;p&gt;Join us for a free webinar &lt;strong&gt;Understanding and Tuning Presto Query Processing&lt;/strong&gt;
with Martin Traverso.&lt;/p&gt;

&lt;p&gt;Update:&lt;/p&gt;

&lt;p&gt;We are delighted that such an advanced topic attracted close to 150 attendees.
Everyone learned a lot and many additional questions came up during class and in
the Q&amp;amp;A overtime. Take advantage of the slides and recording to recap, or if
you could not attend:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/wp-content/uploads/2020/08/Presto-Training-Understanding-and-Tuning-Presto-Query-Processing.pdf&quot;&gt;Download the slides&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/GcS02yTNwC0&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;In our new &lt;a href=&quot;https://bit.ly/2NO26Cm&quot;&gt;Presto Training Series&lt;/a&gt; we give Presto users
an opportunity to learn advanced skills from the co-creators of Presto –
&lt;a href=&quot;https://github.com/electrum&quot;&gt;David Phillips&lt;/a&gt;, 
&lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso&lt;/a&gt; and 
&lt;a href=&quot;https://github.com/dain&quot;&gt;Dain Sundstrom&lt;/a&gt;. Beyond the basics, each of the four 
training sessions covers critical topics for scaling Presto to more users and
use cases.&lt;/p&gt;

&lt;p&gt;In this training session Martin helps you understand how Presto executes queries.
That knowledge can help you improve query performance. For example, the explain
plan is a powerful tool, but reading the plans and making sense of them can be
overwhelming. We explore how to create an explain plan for your query and how to
read it. We look at the work the cost-based optimizer performs and how you can
potentially help Presto run your queries even faster. This session covers the
following topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Explain the EXPLAIN&lt;/li&gt;
  &lt;li&gt;Learn how queries are analyzed and executed&lt;/li&gt;
  &lt;li&gt;Understand what the optimizer does, including some of its limitations&lt;/li&gt;
  &lt;li&gt;Showcase the cost-based optimizer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Date: Wednesday, 12 August 2020&lt;/p&gt;

&lt;p&gt;Time: 10am PDT (San Francisco), 1pm EDT (New York), 6pm BST (London), 5pm UTC&lt;/p&gt;

&lt;p&gt;Duration: 2h&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;register-now&quot;&gt;&lt;a href=&quot;https://bit.ly/2VB9DZP&quot;&gt;Register now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;We look forward to many Presto users joining us.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>With the help of David’s training about advanced SQL you composed a number of useful queries. You gained valuable insights from the resulting data. However, these complex queries take time to run. If only you could make them run faster. I think we have just what you need coming up. Join us for a free webinar Understanding and Tuning Presto Query Processing with Martin Traverso. Update: We are delighted that such an advanced topic attracted close to 150 attendees. Everyone learned a lot and many additional questions came up during class and in the Q&amp;amp;A overtime. Take advantage of the slides and recording to recap, or if you could not attend: Download the slides</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto for Analytics at Pinterest</title>
      <link href="https://trino.io/blog/2020/07/22/presto-summit-pinterest.html" rel="alternate" type="text/html" title="Presto for Analytics at Pinterest" />
      <published>2020-07-22T00:00:00+00:00</published>
      <updated>2020-07-22T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/07/22/presto-summit-pinterest</id>
      <content type="html" xml:base="https://trino.io/blog/2020/07/22/presto-summit-pinterest.html">&lt;p&gt;After &lt;a href=&quot;/blog/2020/05/15/state-of-presto.html&quot;&gt;State of Presto&lt;/a&gt; and the two
real world examples from &lt;a href=&quot;/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;Zuora&lt;/a&gt;
and &lt;a href=&quot;/blog/2020/07/06/presto-summit-arm-td.html&quot;&gt;Arm Treasure Data&lt;/a&gt;, I hope
you are ready to hear from a well known brand using Presto in their analytics
ecosystem – &lt;a href=&quot;https://www.pinterest.com&quot;&gt;Pinterest&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Presto: A key component for analytics at Pinterest&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Update:&lt;/p&gt;

&lt;p&gt;Our webinar was well received and prompted a whole bunch of questions. Check out
the slides and video recording:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/wp-content/uploads/2020/08/Presto-Summit-Webinar-Series-Presto-at-Pinterest.pdf&quot;&gt;Download the slides&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/mZ59CTOPkl8&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;Join us to learn how Pinterest uses Presto to meet the company’s rapidly
increasing analytics needs, while keeping costs low.&lt;/p&gt;

&lt;p&gt;Presto plays an important role in Pinterest’s analytics ecosystem. Find out how
Presto runs at the company, how Pinterest leverages warning systems to guide
users to write better queries, and how Pinterest scales up their clusters to
meet their rapidly growing and complex workloads.&lt;/p&gt;

&lt;p&gt;The following topics are discussed:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Presto integrated with Pinterest infrastructure&lt;/li&gt;
  &lt;li&gt;Setup of warning systems to guide users to write better queries&lt;/li&gt;
  &lt;li&gt;Management of complex workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Speakers:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.linkedin.com/in/puchengy/&quot;&gt;Pucheng Yang&lt;/a&gt; is a software engineer
at Pinterest working on the Presto, SparkSQL and Hive query engines. He joined
the company two years ago as a new grad.&lt;/li&gt;
  &lt;li&gt;Yi He is a software engineer at Pinterest. Prior to Pinterest, he worked at
Facebook on Presto OLAP and query federation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Date: Wednesday, 19 August 2020&lt;/p&gt;

&lt;p&gt;Time: 10am PDT (San Francisco), 1pm EDT (New York), 6pm BST (London), 5pm UTC&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;register-now&quot;&gt;&lt;a href=&quot;https://bit.ly/32FfRfm&quot;&gt;Register now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;We look forward to many Presto users joining us and participating in the webinar
with their questions.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>After State of Presto and the two real world examples from Zuora and Arm Treasure Data, I hope you are ready to hear from a well known brand using Presto in their analytics ecosystem – Pinterest: Presto: A key component for analytics at Pinterest Update: Our webinar was well received and caused a whole bunch of questions. Check out the slides and video recording: Download the slides</summary>

      
      
    </entry>
  
    <entry>
      <title>Advanced SQL in Presto with David</title>
      <link href="https://trino.io/blog/2020/07/15/training-advanced-sql.html" rel="alternate" type="text/html" title="Advanced SQL in Presto with David" />
      <published>2020-07-15T00:00:00+00:00</published>
      <updated>2020-07-15T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/07/15/training-advanced-sql</id>
      <content type="html" xml:base="https://trino.io/blog/2020/07/15/training-advanced-sql.html">&lt;p&gt;You have read our book &lt;a href=&quot;/blog/2020/04/11/the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;, practiced with various SQL examples, and
consulted our &lt;a href=&quot;https://trino.io/docs&quot;&gt;Presto documentation&lt;/a&gt;. Great steps to
become a Presto and SQL expert. However, learning efficient and advanced SQL can
take years of experience. Luckily we have some help from an expert coming your
way.&lt;/p&gt;

&lt;p&gt;Join us for a free webinar &lt;strong&gt;Advanced SQL in Presto&lt;/strong&gt; with David Phillips.&lt;/p&gt;

&lt;p&gt;Update:&lt;/p&gt;

&lt;p&gt;With nearly 200 live attendees and a two hour session we ended with lots of
questions from the engaged audience. After 20 minutes overtime we wrapped up the
successful event. Check out the presentation slides and the recording:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.starburst.io/wp-content/uploads/2020/07/Presto-Training-Series-Advanced-SQL-Features-in-Presto.pdf&quot;&gt;Download the slides&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/HN_95ObHAiw&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;In our new &lt;a href=&quot;https://bit.ly/2NO26Cm&quot;&gt;Presto Training Series&lt;/a&gt; we give Presto users
an opportunity to learn advanced skills from the co-creators of Presto –
&lt;a href=&quot;https://github.com/electrum&quot;&gt;David Phillips&lt;/a&gt;, 
&lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso&lt;/a&gt; and 
&lt;a href=&quot;https://github.com/dain&quot;&gt;Dain Sundstrom&lt;/a&gt;. Beyond the basics, each of the four 
training sessions covers critical topics for scaling Presto to more users and
use cases.&lt;/p&gt;

&lt;p&gt;Our first session with David is geared towards helping users understand how to
run more complex and comprehensive SQL queries with Presto. Delivered by David
Phillips, this session covers the following topics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Using JSON and other complex data types&lt;/li&gt;
  &lt;li&gt;Advanced aggregation techniques&lt;/li&gt;
  &lt;li&gt;Window functions&lt;/li&gt;
  &lt;li&gt;Array and map functions&lt;/li&gt;
  &lt;li&gt;Lambda expressions&lt;/li&gt;
  &lt;li&gt;Many other SQL functions and features&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Date: Wednesday, 29 July 2020&lt;/p&gt;

&lt;p&gt;Time: 10am PDT (San Francisco), 1pm EDT (New York), 6pm BST (London), 5pm UTC&lt;/p&gt;

&lt;p&gt;Duration: 2h&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;register-now&quot;&gt;&lt;a href=&quot;https://bit.ly/2YOtx5f&quot;&gt;Register now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;We look forward to many Presto users joining us.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>You have read our book Trino: The Definitive Guide, practiced with various SQL examples, and consulted our Presto documentation. Great steps to become a Presto and SQL expert. However, learning efficient and advanced SQL can take years of experience. Luckily we have some help from an expert coming your way. Join us for a free webinar Advanced SQL in Presto with David Phillips. Update: With nearly 200 live attendees and a two hour session we ended with lots of questions from the engaged audience. After 20 minutes overtime we wrapped up the successful event. Check out the presentation slides and the recording: Download the slides</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto Migration at Arm Treasure Data</title>
      <link href="https://trino.io/blog/2020/07/06/presto-summit-arm-td.html" rel="alternate" type="text/html" title="Presto Migration at Arm Treasure Data" />
      <published>2020-07-06T00:00:00+00:00</published>
      <updated>2020-07-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/07/06/presto-summit-arm-td</id>
      <content type="html" xml:base="https://trino.io/blog/2020/07/06/presto-summit-arm-td.html">&lt;p&gt;Both events of our virtual Presto Summit tour,
&lt;a href=&quot;/blog/2020/05/15/state-of-presto.html&quot;&gt;State of Presto&lt;/a&gt; and the
&lt;a href=&quot;/blog/2020/06/16/presto-summit-zuora.html&quot;&gt;Zuora presentation&lt;/a&gt;,
were well received and recordings are available for you to watch. Your next
chance to learn more about Presto in the real world comes from Arm Treasure
Data and is presented by Taro L. Saito:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Presto at Arm Treasure Data: A Journey of Migrating 1 Million Presto Queries&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Update:&lt;/p&gt;

&lt;p&gt;We had a great event with some in-depth, detailed questions from the audience.
Check out the recording to learn more:&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/NGMugRsNraE&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;Join us to discover how, as part of their customer data platform, Arm Treasure
Data utilizes Presto as the query engine, processing over 1 million queries per
day. This system supports the data business of over 500 companies in three
regions - US, EU, and Asia.&lt;/p&gt;

&lt;p&gt;Arm Treasure Data has been using Presto 0.205, and in 2019 started a big
migration project to Presto 317. Although they performed extensive query
simulations to check for any incompatibilities, the team faced many unexpected
challenges. In this session you learn more about their migration of the production system:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Technical details on many challenges&lt;/li&gt;
  &lt;li&gt;Key lessons learned&lt;/li&gt;
  &lt;li&gt;Latest updates on AWS Graviton2, the next generation of 64-bit Arm instance
types that can be used for running Presto&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our speaker, Taro L. Saito, is a principal software engineer at Arm Treasure
Data and holds a Ph.D. in computer science from the University of Tokyo. He has built a
cloud database service at Arm Treasure Data, which processes millions
of queries every day. Previously, he worked as an assistant professor at the
University of Tokyo, studying distributed database systems and their
applications to genome sciences. He has created several open-source projects,
including Airframe, MessagePack, and various sbt plugins (sbt-sonatype,
sbt-pack) for Scala that help to publish thousands of OSS projects.&lt;/p&gt;

&lt;p&gt;Date: Thursday, 16 July 2020&lt;/p&gt;

&lt;p&gt;Time: 10am PDT (San Francisco), 1pm EDT (New York), 6pm BST (London), 5pm UTC&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;register-now&quot;&gt;&lt;a href=&quot;https://bit.ly/38wrS80&quot;&gt;Register now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;We look forward to many Presto users joining us.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Both events of our virtual Presto Summit tour, State of Presto and the Zuora presentation, were well received and recordings are available for you to watch. Your next chance to learn more about Presto in the real world comes from Arm Treasure Data and is presented by Taro L. Saito: Presto at Arm Treasure Data: A Journey of Migrating 1 Million Presto Queries Update: We had a great event with some in-depth, detailed questions from the audience. Check out the recording to learn more:</summary>

      
      
    </entry>
  
    <entry>
      <title>Data Integrity Protection in Presto</title>
      <link href="https://trino.io/blog/2020/06/25/data-integrity-protection.html" rel="alternate" type="text/html" title="Data Integrity Protection in Presto" />
      <published>2020-06-25T00:00:00+00:00</published>
      <updated>2020-06-25T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/06/25/data-integrity-protection</id>
      <content type="html" xml:base="https://trino.io/blog/2020/06/25/data-integrity-protection.html">&lt;p&gt;It all started on a Thursday afternoon in March, when &lt;a href=&quot;https://github.com/sopel39&quot;&gt;Karol Sobczak&lt;/a&gt;
was grilling Presto with heavy rounds of benchmarks, as we were ramping up to the Starburst Enterprise
Presto 332-e release. Karol discovered what seemed to be a serious regression, and turned out to be an even more
serious cloud environment issue.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;presto-benchmarks&quot;&gt;Presto Benchmarks&lt;/h1&gt;

&lt;p&gt;At the Presto project, we take stability and efficiency seriously, so releases undergo
rigorous performance benchmarks. The intention is to safeguard against any performance regressions
or stability problems. Usually, the performance improvements are benchmarked separately when they
are being added to the codebase. At Starburst, those benchmarks are even more important, especially
for the Starburst Enterprise Presto LTS releases.&lt;/p&gt;

&lt;p&gt;On a side note, we use &lt;a href=&quot;https://github.com/trinodb/benchto&quot;&gt;Benchto&lt;/a&gt; for organizing
&lt;a href=&quot;https://github.com/trinodb/trino/tree/master/presto-benchto-benchmarks&quot;&gt;Presto benchmark suites&lt;/a&gt;,
executing them and collecting the results. We use managed &lt;a href=&quot;https://kubernetes.io/&quot;&gt;Kubernetes&lt;/a&gt; in a public
cloud for provisioning Presto clusters, along with &lt;a href=&quot;https://www.starburst.io/platform/deployment-options/starburst-on-kubernetes/&quot;&gt;Starburst Enterprise Presto Kubernetes&lt;/a&gt;.
We use &lt;a href=&quot;https://jupyter.org/&quot;&gt;Jupyter&lt;/a&gt; for producing result reports in HTML and PDF formats.&lt;/p&gt;

&lt;h1 id=&quot;alleged-regression&quot;&gt;Alleged Regression&lt;/h1&gt;

&lt;p&gt;It all started in March, when &lt;a href=&quot;https://github.com/sopel39&quot;&gt;Karol Sobczak&lt;/a&gt;
was grilling Presto with heavy rounds of benchmarks for the Starburst Enterprise Presto 332-e release.
On one Thursday afternoon he reported stability problems, with a few benchmark runs failing with
exceptions similar to:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Query failed (#20200326_150852_00338_dj225): Unknown block encoding:
LONG_ARRAY� � �� � @@@���� �@  @ � �@@@ @@� @�@D�� @@��@ `� @@� @#�@ � 0�
... (9550 more bytes)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In Presto, a block encoding is a way of encoding a particular Block type (here, a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LongArrayBlock&lt;/code&gt;).
Block encodings are used when exchanging blocks of data between Presto nodes, or in spill to disk.
Blocks form a polymorphic class hierarchy, so every time a block is encoded, we need
to also store the encoding identifier. The encoding identifier (here, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LONG_ARRAY&lt;/code&gt; string)
is written as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;string length&amp;gt;&lt;/code&gt; (a 4-byte, signed integer in little-endian) followed by
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;string bytes&amp;gt;&lt;/code&gt; containing the UTF-8 representation of the encoding id. Clearly, in the case above,
the receiver read the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;encoding id length&amp;gt;&lt;/code&gt; as 9623 instead of 10! How could that ever be possible?&lt;/p&gt;
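&lt;p&gt;To make that failure mode concrete, here is a minimal Python sketch of the length-prefixed framing (a standalone illustration, not Presto’s actual Java code): a 4-byte signed little-endian length followed by the UTF-8 bytes of the encoding id. Corrupt the length prefix and the receiver reads a bogus length such as 9623 instead of 10, then tries to interpret kilobytes of block payload as the encoding name.&lt;/p&gt;

```python
def write_encoding_id(encoding_id: str) -> bytes:
    # Frame the identifier as a 4-byte signed little-endian length
    # followed by the UTF-8 bytes of the encoding id.
    data = encoding_id.encode("utf-8")
    return len(data).to_bytes(4, "little", signed=True) + data

def read_encoding_id(buf: bytes) -> str:
    length = int.from_bytes(buf[:4], "little", signed=True)
    return buf[4:4 + length].decode("utf-8")

wire = write_encoding_id("LONG_ARRAY")
assert wire[:4] == b"\x0a\x00\x00\x00"  # length 10, little-endian
assert read_encoding_id(wire) == "LONG_ARRAY"

# Corrupt the length prefix: the receiver now sees 9623 (0x2597)
# instead of 10, and consumes block payload bytes as if they were
# part of the encoding name.
corrupted = b"\x97\x25\x00\x00" + wire[4:]
assert int.from_bytes(corrupted[:4], "little", signed=True) == 9623
```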

&lt;p&gt;Presto 332 brought a lot of good changes, and the upgrade to Java 11 was one of them.
Therefore, Starburst Enterprise Presto 332-e was the first Starburst release using Java 11 by default.
For earlier releases, we ran benchmarks using AWS EC2 machines orchestrated with &lt;a href=&quot;https://www.starburst.io/platform/deployment-options/aws/&quot;&gt;Starburst’s Presto
CloudFormation Template (CFT)&lt;/a&gt;. This was also the first time we ran
Presto release benchmarks on Kubernetes clusters, with AWS EKS. We could suspect many different factors
as being the cause. We started to sift through the code, search the team’s “collective brain” and
the Internet for any ideas. One of the important sources was Vijay Pandurangan’s writeup on the &lt;a href=&quot;https://tech.vijayp.ca/linux-kernel-bug-delivers-corrupt-tcp-ip-data-to-mesos-kubernetes-docker-containers-4986f88f7a19&quot;&gt;data
corruption bug discovered by Twitter in 2015&lt;/a&gt;. Of course, we also repeated benchmark runs. Seeing is believing.&lt;/p&gt;

&lt;h1 id=&quot;production-issues&quot;&gt;Production issues&lt;/h1&gt;

&lt;p&gt;On the next day, a customer reported similar problems with their Presto cluster. Of course, they
were not running the yet-to-be-released version that we were still benchmarking. They ran into what seemed to
be a very serious regression in the Starburst Enterprise Presto 323-e release line. The customer was also using
the AWS cloud, but not the Kubernetes deployment. They were using the &lt;a href=&quot;https://www.starburst.io/platform/deployment-options/aws/&quot;&gt;CFT-based deployment&lt;/a&gt;
– the same stack we were using for all our release benchmarks so far – and we had never run into issues like this before.
As the customer was using a fresh-off-the-press latest minor release, we decided (in the spirit of the global healthcare trend)
to “quarantine” that release and roll back the customer installation to the previous version.&lt;/p&gt;

&lt;p&gt;However, the fact that a small bug fix release triggered data problems was unnerving. The fact that we
did not discover any of these problems before was even more unnerving.&lt;/p&gt;

&lt;h1 id=&quot;more-testing--the-data-corruption&quot;&gt;More testing – the data corruption&lt;/h1&gt;

&lt;p&gt;As we were running more and more, and even more test runs, we discovered new failure modes.
For example:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Query failed (#20200327_001931_00020_8di4r): Cannot cast DECIMAL(7, 2) &apos;18734974449861284.67&apos; to DECIMAL(12, 2)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Well, this message is not &lt;em&gt;wrong&lt;/em&gt;. It’s not possible to cast &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;18734974449861284.67&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL(12, 2)&lt;/code&gt;.
Except that it is &lt;em&gt;also&lt;/em&gt; not possible to have a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL(7, 2)&lt;/code&gt; with such value. Something wrong happened to the
data. At that moment, we realized the problem was very serious, because data could become corrupted.
This corrupted data could lead to a failure (like above), but it could also lead to incorrect query results,
or incorrect data being persisted (in case of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE TABLE AS&lt;/code&gt; queries). We created
a virtual War Room (that is, a Slack channel), got together all Presto experts and our experienced field team
to discuss potential causes, further diagnostics and mitigation strategies.&lt;/p&gt;
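&lt;p&gt;Why is that value impossible for a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL(7, 2)&lt;/code&gt;? A &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL(p, s)&lt;/code&gt; holds at most &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;p&lt;/code&gt; digits in total, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s&lt;/code&gt; of them after the decimal point, so the largest &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL(7, 2)&lt;/code&gt; value is 99999.99. A quick illustrative check in Python (not Presto code):&lt;/p&gt;

```python
from decimal import Decimal

def fits(value: Decimal, precision: int, scale: int) -> bool:
    # A DECIMAL(p, s) holds at most p digits in total,
    # s of them after the decimal point.
    sign, digits, exponent = value.as_tuple()
    integral_digits = len(digits) + exponent  # digits before the point
    return precision - scale >= integral_digits and scale >= -exponent

assert fits(Decimal("99999.99"), 7, 2)                   # largest DECIMAL(7, 2)
assert not fits(Decimal("18734974449861284.67"), 7, 2)   # cannot exist at all
assert not fits(Decimal("18734974449861284.67"), 12, 2)  # and the cast fails too
```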

&lt;p&gt;Since the problem was affecting data exchanges between Presto nodes, we listed the following strategies
to try to dissect the problem:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;determining which query (queries) is (are) causing failures,&lt;/li&gt;
  &lt;li&gt;running with HTTP/2,&lt;/li&gt;
  &lt;li&gt;reverting to running on Java 8,&lt;/li&gt;
  &lt;li&gt;enabling exchange compression (as decompression is very sensitive to data corruption),&lt;/li&gt;
  &lt;li&gt;trying to upgrade Jetty,&lt;/li&gt;
  &lt;li&gt;determining whether failures correlate with JVM GC activity,&lt;/li&gt;
  &lt;li&gt;inspecting the source code.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;different-configuration&quot;&gt;Different configuration&lt;/h1&gt;

&lt;p&gt;We were able to quickly prototype and verify some of the ideas. Switching to HTTP/2 or
upgrading Jetty to the latest version did not help. Nor did downgrading to a Jetty version
that we had been using for a long time. We also verified that the problem was reproducible with Java 8,
so we concluded Java 11 was not the cause.&lt;/p&gt;

&lt;h1 id=&quot;checksums&quot;&gt;Checksums&lt;/h1&gt;

&lt;p&gt;We identified that the problem occurs somewhere within exchanges, between one Presto worker
node serializing a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Page&lt;/code&gt; object (the basic unit of data processing in Presto) and another node
deserializing it.&lt;/p&gt;

&lt;p&gt;While the decimal cast failure didn’t directly point at a data corruption problem (there could
be many other reasons for it), there was no other explanation for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Unknown block encoding&lt;/code&gt; exceptions.
The serialization is done in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PagesSerde.serialize&lt;/code&gt; (used by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TaskOutputOperator&lt;/code&gt;, the data sender) and
deserialization is done in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PagesSerde.deserialize&lt;/code&gt; (used by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ExchangeOperator&lt;/code&gt;, the
receiver of the data). As the logic is nicely encapsulated in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PagesSerde&lt;/code&gt; class, we
added checksums to the serialized data: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;checksum&amp;gt; &amp;lt;serialized page&amp;gt;&lt;/code&gt;.
This felt like a smart move – except that it gave us nothing more than confirmation that
there was a problem (“checksum failure”), which we already knew.&lt;/p&gt;
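&lt;p&gt;The framing can be sketched in a few lines of Java. This is an illustrative, stand-alone example rather than the actual Presto code, and it uses the JDK’s CRC32 in place of the checksum Presto used, purely to stay self-contained:&lt;/p&gt;

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical sketch of the "checksum followed by serialized page" framing.
public class ChecksumFrame {
    // Prepend an 8-byte checksum to the serialized page bytes.
    static byte[] frame(byte[] page) {
        CRC32 crc = new CRC32();
        crc.update(page, 0, page.length);
        ByteBuffer out = ByteBuffer.allocate(8 + page.length);
        out.putLong(crc.getValue());
        out.put(page);
        return out.array();
    }

    // Verify the checksum on the receiving side and return the payload,
    // or fail loudly if the bytes changed in transit.
    static byte[] unframe(byte[] framed) {
        ByteBuffer in = ByteBuffer.wrap(framed);
        long expected = in.getLong();
        byte[] page = new byte[in.remaining()];
        in.get(page);
        CRC32 crc = new CRC32();
        crc.update(page, 0, page.length);
        if (crc.getValue() != expected) {
            throw new RuntimeException("checksum failure");
        }
        return page;
    }
}
```

&lt;p&gt;A corrupted payload then fails in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unframe&lt;/code&gt; with a checksum error – exactly the kind of signal described above, and no more.&lt;/p&gt;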

&lt;p&gt;We considered adding logging to capture data going out from one node and coming in on
another node, but that would produce a huge amount of logs. A single benchmark run transfers
hundreds of terabytes of data between the nodes.&lt;/p&gt;

&lt;p&gt;We went ahead and created a Presto build that added data redundancy to be able to reconstruct
the data on the receiving side.
There are many &lt;a href=&quot;https://en.wikipedia.org/wiki/Erasure_code&quot;&gt;well-known error-correction codes&lt;/a&gt;
(e.g. &lt;a href=&quot;https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction&quot;&gt;Reed–Solomon error correction&lt;/a&gt;
available in Hadoop 3). In our case, speed of &lt;em&gt;implementation&lt;/em&gt; (a.k.a. simplicity) was a deciding factor,
so we added data mirroring: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;checksum&amp;gt; &amp;lt;serialized page&amp;gt; &amp;lt;serialized page&amp;gt;&lt;/code&gt;.
To avoid logging all the data exchanges, we added the received pages (both copies)
to the exceptions being raised.&lt;/p&gt;
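&lt;p&gt;The shape of this diagnostic build can be sketched as follows. The class and message names here are illustrative (the real logic lives in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PagesSerde&lt;/code&gt;), and CRC32 again stands in for the real checksum:&lt;/p&gt;

```java
import java.util.HexFormat;
import java.util.zip.CRC32;

// Sketch of the mirroring approach: the wire format carries two copies of the
// serialized page, and on checksum failure both copies are attached to the
// exception as hex, so they can be extracted from logs and compared offline.
public class MirroredPage {
    static byte[] deserialize(long expectedChecksum, byte[] first, byte[] second) {
        if (checksum(first) == expectedChecksum) {
            return first;
        }
        RuntimeException failure = new RuntimeException("Hash mismatch");
        failure.addSuppressed(new RuntimeException(
                "Slice, first half: " + HexFormat.of().formatHex(first)));
        failure.addSuppressed(new RuntimeException(
                "Slice, secnd half: " + HexFormat.of().formatHex(second)));
        throw failure;
    }

    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }
}
```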

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;java.sql.SQLException: Query failed (#20200401_113622_00676_p7qp7): Hash mismatch, read: 1251072184702746109, calculated: 7591448164918409110
    Suppressed: java.lang.RuntimeException: Slice, first half: 040000000A0000004C4F4E475F415252.... (945 kilobytes)
    Suppressed: java.lang.RuntimeException: Slice, secnd half: 040000000A0000004C4F4E475F415252.... (945 kilobytes)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The exception told us the first part was changed, since the read checksum did not match the calculated
checksum (the latter was computed from the first copy of the data and differed from the checksum
calculated on the sending side).
With the encoded data embedded in the exception like that, it was easy to extract the actual data and compare the copies,
so now we could see &lt;em&gt;how&lt;/em&gt; the data was changed.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cat failure.txt | grep &apos;Slice, first half&apos; | cut -d: -f4- | sed &apos;s/^ *//&apos; | xxd -r -p &amp;gt; changed
cat failure.txt | grep &apos;Slice, secnd half&apos; | cut -d: -f4- | sed &apos;s/^ *//&apos; | xxd -r -p &amp;gt; original
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Comparing binary files is fun, but in practice it can be more convenient to compare &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hexdump&lt;/code&gt; output.
The output below was created with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;vimdiff &amp;lt;(hexdump -Cv original) &amp;lt;(hexdump -Cv changed)&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;++--6064 lines: 00000000  04 00 00 00 0a 00 00 00  4c 4f 4...|+ +--6064 lines: 00000000  04 00 00 00 0a 00 00...
 00017b00  00 cb 6a 25 00 00 00 00  00 cb 6a 25 00 00 00 00  |  00 cb 6a 25 00 00 00 00  00 cb 6a 25 00 00 00 00
 00017b10  00 cb 6a 25 00 00 00 00  00 cb 6a 25 00 00 00 00  |  00 cb 6a 25 00 00 00 00  00 cb 6a 25 00 00 00 00
 00017b20  00 cb 6a 25 00 00 00 00  00 e1 67 25 00 00 00 00  |  00 cb 6a 25 00 00 00 00  00 e1 67 25 00 00 00 00
 00017b30  00 e1 67 25 00 00 00 00  00 e1 67 25 00 00 00 00  |  00 e1 67 25 00 00 00 00  00 e1 67 25 00 00 00 00
 00017b40  00 e1 67 25 00 00 00 00  00 e1 67 25 00 00 00 00  |  00 e1 67 25 00 00 00 00  00 e1 67 25 00 00 00 00
 00017b50  00 e1 67 25 00 00 00 00  00 e1 67 25 00 00 00 00  |  00 e1 67 25 00 00 00 00  00 e1 67 25 00 00 00 00
 00017b60  00 e1 67 25 00 00 00 00  00 e1 67 25 00 00 00 00  |  00 e1 67 25 00 00 00 00  e1 67 25 00 00 00 00 00
 00017b70  00 e1 67 25 00 00 00 00  00 fb 69 25 00 00 00 00  |  e1 67 25 00 00 00 00 00  fb 69 25 00 00 00 00 00
 00017b80  00 fb 69 25 00 00 00 00  00 fb 69 25 00 00 00 00  |  fb 69 25 00 00 00 00 00  fb 69 25 00 00 00 00 00
 00017b90  00 fb 69 25 00 00 00 00  00 fb 69 25 00 00 00 00  |  fb 69 25 00 00 00 00 00  fb 69 25 00 00 00 00 00
 00017ba0  00 fb 69 25 00 00 00 00  00 fb 69 25 00 00 00 00  |  fb 69 25 00 00 00 00 00  fb 69 25 00 00 00 00 00
 00017bb0  00 fb 69 25 00 00 00 00  00 fb 69 25 00 00 00 00  |  fb 69 25 00 00 00 00 00  fb 69 25 00 00 00 00 00
 00017bc0  00 fb 69 25 00 00 00 00  00 fb 69 25 00 00 00 00  |  fb 69 25 00 00 00 00 00  fb 69 25 00 00 00 00 00
 00017bd0  00 fb 69 25 00 00 00 00  00 fb 69 25 00 00 00 00  |  fb 69 25 00 00 00 00 00  fb 69 25 00 00 00 00 00
 00017be0  00 fb 69 25 00 00 00 00  00 5e 6a 25 00 00 00 00  |  fb 69 25 00 00 00 00 00  5e 6a 25 00 00 00 00 00
 00017bf0  00 5e 6a 25 00 00 00 00  00 5e 6a 25 00 00 00 00  |  5e 6a 25 00 00 00 00 00  5e 6a 25 00 00 00 00 00
 00017c00  00 5e 6a 25 00 00 00 00  00 5e 6a 25 00 00 00 00  |  5e 6a 25 00 00 00 00 00  5e 6a 25 00 00 00 00 00
 00017c10  00 5e 6a 25 00 00 00 00  00 5e 6a 25 00 00 00 00  |  5e 6a 25 00 00 00 00 00  5e 6a 25 00 00 00 00 00
 00017c20  00 5e 6a 25 00 00 00 00  00 5e 6a 25 00 00 00 00  |  5e 6a 25 00 00 00 00 00  5e 6a 25 00 00 00 00 00
 00017c30  00 5e 6a 25 00 00 00 00  00 5e 6a 25 00 00 00 00  |  5e 6a 25 00 00 00 00 00  5e 6a 25 00 00 00 00 00
 00017c40  00 5e 6a 25 00 00 00 00  00 5e 6a 25 00 00 00 00  |  5e 6a 25 00 00 00 00 00  5e 6a 25 00 00 00 00 00
 00017c50  00 5e 6a 25 00 00 00 00  00 5e 6a 25 00 00 00 00  |  5e 6a 25 00 00 00 00 00  5e 6a 25 00 00 00 00 00
 00017c60  00 34 68 25 00 00 00 00  00 34 68 25 00 00 00 00  |  34 68 25 00 00 00 00 00  34 68 25 00 00 00 00 00
 00017c70  00 34 68 25 00 00 00 00  00 34 68 25 00 00 00 00  |  34 68 25 00 00 00 00 00  34 68 25 00 00 00 00 00
 00017c80  00 34 68 25 00 00 00 00  00 34 68 25 00 00 00 00  |  34 68 25 00 00 00 00 00  34 68 25 00 00 00 00 00
 00017c90  00 34 68 25 00 00 00 00  00 34 68 25 00 00 00 00  |  34 68 25 00 00 00 00 00  34 68 25 00 00 00 00 00
 00017ca0  00 34 68 25 00 00 00 00  00 2e 6b 25 00 00 00 00  |  34 68 25 00 00 00 00 00  2e 6b 25 00 00 00 00 00
 00017cb0  00 2e 6b 25 00 00 00 00  00 2e 6b 25 00 00 00 00  |  2e 6b 25 00 00 00 00 00  2e 6b 25 00 00 00 00 00
 00017cc0  00 2e 6b 25 00 00 00 00  00 2e 6b 25 00 00 00 00  |  2e 6b 25 00 00 00 00 00  2e 6b 25 00 00 00 00 00
 00017cd0  00 2e 6b 25 00 00 00 00  00 2e 6b 25 00 00 00 00  |  2e 6b 25 00 00 00 00 00  2e 6b 25 00 00 00 00 00
 00017ce0  00 2e 6b 25 00 00 00 00  00 2e 6b 25 00 00 00 00  |  2e 6b 25 00 00 00 00 00  2e 6b 25 00 00 00 00 00
 00017cf0  00 2e 6b 25 00 00 00 00  00 2e 6b 25 00 00 00 00  |  2e 6b 25 00 00 00 00 00  2e 6b 25 00 00 00 00 00
 00017d00  00 2e 6b 25 00 00 00 00  00 2e 6b 25 00 00 00 00  |  2e 6b 25 00 00 00 00 00  2e 6b 25 00 00 00 00 00
 00017d10  00 2e 6b 25 00 00 00 00  00 cf 68 25 00 00 00 00  |  2e 6b 25 00 00 00 00 00  cf 68 25 00 00 00 00 00
 00017d20  00 cf 68 25 00 00 00 00  00 cf 68 25 00 00 00 00  |  cf 68 25 00 00 00 00 00  cf 68 25 00 00 00 00 00
 00017d30  00 cf 68 25 00 00 00 00  00 cf 68 25 00 00 00 00  |  cf 68 25 00 00 00 00 00  cf 68 25 00 00 00 00 00
 00017d40  00 cf 68 25 00 00 00 00  00 cf 68 25 00 00 00 00  |  cf 68 25 00 00 00 00 00  cf 68 25 00 00 00 00 00
 00017d50  00 cf 68 25 00 00 00 00  00 cf 68 25 00 00 00 00  |  cf 68 25 00 00 00 00 00  cf 68 25 00 00 00 00 00
 00017d60  00 cf 68 25 00 00 00 00  00 cf 68 25 00 00 00 00  |  cf 68 25 00 00 00 00 00  cf 68 25 00 00 00 00 00
 00017d70  00 cf 68 25 00 00 00 00  00 cf 68 25 00 00 00 00  |  cf 68 25 00 00 00 00 00  cf 68 25 00 00 00 00 00
 00017d80  00 cf 68 25 00 00 00 00  00 6b 69 25 00 00 00 00  |  cf 68 25 00 00 00 00 00  6b 69 25 00 00 00 00 00
 00017d90  00 6b 69 25 00 00 00 00  00 6b 69 25 00 00 00 00  |  6b 69 25 00 00 00 00 00  6b 69 25 00 00 00 00 00
 00017da0  00 6b 69 25 00 00 00 00  00 6b 69 25 00 00 00 00  |  6b 69 25 00 00 00 00 00  6b 69 25 00 00 00 00 00
 00017db0  00 6b 69 25 00 00 00 00  00 6b 69 25 00 00 00 00  |  6b 69 25 00 00 00 00 00  6b 69 25 00 00 00 00 00
 00017dc0  00 6b 69 25 00 00 00 00  00 7e 66 25 00 00 00 00  |  6b 69 25 00 00 00 00 00  7e 66 25 00 00 00 00 00
 00017dd0  00 7e 66 25 00 00 00 00  00 7e 66 25 00 00 00 00  |  7e 66 25 00 00 00 00 00  7e 66 25 00 00 00 00 00
 00017de0  00 7e 66 25 00 00 00 00  00 7e 66 25 00 00 00 00  |  7e 66 25 00 00 00 00 00  7e 66 25 00 00 00 00 00
 00017df0  00 7e 66 25 00 00 00 00  00 7e 66 25 00 00 00 00  |  7e 66 25 00 00 00 00 00  7e 66 25 00 00 00 00 00
 00017e00  00 7e 66 25 00 00 00 00  00 7e 66 25 00 00 00 00  |  7e 66 25 00 00 00 00 00  7e 66 25 00 00 00 00 00
 00017e10  00 7e 66 25 00 00 00 00  00 7e 66 25 00 00 00 00  |  7e 66 25 00 00 00 00 00  7e 66 25 00 00 00 00 00
 00017e20  00 7e 66 25 00 00 00 00  00 7e 66 25 00 00 00 00  |  7e 66 25 00 00 00 00 00  7e 66 25 00 00 00 00 00
 00017e30  00 a9 66 25 00 00 00 00  00 a9 66 25 00 00 00 00  |  a9 66 25 00 00 00 00 00  a9 66 25 00 00 00 00 00
 00017e40  00 a9 66 25 00 00 00 00  00 a9 66 25 00 00 00 00  |  a9 66 25 00 00 00 00 00  a9 66 25 00 00 00 00 00
 00017e50  00 a9 66 25 00 00 00 00  00 a9 66 25 00 00 00 00  |  a9 66 25 00 00 00 00 00  a9 66 25 00 00 00 00 00
 00017e60  00 a9 66 25 00 00 00 00  00 a9 66 25 00 00 00 00  |  a9 66 25 00 00 00 00 00  a9 66 25 00 00 00 00 00
 00017e70  00 a9 66 25 00 00 00 00  00 fb 67 25 00 00 00 00  |  a9 66 25 00 00 00 00 00  fb 67 25 00 00 00 00 00
 00017e80  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00  |  fb 67 25 00 00 00 00 00  fb 67 25 00 00 00 00 00
 00017e90  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00  |  fb 67 25 00 00 00 00 00  fb 67 25 00 00 00 00 00
 00017ea0  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00  |  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00
 00017eb0  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00  |  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00
 00017ec0  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00  |  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00
 00017ed0  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00  |  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00
 00017ee0  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00  |  00 fb 67 25 00 00 00 00  00 fb 67 25 00 00 00 00
 00017ef0  00 fb 67 25 00 00 00 00  00 5e 6b 25 00 00 00 00  |  00 fb 67 25 00 00 00 00  00 5e 6b 25 00 00 00 00
++--23429 lines: 00017f00  00 5e 6b 25 00 00 00 00  00 5e ...|+ +--23429 lines: 00017f00  00 5e 6b 25 00 00 0...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It is perhaps no surprise that zero bytes made up a lot of the transferred data. For performance reasons,
Presto uses a fixed-length representation for fixed-length data types, such as integers or decimals.
Compressing data for network exchanges makes sense if your network is saturated and your CPU is not,
and it is off by default. If we replace zero bytes with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;__&lt;/code&gt;, we see that the difference
between the original (left) and the changed data (right) is pretty interesting: it looks like one zero byte was
shifted from offset &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0x00017b60+5&lt;/code&gt; (approximately) to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0x00017e90+12&lt;/code&gt; (approximately).
This is a very unusual data change. We got other failure samples showing similar changes,
with varying offsets.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;++--6064 lines: 00000000  04 00 00 00 0a 00 00 00  4c 4f 4...|+ +--6064 lines: 00000000  04 00 00 00 0a 00 00...
 00017b00  __ cb 6a 25 __ __ __ __  __ cb 6a 25 __ __ __ __  |  __ cb 6a 25 __ __ __ __  __ cb 6a 25 __ __ __ __
 00017b10  __ cb 6a 25 __ __ __ __  __ cb 6a 25 __ __ __ __  |  __ cb 6a 25 __ __ __ __  __ cb 6a 25 __ __ __ __
 00017b20  __ cb 6a 25 __ __ __ __  __ e1 67 25 __ __ __ __  |  __ cb 6a 25 __ __ __ __  __ e1 67 25 __ __ __ __
 00017b30  __ e1 67 25 __ __ __ __  __ e1 67 25 __ __ __ __  |  __ e1 67 25 __ __ __ __  __ e1 67 25 __ __ __ __
 00017b40  __ e1 67 25 __ __ __ __  __ e1 67 25 __ __ __ __  |  __ e1 67 25 __ __ __ __  __ e1 67 25 __ __ __ __
 00017b50  __ e1 67 25 __ __ __ __  __ e1 67 25 __ __ __ __  |  __ e1 67 25 __ __ __ __  __ e1 67 25 __ __ __ __
 00017b60  __ e1 67 25 __ __ __ __  __ e1 67 25 __ __ __ __  |  __ e1 67 25 __ __ __ __  e1 67 25 __ __ __ __ __
 00017b70  __ e1 67 25 __ __ __ __  __ fb 69 25 __ __ __ __  |  e1 67 25 __ __ __ __ __  fb 69 25 __ __ __ __ __
 00017b80  __ fb 69 25 __ __ __ __  __ fb 69 25 __ __ __ __  |  fb 69 25 __ __ __ __ __  fb 69 25 __ __ __ __ __
 00017b90  __ fb 69 25 __ __ __ __  __ fb 69 25 __ __ __ __  |  fb 69 25 __ __ __ __ __  fb 69 25 __ __ __ __ __
 00017ba0  __ fb 69 25 __ __ __ __  __ fb 69 25 __ __ __ __  |  fb 69 25 __ __ __ __ __  fb 69 25 __ __ __ __ __
 00017bb0  __ fb 69 25 __ __ __ __  __ fb 69 25 __ __ __ __  |  fb 69 25 __ __ __ __ __  fb 69 25 __ __ __ __ __
 00017bc0  __ fb 69 25 __ __ __ __  __ fb 69 25 __ __ __ __  |  fb 69 25 __ __ __ __ __  fb 69 25 __ __ __ __ __
 00017bd0  __ fb 69 25 __ __ __ __  __ fb 69 25 __ __ __ __  |  fb 69 25 __ __ __ __ __  fb 69 25 __ __ __ __ __
 00017be0  __ fb 69 25 __ __ __ __  __ 5e 6a 25 __ __ __ __  |  fb 69 25 __ __ __ __ __  5e 6a 25 __ __ __ __ __
 00017bf0  __ 5e 6a 25 __ __ __ __  __ 5e 6a 25 __ __ __ __  |  5e 6a 25 __ __ __ __ __  5e 6a 25 __ __ __ __ __
 00017c00  __ 5e 6a 25 __ __ __ __  __ 5e 6a 25 __ __ __ __  |  5e 6a 25 __ __ __ __ __  5e 6a 25 __ __ __ __ __
 00017c10  __ 5e 6a 25 __ __ __ __  __ 5e 6a 25 __ __ __ __  |  5e 6a 25 __ __ __ __ __  5e 6a 25 __ __ __ __ __
 00017c20  __ 5e 6a 25 __ __ __ __  __ 5e 6a 25 __ __ __ __  |  5e 6a 25 __ __ __ __ __  5e 6a 25 __ __ __ __ __
 00017c30  __ 5e 6a 25 __ __ __ __  __ 5e 6a 25 __ __ __ __  |  5e 6a 25 __ __ __ __ __  5e 6a 25 __ __ __ __ __
 00017c40  __ 5e 6a 25 __ __ __ __  __ 5e 6a 25 __ __ __ __  |  5e 6a 25 __ __ __ __ __  5e 6a 25 __ __ __ __ __
 00017c50  __ 5e 6a 25 __ __ __ __  __ 5e 6a 25 __ __ __ __  |  5e 6a 25 __ __ __ __ __  5e 6a 25 __ __ __ __ __
 00017c60  __ 34 68 25 __ __ __ __  __ 34 68 25 __ __ __ __  |  34 68 25 __ __ __ __ __  34 68 25 __ __ __ __ __
 00017c70  __ 34 68 25 __ __ __ __  __ 34 68 25 __ __ __ __  |  34 68 25 __ __ __ __ __  34 68 25 __ __ __ __ __
 00017c80  __ 34 68 25 __ __ __ __  __ 34 68 25 __ __ __ __  |  34 68 25 __ __ __ __ __  34 68 25 __ __ __ __ __
 00017c90  __ 34 68 25 __ __ __ __  __ 34 68 25 __ __ __ __  |  34 68 25 __ __ __ __ __  34 68 25 __ __ __ __ __
 00017ca0  __ 34 68 25 __ __ __ __  __ 2e 6b 25 __ __ __ __  |  34 68 25 __ __ __ __ __  2e 6b 25 __ __ __ __ __
 00017cb0  __ 2e 6b 25 __ __ __ __  __ 2e 6b 25 __ __ __ __  |  2e 6b 25 __ __ __ __ __  2e 6b 25 __ __ __ __ __
 00017cc0  __ 2e 6b 25 __ __ __ __  __ 2e 6b 25 __ __ __ __  |  2e 6b 25 __ __ __ __ __  2e 6b 25 __ __ __ __ __
 00017cd0  __ 2e 6b 25 __ __ __ __  __ 2e 6b 25 __ __ __ __  |  2e 6b 25 __ __ __ __ __  2e 6b 25 __ __ __ __ __
 00017ce0  __ 2e 6b 25 __ __ __ __  __ 2e 6b 25 __ __ __ __  |  2e 6b 25 __ __ __ __ __  2e 6b 25 __ __ __ __ __
 00017cf0  __ 2e 6b 25 __ __ __ __  __ 2e 6b 25 __ __ __ __  |  2e 6b 25 __ __ __ __ __  2e 6b 25 __ __ __ __ __
 00017d00  __ 2e 6b 25 __ __ __ __  __ 2e 6b 25 __ __ __ __  |  2e 6b 25 __ __ __ __ __  2e 6b 25 __ __ __ __ __
 00017d10  __ 2e 6b 25 __ __ __ __  __ cf 68 25 __ __ __ __  |  2e 6b 25 __ __ __ __ __  cf 68 25 __ __ __ __ __
 00017d20  __ cf 68 25 __ __ __ __  __ cf 68 25 __ __ __ __  |  cf 68 25 __ __ __ __ __  cf 68 25 __ __ __ __ __
 00017d30  __ cf 68 25 __ __ __ __  __ cf 68 25 __ __ __ __  |  cf 68 25 __ __ __ __ __  cf 68 25 __ __ __ __ __
 00017d40  __ cf 68 25 __ __ __ __  __ cf 68 25 __ __ __ __  |  cf 68 25 __ __ __ __ __  cf 68 25 __ __ __ __ __
 00017d50  __ cf 68 25 __ __ __ __  __ cf 68 25 __ __ __ __  |  cf 68 25 __ __ __ __ __  cf 68 25 __ __ __ __ __
 00017d60  __ cf 68 25 __ __ __ __  __ cf 68 25 __ __ __ __  |  cf 68 25 __ __ __ __ __  cf 68 25 __ __ __ __ __
 00017d70  __ cf 68 25 __ __ __ __  __ cf 68 25 __ __ __ __  |  cf 68 25 __ __ __ __ __  cf 68 25 __ __ __ __ __
 00017d80  __ cf 68 25 __ __ __ __  __ 6b 69 25 __ __ __ __  |  cf 68 25 __ __ __ __ __  6b 69 25 __ __ __ __ __
 00017d90  __ 6b 69 25 __ __ __ __  __ 6b 69 25 __ __ __ __  |  6b 69 25 __ __ __ __ __  6b 69 25 __ __ __ __ __
 00017da0  __ 6b 69 25 __ __ __ __  __ 6b 69 25 __ __ __ __  |  6b 69 25 __ __ __ __ __  6b 69 25 __ __ __ __ __
 00017db0  __ 6b 69 25 __ __ __ __  __ 6b 69 25 __ __ __ __  |  6b 69 25 __ __ __ __ __  6b 69 25 __ __ __ __ __
 00017dc0  __ 6b 69 25 __ __ __ __  __ 7e 66 25 __ __ __ __  |  6b 69 25 __ __ __ __ __  7e 66 25 __ __ __ __ __
 00017dd0  __ 7e 66 25 __ __ __ __  __ 7e 66 25 __ __ __ __  |  7e 66 25 __ __ __ __ __  7e 66 25 __ __ __ __ __
 00017de0  __ 7e 66 25 __ __ __ __  __ 7e 66 25 __ __ __ __  |  7e 66 25 __ __ __ __ __  7e 66 25 __ __ __ __ __
 00017df0  __ 7e 66 25 __ __ __ __  __ 7e 66 25 __ __ __ __  |  7e 66 25 __ __ __ __ __  7e 66 25 __ __ __ __ __
 00017e00  __ 7e 66 25 __ __ __ __  __ 7e 66 25 __ __ __ __  |  7e 66 25 __ __ __ __ __  7e 66 25 __ __ __ __ __
 00017e10  __ 7e 66 25 __ __ __ __  __ 7e 66 25 __ __ __ __  |  7e 66 25 __ __ __ __ __  7e 66 25 __ __ __ __ __
 00017e20  __ 7e 66 25 __ __ __ __  __ 7e 66 25 __ __ __ __  |  7e 66 25 __ __ __ __ __  7e 66 25 __ __ __ __ __
 00017e30  __ a9 66 25 __ __ __ __  __ a9 66 25 __ __ __ __  |  a9 66 25 __ __ __ __ __  a9 66 25 __ __ __ __ __
 00017e40  __ a9 66 25 __ __ __ __  __ a9 66 25 __ __ __ __  |  a9 66 25 __ __ __ __ __  a9 66 25 __ __ __ __ __
 00017e50  __ a9 66 25 __ __ __ __  __ a9 66 25 __ __ __ __  |  a9 66 25 __ __ __ __ __  a9 66 25 __ __ __ __ __
 00017e60  __ a9 66 25 __ __ __ __  __ a9 66 25 __ __ __ __  |  a9 66 25 __ __ __ __ __  a9 66 25 __ __ __ __ __
 00017e70  __ a9 66 25 __ __ __ __  __ fb 67 25 __ __ __ __  |  a9 66 25 __ __ __ __ __  fb 67 25 __ __ __ __ __
 00017e80  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __  |  fb 67 25 __ __ __ __ __  fb 67 25 __ __ __ __ __
 00017e90  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __  |  fb 67 25 __ __ __ __ __  fb 67 25 __ __ __ __ __
 00017ea0  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __  |  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __
 00017eb0  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __  |  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __
 00017ec0  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __  |  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __
 00017ed0  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __  |  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __
 00017ee0  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __  |  __ fb 67 25 __ __ __ __  __ fb 67 25 __ __ __ __
 00017ef0  __ fb 67 25 __ __ __ __  __ 5e 6b 25 __ __ __ __  |  __ fb 67 25 __ __ __ __  __ 5e 6b 25 __ __ __ __
++--23429 lines: 00017f00  00 5e 6b 25 00 00 00 00  00 5e ...|+ +--23429 lines: 00017f00  00 5e 6b 25 00 00 00...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
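&lt;p&gt;The displacement can also be checked mechanically. The following stand-alone sketch (not code from the investigation) tests whether a changed buffer looks like the original with one byte removed at the start of the differing window and re-inserted at its end:&lt;/p&gt;

```java
// Detect the "one byte shifted" corruption pattern: find the first and last
// positions where the buffers differ, then check that inside that window the
// changed copy equals the original advanced by one byte, with the displaced
// byte reappearing at the end of the window.
public class ShiftDetector {
    static boolean isOneByteShift(byte[] original, byte[] changed) {
        int n = original.length;
        if (changed.length != n) {
            return false;
        }
        // first mismatching index
        int first = 0;
        while (first != n && original[first] == changed[first]) {
            first++;
        }
        if (first == n) {
            return false; // buffers are identical
        }
        // last mismatching index (scanning from the end)
        int last = n - 1;
        while (original[last] == changed[last]) {
            last--;
        }
        // inside the window, changed bytes match the original shifted by one
        for (int i = first; i != last; i++) {
            if (changed[i] != original[i + 1]) {
                return false;
            }
        }
        // the displaced byte lands at the end of the window
        return changed[last] == original[first];
    }
}
```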

&lt;h1 id=&quot;outside-of-presto&quot;&gt;Outside of Presto&lt;/h1&gt;

&lt;p&gt;We captured a cluster of 10 nodes manifesting the problem and held on to it for further investigation.
Our testing showed that TPC-DS query 72 was significantly more likely to fail than other queries.
On the isolated cluster, a loop running TPC-DS query 72 would reproduce a failure within 2 hours.
We added information to the exception reporting the checksum failure, to identify on which
node the failure happened and which node sent the data. All the failures on the isolated
10-node cluster happened with one worker node (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;10.83.28.124&lt;/code&gt;, the Receiver) reading data
from one particular other worker node (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;10.142.0.84&lt;/code&gt;, the Sender). We stopped all other workers and attempted to
reproduce the problem outside of Presto.&lt;/p&gt;

&lt;p&gt;One of the things we tried was checking the network reliability with netcat.
On the Sender node, we ran the following:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;dd if=/dev/urandom of=/tmp/small-data bs=$[1024*1024] count=1
ncat -l 20165 --keep-open --max-conns 100 --sh-exec &quot;cat /tmp/small-data&quot; -v
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On the Receiver node, we ran the following in a loop:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ncat --recv-only 10.142.0.84 20165 &amp;gt; &quot;/tmp/received&quot;
sha1sum &quot;/tmp/received&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Running this loop for just a few dozen seconds resulted in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/tmp/received&lt;/code&gt; differing
from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/tmp/small-data&lt;/code&gt;. Sometimes &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/tmp/received&lt;/code&gt; would be “just” a prefix of the original data,
and sometimes there would be data displacements within the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/tmp/received&lt;/code&gt; file. We cross-checked these
observations on a different pair of nodes and also on a different public cloud, using the same netcat version.
We observed the same behavior everywhere we checked, with a varying but high error rate, over 1%. This high
error rate is what led us to discard this evidence – either there was something wrong with the way we
used netcat, we violated netcat’s assumptions, or netcat was not the right tool for this task.&lt;/p&gt;

&lt;p&gt;We searched for other tools that we could use. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iperf&lt;/code&gt; is a well-known tool for stressing the network.
Sadly, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iperf&lt;/code&gt; &lt;a href=&quot;https://github.com/esnet/iperf/issues/157&quot;&gt;does not yet have the ability to verify the integrity of exchanged data&lt;/a&gt;.
We deployed a &lt;a href=&quot;https://github.com/findepi/netsum&quot;&gt;home-made, Java-based tool&lt;/a&gt; instead. Using this tool,
we were able to reproduce the data corruption problem between the Sender and Receiver nodes. The error rate
was very low; to reproduce the problem, we had to saturate the network and use multiple concurrent TCP connections
(which is very similar to how Presto uses the network). This validated our
observation that the data corruption problem was happening outside of Presto. Interestingly, we were unable
to reproduce the problem when stressing the network with a single TCP connection.&lt;/p&gt;
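&lt;p&gt;The core idea behind such a tool is easy to sketch: both sides derive the same pseudo-random byte stream from a shared seed, so the receiver can verify what it read without the sender transmitting a reference copy. The sketch below is a single-connection illustration over loopback, not the actual tool, which stressed the network with many concurrent connections:&lt;/p&gt;

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.util.Random;
import java.util.zip.CRC32;

// Illustrative seed-based network verification: sender streams bytes derived
// from a seed; receiver checksums what it actually read, and that checksum is
// compared against the checksum of what should have been sent.
public class NetVerify {
    // Checksum of the byte stream a given seed produces.
    static long streamChecksum(long seed, int length) {
        Random random = new Random(seed);
        byte[] data = new byte[length];
        random.nextBytes(data);
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Send 'length' seeded pseudo-random bytes over the socket.
    static void send(Socket socket, long seed, int length) throws Exception {
        Random random = new Random(seed);
        byte[] data = new byte[length];
        random.nextBytes(data);
        OutputStream out = socket.getOutputStream();
        out.write(data);
        out.flush();
    }

    // Read everything from the socket until EOF and return its checksum.
    static long receive(Socket socket) throws Exception {
        CRC32 crc = new CRC32();
        InputStream in = socket.getInputStream();
        byte[] buffer = new byte[8192];
        int read = in.read(buffer);
        while (read != -1) {
            crc.update(buffer, 0, read);
            read = in.read(buffer);
        }
        return crc.getValue();
    }
}
```

&lt;p&gt;Any mismatch between the two checksums means the bytes changed in transit, without either side having to log or retransmit the data itself.&lt;/p&gt;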

&lt;h1 id=&quot;mystery-unsolved&quot;&gt;Mystery unsolved&lt;/h1&gt;

&lt;p&gt;Obviously, with such strong evidence gathered, we opened a support ticket with AWS.
The support team was great and did a lot of investigation on their own. Unfortunately, the problem went
away before the support team was able to get to the bottom of it. It was April already.
Perhaps one day someone will find the smoking gun and write the rest of this story.&lt;/p&gt;

&lt;h1 id=&quot;conclusions&quot;&gt;Conclusions&lt;/h1&gt;

&lt;p&gt;We implemented a data integrity protection measure in Presto, using &lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso’s&lt;/a&gt;
Java implementation of the &lt;a href=&quot;https://github.com/Cyan4973/xxHash&quot;&gt;XXHash64&lt;/a&gt; algorithm. Thanks to its
speed, we could enable it by default, with negligible impact on overall query performance.
By default, a data integrity violation results in query failure, but Presto can also be configured to retry,
by setting the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;exchange.data-integrity-verification&lt;/code&gt; configuration property.&lt;/p&gt;

&lt;p&gt;This chapter of Presto history should have remained closed, letting us forget about all this.
However, a couple of days ago, a customer running Presto on Azure Kubernetes Service (AKS) reported an exception like
the one below. The next day, we bumped into it as well, while running &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE TABLE AS SELECT&lt;/code&gt;
to prepare a new benchmark dataset on Azure Storage.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Query failed (#20200622_124803_00000_abcde): Checksum verification failure on 10.12.3.47
    when reading from http://10.12.3.53:8080/v1/task/20200622_124803_00000_abcde.2.6/results/5/8:
    Data corruption, read checksum: 0xe17e6eaeb665dc6e, calculated checksum: 0xb3540697373195f1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It is no fun when a query fails like this. However, what a joy and a source of pride that it did not silently
return incorrect query results. Rest assured, Presto will not return incorrect results, wherever you
run it.&lt;/p&gt;

&lt;h1 id=&quot;credits&quot;&gt;Credits&lt;/h1&gt;

&lt;p&gt;Special thanks go to our customers, for your understanding and the trust you have in us.
Without you, Starburst wouldn’t be as fun a place as it is!
Thanks to &lt;a href=&quot;https://github.com/lukasz-walkiewicz&quot;&gt;Łukasz Walkiewicz&lt;/a&gt; and &lt;a href=&quot;https://github.com/sopel39&quot;&gt;Karol Sobczak&lt;/a&gt;
for the fantastic benchmark and experimentation automation and for your help with running the experiments!
Thanks to &lt;a href=&quot;https://github.com/willmostly&quot;&gt;Will Morrison&lt;/a&gt; for finding the Sender and Receiver machines
that reproduced the problem so nicely!
Thanks to &lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso&lt;/a&gt;, &lt;a href=&quot;https://github.com/dain&quot;&gt;Dain Sundstrom&lt;/a&gt;
and &lt;a href=&quot;https://github.com/electrum&quot;&gt;David Phillips&lt;/a&gt; for guidance, ideas, clever tips and code pointers!
Thanks to &lt;a href=&quot;https://github.com/losipiuk&quot;&gt;Łukasz Osipiuk&lt;/a&gt; for running experiments, cross-checking
the results and helping us keep our sanity. Shout out to the whole Starburst team – it was truly a team effort!&lt;/p&gt;

&lt;p&gt;□&lt;/p&gt;</content>

      
        <author>
          <name>Piotr Findeisen, Starburst Data</name>
        </author>
      

      <summary>It all started on a Thursday afternoon in March, when Karol Sobczak was grilling Presto with heavy rounds of benchmarks, as we were ramping up for the Starburst Enterprise Presto 332-e release. Karol discovered what seemed to be a serious regression, and it turned out to be an even more serious cloud environment issue.</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto at Zuora</title>
      <link href="https://trino.io/blog/2020/06/16/presto-summit-zuora.html" rel="alternate" type="text/html" title="Presto at Zuora" />
      <published>2020-06-16T00:00:00+00:00</published>
      <updated>2020-06-16T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/06/16/presto-summit-zuora</id>
      <content type="html" xml:base="https://trino.io/blog/2020/06/16/presto-summit-zuora.html">&lt;p&gt;The Presto Summit is morphing into a series of virtual events, and we already
started with the &lt;a href=&quot;/blog/2020/05/15/state-of-presto.html&quot;&gt;State of Presto webinar&lt;/a&gt; recently. Next up is a talk about Presto with
lots of practical insights at &lt;a href=&quot;https://zuora.com/&quot;&gt;Zuora&lt;/a&gt; presented by Henning
Schmiedehausen:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Using Presto as Query Layer in a Distributed Microservices Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Update:&lt;/p&gt;

&lt;p&gt;We had a great event with lots of questions from the audience, taking us beyond
the planned time frame. Check out the recording to learn more:&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/ICAPZksjP0k&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;Presto has found its place as a SQL-based query engine for big data in the new
stack, but it does not have to be limited to big data and large scale analytics
applications.&lt;/p&gt;

&lt;p&gt;In this presentation, Henning highlights how Presto helped Zuora transform
its monolithic data architecture for an online transactional system into a
loosely coupled, services-based architecture. In doing so, it helped to solve the
most pressing problem when splitting up data: providing direct access to
production data across many services and enabling complex data queries across
live data. Zuora Data Query was an instant success when it was launched.&lt;/p&gt;

&lt;p&gt;In this webinar you will discover:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The technical architecture that embedded Presto in the Zuora service stack&lt;/li&gt;
  &lt;li&gt;The pieces of Presto that could be used directly off the shelf&lt;/li&gt;
  &lt;li&gt;How we productized it into a system that now serves huge numbers of small
queries against live data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our speaker, Henning Schmiedehausen, Chief Architect at Zuora, is a thought
leader in the open source Java community with more than 25 years of experience
contributing to successful open source projects. At Zuora he serves as the chief
architect and is responsible for the technical aspects of transforming the Zuora
system to a new, scalable, and flexible Microservices Architecture. Prior to
Zuora he worked at Facebook and Groupon as a principal engineer. Henning also
served as a board member at the Apache Software Foundation.&lt;/p&gt;

&lt;p&gt;Date: Tuesday, 30 June 2020&lt;/p&gt;

&lt;p&gt;Time: 10am PDT (San Francisco), 1pm EDT (New York), 6pm BST (London), 5pm UTC&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;register-now&quot;&gt;&lt;a href=&quot;https://bit.ly/2YfPNne&quot;&gt;Register now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;We look forward to many Presto users joining us.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>The Presto Summit is morphing into a series of virtual events, and we already started with the State of Presto webinar recently. Next up is a talk about Presto with lots of practical insights at Zuora presented by Henning Schmiedehausen: Using Presto as Query Layer in a Distributed Microservices Architecture Update: We had a great event with lots of questions from the audience, taking us beyond the planned time frame. Check out the recording to learn more:</summary>

      
      
    </entry>
  
    <entry>
      <title>Dynamic partition pruning</title>
      <link href="https://trino.io/blog/2020/06/14/dynamic-partition-pruning.html" rel="alternate" type="text/html" title="Dynamic partition pruning" />
      <published>2020-06-14T00:00:00+00:00</published>
      <updated>2020-06-14T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/06/14/dynamic-partition-pruning</id>
      <content type="html" xml:base="https://trino.io/blog/2020/06/14/dynamic-partition-pruning.html">&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Star_schema&quot;&gt;Star-schema&lt;/a&gt; is one of the most widely used data mart patterns. 
The star schema consists of fact tables (usually partitioned) and dimension tables, 
which are used to filter rows from fact tables.
Consider the following query which captures a common pattern of a fact table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;store_sales&lt;/code&gt; partitioned by the column 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ss_sold_date_sk&lt;/code&gt; joined with a filtered dimension table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date_dim&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT COUNT(*) FROM 
store_sales JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk
WHERE d_following_holiday=&apos;Y&apos; AND d_year = 2000;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Without dynamic filtering, Presto pushes predicates for the dimension table down to the table scan on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date_dim&lt;/code&gt;, but 
it scans all the data in the fact table, since there are no filters on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;store_sales&lt;/code&gt; in the query.
The join operator ends up throwing away most of the probe-side rows, as the join criterion is highly selective. 
The current implementation of &lt;a href=&quot;https://trino.io/blog/2019/06/30/dynamic-filtering.html&quot;&gt;dynamic filtering&lt;/a&gt; improves
on this; however, it is limited to broadcast joins on tables stored in ORC or Parquet format. 
Additionally, it does not take advantage of the layout of partitioned Hive tables.&lt;/p&gt;

&lt;p&gt;With dynamic partition pruning, which extends the current implementation of dynamic filtering, every worker node collects 
values eligible for the join from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date_dim.d_date_sk&lt;/code&gt; column and passes them to the coordinator. 
The coordinator can then skip processing the partitions of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;store_sales&lt;/code&gt; which don’t meet the join criterion. 
This greatly reduces the amount of data scanned from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;store_sales&lt;/code&gt; table by worker nodes. 
This optimization is applicable to any storage format and to both broadcast and partitioned joins.&lt;/p&gt;

&lt;!--more--&gt;
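&lt;p&gt;To make the effect concrete: assuming &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;d_date_sk&lt;/code&gt; is unique in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date_dim&lt;/code&gt; (as it is in TPC-DS), dynamic partition pruning makes the example query behave roughly as if it had been written with an explicit filter on the partition key:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT COUNT(*) FROM store_sales
WHERE ss_sold_date_sk IN (
    SELECT d_date_sk FROM date_dim
    WHERE d_following_holiday=&apos;Y&apos; AND d_year = 2000);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The difference is that the set of matching &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;d_date_sk&lt;/code&gt; values is discovered at runtime, so only the corresponding partitions of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;store_sales&lt;/code&gt; are scanned.&lt;/p&gt;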

&lt;h1 id=&quot;design-considerations&quot;&gt;Design considerations&lt;/h1&gt;

&lt;p&gt;This optimization requires dynamic filters collected by worker nodes to be communicated to the coordinator over the network.
We needed to ensure that this additional communication overhead does not overload the coordinator.
This was achieved by packing dynamic filters into Presto’s existing framework for sending status updates from worker to coordinator.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/server/DynamicFilterService.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DynamicFilterService&lt;/code&gt;&lt;/a&gt; 
was added on the coordinator node to perform dynamic filter collection asynchronously.
Queries registered with this service can request dynamic filters while scheduling splits without blocking any operations.
This service is also responsible for ensuring that all the build-side tasks of a join stage have completed execution before 
constructing dynamic filters to be used in the scheduling of probe-side table scans by the coordinator.&lt;/p&gt;

&lt;h1 id=&quot;implementation&quot;&gt;Implementation&lt;/h1&gt;

&lt;p&gt;For identifying opportunities for dynamic filtering in the logical plan, we rely on the implementation added in
&lt;a href=&quot;https://github.com/trinodb/trino/pull/91&quot;&gt;#91&lt;/a&gt;. Dynamic filters are modeled as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FunctionCall&lt;/code&gt; expressions which 
evaluate to a boolean value. They are created in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PredicatePushDown&lt;/code&gt; optimizer rule from the equi-join clauses of inner join 
nodes and pushed down in the plan along with other predicates. Dynamic filters are added to the plan after the cost-based 
optimization rules. This ensures that dynamic filters do not interfere with cost estimation and join reordering.
The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PredicatePushDown&lt;/code&gt; rule can end up pushing dynamic filters to unsupported places in the plan through predicate inference. 
This was solved by adding the 
&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/RemoveUnsupportedDynamicFilters.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RemoveUnsupportedDynamicFilters&lt;/code&gt;&lt;/a&gt;
optimizer rule, which ensures that:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Dynamic filters are present only directly above a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TableScan&lt;/code&gt; node and only if the subtree is on the probe side of some downstream &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JoinNode&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Dynamic filters are removed from a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JoinNode&lt;/code&gt; if there are no consumers for them in its probe-side subtree.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We also run &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/sql/planner/sanity/DynamicFiltersChecker.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DynamicFiltersChecker&lt;/code&gt;&lt;/a&gt;
at the end of the planning phase to ensure that the above conditions have been satisfied by the optimized plan.&lt;/p&gt;

&lt;p&gt;We reuse the existing &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/operator/DynamicFilterSourceOperator.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DynamicFilterSourceOperator&lt;/code&gt;&lt;/a&gt;
in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LocalExecutionPlanner&lt;/code&gt; to collect build-side values from each inner join on each worker node. In addition to passing the collected &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TupleDomain&lt;/code&gt;
to &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/sql/planner/LocalDynamicFiltersCollector.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LocalDynamicFiltersCollector&lt;/code&gt;&lt;/a&gt; 
within the same worker node for use in broadcast join probe-side scans, we also pass them to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TaskContext&lt;/code&gt; to populate task 
status updates for the coordinator.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ContinuousTaskStatusFetcher&lt;/code&gt; on the coordinator node pulls task status updates from all worker nodes at intervals of up to
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;task.status-refresh-max-wait&lt;/code&gt; (default is 1 second), or sooner if the task status changes. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DynamicFilterService&lt;/code&gt; 
on the coordinator regularly polls for dynamic filters from task status updates through &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SqlQueryExecution&lt;/code&gt; and provides
an interface to supply dynamic filters when they are ready. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ConnectorSplitManager#getSplits&lt;/code&gt; API has been updated to
optionally utilize dynamic filters supplied by the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DynamicFilterService&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In the Hive connector, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BackgroundHiveSplitLoader&lt;/code&gt; can apply dynamic filtering by either completely skipping the listing
of files within a partition, or, thanks to the lazy enumeration of splits, by avoiding the creation of splits within an already loaded partition 
when the dynamic filters become available in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;InternalHiveSplitFactory#createInternalHiveSplit&lt;/code&gt;.&lt;/p&gt;

&lt;h1 id=&quot;benchmarks&quot;&gt;Benchmarks&lt;/h1&gt;

&lt;p&gt;We ran TPC-DS queries on a cluster of 5 r4.8xlarge worker nodes, using data stored in ORC format.
The TPC-DS tables were partitioned as follows:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;catalog_returns&lt;/code&gt; on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cr_returned_date_sk&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;catalog_sales&lt;/code&gt; on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cs_sold_date_sk&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;store_returns&lt;/code&gt; on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sr_returned_date_sk&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;store_sales&lt;/code&gt; on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ss_sold_date_sk&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;web_returns&lt;/code&gt; on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;wr_returned_date_sk&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;web_sales&lt;/code&gt; on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ws_sold_date_sk&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The table definitions are available in &lt;a href=&quot;https://github.com/hdinsight/tpcds-hdinsight/blob/master/ddl/createAllORCTables.hql&quot;&gt;createAllORCTables.hql&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The following queries ran more than 20% faster with dynamic partition pruning. Elapsed time is measured in seconds,
CPU time in minutes, and data read in MB.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Query&lt;/th&gt;
      &lt;th&gt;Baseline elapsed&lt;/th&gt;
      &lt;th&gt;Dynamic partition pruning elapsed&lt;/th&gt;
      &lt;th&gt;Baseline CPU&lt;/th&gt;
      &lt;th&gt;Dynamic partition pruning CPU&lt;/th&gt;
      &lt;th&gt;Baseline data read&lt;/th&gt;
      &lt;th&gt;Dynamic partition pruning data read&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;q01&lt;/td&gt;
      &lt;td&gt;10.96&lt;/td&gt;
      &lt;td&gt;8.50&lt;/td&gt;
      &lt;td&gt;10.2&lt;/td&gt;
      &lt;td&gt;8.9&lt;/td&gt;
      &lt;td&gt;17.91&lt;/td&gt;
      &lt;td&gt;14.53&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q04&lt;/td&gt;
      &lt;td&gt;21.63&lt;/td&gt;
      &lt;td&gt;10.80&lt;/td&gt;
      &lt;td&gt;23.6&lt;/td&gt;
      &lt;td&gt;16.1&lt;/td&gt;
      &lt;td&gt;34.81&lt;/td&gt;
      &lt;td&gt;12.99&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q05&lt;/td&gt;
      &lt;td&gt;41.38&lt;/td&gt;
      &lt;td&gt;14.94&lt;/td&gt;
      &lt;td&gt;57.1&lt;/td&gt;
      &lt;td&gt;16.8&lt;/td&gt;
      &lt;td&gt;54.81&lt;/td&gt;
      &lt;td&gt;11.45&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q07&lt;/td&gt;
      &lt;td&gt;12.35&lt;/td&gt;
      &lt;td&gt;9.26&lt;/td&gt;
      &lt;td&gt;26.4&lt;/td&gt;
      &lt;td&gt;14.6&lt;/td&gt;
      &lt;td&gt;30.28&lt;/td&gt;
      &lt;td&gt;17.31&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q08&lt;/td&gt;
      &lt;td&gt;10.48&lt;/td&gt;
      &lt;td&gt;6.43&lt;/td&gt;
      &lt;td&gt;11.0&lt;/td&gt;
      &lt;td&gt;4.7&lt;/td&gt;
      &lt;td&gt;10.19&lt;/td&gt;
      &lt;td&gt;3.52&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q11&lt;/td&gt;
      &lt;td&gt;20.04&lt;/td&gt;
      &lt;td&gt;14.82&lt;/td&gt;
      &lt;td&gt;35.6&lt;/td&gt;
      &lt;td&gt;27.8&lt;/td&gt;
      &lt;td&gt;25.37&lt;/td&gt;
      &lt;td&gt;9.72&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q17&lt;/td&gt;
      &lt;td&gt;24.05&lt;/td&gt;
      &lt;td&gt;9.87&lt;/td&gt;
      &lt;td&gt;26.4&lt;/td&gt;
      &lt;td&gt;12.0&lt;/td&gt;
      &lt;td&gt;30.18&lt;/td&gt;
      &lt;td&gt;9.75&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q18&lt;/td&gt;
      &lt;td&gt;13.98&lt;/td&gt;
      &lt;td&gt;6.00&lt;/td&gt;
      &lt;td&gt;17.5&lt;/td&gt;
      &lt;td&gt;7.7&lt;/td&gt;
      &lt;td&gt;20.29&lt;/td&gt;
      &lt;td&gt;8.81&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q25&lt;/td&gt;
      &lt;td&gt;18.91&lt;/td&gt;
      &lt;td&gt;8.04&lt;/td&gt;
      &lt;td&gt;26.9&lt;/td&gt;
      &lt;td&gt;9.1&lt;/td&gt;
      &lt;td&gt;37.54&lt;/td&gt;
      &lt;td&gt;11.12&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q27&lt;/td&gt;
      &lt;td&gt;11.98&lt;/td&gt;
      &lt;td&gt;5.58&lt;/td&gt;
      &lt;td&gt;25.1&lt;/td&gt;
      &lt;td&gt;8.6&lt;/td&gt;
      &lt;td&gt;26.69&lt;/td&gt;
      &lt;td&gt;10.12&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q29&lt;/td&gt;
      &lt;td&gt;24.11&lt;/td&gt;
      &lt;td&gt;15.46&lt;/td&gt;
      &lt;td&gt;30.5&lt;/td&gt;
      &lt;td&gt;18.5&lt;/td&gt;
      &lt;td&gt;30.18&lt;/td&gt;
      &lt;td&gt;13.50&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q31&lt;/td&gt;
      &lt;td&gt;27.81&lt;/td&gt;
      &lt;td&gt;12.77&lt;/td&gt;
      &lt;td&gt;48.2&lt;/td&gt;
      &lt;td&gt;21.3&lt;/td&gt;
      &lt;td&gt;39.53&lt;/td&gt;
      &lt;td&gt;13.73&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q32&lt;/td&gt;
      &lt;td&gt;11.51&lt;/td&gt;
      &lt;td&gt;8.15&lt;/td&gt;
      &lt;td&gt;12.7&lt;/td&gt;
      &lt;td&gt;10.3&lt;/td&gt;
      &lt;td&gt;15.05&lt;/td&gt;
      &lt;td&gt;12.76&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q33&lt;/td&gt;
      &lt;td&gt;15.95&lt;/td&gt;
      &lt;td&gt;4.31&lt;/td&gt;
      &lt;td&gt;24.3&lt;/td&gt;
      &lt;td&gt;5.4&lt;/td&gt;
      &lt;td&gt;31.26&lt;/td&gt;
      &lt;td&gt;6.67&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q35&lt;/td&gt;
      &lt;td&gt;15.10&lt;/td&gt;
      &lt;td&gt;5.22&lt;/td&gt;
      &lt;td&gt;13.8&lt;/td&gt;
      &lt;td&gt;6.2&lt;/td&gt;
      &lt;td&gt;4.83&lt;/td&gt;
      &lt;td&gt;1.70&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q36&lt;/td&gt;
      &lt;td&gt;11.68&lt;/td&gt;
      &lt;td&gt;6.43&lt;/td&gt;
      &lt;td&gt;22.4&lt;/td&gt;
      &lt;td&gt;11.4&lt;/td&gt;
      &lt;td&gt;24.28&lt;/td&gt;
      &lt;td&gt;12.78&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q38&lt;/td&gt;
      &lt;td&gt;21.08&lt;/td&gt;
      &lt;td&gt;16.20&lt;/td&gt;
      &lt;td&gt;39.4&lt;/td&gt;
      &lt;td&gt;31.6&lt;/td&gt;
      &lt;td&gt;5.65&lt;/td&gt;
      &lt;td&gt;3.15&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q40&lt;/td&gt;
      &lt;td&gt;37.40&lt;/td&gt;
      &lt;td&gt;11.98&lt;/td&gt;
      &lt;td&gt;37.7&lt;/td&gt;
      &lt;td&gt;8.4&lt;/td&gt;
      &lt;td&gt;17.02&lt;/td&gt;
      &lt;td&gt;9.20&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q46&lt;/td&gt;
      &lt;td&gt;11.57&lt;/td&gt;
      &lt;td&gt;9.06&lt;/td&gt;
      &lt;td&gt;24.4&lt;/td&gt;
      &lt;td&gt;17.3&lt;/td&gt;
      &lt;td&gt;18.51&lt;/td&gt;
      &lt;td&gt;14.19&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q48&lt;/td&gt;
      &lt;td&gt;20.48&lt;/td&gt;
      &lt;td&gt;12.65&lt;/td&gt;
      &lt;td&gt;42.3&lt;/td&gt;
      &lt;td&gt;22.5&lt;/td&gt;
      &lt;td&gt;20.71&lt;/td&gt;
      &lt;td&gt;11.54&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q49&lt;/td&gt;
      &lt;td&gt;26.69&lt;/td&gt;
      &lt;td&gt;16.01&lt;/td&gt;
      &lt;td&gt;38.8&lt;/td&gt;
      &lt;td&gt;12.0&lt;/td&gt;
      &lt;td&gt;68.67&lt;/td&gt;
      &lt;td&gt;30.57&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q50&lt;/td&gt;
      &lt;td&gt;46.90&lt;/td&gt;
      &lt;td&gt;33.22&lt;/td&gt;
      &lt;td&gt;43.4&lt;/td&gt;
      &lt;td&gt;42.5&lt;/td&gt;
      &lt;td&gt;21.30&lt;/td&gt;
      &lt;td&gt;16.77&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q54&lt;/td&gt;
      &lt;td&gt;43.05&lt;/td&gt;
      &lt;td&gt;11.39&lt;/td&gt;
      &lt;td&gt;27.5&lt;/td&gt;
      &lt;td&gt;14.8&lt;/td&gt;
      &lt;td&gt;17.71&lt;/td&gt;
      &lt;td&gt;11.52&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q56&lt;/td&gt;
      &lt;td&gt;16.23&lt;/td&gt;
      &lt;td&gt;4.12&lt;/td&gt;
      &lt;td&gt;23.8&lt;/td&gt;
      &lt;td&gt;5.5&lt;/td&gt;
      &lt;td&gt;31.26&lt;/td&gt;
      &lt;td&gt;6.72&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q60&lt;/td&gt;
      &lt;td&gt;16.39&lt;/td&gt;
      &lt;td&gt;6.02&lt;/td&gt;
      &lt;td&gt;25.1&lt;/td&gt;
      &lt;td&gt;6.6&lt;/td&gt;
      &lt;td&gt;31.26&lt;/td&gt;
      &lt;td&gt;7.42&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q61&lt;/td&gt;
      &lt;td&gt;17.18&lt;/td&gt;
      &lt;td&gt;5.50&lt;/td&gt;
      &lt;td&gt;33.4&lt;/td&gt;
      &lt;td&gt;7.1&lt;/td&gt;
      &lt;td&gt;42.63&lt;/td&gt;
      &lt;td&gt;9.37&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q66&lt;/td&gt;
      &lt;td&gt;13.67&lt;/td&gt;
      &lt;td&gt;6.59&lt;/td&gt;
      &lt;td&gt;19.1&lt;/td&gt;
      &lt;td&gt;8.9&lt;/td&gt;
      &lt;td&gt;19.63&lt;/td&gt;
      &lt;td&gt;8.34&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q69&lt;/td&gt;
      &lt;td&gt;9.89&lt;/td&gt;
      &lt;td&gt;7.46&lt;/td&gt;
      &lt;td&gt;10.5&lt;/td&gt;
      &lt;td&gt;6.1&lt;/td&gt;
      &lt;td&gt;4.83&lt;/td&gt;
      &lt;td&gt;3.16&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q71&lt;/td&gt;
      &lt;td&gt;17.32&lt;/td&gt;
      &lt;td&gt;6.11&lt;/td&gt;
      &lt;td&gt;23.3&lt;/td&gt;
      &lt;td&gt;6.6&lt;/td&gt;
      &lt;td&gt;31.26&lt;/td&gt;
      &lt;td&gt;8.06&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q74&lt;/td&gt;
      &lt;td&gt;16.86&lt;/td&gt;
      &lt;td&gt;9.44&lt;/td&gt;
      &lt;td&gt;24.1&lt;/td&gt;
      &lt;td&gt;17.6&lt;/td&gt;
      &lt;td&gt;22.59&lt;/td&gt;
      &lt;td&gt;8.08&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q75&lt;/td&gt;
      &lt;td&gt;122.04&lt;/td&gt;
      &lt;td&gt;69.45&lt;/td&gt;
      &lt;td&gt;102.7&lt;/td&gt;
      &lt;td&gt;62.9&lt;/td&gt;
      &lt;td&gt;110.86&lt;/td&gt;
      &lt;td&gt;63.91&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q77&lt;/td&gt;
      &lt;td&gt;23.94&lt;/td&gt;
      &lt;td&gt;7.51&lt;/td&gt;
      &lt;td&gt;29.3&lt;/td&gt;
      &lt;td&gt;6.8&lt;/td&gt;
      &lt;td&gt;49.95&lt;/td&gt;
      &lt;td&gt;12.20&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q80&lt;/td&gt;
      &lt;td&gt;43.46&lt;/td&gt;
      &lt;td&gt;18.57&lt;/td&gt;
      &lt;td&gt;45.8&lt;/td&gt;
      &lt;td&gt;11.5&lt;/td&gt;
      &lt;td&gt;37.25&lt;/td&gt;
      &lt;td&gt;11.78&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;q85&lt;/td&gt;
      &lt;td&gt;20.97&lt;/td&gt;
      &lt;td&gt;16.54&lt;/td&gt;
      &lt;td&gt;16.9&lt;/td&gt;
      &lt;td&gt;14.7&lt;/td&gt;
      &lt;td&gt;14.65&lt;/td&gt;
      &lt;td&gt;10.52&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/dynamic-partition-pruning/benchmark.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;18 TPC-DS queries improved runtime by over 50% while decreasing CPU usage by an average of 64%.
Data read was decreased by 66%.&lt;/li&gt;
  &lt;li&gt;7 TPC-DS queries improved by 30% to 50% while decreasing CPU usage by an average of 47%.
Data read was decreased by 54%.&lt;/li&gt;
  &lt;li&gt;29 TPC-DS queries improved by 10% to 30% while decreasing CPU by an average of 20%.
Data read was decreased by 27%.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that the baseline here includes the improvements from the existing 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/1686&quot;&gt;node local dynamic filtering&lt;/a&gt; implementation.&lt;/p&gt;

&lt;h1 id=&quot;discussion&quot;&gt;Discussion&lt;/h1&gt;

&lt;p&gt;In order for dynamic filtering to work, the smaller dimension table needs to be chosen as a join’s build side.
The cost-based optimizer can do this automatically using table statistics from the metastore.
Therefore, we generated table statistics prior to running this benchmark and relied on the CBO to correctly place
the smaller table on the build side of the join.&lt;/p&gt;
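&lt;p&gt;Table statistics can be collected with Presto’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ANALYZE&lt;/code&gt; statement, for example (the catalog and schema names here are illustrative):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ANALYZE hive.tpcds.store_sales;
ANALYZE hive.tpcds.date_dim;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;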

&lt;p&gt;It is quite common for large fact tables to be partitioned by dimensions like time.
Queries joining such tables with filtered dimension tables benefit significantly from dynamic partition pruning. 
This optimization is applicable to partitioned Hive tables stored in any data format.
It also works with both broadcast and partitioned joins. Other connectors can easily take advantage of dynamic filters 
by implementing the new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ConnectorSplitManager#getSplits&lt;/code&gt; API which supplies dynamic filters to the connector.&lt;/p&gt;

&lt;h1 id=&quot;future-work&quot;&gt;Future work&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;Support for using &lt;a href=&quot;https://github.com/trinodb/trino/pull/3871&quot;&gt;min-max range&lt;/a&gt; in DynamicFilterSourceOperator when 
the build-side contains too many values.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/3972&quot;&gt;Passing dynamic filters back to the worker nodes&lt;/a&gt; from coordinator 
to allow ORC and Parquet readers to use dynamic filters with partitioned joins.&lt;/li&gt;
  &lt;li&gt;Allow connectors to &lt;a href=&quot;https://github.com/trinodb/trino/pull/3414&quot;&gt;block probe-side scan&lt;/a&gt; until dynamic filters are ready.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/2674&quot;&gt;Support dynamic filtering with inequality operators&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/2190&quot;&gt;Support for semi-joins&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Take advantage of dynamic filters in connectors other than Hive.&lt;/li&gt;
&lt;/ul&gt;</content>

      
        <author>
          <name>Raunaq Morarka, Qubole and Karol Sobczak, Starburst Data</name>
        </author>
      

      <summary>Star-schema is one of the most widely used data mart patterns. The star schema consists of fact tables (usually partitioned) and dimension tables, which are used to filter rows from fact tables. Consider the following query which captures a common pattern of a fact table store_sales partitioned by the column ss_sold_date_sk joined with a filtered dimension table date_dim: SELECT COUNT(*) FROM store_sales JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk WHERE d_following_holiday=&apos;Y&apos; AND d_year = 2000; Without dynamic filtering, Presto will push predicates for the dimension table to the table scan on date_dim but it will scan all the data in the fact table since there are no filters on store_sales in the query. The join operator will end up throwing away most of the probe-side rows as the join criteria is highly selective. The current implementation of dynamic filtering improves on this, however it is limited only to broadcast joins on tables stored in ORC or Parquet format. Additionally, it does not take advantage of the layout of partitioned Hive tables. With dynamic partition pruning, which extends the current implementation of dynamic filtering, every worker node collects values eligible for the join from date_dim.d_date_sk column and passes it to the coordinator. Coordinator can then skip processing of the partitions of store_sales which don’t meet the join criteria. This greatly reduces the amount of data scanned from store_sales table by worker nodes. This optimization is applicable to any storage format and to both broadcast and partitioned join.</summary>

      
      
    </entry>
  
    <entry>
      <title>Hive ACID and transactional tables&apos; support in Presto</title>
      <link href="https://trino.io/blog/2020/06/01/hive-acid.html" rel="alternate" type="text/html" title="Hive ACID and transactional tables&apos; support in Presto" />
      <published>2020-06-01T00:00:00+00:00</published>
      <updated>2020-06-01T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/06/01/hive-acid</id>
<content type="html" xml:base="https://trino.io/blog/2020/06/01/hive-acid.html">&lt;p&gt;Hive ACID and transactional tables have been supported in Presto since the 331
release. Hive ACID support is an important step towards GDPR/CCPA compliance,
and also towards Hive 3 support as &lt;a href=&quot;https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/hive-overview/content/hive_upgrade_changes.html&quot;&gt;certain distributions&lt;/a&gt;
of Hive 3 create transactional tables by default.&lt;/p&gt;

&lt;p&gt;In this blog post we cover the concepts of Hive ACID and transactional
tables along with the changes done in Presto to support them. We also cover the
performance tests on this integration and look at the future plans for this
feature.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;how-to-use-hive-acid-and-transactional-tables-in-presto&quot;&gt;How to use Hive ACID and transactional tables in Presto&lt;/h1&gt;

&lt;p&gt;Hive transactional tables are readable in Presto without any need to tweak
configs; you only need to meet these requirements:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Use Presto version 331 or higher&lt;/li&gt;
  &lt;li&gt;Use Hive 3 Metastore Server. Presto does not support Hive transactional
tables created with Hive before version 3.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Note that Presto cannot create or write to Hive transactional tables yet. You
can create and write to Hive transactional tables via
&lt;a href=&quot;https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions&quot;&gt;Hive&lt;/a&gt;
or via Spark with &lt;a href=&quot;https://github.com/qubole/spark-acid&quot;&gt;Hive ACID Data Source plugin&lt;/a&gt; and
use Presto to read these tables.&lt;/p&gt;
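&lt;p&gt;As a sketch of this workflow, a CRUD transactional table can be created and populated from Hive, and then queried from Presto like any other table (the table, schema, and catalog names below are illustrative):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- In Hive, backed by a Hive 3 Metastore Server:
CREATE TABLE events (id BIGINT, status STRING)
STORED AS ORC
TBLPROPERTIES (&apos;transactional&apos;=&apos;true&apos;);

INSERT INTO events VALUES (1, &apos;open&apos;);

-- In Presto 331 or higher:
SELECT * FROM hive.default.events;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;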

&lt;h1 id=&quot;what-is-hive-acid-and-hive-transactional-tables&quot;&gt;What is Hive ACID and Hive transactional tables&lt;/h1&gt;
&lt;p&gt;Hive transactional tables are the tables in Hive that provide ACID semantics.
This excerpt from
&lt;a href=&quot;https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions&quot;&gt;Hive documentation&lt;/a&gt;
covers ACID traits well:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;“ACID stands for four traits of database transactions:
Atomicity (an operation either succeeds completely or fails,
it does not leave partial data), Consistency (once an application performs an
operation the results of that operation are visible to it in every subsequent
operation), Isolation (an incomplete operation by one user does not cause
unexpected side effects for other users), and Durability (once an operation is
complete it will be preserved even in the face of machine or system failure).
These traits have long been expected of database systems as part of their
transaction functionality.“&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1 id=&quot;need-for-hive-acid-and-transactional-tables&quot;&gt;Need for Hive ACID and transactional tables&lt;/h1&gt;
&lt;p&gt;In any organisation, there is always a need to update or delete existing entries
in tables: for example, a user writes or updates the review for an item purchased a
week ago, or a transaction status changes after a day.
With regulations like GDPR/CCPA, updates and deletes become even more frequent, as
users can ask the organisation to delete their data, and organisations are
obligated to fulfill these requests.&lt;/p&gt;

&lt;p&gt;The standard practice to update data has been to overwrite the partition or
table with the updated data but this is inefficient and unreliable. It takes a
lot of resources to overwrite all of the existing data to update a few entries,
but more importantly there are issues around isolation when reads on old data
are going on and the overwrite starts deleting that data. To solve these issues
several solutions have been developed, many of them are covered
&lt;a href=&quot;https://www.qubole.com/blog/qubole-open-sources-multi-engine-support-for-updates-and-deletes-in-data-lakes/&quot;&gt;in this blog post&lt;/a&gt;,
and Hive ACID is one of them.&lt;/p&gt;

&lt;h1 id=&quot;concepts-of-hive-acid-and-transactional-tables&quot;&gt;Concepts of Hive ACID and transactional tables&lt;/h1&gt;

&lt;p&gt;Several concepts like transactions, WriteIds, deltas, locks, etc. are added in
Hive to achieve ACID semantics. To understand the changes done in Presto to
support Hive ACID and transactional tables, covered in the next section, it is
important to understand these concepts first. So let’s look at them in detail.&lt;/p&gt;

&lt;h2 id=&quot;types-of-hive-transactional-tables&quot;&gt;Types of Hive transactional tables&lt;/h2&gt;
&lt;p&gt;There are two types of Hive transactional tables: insert-only transactional
tables and CRUD transactional tables.
The following table compares the two:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;Type of transactional table&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;Hive DML Operations Supported&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;Input Formats supported&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;Synthetic columns in file?&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;Additional Table Properties&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Insert-Only Transactional Tables&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;INSERT&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;All input formats&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;No&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&apos;transactional&apos;=&apos;true&apos;&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&apos;transactional_properties&apos;=&apos;insert_only&apos;&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;CRUD Transactional Tables&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;INSERT, UPDATE, DELETE&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;ORC&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Yes&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&apos;transactional&apos;=&apos;true&apos;&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h2 id=&quot;hive-transactions&quot;&gt;Hive Transactions&lt;/h2&gt;
&lt;p&gt;Hive transactional tables should only be accessed within Hive transactions.
Note that these transactions are different from Presto transactions and are
managed by Hive. Running each DML query under its own transaction provides
atomicity: each transaction is independent, and rolling it back has no impact on
the state of the table.&lt;/p&gt;

&lt;h2 id=&quot;writeids&quot;&gt;WriteIds&lt;/h2&gt;
&lt;p&gt;DML queries under a transaction write to a unique location under the
partition or table, described in detail in the “New Sub-Directories” section
below. This location is derived from the WriteId allocated to the transaction.
This provides isolation for DML queries, which can therefore run in parallel,
whenever possible, without interfering with each other.&lt;/p&gt;

&lt;h2 id=&quot;valid-writeids&quot;&gt;Valid WriteIds&lt;/h2&gt;
&lt;p&gt;Read queries under a transaction get a list of valid WriteIds, that is, the
WriteIds of transactions that were successfully committed. This ensures
consistency by making the results of committed transactions available to all
future transactions. It also provides isolation: DML and read queries can run in
parallel, and read queries never see partial data written by in-flight DML
queries.&lt;/p&gt;
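&lt;p&gt;The effect of valid WriteIds on reads can be pictured with a short sketch.
This is an illustration only, not Hive or Presto code: the directory names
follow the scheme covered in the “New Sub-Directories” section, and the function
name is hypothetical:&lt;/p&gt;

```python
# Illustrative sketch only: how a reader could select delta directories whose
# WriteIds all belong to committed transactions. Not actual Hive/Presto code.
def visible_dirs(directories, valid_write_ids):
    visible = []
    for name in directories:
        parts = name.split("_")
        start, end = int(parts[-2]), int(parts[-1])
        # Keep the directory only if every WriteId in its range was committed.
        if all(w in valid_write_ids for w in range(start, end + 1)):
            visible.append(name)
    return visible
```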

&lt;h2 id=&quot;new-sub-directories&quot;&gt;New Sub-Directories&lt;/h2&gt;
&lt;p&gt;Results of DML queries are written to a unique location derived from the
WriteId of the transaction. These unique locations are delta directories under
the partition or table location. Apart from the WriteId, the directory name
encodes the DML operation, and depending on the operation type there are two
kinds of delta directories:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Delete Delta Directory: This delta directory is created for results of
DELETE statements and is named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta_&amp;lt;writeId&amp;gt;_&amp;lt;writeId&amp;gt;&lt;/code&gt; under
partition/table location.&lt;/li&gt;
  &lt;li&gt;Delta Directory: This type is created for the results of INSERT statements
and is named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta_&amp;lt;writeId&amp;gt;_&amp;lt;writeId&amp;gt;&lt;/code&gt; under partition/table location.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Apart from delta directories, there is one more kind of sub-directory, the
“base directory”, named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;base_&amp;lt;writeId&amp;gt;&lt;/code&gt; under the partition or table
location. This type of directory is created by an INSERT OVERWRITE TABLE query
or by major compaction, which is described later.&lt;/p&gt;
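&lt;p&gt;The naming scheme for these sub-directories can be summarized in a short
sketch. The helper functions are hypothetical and only illustrate how a name is
derived from a WriteId; real Hive additionally zero-pads the WriteIds, a detail
omitted here for clarity:&lt;/p&gt;

```python
# Illustrative sketch of the ACID sub-directory naming scheme described above.
def delta_dir(write_id):
    # Holds rows written by an INSERT under this WriteId.
    return "delta_{0}_{0}".format(write_id)

def delete_delta_dir(write_id):
    # Holds the rowIds removed by a DELETE under this WriteId.
    return "delete_delta_{0}_{0}".format(write_id)

def base_dir(write_id):
    # Created by INSERT OVERWRITE TABLE or by major compaction.
    return "base_{0}".format(write_id)
```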

&lt;p&gt;The following animation shows how these new sub-directories are created in the
filesystem along with transaction management at metastore with different
queries:
&lt;img src=&quot;/assets/blog/hive-acid/directories.gif&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;rowid&quot;&gt;RowID&lt;/h2&gt;
&lt;p&gt;To uniquely identify each row in the table, a synthetic rowId is created and
added to each row. RowIds are added to CRUD transactional tables only, because
they are needed only for DELETE statements. When a DELETE is performed, the
rowIds of the rows it deletes are written into the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt;
directory, and subsequent reads return all but these rows.&lt;/p&gt;

&lt;p&gt;RowId currently consists of five fields: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;operation&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;originalTransaction&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bucket&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rowId&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;currentTransaction&lt;/code&gt;, but the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;operation&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;currentTransaction&lt;/code&gt; fields
are now redundant.
RowId is added to the root STRUCT of the ORC file, hence the schema of ORC files
differs from the schema defined on the table, e.g.:&lt;/p&gt;

&lt;p&gt;Schema of CRUD transactional Hive Table:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;n_nationkey : int,
n_name : string,
n_regionkey : int,
n_comment : string
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Schema of ORC file for this table:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;struct {
    operation : int,
    originalTransaction : bigint,
    bucket : int,
    rowId : bigint,
    currentTransaction : bigint,
    row : struct {
        n_nationkey : int,
        n_name : string,
        n_regionkey : int,
        n_comment : string
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Note that one level of nesting of the table schema, like the inner struct
above, applies to flat Hive tables too. The two-level nesting of data columns is
added to the ORC files of CRUD transactional tables to keep the rowId columns
isolated from the data columns.&lt;/p&gt;

&lt;h2 id=&quot;compactions&quot;&gt;Compactions&lt;/h2&gt;
&lt;p&gt;The design described above, with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; directories for each
transaction, makes DML queries execute fast but has
the following impact on read queries:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Many delta directories, each holding only a little data, slow down the
execution of read queries. This is the well-known small-files problem, where
engines end up spending more time opening files than actually processing the
data.&lt;/li&gt;
  &lt;li&gt;Cross-referencing all &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; directories to remove all deleted rows
slows down reads.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To solve these problems, Hive compacts delta directories asynchronously at two
levels:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Minor Compaction: This compaction combines active &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta&lt;/code&gt; directories into one
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta&lt;/code&gt; directory and active &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; directories into one &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt;
directory, thereby decreasing the number of small files. Limiting the scope of
this compaction to combining only &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta&lt;/code&gt; directories keeps it fast. Minor
compaction is triggered automatically as soon as the count of active delta
directories reaches 10 (configurable). It creates new delta directories like
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta_&amp;lt;start_write_id&amp;gt;_&amp;lt;end_write_id&amp;gt;&lt;/code&gt;, where [start_write_id, end_write_id]
is the range of existing delta directories that were compacted. A similar naming
convention is used for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; directories.&lt;/li&gt;
  &lt;li&gt;Major Compaction: Minor compaction does not merge base, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; directories together, as that requires rewriting the data with
only the non-deleted rows, which is time consuming. This work is handled by a
separate, less frequent, and longer-running compaction called Major compaction.
Major compaction is triggered when the total size of delta directories reaches
10% (configurable) of the base directory size, and it creates a new base
directory.&lt;/li&gt;
&lt;/ol&gt;
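&lt;p&gt;The naming of a minor-compacted directory can be sketched as follows. This
illustrates the naming convention only and is not Hive’s implementation; the
function name is hypothetical:&lt;/p&gt;

```python
# Illustrative sketch: minor compaction merges a run of delta directories
# into one directory named for the whole WriteId range it covers.
def compacted_name(delta_dirs):
    ids = []
    for name in delta_dirs:
        parts = name.split("_")
        ids.append(int(parts[-2]))
        ids.append(int(parts[-1]))
    # Preserve the delta vs. delete_delta prefix of the inputs.
    prefix = "delete_delta" if delta_dirs[0].startswith("delete_delta") else "delta"
    return "{0}_{1}_{2}".format(prefix, min(ids), max(ids))
```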

&lt;h2 id=&quot;locks&quot;&gt;Locks&lt;/h2&gt;
&lt;p&gt;Hive uses locks to control which operations can run in parallel on a
partition or table. For example, DML queries take a write lock on the partitions
they modify, while read queries take a read lock on the partitions they read.
The read locks taken by read queries prevent Hive from cleaning up delta
directories that have already been compacted but are still being read by a
query.&lt;/p&gt;

&lt;h1 id=&quot;changes-in-presto-to-support-hive-acid-and-transactional-ables&quot;&gt;Changes in Presto to support Hive ACID and transactional tables&lt;/h1&gt;

&lt;p&gt;At a high level, Presto changes in two places to support Hive ACID and
transactional tables: in the split generation logic that runs on the coordinator,
and in the ORC reader used on the workers.&lt;/p&gt;

&lt;h2 id=&quot;split-generation&quot;&gt;Split generation&lt;/h2&gt;

&lt;ol&gt;
  &lt;li&gt;Hive ACID state is set up in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SemiTransactionalHiveMetastore.beginQuery&lt;/code&gt;,
only for Hive transactional tables:
    &lt;ol&gt;
      &lt;li&gt;A new Hive transaction is opened per Query&lt;/li&gt;
      &lt;li&gt;A shared read-lock is obtained from Metastore server for the partitions
 read in the query&lt;/li&gt;
      &lt;li&gt;A heartbeat mechanism is set up to periodically inform the Metastore
 server that the query is still alive. The heartbeat frequency is obtained from
 the Metastore server but can be overridden with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.transaction-heartbeat-interval&lt;/code&gt;
 property.&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BackgroundSplitLoader&lt;/code&gt; is set up with valid WriteIds for the partitions as
provided by Metastore server&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BackgroundSplitLoader.loadPartitions&lt;/code&gt; is called in an Executor to create
splits for each partition:
    &lt;ol&gt;
      &lt;li&gt;ACID sub-directories: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;base&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; directories are
 discovered by listing the partition location&lt;/li&gt;
      &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DeleteDeltaLocations&lt;/code&gt;, a registry of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; directories, is
 created. It contains minimal information through which &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt;
 directory paths can be recreated at workers.&lt;/li&gt;
      &lt;li&gt;HiveSplits are created for each base and delta directory location.
 Each HiveSplit contains the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DeleteDeltaLocations&lt;/code&gt;&lt;/li&gt;
      &lt;li&gt;If the table is an Insert-Only transactional table, then
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DeleteDeltaLocations&lt;/code&gt; is empty and the HiveSplit is the same as a HiveSplit
 for a flat, non-transactional Hive table&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;reading-hive-transactional-data-in-workers&quot;&gt;Reading Hive transactional data in workers&lt;/h2&gt;

&lt;p&gt;The HiveSplits generated during the split generation phase make their way to
worker nodes, where OrcPageSourceFactory is used to create the PageSource for
the TableScan operator.&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Insert-Only transactional tables are read the same way non-transactional
tables are read: an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcPageSource&lt;/code&gt; is created for their splits, which reads the
data for the split and makes it available to TableScanOperator&lt;/li&gt;
  &lt;li&gt;CRUD transactional tables need special handling during reads because their
file schema does not match the table schema, due to the synthetic RowId column,
which introduces the additional struct nesting mentioned earlier:
    &lt;ol&gt;
      &lt;li&gt;RowId columns are added to the list of columns to be read from file&lt;/li&gt;
      &lt;li&gt;The ORC reader is set up to access columns by name from the file instead
 of using column indexes from the table schema, equivalent to forcing
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.orc.use-column-names=true&lt;/code&gt; for CRUD transactional tables&lt;/li&gt;
      &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcRecordReader&lt;/code&gt; is created for the ORC file of the split&lt;/li&gt;
      &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcDeletedRows&lt;/code&gt; is created for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; locations, if any.&lt;/li&gt;
      &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcPageSource&lt;/code&gt; is created, which returns the rows from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcRecordReader&lt;/code&gt;
 that are not present in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcDeletedRows&lt;/code&gt;. This cross-referencing of deleted
 rows is done lazily, for each &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Block&lt;/code&gt; of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Page&lt;/code&gt;, only when that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Block&lt;/code&gt;
 needs to be read from the PageSource. This works well with Presto’s lazy
 materialization logic, which skips over Blocks when a predicate does not
 apply to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Page&lt;/code&gt; at all.&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
&lt;/ol&gt;
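&lt;p&gt;Conceptually, the cross-referencing of deleted rows is a filter on the RowId
fields. This is a simplified sketch, not the actual &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OrcDeletedRows&lt;/code&gt; implementation, which works lazily on
Blocks; the function name is hypothetical:&lt;/p&gt;

```python
# Conceptual sketch of deleted-row filtering: a row survives if its
# (originalTransaction, bucket, rowId) triple is absent from the set
# collected from the delete_delta directories.
def filter_deleted(rows, deleted_keys):
    return [
        row for row in rows
        if (row["originalTransaction"], row["bucket"], row["rowId"]) not in deleted_keys
    ]
```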

&lt;h1 id=&quot;performance-numbers&quot;&gt;Performance numbers&lt;/h1&gt;
&lt;p&gt;Each INSERT on a Hive transactional table can create additional splits for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delta&lt;/code&gt;
directories, and each DELETE can create &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;delete_delta&lt;/code&gt; directories that add
the extra work of cross-referencing deleted rows while reading a split. To
measure the impact of these operations on reads from Presto, we ran the
following performance tests, where multiple Hive transactional tables were
created with a varying number of INSERT and DELETE operations, and the runtime
of different read-focused Presto queries was recorded:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;Table Type&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;Description&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;delta directories&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;delete_delta directories&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Flat&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;TPCDS store_sales scale 3000 table, 8.6B rows&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;0&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Only Base&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Hive transactional store_sales scale 3000 table: 8.6B rows&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;0&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Base + 1-Delete&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Derived from “Only Base” with rows having customer_id=100 deleted by 1 DELETE query: 347 deleted entries&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;0&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;1&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Base + 1-Delete + 1-Insert&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Derived from “Base + 1 Delete” with deleted rows added back by 1 INSERT query: 347 deleted entries + 347 inserted entries&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;1&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;1&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Base + 5-Deletes&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Derived from “Only Base” with rows for 5 customer_ids deleted by 5 DELETE queries: 1355 rows deleted&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;0&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;5&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Base + 5-Deletes + 5-Inserts&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Derived from “Base + 1 Delete” with deleted rows added back by 5 INSERT queries: 1355 deleted entries + 1355 inserted entries&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;5&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;5&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Following are the results of these tests, run on a cluster of 5 c3.4xlarge
machines on AWS:
&lt;img src=&quot;/assets/blog/hive-acid/perf.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The results show that deleted rows have an impact on read performance, which
is expected, as the work for the reader increases in this case. With predicates
in place, however, this impact was reduced, because the amount of data to be
read goes down.&lt;/p&gt;

&lt;h1 id=&quot;ongoing-and-future-work&quot;&gt;Ongoing and Future work&lt;/h1&gt;
&lt;p&gt;There has been ongoing work on the Hive ACID integration, and some
improvements are planned for the future, notably:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Bucketed Hive transactional table support has been added (&lt;a href=&quot;https://github.com/trinodb/trino/pull/1591&quot;&gt;#1591&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Support for original files is in progress (&lt;a href=&quot;https://github.com/trinodb/trino/pull/2930&quot;&gt;#2930&lt;/a&gt;),
this will allow Presto to read the Hive tables that were converted to
transactional table at some point after having non-transactional data&lt;/li&gt;
  &lt;li&gt;Write support will be taken up in the future (&lt;a href=&quot;https://github.com/trinodb/trino/issues/1956&quot;&gt;#1956&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;There is ongoing work on Hive side for ACID on Parquet format. Once that
lands, Presto’s implementation will be extended to support Parquet too.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;acknowledgements-and-conclusion&quot;&gt;Acknowledgements and Conclusion&lt;/h1&gt;
&lt;p&gt;Thanks to the folks who helped out in the development of this feature:
&lt;a href=&quot;https://www.linkedin.com/in/abhishek-somani-a946aa1b&quot;&gt;Abhishek Somani&lt;/a&gt; provided
continuous guidance on internals of Hive ACID,
&lt;a href=&quot;https://www.linkedin.com/in/dainsundstrom&quot;&gt;Dain&lt;/a&gt; helped out with simplifying
ORC reader and along with &lt;a href=&quot;https://www.linkedin.com/in/piotrfindeisen/&quot;&gt;Piotr&lt;/a&gt;
helped in code refinement and with multiple rounds of reviews.&lt;/p&gt;

&lt;p&gt;While we continue development on this feature toward full-fledged support,
including writes, you can start using it on Hive transactional tables that do
not have files in flat format. If you have such tables and want to use Presto
with them, you can apply &lt;a href=&quot;https://github.com/trinodb/trino/pull/2930&quot;&gt;this fix&lt;/a&gt;
to your Presto installation, or you can trigger a major compaction on all
partitions to migrate the full table into the CRUD transactional table format.&lt;/p&gt;</content>

      
        <author>
          <name>Shubham Tagra, Qubole</name>
        </author>
      

      <summary>Hive ACID and transactional tables are supported in Presto since the 331 release. Hive ACID support is an important step towards GDPR/CCPA compliance, and also towards Hive 3 support as certain distributions of Hive 3 create transactional tables by default. In this blog post we cover the concepts of Hive ACID and transactional tables along with the changes done in Presto to support them. We also cover the performance tests on this integration and look at the future plans for this feature.</summary>

      
      
    </entry>
  
    <entry>
      <title>Apache Pinot Connector</title>
      <link href="https://trino.io/blog/2020/05/25/pinot-connector.html" rel="alternate" type="text/html" title="Apache Pinot Connector" />
      <published>2020-05-25T00:00:00+00:00</published>
      <updated>2020-05-25T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/05/25/pinot-connector</id>
      <content type="html" xml:base="https://trino.io/blog/2020/05/25/pinot-connector.html">&lt;p&gt;Presto 334 introduces the new &lt;a href=&quot;https://trino.io/docs/current/connector/pinot.html&quot;&gt;Pinot Connector&lt;/a&gt;
which allows Presto to query data stored in &lt;a href=&quot;https://pinot.apache.org/&quot;&gt;Apache Pinot™&lt;/a&gt;.
Not only does this allow access to Pinot tables but gives users the ability to do things they could not do with Pinot
alone such as join Pinot tables to other tables and use Presto’s scalar functions, window functions and complex aggregations.&lt;/p&gt;

&lt;p&gt;Pinot UDFs can be used directly by including the Pinot SQL query in quotes, as explained below in the &lt;em&gt;Pinot SQL Passthrough&lt;/em&gt; section.
This enables aggregations and other complex query types to be executed directly in Pinot.&lt;/p&gt;

&lt;p&gt;This connector supports Pinot 0.3.0 and newer.&lt;/p&gt;

&lt;h1 id=&quot;setup&quot;&gt;Setup&lt;/h1&gt;

&lt;p&gt;Create a properties file in the catalog directory, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc/catalog/pinot.properties&lt;/code&gt; which includes at least the
following to get started:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;connector.name=pinot
pinot.controller-urls=host1:9000,host2:9000
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pinot.controller-urls&lt;/code&gt; property is a comma-separated list of controller hosts. If Pinot is deployed via &lt;a href=&quot;https://kubernetes.io/&quot;&gt;Kubernetes&lt;/a&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pinot.controller-urls&lt;/code&gt; needs to point to the controller Service endpoint. The Pinot brokers and servers must be accessible
via DNS, as Pinot returns hostnames and not IP addresses.&lt;/p&gt;

&lt;p&gt;If you have fewer Pinot servers than Presto workers, or a relatively small number of rows per Pinot segment,
you can minimize the number of requests to Pinot by increasing the number of Pinot segments per split (the default is 1 segment per split):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;pinot.segments-per-split=15
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
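&lt;p&gt;The effect of this setting is easy to estimate: the number of splits, and
therefore requests to Pinot servers, is roughly the segment count divided by
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pinot.segments-per-split&lt;/code&gt;. A rough back-of-the-envelope sketch, ignoring per-server grouping
details; the function name is hypothetical:&lt;/p&gt;

```python
import math

# Rough estimate only: with N segments per split, a table with S segments
# produces about ceil(S / N) splits.
def estimated_splits(num_segments, segments_per_split=1):
    return math.ceil(num_segments / segments_per_split)
```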

&lt;p&gt;If DNS resolution is slow or you get &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Request timed out&lt;/code&gt; errors, you can increase the request timeout as follows:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;pinot.request-timeout=3m
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;schema&quot;&gt;Schema&lt;/h1&gt;

&lt;p&gt;Pinot supports the following data types. Null values are currently not supported. The corresponding Presto data types are:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Pinot Datatype&lt;/th&gt;
      &lt;th&gt;Presto Datatype&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;boolean&lt;/td&gt;
      &lt;td&gt;boolean&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;integer&lt;/td&gt;
      &lt;td&gt;integer&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;float, double&lt;/td&gt;
      &lt;td&gt;double&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;string, bytes*&lt;/td&gt;
      &lt;td&gt;varchar&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;integer_array&lt;/td&gt;
      &lt;td&gt;array(integer)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;float_array, double_array&lt;/td&gt;
      &lt;td&gt;array(double)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;long_array&lt;/td&gt;
      &lt;td&gt;array(bigint)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;string_array&lt;/td&gt;
      &lt;td&gt;array(varchar)&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;ul&gt;
  &lt;li&gt;The Pinot &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bytes&lt;/code&gt; type is converted to a hex-encoded varchar. See the &lt;a href=&quot;https://pinot.apache.org/&quot;&gt;Pinot docs&lt;/a&gt; for more information.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;pinot-sql-passthrough&quot;&gt;Pinot SQL Passthrough&lt;/h1&gt;

&lt;p&gt;If you would like to leverage Pinot’s fast aggregations, you can use a “dynamic” table, where you specify the Pinot SQL
query as the table name and it is passed directly to Pinot:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pinot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;default&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;&quot;SELECT col3, col4, MAX(col1), COUNT(col2) FROM pinot_table GROUP BY col3, col4&quot;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col3&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;IN&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;FOO&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;BAR&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col4&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;30000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The filter in the outer Presto query is pushed down into the Pinot query via Presto’s
&lt;a href=&quot;https://github.com/trinodb/trino/blob/334/presto-spi/src/main/java/io/prestosql/spi/connector/ConnectorMetadata.java#L746&quot;&gt;applyFilter()&lt;/a&gt;.
These queries are routed to the broker and should not return huge amounts of data, as broker queries currently return a
single response with all the results. This approach is better suited to aggregate queries.&lt;/p&gt;

&lt;p&gt;Limits are pushed into the “dynamic” Pinot query via Presto’s
&lt;a href=&quot;https://github.com/trinodb/trino/blob/334/presto-spi/src/main/java/io/prestosql/spi/connector/ConnectorMetadata.java#L727&quot;&gt;applyLimit()&lt;/a&gt;.
Pinot functions such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PERCENTILEEST&lt;/code&gt; can be used in the quoted SQL.
The query above would yield the following Pinot PQL query:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;MAX&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;col1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;COUNT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;col2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pinot_table&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col3&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;IN&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;FOO&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;BAR&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;col4&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;30000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If you are returning a larger dataset, you can issue a normal Presto query, which is routed to the Pinot servers that
store the Pinot segments. Filters and limits are pushed down to Pinot for regular queries as well.&lt;/p&gt;

&lt;h1 id=&quot;future-work&quot;&gt;Future Work&lt;/h1&gt;

&lt;p&gt;As Presto and Pinot continue to evolve, the Pinot connector will leverage new features such as aggregation pushdown and more.&lt;/p&gt;</content>

      
        <author>
          <name>Elon Azoulay</name>
        </author>
      

      <summary>Presto 334 introduces the new Pinot Connector which allows Presto to query data stored in Apache Pinot™. Not only does this allow access to Pinot tables but gives users the ability to do things they could not do with Pinot alone such as join Pinot tables to other tables and use Presto’s scalar functions, window functions and complex aggregations.</summary>

      
      
    </entry>
  
    <entry>
      <title>State of Presto</title>
      <link href="https://trino.io/blog/2020/05/15/state-of-presto.html" rel="alternate" type="text/html" title="State of Presto" />
      <published>2020-05-15T00:00:00+00:00</published>
      <updated>2020-05-15T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/05/15/state-of-presto</id>
      <content type="html" xml:base="https://trino.io/blog/2020/05/15/state-of-presto.html">&lt;p&gt;Presto is continuing to gain adoption across many industries and use cases. Our
community is growing rapidly and there is a lot going on, so we are taking the
Presto Summit online. And we are starting with a State of Presto webinar with
the founders of the project.&lt;/p&gt;

&lt;p&gt;Update:&lt;/p&gt;

&lt;p&gt;We had a great event with lots of questions from the audience, taking us beyond
the planned time frame. Check out the recording to learn more:&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/epdgIsAT3EA&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;!--more--&gt;

&lt;p&gt;Join us virtually to hear Presto co-creators 
&lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso&lt;/a&gt;,
&lt;a href=&quot;https://github.com/dain&quot;&gt;Dain Sundstrom&lt;/a&gt;, and 
&lt;a href=&quot;https://github.com/electrum&quot;&gt;David Phillips&lt;/a&gt; talk about the state of Presto,
followed by a live Q&amp;amp;A moderated by Presto maintainer
&lt;a href=&quot;https://github.com/findepi&quot;&gt;Piotr Findeisen&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Agenda:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;2020 project milestones&lt;/li&gt;
  &lt;li&gt;Community and technical growth&lt;/li&gt;
  &lt;li&gt;Recent Presto updates&lt;/li&gt;
  &lt;li&gt;Project roadmap&lt;/li&gt;
  &lt;li&gt;Live Q&amp;amp;A&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Date: Thursday, 21 May 2020&lt;/p&gt;

&lt;p&gt;Time: 11am PDT (San Francisco), 2pm EDT (New York), 7pm BST (London), 6pm UTC&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;register-now&quot;&gt;&lt;a href=&quot;https://www.starburstdata.com/webinar-state-of-presto/?utm_campaign=Webinar%20-%20State%20of%20Presto%20-%202020%20-%20May&amp;amp;utm_source=trino.io&amp;amp;utm_medium=blog&quot;&gt;Register now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;We look forward to many questions and a lively webinar.&lt;/p&gt;</content>

      
        <author>
          <name>Manfred Moser</name>
        </author>
      

      <summary>Presto is continuing to gain adoption across many industries and use cases. Our community is growing rapidly and there is a lot going on, so we are taking the Presto Summit online. And we are starting with a State of Presto webinar with the founders of the project. Update: We had a great event with lots of questions from the audience, taking us beyond the planned time frame. Check out the recording to learn more:</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto on FLOSS Weekly</title>
      <link href="https://trino.io/blog/2020/05/06/floss-weekly.html" rel="alternate" type="text/html" title="Presto on FLOSS Weekly" />
      <published>2020-05-06T00:00:00+00:00</published>
      <updated>2020-05-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/05/06/floss-weekly</id>
<content type="html" xml:base="https://trino.io/blog/2020/05/06/floss-weekly.html">&lt;p&gt;Spreading the word about our project is an important task to grow the community
around Presto. With a large, lively community we can ensure the success of
Presto. Today we had the opportunity to talk about Presto on the long-running
open source podcast &lt;a href=&quot;https://twit.tv/shows/floss-weekly&quot;&gt;FLOSS Weekly&lt;/a&gt;.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;&lt;a href=&quot;http://www.stonehenge.com/merlyn/&quot;&gt;Randal Schwartz&lt;/a&gt; was joined by his co-host
&lt;a href=&quot;https://webmink.com/about/&quot;&gt;Simon Phipps&lt;/a&gt;. We introduced Presto overall and
talked about use cases of Presto and the problems it can solve. Both hosts, as
well as the live audience, had some great questions and we did our best to
answer them.&lt;/p&gt;

&lt;p&gt;We moved through the history of Presto, current users and usage, the community
around the project, and Dain talked about some of the upcoming improvements. In
the end it seemed like we just scratched the surface and all wanted to keep
talking about the project.&lt;/p&gt;

&lt;p&gt;It was a great conversation and you should check it out!&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;watch-a-recording-of-the-presto-episode-of-floss-weekly-now&quot;&gt;&lt;a href=&quot;https://twit.tv/shows/floss-weekly/episodes/577?autostart=false&quot;&gt;Watch a recording of the Presto episode of FLOSS Weekly now!&lt;/a&gt;&lt;/h2&gt;
&lt;/blockquote&gt;</content>

      
        <author>
          <name>Dain Sundstrom and Manfred Moser</name>
        </author>
      

      <summary>Spreading the word about our project is an important task to grow the community around Presto. With a large, lively community we can ensure the success of Presto. Today we had the opportunity to talk about Presto on the long running open source podcast FLOSS Weekly.</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto: The Definitive Guide</title>
      <link href="https://trino.io/blog/2020/04/11/the-definitive-guide.html" rel="alternate" type="text/html" title="Presto: The Definitive Guide" />
      <published>2020-04-11T00:00:00+00:00</published>
      <updated>2020-04-11T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/04/11/the-definitive-guide</id>
      <content type="html" xml:base="https://trino.io/blog/2020/04/11/the-definitive-guide.html">&lt;p&gt;Nearly two years ago Matt and Martin got the ball rolling on getting a book
about Presto happening. A thriving project and community like everyone around
Dain, David and Martin, the founders and creators of Presto, just needs a book.
Even in this digital age of online documentation, communities on chat and other
platforms, and videos everywhere, there is great value in a well structured and
written book. Today, we are happy to announce that our book &lt;strong&gt;Presto: The
Definitive Guide&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;h2 id=&quot;get-a-free-copy-of-trino-the-definitive-guide-from-starburst-now&quot;&gt;&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;Get a free copy of Trino: The Definitive Guide&lt;/a&gt; from &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt; now!&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;This first book about Presto is finally available for you all to get, read and
hopefully learn from.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Update April 2021&lt;/strong&gt;: The project has moved to the
&lt;a href=&quot;/blog/2020/12/27/announcing-trino.html&quot;&gt;new name Trino&lt;/a&gt;, and the content
of our book
&lt;a href=&quot;/blog/2021/04/21/the-definitive-guide.html&quot;&gt;has been updated&lt;/a&gt; to
&lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;Trino: The Definitive Guide&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;!--more--&gt;

&lt;p&gt;&lt;img src=&quot;/assets/ttdg-cover.png&quot; align=&quot;right&quot; style=&quot;float: right; margin-left: 20px; margin-bottom: 20px; width: 100%; max-width: 350px;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;With the help of O’Reilly, the book is now available in digital form, and paper
copies are just around the corner as well. You can find more information about
the book on &lt;a href=&quot;/trino-the-definitive-guide.html&quot;&gt;our permanent page about
it&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It is based on the very recent 330 release of Presto, but applicable to any
Presto version. The book is broken up into three separate parts. No matter if
you are a beginner keen to learn, maybe with just a bit of command line and SQL
knowledge, or an advanced or even expert Presto user, we are certain that you
can learn something from the book and encourage you to check it out.&lt;/p&gt;

&lt;p&gt;The first part of the book establishes what Presto is, and gets you quick wins
to install a minimal setup, run it, connect to it with the CLI and an
application using the JDBC driver and run some SQL queries.&lt;/p&gt;

&lt;p&gt;The second part dives into the details of the Presto architecture, query
planning, connectors for all sorts of data sources and SQL usage. There is a lot
to learn and digest in these main sections.&lt;/p&gt;

&lt;p&gt;In the third part we round things out with tuning tips, a good overview
of the Web UI, usage of other tools, security configuration and more tips to get
Presto into production.&lt;/p&gt;

&lt;p&gt;Of course, putting all this information together requires work from many people.
And in fact we did get lots of help from members of the Presto community and
O’Reilly.&lt;/p&gt;

&lt;p&gt;Specifically, we have some great news from our major supporter, Starburst!
Starburst allowed us to work on the book and bring it across the finish line.&lt;/p&gt;

&lt;p&gt;And that turns out to be great news for you all as well. Not only is the book
finished now, you can also get a
&lt;a href=&quot;https://www.starburst.io/info/oreilly-trino-guide/&quot;&gt;free digital copy of Trino: The Definitive Guide&lt;/a&gt;
from &lt;a href=&quot;https://www.starburst.io&quot;&gt;Starburst&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;So what are you waiting for? Go get a copy, check out the &lt;a href=&quot;https://github.com/trinodb/trino-the-definitive-guide&quot;&gt;code repository for
the book&lt;/a&gt;, provide
feedback and contact us on &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Looking forward to it all!&lt;/p&gt;

&lt;p&gt;Matt, Manfred and Martin&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Exhausted, but happy authors&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Matt Fuller, Manfred Moser and Martin Traverso</name>
        </author>
      

      <summary>Nearly two years ago Matt and Martin got the ball rolling on a book about Presto. A thriving project and community like the one around Dain, David and Martin, the founders and creators of Presto, simply needs a book. Even in this digital age of online documentation, communities on chat and other platforms, and videos everywhere, there is great value in a well structured and written book. Today, we are happy to announce our book, Presto: The Definitive Guide. Get a free copy of Trino: The Definitive Guide from Starburst now! This first book about Presto is finally available for you all to get, read and hopefully learn from. Update April 2021: The project has moved to the new name Trino, and the content of our book has been updated to Trino: The Definitive Guide.</summary>

      
      
    </entry>
  
    <entry>
      <title>Beyond LIMIT, Presto meets OFFSET and TIES</title>
      <link href="https://trino.io/blog/2020/02/03/beyond-limit-presto-meets-offset-and-ties.html" rel="alternate" type="text/html" title="Beyond LIMIT, Presto meets OFFSET and TIES" />
      <published>2020-02-03T00:00:00+00:00</published>
      <updated>2020-02-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/02/03/beyond-limit-presto-meets-offset-and-ties</id>
      <content type="html" xml:base="https://trino.io/blog/2020/02/03/beyond-limit-presto-meets-offset-and-ties.html">&lt;p&gt;Presto follows the SQL Standard faithfully. We extend it only when it is well justified,
we strive to never break it and we always prefer the standard way of doing things.
There was one situation where we stumbled, though. We had a non-standard way of limiting
query results with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT n&lt;/code&gt; without implementing the standard way of doing that first.
We have corrected that, adding the ANSI SQL way of limiting query results, discarding initial
results and – a hidden gem – retaining initial results in case of ties.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;limiting-query-results&quot;&gt;Limiting query results&lt;/h1&gt;

&lt;p&gt;Probably everyone using relational databases knows the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT n&lt;/code&gt; syntax for limiting query
results. It is supported by e.g. MySQL, PostgreSQL and many more SQL engines following
their example. It is so common that one could think that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT n&lt;/code&gt; is the standard way
of limiting the query results.  Let’s have a look at how various popular SQL engines
provide this feature.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;DB2, MySQL, MariaDB, PostgreSQL, Redshift, MemSQL, SQLite and many others provide the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;... LIMIT n&lt;/code&gt; syntax.&lt;/li&gt;
  &lt;li&gt;SQL Server provides &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT TOP n ...&lt;/code&gt; syntax.&lt;/li&gt;
  &lt;li&gt;Oracle provides &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;... WHERE ROWNUM &amp;lt;= n&lt;/code&gt; syntax.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And what does the SQL Standard say?&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;my_table&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FETCH&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FIRST&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ROWS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ONLY&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If we look again at the database systems mentioned above, it turns out many of them support the standard
syntax too: Oracle, DB2, SQL Server and PostgreSQL (although that’s not documented currently).&lt;/p&gt;

&lt;p&gt;And Presto? Presto has supported &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT n&lt;/code&gt; since 2012. In &lt;a href=&quot;https://trino.io/docs/current/release/release-310.html&quot;&gt;Presto 310&lt;/a&gt;,
we also added support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST n ROWS ONLY&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Let’s have a look beyond the limits.&lt;/p&gt;

&lt;h1 id=&quot;tie-break&quot;&gt;Tie break&lt;/h1&gt;

&lt;p&gt;Admittedly, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST n ROWS ONLY&lt;/code&gt; syntax is far more verbose than the short &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT n&lt;/code&gt; syntax Presto
has always supported (and still does). However, it is also more powerful: it allows selecting the “top n
rows, ties included”. Consider a case where you want to list the top 3 students with the highest score on an exam.
What happens if the 3&lt;sup&gt;rd&lt;/sup&gt;, 4&lt;sup&gt;th&lt;/sup&gt; and 5&lt;sup&gt;th&lt;/sup&gt; students have an equal score? Which
one should be returned? Instead of getting an arbitrary (and non-deterministic) result, you can use
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST n ROWS WITH TIES&lt;/code&gt; syntax:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;student_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;score&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;student&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exam_result&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;student_id&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FETCH&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FIRST&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ROWS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TIES&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST n ROWS WITH TIES&lt;/code&gt; clause retains every row whose values of the ordering keys (the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clause) equal those of
the last row that the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST n ROWS ONLY&lt;/code&gt; clause would return.&lt;/p&gt;
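&lt;p&gt;As a self-contained illustration, consider an inline table of hypothetical exam scores
(the names and values below are made up):&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT name, score
FROM (VALUES ('Alice', 90), ('Bob', 85), ('Carol', 85), ('Dave', 80)) AS t(name, score)
ORDER BY score DESC
FETCH FIRST 2 ROWS WITH TIES
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Although only 2 rows are requested, the query returns Alice, Bob and Carol, because Carol
ties with Bob on the ordering key.&lt;/p&gt;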

&lt;h1 id=&quot;offset&quot;&gt;Offset&lt;/h1&gt;

&lt;p&gt;Per the SQL Standard, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST n ROWS ONLY&lt;/code&gt; clause can be prepended with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET m&lt;/code&gt; to skip the first &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;m&lt;/code&gt; rows.
In that case, it makes sense to use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH NEXT ...&lt;/code&gt; variant of the clause – it is allowed both with and without &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt;,
but reads better when &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt; is present.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;student_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;score&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;student&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exam_result&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;student_id&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;OFFSET&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FETCH&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NEXT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ROWS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TIES&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As an extension to the SQL Standard, and for brevity, we also allow &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt; together with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;student_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;score&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;student&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exam_result&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;student_id&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;OFFSET&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;concluding-notes&quot;&gt;Concluding notes&lt;/h1&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt; / &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST ... ROWS ONLY&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST ... WITH TIES&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt; are powerful and very useful clauses
that come especially handy when writing ad-hoc queries over big data sets. They offer certain syntactic freedom beyond
what is described here, so check out documentation of &lt;a href=&quot;/docs/current/sql/select.html#offset-clause&quot;&gt;OFFSET Clause&lt;/a&gt; and
&lt;a href=&quot;/docs/current/sql/select.html#limit-or-fetch-first-clauses&quot;&gt;LIMIT or FETCH FIRST Clauses&lt;/a&gt; for all the options.
Since semantics of these clauses depend on query results being well ordered, they are best used with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; that
defines proper ordering. Without proper ordering the results are arbitrary (except for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WITH TIES&lt;/code&gt;) which may or may
not be a problem, depending on the use case.&lt;/p&gt;

&lt;p&gt;For scheduled queries, or queries that are part of some workflow (as opposed to ad-hoc), we recommend using query
predicates (where relevant) instead of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt;. Read more at
&lt;a href=&quot;https://use-the-index-luke.com/sql/partial-results/fetch-next-page&quot;&gt;https://use-the-index-luke.com/sql/partial-results/fetch-next-page&lt;/a&gt;.&lt;/p&gt;
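&lt;p&gt;For example, instead of paging with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt;, a query can carry the last ordering-key value seen on the
previous page as a predicate (the cutoff value 80 below is hypothetical, and ties across page boundaries need extra care):&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT student_name, score
FROM student s JOIN exam_result e ON s.id = e.student_id
WHERE score &amp;lt; 80
ORDER BY score DESC
FETCH FIRST 3 ROWS ONLY
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Unlike &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt;-based paging, which must compute and discard all skipped rows, this form stays cheap and
returns stable pages even when rows are inserted between fetches.&lt;/p&gt;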

&lt;p&gt;□&lt;/p&gt;</content>

      
        <author>
          <name>Piotr Findeisen, Starburst Data</name>
        </author>
      

      <summary>Presto follows the SQL Standard faithfully. We extend it only when it is well justified, we strive to never break it and we always prefer the standard way of doing things. There was one situation where we stumbled, though. We had a non-standard way of limiting query results with LIMIT n without implementing the standard way of doing that first. We have corrected that, adding the ANSI SQL way of limiting query results, discarding initial results and – a hidden gem – retaining initial results in case of ties.</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto in 2019: Year in Review</title>
      <link href="https://trino.io/blog/2020/01/01/2019-summary.html" rel="alternate" type="text/html" title="Presto in 2019: Year in Review" />
      <published>2020-01-01T00:00:00+00:00</published>
      <updated>2020-01-01T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2020/01/01/2019-summary</id>
      <content type="html" xml:base="https://trino.io/blog/2020/01/01/2019-summary.html">&lt;p&gt;What a great year for the Presto community! We started with the year with the launch of the 
&lt;a href=&quot;/blog/2019/01/31/presto-software-foundation-launch.html&quot;&gt;Presto Software Foundation&lt;/a&gt;, 
with the long term goal of ensuring the project remains collaborative, open and independent from 
any corporate interest, for years to come.&lt;/p&gt;

&lt;p&gt;Since then, the community around Presto has grown and consolidated. We’ve seen contributions 
from more than 120 people across over 20 companies. Every week, 280 users and developers 
interact in the project’s &lt;a href=&quot;/slack.html&quot;&gt;Slack channel&lt;/a&gt;. We’d like to take this opportunity to thank 
everyone who contributed to the project in one way or another. Presto wouldn’t be what it is without your 
help.&lt;/p&gt;

&lt;p&gt;With the collaboration of companies such as &lt;a href=&quot;https://starburstdata.com&quot;&gt;Starburst&lt;/a&gt;, &lt;a href=&quot;https://qubole.com&quot;&gt;Qubole&lt;/a&gt;, 
&lt;a href=&quot;https://varada.io&quot;&gt;Varada&lt;/a&gt;, &lt;a href=&quot;https://twitter.com&quot;&gt;Twitter&lt;/a&gt;, &lt;a href=&quot;https://www.treasuredata.com&quot;&gt;ARM Treasure Data&lt;/a&gt;,
&lt;a href=&quot;https://wix.com&quot;&gt;Wix&lt;/a&gt;, &lt;a href=&quot;https://www.redhat.com&quot;&gt;Red Hat&lt;/a&gt;, and the &lt;a href=&quot;https://www.meetup.com/Big-things-are-happening-here/&quot;&gt;Big Things community&lt;/a&gt;,
we ran several Presto summits across the world:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2019/05/03/Presto-Conference-Israel.html&quot;&gt;Tel Aviv, Israel, April 2019&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2019/05/17/Presto-Summit.html&quot;&gt;San Francisco, USA, June 2019&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2019/07/11/report-for-presto-conference-tokyo.html&quot;&gt;Tokyo, Japan, July 2019&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2019/09/05/Presto-Summit-Bangalore.html&quot;&gt;Bangalore, India, September 2019&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.starburstdata.com/technical-blog/nyc-presto-summit-recap/&quot;&gt;New York, USA, December 2019&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All these events were a huge success and brought thousands of Presto users, contributors and other community members together to 
share their knowledge and experiences.&lt;/p&gt;

&lt;p&gt;The project has been more active than ever. We completed 28 releases comprising more than 2,850 
commits in over 1,500 pull requests. Of course, that alone is not a good measure of progress, so 
let’s take a closer look at everything that went in. And there is a lot to look at!&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;language-features&quot;&gt;Language Features&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;/docs/current/sql/select.html#limit-or-fetch-first-clauses&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST n ROWS [ONLY | WITH TIES]&lt;/code&gt;&lt;/a&gt; 
standard syntax. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WITH TIES&lt;/code&gt; clause is particularly useful when some of the rows have the same 
value for the columns used to order the results of a query. Consider a case where you want to 
list the top 5 students with the highest score on an exam. If the 6th person has the same score as the 5th, you 
want to know this as well, instead of getting an arbitrary and non-deterministic result:&lt;/p&gt;

    &lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;student_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;score&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;student&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exam_result&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;USING&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;student_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;score&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FETCH&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FIRST&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ROWS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TIES&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/sql/select.html#offset-clause&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt;&lt;/a&gt; syntax, which is especially useful in ad-hoc queries.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/sql/comment.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;COMMENT ON &amp;lt;table&amp;gt;&lt;/code&gt;&lt;/a&gt; syntax to 
set or remove table comments. Comments can be shown via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DESCRIBE&lt;/code&gt;
or the new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;system.metadata.table_comments&lt;/code&gt; table.&lt;/li&gt;
  &lt;li&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LATERAL&lt;/code&gt; in the context of an outer join.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Support for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST&lt;/code&gt; in the context of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LEFT JOIN&lt;/code&gt;. With this feature, it is now possible 
to preserve the outer row when the array contains zero elements or is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NULL&lt;/code&gt;. Most common usages
of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST&lt;/code&gt; in a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CROSS JOIN&lt;/code&gt; should actually be using this form.&lt;/p&gt;

    &lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;LEFT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UNNEST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;u&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IGNORE NULLS&lt;/code&gt; clause for window functions. This is useful when combined with 
functions such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lead&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lag&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;first_value&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;last_value&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nth_value&lt;/code&gt; if the dataset contains nulls.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt; expansion using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.*&lt;/code&gt; operator.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/sql/create-schema.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CREATE SCHEMA&lt;/code&gt;&lt;/a&gt; syntax and support 
in various connectors (Hive, Iceberg, MySQL, PostgreSQL, Redshift, SQL Server, Phoenix).&lt;/li&gt;
  &lt;li&gt;Support for correlated subqueries containing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt;+&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Subscript operator to access &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt; type fields by index. This greatly improves usability 
and readability of queries when dealing with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ROW&lt;/code&gt; types containing anonymous fields.&lt;/p&gt;

    &lt;p&gt;&lt;img src=&quot;/assets/blog/2019-review/row-ordinal.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
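
&lt;p&gt;To illustrate the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IGNORE NULLS&lt;/code&gt; clause mentioned above, the following sketch carries the last non-null value forward over a time series (the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;readings&lt;/code&gt; table and its columns are hypothetical):&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT ts,
       last_value(reading) IGNORE NULLS OVER (ORDER BY ts) AS last_known_reading
FROM readings
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;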

&lt;h2 id=&quot;query-engine&quot;&gt;Query Engine&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Generalize conditional, lazy loading and processing (a.k.a. Late Materialization) beyond 
Table Scan, Filter and Projection to support the Join, Window, TopN and SemiJoin operators. This can dramatically 
reduce latency, CPU and I/O for highly selective queries. This is one of the most important performance 
optimizations in recent times, and we will be blogging about it more in the coming weeks.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2019/05/21/optimizing-the-casts-away.html&quot;&gt;Unwrap cast/predicate pushdown&lt;/a&gt; optimizations.&lt;/li&gt;
  &lt;li&gt;Connector pushdown during planning for operations such as limit, table sample, or projections. This allows 
connectors to optimize how data is accessed before it’s provided to the Presto engine for further processing.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2019/06/30/dynamic-filtering.html&quot;&gt;Dynamic filtering&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Cost-Based Optimizer can now consider &lt;a href=&quot;https://github.com/trinodb/trino/pull/247&quot;&gt;estimated query peak memory&lt;/a&gt; 
footprint. This is especially useful for optimizing bigger queries, where not all parts of the query can 
be run concurrently.&lt;/li&gt;
  &lt;li&gt;Improved handling of &lt;a href=&quot;https://github.com/trinodb/trino/pull/1431&quot;&gt;projections&lt;/a&gt;, 
&lt;a href=&quot;https://github.com/trinodb/trino/pull/864&quot;&gt;aggregations&lt;/a&gt; and &lt;a href=&quot;https://github.com/trinodb/trino/pull/1359&quot;&gt;cross joins&lt;/a&gt; 
in the cost-based optimizer.&lt;/li&gt;
  &lt;li&gt;Improved accounting and reporting of physical and network data read or transmitted during query processing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;performance&quot;&gt;Performance&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2019/08/23/unnest-operator-performance-enhancements.html&quot;&gt;10x performance improvement for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;2-7x improvement in performance of &lt;a href=&quot;/blog/2019/04/23/even-faster-orc.html&quot;&gt;ORC decoders&lt;/a&gt;, resulting in a 
10% global CPU improvement for the TPC-DS benchmark.&lt;/li&gt;
  &lt;li&gt;Improvements when reading small Parquet files, files with a large number of columns, or files with small row
groups. We found this very useful, for example, when working with data exported from Snowflake.&lt;/li&gt;
  &lt;li&gt;Support for new ORC bloom filters.&lt;/li&gt;
  &lt;li&gt;Remove &lt;a href=&quot;/blog/2019/06/03/redundant-order-by.html&quot;&gt;redundant &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt;&lt;/a&gt; clauses.&lt;/li&gt;
  &lt;li&gt;Improvements for &lt;a href=&quot;/blog/2019/06/03/redundant-order-by.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NOT-IN&lt;/code&gt;&lt;/a&gt; with subquery expressions (i.e., semijoin).&lt;/li&gt;
  &lt;li&gt;Huge performance improvements when &lt;a href=&quot;https://github.com/trinodb/trino/pull/1329&quot;&gt;reading from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;information_schema&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Reduce query latency and Hive metastore load for both &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; queries.&lt;/li&gt;
  &lt;li&gt;Improve metadata handling during planning. This can result in dramatic improvements in latency, 
especially for connectors such as MySQL, PostgreSQL, Redshift, SQL Server, etc. Some queries like 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW SCHEMAS&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW TABLES&lt;/code&gt; that could take several minutes to complete now finish in a few seconds.&lt;/li&gt;
  &lt;li&gt;Improved stability, performance, and security when spilling is enabled.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;functions&quot;&gt;Functions&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/functions/array.html#combinations&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;combinations&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/functions/conversion.html#format&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;format&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/functions/uuid.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UUID&lt;/code&gt; type&lt;/a&gt; and related functions.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/functions/array.html#all_match&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;all_match&lt;/code&gt;&lt;/a&gt;,
&lt;a href=&quot;/docs/current/functions/array.html#any_match&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;any_match&lt;/code&gt;&lt;/a&gt; and 
&lt;a href=&quot;/docs/current/functions/array.html#none_match&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;none_match&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Support flexible aggregation with lambda expressions using
  &lt;a href=&quot;/docs/current/functions/aggregate.html#reduce_agg&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;reduce_agg&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;New date and time functions: &lt;a href=&quot;/docs/current/functions/datetime.html#last_day_of_month&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;last_day_of_month&lt;/code&gt;&lt;/a&gt;,
&lt;a href=&quot;/docs/current/functions/datetime.html#at_timezone&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;at_timezone&lt;/code&gt;&lt;/a&gt; and 
&lt;a href=&quot;/docs/current/functions/datetime.html#with_timezone&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;with_timezone&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
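
&lt;p&gt;As a small sketch of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;reduce_agg&lt;/code&gt;, the following query computes a product aggregation, which has no dedicated built-in function:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT reduce_agg(value, 1, (a, b) -&amp;gt; a * b, (a, b) -&amp;gt; a * b) AS product
FROM (VALUES 1, 2, 3, 4) AS t(value)
-- product =&amp;gt; 24
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;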

&lt;h2 id=&quot;security&quot;&gt;Security&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/sql/create-role.html&quot;&gt;Role-based access control&lt;/a&gt; and related commands.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/sql/create-view.html#security&quot;&gt;INVOKER security mode&lt;/a&gt; for views, which allows views to be run using the permissions of the 
current user.&lt;/li&gt;
  &lt;li&gt;Prevent replay attacks and result hijacking in client APIs.&lt;/li&gt;
  &lt;li&gt;JWT-based &lt;a href=&quot;/docs/current/security/internal-communication.html#internal-authentication&quot;&gt;internal communication&lt;/a&gt; authentication,
which eliminates the need for Kerberos or certificates and greatly simplifies secure setups.&lt;/li&gt;
  &lt;li&gt;Credential passthrough, which allows Presto to authenticate with the underlying data source using 
credentials provided by the user running a query. This is especially useful when dealing with
Google Storage in GCP or SQL databases that manage user authentication and authorization on 
their own.&lt;/li&gt;
  &lt;li&gt;Impersonation for &lt;a href=&quot;/docs/current/connector/hive.html#hive-thrift-metastore-configuration-properties&quot;&gt;Hive metastore&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Support for reading and writing encrypted files in HDFS using Hadoop KMS.&lt;/li&gt;
  &lt;li&gt;Support for &lt;a href=&quot;https://trino.io/docs/current/admin/spill.html#spill-encryption&quot;&gt;encrypting spilled data&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
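
&lt;p&gt;Role-based access control is managed with standard SQL statements. A minimal sketch, with hypothetical role, table, and user names:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CREATE ROLE reporting;
GRANT SELECT ON orders TO ROLE reporting;
GRANT reporting TO USER alice;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;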

&lt;h2 id=&quot;geospatial&quot;&gt;Geospatial&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;New geospatial functions: 
&lt;a href=&quot;/docs/current/functions/geospatial.html#ST_Points&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ST_Points&lt;/code&gt;&lt;/a&gt;, 
&lt;a href=&quot;/docs/current/functions/geospatial.html#ST_Length&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ST_Length&lt;/code&gt;&lt;/a&gt;, 
&lt;a href=&quot;/docs/current/functions/geospatial.html#ST_Area&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ST_Area&lt;/code&gt;&lt;/a&gt;, 
&lt;a href=&quot;/docs/current/functions/geospatial.html#line_interpolate_point&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;line_interpolate_point&lt;/code&gt;&lt;/a&gt; and 
&lt;a href=&quot;/docs/current/functions/geospatial.html#line_interpolate_points&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;line_interpolate_points&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SphericalGeography&lt;/code&gt; type and &lt;a href=&quot;/docs/current/functions/geospatial.html#to_spherical_geography&quot;&gt;related functions&lt;/a&gt; 
to support spatial features in geographic coordinates (latitude / longitude) using a spherical model of the earth.&lt;/li&gt;
  &lt;li&gt;Support for Google Maps Polyline format via &lt;a href=&quot;/docs/current/functions/geospatial.html#to_encoded_polyline&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;to_encoded_polyline&lt;/code&gt;&lt;/a&gt;
and &lt;a href=&quot;/docs/current/functions/geospatial.html#from_encoded_polyline&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;from_encoded_polyline&lt;/code&gt;&lt;/a&gt; functions.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/functions/geospatial.html#geometry_from_hadoop_shape&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;geometry_from_hadoop_shape&lt;/code&gt;&lt;/a&gt; to decode geometry objects in 
the Spatial Framework for Hadoop representation.&lt;/li&gt;
&lt;/ul&gt;
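
&lt;p&gt;For example, combining the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SphericalGeography&lt;/code&gt; type with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ST_Distance&lt;/code&gt; yields great-circle distances in meters (the coordinates here are illustrative):&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT ST_Distance(
    to_spherical_geography(ST_Point(-71.0882, 42.3607)),
    to_spherical_geography(ST_Point(-74.1197, 40.6976)))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;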

&lt;h2 id=&quot;cloud-integration&quot;&gt;Cloud Integration&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Support for Azure Data Lake Blob and ADLS Gen2 storage.&lt;/li&gt;
  &lt;li&gt;Support for &lt;a href=&quot;/docs/current/connector/hive-gcs-tutorial.html&quot;&gt;Google Cloud Storage&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Several &lt;a href=&quot;/blog/2019/05/06/faster-s3-reads.html&quot;&gt;performance improvements&lt;/a&gt; for AWS S3.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;cli-and-jdbc-driver&quot;&gt;CLI and JDBC Driver&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;JSON output format and improvements to CSV output format.&lt;/li&gt;
  &lt;li&gt;Support and stability improvements for running the CLI and JDBC driver with Java 11.&lt;/li&gt;
  &lt;li&gt;Improve compatibility of JDBC driver with third-party tools.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Syntax highlighting and multi-line editing.&lt;/p&gt;

    &lt;p&gt;&lt;img src=&quot;/assets/blog/2019-review/presto-cli.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;new-connectors&quot;&gt;New Connectors&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/elasticsearch.html&quot;&gt;Elasticsearch&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/googlesheets.html&quot;&gt;Google Sheets&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/kinesis.html&quot;&gt;Amazon Kinesis&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/blog/2019/06/04/phoenix-connector.html&quot;&gt;Apache Phoenix&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/docs/current/connector/memsql.html&quot;&gt;MemSQL&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Apache Iceberg (preview version still under development)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;other-improvements&quot;&gt;Other Improvements&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://hub.docker.com/r/prestosql/presto&quot;&gt;Presto Docker image&lt;/a&gt; that provides an out-of-the-box single node 
cluster with the JMX, memory, TPC-DS, and TPC-H catalogs. It can be deployed as a full cluster by 
mounting in configuration and can be used for Kubernetes deployments.&lt;/li&gt;
  &lt;li&gt;Support for LZ4 and Zstd compression in Parquet and ORC. LZ4 is currently the recommended algorithm for fast, lightweight
compression, while Zstd is recommended otherwise.&lt;/li&gt;
  &lt;li&gt;Support for insert-only Hive transactional tables and Hive bucketing v2 as part of 
&lt;a href=&quot;/blog/2019/12/28/hive-3.html&quot;&gt;making Presto compatible with Hive 3&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Improvements in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ANALYZE&lt;/code&gt; statement for Hive connector.&lt;/li&gt;
  &lt;li&gt;Support for &lt;a href=&quot;/blog/2019/05/29/improved-hive-bucketing.html&quot;&gt;multiple files per bucket&lt;/a&gt; 
for Hive tables. This allows inserting data into bucketed tables without having to rewrite entire partitions
and improves Presto compatibility with Hive and other tools.&lt;/li&gt;
  &lt;li&gt;Support for upper- and mixed-case table and column names in JDBC-based connectors.&lt;/li&gt;
  &lt;li&gt;New features and improvements in type mappings in PostgreSQL, MySQL, SQL Server and Redshift
connectors. This includes support for PostgreSQL arrays and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;timestamp with time zone&lt;/code&gt; type, and 
the ability to read columns of unsupported types.&lt;/li&gt;
  &lt;li&gt;Improvements in &lt;a href=&quot;https://github.com/trinodb/trino/pull/833&quot;&gt;Hive compatibility with Hive version 2.3&lt;/a&gt; 
and &lt;a href=&quot;https://github.com/trinodb/trino/pull/1937&quot;&gt;with Cloudera (CDH)’s Hive&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Connector provided view definitions, which allow connectors to generate the definition dynamically at query time. 
For example, the connector can provide a union of two tables filtered on a disjoint time range, with the cutoff 
time determined at resolution time.&lt;/li&gt;
  &lt;li&gt;Lots and lots of bug fixes!&lt;/li&gt;
&lt;/ul&gt;
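
&lt;p&gt;For instance, collecting table statistics with the Hive connector is a single statement (the catalog, schema, and table names are illustrative):&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ANALYZE hive.default.orders;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;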

&lt;h1 id=&quot;coming-up&quot;&gt;Coming Up…&lt;/h1&gt;

&lt;p&gt;These are some of the projects that are currently in progress and are likely to land in the short term.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Support for pushing down row dereference expressions into connectors. This will help reduce 
the amount of data and CPU needed to process highly nested columnar formats such as ORC and Parquet.&lt;/li&gt;
  &lt;li&gt;Extend dynamic filtering to support distributed joins and other operators. Use dynamic filters for 
pruning partitions at runtime when querying Hive.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/2418&quot;&gt;Extended Late Materialization&lt;/a&gt; support for queries involving 
complex correlated subqueries.&lt;/li&gt;
  &lt;li&gt;Finalize &lt;a href=&quot;/blog/2019/12/28/hive-3.html&quot;&gt;Hive 3 support&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Improved &lt;a href=&quot;https://github.com/trinodb/trino/pull/2358&quot;&gt;INSERT into partitioned tables&lt;/a&gt;, which will help with 
large ETL queries.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/issues/1324&quot;&gt;Improvements and features&lt;/a&gt; in Iceberg connector.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/2028&quot;&gt;Pinot&lt;/a&gt; connector.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/1959&quot;&gt;Oracle&lt;/a&gt; connector.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/2397&quot;&gt;Influx&lt;/a&gt; connector.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/2321&quot;&gt;Prometheus&lt;/a&gt; connector.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trinodb.slack.com/archives/CQT2JH4KG/p1576038838027500&quot;&gt;Salesforce&lt;/a&gt; connector.&lt;/li&gt;
  &lt;li&gt;Support for &lt;a href=&quot;https://github.com/trinodb/trino/pull/2106&quot;&gt;Confluent registry in Kafka connector&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Revamp of the function registry and function resolution to support dynamically-resolved 
functions and SQL-defined functions.&lt;/li&gt;
  &lt;li&gt;A new &lt;a href=&quot;https://github.com/trinodb/trino/pull/2004&quot;&gt;Parquet writer&lt;/a&gt; optimized to work efficiently 
within Presto.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;… and many, many more.&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso</name>
        </author>
      

      <summary>What a great year for the Presto community! We started the year with the launch of the Presto Software Foundation, with the long term goal of ensuring the project remains collaborative, open and independent from any corporate interest, for years to come. Since then, the community around Presto has grown and consolidated. We’ve seen contributions from more than 120 people across over 20 companies. Every week, 280 users and developers interact in the project’s Slack channel. We’d like to take the opportunity to thank everyone that contributed to the project in one way or another. Presto wouldn’t be what it is without your help. With the collaboration of companies such as Starburst, Qubole, Varada, Twitter, ARM Treasure Data, Wix, Red Hat, and the Big Things community, we ran several Presto summits across the world: Tel Aviv, Israel (April 2019); San Francisco, USA (June 2019); Tokyo, Japan (July 2019); Bangalore, India (September 2019); and New York, USA (December 2019). All these events were a huge success and brought thousands of Presto users, contributors and other community members together to share their knowledge and experiences. The project has been more active than ever. We completed 28 releases comprising more than 2850 commits in over 1500 pull requests. Of course, that alone is not a good measure of progress, so let’s take a closer look at everything that went in. And there is a lot to look at!</summary>

      
      
    </entry>
  
    <entry>
      <title>Hive 3 support in Presto</title>
      <link href="https://trino.io/blog/2019/12/28/hive-3.html" rel="alternate" type="text/html" title="Hive 3 support in Presto" />
      <published>2019-12-28T00:00:00+00:00</published>
      <updated>2019-12-28T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/12/28/hive-3</id>
      <content type="html" xml:base="https://trino.io/blog/2019/12/28/hive-3.html">&lt;p&gt;The Hive community is centered around a few different Hive distributions, one of them
being Hortonworks Data Platform (HDP). Even after the Cloudera-Hortonworks merger, there
is keen interest in HDP 3, featuring Hive 3. Presto is ready for the game.&lt;/p&gt;

&lt;p&gt;In this post, we summarize which Hive 3 features Presto already supports, covering
all the work that went into Presto to achieve that. We also outline the next steps
ahead.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;introduction&quot;&gt;Introduction&lt;/h1&gt;

&lt;p&gt;There are several Hive versions in active use by the Hive community: 0.x, 1.x, 2.x
and 3.x. The Hive 3 major release brings a number of interesting features, including:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;support for Hadoop Erasure Coding (EC), allowing &lt;a href=&quot;https://blog.cloudera.com/introduction-to-hdfs-erasure-coding-in-apache-hadoop/&quot;&gt;much better HDFS storage capacity
utilization&lt;/a&gt;
without reducing data availability,&lt;/li&gt;
  &lt;li&gt;update to ORC ACID transactional tables - they no longer need to be bucketed,&lt;/li&gt;
  &lt;li&gt;transactional tables for all file formats (“insert-only” except for ORC),&lt;/li&gt;
  &lt;li&gt;materialized views,&lt;/li&gt;
  &lt;li&gt;new bucketing function, offering a better data distribution and less data skew,&lt;/li&gt;
  &lt;li&gt;new timestamp semantics and timestamp-related changes in file formats,&lt;/li&gt;
  &lt;li&gt;and a lot more (let’s skip over features and changes that are not interesting from
the Presto perspective).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s no surprise that many people want to try out all these features and run Hive 3,
either the Apache project’s official release or HDP version 3.&lt;/p&gt;

&lt;h1 id=&quot;hive-3-in-presto&quot;&gt;Hive 3 in Presto&lt;/h1&gt;

&lt;p&gt;The Presto community expressed interest in using Presto with Hive 3, both in the project’s
&lt;a href=&quot;https://github.com/trinodb/trino/issues/576&quot;&gt;issues&lt;/a&gt; and on &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You spoke, we listened. Actually – we, the community, spoke &lt;em&gt;and&lt;/em&gt; listened.&lt;/p&gt;

&lt;p&gt;Through collaboration between Starburst, Qubole and the wider Presto community, Presto has gradually
improved its compatibility with Hive 3:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Presto 319 &lt;a href=&quot;https://github.com/trinodb/trino/pull/1532&quot;&gt;fixed issues with backwards-incompatible changes in Hive metastore thrift API&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Presto 320 &lt;a href=&quot;https://github.com/trinodb/trino/pull/1614&quot;&gt;added continuous integration with Hive 3&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Presto 321 &lt;a href=&quot;https://github.com/trinodb/trino/pull/1697&quot;&gt;added support for Hive bucketing v2&lt;/a&gt;
(&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;bucketing_version&quot;=&quot;2&quot;&lt;/code&gt;)&lt;/li&gt;
  &lt;li&gt;Presto 325 &lt;a href=&quot;https://github.com/trinodb/trino/pull/1958&quot;&gt;added continuous integration with HDP 3’s Hive 3&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Presto 327 &lt;a href=&quot;https://github.com/trinodb/trino/pull/1034&quot;&gt;added support for reading from insert-only transactional tables&lt;/a&gt;, and &lt;a href=&quot;https://github.com/trinodb/trino/pull/2099&quot;&gt;added compatibility with timestamp
values stored in ORC by Hive 3.1&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upcoming improvements already being worked on include:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/2068&quot;&gt;Read support for ORC ACID tables&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/trinodb/trino/pull/1591&quot;&gt;Read support for bucketed ORC ACID tables&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;try-it-out&quot;&gt;Try it out&lt;/h1&gt;

&lt;p&gt;The &lt;a href=&quot;https://twitter.com/findepi/status/1204783485094944768&quot;&gt;amazing Presto community&lt;/a&gt; is working hard on
getting Hive 3 support fully integrated into the Presto project, and a lot is already accomplished.
Chances are that all you need is already included in the latest release. If you need one of the upcoming
improvements, watch the pull requests linked above and the &lt;a href=&quot;https://github.com/trinodb/trino/issues/1218&quot;&gt;roadmap issue&lt;/a&gt;,
join &lt;a href=&quot;/slack.html&quot;&gt;Slack&lt;/a&gt; and stay tuned for upcoming release announcements. In the meantime, you
can try out the features today by running the &lt;a href=&quot;https://docs.starburstdata.com/latest/release/release-323-e.html&quot;&gt;323-e release&lt;/a&gt; of Starburst Presto.&lt;/p&gt;

&lt;p&gt;□&lt;/p&gt;</content>

      
        <author>
          <name>Piotr Findeisen, Starburst Data</name>
        </author>
      

      <summary>The Hive community is centered around a few different Hive distributions, one of them being Hortonworks Data Platform (HDP). Even after the Cloudera-Hortonworks merger there is vivid interest in HDP 3, featuring Hive 3. Presto is ready for the game. In this post, we summarize which Hive 3 features Presto already supports, covering all the work that went into Presto to achieve that. We also outline next steps lying ahead.</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto Experiment with Graviton Processor</title>
      <link href="https://trino.io/blog/2019/12/23/Presto-Experiment-with-Graivton-Processor.html" rel="alternate" type="text/html" title="Presto Experiment with Graviton Processor" />
      <published>2019-12-23T00:00:00+00:00</published>
      <updated>2019-12-23T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/12/23/Presto-Experiment-with-Graivton-Processor</id>
      <content type="html" xml:base="https://trino.io/blog/2019/12/23/Presto-Experiment-with-Graivton-Processor.html">&lt;p&gt;This December, AWS announced new instance types powered by the &lt;a href=&quot;https://aws.amazon.com/about-aws/whats-new/2019/12/announcing-new-amazon-ec2-m6g-c6g-and-r6g-instances-powered-by-next-generation-arm-based-aws-graviton2-processors/&quot;&gt;Arm-based AWS Graviton2 processor&lt;/a&gt;. M6g, C6g, and R6g instances are designed to deliver up to 40% better price/performance compared with the current generation of instance types. Presto is just a Java application, so we should be able to run our workloads on these cost-effective instance types without any modification.&lt;/p&gt;

&lt;p&gt;But is that true? Initially, we did not have a clear answer about how much effort is needed to bring Presto to the world of a different processor. Not having to care about the underlying platform is generally beneficial for development. But if using a different processor enables us to improve the performance and stability of Presto, we must care about it. Anything unclear must be proven by experiment.&lt;/p&gt;

&lt;p&gt;This article reports what we need to do to run Presto on the Arm platform, and how much benefit we can potentially obtain from the Graviton processor.&lt;/p&gt;

&lt;p&gt;As the Graviton2-based instance types are still in preview, we tried running Presto on an A1 instance, which contains the first-generation Graviton processor. It should still be a helpful anchor for understanding the potential benefit of the Graviton2 processor.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;how-to-make-presto-compatible-with-arm&quot;&gt;How to make Presto compatible with Arm&lt;/h1&gt;

&lt;p&gt;First, we build a Presto binary that supports the Arm platform. As it turns out, there is not much to do: as long as the JVM supports the Arm platform, Presto should work without any modification to the application code. However, Presto places some restrictions on the platform where it runs in order to protect its functionality, including plugins. For example, the latest Presto supports only the &lt;a href=&quot;https://github.com/trinodb/trino/blob/ee05ee5221690d66598039c6e397f7c7cb4c202b/presto-main/src/main/java/io/prestosql/server/PrestoSystemRequirements.java#L69&quot;&gt;x86 and PowerPC architectures&lt;/a&gt;. This limitation prevents us from using Presto on the Arm platform.&lt;/p&gt;

&lt;p&gt;To make Presto runnable on an Arm machine, we need to modify the &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/server/PrestoSystemRequirements.java&quot;&gt;PrestoSystemRequirements&lt;/a&gt; class to allow the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aarch64&lt;/code&gt; architecture. For experimental purposes, we can apply a patch like the following to remove the restriction altogether.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;diff --git a/presto-main/src/main/java/io/prestosql/server/PrestoSystemRequirements.java b/presto-main/src/main/java/io/prestosql/server/PrestoSystemRequirements.java
index 07b7d12c64..b6a1249681 100644
--- a/presto-main/src/main/java/io/prestosql/server/PrestoSystemRequirements.java
+++ b/presto-main/src/main/java/io/prestosql/server/PrestoSystemRequirements.java
@@ -71,9 +71,9 @@ final class PrestoSystemRequirements
         String osName = StandardSystemProperty.OS_NAME.value();
         String osArch = StandardSystemProperty.OS_ARCH.value();
         if (&quot;Linux&quot;.equals(osName)) {
-            if (!&quot;amd64&quot;.equals(osArch) &amp;amp;&amp;amp; !&quot;ppc64le&quot;.equals(osArch)) {
-                failRequirement(&quot;Presto requires amd64 or ppc64le on Linux (found %s)&quot;, osArch);
-            }
             if (&quot;ppc64le&quot;.equals(osArch)) {
                 warnRequirement(&quot;Support for the POWER architecture is experimental&quot;);
             }
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This patch is all we have to do to run Presto on the Arm platform. It should work for most cases, except for use with the &lt;a href=&quot;https://trino.io/docs/current/connector/hive.html&quot;&gt;Hive connector&lt;/a&gt;, because it includes native code not yet available for the Arm platform.&lt;/p&gt;

&lt;h1 id=&quot;prepare-docker-images&quot;&gt;Prepare Docker Images&lt;/h1&gt;

&lt;p&gt;Docker containers are a desirable option for running Presto experimentally due to their availability and ease of use. But there is one thing to do to build a Docker image that supports multiple platforms.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://docs.docker.com/buildx/working-with-buildx/&quot;&gt;Docker buildx&lt;/a&gt; is an experimental feature providing full support for the &lt;a href=&quot;https://github.com/moby/buildkit&quot;&gt;Moby BuildKit toolkit&lt;/a&gt;. It enables us to build a Docker image supporting multiple platforms, including Arm, with a one-line command. However, the feature is not generally available in a typical Docker installation; on macOS, it is necessary to enable the experimental flag as follows.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/presto-experiment-with-graviton-processor/docker-daemon.png&quot; alt=&quot;Docker Daemon Experimental Feature&quot; /&gt;&lt;/p&gt;

&lt;p&gt;And make sure to restart the Docker daemon. We can build the Docker image for Presto supporting the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aarch64&lt;/code&gt; architecture with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;buildx&lt;/code&gt; command. We used the source code of &lt;a href=&quot;https://github.com/trinodb/trino/commit/b0c07249de5c70a70b3037875df4fd0477dec9fc&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;317-SNAPSHOT&lt;/code&gt;&lt;/a&gt; with the earlier patch to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PrestoSystemRequirements&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ docker buildx build \
 --build-arg VERSION=317-SNAPSHOT \
 --platform linux/arm64 \
 -f presto-base/Dockerfile-aarch64 \
 -t lewuathe/presto-base:317-SNAPSHOT-aarch64 \
 presto-base --push

$ docker buildx build \
 --build-arg VERSION=317-SNAPSHOT-aarch64 \
 --platform linux/arm64 \
 -t lewuathe/presto-coordinator:317-SNAPSHOT-aarch64 \
 presto-coordinator --push

$ docker buildx build \
 --build-arg VERSION=317-SNAPSHOT-aarch64 \
 --platform linux/arm64 \
 -t lewuathe/presto-worker:317-SNAPSHOT-aarch64 \
 presto-worker --push
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We should be able to specify multiple platform names for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--platform&lt;/code&gt; option. But unfortunately, the Docker image of OpenJDK for Arm is distributed under &lt;a href=&quot;https://hub.docker.com/r/arm64v8/openjdk/&quot;&gt;a separate organization&lt;/a&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;arm64v8/openjdk&lt;/code&gt;. Building an image supporting Arm therefore requires another &lt;a href=&quot;https://github.com/Lewuathe/docker-presto-cluster/blob/master/presto-base/Dockerfile-aarch64&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Dockerfile&lt;/code&gt;&lt;/a&gt;. Either way, Docker images containing Presto with Arm support are now available.&lt;/p&gt;

&lt;h1 id=&quot;setup-a1-instance&quot;&gt;Setup A1 Instance&lt;/h1&gt;

&lt;p&gt;The following setup prepares the environment to run docker-compose on the A1 instance. &lt;a href=&quot;https://github.com/docker/compose/issues/5342&quot;&gt;As no docker-compose binary for Arm&lt;/a&gt; is distributed officially, we need to build and install docker-compose with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pip&lt;/code&gt;. Make sure to run these commands after the instance initialization completes.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Install Docker
$ sudo yum update -y
$ sudo amazon-linux-extras install docker -y
$ sudo service docker start
$ sudo usermod -a -G docker ec2-user

# Install docker-compose
$ sudo yum install python2-pip gcc libffi-devel openssl-devel -y
$ sudo pip install -U docker-compose
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;performance-comparison&quot;&gt;Performance Comparison&lt;/h1&gt;

&lt;p&gt;Let’s briefly take a look at the performance provided by the Graviton processor. We are going to use &lt;a href=&quot;https://aws.amazon.com/ec2/instance-types/a1/&quot;&gt;a1.4xlarge&lt;/a&gt; as the benchmark instance for the Graviton processor.&lt;/p&gt;

&lt;p&gt;Here is our specification of the benchmark conditions.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;We use the commit &lt;a href=&quot;https://github.com/trinodb/trino/commit/b0c07249de5c70a70b3037875df4fd0477dec9fc&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;b0c07249de5c70a70b3037875df4fd0477dec9fc&lt;/code&gt;&lt;/a&gt; + the patch previously described.&lt;/li&gt;
  &lt;li&gt;1 coordinator + 2 worker processes run by &lt;a href=&quot;https://docs.docker.com/compose/&quot;&gt;docker-compose&lt;/a&gt; on a single instance.&lt;/li&gt;
  &lt;li&gt;We use a1.4xlarge and c5.4xlarge, which has the same CPU core count and memory as a1.4xlarge. We also compared with m5.2xlarge, whose on-demand cost is close to that of a1.4xlarge.&lt;/li&gt;
  &lt;li&gt;We use &lt;a href=&quot;https://github.com/trinodb/trino/tree/master/presto-benchto-benchmarks/src/main/resources/sql/presto/tpch&quot;&gt;q01, q10, q18, and q20&lt;/a&gt; run on the TPCH connector. Since the Presto TPCH connector does not access external storage, we can measure pure CPU performance without worrying about network variance.&lt;/li&gt;
  &lt;li&gt;We choose &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sf1&lt;/code&gt; as the scaling factors of the TPCH connector.&lt;/li&gt;
  &lt;li&gt;Our experiment measures the average runtime over 5 runs, after 5 warmup runs, for every query.&lt;/li&gt;
&lt;/ul&gt;
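&lt;p&gt;As a sketch, the warmup-and-average scheme above can be expressed as follows (hypothetical Java; the actual experiment submitted the TPCH queries to a running Presto cluster, and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Query&lt;/code&gt; interface here is illustrative only):&lt;/p&gt;

```java
public class BenchmarkHarness {
    // Illustrative stand-in for submitting one query to the cluster.
    interface Query {
        void run();
    }

    // Run the query `warmups` times to warm up the JIT and caches, then
    // average the wall-clock time of the next `runs` executions in milliseconds.
    static double averageMillis(Query query, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            query.run();
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            query.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / runs;
    }

    public static void main(String[] args) {
        // 5 warmup runs + average of 5 timed runs, as in the experiment.
        double avg = averageMillis(() -> { /* submit TPCH q01 here */ }, 5, 5);
        System.out.println("average runtime: " + avg + " ms");
    }
}
```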

&lt;h4 id=&quot;openjdk-8&quot;&gt;OpenJDK 8&lt;/h4&gt;
&lt;p&gt;Here is the result of our experiment. The vertical axis represents the running time in milliseconds.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/presto-experiment-with-graviton-processor/openjdk8-performance.png&quot; alt=&quot;OpenJDK 8 Performance&quot; /&gt;&lt;/p&gt;

&lt;p&gt;It shows that c5.4xlarge consistently achieves the best performance in every case. Between a1.4xlarge and m5.2xlarge, the winner switches depending on the query type; the two instances are probably competitive with each other.&lt;/p&gt;

&lt;p&gt;Although we use OpenJDK 8 for this case, it might not be able to generate code fully optimized for the Arm architecture. In general, later versions, such as &lt;a href=&quot;https://medium.com/@carlosedp/java-benchmarks-on-arm64-17edd8b9ff79&quot;&gt;OpenJDK 9 or 11, give us better performance&lt;/a&gt;.&lt;/p&gt;

&lt;h4 id=&quot;openjdk-11&quot;&gt;OpenJDK 11&lt;/h4&gt;
&lt;p&gt;Let’s try to run Presto with OpenJDK 11. There is one thing to do first. Since JDK 9, the &lt;a href=&quot;https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8180425&quot;&gt;Attach API&lt;/a&gt; has been disabled by default. We found that we needed to allow usage of the Attach API by adding the following option to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jvm.config&lt;/code&gt; file; otherwise, we see an error message at the bootstrap phase.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-Djdk.attach.allowAttachSelf=true
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here is the performance comparison with OpenJDK 11.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/presto-experiment-with-graviton-processor/openjdk11-performance.png&quot; alt=&quot;OpenJDK 11 Performance&quot; /&gt;&lt;/p&gt;

&lt;p&gt;a1.4xlarge and c5.4xlarge achieve even higher performance than with OpenJDK 8 in every case. In contrast, m5.2xlarge shows slower results in some cases.
While this result still demonstrates that c5.4xlarge is the best instance in terms of performance, the performance gaps between instances are smaller than in the OpenJDK 8 cases. In particular, a1.4xlarge shows relatively competitive performance with the smaller dataset (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiny&lt;/code&gt;). How does the scaling factor influence performance? Let’s see.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/presto-experiment-with-graviton-processor/sf-comparison.png&quot; alt=&quot;Scaling Factor Comparison&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The above chart shows how performance is affected by the scaling factor. c5.4xlarge demonstrates the most stable running time, regardless of the scaling factor. If we want performance that is as stable as possible, c5.4xlarge is a good option from this list. a1.4xlarge and m5.2xlarge show similar volatility against the scaling factor this time.&lt;/p&gt;

&lt;p&gt;Considering that the a1.4xlarge instance is 40% cheaper than c5.4xlarge, it may make sense to use a1.4xlarge for specific cases. The on-demand cost of &lt;a href=&quot;https://aws.amazon.com/ec2/pricing/on-demand/&quot;&gt;a1.4xlarge is $9.8/day, versus $16.3/day for c5.4xlarge&lt;/a&gt;. The public announcement says &lt;a href=&quot;https://aws.amazon.com/ec2/graviton/&quot;&gt;Graviton 2 delivers 7x the performance of the first Graviton processor&lt;/a&gt;, so we may expect even better performance from the new generation. We cannot wait for the general availability of Graviton 2.&lt;/p&gt;

&lt;h4 id=&quot;amazon-corretto&quot;&gt;Amazon Corretto&lt;/h4&gt;
&lt;p&gt;How about other JVM distributions? We found that Amazon Corretto also supports the Arm architecture and distributes &lt;a href=&quot;https://hub.docker.com/layers/amazoncorretto/library/amazoncorretto/11/images/sha256-8f06c4a09e6a0784d6da3fb580bd57c4881df3fc8f56de1f3c0fd66dde20e43c&quot;&gt;a Docker image built for Arm&lt;/a&gt;. Let’s try Amazon Corretto similarly.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/presto-experiment-with-graviton-processor/a1-instance-performance.png&quot; alt=&quot;A1 Performance&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This chart illustrates the performance results for different JDK implementations: OpenJDK 8, OpenJDK 11, and Amazon Corretto 11. Overall, OpenJDK 11 seems to be the best. Interestingly, though, Amazon Corretto achieves even better performance in some of the sf1 cases. It indicates that Presto with Amazon Corretto may provide better performance for some query types.&lt;/p&gt;

&lt;h1 id=&quot;wrap-up&quot;&gt;Wrap Up&lt;/h1&gt;

&lt;p&gt;As Presto is just a Java application, there is not much to do to support the Arm platform. Applying only one patch and one JVM option gives us a Presto binary supporting this new platform. It is always exciting to see a new technology used in a complicated distributed system such as Presto. The combination of cutting-edge technologies surely takes us on a journey to new horizons of technological innovation.&lt;/p&gt;

&lt;p&gt;Last but not least, we used docker-compose and the TPCH connector to quickly execute queries against the Presto cluster on the Arm platform. Note that the performance of a distributed system such as Presto depends on many kinds of factors. Please be sure to run your own benchmark carefully when you try a new instance type in your production environment.&lt;/p&gt;

&lt;p&gt;We have uploaded the Docker images used for this experiment publicly. Feel free to use them if you are interested in running Presto on the Arm platform.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Image for Armv8 using OpenJDK 11
$ docker pull lewuathe/presto-coordinator:327-SNAPSHOT-aarch64
$ docker pull lewuathe/presto-worker:327-SNAPSHOT-aarch64


# Image for Armv8 using Amazon Corretto 11
$ docker pull lewuathe/presto-coordinator:327-SNAPSHOT-corretto
$ docker pull lewuathe/presto-worker:327-SNAPSHOT-corretto
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I have also raised &lt;a href=&quot;https://github.com/trinodb/trino/issues/2262&quot;&gt;an issue&lt;/a&gt; to start a discussion about supporting the Arm architecture in the community. It would be great to get feedback from anyone who is interested.&lt;/p&gt;

&lt;p&gt;Thanks!&lt;/p&gt;</content>

      
        <author>
          <name>Kai Sasaki, Arm Treasure Data</name>
        </author>
      

      <summary>This December, AWS announced new instance types powered by Arm-based AWS Graviton2 Processor. M6g, C6g, and R6g are designed to deliver up to 40% improved price/performance compared with the current generation instance types. We can achieve cost-effectiveness by using these instance type series. Presto is just a Java application, so that we should be able to run the workload with this type of cost-effective instance type without any modification. But is it true? Initially, we do not have a clear answer to how much effort we need to bring Presto into the world of the different processors. No care about the underlying platform is generally beneficial for development. But if using different processors enables us to accelerate the performance and stability of Presto, we must care about it. We must prove anything unclear by the experiment. This article is the report to clarify what we need to do to run Presto on the Arm-based platform and see how much benefit we can potentially obtain with Graviton Processor. As the Graviton 2 based instance types are preview state, we tried to run Presto on A1 instance that has the first generation of Graviton processor inside. It still would be a helpful anchor to understand the potential benefit of the Graviton 2 processor.</summary>

      
      
    </entry>
  
    <entry>
      <title>First Presto Summit in India, Bangalore, September 2019</title>
      <link href="https://trino.io/blog/2019/09/05/Presto-Summit-Bangalore.html" rel="alternate" type="text/html" title="First Presto Summit in India, Bangalore, September 2019" />
      <published>2019-09-05T00:00:00+00:00</published>
      <updated>2019-09-05T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/09/05/Presto-Summit-Bangalore</id>
      <content type="html" xml:base="https://trino.io/blog/2019/09/05/Presto-Summit-Bangalore.html">&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/MyPost.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.qubole.com/developers/presto-on-qubole/&quot;&gt;Qubole&lt;/a&gt; organized the first ever Presto Summit in India on September 05, 2019. 
Bangalore, as the technology and startup hub of India, was the perfect venue for India’s first Presto Summit. Presto has seen a lot 
of interest and adoption in this region (South Asia and Asia Pacific), as was evident from the 
turnout at the last two Presto Meetups organized by Qubole over the past year. Courtyard By Marriott 
on Outer Ring Road (ORR) - a 17 KM stretch that hosts 10% of Bangalore’s working population (around 1 million people) - 
proved to be an ideal conference venue for Presto enthusiasts, several of whom work in its immediate vicinity.&lt;/p&gt;

&lt;p&gt;With 150 attendees from more than 75 companies, the Presto community in India was excited and 
eager to meet and interact with the Presto co-creators - &lt;a href=&quot;https://www.linkedin.com/in/traversomartin/&quot;&gt;Martin Traverso&lt;/a&gt;,
&lt;a href=&quot;https://www.linkedin.com/in/dainsundstrom/&quot;&gt;Dain Sundstrom&lt;/a&gt;, and
&lt;a href=&quot;https://www.linkedin.com/in/electrum/&quot;&gt;David Phillips&lt;/a&gt; - who flew down to Bangalore for the event.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;welcome-note-by-joydeep-sen-sarma&quot;&gt;Welcome Note by Joydeep Sen Sarma&lt;/h1&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/JE1A1895.JPG&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.linkedin.com/in/joydeeps/&quot;&gt;Joydeep Sen Sarma&lt;/a&gt;, co-creator of Hive and co-founder of Qubole, kicked off the event by welcoming 
the Presto co-creators, speakers, and all the attendees. He also provided a brief historical perspective 
on Qubole’s contributions to Presto and highlighted the importance of Presto to Qubole’s customer base.&lt;/p&gt;

&lt;h1 id=&quot;keynote-by-martin-dain-and-david&quot;&gt;Keynote by Martin, Dain and David&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://go.qubole.com/rs/510-QPZ-296/images/Presto%20Summit%20India%20-%201.%20Keynote%20by%20Martin%2C%20David%2C%20Dain.pdf&quot;&gt;Slides&lt;/a&gt;
&lt;a href=&quot;https://youtu.be/viBY8Fa3OjI&quot;&gt;Video&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/JE1A1911.JPG&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This was followed by the most awaited presentation of the day - 
the keynote from Martin, Dain, and David. Martin took the audience through Presto’s journey - from its birth at Facebook, 
through its growth and adoption there, to the present, with the formation of the Presto Software Foundation 
for wider community involvement. He also highlighted some of their design choices and some missteps along the way.&lt;/p&gt;

&lt;h1 id=&quot;presto-at-grab&quot;&gt;Presto at Grab&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://go.qubole.com/rs/510-QPZ-296/images/Presto%20Summit%20India%20-%202.%20Talk%20by%20Edwin%20Law%20Grab.pdf&quot;&gt;Slides&lt;/a&gt;
&lt;a href=&quot;https://youtu.be/0TR7Nzs8asc&quot;&gt;Video&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/grab-talk.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The first industry speaker of the day was &lt;a href=&quot;https://www.linkedin.com/in/edwinlawhh/&quot;&gt;Edwin Hui Hean Law&lt;/a&gt;, 
Data Engineering Lead at &lt;a href=&quot;https://www.grab.com/sg/&quot;&gt;Grab, Singapore&lt;/a&gt;. He and his team flew all the way 
from Singapore for the Presto Summit - a true testament to their passion for and interest in Presto. His talk 
covered Grab’s experience of using Presto on Amazon EMR, followed by their migration to Presto on Qubole, 
and provided his insights on the relative pros and cons of these platforms. The final part of his talk covered his 
team’s recent experimentation with Presto on Kubernetes.&lt;/p&gt;

&lt;h1 id=&quot;read-support-for-hive-acid-tables-in-presto&quot;&gt;Read Support for Hive ACID tables in Presto&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://go.qubole.com/rs/510-QPZ-296/images/Presto%20Summit%20India%20-%203.%20Talk%20by%20Shubham%20Tagra%20Qubole.pdf&quot;&gt;Slides&lt;/a&gt;
&lt;a href=&quot;https://youtu.be/Q2Nv18ohegA&quot;&gt;Video&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/JE1A2023.JPG&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Next, &lt;a href=&quot;https://www.linkedin.com/in/shubham-tagra-267a5838/&quot;&gt;Shubham Tagra&lt;/a&gt;, Sr. Staff at &lt;a href=&quot;https://www.qubole.com/developers/presto-on-qubole/&quot;&gt;Qubole&lt;/a&gt;, 
presented his work on providing read support for Hive ACID tables in Presto. This has become increasingly important with the arrival of 
data privacy regulations like GDPR and CCPA that grant users the “right to erasure” and/or “right to rectification”. 
Under these regulations, organisations storing user data are obligated to delete or update that data at the user’s request. 
Hive ACID is an open source solution that addresses these problems around deletes and updates. 
Shubham’s talk covered why he picked Hive ACID over other options available in open source, as well as 
the details of the Hive ACID and Presto integration that he added.&lt;/p&gt;

&lt;h1 id=&quot;presto-optimizations-at-zoho-corporation&quot;&gt;Presto Optimizations at Zoho Corporation&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://go.qubole.com/rs/510-QPZ-296/images/Presto%20Summit%20India%20-%204.%20Talk%20by%20Praveen%20Krishna%20Zoho.pdf&quot;&gt;Slides&lt;/a&gt;
&lt;a href=&quot;https://youtu.be/mffX12yZTaU&quot;&gt;Video&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/JE1A2072.JPG&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;After lunch, &lt;a href=&quot;https://www.linkedin.com/in/praveenkrishna2112/&quot;&gt;Praveen Krishna&lt;/a&gt; from &lt;a href=&quot;https://www.zohocorp.com/&quot;&gt;Zoho Corporation&lt;/a&gt; 
presented a summary of his team’s journey with Presto. In order to serve their teams with a fairly small cluster, 
they had to optimize Presto at various levels. Praveen’s team started by analyzing the various phases of query execution 
and their impact on performance. They optimized Presto’s planner and reduced the planning time by 
20-30% for queries involving multiple joins on wide tables. He also highlighted how they integrated 
Apache Lucene to speed up full-text search operations. After several iterations, his team came up with a model 
where they maintain the Lucene index for each row group in the ORC file itself. For columns with a high null ratio, 
replacing normal blocks with run-length-encoded blocks reduced memory consumption. With this logic implemented 
in the ORC reader and core Presto, they were able to reduce memory pressure in the cluster.&lt;/p&gt;

&lt;h1 id=&quot;presto-at-walmart-labs&quot;&gt;Presto at Walmart Labs&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://go.qubole.com/rs/510-QPZ-296/images/Presto%20Summit%20India%20-%205.%20Talk%20by%20Ashish%20Tadose%20Walmart%20Labs.pdf&quot;&gt;Slides&lt;/a&gt;
&lt;a href=&quot;https://youtu.be/wap7Hr7P8Bo&quot;&gt;Video&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/JE1A2092.JPG&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The second presentation in this session was from &lt;a href=&quot;https://www.linkedin.com/in/ashish-tadose-78773b22/&quot;&gt;Ashish Kumar Tadose&lt;/a&gt;, 
Principal Engineer at &lt;a href=&quot;https://www.walmartlabs.com/&quot;&gt;Walmart Labs&lt;/a&gt;. He gave an overview of how his team is 
using Presto on Google Cloud Platform (GCP). 
He highlighted the challenges associated with querying diverse data sources at Walmart and how his team has 
tackled these challenges using Presto. His talk also described how his team has implemented monitoring, auto scaling, 
caching (via Alluxio), and security policies (via Ranger).&lt;/p&gt;

&lt;h1 id=&quot;presto-at-inmobi&quot;&gt;Presto at InMobi&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://go.qubole.com/rs/510-QPZ-296/images/Presto%20Summit%20India%20-%206.%20Talk%20by%20Rohit%20Chatter%20InMobi.pdf&quot;&gt;Slides&lt;/a&gt;
&lt;a href=&quot;https://youtu.be/zEvqrAss7Iw&quot;&gt;Video&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/JE1A2222.JPG&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;After a coffee break, &lt;a href=&quot;https://www.linkedin.com/in/rohit-chatter-525b62/&quot;&gt;Rohit Chatter&lt;/a&gt;, CTO at &lt;a href=&quot;https://www.inmobi.com/&quot;&gt;InMobi&lt;/a&gt;, 
provided a historical perspective on how his team migrated from Hive in private data centers to Presto on the 
public cloud. His talk covered various aspects of how his team handles autoscaling and workload management in the cloud.&lt;/p&gt;

&lt;h1 id=&quot;presto-scheduler-changes-for-rubix&quot;&gt;Presto Scheduler Changes for Rubix&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://go.qubole.com/rs/510-QPZ-296/images/Presto%20Summit%20India%20-%207.%20Talk%20by%20Garvit%20Gupta%2C%20Microsoft%20and%20Ankit%20Dixit%2C%20Qubole.pdf&quot;&gt;Slides&lt;/a&gt;
&lt;a href=&quot;https://youtu.be/x8xIWuQnEFs&quot;&gt;Video&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/JE1A2258.JPG&quot; alt=&quot;&quot; /&gt;
&lt;img src=&quot;/assets/blog/Bangalore-2019/JE1A2248.JPG&quot; alt=&quot;&quot; /&gt;
Next, &lt;a href=&quot;https://www.linkedin.com/in/garvitg/&quot;&gt;Garvit Gupta&lt;/a&gt; from &lt;a href=&quot;http://www.microsoft.com&quot;&gt;Microsoft&lt;/a&gt; presented his work on 
Presto scheduler changes for data locality and optimized scheduling for caching engines like &lt;a href=&quot;https://www.qubole.com/rubix/&quot;&gt;RubiX&lt;/a&gt;. 
This work was done primarily as part of his internship at Qubole. The talk was co-presented 
by &lt;a href=&quot;https://www.linkedin.com/in/ankit-dixit-a725545b/&quot;&gt;Ankit Dixit&lt;/a&gt; from &lt;a href=&quot;https://www.qubole.com/developers/presto-on-qubole/&quot;&gt;Qubole&lt;/a&gt;, 
who first gave an overview of the RubiX caching engine and its architecture. Garvit highlighted the need to consider locality as another dimension 
when assigning splits to nodes, and how this led to the implementation of a new Presto scheduler. 
The new scheduling model prioritizes locality while ensuring a uniform distribution of workload across nodes, and 
improves the efficacy of any data caching framework used with Presto. His talk covered the new scheduler 
changes in detail, and concluded with performance numbers showing up to 9x improvement in cached/local reads with RubiX.&lt;/p&gt;

&lt;h1 id=&quot;presto-at-miq-digital&quot;&gt;Presto at MiQ Digital&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://go.qubole.com/rs/510-QPZ-296/images/Presto%20Summit%20India%20-%208.%20Talk%20by%20Rohit%20Srivastava%20MIQ.pdf&quot;&gt;Slides&lt;/a&gt;
&lt;a href=&quot;https://youtu.be/nOmI48iqlU4&quot;&gt;Video&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Bangalore-2019/JE1A2274.JPG&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The final presentation of the day was from &lt;a href=&quot;https://www.linkedin.com/in/rohitsrivastava20/&quot;&gt;Rohit Srivastava&lt;/a&gt;, 
Engineering Manager at &lt;a href=&quot;http://www.wearemiq.com/&quot;&gt;MiQ Digital&lt;/a&gt;, who presented an overview of the Unified Insights &amp;amp; Data 
Analytics platform at MiQ. He highlighted several challenges that his team had to overcome, such as scaling the 
team, infrastructure, and company; dealing with data copies; duplication of data pre-processing and the cost and 
effort that goes into it; and meeting strict SLAs. He gave an overview of how using Presto on Qubole for all 
dashboarding needs, along with additions like standardising most of their data to be stored in the Apache Parquet format 
on S3, has helped overcome some of these challenges.&lt;/p&gt;

&lt;p&gt;In summary, the first Presto Summit in India had a great mix of talks - some were about Presto usage and 
the experience of operating large Presto deployments across multiple clouds, while others focussed on niche 
technical contributions such as Presto scheduler changes for data locality, speeding up the ORC reader, and read support for 
Hive ACID tables in Presto. Participants had interesting and engaging questions for all the speakers and, in general, 
enjoyed interacting with the Presto founders and other Presto users and developers in the region.&lt;/p&gt;

&lt;p&gt;Videos and slides for all talks can be found &lt;a href=&quot;https://go.qubole.com/2019-09-05---FE---Presto-Summit-19-Bangalore_Post-Summit-Videos-LP-2.html&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We look forward to the next Presto Summit in this region soon!&lt;/p&gt;</content>

      
        <author>
          <name>Vijay Mann, Director of Engineering, Qubole</name>
        </author>
      

      <summary>Qubole organized the first ever Presto Summit in India on September 05, 2019. Bangalore, as the technology and startup hub of India was the perfect venue for India’s first Presto Summit. Presto has seen a lot of interest and adoption in this (south asia and asia pacific) region, as was evident with the turnout in the last two Presto Meetups organized by Qubole over the past year. Courtyard By Marriott, on Outer Ring Road (ORR) - a 17 KM stretch that hosts 10% of Bangalore’s working population (around 1 million people), as the conference venue proved to be an ideal destination for Presto enthusiasts, several of whom, work in its immediate vicinity. With 150 attendees from more than 75 companies, Presto community in India was super excited and eager to meet and interact with Presto co-creators - Martin Traverso, Dain Sundstrom and David Phillips, who flew down to Bangalore for this Event.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/Bangalore-2019/MyPost.png" />
      
    </entry>
  
    <entry>
      <title>Unnest Operator Performance Enhancement with Dictionary Blocks</title>
      <link href="https://trino.io/blog/2019/08/23/unnest-operator-performance-enhancements.html" rel="alternate" type="text/html" title="Unnest Operator Performance Enhancement with Dictionary Blocks" />
      <published>2019-08-23T00:00:00+00:00</published>
      <updated>2019-08-23T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/08/23/unnest-operator-performance-enhancements</id>
      <content type="html" xml:base="https://trino.io/blog/2019/08/23/unnest-operator-performance-enhancements.html">&lt;p&gt;Queries with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CROSS JOIN UNNEST&lt;/code&gt; clause are expected to see a significant performance improvement starting with version 316.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;executive-summary&quot;&gt;Executive Summary&lt;/h1&gt;

&lt;p&gt;The execution plans for queries with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CROSS JOIN UNNEST&lt;/code&gt; clause contain an Unnest Operator. The previous implementation of the Unnest Operator performed a deep copy of all input blocks to generate output blocks. This caused high CPU consumption and memory allocation in the operator, and impacted the performance of such queries. The impact was worse for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST&lt;/code&gt; queries accessing a high number of columns, or even a few columns with deeply nested schemas.&lt;/p&gt;


&lt;p&gt;We realized that the implementation could be made more efficient by avoiding copies in the Unnest Operator where possible. Using dictionary blocks to create output blocks that point to input elements has given us significant CPU and memory benefits by avoiding those copies. The benchmark results for the new Unnest Operator implementation show a more than 10x gain in CPU time and a 3x-5x gain in memory allocation.&lt;/p&gt;


&lt;p&gt;Let’s try to understand this change with an example. At LinkedIn, the most common usage of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CROSS JOIN UNNEST&lt;/code&gt; clause is unnesting a single array or map column. A sample query with the clause looks like the following:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnest_c1&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CROSS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UNNEST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnest_c1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
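&lt;p&gt;To make the semantics concrete, here is a minimal, hypothetical Java sketch (not Presto code) of what this query computes: each output row pairs the replicated &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;c0&lt;/code&gt; value with one element of the array &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;c1&lt;/code&gt;, and an empty array contributes no rows.&lt;/p&gt;

```java
import java.util.ArrayList;
import java.util.List;

public class UnnestSketch {
    // Flatten rows of (c0, c1) where c1 is an array: emit one output row
    // per array element, replicating c0 — the CROSS JOIN UNNEST semantics.
    static List<String[]> unnest(List<Object[]> rows) {
        List<String[]> output = new ArrayList<>();
        for (Object[] row : rows) {
            String c0 = (String) row[0];
            for (String element : (String[]) row[1]) {
                output.add(new String[] {c0, element});
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {"a", new String[] {"x", "y"}});
        rows.add(new Object[] {"b", new String[] {"z"}});
        for (String[] outputRow : unnest(rows)) {
            System.out.println(outputRow[0] + ", " + outputRow[1]);
        }
        // prints: "a, x", "a, y", "b, z"
    }
}
```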

&lt;p&gt;The plots below compare the performance of the Unnest Operator in the previous and current implementations for 3 different cases. Every case evaluates the Unnest Operator performance for a query like the above, on a table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt; with two columns &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;c0&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;c1&lt;/code&gt;. In all 3 cases, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;c0&lt;/code&gt; is a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VARCHAR&lt;/code&gt; column, while the nested column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;c1&lt;/code&gt; is of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ARRAY(VARCHAR)&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MAP(VARCHAR, VARCHAR)&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ARRAY(ROW(VARCHAR, VARCHAR, VARCHAR))&lt;/code&gt;, respectively. All the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VARCHAR&lt;/code&gt; elements in both columns have length 50, and the arrays in the second column have lengths distributed uniformly between 0 and 300.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;We used a JMH &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/test/java/io/prestosql/operator/BenchmarkUnnestOperator.java&quot;&gt;benchmark&lt;/a&gt; to measure the performance of these queries in terms of CPU time and memory allocations per operation. An “operation” (for the purposes of this measurement) is defined as the processing of 10,000 rows by an unnest operator.
These results reflect the speedup of the operator alone and may not extend to overall query execution.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/unnest-operator-dictionary-block/unnest-blogpost-cpu.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The figure above compares the CPU times before and after the enhancements. In all three cases, every operation finishes more than 10x faster. The new implementation removes the need for copying data when generating output blocks, giving us significant CPU time savings.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/unnest-operator-dictionary-block/unnest-blogpost-memory.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The figure above compares the memory allocation per operation before and after the enhancement. The new Unnest Operator implementation does not allocate large new memory chunks for output blocks. Instead, it uses arrays of integer indices pointing to input block elements, which results in far smaller allocations than creating new VARCHAR blocks. This brings the allocation rate down by 3x-5x in this example.&lt;/p&gt;

&lt;p&gt;Let’s dig into the design and implementation details.&lt;/p&gt;

&lt;h1 id=&quot;background&quot;&gt;Background&lt;/h1&gt;

&lt;p&gt;An Operator in Presto performs one step of computation on data. The local execution plan for a task consists of pipelines of operators. Each operator processes pages coming from the previous operator in the pipeline and produces output pages for the next one. Operator code has to be efficient, since it may be evaluated billions of times for a single query.&lt;/p&gt;

&lt;p&gt;A &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Page&lt;/code&gt; is made up of a set of blocks storing data for different columns. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; is one of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Block&lt;/code&gt; implementations in Presto. The elements of a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; are represented using an integer array (called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ids&lt;/code&gt;) and a reference to another block. The values in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ids&lt;/code&gt; array represent elements of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; by pointing to element indices in the referenced block. DictionaryBlocks enable more efficient encoding of columns with duplicate values.&lt;/p&gt;

&lt;p&gt;The Unnest Operator was implemented before &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; was added, and we saw an opportunity to enhance the performance of this Operator by using DictionaryBlocks. A &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; enables the Unnest Operator to reuse already constructed input blocks, which eliminates the need for expensive copies and results in significant compute and memory savings.&lt;/p&gt;

&lt;h1 id=&quot;design&quot;&gt;Design&lt;/h1&gt;

&lt;p&gt;Consider the following &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CROSS JOIN UNNEST&lt;/code&gt; query on a table with one &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VARCHAR&lt;/code&gt; type column and one &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ARRAY(VARCHAR)&lt;/code&gt; type column.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/unnest-operator-dictionary-block/unnest-blogpost-input-data.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnested_position&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CROSS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UNNEST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;positions_held&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnested_position&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/unnest-operator-dictionary-block/unnest-blogpost-output-data.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Elements of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;name&lt;/code&gt; column are replicated while we unnest elements of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;positions_held&lt;/code&gt; column. In this example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;name&lt;/code&gt; is referred to as a “replicated column”, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;positions_held&lt;/code&gt; as an “unnested column”.&lt;/p&gt;

&lt;p&gt;Multiple unnest columns are also allowed (e.g. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST(positions_held, company_name) AS U(unnested_position, unnested_company)&lt;/code&gt;), but that case is less common. It requires special handling, which we discuss &lt;a href=&quot;#dealing-with-multiple-unnest-columns&quot;&gt;later&lt;/a&gt; in the post.&lt;/p&gt;

&lt;p&gt;In the old design, an element from a replicated column would get copied over &lt;em&gt;n&lt;/em&gt; times when building the output, where &lt;em&gt;n&lt;/em&gt; is the cardinality of the corresponding element in the unnest column. For example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Alice&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Bob&lt;/code&gt; would be copied 2 and 3 times respectively. In the new design, the output block contains &lt;em&gt;n&lt;/em&gt; pointers to the element in the input block, along with a reference to the input block itself, without any actual copying. The benefits here are proportional to the replicated column element sizes. &lt;em&gt;The bigger the element size, the greater the speedup.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/unnest-operator-dictionary-block/unnest-blogpost-replicate-name.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
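&lt;p&gt;The replication step can be sketched in plain Python (an illustration of the idea only; the names and data here are hypothetical, not Presto’s Java implementation). Instead of copying each value, we emit its row index once per unnested element:&lt;/p&gt;

```python
# Each input row is repeated once per element of its unnested array.
names = ["Alice", "Bob"]          # replicated column (input block)
cardinalities = [2, 3]            # lengths of the arrays being unnested

# Build the ids array for the output DictionaryBlock: row i appears
# cardinalities[i] times, as an index rather than a copied value.
ids = [row for row, n in enumerate(cardinalities) for _ in range(n)]
print(ids)                        # [0, 0, 1, 1, 1]

# The logical output is recovered by dereferencing, with no string copies.
print([names[i] for i in ids])    # ['Alice', 'Alice', 'Bob', 'Bob', 'Bob']
```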

&lt;p&gt;Unnest columns are handled the same way. The previous design would copy their elements over one by one. This is CPU intensive and requires new memory allocations, especially for deeply nested columns, since a deep copy is required. In the new design, we use pointers instead of copies in most cases. The following figure shows the output block structure of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unnested_position&lt;/code&gt; column in the query above, for the old and the new implementations.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/unnest-operator-dictionary-block/unnest-blogpost-unnest-positions.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The indices in the output block &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B3&lt;/code&gt; shown above are strictly increasing starting from 0, but that is not always the case. The same input block can be used to generate multiple output blocks, with a different set of indices. Another interesting scenario is when multiple columns are being unnested. In that case, the output may require null appends because of the difference in cardinalities. We look for null elements in the input block and use their indices for handling the null-appends. If that is not possible, we have to fall back to copying data. We discuss this in more detail in the next section.&lt;/p&gt;

&lt;h1 id=&quot;implementation-challenges&quot;&gt;Implementation Challenges&lt;/h1&gt;

&lt;h4 id=&quot;extracting-input-from-nested-blocks&quot;&gt;Extracting Input from Nested Blocks&lt;/h4&gt;

&lt;p&gt;Data in the input unnest columns is represented using nested structures (e.g. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ArrayBlock&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MapBlock&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RowBlock&lt;/code&gt;), which create a layer of indirection on top of the actual element blocks. For the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;positions_held&lt;/code&gt; column from the example above, the input block is an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ArrayBlock&lt;/code&gt; which contains:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;offset information for representing arrays in every row&lt;/li&gt;
  &lt;li&gt;actual data in the form of an underlying element block storing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VARCHAR&lt;/code&gt;s.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For building an output &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt;, we create pointers to this underlying block. While processing entries from the input array block, array offsets are translated to indices of the underlying block. Similar translations have been implemented for unnest columns of array, map, and array-of-row types. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ColumnarMap&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ColumnarArray&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ColumnarRow&lt;/code&gt; structures enable this translation of indices.&lt;/p&gt;
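&lt;p&gt;The offset-to-index translation can be sketched as follows (a hypothetical Python model of an array block; Presto’s real code operates on the Java &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ColumnarArray&lt;/code&gt; structure):&lt;/p&gt;

```python
# Flat element block plus per-row offsets, as in an ArrayBlock.
elements = ["intern", "engineer", "sre", "manager", "director"]
offsets = [0, 2, 5]  # row 0 holds elements[0:2], row 1 holds elements[2:5]

# Unnesting: the output dictionary ids are simply the element positions
# covered by each row's [start, end) offset range.
ids = []
for row in range(len(offsets) - 1):
    start, end = offsets[row], offsets[row + 1]
    ids.extend(range(start, end))
print(ids)                         # [0, 1, 2, 3, 4]
print([elements[i] for i in ids])
```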

&lt;h4 id=&quot;dealing-with-multiple-unnest-columns&quot;&gt;Dealing with Multiple Unnest Columns&lt;/h4&gt;

&lt;p&gt;When a table has more than one nested column, a user may want to unnest multiple columns in the same query. Consider a table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt; with 3 columns: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;name&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;schools_attended&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;graduation_dates&lt;/code&gt;, of types &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VARCHAR&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ARRAY(VARCHAR)&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ARRAY(VARCHAR)&lt;/code&gt; respectively. Every row in this table indicates the schools attended and the corresponding graduation dates for a person. Let’s say a user wants to unnest the contents of the two array columns into &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unnested_school&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unnested_graduation_date&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;One naive way of doing that is to use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CROSS JOIN UNNEST&lt;/code&gt; clause twice, on the two different columns. This translates to two different &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNNEST&lt;/code&gt; operators (as shown in the query below), each with a single unnest column, producing two independent cross joins that execute the way we discussed earlier. This query structure is not what we want, since it produces a blown-up cross product of the two arrays.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;S&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnested_school&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnested_graduation_date&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;S&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CROSS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UNNEST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;schools_attended&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnested_school&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;CROSS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UNNEST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;graduation_dates&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnested_graduation_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The correct way to unnest the two columns is to use them in the same unnest clause, as shown below.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;S&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnested_school&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnested_graduation_date&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;S&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;CROSS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UNNEST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;schools_attended&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;graduation_dates&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unnested_school&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;unnested_graduation_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The arrays/maps being unnested in multiple columns can have different cardinalities. In this example, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;graduation_date&lt;/code&gt; value for the last school may not be present, if the user has not yet graduated. Null elements need to be appended to the output unnest columns in such cases.&lt;/p&gt;

&lt;p&gt;In the example data shown below, a NULL element is appended in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unnested_graduation_date&lt;/code&gt; column since the array in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;graduation_dates&lt;/code&gt; column is shorter than that in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;schools_attended&lt;/code&gt; column.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/unnest-operator-dictionary-block/unnest-blogpost-corner-case.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Since we are using a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; for building the unnest output column, appending a null gets slightly tricky: how do we create a pointer representing a NULL? The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; implementation, as of now, does not have a way to represent null elements. In such cases, we first check for the existence of a null element in the input block. If we find one, we use its index when appending NULLs to the output. Otherwise, we fall back to copying elements from the input into a new output block, as in the previous implementation.&lt;/p&gt;
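&lt;p&gt;This null-append strategy can be sketched in Python (a simplified model; the function name and data shapes here are hypothetical, not Presto’s actual unnester). Each row’s ids are padded to the target cardinality with the index of an existing null in the input, or the whole approach falls back to copying when no null exists:&lt;/p&gt;

```python
def unnest_ids_with_padding(elements, offsets, target_lengths):
    """Build dictionary ids, padding each row to its target length with the
    index of a null element from the input block, if one exists."""
    null_index = elements.index(None) if None in elements else None
    ids = []
    for row, target in enumerate(target_lengths):
        start, end = offsets[row], offsets[row + 1]
        ids.extend(range(start, end))
        padding = target - (end - start)
        if padding > 0:
            if null_index is None:
                return None  # no null in input: must fall back to copying
            ids.extend([null_index] * padding)
    return ids

# graduation_dates block: row 0 holds three entries (one of them null),
# row 1 holds one entry but must align with two schools.
dates = ["2001", "2005", None, "2010"]
ids = unnest_ids_with_padding(dates, [0, 3, 4], [3, 2])
print([dates[i] for i in ids])    # ['2001', '2005', None, '2010', None]
```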

&lt;p&gt;In cases with multiple columns, the lengths of the arrays/maps are usually the same, and misalignments are infrequent. That said, misalignments can force copying of data while building output blocks if no NULL elements are present in the input. This may reduce the CPU and memory savings (and even increase the average memory allocation in some cases), but this specific case is not common.&lt;/p&gt;

&lt;h1 id=&quot;future-work&quot;&gt;Future Work&lt;/h1&gt;

&lt;p&gt;Performance of queries with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CROSS JOIN UNNEST&lt;/code&gt; clause can be further improved through the following optimizations.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;While unnesting a deeply nested column of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;array(row(.....))&lt;/code&gt;, the user is often interested in only a small subset of fields from the row. Such cases can benefit from optimizing the logical plan through the pushdown of dereference projections. There are ongoing efforts in the community in this direction.&lt;/li&gt;
  &lt;li&gt;The dictionary blocks created in the discussed implementation use the input block as a reference. What happens if the input itself is a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt;? We end up with two levels of dereferencing. Such cases can be further optimized by collapsing the multiple indirections into a single one.&lt;/li&gt;
  &lt;li&gt;The common case for an unnest column does not involve any NULL appends. The unnested output &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; in this case represents a range over the input block, so the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; creation can be avoided by using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;getRegion&lt;/code&gt; method on the input block.&lt;/li&gt;
  &lt;li&gt;For variable-width and complex columns, using a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DictionaryBlock&lt;/code&gt; is beneficial in terms of CPU and memory, but it may be overkill for primitive types (booleans or integers), where copying can be cheaper than creating a dictionary block. Selectively choosing dictionary blocks based on the type could be helpful.&lt;/li&gt;
&lt;/ul&gt;
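&lt;p&gt;The indirection-collapsing idea can be sketched by composing the two index arrays (a plain Python illustration of the concept, not Presto’s Java code):&lt;/p&gt;

```python
# When the unnest input is itself dictionary-encoded, the output would
# reference a DictionaryBlock, giving two levels of indirection.
values = ["a", "b", "c"]
inner_ids = [2, 0, 1]      # input DictionaryBlock over values
outer_ids = [0, 0, 2, 1]   # unnest output over the input block

# Collapsing: compose the two index arrays into one flat ids array.
flat_ids = [inner_ids[i] for i in outer_ids]
print([values[i] for i in flat_ids])   # ['c', 'c', 'b', 'a']
```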

&lt;h1 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h1&gt;

&lt;p&gt;LinkedIn’s data ecosystem makes heavy use of tables with deeply nested columns, and this change is beneficial for handling Presto queries on such tables. In our internal experiments with production data, we have seen queries perform up to ~9x faster with as much as ~13x less CPU usage.&lt;/p&gt;

&lt;p&gt;We look forward to people in the community trying this out, starting with the 316 release, and would love to hear others’ observations of performance after this change. Feel free to reach out to me on &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Slack&lt;/a&gt; (handle @padesai) or &lt;a href=&quot;https://www.linkedin.com/in/pratham-desai/&quot;&gt;LinkedIn&lt;/a&gt; with questions or feedback.&lt;/p&gt;</content>

      
        <author>
          <name>Pratham Desai, LinkedIn</name>
        </author>
      

      <summary>Queries with CROSS JOIN UNNEST clause are expected to have a significant performance improvement starting version 316.</summary>

      
      
    </entry>
  
    <entry>
      <title>A Report of First Ever Presto Conference Tokyo</title>
      <link href="https://trino.io/blog/2019/07/11/report-for-presto-conference-tokyo.html" rel="alternate" type="text/html" title="A Report of First Ever Presto Conference Tokyo" />
      <published>2019-07-11T00:00:00+00:00</published>
      <updated>2019-07-11T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/07/11/report-for-presto-conference-tokyo</id>
<content type="html" xml:base="https://trino.io/blog/2019/07/11/report-for-presto-conference-tokyo.html">&lt;p&gt;Nowadays, Presto is attracting attention from a wide variety of companies all around 
the world, and Japan is no exception. Many companies there use Presto as their primary data 
processing engine.&lt;/p&gt;

&lt;p&gt;To bring the community members in Japan together, we held the 
first ever Presto conference in Tokyo, welcoming the Presto creators &lt;a href=&quot;https://github.com/dain&quot;&gt;Dain Sundstrom&lt;/a&gt;, 
&lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso&lt;/a&gt;, and &lt;a href=&quot;https://github.com/electrum&quot;&gt;David Phillips&lt;/a&gt;. 
The conference was hosted at the Tokyo office of &lt;a href=&quot;https://www.treasuredata.com/&quot;&gt;Arm Treasure Data&lt;/a&gt;. 
This article summarizes the conference and aims to convey the excitement in the room.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/presto-conference-tokyo/overall-view.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;presto-current-and-future&quot;&gt;Presto: Current and Future&lt;/h1&gt;

&lt;p&gt;First, the Presto creators introduced their recent work and the software foundation 
launched last year. They covered the following changes and enhancements achieved by 
the community recently.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Presto Software Foundation&lt;/li&gt;
  &lt;li&gt;New Connectors
    &lt;ul&gt;
      &lt;li&gt;Phoenix&lt;/li&gt;
      &lt;li&gt;Elasticsearch&lt;/li&gt;
      &lt;li&gt;Apache Ranger&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Attendees also learned about several plans for the near future.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The plan to support more complex pushdown to connectors&lt;/li&gt;
  &lt;li&gt;Case-sensitive identifiers&lt;/li&gt;
  &lt;li&gt;Timestamp semantics&lt;/li&gt;
  &lt;li&gt;Dynamic filtering&lt;/li&gt;
  &lt;li&gt;Connectors such as Iceberg, Kinesis, Druid.&lt;/li&gt;
  &lt;li&gt;Coordinator high availability&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;reading-the-source-code-of-presto&quot;&gt;Reading The Source Code of Presto&lt;/h1&gt;

&lt;p&gt;To help attendees get used to the technical talks about Presto at the conference, 
&lt;a href=&quot;https://github.com/xerial&quot;&gt;Leo&lt;/a&gt; provided a guided walk through the Presto 
source code. Since the Presto source code repository is enormous, such a guide is surely 
helpful for developers exploring the forest of the codebase.&lt;/p&gt;

&lt;div style=&quot;text-align: center;&quot;&gt;
&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/vTpEZFzu03tVhv&quot; width=&quot;440&quot; height=&quot;330&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; &lt;/iframe&gt; &lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/taroleo/reading-the-source-code-of-presto&quot; title=&quot;Reading The Source Code of Presto&quot; target=&quot;_blank&quot;&gt;Reading The Source Code of Presto&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;https://www.slideshare.net/taroleo&quot; target=&quot;_blank&quot;&gt;Taro L. Saito&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;
&lt;/div&gt;

&lt;h1 id=&quot;presto-at-arm-treasure-data&quot;&gt;Presto At Arm Treasure Data&lt;/h1&gt;

&lt;p&gt;Then &lt;a href=&quot;https://github.com/Lewuathe&quot;&gt;Kai&lt;/a&gt; (that’s me) provided an overview of how Arm Treasure 
Data uses Presto in its service. Presto is heavily used to support many enterprise use 
cases, including IoT data analysis, and it is becoming the hub component processing high-throughput 
workloads from many kinds of clients, such as Spark, ODBC, and JDBC.&lt;/p&gt;

&lt;div style=&quot;text-align: center;&quot;&gt;
&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/cVfDINF85hx0Vx&quot; width=&quot;440&quot; height=&quot;330&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; &lt;/iframe&gt; &lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/taroleo/presto-at-arm-treasure-data-2019-updates&quot; title=&quot;Presto At Arm Treasure Data - 2019 Updates&quot; target=&quot;_blank&quot;&gt;Presto At Arm Treasure Data - 2019 Updates&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;https://www.slideshare.net/taroleo&quot; target=&quot;_blank&quot;&gt;Taro L. Saito&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;
&lt;/div&gt;

&lt;h1 id=&quot;large-scale-migration-from-hive-to-presto-in-yahoo-japan&quot;&gt;Large Scale Migration from Hive to Presto in Yahoo! JAPAN&lt;/h1&gt;

&lt;p&gt;We learned how hard it is to migrate large-scale workloads from Hive to Presto from the 
presentation given by &lt;a href=&quot;https://github.com/oneonestar&quot;&gt;Star&lt;/a&gt; from Yahoo! JAPAN. Quite a few attendees 
seemed interested in the tool they created to convert HiveQL into Presto SQL, presumably 
because they have faced the same type of challenges.&lt;/p&gt;

&lt;div style=&quot;text-align: center;&quot;&gt;
&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/ld3tI0uIzAQe1&quot; width=&quot;440&quot; height=&quot;330&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; &lt;/iframe&gt; &lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/techblogyahoo/large-scale-migration-fromhive-to-presto-at-yahoo-japan&quot; title=&quot;Large scale migration fromHive to Presto at Yahoo! JAPAN&quot; target=&quot;_blank&quot;&gt;Large scale migration fromHive to Presto at Yahoo! JAPAN&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;https://www.slideshare.net/techblogyahoo&quot; target=&quot;_blank&quot;&gt;Yahoo!デベロッパーネットワーク&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;
&lt;/div&gt;

&lt;h1 id=&quot;presto-at-line&quot;&gt;Presto At LINE&lt;/h1&gt;

&lt;p&gt;LINE is the biggest provider of mobile communication tools in Japan (think of it as the WhatsApp of Japan). 
&lt;a href=&quot;https://github.com/wyukawa&quot;&gt;Wataru Yukawa&lt;/a&gt; and &lt;a href=&quot;https://github.com/ebyhr&quot;&gt;Yuya Ebihara&lt;/a&gt; showed us how 
they improve their platform by collaborating with the community. We learned about difficulties 
and challenges primarily caused by dependencies on other Hadoop ecosystem components such as HDFS and Spark.&lt;/p&gt;

&lt;div style=&quot;text-align: center;&quot;&gt;
&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/Hx9oz6Pi1su5rj&quot; width=&quot;440&quot; height=&quot;330&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen=&quot;&quot;&gt; &lt;/iframe&gt; &lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/wyukawa/presto-conferencetokyo2019&quot; title=&quot;Presto conferencetokyo2019&quot; target=&quot;_blank&quot;&gt;Presto conferencetokyo2019&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;https://www.slideshare.net/wyukawa&quot; target=&quot;_blank&quot;&gt;wyukawa &lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;One notable moment in the session was the discussion of how to make the error messages 
provided by Presto excellent. David and the other creators genuinely care about the error messages 
shown by the system. Improving error messages is one of the best ways to reduce the time spent 
dealing with inquiries about errors, which is the primary reason to keep them easy to understand.&lt;/p&gt;

&lt;h1 id=&quot;qa-session&quot;&gt;Q&amp;amp;A Session&lt;/h1&gt;

&lt;p&gt;At the end of the conference, attendees got a chance to freely ask the Presto creators about a bunch of 
topics, covering not only Presto technicalities but also their working style and thoughts. Here is a 
selection of the Q&amp;amp;A discussed at the conference.&lt;/p&gt;

&lt;p&gt;Q: What do you expect most from Japan community?&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;Judging from our communication with the community in Israel, gaining diversity of use cases will make 
Presto better, and we expect that kind of diversity here. Japan surely has a unique community solving 
its own difficulties. Having a Japanese Slack channel might be a good idea to help each other :)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Q: How do you review the pull request code? How to keep the quality of the code review process?&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;The difficulty of a code review depends on the complexity of the PR itself. We use IntelliJ 
extensively to read the code base. There are mainly two things that keep code review quality high. One 
is that being involved in actual code reviews makes you a good reviewer. The other is automating minor 
checks such as code style. These things help keep the code review process functional.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;Making the code readable is the most important thing in the Presto codebase.&lt;/p&gt;
  &lt;ul&gt;
    &lt;li&gt;Do not use abbreviations or slang, because not everyone can understand those words at a glance&lt;/li&gt;
    &lt;li&gt;Write a comment -&amp;gt; write the code -&amp;gt; delete the comment. That is the process that makes the code readable by itself.&lt;/li&gt;
  &lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;Q: The SQL-on-everything approach vs. pursuing performance. In which direction should Presto move forward?&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;It depends on the community’s decision. However, in our discussions with several companies 
in the community, not a single company has expressed much concern about the performance of Presto.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1 id=&quot;wrap-up&quot;&gt;Wrap Up&lt;/h1&gt;

&lt;p&gt;This conference was the first ever Presto conference in Tokyo to bring in the Presto creators. We
were able to have exciting discussions with the community developers and the creators. One of the great 
things we discovered at the conference was the creators’ enthusiasm for making Presto usable 
by every developer. They genuinely care about the error messages seen by users and the quality of the 
code read by developers. Thanks to this attention to usability from the viewpoint of both 
users and developers, Presto keeps gaining traction in the community.&lt;/p&gt;

&lt;p&gt;It was a great time, with many conversations among community members. We really appreciate the 
developers in the community and the creators. Thank you so much for coming to the conference, and see 
you next time!&lt;/p&gt;

&lt;h1 id=&quot;reference&quot;&gt;Reference&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://techplay.jp/event/733772&quot;&gt;Presto Conference Tokyo 2019&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.slideshare.net/taroleo/reading-the-source-code-of-presto&quot;&gt;Reading The Source Code of Presto&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.slideshare.net/taroleo/presto-at-arm-treasure-data-2019-updates&quot;&gt;Presto At Arm Treasure Data - 2019 Updates&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.slideshare.net/techblogyahoo/large-scale-migration-fromhive-to-presto-at-yahoo-japan&quot;&gt;Large Scale Migration from Hive to Presto in Yahoo! JAPAN&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.slideshare.net/wyukawa/presto-conferencetokyo2019&quot;&gt;Presto At LINE&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content>

      
        <author>
          <name>Kai Sasaki, Arm Treasure Data</name>
        </author>
      

      <summary>Nowadays, Presto is attracting much attention from a wide variety of companies around the world, and Japan is no exception. Many companies are using Presto as their primary data processing engine. To keep the community members in Japan in touch with each other, we have just held the first ever Presto conference in Tokyo, welcoming the Presto creators Dain Sundstrom, Martin Traverso, and David Phillips. The conference was hosted at the Tokyo office of Arm Treasure Data. This article is a summary of the conference, aiming to convey the excitement in the room.</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/presto-conference-tokyo/overall-view.jpg" />
      
    </entry>
  
    <entry>
      <title>Introduction to Trino Cost-Based Optimizer</title>
      <link href="https://trino.io/blog/2019/07/04/cbo-introduction.html" rel="alternate" type="text/html" title="Introduction to Trino Cost-Based Optimizer" />
      <published>2019-07-04T00:00:00+00:00</published>
      <updated>2019-07-04T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/07/04/cbo-introduction</id>
      <content type="html" xml:base="https://trino.io/blog/2019/07/04/cbo-introduction.html">&lt;p&gt;Last edited 15 June 2022: Update to use the Trino project name.&lt;/p&gt;

&lt;p&gt;The Cost-Based Optimizer (CBO) in Trino achieves stunning results in industry
standard benchmarks (and not only in benchmarks)! The CBO makes decisions based
on several factors, including shape of the query, filters and table statistics.
I would like to tell you more about what the table statistics are in Trino and
what information can be derived from them.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;This post was originally published at &lt;a href=&quot;https://www.starburstdata.com/technical-blog/introduction-to-presto-cost-based-optimizer/&quot;&gt;Starburst Data Engineering
Blog&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;background&quot;&gt;Background&lt;/h1&gt;

&lt;p&gt;Before diving deep into how Trino analyzes statistics, let’s set the stage so
that our considerations are framed in some context. Let’s consider a Data
Scientist who wants to know which customers spend the most money with the
company, based on their history of orders (probably to offer them some discounts).
They would probably fire up a query like this:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;price&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lineitem&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orderkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orderkey&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;price&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now, Trino needs to create an execution plan for this query. It does so by
first transforming a query to a plan in the simplest possible way — here it
will create CROSS JOINS for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FROM customer c, orders o, lineitem l&lt;/code&gt; part of the
query and FILTER for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE c.custkey = o.custkey AND l.orderkey = o.orderkey&lt;/code&gt;.
The initial plan is very naïve — CROSS JOINS will produce humongous amounts of
intermediate data. There is no point in even trying to execute such a plan, and
Trino won’t do that. Instead, it applies transformations to bring the plan closer to
what the user probably wanted, as shown below. Note: for succinctness, only part of
the query plan is drawn, without aggregation (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt;) and sorting (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER
BY&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/cbo-introduction/trino-eliminate-cross-join.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Indeed, this is much better than the CROSS JOINS. But we can do even better, if
we consider &lt;em&gt;cost&lt;/em&gt;.&lt;/p&gt;

&lt;h1 id=&quot;cost-based-optimizer&quot;&gt;Cost-Based Optimizer&lt;/h1&gt;

&lt;p&gt;Without going into database internals on how a JOIN is implemented, let’s take
for granted that it makes a big difference which table is on the right and which is
on the left in the JOIN. (A simple explanation is that the table on the right
basically needs to be kept in memory while the JOIN result is calculated.)
Because of that, the following plans produce the same result, but may have
different execution times or memory requirements.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/cbo-introduction/trino-join-flip.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;CPU time, memory requirements and network bandwidth usage are the three
dimensions that contribute to query execution time, both in single query and
concurrent workloads. These dimensions are captured as the &lt;em&gt;cost&lt;/em&gt; in Trino.&lt;/p&gt;

&lt;p&gt;Our Data Scientist knows that most of the customers made at least one order and
every order had at least one item (and many orders had many items), so
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lineitem&lt;/code&gt; is the biggest table, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; is medium and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer&lt;/code&gt; is the
smallest. When joining &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt;, having &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; on the right
side of the JOIN is not a good idea! But how can the planner know that? In
the real world, the query planner cannot reliably deduce such information just from
table names. This is where table statistics kick in.&lt;/p&gt;

&lt;h2 id=&quot;table-statistics&quot;&gt;Table statistics&lt;/h2&gt;

&lt;p&gt;Trino has &lt;a href=&quot;https://trino.io/docs/current/develop/connectors.html&quot;&gt;connector-based
architecture&lt;/a&gt;. A
connector can provide &lt;a href=&quot;https://trino.io/docs/current/optimizer/statistics.html&quot;&gt;table and column
statistics&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;number of rows in a table,&lt;/li&gt;
  &lt;li&gt;number of distinct values in a column,&lt;/li&gt;
  &lt;li&gt;fraction of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NULL&lt;/code&gt; values in a column,&lt;/li&gt;
  &lt;li&gt;minimum/maximum value in a column,&lt;/li&gt;
  &lt;li&gt;average data size for a column.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of course, if some information is missing — e.g. average text length in a
varchar column is unknown — a connector can still provide other information and
Cost-Based Optimizer will be able to use that.&lt;/p&gt;

&lt;p&gt;In our Data Scientist’s example, data sizes can look something like the
following:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/cbo-introduction/trino-data-table-statistics.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Having this knowledge, &lt;a href=&quot;https://trino.io/docs/current/optimizer/cost-based-optimizations.html&quot;&gt;Trino’s Cost-Based
Optimizer&lt;/a&gt;
will come up with completely different join ordering in the plan.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/cbo-introduction/trino-cbo-results.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;filter-statistics&quot;&gt;Filter statistics&lt;/h2&gt;

&lt;p&gt;As we saw, knowing the sizes of the tables involved in a query is fundamental
to properly reordering the joins in the query plan. However, knowing just the
sizes is not enough. Returning to our example, the Data Scientist might want to
drill down into results of their previous query, to know which customers
repeatedly bought and spent most money on a particular item (clearly, this must
be some consumable, or a mobile phone). For this, they will use an almost 
identical query to the original one, adding one more condition.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;price&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lineitem&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orderkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orderkey&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;item&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;106170&lt;/span&gt;                              &lt;span class=&quot;c1&quot;&gt;--- additional condition&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;price&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The additional FILTER might be applied after the JOIN or before. Obviously,
filtering as early as possible is the best strategy, but this also means the
actual size of the data involved in the JOIN will be different now. In our Data
Scientist’s example, the join order will indeed be different.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/cbo-introduction/trino-cbo-results-with-filter.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h1 id=&quot;under-the-hood&quot;&gt;Under the Hood&lt;/h1&gt;

&lt;h2 id=&quot;execution-time-and-cost&quot;&gt;Execution Time and Cost&lt;/h2&gt;

&lt;p&gt;From an external perspective, only three things really matter:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;execution time,&lt;/li&gt;
  &lt;li&gt;execution cost (in dollars),&lt;/li&gt;
  &lt;li&gt;ability to run (sufficiently) many concurrent queries at a time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The execution time is often called “wall time” to emphasize that we’re not
really interested in “CPU time” or number of machines/nodes/threads involved.
Our Data Scientist’s clock on the wall is the ultimate judge. It would be nice
if they were not forced to get coffee/eat lunch during each query they run. On
the other hand, a CFO will be interested in keeping cluster costs at the lowest
possible level (without, of course, impeding employees’ effectiveness). Lastly,
a System Administrator needs to ensure that all cluster users can work at the
same time. That is, that the cluster can handle many queries at a time,
yielding enough throughput that “wall time” observed by each of the users is
satisfactory.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/cbo-introduction/under-the-hood.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;It is possible to optimize for only one of the above dimensions. For example,
we can have a single-node cluster, and the CFO will be happy (but employees will go
somewhere else). Conversely, we may have a thousand-node cluster even if the
company cannot afford it. Users will be (initially) happy, until the company
goes bankrupt. Ultimately, however, we need to balance these trade-offs, which
basically means that queries need to be executed as fast as possible, with as
few resources as possible.&lt;/p&gt;

&lt;p&gt;In Trino, this is modeled with the concept of the cost, which captures
properties like CPU cost, memory requirements and network bandwidth usage.
Different variants of a query execution plan are explored, assigned a cost and
compared. The variant with the least overall cost is selected for execution.
This approach neatly balances the needs of cluster users, administrators and
the CFO.&lt;/p&gt;
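
&lt;p&gt;The selection step can be sketched as follows. This is a minimal, hypothetical illustration, not
Trino’s actual cost model; the plan names, cost numbers, and weighting are invented:&lt;/p&gt;

```python
# Minimal sketch of cost-based plan selection (hypothetical numbers,
# not Trino's real cost model). Each plan variant is assigned a cost
# capturing CPU, memory, and network usage.
plans = {
    'customer on build side': (90.0, 40.0, 30.0),  # (cpu, memory, network)
    'orders on build side': (60.0, 5.0, 10.0),
}

def total_cost(cost):
    cpu, memory, network = cost
    # Equal weights are arbitrary here; a real optimizer combines the
    # components in a more principled way.
    return cpu + memory + network

# The variant with the least overall cost is selected for execution.
best = min(plans, key=lambda name: total_cost(plans[name]))
```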

&lt;p&gt;The cost of each operation in the query plan is calculated in a way appropriate
for the type of the operation, taking into account statistics of the data
involved in the operation. Now, let’s see where the statistics come from.&lt;/p&gt;

&lt;h2 id=&quot;statistics&quot;&gt;Statistics&lt;/h2&gt;

&lt;p&gt;In our Data Scientist’s example, the row counts for tables were taken directly
from table statistics, i.e. provided by a connector. But where did “~3K rows”
come from? Let’s dive into some nitty-gritty details.&lt;/p&gt;

&lt;p&gt;A query execution plan is made of “building block” operations, including:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;table scans (reading the table; at runtime this is actually combined with a
filter)&lt;/li&gt;
  &lt;li&gt;filters (SQL’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WHERE&lt;/code&gt; clause or any other conditions deduced by the query
planner)&lt;/li&gt;
  &lt;li&gt;projections (i.e. computing output expressions)&lt;/li&gt;
  &lt;li&gt;joins&lt;/li&gt;
  &lt;li&gt;aggregations (in fact there are a few different “building blocks” for
aggregations, but that’s a story for another time)&lt;/li&gt;
  &lt;li&gt;sorting (SQL’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt;)&lt;/li&gt;
  &lt;li&gt;limiting (SQL’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt;)&lt;/li&gt;
  &lt;li&gt;sorting and limiting combined (SQL’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY .. LIMIT ..&lt;/code&gt; deserves
specialized support)&lt;/li&gt;
  &lt;li&gt;and a lot more!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;How the statistics are computed for the most interesting “building blocks”
is discussed below.&lt;/p&gt;

&lt;h2 id=&quot;table-scan-statistics&quot;&gt;Table Scan statistics&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/cbo-introduction/table-scan-statistics.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As explained in the “Table statistics” section, the connector which defines the
table is responsible for providing the table statistics. Furthermore, the
connector will be informed about any filtering conditions that are to be
applied to the data read from the table. This may be important, e.g. in the case
of a Hive partitioned table, where statistics are stored on a per-partition basis.
If the filtering condition excludes some (or many) partitions, the statistics
will cover a smaller data set (the remaining partitions) and will be more
accurate.&lt;/p&gt;
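
&lt;p&gt;The effect of partition pruning on the scan estimate can be sketched in a few lines; the
partition names and row counts below are made up for illustration:&lt;/p&gt;

```python
# Hypothetical per-partition row counts for a Hive-style
# partitioned table.
partition_rows = {
    'ds=2019-06-01': 1000000,
    'ds=2019-06-02': 1200000,
    'ds=2019-06-03': 800000,
}

def scan_estimate(partitions, keep):
    # Only partitions that survive the filtering condition
    # contribute to the table-scan row-count estimate.
    return sum(rows for name, rows in partitions.items() if keep(name))

all_rows = scan_estimate(partition_rows, lambda name: True)
pruned = scan_estimate(partition_rows, lambda name: name == 'ds=2019-06-03')
```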

&lt;p&gt;To recall, a connector can provide the following table and column statistics:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;number of rows in a table,&lt;/li&gt;
  &lt;li&gt;number of distinct values in a column,&lt;/li&gt;
  &lt;li&gt;fraction of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NULL&lt;/code&gt; values in a column,&lt;/li&gt;
  &lt;li&gt;minimum/maximum value in a column,&lt;/li&gt;
  &lt;li&gt;average data size for a column.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;filter-statistics-1&quot;&gt;Filter statistics&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/cbo-introduction/filter-statistics.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;When considering a filtering operation, a filter’s condition is analyzed and
the following estimations are calculated:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;the probability that a data row will pass the filtering condition; from
this, the expected number of rows after the filter is derived,&lt;/li&gt;
  &lt;li&gt;fraction of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NULL&lt;/code&gt; values for columns involved in the filtering condition (for
most conditions, this will simply be 0%),&lt;/li&gt;
  &lt;li&gt;number of distinct values for columns involved in the filtering condition,&lt;/li&gt;
  &lt;li&gt;number of distinct values for columns that were not part of the filtering
condition, if their original number of distinct values was more than the
expected number of data rows that pass the filter.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, for a condition like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;l.item = 106170&lt;/code&gt; we can observe that:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;no rows with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;l.item&lt;/code&gt; being &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NULL&lt;/code&gt; will meet the condition,&lt;/li&gt;
  &lt;li&gt;there will be only one distinct value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;l.item&lt;/code&gt; (106170) after the
filtering operation,&lt;/li&gt;
  &lt;li&gt;on average, the number of data rows expected to pass the filter will be equal to
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;number_of_input_rows * fraction_of_non_nulls / distinct_values&lt;/code&gt;. (This
assumes, of course, that users most often drill down into the data they really
have, which is quite a reasonable assumption and also a safe one to make).&lt;/li&gt;
&lt;/ul&gt;
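
&lt;p&gt;The last estimate can be sketched directly (the input numbers below are invented, not actual
TPC data):&lt;/p&gt;

```python
# Row-count estimate for an equality filter such as l.item = 106170.
def filtered_rows(input_rows, null_fraction, distinct_values):
    # Rows with NULL in the filtered column never match; the remaining
    # rows are assumed to be spread evenly across the distinct values.
    return input_rows * (1 - null_fraction) / distinct_values

estimate = filtered_rows(input_rows=6000000, null_fraction=0.0,
                         distinct_values=2000)
# estimate is 3000.0, i.e. the kind of "~3K rows" figure the planner shows
```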

&lt;h2 id=&quot;projection-statistics&quot;&gt;Projection statistics&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/cbo-introduction/projection-statistics.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Projections (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;l.item – 1 AS iid&lt;/code&gt;) are similar to filters, except that, of
course, they do not impact the expected number of rows after the operation.&lt;/p&gt;

&lt;p&gt;For a projection, the following types of column statistics are calculated (if
possible for given projection expression):&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;number of distinct values produced by the projection,&lt;/li&gt;
  &lt;li&gt;fraction of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NULL&lt;/code&gt; values produced by the projection,&lt;/li&gt;
  &lt;li&gt;minimum/maximum value produced by the projection.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Naturally, if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iid&lt;/code&gt; is only returned to the user, then these statistics are not
useful. However, if it’s later used in a filter or join operation, these
statistics are important to correctly estimate the number of rows that meet the
filter condition or are returned from the join.&lt;/p&gt;
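
&lt;p&gt;For a constant-shift projection like the one above, the statistics propagation can be sketched
as follows (the input statistics are invented):&lt;/p&gt;

```python
# Propagating column statistics through the projection item - 1 AS iid
# (hypothetical input statistics).
item_stats = {'distinct_values': 2000, 'null_fraction': 0.0,
              'min': 1, 'max': 200000}

def shift_by_constant(stats, constant):
    # Subtracting a constant shifts min/max but preserves the number
    # of distinct values and the fraction of NULLs.
    return {'distinct_values': stats['distinct_values'],
            'null_fraction': stats['null_fraction'],
            'min': stats['min'] - constant,
            'max': stats['max'] - constant}

iid_stats = shift_by_constant(item_stats, 1)
```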

&lt;h1 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h1&gt;

&lt;p&gt;Summing up, Trino’s Cost-Based Optimizer is conceptually a very simple thing.
Alternative query plans are considered, and the best plan is chosen and executed.
The details are not so simple, though. Fortunately, to use
&lt;a href=&quot;https://trino.io/&quot;&gt;Trino&lt;/a&gt;, one doesn’t need to know all these details.
Of course, anyone with a technical inclination who likes to wander through database
internals is invited to study &lt;a href=&quot;https://github.com/trinodb/trino&quot;&gt;the Trino code&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;Enabling Trino CBO is really simple:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;set &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;optimizer.join-reordering-strategy=AUTOMATIC&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;join-distribution-type=AUTOMATIC&lt;/code&gt; in your &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.properties&lt;/code&gt;,&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://trino.io/docs/current/sql/analyze.html&quot;&gt;analyze&lt;/a&gt; your tables,&lt;/li&gt;
  &lt;li&gt;no, there is no third step. That’s it!&lt;/li&gt;
&lt;/ul&gt;
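
&lt;p&gt;For reference, the first step amounts to two lines in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.properties&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;optimizer.join-reordering-strategy=AUTOMATIC
join-distribution-type=AUTOMATIC
&lt;/code&gt;&lt;/pre&gt;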

&lt;p&gt;Take Trino CBO for a spin today and let us know about &lt;em&gt;your&lt;/em&gt; Trino
experience!&lt;/p&gt;

&lt;p&gt;□&lt;/p&gt;</content>

      
        <author>
          <name>Piotr Findeisen, Starburst Data</name>
        </author>
      

      <summary>Last edited 15 June 2022: Update to use the Trino project name. The Cost-Based Optimizer (CBO) in Trino achieves stunning results in industry standard benchmarks (and not only in benchmarks)! The CBO makes decisions based on several factors, including shape of the query, filters and table statistics. I would like to tell you more about what the table statistics are in Trino and what information can be derived from them.</summary>

      
      
    </entry>
  
    <entry>
      <title>Dynamic filtering for highly-selective join optimization</title>
      <link href="https://trino.io/blog/2019/06/30/dynamic-filtering.html" rel="alternate" type="text/html" title="Dynamic filtering for highly-selective join optimization" />
      <published>2019-06-30T00:00:00+00:00</published>
      <updated>2019-06-30T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/06/30/dynamic-filtering</id>
      <content type="html" xml:base="https://trino.io/blog/2019/06/30/dynamic-filtering.html">&lt;p&gt;By using dynamic filtering via run-time predicate pushdown, we can significantly optimize highly-selective inner-joins.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;introduction&quot;&gt;Introduction&lt;/h1&gt;

&lt;p&gt;In the highly-selective join scenario, most of the probe-side rows are dropped immediately after being read, since they 
don’t match the join criteria.&lt;/p&gt;

&lt;p&gt;Our idea was to extend Presto’s predicate pushdown support from the planning phase to run-time, in order to skip reading 
the non-relevant rows from &lt;a href=&quot;https://www.slideshare.net/OriReshef/presto-for-apps-deck-varada-prestoconf&quot;&gt;our connector&lt;/a&gt; 
into Presto&lt;sup id=&quot;fnref:1&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot; role=&quot;doc-noteref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. It should allow much faster joins, when the build-side scan results in a low-cardinality table:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/dynamic-filtering/dynamic-filtering.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The approach above is called “dynamic filtering”, and there is &lt;a href=&quot;https://github.com/trinodb/trino/issues/52&quot;&gt;an ongoing effort&lt;/a&gt; 
to integrate it into Presto.&lt;/p&gt;

&lt;p&gt;The main difficulty is the need to pass the build-side values from the inner-join operator to the probe-side scan operator, 
since the operators may run on different machines. A possible solution is to use the coordinator to facilitate the message 
passing. However, it requires multiple changes in the existing Presto codebase and careful design is needed to avoid overloading
the coordinator.&lt;/p&gt;

&lt;p&gt;Since it’s a complex feature with lots of moving parts, we suggest the approach below, which solves the problem in a simpler way 
for specific join use-cases. We note that parts of the implementation below will also help implement the general dynamic 
filtering solution.&lt;/p&gt;

&lt;h1 id=&quot;design&quot;&gt;Design&lt;/h1&gt;

&lt;p&gt;Our approach relies on the &lt;a href=&quot;https://www.starburst.io/wp-content/uploads/2018/09/Presto-Cost-Based-Query-Optimizer-WP.pdf&quot;&gt;cost-based optimizer&lt;/a&gt; 
(CBO) that allows using “broadcast” join, since in our case the build-side is much smaller than the probe-side. In this case, 
the probe-side scan and the inner-join operators are running in the same process - so the message passing between them becomes 
much simpler.&lt;/p&gt;

&lt;p&gt;Therefore, most of the required changes are in the 
&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/sql/planner/LocalExecutionPlanner.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LocalExecutionPlanner&lt;/code&gt;&lt;/a&gt; 
class, and there are no dependencies on the planner or the coordinator.&lt;/p&gt;

&lt;h1 id=&quot;implementation&quot;&gt;Implementation&lt;/h1&gt;

&lt;p&gt;First, we make sure that a broadcast join is used and that the local stage query plan contains the probe-side 
&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/sql/planner/plan/TableScanNode.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TableScan&lt;/code&gt;&lt;/a&gt; node.
Otherwise, we don’t apply the optimization, since we need access to the probe-side &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/split/PageSourceProvider.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PageSourceProvider&lt;/code&gt;&lt;/a&gt; 
for predicate pushdown.&lt;/p&gt;

&lt;p&gt;Then, we add a new “collection” operator, just before the hash-builder operator as described below:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/dynamic-filtering/operators.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This operator collects the build-side values and, once its input is exhausted, exposes the resulting dynamic filter as a 
&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-spi/src/main/java/io/prestosql/spi/predicate/TupleDomain.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TupleDomain&lt;/code&gt;&lt;/a&gt; 
to the probe-side &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/split/PageSourceProvider.java&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PageSourceProvider&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
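
&lt;p&gt;The collection step can be sketched as follows (a simplified Python illustration, not Trino’s actual Java implementation; the class name and the distinct-value cap are assumptions). It gathers the distinct build-side join-key values and gives up, exposing “no filter” (analogous to &lt;code&gt;TupleDomain.all()&lt;/code&gt;), if the value set grows too large to be worth pushing down:&lt;/p&gt;

```python
class DynamicFilterCollector:
    """Gather distinct build-side join-key values for one join column."""

    def __init__(self, max_distinct_values=10000):
        self.max_distinct_values = max_distinct_values
        self.values = set()
        self.overflowed = False

    def add(self, join_key):
        if self.overflowed:
            return
        self.values.add(join_key)
        if len(self.values) > self.max_distinct_values:
            # Too many distinct values: applying the filter would cost more
            # than it saves, so expose "no filter" instead.
            self.overflowed = True
            self.values = set()

    def finish(self):
        """Return the collected value set, or None meaning "accept everything"."""
        return None if self.overflowed else self.values
```

&lt;p&gt;Keeping a cap on the collected set matters because a dynamic filter over millions of distinct values would be expensive to apply and unlikely to prune much.&lt;/p&gt;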

&lt;p&gt;Since the probe-side scan operators run concurrently with the build-side collection, we don’t block the first probe-side 
splits, but allow them to be processed while dynamic filter collection is in progress.&lt;/p&gt;
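
&lt;p&gt;A minimal sketch of this non-blocking hand-off (illustrative Python; the names are assumptions, not Trino’s actual classes): each probe-side split checks whether the filter is ready and applies it only if so, never waiting on the build side:&lt;/p&gt;

```python
class DynamicFilter:
    """Holds the build-side value set once collection completes."""

    def __init__(self):
        self._values = None
        self._ready = False

    def complete(self, values):
        self._values = values
        self._ready = True

    def current(self):
        # Never block: return the filter if ready, None otherwise.
        return self._values if self._ready else None


def scan_split(rows, dynamic_filter):
    """Process one probe-side split, applying the filter only if available."""
    values = dynamic_filter.current()
    if values is None:
        return list(rows)  # filter not ready yet: process the split unfiltered
    return [row for row in rows if row in values]
```

&lt;p&gt;Early splits may scan unfiltered, but correctness is unaffected since the lookup-join still discards non-matching rows; later splits skip that work at the scan.&lt;/p&gt;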

&lt;p&gt;The lookup-join operator is not changed, but the optimization above allows it to process far fewer probe-side rows while 
producing the same result.&lt;/p&gt;

&lt;h1 id=&quot;benchmarks&quot;&gt;Benchmarks&lt;/h1&gt;

&lt;p&gt;We ran TPC-DS queries on a 3-node i3.metal Varada cluster using TPC-DS scale 1000 data.
The following queries benefit the most from our dynamic filtering implementation (measuring the elapsed time in seconds).&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Query&lt;/th&gt;
      &lt;th&gt;Dynamic filtering &amp;amp; CBO&lt;/th&gt;
      &lt;th&gt;Only CBO&lt;/th&gt;
      &lt;th&gt;No CBO&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q10.sql&quot;&gt;q10&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;2.5&lt;/td&gt;
      &lt;td&gt;8.9&lt;/td&gt;
      &lt;td&gt;10.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q20.sql&quot;&gt;q20&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;3.9&lt;/td&gt;
      &lt;td&gt;12.6&lt;/td&gt;
      &lt;td&gt;26.7&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q31.sql&quot;&gt;q31&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;6.5&lt;/td&gt;
      &lt;td&gt;34.8&lt;/td&gt;
      &lt;td&gt;41.5&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q32.sql&quot;&gt;q32&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;6.9&lt;/td&gt;
      &lt;td&gt;23.0&lt;/td&gt;
      &lt;td&gt;29.7&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q34.sql&quot;&gt;q34&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;3.1&lt;/td&gt;
      &lt;td&gt;11.4&lt;/td&gt;
      &lt;td&gt;14.1&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q69.sql&quot;&gt;q69&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;2.7&lt;/td&gt;
      &lt;td&gt;8.9&lt;/td&gt;
      &lt;td&gt;9.9&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q71.sql&quot;&gt;q71&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;9.9&lt;/td&gt;
      &lt;td&gt;91.8&lt;/td&gt;
      &lt;td&gt;107.4&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q77.sql&quot;&gt;q77&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;3.5&lt;/td&gt;
      &lt;td&gt;17.9&lt;/td&gt;
      &lt;td&gt;18.1&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q96.sql&quot;&gt;q96&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;1.9&lt;/td&gt;
      &lt;td&gt;8.0&lt;/td&gt;
      &lt;td&gt;10.2&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q98.sql&quot;&gt;q98&lt;/a&gt;&lt;/td&gt;
      &lt;td&gt;5.8&lt;/td&gt;
      &lt;td&gt;26.5&lt;/td&gt;
      &lt;td&gt;57.1&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/dynamic-filtering/benchmark.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;For example, running the &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q71.sql&quot;&gt;TPC-DS q71 query&lt;/a&gt; 
results in ~9x performance improvement:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Dynamic filtering&lt;/th&gt;
      &lt;th&gt;Enabled&lt;/th&gt;
      &lt;th&gt;Disabled&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Elapsed (sec)&lt;/td&gt;
      &lt;td&gt;10&lt;/td&gt;
      &lt;td&gt;92&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;CPU (min)&lt;/td&gt;
      &lt;td&gt;14&lt;/td&gt;
      &lt;td&gt;127&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Data read (GB)&lt;/td&gt;
      &lt;td&gt;11&lt;/td&gt;
      &lt;td&gt;112&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h1 id=&quot;discussion&quot;&gt;Discussion&lt;/h1&gt;

&lt;p&gt;These queries join large “sales” fact tables with much smaller, filtered dimension tables (e.g. “items”, “customers”, “stores”), 
which makes dynamic filtering especially effective.&lt;/p&gt;

&lt;p&gt;Note that we rely on the fact that our connector allows efficient run-time filtering of the probe-side table, by using an inline index 
for every column of each split.&lt;/p&gt;

&lt;p&gt;We also rely on the CBO and its statistics estimation to correctly convert the join distribution type to a “broadcast” join. Since the current statistics 
estimation doesn’t support all query plans, this optimization cannot currently be applied for some types of 
&lt;a href=&quot;https://github.com/trinodb/trino/blob/58b86da0eda9d479d418d9752b8cdd4d2c44d9ae/presto-main/src/main/java/io/prestosql/cost/AggregationStatsRule.java&quot;&gt;aggregations&lt;/a&gt; 
(e.g. &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q19.sql&quot;&gt;TPC-DS q19 query&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;In addition, our current dynamic filtering doesn’t support multiple join operators in the same stage, so there are some TPC-DS queries 
(e.g. &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q13.sql&quot;&gt;q13&lt;/a&gt;) 
that may be optimized further.&lt;/p&gt;

&lt;h1 id=&quot;future-work&quot;&gt;Future work&lt;/h1&gt;

&lt;p&gt;The implementation above is currently in the process of being &lt;a href=&quot;https://github.com/trinodb/trino/pull/931&quot;&gt;reviewed&lt;/a&gt; and will be 
available in a release soon. In addition, we intend to improve the existing implementation to resolve the limitations described above, 
and to support more join patterns.&lt;/p&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot;&gt;
      &lt;p&gt;Initially we had experimented with adding &lt;a href=&quot;https://github.com/trinodb/trino/blob/1afbe98bb1eebfcf9050efa5c9a6bb6ccad80c8c/presto-spi/src/main/java/io/prestosql/spi/connector/ConnectorMetadata.java#L527-L533&quot;&gt;Index Join support&lt;/a&gt; to our connector, but since it requires a global index and efficient lookups for high performance, we switched to the dynamic filtering approach. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content>

      
        <author>
          <name>Roman Zeyde</name>
        </author>
      

      <summary>By using dynamic filtering via run-time predicate pushdown, we can significantly optimize highly-selective inner-joins.</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 315</title>
      <link href="https://trino.io/blog/2019/06/15/release-315.html" rel="alternate" type="text/html" title="Release 315" />
      <published>2019-06-15T00:00:00+00:00</published>
      <updated>2019-06-15T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/06/15/release-315</id>
      <content type="html" xml:base="https://trino.io/blog/2019/06/15/release-315.html">&lt;p&gt;This version adds support for
&lt;a href=&quot;https://trino.io/docs/current/sql/select.html#limit-or-fetch-first-clauses&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST ... WITH TIES&lt;/code&gt;&lt;/a&gt;
syntax, locality awareness to the default scheduler for better workload balancing, the new
&lt;a href=&quot;https://trino.io/docs/current/functions/conversion.html#format&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;format()&lt;/code&gt;&lt;/a&gt; function,
and improved support for ORC bloom filters. Additionally, connectors can now provide
view definitions, which opens up several new use cases.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-315.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version adds support for FETCH FIRST ... WITH TIES syntax, locality-awareness to default scheduler for better workload balancing, the new format() function, and improved support for ORC bloom filters. Additionally, connectors can now provide view definitions, which opens up several new use cases. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 314</title>
      <link href="https://trino.io/blog/2019/06/08/release-314.html" rel="alternate" type="text/html" title="Release 314" />
      <published>2019-06-08T00:00:00+00:00</published>
      <updated>2019-06-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/06/08/release-314</id>
      <content type="html" xml:base="https://trino.io/blog/2019/06/08/release-314.html">&lt;p&gt;This version adds support for reading ZSTD and LZ4-compressed Parquet data
and writing ZSTD-compressed ORC data, improves compatibility with the Hive
2.3+ metastore, supports mixed-case field names in Elasticsearch, adds JSON
output format for the CLI, and improves the rendering of the plan structure
in &lt;a href=&quot;https://trino.io/docs/current/sql/explain.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt;&lt;/a&gt; output.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-314.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version adds support for reading ZSTD and LZ4-compressed Parquet data and writing ZSTD-compressed ORC data, improves compatibility with the Hive 2.3+ metastore, supports mixed-case field names in Elasticsearch, adds JSON output format for the CLI, and improves the rendering of the plan structure in EXPLAIN output. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Apache Phoenix Connector</title>
      <link href="https://trino.io/blog/2019/06/04/phoenix-connector.html" rel="alternate" type="text/html" title="Apache Phoenix Connector" />
      <published>2019-06-04T00:00:00+00:00</published>
      <updated>2019-06-04T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/06/04/phoenix-connector</id>
      <content type="html" xml:base="https://trino.io/blog/2019/06/04/phoenix-connector.html">&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-312.html&quot;&gt;Presto 312&lt;/a&gt;
introduces a new &lt;a href=&quot;https://trino.io/docs/current/connector/phoenix.html&quot;&gt;Apache Phoenix Connector&lt;/a&gt;, 
which allows Presto to query data stored in &lt;a href=&quot;https://hbase.apache.org/&quot;&gt;HBase&lt;/a&gt;
using &lt;a href=&quot;https://phoenix.apache.org/&quot;&gt;Apache Phoenix&lt;/a&gt;.  This unlocks new capabilities that previously
weren’t possible with Phoenix alone, such as federation (querying of multiple Phoenix clusters) and
joining Phoenix data with data from other Presto data sources.&lt;/p&gt;

&lt;h1 id=&quot;setup&quot;&gt;Setup&lt;/h1&gt;
&lt;p&gt;To get started, simply drop in a new catalog properties file, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;etc/catalog/phoenix.properties&lt;/code&gt;,
which defines the following:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;connector.name=phoenix
phoenix.connection-url=jdbc:phoenix:host1,host2,host3:2181:/hbase
phoenix.config.resources=/path/to/hbase-site.xml
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;phoenix.connection-url&lt;/code&gt; is the standard Phoenix connection string, which contains the zookeeper
quorum host information and root zookeeper node.&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;phoenix.config.resources&lt;/code&gt; is a comma-separated list of configuration files, used to specify any
&lt;a href=&quot;https://phoenix.apache.org/tuning.html&quot;&gt;custom connection properties&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;schema&quot;&gt;Schema&lt;/h1&gt;
&lt;p&gt;For the most part, data types in Phoenix match up with those in Presto, with a few
&lt;a href=&quot;https://trino.io/docs/current/connector/phoenix.html#data-types&quot;&gt;minor exceptions&lt;/a&gt;.  One thing
to note, however, is that tables in Phoenix require a primary key, whereas Presto has no concept of
primary keys.  To handle this, the Phoenix connector uses a table property to specify the primary key. 
For example, consider the following statement in Phoenix:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;example&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;pk_part_1&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;pk_part_2&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;CONSTRAINT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pk&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;PRIMARY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;KEY&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pk_part_1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pk_part_2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The equivalent statement in Presto would look something like:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;phoenix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;default&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;example&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;pk_part_1&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;pk_part_2&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;rowkeys&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;pk_part_1,pk_part_2&apos;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Additional Phoenix and HBase table properties can be specified in a 
&lt;a href=&quot;https://trino.io/docs/current/connector/phoenix.html#table-properties-phoenix&quot;&gt;similar way&lt;/a&gt;. 
Note also that the default (empty) schema in Phoenix will always map to a Presto schema named “default”.&lt;/p&gt;

&lt;h1 id=&quot;beyond-mapreduce&quot;&gt;Beyond MapReduce&lt;/h1&gt;
&lt;p&gt;When Phoenix users want to run long-running queries that scan over all or most of the data in a table,
they have typically used the Phoenix &lt;a href=&quot;https://phoenix.apache.org/phoenix_mr.html&quot;&gt;MapReduce integration&lt;/a&gt;. 
However, this has limitations, as the document states:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Note: The SELECT query must not perform any aggregation or use DISTINCT as these are not supported by our map-reduce integration.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is because the framework only constructs simple Mappers which scan over each region.  To
do more complex operations like aggregations, the framework would need Reducers as well.
Someone could implement that, but then they would essentially be on the path towards rewriting
Hive from scratch.&lt;/p&gt;

&lt;p&gt;Presto now provides the ability to do these more complex operations.  The Phoenix connector
performs the same filtered scans as the MapReduce framework, but now the Presto engine does
the aggregations, joins, etc.&lt;/p&gt;

&lt;h1 id=&quot;federation&quot;&gt;Federation&lt;/h1&gt;
&lt;p&gt;With the Phoenix connector, querying multiple Phoenix clusters is as easy as querying the
respective catalogs.  As a simple example, suppose we have one cluster in region &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;us-west&lt;/code&gt; and
another cluster in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;us-east&lt;/code&gt;.  If we create two catalog files, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;phoenix_west.properties&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;phoenix_east.properties&lt;/code&gt;, then we can query both:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;us-west&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;region&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;phoenix_west&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;default&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;example&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;UNION&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;us-east&apos;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;region&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;phoenix_east&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;default&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;example&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;joining-with-other-data-sources&quot;&gt;Joining with other data sources&lt;/h1&gt;
&lt;p&gt;Another nice feature of Presto is the ability to join data in Phoenix with other data sources.
Suppose we have the following tables:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;customer (
  custkey bigint,
  comment varchar,
  ...
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;orders (
  orderkey bigint,
  custkey bigint,
  totalprice double,
  ...
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Suppose further that:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Either table can hold large amounts of data&lt;/li&gt;
  &lt;li&gt;The customer &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;comment&lt;/code&gt; field can change frequently&lt;/li&gt;
  &lt;li&gt;We want to be able to query for orders with a certain &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;totalprice&lt;/code&gt; range, and join with the
customer table to get the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;comment&lt;/code&gt; for these orders&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Phoenix/HBase is a row-oriented storage solution with very fast lookup by primary key.  On the
other hand, ORC is a column-oriented file format that can filter results by column value very
efficiently.  So in this use case, it might make sense to store the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer&lt;/code&gt; table in Phoenix
with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;custkey&lt;/code&gt; as the primary key, and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;orders&lt;/code&gt; table in ORC, perhaps in an object store like
S3.  We can then use Presto to leverage the strengths of each of our data stores and combine OLTP
with OLAP:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;comment&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;totalprice&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;phoenix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;INNER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;totalprice&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;totalprice&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custkey&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;insertingupdating-data&quot;&gt;Inserting/Updating data&lt;/h1&gt;
&lt;p&gt;In the prior example, since our &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;customer&lt;/code&gt; data is coming from Phoenix, our OLTP store, we can
easily insert new data:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;phoenix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tpch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customer&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;101&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;some comment&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Since Presto’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSERT&lt;/code&gt; translates to Phoenix’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPSERT&lt;/code&gt;, inserting is the same as updating - i.e.
if there’s already a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;custkey&lt;/code&gt; of 101, then the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;comment&lt;/code&gt; will get updated instead.&lt;/p&gt;
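
&lt;p&gt;The upsert semantics can be illustrated with a map keyed by the primary key (a Python sketch; the table and column names mirror the example above):&lt;/p&gt;

```python
# The customer "table" modeled as a dict keyed by the primary key (custkey).
customer = {}

def upsert(custkey, comment):
    """INSERT and UPDATE are the same operation: the last write per key wins."""
    customer[custkey] = comment

upsert(101, 'some comment')
upsert(101, 'a newer comment')  # same custkey: updates the existing row
```

&lt;p&gt;After both writes, the table still holds a single row for &lt;code&gt;custkey&lt;/code&gt; 101, now carrying the newer comment.&lt;/p&gt;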

&lt;h1 id=&quot;future-work&quot;&gt;Future work&lt;/h1&gt;
&lt;p&gt;With upcoming improvements to Presto, there will be opportunities to further optimize the performance
of the Phoenix connector.&lt;/p&gt;

&lt;p&gt;One of the biggest ways Phoenix optimizes performance is through the use of 
&lt;a href=&quot;https://www.3pillarglobal.com/insights/hbase-coprocessors&quot;&gt;HBase coprocessors&lt;/a&gt;, which allow custom
code to be run on each regionserver.  For example, to do aggregations, Phoenix runs a partial
aggregation in the coprocessor of each table region, and the result for each region is then passed
back to the client for a final aggregation.  That way, the table data itself doesn’t need to be
sent from each region to the client - just the partial aggregation result.  However, currently only
filters are pushed down to the Phoenix connector.  With the ongoing work in Presto to support more
&lt;a href=&quot;https://github.com/trinodb/trino/issues/18&quot;&gt;complex pushdown&lt;/a&gt; to connectors, we will be able to
push down operations like aggregations to the Phoenix connector, which in turn can push them further
down to the HBase coprocessors.&lt;/p&gt;
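
&lt;p&gt;The two-phase aggregation can be sketched as follows (illustrative Python, not Phoenix’s actual coprocessor API): each region computes a partial result over its own rows, and only those small partial results travel to the client for the final merge:&lt;/p&gt;

```python
def partial_sum(region_rows):
    """Partial step: runs inside each region's coprocessor over its own rows."""
    return sum(region_rows)

def final_sum(partials):
    """Final step: the client merges one small partial result per region."""
    return sum(partials)

# Only two numbers cross the wire instead of all six rows:
regions = [[1, 2, 3], [4, 5, 6]]
partials = [partial_sum(rows) for rows in regions]
total = final_sum(partials)
```

&lt;p&gt;The same split applies to any aggregate with a mergeable partial state (count, min/max, sum), which is why pushing aggregations down to the coprocessors saves so much data movement.&lt;/p&gt;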

&lt;p&gt;Another area of potential improvement is integration with Presto’s 
&lt;a href=&quot;https://www.starburstdata.com/technical-blog/introduction-to-presto-cost-based-optimizer/&quot;&gt;cost-based optimizer&lt;/a&gt;,
which can analyze table statistics to do things like join reordering. Phoenix already supports
&lt;a href=&quot;https://phoenix.apache.org/update_statistics.html&quot;&gt;statistics collection&lt;/a&gt;, with more improvements
underway, so this is just a matter of integrating with the Presto statistics framework.&lt;/p&gt;

&lt;h1 id=&quot;questions&quot;&gt;Questions?&lt;/h1&gt;
&lt;p&gt;If you have any questions about the connector, or Phoenix in general, feel free to ask on the
Phoenix dev mailing list: &lt;a href=&quot;mailto:dev@phoenix.apache.org&quot;&gt;dev@phoenix.apache.org&lt;/a&gt;.&lt;/p&gt;</content>

      
        <author>
          <name>Vincent Poon</name>
        </author>
      

      <summary>Presto 312 introduces a new Apache Phoenix Connector, which allows Presto to query data stored in HBase using Apache Phoenix. This unlocks new capabilities that previously weren’t possible with Phoenix alone, such as federation (querying of multiple Phoenix clusters) and joining Phoenix data with data from other Presto data sources.</summary>

      
      
    </entry>
  
    <entry>
      <title>Removing redundant ORDER BY</title>
      <link href="https://trino.io/blog/2019/06/03/redundant-order-by.html" rel="alternate" type="text/html" title="Removing redundant ORDER BY" />
      <published>2019-06-03T00:00:00+00:00</published>
      <updated>2019-06-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/06/03/redundant-order-by</id>
      <content type="html" xml:base="https://trino.io/blog/2019/06/03/redundant-order-by.html">&lt;p&gt;Optimizers are all about doing work in the most cost-effective manner and avoiding unnecessary work.
Some SQL constructs such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; do not affect query results in many situations, and can negatively
affect performance unless the optimizer is &lt;em&gt;smart enough&lt;/em&gt; to remove them.&lt;/p&gt;

&lt;p&gt;Until very recently, Presto would insert a sorting step for each &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clause in a query. This, combined
with users and tools inadvertently using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; in places that have no effect, could result in severe
performance degradation and waste of resources. We finally fixed this in
&lt;a href=&quot;https://trino.io/docs/current/release/release-312.html&quot;&gt;Presto 312&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;Quoting from the SQL specification (ISO 9075 Part 2):&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;A &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;query expression&amp;gt;&lt;/code&gt; can contain an optional &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;order by clause&amp;gt;&lt;/code&gt;. The ordering of the rows of the table
 specified by the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;query expression&amp;gt;&lt;/code&gt; is guaranteed only for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;query expression&amp;gt;&lt;/code&gt; that immediately 
 contains the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;order by clause&amp;gt;&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This means that a query engine is free to ignore any &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clause that doesn’t fit that context. Let’s consider
some examples where the clause is irrelevant.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;some_table&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;another_table&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;field&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;While this query has the semblance of creating a sorted table, that’s not so. Tables in SQL are inherently
unordered. Once the data is written, there’s no guarantee it will come out sorted when read. This is 
particularly true for a parallel, distributed query engine like Presto that reads and processes data using
many threads simultaneously. Note that some storage engines may store data sorted, but that is not controlled
during data insertion. Executing the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; just causes the query to perform poorly due to reduced 
parallelism in the merging step of a distributed sort, and consumes more CPU and memory to sort the data.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;some_table&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;another_table&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;u&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;some_table&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;key&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;u&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;key&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In this case, whether the tables involved in the join are sorted doesn’t matter, since Presto is going to 
build a hash lookup table out of one of them to execute the join operation. As in the previous example,
preserving the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; just causes the query to perform poorly.&lt;/p&gt;

&lt;p&gt;When &lt;em&gt;does&lt;/em&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; matter? Since it is “guaranteed only for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;query expression&amp;gt;&lt;/code&gt; that immediately 
contains the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;order by clause&amp;gt;&lt;/code&gt;”, only operations that are part of the same &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;query expression&amp;gt;&lt;/code&gt; are 
sensitive to it.&lt;/p&gt;

&lt;p&gt;A query expression is a block with the following structure:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;lt;query expression&amp;gt; ::=
  [ &amp;lt;with clause&amp;gt; ] 
  &amp;lt;query expression body&amp;gt;
  [ &amp;lt;order by clause&amp;gt; ] 
  [ &amp;lt;result offset clause&amp;gt; ] 
  [ &amp;lt;fetch first clause&amp;gt; ]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;query expression body&amp;gt;&lt;/code&gt; devolves into one of the set operations (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNION&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INTERSECT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXCEPT&lt;/code&gt;), 
a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; construct, or a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VALUES&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TABLE&lt;/code&gt; clause.&lt;/p&gt;

&lt;p&gt;The only operations that occur after an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST&lt;/code&gt; (a.k.a., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt;) and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt;. So, 
unless a subquery contains one of these two clauses, the query engine is free to remove the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; 
clause without breaking the semantics dictated by the specification.&lt;/p&gt;

&lt;p&gt;Here’s an example where the clause is meaningful:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;some_table&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;field&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; 
    &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;another_table&lt;/span&gt; 
    &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; 
    &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Other databases tackle this in a variety of ways. &lt;a href=&quot;https://mariadb.com/kb/en/library/why-is-order-by-in-a-from-subquery-ignored/&quot;&gt;MariaDB&lt;/a&gt;
and &lt;a href=&quot;https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.remove.orderby.in.subquery&quot;&gt;Hive 3.0&lt;/a&gt;
will ignore redundant &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clauses. SQL Server, on the other hand, will produce an error:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table
expressions, unless TOP or FOR XML is also specified.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&quot;whats-the-catch&quot;&gt;What’s the catch?&lt;/h2&gt;

&lt;p&gt;It is a common mistake for users to think the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clause has a meaning in the language regardless of where it 
appears in a query. The fact that, for implementation reasons, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; is significant for Presto in some cases 
complicates matters. We often see users rely on this when formulating queries where aggregation or window functions 
are sensitive to the order of their inputs:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;array_agg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nation&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;row_number&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OVER&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nation&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The Right Way™ of doing this in SQL is to use the aggregation or window-specific &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clause. For the
examples above:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;array_agg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nation&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;row_number&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OVER&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nation&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In order to ease the transition, the new behavior can be turned off globally via the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;optimizer.skip-redundant-sort&lt;/code&gt;
configuration option or on a per-session basis via the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;skip_redundant_sort&lt;/code&gt; session property. 
These options will be removed in a future version.&lt;/p&gt;
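&lt;p&gt;For example, the old behavior can be restored for a single session as follows (shown only to illustrate the
session property syntax; the property names are the ones listed above):&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SET SESSION skip_redundant_sort = false;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Setting &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;optimizer.skip-redundant-sort=false&lt;/code&gt; in the configuration properties has the same effect for all queries.&lt;/p&gt;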

&lt;p&gt;Additionally, any time Presto detects a redundant &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ORDER BY&lt;/code&gt; clause, it will warn users about it:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/redundant-order-by/redundant-order-by.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso</name>
        </author>
      

      <summary>Optimizers are all about doing work in the most cost-effective manner and avoiding unnecessary work. Some SQL constructs such as ORDER BY do not affect query results in many situations, and can negatively affect performance unless the optimizer is smart enough to remove them.</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 313</title>
      <link href="https://trino.io/blog/2019/06/01/release-313.html" rel="alternate" type="text/html" title="Release 313" />
      <published>2019-06-01T00:00:00+00:00</published>
      <updated>2019-06-01T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/06/01/release-313</id>
      <content type="html" xml:base="https://trino.io/blog/2019/06/01/release-313.html">&lt;p&gt;This version fixes incorrect results for queries involving &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUPING SETS&lt;/code&gt;
and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LIMIT&lt;/code&gt;, fixes selecting the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UUID&lt;/code&gt; type from the CLI and JDBC driver,
and adds support for compression and encryption when using
&lt;a href=&quot;https://trino.io/docs/current/admin/spill.html&quot;&gt;Spill to Disk&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-313.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version fixes incorrect results for queries involving GROUPING SETS and LIMIT, fixes selecting the UUID type from the CLI and JDBC driver, and adds support for compression and encryption when using Spill to Disk. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Using Precomputed Hash in SemiJoin Operations</title>
      <link href="https://trino.io/blog/2019/05/30/semijoin-precomputed-hasd.html" rel="alternate" type="text/html" title="Using Precomputed Hash in SemiJoin Operations" />
      <published>2019-05-30T00:00:00+00:00</published>
      <updated>2019-05-30T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/30/semijoin-precomputed-hasd</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/30/semijoin-precomputed-hasd.html">&lt;p&gt;Queries involving &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NOT IN&lt;/code&gt; over a subquery are much faster in 
&lt;a href=&quot;https://trino.io/docs/current/release/release-312.html&quot;&gt;Presto 312&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/semijoin-precomputed-hash/semijoin-precomputed-hash-gains.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;We ran the benchmark above with 3 workers (r3.2xlarge) and 1 coordinator (r3.xlarge) on 
TPC-DS scale 1000 stored in ORC format using the following queries:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;store_sales&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;store_sales&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ss_customer_sk&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;IN&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c_customer_sk&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customer&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;store_sales&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;store_sales&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ss_store_sk&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;IN&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s_store_sk&lt;/span&gt; 
    &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;store&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s_hours&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;8AM-4PM&apos;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;what-was-the-improvement&quot;&gt;What was the improvement?&lt;/h1&gt;

&lt;p&gt;We found that the optimization to use precomputed hashes, which is enabled by 
default, was missing in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SemiJoin&lt;/code&gt; operator. Hash values were precomputed at the leaf 
stages, but they were not being used in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SemiJoin&lt;/code&gt; operator, leading to recalculation 
of the hash values in that operator. Since queries involving &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IN&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NOT IN&lt;/code&gt; over a 
subquery use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SemiJoin&lt;/code&gt; operator, &lt;a href=&quot;https://github.com/trinodb/trino/pull/767&quot;&gt;the fix to use the precomputed hash in the SemiJoin operator&lt;/a&gt; 
improves the performance of such queries significantly.&lt;/p&gt;

&lt;h1 id=&quot;how-does-optimize-hash-generation-optimization-work&quot;&gt;How does the &lt;em&gt;optimize-hash-generation&lt;/em&gt; optimization work?&lt;/h1&gt;

&lt;p&gt;Presto divides a query plan into parts called stages, which can run in parallel on 
multiple nodes, each node working on a different set of data. There are two types of stages:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Leaf stages: the stages at the leaves of the query plan, which read 
data from a data source, like a Hive table.&lt;/li&gt;
  &lt;li&gt;Intermediate stages: all stages other than the leaf stages, which process 
data from upstream stages.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Exchange&lt;/code&gt; operator shuffles and transfers the output from upstream stages to the 
intermediate stages. For certain operators like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JOIN&lt;/code&gt;, output data of 
the leaf stage is partitioned by the values of a column, and the shuffle operation ensures 
that a particular partition is always processed by the same task of the intermediate stage. 
This partitioning requires calculating a hash on that column’s values during the exchange, 
and later, in the intermediate stage, the same hash is needed during the execution of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; 
or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JOIN&lt;/code&gt; operation. To prevent redundant calculations, Presto calculates this hash value 
in the leaf stage, uses it in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Exchange&lt;/code&gt; operator, and makes it available in the output so that 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GROUP BY&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JOIN&lt;/code&gt; operations can use it in the intermediate stage.&lt;/p&gt;

&lt;p&gt;Consider this query to count the number of stores per city:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;city&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stores&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;city&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The query plan (simplified) and its division into stages look like this:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/semijoin-precomputed-hash/query-plan.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The leaf stage (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Stage2&lt;/code&gt;) reads the table from a data source, feeds the partially 
aggregated data to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Stage1&lt;/code&gt; where final aggregation happens, and finally, the result is available 
via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Stage0&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Each row produced by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Stage2&lt;/code&gt; needs to be partitioned by the value of its &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;city&lt;/code&gt; column to ensure 
data for the same city is processed by the same task of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Stage1&lt;/code&gt;. After the exchange, when a row is consumed 
in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Stage1&lt;/code&gt;, it needs to be hashed again to find the group for the row, so that the final aggregation 
accumulates results for each city in its corresponding group bucket. Calculating the hash twice on 
the values of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;city&lt;/code&gt; column is prevented by doing the calculation once while reading the data and then 
using it in both the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Exchange&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Final Aggregation&lt;/code&gt; operations, which reduces the CPU usage of the query. 
Additionally, pushing this calculation into the leaf stage, which is better parallelized when there is 
a large number of splits for that stage, improves query latency.&lt;/p&gt;
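&lt;p&gt;One way to observe the optimization is to compare the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN&lt;/code&gt; output for the query above with the
optimization on and off. The session property name below is assumed to follow Presto’s usual convention of
deriving session property names from the config property name:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SET SESSION optimize_hash_generation = true;

EXPLAIN
SELECT count(*), city
FROM stores
GROUP BY city;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With the optimization enabled, the plan shows a precomputed hash column produced in the table scan stage and
carried through the exchange into the aggregation, instead of being recomputed in each stage.&lt;/p&gt;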

&lt;h1 id=&quot;how-to-get-this-fix&quot;&gt;How to get this fix?&lt;/h1&gt;

&lt;p&gt;This fix is available in Presto version 312 and above. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;optimize-hash-generation&lt;/code&gt; setting is enabled 
by default, so the fix takes effect as soon as you upgrade your Presto installation.&lt;/p&gt;

      
        <author>
          <name>Shubham Tagra, Qubole</name>
        </author>
      

      <summary>Queries involving IN and NOT IN over a subquery are much faster in Presto 312.</summary>

      
      
    </entry>
  
    <entry>
      <title>Improved Hive Bucketing</title>
      <link href="https://trino.io/blog/2019/05/29/improved-hive-bucketing.html" rel="alternate" type="text/html" title="Improved Hive Bucketing" />
      <published>2019-05-29T00:00:00+00:00</published>
      <updated>2019-05-29T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/29/improved-hive-bucketing</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/29/improved-hive-bucketing.html">&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-312.html&quot;&gt;Presto 312&lt;/a&gt;
adds support for the more flexible bucketing introduced in recent
versions of Hive. Specifically, it allows any number of files per bucket,
including zero. This allows inserting data into an existing partition without
having to rewrite the entire partition, and improves the performance of
writes by not requiring the creation of files for empty buckets.&lt;/p&gt;

&lt;h1 id=&quot;hive-bucketing-overview&quot;&gt;Hive bucketing overview&lt;/h1&gt;

&lt;p&gt;Hive bucketing is a simple form of hash partitioning. A table is bucketed
on one or more columns with a fixed number of hash buckets. For example,
a table definition in Presto syntax looks like this:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;page_views&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;user_id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;page_url&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;date&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;partitioned_by&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ARRAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;dt&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;bucketed_by&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ARRAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;user_id&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;bucket_count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The bucketing happens within each partition of the table (or across the entire
table if it is not partitioned). In the above example, the table is partitioned
by date and is declared to have &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;50&lt;/code&gt; buckets using the user ID column. This
means that the table will have &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;50&lt;/code&gt; buckets for each date. The assigned bucket
for each row is determined by hashing the user ID value, so all rows with the
same user ID will go into the same bucket.&lt;/p&gt;
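&lt;p&gt;Conceptually, the bucket assignment is a simple hash partitioning function (an illustration only; the actual
hash function used by Hive depends on the column types and the Hive version):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;bucket = hash(user_id) mod bucket_count
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;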

&lt;h1 id=&quot;original-hive-bucketing&quot;&gt;Original Hive bucketing&lt;/h1&gt;

&lt;p&gt;Originally, Hive required exactly one file per bucket. The files were named
such that the bucket number was implicit based on the file’s position within
the lexicographic ordering of the file names. For example, each of the following
lists of files represents buckets &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2&lt;/code&gt;, respectively:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;00000_0
00001_0
00002_0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;file0
file3
file5
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;bucketA
bucketB
bucketD
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The file names are meaningless aside from their ordering with respect to the
other file names.&lt;/p&gt;

&lt;h1 id=&quot;whats-the-problem&quot;&gt;What’s the problem?&lt;/h1&gt;

&lt;p&gt;The original Hive bucketing scheme has a couple of problems:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Inserting data into the table by adding additional files is not possible.
Instead, an insert operation requires rewriting all of the existing files,
which can be quite expensive.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;If the data is sparse, some of the buckets might be empty, but because there
must be a file for every bucket, the writer must create an empty file for
each bucket. Some file formats, such as ORC, support zero-byte files as empty
files. Other formats require writing a file with a valid header and footer.
Creating these files adds latency to the write operation, and storing these
tiny files is inefficient for file systems like HDFS, which are designed for
large files.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;improved-hive-bucketing&quot;&gt;Improved Hive bucketing&lt;/h1&gt;

&lt;p&gt;Newer versions of Hive support a bucketing scheme where the bucket number is
included in the file name. This is the same naming scheme that Hive has always
used, so it is backward compatible with existing data. The naming convention
places the bucket number at the start of the file name, padded with leading
zeros.&lt;/p&gt;

&lt;p&gt;The following list of files shows what data written by Hive might look like for
a table with a bucket count of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;4&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;000000_0            # bucket 0
000000_0_copy_1     # bucket 0
000000_0_copy_2     # bucket 0
000001_0            # bucket 1
000001_0_copy_1     # bucket 1
000003_0            # bucket 3
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We can see that there are multiple files for buckets &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1&lt;/code&gt;, one file for
bucket &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;3&lt;/code&gt;, and no files for bucket &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Unfortunately, Presto used a different naming convention that was valid
according to the lexicographical ordering requirement, but not the newer
explicit numbering convention. File names written by Presto used to look
like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;20180102_030405_00641_x1y2z_bucket-00234
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;20180102_030405_00641_x1y2z&lt;/code&gt; value at the start of the file name
is the Presto query ID for the query that wrote the data. This is followed
by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bucket-&lt;/code&gt; plus the padded bucket number. Presto now writes file names
that match the new Hive naming convention, with the bucket number at the
start and the query ID at the end:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;000234_0_20180102_030405_00641_x1y2z
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When reading bucketed tables, Presto supports both the new Hive convention
and the old Presto convention. Additionally, it still supports the original
Hive scheme when the files do not match either of the naming conventions,
keeping the requirement that there must be exactly one file per bucket.&lt;/p&gt;
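Supporting all three schemes means inferring the bucket number from the shape of the file name. A rough sketch of that inference in Python (the regular expressions and fallback here are illustrative; Presto's actual matching logic differs in detail):

```python
import re

# Illustrative patterns only, not the exact ones Presto uses.
OLD_PRESTO = re.compile(r"^.*_bucket-(\d+)$")  # query ID first, bucket number last
NEW_HIVE = re.compile(r"^(\d+)_\d+")           # zero-padded bucket number first

def bucket_of(file_name):
    # Try the more specific old Presto convention first, then the Hive one.
    for pattern in (OLD_PRESTO, NEW_HIVE):
        match = pattern.match(file_name)
        if match:
            return int(match.group(1))
    # Neither convention matched: fall back to the original position-based
    # scheme, which requires exactly one file per bucket.
    return None

print(bucket_of("20180102_030405_00641_x1y2z_bucket-00234"))  # old Presto name
print(bucket_of("000234_0_20180102_030405_00641_x1y2z"))      # new Hive name
```

Both example names resolve to bucket 234, even though the bucket number sits at opposite ends of the file name in the two conventions.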

&lt;h1 id=&quot;skipping-empty-buckets-for-faster-writes&quot;&gt;Skipping empty buckets for faster writes&lt;/h1&gt;

&lt;p&gt;Now that Hive and Presto no longer require files for empty buckets, Presto
does not need to create them. They are still created by default for
compatibility with earlier versions of Hive, Presto, and other tools, but
we expect to disable this behavior in a future release, making writes faster
by default. You can also disable it now if that works for your environment.
This is controlled by the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.create-empty-bucket-files&lt;/code&gt; configuration
property or the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;create_empty_bucket_files&lt;/code&gt; session property.&lt;/p&gt;</content>

      
        <author>
          <name>David Phillips</name>
        </author>
      

      <summary>Presto 312 adds support for the more flexible bucketing introduced in recent versions of Hive. Specifically, it allows any number of files per bucket, including zero. This allows inserting data into an existing partition without having to rewrite the entire partition, and improves the performance of writes by not requiring the creation of files for empty buckets.</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 312</title>
      <link href="https://trino.io/blog/2019/05/29/release-312.html" rel="alternate" type="text/html" title="Release 312" />
      <published>2019-05-29T00:00:00+00:00</published>
      <updated>2019-05-29T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/29/release-312</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/29/release-312.html">&lt;p&gt;This version has many performance improvements (including
&lt;a href=&quot;/blog/2019/05/21/optimizing-the-casts-away.html&quot;&gt;cast optimization&lt;/a&gt;),
a new &lt;a href=&quot;https://trino.io/docs/current/language/types.html#uuid-type&quot;&gt;UUID&lt;/a&gt; data type
and &lt;a href=&quot;https://trino.io/docs/current/functions/uuid.html#uuid&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uuid()&lt;/code&gt;&lt;/a&gt; function,
a new &lt;a href=&quot;https://trino.io/docs/current/connector/phoenix.html&quot;&gt;Apache Phoenix connector&lt;/a&gt;,
support for the PostgreSQL &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TIMESTAMP WITH TIME ZONE&lt;/code&gt; data type,
support for the MySQL &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JSON&lt;/code&gt; data type,
&lt;a href=&quot;/blog/2019/05/29/improved-hive-bucketing.html&quot;&gt;improved support for Hive bucketed tables&lt;/a&gt;,
and some bug fixes.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-312.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version has many performance improvements (including cast optimization), a new UUID data type and uuid() function, a new Apache Phoenix connector, support for the PostgreSQL TIMESTAMP WITH TIME ZONE data type, support for the MySQL JSON data type, improved support for Hive bucketed tables, and some bug fixes. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Optimizing the Casts Away</title>
      <link href="https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html" rel="alternate" type="text/html" title="Optimizing the Casts Away" />
      <published>2019-05-21T00:00:00+00:00</published>
      <updated>2019-05-21T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/21/optimizing-the-casts-away</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html">&lt;p&gt;The next release of Presto (version 312) will include a new optimization to remove unnecessary casts 
which might have been added implicitly by the query planner or explicitly by users when they wrote the query.&lt;/p&gt;

&lt;p&gt;This is a long post explaining how the optimization works. If you’re only interested in the results,
skip to the &lt;a href=&quot;#results&quot;&gt;last section&lt;/a&gt;. For the full details, read on!&lt;/p&gt;

&lt;script type=&quot;text/javascript&quot; async=&quot;&quot; src=&quot;https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-AMS_CHTML&quot;&gt;
&lt;/script&gt;

&lt;div style=&quot;display:none&quot;&gt;
$$ 
\newcommand\cast[2]{
    \text{cast}_{\text{#1} \rightarrow \text{#2}}
} 
\newcommand\trueOrNull[1]{
  \text{if}(#1 \text{ is null}, \text{null}, \text{true})
} 
\newcommand\falseOrNull[1]{
  \text{if}(#1 \text{ is null}, \text{null}, \text{false})
} 
$$
&lt;/div&gt;

&lt;p&gt;Like many programming languages, SQL allows certain operations between values of different 
types if there are implicit conversions (a.k.a., implicit casts or coercions) between those types.
This improves usability, as it allows writing expressions like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1.5 &amp;gt; 2&lt;/code&gt; without worrying &lt;em&gt;too much&lt;/em&gt;
whether the types are compatible (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1.5&lt;/code&gt; is of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(2,1)&lt;/code&gt;, while &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2&lt;/code&gt; is an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;integer&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;During query analysis and planning, Presto introduces explicit casts for any implicit conversion in the
original query as it translates it into the intermediate query plan representation the engine uses 
internally for optimization and execution. This eliminates a layer of complexity for the optimizer, 
which, as a result, doesn’t need to reason about types (type inference) or worry about whether expressions 
are properly typed.&lt;/p&gt;

&lt;p&gt;More importantly, it simplifies the job of defining and implementing operators (e.g., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt;&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;=&lt;/code&gt;, etc). 
Without implicit conversions, there would need to exist a variant of every operator for every combination
 of compatible types. For example, it would be necessary to have an implementation of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;=&lt;/code&gt; operator for 
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(tinyint, tinyint)&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(tinyint, smallint)&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(tinyint, integer)&lt;/code&gt;, 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(tinyint, bigint)&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(smallint, integer)&lt;/code&gt;, and so on.&lt;/p&gt;

&lt;p&gt;Given two columns, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s :: tinyint&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t :: smallint&lt;/code&gt;, and an expression such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s = t&lt;/code&gt;, the planner 
determines that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt; can be implicitly coerced to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;smallint&lt;/code&gt; and derives the following expression:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CAST(s AS smallint) = t   
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is not without challenges. The predicate pushdown logic relies on simple equality and 
range comparisons to move predicates around, and importantly, to infer that certain predicates
in one branch of a join can be used to constrain the values on the other side of the join. An
expression like the one above is not “simple” from this perspective due to the type conversion 
involved, and it can defeat the (arguably simplistic) predicate inference algorithm.&lt;/p&gt;

&lt;p&gt;Secondly, if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt; is a constant (or an expression that is effectively constant), the engine has to 
convert every value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s&lt;/code&gt; it sees during query execution in order to compare it with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt;. This 
brings up the obvious question: “can’t it somehow convert &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt; and compare directly?”
It would look like:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;s = CAST(t AS tinyint)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Since &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt; is a constant, the term &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CAST(t AS tinyint)&lt;/code&gt; can be trivially pre-computed and reused 
for the entire query. It’s not that simple in the general case, though. Narrowing casts, such 
as a conversion from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;smallint&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt;, or from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;integer&lt;/code&gt;, can fail or alter
the value due to rounding or truncation, so we must take special care to avoid errors or 
change query semantics. We discuss this at length in the sections below.&lt;/p&gt;
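To see concretely why a narrowing cast can't blindly be pushed onto the constant, consider what happens to values that don't survive the round trip. A minimal sketch, using Python's truncating `int()` as a stand-in for a narrowing cast:

```python
# Python's int() truncates toward zero, standing in for a narrowing cast.
value = 1.9
narrowed = int(value)
assert narrowed == 1  # the cast silently altered the value: 1.9 became 1

# A constant outside the narrower type's range has no representation at all;
# a checked cast would have to fail rather than produce a wrong answer.
TINYINT_MIN, TINYINT_MAX = -128, 127
assert not (TINYINT_MIN <= 1000 <= TINYINT_MAX)
```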

&lt;h1 id=&quot;some-properties-of-well-behaved-implicit-casts&quot;&gt;Some properties of (well-behaved) implicit casts&lt;/h1&gt;

&lt;p&gt;Let’s take a short detour and talk briefly about some properties of well-behaved implicit 
casts we can exploit to do the transformation we described in the previous section.&lt;/p&gt;

&lt;p&gt;Since the query engine is free to insert implicit casts wherever it sees fit, these functions
need to follow some ground rules. Failure to do so can result in queries producing incorrect
results due to changes in query semantics.&lt;/p&gt;

&lt;p&gt;Implicit casts need to have the following properties:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Injective_function&quot;&gt;Injective&lt;/a&gt;. Given \(\cast{S}{T}\) every value in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt; 
must map to a distinct value in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt; (this does not imply that every value in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt; has to map to a value 
in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt;, though).&lt;/li&gt;
  &lt;li&gt;Order-preserving. Given \(s_1 \in S\) and \(s_2 \in S\),&lt;/li&gt;
&lt;/ul&gt;

\[\begin{equation}
s_1 = s_2 \quad \Rightarrow \quad \cast{S}{T}(s_1) = \cast{S}{T}(s_2) \\
s_1 &amp;lt; s_2 \quad \Rightarrow \quad \cast{S}{T}(s_1) &amp;lt; \cast{S}{T}(s_2) \\
s_1 &amp;gt; s_2 \quad \Rightarrow \quad \cast{S}{T}(s_1) &amp;gt; \cast{S}{T}(s_2)
\end{equation}\]

&lt;p&gt;For exact numeric types (e.g., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;smallint&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;integer&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal&lt;/code&gt;, etc.), this holds as long as 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt; has enough integer digits to hold the integral part of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt; and enough fractional digits to 
hold the fractional part of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt;.&lt;/p&gt;
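Both properties can be checked mechanically for a given conversion. As a toy illustration, here is a check of injectivity and order preservation for a widening tinyint → decimal(4,1) conversion over the whole tinyint range, modeled with Python's `decimal` module:

```python
from decimal import Decimal

# tinyint -> decimal(4,1): the target has enough integer and fractional
# digits, so both properties should hold over the range [-128, 127].
def widen(s):
    return Decimal(s).quantize(Decimal("0.1"))

widened = [widen(s) for s in range(-128, 128)]

# Injective: distinct inputs map to distinct outputs.
assert len(set(widened)) == len(widened)

# Order-preserving: a strictly increasing input sequence stays strictly
# increasing after the conversion.
assert all(a < b for a, b in zip(widened, widened[1:]))
```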

&lt;p&gt;As an example, the picture below depicts how every value of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt;, which has a range
of \([-128, 127]\), maps to a distinct value of a wider type such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;smallint&lt;/code&gt;. Also, every value 
of the wider type that is within the range of representable values of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt; has a distinct 
mapping to a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt;. So, for the values within the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt; range, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;smallint&lt;/code&gt;
conversion is &lt;a href=&quot;https://en.wikipedia.org/wiki/Bijection&quot;&gt;bijective&lt;/a&gt;. This is not necessary for the 
transformation to work, but it simplifies one of the cases we’ll consider. We’ll cover this more later.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/optimizing-casts/tinyint-integer.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;On the other hand, some conversions such as those between integer types and decimal types with fractional
parts are injective but not bijective, even when excluding the values outside the range of the narrower
 type.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/optimizing-casts/tinyint-decimal.svg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The properties clearly hold for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;smallint&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;integer&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bigint&lt;/code&gt;. They also hold for:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(3,0)&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(4,1)&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(5,2)&lt;/code&gt; → …&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;smallint&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(5,0)&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(6,1)&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(7,2)&lt;/code&gt; → …&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;integer&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(10,0)&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(11,1)&lt;/code&gt; → …&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bigint&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(19,0)&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(20, 1)&lt;/code&gt; → …&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It even works for conversions between exact and approximate numbers, such as:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;smallint&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;real&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;real&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;integer&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It does &lt;em&gt;not&lt;/em&gt; work for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bigint&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;integer&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;real&lt;/code&gt;, or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt; when precision is large
because not all &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bigint&lt;/code&gt;s fit in a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt; (64 bits vs 53-bit mantissa) and not all &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;integer&lt;/code&gt;s fit in a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;real&lt;/code&gt; 
(32 bits vs 23-bit mantissa). Sadly, for legacy reasons Presto allows those conversions implicitly. We “justify” 
it with the argument that “since they are dealing with approximate numerics anyway, and given the conversions only 
lose precision in the least significant part, they are sort of ok”. This is something we’ll revisit in the
future once we have a reasonable story for handling the backward-compatibility
break that removing such conversions would cause.&lt;/p&gt;

&lt;p&gt;Finally, the properties also apply for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varchar&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varchar&lt;/code&gt; conversions:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varchar(0)&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varchar(1)&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varchar(2)&lt;/code&gt; → … → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varchar&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;getting-to-the-point&quot;&gt;Getting to the point…&lt;/h1&gt;

&lt;p&gt;With this in mind, let’s look at the simplest scenario: conversions between integer types.&lt;/p&gt;

&lt;p&gt;As in the example we covered in the introduction, the transformation is straightforward 
when the constant can be represented in the narrower type. Given &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s :: tinyint&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CAST(s AS smallint) = smallint &apos;1&apos;     ⟺  s = tinyint &apos;1&apos;
CAST(s AS smallint) = smallint &apos;127&apos;   ⟺  s = tinyint &apos;127&apos;
CAST(s AS smallint) = smallint &apos;-128&apos;  ⟺  s = tinyint &apos;-128&apos;

CAST(s AS smallint) &amp;gt; smallint &apos;10&apos;    ⟺  s &amp;gt; tinyint &apos;10&apos;
CAST(s AS smallint) &amp;lt; smallint &apos;10&apos;    ⟺  s &amp;lt; tinyint &apos;10&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Of course, when the value is at the edge of the range of the narrower type, we can cleverly 
turn some inequalities into equalities:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CAST(s AS smallint) &amp;gt;= smallint &apos;127&apos;   ⟺  s &amp;gt;= tinyint &apos;127&apos;  
                                        ⟺  s =  tinyint &apos;127&apos;
                                       
CAST(s AS smallint) &amp;lt;= smallint &apos;-128&apos;  ⟺  s &amp;lt;= tinyint &apos;-128&apos;  
                                        ⟺  s =  tinyint &apos;-128&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
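The tightening follows directly from the type's bounds: no tinyint can exceed 127, so `&gt;= 127` pins the value exactly. As a sketch, using a hypothetical helper that assumes the constant is already known to be in range:

```python
TINYINT_MIN, TINYINT_MAX = -128, 127

def tighten(op, constant):
    """Rewrite `s OP constant` when the constant sits at the tinyint edge."""
    if op == ">=" and constant == TINYINT_MAX:
        return ("=", TINYINT_MAX)   # s >= 127 is only satisfiable by s = 127
    if op == "<=" and constant == TINYINT_MIN:
        return ("=", TINYINT_MIN)   # s <= -128 is only satisfiable by s = -128
    return (op, constant)           # nothing to tighten

assert tighten(">=", 127) == ("=", 127)
assert tighten("<=", -128) == ("=", -128)
```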

&lt;p&gt;Additionally, we may be able to tell that an expression is always &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;true&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;false&lt;/code&gt;. Special
care needs to be taken when the value is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt;, though, since in SQL any comparison with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt; 
yields &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CAST(s AS smallint) &amp;gt; smallint &apos;127&apos;    ⟺  s &amp;gt; tinyint &apos;127&apos;  
                                        ⟺  if(s is null, null, false)
                                        
CAST(s AS smallint) &amp;lt;= smallint &apos;127&apos;   ⟺  s &amp;lt;= tinyint &apos;127&apos;  
                                        ⟺  if(s is null, null, true)

CAST(s AS smallint) &amp;lt; smallint &apos;-128&apos;   ⟺  s &amp;lt; tinyint &apos;-128&apos;  
                                        ⟺  if(s is null, null, false)
                                        
CAST(s AS smallint) &amp;gt;= smallint &apos;-128&apos;  ⟺  s &amp;gt;= tinyint &apos;-128&apos;  
                                        ⟺  if(s is null, null, true)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We can make similar inferences when the value is outside the range of possible values
for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt;. For equality comparisons, it’s trivial.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CAST(s AS smallint) = smallint &apos;1000&apos;  ⟺  if(s is null, null, false)    
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Conversely,&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CAST(s AS smallint) &amp;lt;&amp;gt; smallint &apos;1000&apos;  ⟺  if(s is null, null, true)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Just like the earlier cases involving comparisons with values at the edge of the range,
we can apply the same idea when the value falls outside of the range:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CAST(s AS smallint) &amp;lt; smallint &apos;1000&apos;   ⟺  if(s is null, null, true) 
CAST(s AS smallint) &amp;lt; smallint &apos;-1000&apos;  ⟺  if(s is null, null, false)

CAST(s AS smallint) &amp;gt; smallint &apos;1000&apos;   ⟺  if(s is null, null, false) 
CAST(s AS smallint) &amp;gt; smallint &apos;-1000&apos;  ⟺  if(s is null, null, true)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
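These rules are mechanical enough to express as a small lookup. A hypothetical sketch of the out-of-range case for `CAST(s AS smallint) OP constant`, where `s` is a tinyint (the real optimizer rule is general over types, and the strings stand in for the null-preserving `if` expressions above):

```python
TINYINT_MIN, TINYINT_MAX = -128, 127

def simplify_out_of_range(op, constant):
    """Return 'true'/'false' (shorthand for if(s is null, null, ...)),
    or None when the constant is within the tinyint range."""
    if constant > TINYINT_MAX:
        return {"=": "false", "<>": "true",
                "<": "true", "<=": "true",
                ">": "false", ">=": "false"}[op]
    if constant < TINYINT_MIN:
        return {"=": "false", "<>": "true",
                "<": "false", "<=": "false",
                ">": "true", ">=": "true"}[op]
    return None  # in range: other rules apply

assert simplify_out_of_range("<", 1000) == "true"
assert simplify_out_of_range(">", -1000) == "true"
assert simplify_out_of_range("=", 1000) == "false"
```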

&lt;h1 id=&quot;unrepresentable-values&quot;&gt;Unrepresentable values&lt;/h1&gt;

&lt;p&gt;Values that are outside the range of the narrower type may not be the only ones without a mapping. 
For example, for a type such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decimal(2,1)&lt;/code&gt;, any value with a fractional part (e.g., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1.5&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2.3&lt;/code&gt;) cannot 
be represented as a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We can tell whether a value &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt; in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt; is representable in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt; by converting it to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt; and back to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt;. We’ll 
call this value &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&apos;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t &amp;lt;&amp;gt; t&apos;&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt; is not representable in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt;, and similar rules as for out-of-range values apply when the 
expression involves an equality. For example, given &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s :: tinyint&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CAST(s AS double) =  double &apos;1.1&apos;  ⟺  if(s is null, null, false)    
CAST(s AS double) &amp;lt;&amp;gt; double &apos;1.1&apos;  ⟺  if(s is null, null, true)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
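The round-trip test itself is simple. A sketch of the representability check for a double constant against tinyint, assuming truncation as the narrowing behavior (the check works the same way if the cast rounds instead):

```python
def representable_as_tinyint(t):
    """Is the double constant t exactly representable as a tinyint?"""
    if t < -128 or t > 127:
        return False          # out of range: no mapping at all
    t_prime = float(int(t))   # double -> tinyint -> double round trip
    return t_prime == t       # t <> t' means t is not representable

assert representable_as_tinyint(2.0)
assert not representable_as_tinyint(1.1)
```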

&lt;p&gt;When some values in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt; are not representable in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt;, the cast between &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T → S&lt;/code&gt; will generally either truncate
or round. The SQL specification doesn’t mandate which of those alternatives an implementation should follow,
and even allows the behavior to vary across different combinations of types.&lt;/p&gt;

&lt;p&gt;This throws a bit of a wrench in our plans, so to speak. If we can’t tell whether a cast will round or truncate,
how would we know whether a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt;&lt;/code&gt; comparison should turn into a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt;&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt;=&lt;/code&gt; in the resulting expression? To 
illustrate, let’s consider this example. Given &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s :: tinyint&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;CAST(s AS double) &amp;gt; double &apos;1.9&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If the conversion from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt; → &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tinyint&lt;/code&gt; truncates, the expression above is equivalent to:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;s &amp;gt; tinyint &apos;1&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On the other hand, if the conversion rounds, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1.9&lt;/code&gt; becomes &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2&lt;/code&gt;, and the expression is equivalent to:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;s &amp;gt;= tinyint &apos;2&apos;              
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In order to know which operator to use in the transformed expression (e.g., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt;&lt;/code&gt; vs &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt;=&lt;/code&gt;), it is therefore 
crucial to distinguish between those two behaviors. The good news is that there’s a simple and elegant way
out of this hole.&lt;/p&gt;
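&lt;p&gt;The two behaviors are easy to see side by side in Python, whose &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;int()&lt;/code&gt; truncates while &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;round()&lt;/code&gt; rounds. This is a sketch for illustration only; Trino’s actual cast behavior is defined by the engine, not by Python:&lt;/p&gt;

```python
# Python offers both conversion behaviors, making the ambiguity concrete.
# int() truncates toward zero; round() rounds (half to even).
assert int(1.9) == 1    # a truncating cast maps 1.9 to the boundary 1
assert round(1.9) == 2  # a rounding cast maps 1.9 to the boundary 2
```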

&lt;p&gt;An important observation is that we don’t need to know how the conversion behaves &lt;em&gt;in general&lt;/em&gt;, but only how 
it behaves when applied to the constant &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt;. Regardless of whether the conversion truncates or rounds, for a 
given value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt;, the outcome either &lt;em&gt;rounds up&lt;/em&gt; or &lt;em&gt;rounds down&lt;/em&gt;, as depicted below.&lt;/p&gt;

&lt;table&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;img src=&quot;/assets/blog/optimizing-casts/round-down.svg&quot; alt=&quot;&quot; /&gt;&lt;/td&gt;
      &lt;td&gt;&lt;img src=&quot;/assets/blog/optimizing-casts/round-up.svg&quot; alt=&quot;&quot; /&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;We can easily tell which of those scenarios applies by comparing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt; with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&apos;&lt;/code&gt;: if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t &amp;gt; t&apos;&lt;/code&gt;, the operation rounded
down. Conversely, if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t &amp;lt; t&apos;&lt;/code&gt;, it rounded up. If &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t = t&apos;&lt;/code&gt;, the value is representable in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt;, and the rules from the 
previous section apply.&lt;/p&gt;
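&lt;p&gt;This round-trip check is mechanical enough to sketch in a few lines of Python. The helper names and the conversions are hypothetical stand-ins; the real logic lives in the optimizer rule:&lt;/p&gt;

```python
def rounding_direction(t, to_source, to_target):
    """Classify how a conversion treated the constant t by round-tripping it.

    to_source converts T to S, to_target converts S back to T. Both are
    hypothetical stand-ins for the engine's casts.
    """
    s_prime = to_source(t)
    t_prime = to_target(s_prime)
    if t == t_prime:
        return 'exact'        # t is representable in S
    if t > t_prime:
        return 'rounded down'
    return 'rounded up'

# A truncating double-to-integer conversion rounds 1.9 down and -1.9 up:
assert rounding_direction(1.9, int, float) == 'rounded down'
assert rounding_direction(-1.9, int, float) == 'rounded up'
```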

&lt;h1 id=&quot;oh-the-nullability&quot;&gt;Oh, the nullability&lt;/h1&gt;

&lt;p&gt;Let’s take another quick detour and talk about the issue of nullability. After all, no discussion about
SQL is complete without an exploration of the semantics of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;SQL uses &lt;a href=&quot;https://en.wikipedia.org/wiki/Three-valued_logic#Application_in_SQL&quot;&gt;three-valued logic&lt;/a&gt;. In addition
to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;true&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;false&lt;/code&gt;, logical expressions can evaluate to an &lt;em&gt;unknown&lt;/em&gt; value, which is indicated by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt;.
Logical operations &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AND&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OR&lt;/code&gt; behave according to the following rules:&lt;/p&gt;

\[\begin{array}{|c|c|c|c|}
\hline
\text{A} &amp;amp; \text{B} &amp;amp; \text{A and B} &amp;amp; \text{A or B} \\ 
\hline
\text{true}&amp;amp; \text{null} &amp;amp; \text{null} &amp;amp; \text{true} \\ 
\hline
\text{false}&amp;amp; \text{null} &amp;amp; \text{false} &amp;amp; \text{null} \\ 
\hline
\end{array}\]

&lt;p&gt;The logical comparison operators =, &amp;lt;&amp;gt;, &amp;gt;, ≥, &amp;lt;, ≤ evaluate to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt; when one or both operands are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt;.
Hence, if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt; is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt;, our expression &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cast(s as smallint) = t&lt;/code&gt; can be simply replaced with a constant &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;As we mentioned in the previous section, there are cases where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cast(s as smallint) = t&lt;/code&gt; can be reduced to 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;true&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;false&lt;/code&gt;, &lt;em&gt;except&lt;/em&gt; for the fact that if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s&lt;/code&gt; is null, the expression needs to return &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt; to preserve
semantics. So, we use the following forms to capture this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;if(s IS null, null, false)
if(s IS null, null, true)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The catch is that the optimizer does not understand the semantics of these &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;if&lt;/code&gt; expressions and cannot 
use them to derive additional properties; in essence, they become an optimization barrier. On the other hand,
the optimizer is quite good at manipulating logical conjunctions (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AND&lt;/code&gt;) and disjunctions (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OR&lt;/code&gt;), so let’s see 
how we can use boolean logic to obtain an equivalent formulation.&lt;/p&gt;

&lt;p&gt;We can exploit the properties of SQL boolean logic to derive expressions that behave in the same manner as the 
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;if()&lt;/code&gt; constructs from above:&lt;/p&gt;

\[\begin{align}
    \text{if}(s \text{ is null}, \text{null}, \text{false}) &amp;amp; \iff (s \text{ is null}) \text{ and null} \\
    \text{if}(s \text{ is null}, \text{null}, \text{true})  &amp;amp; \iff (s \text{ is not null}) \text{ or null} \\
\end{align}\]

&lt;p&gt;Let’s break it down to see why that works.&lt;/p&gt;

\[\begin{align}         
   \text{if}(s \text{ is null}, \text{null}, \text{false}) &amp;amp; = (s \text{ is null}) \text{ and null} \\ 
      &amp;amp; = \begin{cases}
             \text{true and null}  &amp;amp; = \text{null},   &amp;amp; \text{if } s \text{ is null} \\
             \text{false and null} &amp;amp; = \text{false},  &amp;amp; \text{if } s \text{ is not null} 
          \end{cases} \\[5pt]
   \text{if}(s \text{ is null}, \text{null}, \text{true})  &amp;amp; = (s \text{ is not null}) \text{ or null} \\
      &amp;amp; = \begin{cases}
              \text{false or null}  &amp;amp; = \text{null},   &amp;amp; \text{if } s \text{ is null} \\
              \text{true or null}   &amp;amp; = \text{true},   &amp;amp; \text{if } s \text{ is not null} 
           \end{cases}
\end{align}\]
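&lt;p&gt;A small Python model of SQL’s three-valued logic, with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;None&lt;/code&gt; standing in for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;null&lt;/code&gt;, confirms both equivalences for the two possible states of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s&lt;/code&gt;. A sketch for illustration, not Trino code:&lt;/p&gt;

```python
def sql_and(a, b):
    # SQL three-valued AND: false dominates, then null, then true.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def sql_or(a, b):
    # SQL three-valued OR: true dominates, then null, then false.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

for s_is_null in (True, False):
    # if(s IS NULL, null, false)  is equivalent to  (s IS NULL) AND null
    expected = None if s_is_null else False
    assert sql_and(s_is_null, None) == expected
    # if(s IS NULL, null, true)   is equivalent to  (s IS NOT NULL) OR null
    expected = None if s_is_null else True
    assert sql_or(not s_is_null, None) == expected
```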

&lt;h1 id=&quot;putting-it-all-together&quot;&gt;Putting it all together&lt;/h1&gt;

&lt;p&gt;Now that we’ve had a taste of how this optimization works, let’s put it all together into one rule to rule
them all.&lt;/p&gt;

&lt;p&gt;Given an expression of the following form,&lt;/p&gt;

\[\cast{S}{T}(s) \otimes t \quad s \in S, t \in T, \otimes \in [=, \ne, &amp;lt;, \le, &amp;gt;, \ge]\]

&lt;p&gt;we derive a transformation based on the rules below.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;If \(t \text{ is null} \Rightarrow \cast{S}{T}(s) \otimes t \iff \text{null} \tag{1}\) \(\\[5pt]\)&lt;/li&gt;
  &lt;li&gt;If \(\exists s&apos; \in S \ldotp s&apos; = \cast{T}{S}(t)\), we calculate \(t&apos; = \cast{S}{T}(s&apos;)\) and consider 
the following cases:
    &lt;ol&gt;
      &lt;li&gt;&lt;a name=&quot;2.1&quot;&gt;&lt;/a&gt; If \(t = t&apos; \Rightarrow \cast{S}{T}(s) \otimes t \iff s \otimes \cast{T}{S}(t) \tag{2.1}\) \(\\[5pt]\)
        &lt;ul&gt;
          &lt;li&gt;&lt;a name=&quot;2.1.1&quot;&gt;&lt;/a&gt; In the special case where \(\\[5pt]\) \(\quad  s&apos; = \text{min}_S  \Rightarrow   
 \left\{
  \begin{array}{@{}ll@{}}
 \cast{S}{T}(s) &amp;gt; t   &amp;amp; \iff s \ne \text{min}_{S}     \\
 \cast{S}{T}(s) \ge t &amp;amp; \iff \trueOrNull{s}           \\
 \cast{S}{T}(s) &amp;lt;   t &amp;amp; \iff \falseOrNull{s}          \\
 \cast{S}{T}(s) \le t &amp;amp; \iff s = \text{min}_{S}
  \end{array}\right. \tag{2.1.1}  \\[5pt]\)&lt;/li&gt;
          &lt;li&gt;&lt;a name=&quot;2.1.2&quot;&gt;&lt;/a&gt; In the special case where \(\\[5pt]\) \(\quad s&apos; = \text{max}_S  \Rightarrow 
 \left\{
  \begin{array}{@{}ll@{}}
\cast{S}{T}(s) &amp;gt; t   &amp;amp; \iff \falseOrNull{s}        \\
\cast{S}{T}(s) \ge t &amp;amp; \iff s = \text{max}_{S}     \\
\cast{S}{T}(s) &amp;lt;   t &amp;amp; \iff s \ne \text{max}_{S}   \\
\cast{S}{T}(s) \le t &amp;amp; \iff \trueOrNull{s}
  \end{array}\right. \tag{2.1.2} \\[5pt]\)&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;Otherwise, \(\\[5pt]\) \(\quad  t \ne t&apos; \Rightarrow 
 \left\{
  \begin{array}{@{}ll@{}}
   \cast{S}{T}(s) = t   &amp;amp; \iff \falseOrNull{s}        \\
   \cast{S}{T}(s) \ne t &amp;amp; \iff \trueOrNull{s}            
  \end{array}\right. \tag{2.2} \\[5pt]\)&lt;/p&gt;

        &lt;ul&gt;
          &lt;li&gt;
            &lt;p&gt;Further, if \(\\[5pt]\) \(\quad \quad  t &amp;lt; t&apos; \Rightarrow 
 \left\{
  \begin{array}{@{}ll@{}}
\cast{S}{T}(s) &amp;gt; t   &amp;amp; \iff s \ge \cast{T}{S}(t)    \\
\cast{S}{T}(s) \ge t &amp;amp; \iff s \ge \cast{T}{S}(t)    \\
\cast{S}{T}(s) &amp;lt;   t &amp;amp; \iff s &amp;lt;  \cast{T}{S}(t)     \\
\cast{S}{T}(s) \le t &amp;amp; \iff s &amp;lt;  \cast{T}{S}(t)
  \end{array}\right. \tag{2.2.1} \\[5pt]\)&lt;br /&gt;
 In the special case where \(\\[5pt]\) \(\quad \quad s&apos; = \text{max}_S  \Rightarrow  
 \left\{
  \begin{array}{@{}ll@{}}
\cast{S}{T}(s) &amp;gt; t   &amp;amp; \iff s = \text{max}_{S}    \\
\cast{S}{T}(s) \ge t &amp;amp; \iff s = \text{max}_{S}    \\
  \end{array}\right. \\[5pt] \tag{2.2.1.1}\)&lt;/p&gt;
          &lt;/li&gt;
          &lt;li&gt;
            &lt;p&gt;Otherwise, if \(\\[5pt]\) \(\quad \quad  t &amp;gt; t&apos; \Rightarrow
 \left\{
  \begin{array}{@{}ll@{}}
\cast{S}{T}(s) &amp;gt; t   &amp;amp; \iff s &amp;gt;    \cast{T}{S}(t)    \\
\cast{S}{T}(s) \ge t &amp;amp; \iff s &amp;gt;    \cast{T}{S}(t)    \\
\cast{S}{T}(s) &amp;lt;   t &amp;amp; \iff s \le  \cast{T}{S}(t)    \\
\cast{S}{T}(s) \le t &amp;amp; \iff s \le  \cast{T}{S}(t)
  \end{array}\right. \\[5pt] \tag{2.2.2}\)&lt;br /&gt;
 In the special case where \(\\[5pt]\) \(\quad \quad s&apos; = \text{min}_S  \Rightarrow  
  \left\{
  \begin{array}{@{}ll@{}}
\cast{S}{T}(s) &amp;lt;   t &amp;amp; \iff s = \text{min}_{S}    \\
\cast{S}{T}(s) \le t &amp;amp; \iff s = \text{min}_{S}
 \end{array}\right. \\[5pt] \tag{2.2.2.1}\)&lt;/p&gt;
          &lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;If \(\cast{T}{S}\) is undefined or \(\cast{T}{S}(t)\) fails, \(\\[5pt]\) \(t &amp;lt; \cast{S}{T}(\text{min}_S) \Rightarrow  
  \left\{
 \begin{array}{@{}ll@{}}
         \cast{S}{T}(s) =   t &amp;amp; \iff \falseOrNull{s}    \\
         \cast{S}{T}(s) \ne t &amp;amp; \iff \trueOrNull{s}     \\
         \cast{S}{T}(s) &amp;lt;   t &amp;amp; \iff \falseOrNull{s}    \\
         \cast{S}{T}(s) \le t &amp;amp; \iff \falseOrNull{s}    \\
         \cast{S}{T}(s) &amp;gt;   t &amp;amp; \iff \trueOrNull{s}     \\
         \cast{S}{T}(s) \ge t &amp;amp; \iff \trueOrNull{s}     
\end{array}\right. \\[5pt] \tag{3.1}\)
\(t = \cast{S}{T}(\text{min}_S) \Rightarrow  
  \left\{
 \begin{array}{@{}ll@{}}
         \cast{S}{T}(s) =   t &amp;amp; \iff s = \text{min}_S       \\
         \cast{S}{T}(s) \ne t &amp;amp; \iff s &amp;gt; \text{min}_S       \\
         \cast{S}{T}(s) &amp;lt;   t &amp;amp; \iff \falseOrNull{s}        \\
         \cast{S}{T}(s) \le t &amp;amp; \iff s = \text{min}_S       \\
         \cast{S}{T}(s) &amp;gt;   t &amp;amp; \iff s &amp;gt; \text{min}_S       \\
         \cast{S}{T}(s) \ge t &amp;amp; \iff \trueOrNull{s}     
\end{array}\right. \\[5pt] \tag{3.2}\)
\(t &amp;gt; \cast{S}{T}(\text{max}_S) \Rightarrow  
  \left\{
    \begin{array}{@{}ll@{}}
            \cast{S}{T}(s) =   t &amp;amp; \iff \falseOrNull{s}    \\
            \cast{S}{T}(s) \ne t &amp;amp; \iff \trueOrNull{s}     \\
            \cast{S}{T}(s) &amp;lt;   t &amp;amp; \iff \trueOrNull{s}     \\
            \cast{S}{T}(s) \le t &amp;amp; \iff \trueOrNull{s}     \\
            \cast{S}{T}(s) &amp;gt;   t &amp;amp; \iff \falseOrNull{s}    \\
            \cast{S}{T}(s) \ge t &amp;amp; \iff \falseOrNull{s}    
   \end{array}\right. \\[5pt] \tag{3.3}\)
\(t = \cast{S}{T}(\text{max}_S) \Rightarrow  
 \left\{
   \begin{array}{@{}ll@{}}
           \cast{S}{T}(s) =   t &amp;amp; \iff s = \text{max}_S   \\
           \cast{S}{T}(s) \ne t &amp;amp; \iff s &amp;lt; \text{max}_S   \\
           \cast{S}{T}(s) &amp;lt;   t &amp;amp; \iff s &amp;lt; \text{max}_S   \\
           \cast{S}{T}(s) \le t &amp;amp; \iff \trueOrNull{s}     \\
           \cast{S}{T}(s) &amp;gt;   t &amp;amp; \iff \falseOrNull{s}    \\
           \cast{S}{T}(s) \ge t &amp;amp; \iff s = \text{max}_S       
  \end{array}\right. \\[5pt] \tag{3.4}\) &lt;br /&gt;
 Otherwise, the transformation is not applicable.&lt;/li&gt;
&lt;/ol&gt;
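&lt;p&gt;To make the rule concrete, here is a heavily simplified Python sketch, specialized to a 16-bit integer source type, a floating-point target type, and just the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;=&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt;&lt;/code&gt; operators. The symbolic results stand for the null-preserving constants discussed earlier, the min/max special cases (2.1.1, 2.1.2, 3.2, 3.4) are omitted, and all names are hypothetical:&lt;/p&gt;

```python
MIN_S, MAX_S = -32768, 32767  # range of the 16-bit source type S

def unwrap(op, t):
    """Rewrite cast(s AS T) op t into a symbolic constant or (op, source constant)."""
    if t is None:
        return 'NULL'                                        # rule 1
    if t > float(MAX_S):                                     # above the range: rule 3.3
        return 'FALSE_OR_NULL'
    if float(MIN_S) > t:                                     # below the range: rule 3.1
        return {'=': 'FALSE_OR_NULL', '>': 'TRUE_OR_NULL'}[op]
    s_prime = int(t)           # cast T to S (truncating stand-in)
    t_prime = float(s_prime)   # cast S back to T
    if t == t_prime:                                         # rule 2.1: exact round-trip
        return (op, s_prime)
    if t > t_prime:                                          # rule 2.2.2: t rounded down
        return {'=': 'FALSE_OR_NULL', '>': ('>', s_prime)}[op]
    return {'=': 'FALSE_OR_NULL', '>': ('>=', s_prime)}[op]  # rule 2.2.1: t rounded up

assert unwrap('=', 1.0) == ('=', 1)         # representable: compare directly
assert unwrap('>', 1.9) == ('>', 1)         # rounded down: strict stays strict
assert unwrap('>', -1.9) == ('>=', -1)      # rounded up: strict becomes non-strict
assert unwrap('=', 1.9) == 'FALSE_OR_NULL'  # not representable: never equal
assert unwrap('>', 1e9) == 'FALSE_OR_NULL'  # beyond the range: never greater
```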

&lt;h1 id=&quot;omgwtfnan&quot;&gt;OMGWTFNaN&lt;/h1&gt;

&lt;p&gt;As if all of this weren’t enough, there’s an additional complication we need to handle for types such
as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;real&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt;. Those types are what the SQL specification calls &lt;em&gt;approximate numeric&lt;/em&gt; types.
Presto implements them as &lt;a href=&quot;https://en.wikipedia.org/wiki/IEEE_754&quot;&gt;IEEE-754&lt;/a&gt; single and double 
precision floating point numbers, respectively.&lt;/p&gt;

&lt;p&gt;In addition to finite numbers, IEEE-754 defines an additional set of values: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;∞&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt; (not a number).
It is worth noting that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-∞&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+∞&lt;/code&gt; do not behave like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;∞&lt;/code&gt; in the mathematical sense. They are actual values
in the ordered set of numbers, but they don’t represent any finite number. Therefore, the following relations hold:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-∞ &amp;lt; -1.23E30 &amp;lt; 0 &amp;lt; 3.45E25 &amp;lt; +∞
-∞ = -∞
+∞ = +∞ 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Since &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-∞&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+∞&lt;/code&gt; can be treated as regular values, we can use them as the minimum and maximum values of the range
for these types. Any other choice would not work, since all values of a type must be contained within the range of the type
for the transformation to be valid. That is,&lt;/p&gt;

\[\forall v \in T \quad T_{\text{min}} \le v \le T_{\text{max}}\]

&lt;p&gt;Let’s look at an example to understand why this is necessary. Instead of using \([-∞, ∞]\) as the range, 
let’s say we picked the minimum and maximum representable values for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;real&lt;/code&gt; type (-3.4028235E38 and 3.4028235E38), and
consider this expression (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s :: real&lt;/code&gt;):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cast(s AS double) &amp;gt;= double &apos;3.4028235E38&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;From the rules in the previous section, \(t = 3.4028235\text{E}38\), \(s&apos;= 3.4028235\text{E}38\) and \(t&apos; = 3.4028235\text{E}38\). Since 
\(t = t&apos;\) and \(s&apos; = \text{max}_S\), from &lt;a href=&quot;#2.1.2&quot;&gt;rule 2.1.2&lt;/a&gt;, the expression reduces to:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;s = 3.4028235E38 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is clearly incorrect. When &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s = Infinity&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cast(s AS double)&lt;/code&gt; results in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double &apos;Infinity&apos;&lt;/code&gt;, which is not equal
to 3.4028235E38.&lt;/p&gt;
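&lt;p&gt;A quick Python check makes the problem obvious (illustration only; Python floats are IEEE-754 doubles):&lt;/p&gt;

```python
# With a finite "max" for real, the rewrite s = 3.4028235E38 misclassifies
# infinity: casting Infinity to double still yields Infinity, which is
# strictly greater than the finite maximum, not equal to it.
inf = float('inf')
real_max = 3.4028235e38
assert (inf >= real_max) is True   # the original predicate is satisfied
assert (inf == real_max) is False  # but the rewritten one is not
```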

&lt;p&gt;On the other hand, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt; doesn’t obey any of the comparison rules. It’s neither equal to nor distinct from itself, and
it’s neither larger nor smaller than any other value:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;NaN =  NaN  ⟺  false  
NaN &amp;lt;&amp;gt; NaN  ⟺  false
NaN &amp;gt; 0     ⟺  false
NaN = 0     ⟺  false
NaN &amp;lt; 0     ⟺  false
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt; is not part of the ordered set of values for these types, and the requirement that every value be contained 
in the range doesn’t hold. From &lt;a href=&quot;#2.1.1&quot;&gt;rule 2.1.1&lt;/a&gt;, an expression such as:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cast(s AS double) &amp;gt;= double &apos;-Infinity&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;reduces to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;if(s is null, null, true)&lt;/code&gt;, which is incorrect, since the expression returns &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;false&lt;/code&gt; when &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s&lt;/code&gt; is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Is all hope lost for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;real&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt;? Fortunately, no. The range is only needed as an optimization. If we
forgo defining a range for types that don’t have the required properties, the special cases &lt;a href=&quot;#2.1.1&quot;&gt;2.1.1&lt;/a&gt; and 
&lt;a href=&quot;#2.1.2&quot;&gt;2.1.2&lt;/a&gt; don’t apply, and by &lt;a href=&quot;#2.1&quot;&gt;rule 2.1&lt;/a&gt;, the expression is equivalent to:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;s &amp;gt;= real &apos;-Infinity&apos;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;which correctly returns &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;false&lt;/code&gt; when &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s&lt;/code&gt; is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NaN&lt;/code&gt;.&lt;/p&gt;
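&lt;p&gt;Python floats follow the same IEEE-754 comparison semantics, so the failure mode is easy to reproduce (a sketch for illustration only):&lt;/p&gt;

```python
# Every ordering comparison involving NaN evaluates to false, so a
# comparison against -Infinity is not vacuously true once NaN is possible.
nan, neg_inf = float('nan'), float('-inf')
assert (nan == nan) is False
assert (nan > 0.0) is False
assert (0.0 > nan) is False
assert (nan >= neg_inf) is False  # why the range shortcut must be skipped
assert (1.0 >= neg_inf) is True   # ordinary values still compare as expected
```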

&lt;h1 id=&quot;-show-me-the-money&quot;&gt;&lt;a name=&quot;results&quot;&gt;&lt;/a&gt; Show me the money!&lt;/h1&gt;

&lt;p&gt;So, does all of this even matter? Why, yes! Glad you asked.&lt;/p&gt;

&lt;p&gt;As with any performance optimization, you can improve things by working smarter (avoiding work that can be
proven unnecessary) or by working harder (doing the work you must do more efficiently). This
optimization does a little of both. Let’s consider three scenarios where it has a positive effect.&lt;/p&gt;

&lt;h4 id=&quot;dead-code&quot;&gt;Dead code&lt;/h4&gt;

&lt;p&gt;Since the optimization can sometimes prove that a comparison always produces &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;false&lt;/code&gt;, regardless of the input,
it can short-circuit entire conditions or subplans before a single row of data is read. Some query generation 
tools are not sophisticated enough to avoid emitting queries that contain such constructs. Also, everyone makes
mistakes, and it’s not hard to end up with queries that contain what’s effectively &lt;em&gt;dead code&lt;/em&gt;. The last thing you
want is to sit in front of the screen waiting for a query to complete … waiting … waiting … just for Presto
to tell you &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;¯\_(ツ)_/¯&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For example, given:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;smallint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- &amp;lt;insert lots of rows into t&amp;gt; --&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;IS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000000&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;the query produces the following plan (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Values&lt;/code&gt; is an empty inline table):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;- Output[x]
  - Values
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 id=&quot;improved-join-performance&quot;&gt;Improved JOIN performance&lt;/h4&gt;

&lt;p&gt;What’s nice about this optimization is that it &lt;em&gt;enables&lt;/em&gt; other optimizations to work better. We mentioned earlier
that comparisons that are not simple expressions between columns, or between columns and constants, make it harder for the
predicate pushdown optimization to infer predicates that can be propagated to the other branch of a join.&lt;/p&gt;

&lt;p&gt;Given two tables:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;smallint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And the following query:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;BIGINT&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;1&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The query plan without this optimization is:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;- Output[name]
  - InnerJoin[expr = v]
    - ScanFilterProject[t1, filter = CAST(v AS bigint) = BIGINT &apos;1&apos;]
        expr := CAST(v AS bigint)
    - TableScan[t2]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The optimization allows the predicate pushdown logic to apply the condition to the other side of the join, producing
a much better plan. If data in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t1&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t2&lt;/code&gt; is organized by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;v&lt;/code&gt; (e.g., a partition key in Hive), or if the
connector can apply the filter at the source, the query won’t even need to read certain parts of the
table. The query plan with the optimization enabled:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;- Output[name]
  - CrossJoin
    - ScanFilterProject[t1, filter = (v = SMALLINT &apos;1&apos;)]
    - ScanFilterProject[t2, filter = (v = BIGINT &apos;1&apos;)]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 id=&quot;best-bang-for-the-buck&quot;&gt;Best bang for the buck&lt;/h4&gt;

&lt;p&gt;Finally, if the condition absolutely needs to be evaluated, the transformed expression could be significantly
more efficient, especially when the cast between the two types is expensive. To illustrate, given a table
with 1 billion rows and a column &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;k :: bigint&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;count_if&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;k&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CAST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;decimal&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;19&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; 
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Without the optimization:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;- [...]
    - ScanProject
===&amp;gt;    CPU: 3.75m (66.34%), Scheduled: 5.56m (145.22%)
        expr := (CAST(&quot;k&quot; AS decimal(19,0)) &amp;gt; CAST(DECIMAL &apos;0&apos; AS decimal(19,0)))
        
        
Query 20190515_072240_00006_rgzb4, FINISHED, 4 nodes
Splits: 110 total, 110 done (100.00%)
0:22 [1000M rows, 8.4GB] [46M rows/s, 395MB/s]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With the optimization:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;- [...]
    - ScanProject
===&amp;gt;    CPU: 29.93s (58.17%), Scheduled: 47.44s (145.07%)
        expr := (&quot;k&quot; &amp;gt; BIGINT &apos;0&apos;)
        
        
Query 20190515_071912_00005_bz6cb, FINISHED, 4 nodes
Splits: 110 total, 110 done (100.00%)
0:03 [1000M rows, 8.4GB] [335M rows/s, 2.81GB/s]        
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Thirsty for more? Here’s the &lt;a href=&quot;https://github.com/trinodb/trino/blob/master/presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/UnwrapCastInComparison.java&quot;&gt;code&lt;/a&gt;. 
Happy querying!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Many thanks to &lt;a href=&quot;https://github.com/kasiafi&quot;&gt;kasiafi&lt;/a&gt; for their thoughtful and thorough feedback on early
drafts of this post.&lt;/em&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso</name>
        </author>
      

      <summary>The next release of Presto (version 312) will include a new optimization to remove unnecessary casts which might have been added implicitly by the query planner or explicitly by users when they wrote the query.</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto Summit 2019 @TwitterSF</title>
      <link href="https://trino.io/blog/2019/05/17/Presto-Summit.html" rel="alternate" type="text/html" title="Presto Summit 2019 @TwitterSF" />
      <published>2019-05-17T00:00:00+00:00</published>
      <updated>2019-05-17T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/17/Presto-Summit</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/17/Presto-Summit.html">&lt;p&gt;Next month will mark the 2nd annual Presto Summit hosted by the
&lt;a href=&quot;https://trino.io/foundation.html&quot;&gt;Presto Software Foundation&lt;/a&gt;,
&lt;a href=&quot;https://starburstdata.com&quot;&gt;Starburst Data&lt;/a&gt;, and &lt;a href=&quot;https://twitter.com&quot;&gt;Twitter&lt;/a&gt;. Last year’s event was
a great success (see the
&lt;a href=&quot;https://www.starburstdata.com/technical-blog/presto-summit-2018-recap/&quot;&gt;Presto Summit 2018 recap&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Please join the community of Presto users and developers for an all-day event dedicated to the world’s fastest 
distributed SQL query engine. At the Summit we’ll share the latest on Presto and learn how some of the most 
innovative companies are using this technology to power their analytics platforms.&lt;/p&gt;

&lt;p&gt;The agenda will feature talks from some of the world’s largest and most innovative Presto users:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Comcast&lt;/li&gt;
  &lt;li&gt;Twitter&lt;/li&gt;
  &lt;li&gt;Nordstrom&lt;/li&gt;
  &lt;li&gt;Grubhub&lt;/li&gt;
  &lt;li&gt;Lyft&lt;/li&gt;
  &lt;li&gt;Netflix&lt;/li&gt;
  &lt;li&gt;LinkedIn&lt;/li&gt;
  &lt;li&gt;Criteo&lt;/li&gt;
  &lt;li&gt;Starburst&lt;/li&gt;
  &lt;li&gt;Presto Software Foundation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;(Details will be announced soon.)&lt;/p&gt;

&lt;p&gt;If you wish to speak at the event, the call for papers is still open:
&lt;a href=&quot;https://www.starburstdata.com/2019-presto-summit-speaker-registration/&quot;&gt;2019 Presto Summit – Speaker Registration&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Please RSVP to secure your spot (space is limited):
&lt;a href=&quot;https://prestosummit.splashthat.com/&quot;&gt;Presto Summit 2019 @TwitterSF&lt;/a&gt;&lt;/p&gt;</content>

      
        <author>
          <name>Kamil Bajda-Pawlikowski</name>
        </author>
      

      <summary>Next month will mark the 2nd annual Presto Summit hosted by the Presto Software Foundation, Starburst Data, and Twitter. Last year’s event was a great success (see the Presto Summit 2018 recap).</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 311</title>
      <link href="https://trino.io/blog/2019/05/15/release-311.html" rel="alternate" type="text/html" title="Release 311" />
      <published>2019-05-15T00:00:00+00:00</published>
      <updated>2019-05-15T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/15/release-311</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/15/release-311.html">&lt;p&gt;This version adds standard
&lt;a href=&quot;https://trino.io/docs/current/sql/select.html#offset-clause&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OFFSET&lt;/code&gt;&lt;/a&gt;
syntax, a new function
&lt;a href=&quot;https://trino.io/docs/current/functions/array.html#combinations&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;combinations()&lt;/code&gt;&lt;/a&gt;
for computing k-combinations of array elements,
and support for nested collections in Cassandra.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-311.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version adds standard OFFSET syntax, a new function combinations() for computing k-combinations of array elements, and support for nested collections in Cassandra. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto Community Meeting 2019-05-08</title>
      <link href="https://trino.io/blog/2019/05/08/Presto-Community-Meeting.html" rel="alternate" type="text/html" title="Presto Community Meeting 2019-05-08" />
      <published>2019-05-08T00:00:00+00:00</published>
      <updated>2019-05-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/08/Presto-Community-Meeting</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/08/Presto-Community-Meeting.html">&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/FL0O62iCkE8&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h3 id=&quot;agenda&quot;&gt;Agenda&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;Existing function support&lt;/li&gt;
  &lt;li&gt;Function namespaces&lt;/li&gt;
  &lt;li&gt;Connector-resolved functions&lt;/li&gt;
  &lt;li&gt;SQL-defined functions&lt;/li&gt;
  &lt;li&gt;Remote functions&lt;/li&gt;
  &lt;li&gt;Polymorphic table functions&lt;/li&gt;
&lt;/ul&gt;

&lt;!--more--&gt;</content>

      

      <summary>Agenda Existing function support Function namespaces Connector-resolved functions SQL-defined functions Remote functions Polymorphic table functions</summary>

      
      
    </entry>
  
    <entry>
      <title>Faster S3 Reads</title>
      <link href="https://trino.io/blog/2019/05/06/faster-s3-reads.html" rel="alternate" type="text/html" title="Faster S3 Reads" />
      <published>2019-05-06T00:00:00+00:00</published>
      <updated>2019-05-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/06/faster-s3-reads</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/06/faster-s3-reads.html">&lt;p&gt;Presto is known for working well with Amazon S3. We recently made an
improvement that greatly reduces network utilization and latency when
reading ORC or Parquet data.&lt;/p&gt;

&lt;h1 id=&quot;the-problem&quot;&gt;The problem&lt;/h1&gt;

&lt;p&gt;The improvement started with a question
from &lt;a href=&quot;https://github.com/bzillins&quot;&gt;Brenton Zillins&lt;/a&gt;
at &lt;a href=&quot;https://www.stackpath.com/&quot;&gt;Stackpath&lt;/a&gt;
on our &lt;a href=&quot;https://trino.io/slack.html&quot;&gt;Slack&lt;/a&gt; workspace. He noticed
that the network traffic to Presto workers was many times larger than the
amount of input data reported by Presto for the query.&lt;/p&gt;

&lt;p&gt;After a lively discussion on the Slack channel, we found the cause. Parquet
would perform a positioned read against the S3 file system to ask for an
exact byte range (start and end). However, the file system only implemented
the streaming API, so it would tell S3 about the starting location, but
not the end location. The file system would stop reading from the stream once
it reached the requested end location, but substantial additional data could
be read from S3 due to various buffers in different parts of the system.&lt;/p&gt;

&lt;p&gt;The streaming API has an additional problem. Establishing a new connection
to S3 incurs latency, especially when using secure connections over TLS.
There is no way to abort a streaming request to S3, other than by closing
the connection, so the file system is forced to close connections after
every request, thus preventing the connection from being reused.&lt;/p&gt;

&lt;h1 id=&quot;the-fix&quot;&gt;The fix&lt;/h1&gt;

&lt;p&gt;We solved this by implementing positioned reads in the S3 file system.
Positioned reads, which are the only type used by the ORC and Parquet readers,
work by asking S3 for the exact byte range required. These reads use the minimal
amount of network traffic and allow the connection to be reused.&lt;/p&gt;
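To make the contrast concrete, here is a minimal sketch of a positioned read against an in-memory "object"; the names and shapes are illustrative assumptions, not Trino's actual S3 file system code. A positioned read maps to an S3 GET with an HTTP Range header covering exactly the requested bytes, so nothing past the end is fetched and the connection stays reusable:

```java
import java.util.Arrays;

// Illustrative sketch only: not Trino's actual S3 file system code.
public class PositionedReadSketch
{
    // A positioned read asks for an exact, inclusive byte range up front,
    // which maps to an S3 GET with a Range header like "bytes=100-149".
    static String rangeHeader(long position, int length)
    {
        return "bytes=" + position + "-" + (position + length - 1);
    }

    // Copy exactly the requested range: no trailing over-read, unlike a
    // streaming request that only tells the server where to start.
    static byte[] positionedRead(byte[] object, int position, int length)
    {
        return Arrays.copyOfRange(object, position, position + length);
    }
}
```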

&lt;p&gt;Brenton tested out the change and reported success:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;This PR brought us from &amp;gt;1 GB/s object read rate to under 10 MB/s
for the same query. Thank you.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;While this issue is obvious in retrospect, we are surprised that it took
so long to find it, given that S3 is one of the most popular storage systems.
This is a great example of how the community makes everything better.
Being observant and reporting an issue can be a huge win for everyone.&lt;/p&gt;

&lt;h1 id=&quot;how-to-get-it&quot;&gt;How to get it&lt;/h1&gt;

&lt;p&gt;This improvement is in &lt;a href=&quot;https://trino.io/download.html&quot;&gt;Presto 302+&lt;/a&gt;,
so you will need to upgrade if you are using an earlier version.&lt;/p&gt;</content>

      
        <author>
          <name>David Phillips</name>
        </author>
      

      <summary>Presto is known for working well with Amazon S3. We recently made an improvement that greatly reduces network utilization and latency when reading ORC or Parquet data.</summary>

      
      
    </entry>
  
    <entry>
      <title>A review of the first international Presto Conference, Tel Aviv, April 2019</title>
      <link href="https://trino.io/blog/2019/05/03/Presto-Conference-Israel.html" rel="alternate" type="text/html" title="A review of the first international Presto Conference, Tel Aviv, April 2019" />
      <published>2019-05-03T00:00:00+00:00</published>
      <updated>2019-05-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/03/Presto-Conference-Israel</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/03/Presto-Conference-Israel.html">&lt;p&gt;&lt;strong&gt;Community&lt;/strong&gt;, &lt;em&gt;noun&lt;/em&gt;: “A feeling of fellowship with others, as a result of sharing common attributes, interests, and goals”&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Israel-2019/audience.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The fun picture you see here was taken at the first lecture of the first international
Presto Summit in Israel last month.&lt;/p&gt;

&lt;p&gt;The atmosphere in the room during the various presentations was unique. It’s as if you
could physically feel the brainpower of 250 engineers fascinated by technology in one room.&lt;/p&gt;

&lt;p&gt;We would like to share with you a bit of the content that was discussed during
the conference. Enjoy the read and the videos!&lt;/p&gt;

&lt;!--more--&gt;

&lt;h1 id=&quot;presto-software-foundation-presentation&quot;&gt;Presto Software Foundation presentation&lt;/h1&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Israel-2019/intro.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The day started with &lt;a href=&quot;https://www.linkedin.com/in/dainsundstrom/&quot;&gt;Dain Sundstrom&lt;/a&gt;,
&lt;a href=&quot;https://www.linkedin.com/in/traversomartin/&quot;&gt;Martin Traverso&lt;/a&gt;, and
&lt;a href=&quot;https://www.linkedin.com/in/electrum/&quot;&gt;David Phillips&lt;/a&gt;, Presto founders
who gave us a great panoramic view on &lt;a href=&quot;https://trino.io/foundation.html&quot;&gt;Presto Software Foundation&lt;/a&gt;,
past, present, and future roadmap.&lt;/p&gt;

&lt;p&gt;In their talk, the Presto founders presented the following topics:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The creation of the Presto Software Foundation&lt;/li&gt;
  &lt;li&gt;ORC improvements&lt;/li&gt;
  &lt;li&gt;The complex pushdown algorithm in detail&lt;/li&gt;
  &lt;li&gt;The open source roadmap strategy, and more&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Israel-2019/pushdown.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You can find the entire video of the presentation &lt;a href=&quot;https://vimeo.com/331764101&quot;&gt;here&lt;/a&gt; and the
slides &lt;a href=&quot;https://www.slideshare.net/OriReshef/presto-summit-israel-201904&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;varada-presentation&quot;&gt;Varada presentation&lt;/h1&gt;

&lt;p&gt;&lt;a href=&quot;https://www.linkedin.com/in/david-krakov/&quot;&gt;David Krakov&lt;/a&gt;, co-founder and CTO at &lt;a href=&quot;https://varada.io&quot;&gt;Varada&lt;/a&gt;,
explained how Varada leverages Presto to create an innovative technology that
allows interactive analytics on top of extracted sets from data lakes, or, in other words, Presto for apps.&lt;/p&gt;

&lt;p&gt;David presented the axes of innovation that the Varada team created to achieve indexed big
data on a distributed platform:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;SSD and NVMeF distributed calculation&lt;/li&gt;
  &lt;li&gt;All dimensions are indexed in the ingest process&lt;/li&gt;
  &lt;li&gt;Synchronization&lt;/li&gt;
  &lt;li&gt;Fully automated copy management directly connected to the raw data in the data lake.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Israel-2019/varada1.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You can find the video of the presentation &lt;a href=&quot;https://vimeo.com/331767154&quot;&gt;here&lt;/a&gt; and the slides
&lt;a href=&quot;https://www.slideshare.net/OriReshef/presto-for-apps-deck-varada-prestoconf&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;wix-open-sourcing-quix&quot;&gt;WiX open sourcing Quix&lt;/h1&gt;

&lt;p&gt;The big announcement of the conference came from &lt;a href=&quot;https://www.linkedin.com/in/valeryfrolov/&quot;&gt;Valery Frolov&lt;/a&gt;
of &lt;a href=&quot;http://wix.com/&quot;&gt;Wix&lt;/a&gt;. As a web-scale data-driven company, with 150M users, Wix has more than 1000 users
of Presto, and over 100K daily queries.&lt;/p&gt;

&lt;p&gt;All those queries come through a unified front end for data discovery, transformation, and query: the Quix
IDE. Quix is simultaneously:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;A notebook manager for users to write and share executable notes&lt;/li&gt;
  &lt;li&gt;A dataset explorer showing catalogs and metadata&lt;/li&gt;
  &lt;li&gt;A feature-rich SQL query editor&lt;/li&gt;
  &lt;li&gt;A job scheduler for ETL jobs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Wix has open-sourced most of Quix, available under an MIT license at
&lt;a href=&quot;https://github.com/wix-incubator/quix&quot;&gt;github.com/wix-incubator/quix&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Israel-2019/wix.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As a Presto-centric company, Wix has developed a few more exciting enhancements:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;HBase + Parquet interleaving to mix compacted historic data and the latest 14 days of data&lt;/li&gt;
  &lt;li&gt;One SQL - a query rewriter that unifies usage of Presto and BigQuery under one SQL dialect&lt;/li&gt;
  &lt;li&gt;ActiveDirectory data security layer to control access to data&lt;/li&gt;
  &lt;li&gt;Google Drive integration - run Presto SQL directly on Google Sheets. This is one of the coolest connectors
to be created and generated a lot of excitement. Can’t wait for Wix to open source this one as well!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;See more in the &lt;a href=&quot;https://vimeo.com/331767442&quot;&gt;video&lt;/a&gt;,
&lt;a href=&quot;https://www.slideshare.net/OriReshef/quix-presto-ide-presto-summit-il&quot;&gt;slides&lt;/a&gt;,
&lt;a href=&quot;https://github.com/wix-incubator/quix&quot;&gt;source code&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;ironsource----analyzing-data-at-a-petabyte-scale&quot;&gt;Ironsource - Analyzing data at a petabyte scale&lt;/h1&gt;

&lt;p&gt;&lt;a href=&quot;https://www.ironsrc.com/&quot;&gt;Ironsource&lt;/a&gt; is the ad network of choice for the gaming industry, supplying
solutions for application developers, customer engagement, and ad monetization. Ironsource collects
terabytes of events on a daily basis.&lt;/p&gt;

&lt;p&gt;In his talk, &lt;a href=&quot;https://www.linkedin.com/in/korenor/&quot;&gt;Or Koren&lt;/a&gt;, head of the data team at Ironsource, shared
their journey from terabyte scale to petabyte scale. He showed how their entire interactive
analytics platform was rebuilt on Presto, and the huge savings they got from it, including new
business insights from their data science and data analyst teams.&lt;/p&gt;

&lt;table&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;img src=&quot;/assets/blog/Israel-2019/ironsource1.png&quot; alt=&quot;&quot; /&gt;&lt;/td&gt;
      &lt;td&gt;&lt;img src=&quot;/assets/blog/Israel-2019/ironsource2.png&quot; alt=&quot;&quot; /&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The before-and-after slides that Or presented show very clearly the reduction in cost and the increase
in efficiency that the use of Presto brought to Ironsource.&lt;/p&gt;

&lt;p&gt;See Or’s slides &lt;a href=&quot;https://www.slideshare.net/OriReshef/data-analytics-at-a-petabyte-scale-final&quot;&gt;here&lt;/a&gt; and the
talk &lt;a href=&quot;https://vimeo.com/333732300&quot;&gt;video&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;datorama-on-mutable-data-at-scale&quot;&gt;Datorama on mutable data at scale&lt;/h1&gt;

&lt;p&gt;A charismatic presenter, &lt;a href=&quot;https://www.linkedin.com/in/afinkelstein/&quot;&gt;Alexey Finkelstein&lt;/a&gt; from
&lt;a href=&quot;https://datorama.com/&quot;&gt;Salesforce Datorama&lt;/a&gt; had the room rolling with laughter more than once,
on a topic that is no laughing matter: managing mutable data with Presto. Datorama provides a marketing intelligence
platform serving 30,000 customers, who can interactively query 1.5PB of data.&lt;/p&gt;

&lt;p&gt;For that, Datorama provides a “data lake as a service”, called a DatoLake. Files on data lakes by their nature
are not transactionally updatable on a row level, but the users of Datorama require the ability to delete or update
specific rows in a transactional manner.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Israel-2019/datorama.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;To solve this, Datorama embarked on a journey based on partitioning the data by a version number (such as
 20190101_&lt;strong&gt;009&lt;/strong&gt;) and rebuilding a partition based on updates. There were three attempts along the journey,
with lessons learned at each step:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;At first, using an external Postgres metastore to store the versions, swapping versions in the metastore, and using
that as part of a sub-query in Presto to select the correct version. This approach did not push down partition pruning.&lt;/li&gt;
  &lt;li&gt;Next, moving the metastore query to happen before query generation, and dynamically generating the right filter
for each sub-query. This approach required two-pass processing for each query and did not support direct SQL from clients.&lt;/li&gt;
  &lt;li&gt;And finally, swapping the partition transactionally, directly in the Hive Metastore
database (MySQL), and refreshing the Presto Hive cache. With this approach, queries do not need to know about the
version change, and full separation of the mutability logic from the query is achieved.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;See many more details in the &lt;a href=&quot;https://vimeo.com/333759030&quot;&gt;video&lt;/a&gt; and &lt;a href=&quot;https://www.slideshare.net/OriReshef/mutable-data-scale&quot;&gt;slides&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;varada-join-optimization-and-dynamic-filtering&quot;&gt;Varada, Join Optimization and Dynamic filtering&lt;/h1&gt;

&lt;p&gt;&lt;a href=&quot;https://www.linkedin.com/in/romanzeyde/&quot;&gt;Roman Zeyde&lt;/a&gt; is Varada’s Presto architect. Roman has a unique
algorithmic background, being a Talpiot graduate and an ex-Googler.&lt;/p&gt;

&lt;p&gt;Roman’s talk discussed a new approach to making joins faster. Varada will contribute Roman’s work on dynamic
filtering back to the community. Stay tuned :)&lt;/p&gt;

&lt;p&gt;The talk went over the following major topics:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The Presto cost-based optimizer as a basis for join optimization&lt;/li&gt;
  &lt;li&gt;Join optimization strategies&lt;/li&gt;
  &lt;li&gt;Applying dynamic filtering for join optimization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Israel-2019/varada2.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Roman’s &lt;a href=&quot;https://vimeo.com/331946107&quot;&gt;talk&lt;/a&gt;, &lt;a href=&quot;https://www.slideshare.net/OriReshef/dynamic-filtering-for-presto-join-optimisation&quot;&gt;slides&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;qa-session&quot;&gt;Q&amp;amp;A session&lt;/h1&gt;

&lt;p&gt;The event finished with an hour-long Q&amp;amp;A session led by &lt;a href=&quot;https://www.linkedin.com/in/demibenari/&quot;&gt;Demi Ben-Ari&lt;/a&gt;, VP R&amp;amp;D at
&lt;a href=&quot;https://www.panorays.com/&quot;&gt;Panorays&lt;/a&gt; and co-founder of Big Things, an Israeli meetup group with 5,000 members,
all fans of big data technologies.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/Israel-2019/qa.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;See you all at the second international Presto Conference in Tel Aviv!&lt;/p&gt;

      
        <author>
          <name>Ori Reshef, VP Product, Varada</name>
        </author>
      

      <summary>Community, noun: “A feeling of fellowship with others, as a result of sharing common attributes, interests, and goals” The fun picture you see here was taken at the first lecture of the First international Presto summit in Israel last month. The atmosphere in the room during the various presentations was unique. It’s as if you could physically feel the brainpower of 250 engineers fascinated by technology in one room. We would like to share with you a bit of the content that was discussed during the conference. Enjoy the read and the videos!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/Israel-2019/audience.jpg" />
      
    </entry>
  
    <entry>
      <title>Release 310</title>
      <link href="https://trino.io/blog/2019/05/03/release-310.html" rel="alternate" type="text/html" title="Release 310" />
      <published>2019-05-03T00:00:00+00:00</published>
      <updated>2019-05-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/05/03/release-310</id>
      <content type="html" xml:base="https://trino.io/blog/2019/05/03/release-310.html">&lt;p&gt;This version adds standard
&lt;a href=&quot;https://trino.io/docs/current/sql/select.html#limit-or-fetch-first-clauses&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FETCH FIRST&lt;/code&gt;&lt;/a&gt;
syntax, support for using an
&lt;a href=&quot;https://trino.io/docs/current/connector/hive.html#s3-credentials&quot;&gt;alternate AWS role&lt;/a&gt;
when accessing S3 or Glue, and improved handling of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DECIMAL&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DOUBLE&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REAL&lt;/code&gt;
when Hive table and partition metadata differ.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-310.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version adds standard FETCH FIRST syntax, support for using an alternate AWS role when accessing S3 or Glue, and improved handling of DECIMAL, DOUBLE, and REAL when Hive table and partition metadata differ. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 309</title>
      <link href="https://trino.io/blog/2019/04/25/release-309.html" rel="alternate" type="text/html" title="Release 309" />
      <published>2019-04-25T00:00:00+00:00</published>
      <updated>2019-04-25T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/04/25/release-309</id>
      <content type="html" xml:base="https://trino.io/blog/2019/04/25/release-309.html">&lt;p&gt;This version adds support for case-insensitive name matching in
JDBC-based connectors, more data types in
&lt;a href=&quot;https://trino.io/docs/current/connector/postgresql.html&quot;&gt;PostgreSQL connector&lt;/a&gt;,
and some bug fixes.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-309.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version adds support for case-insensitive name matching in JDBC-based connectors, more data types in PostgreSQL connector, and some bug fixes. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Even Faster ORC</title>
      <link href="https://trino.io/blog/2019/04/23/even-faster-orc.html" rel="alternate" type="text/html" title="Even Faster ORC" />
      <published>2019-04-23T00:00:00+00:00</published>
      <updated>2019-04-23T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/04/23/even-faster-orc</id>
      <content type="html" xml:base="https://trino.io/blog/2019/04/23/even-faster-orc.html">&lt;p&gt;Trino is known for being the fastest SQL-on-Hadoop engine, and our custom ORC
reader implementation is a big reason for this speed – now it is even faster!&lt;/p&gt;

&lt;h2 id=&quot;why-is-this-important&quot;&gt;Why is this important?&lt;/h2&gt;

&lt;p&gt;For the TPC-DS benchmark, the new reader reduced the global query time by ~5%
and CPU usage by ~9%, which improves user experience while reducing the cost.&lt;/p&gt;

&lt;h2 id=&quot;what-improved&quot;&gt;What improved?&lt;/h2&gt;

&lt;p&gt;ORC uses a two-step system to decode data. The first step is a traditional
compression algorithm like gzip that generically reduces data size. The second
step uses data-type-specific compression algorithms that convert the raw bytes
into values (e.g., text, numbers, timestamps). It is this latter step that we
improved.&lt;/p&gt;

&lt;h2 id=&quot;how-much-faster-is-the-decoder&quot;&gt;How much faster is the decoder?&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/assets/blog/orc-speedup.svg&quot; alt=&quot;ORC Speedup&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;why-exactly-is-this-faster&quot;&gt;Why exactly is this faster?&lt;/h2&gt;

&lt;p&gt;Explaining why the new code is faster requires a brief explanation of the
existing code. In the old code, a typical value reader looked like this:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataStream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;presentStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;skip&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;RunLengthEncodedBlock&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;create&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;nc&quot;&gt;BlockBuilder&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;createBlockBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;presentStream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;writeLong&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;presentStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;nextBit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;writeLong&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;appendNull&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This code does a few things well. First, for the &lt;em&gt;all values are null&lt;/em&gt; case, it
returns a run-length-encoded block, which has custom optimizations throughout
Trino (this &lt;a href=&quot;https://github.com/trinodb/trino/pull/229&quot;&gt;optimization&lt;/a&gt; was
recently added by &lt;a href=&quot;https://github.com/Praveen2112&quot;&gt;Praveen Krishna&lt;/a&gt;). Second,
it separates the unconditional &lt;em&gt;no nulls&lt;/em&gt; loop from the conditional &lt;em&gt;mixed nulls&lt;/em&gt;
loop. It is common to have a column without nulls, so it makes sense to split
this out, since unconditional loops are faster than conditional loops.&lt;/p&gt;

&lt;p&gt;On the downside, this code has several performance issues:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Many data encodings can be efficiently read in bulk, but this code reads one
value at a time.&lt;/li&gt;
  &lt;li&gt;In some cases, the code can be called with different type instances, which
results in slow dynamic dispatch call sites in the loop.&lt;/li&gt;
  &lt;li&gt;Value reading in the null loop is conditional, which is expensive.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;optimize-for-bulk-reads&quot;&gt;Optimize for bulk reads&lt;/h3&gt;

&lt;p&gt;As you can see from the code above, Trino is always loading values in batches
(typically 1024). This makes the reader and the downstream code more efficient as
the overhead of processing data is amortized over the batch, and in some cases
data can be processed in parallel. ORC has a small number of low-level decoders
for booleans, numbers, bytes, and so on. These decoders are specialized for each
data type, which means each must be optimized individually. In some cases, the
decoders already had internal batch output buffers, so the optimization was
trivial. In another equally trivial case, we changed the float and double stream
decoders from loading values a byte at a time to bulk loading an entire array of
values directly from the input, which improved performance more than 10x.&lt;/p&gt;
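&lt;p&gt;As a rough illustration of the difference, the following standalone sketch
contrasts a per-value read loop with a bulk copy through a buffer view. The
class and method names are invented for the example and are not Trino's decoder
API; the real decoders work against ORC's compressed input streams.&lt;/p&gt;

```java
import java.nio.ByteBuffer;

// Illustrative only: contrasts per-value reads with a bulk copy for a
// stream of doubles. Assumes the caller has already set the buffer's
// byte order to match the on-disk encoding.
public class DoubleDecodeSketch
{
    // one value at a time: every call re-checks bounds and reassembles 8 bytes
    public static double[] readOneByOne(ByteBuffer input, int count)
    {
        double[] values = new double[count];
        for (int i = 0; i < count; i++) {
            values[i] = input.getDouble();
        }
        return values;
    }

    // bulk: view the bytes as doubles and copy the whole batch in one call
    public static double[] readBulk(ByteBuffer input, int count)
    {
        double[] values = new double[count];
        input.asDoubleBuffer().get(values, 0, count);
        // the view has its own cursor, so advance the byte buffer manually
        input.position(input.position() + count * Double.BYTES);
        return values;
    }
}
```

&lt;p&gt;The bulk version replaces a batch of bounds-checked per-value calls with a
single copy, which is the same shape of change described above for the float
and double stream decoders.&lt;/p&gt;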

&lt;p&gt;Some changes, however, were significantly more complex. One example is the
boolean reader, which was changed from decoding a single bit at a time to
decoding 8 bits at a time. This sounds simple, but in practice doing this
efficiently is complex, since reads are not aligned to 8 bits, and there is the
general problem of forming JVM-friendly loops. For those interested, the code is
&lt;a href=&quot;https://github.com/trinodb/trino/blob/308/presto-orc/src/main/java/io/prestosql/orc/stream/BooleanInputStream.java#L218&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
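&lt;p&gt;To give a feel for the approach, here is a simplified, standalone sketch of
unpacking eight MSB-first bits per byte. It assumes the read starts on a byte
boundary; handling reads that start mid-byte is exactly the hard part the real
code above has to deal with, and this sketch skips it.&lt;/p&gt;

```java
// Illustrative only: decode a bit-packed (MSB-first) boolean stream eight
// values at a time, with a short conditional tail for the last partial byte.
public class BooleanDecodeSketch
{
    public static boolean[] decode(byte[] packed, int count)
    {
        boolean[] result = new boolean[count];
        int position = 0;
        // unconditional inner loop: unpack all 8 bits of each full byte
        while (position + 8 <= count) {
            byte b = packed[position >>> 3];
            for (int bit = 0; bit < 8; bit++) {
                result[position + bit] = ((b >>> (7 - bit)) & 1) != 0;
            }
            position += 8;
        }
        // tail: remaining 0-7 values from the last partial byte
        if (position < count) {
            byte b = packed[position >>> 3];
            for (int bit = 0; position < count; bit++, position++) {
                result[position] = ((b >>> (7 - bit)) & 1) != 0;
            }
        }
        return result;
    }
}
```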

&lt;h3 id=&quot;avoid-dynamic-dispatch-in-loops&quot;&gt;Avoid dynamic dispatch in loops&lt;/h3&gt;

&lt;p&gt;This is the kind of problem that is not obvious when reading code, and it is
easily missed in benchmarks. The core problem happens when you have a loop
containing a method call whose target class can vary over the lifetime of the
execution. For example, this simple loop from above may or may not be fast,
depending on how many different classes it sees for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;type&lt;/code&gt; across multiple
executions:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;writeLong&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Most of the ORC column readers can only be called with a single type
implementation, but the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LongStreamReader&lt;/code&gt; is called with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BIGINT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INTEGER&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SMALLINT&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TINYINT&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DATE&lt;/code&gt; types. This causes the JVM to generate a dynamic
dispatch in the core of the loop. Besides the obvious extra work to select the
target code and branch prediction problems, dynamic dispatch calls are normally
not inlined, which disables many powerful optimizations in the JVM. The good news
is that the fix is trivial:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;instanceof&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;BigintType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;nc&quot;&gt;BlockBuilder&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;createBlockBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;writeLong&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;instanceof&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;IntegerType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;nc&quot;&gt;BlockBuilder&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;createBlockBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;writeLong&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The hard part is knowing that this is a problem. The existing benchmarks for ORC
only tested a single type at a time, which allowed the JVM to inline the target
method and produce much more optimal code. In this case, we happen to know that
the code is being invoked with multiple types, so we updated the benchmark to
warm up the JVM with multiple types before benchmarking.&lt;/p&gt;
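&lt;p&gt;The pitfall can be demonstrated without a full JMH setup: warm the same call
site with several receiver classes before measuring, so the JIT compiles the
megamorphic dispatch the production reader actually sees. The &lt;code&gt;LongTransform&lt;/code&gt;
interface below is a stand-in invented for the example, not Trino's type SPI.&lt;/p&gt;

```java
// Minimal stand-in showing why a benchmark must warm a call site with
// several receiver classes: the JIT optimizes based on the types it has seen.
interface LongTransform
{
    long apply(long value);
}

public class WarmupSketch
{
    static long sumWith(LongTransform transform, long[] values)
    {
        long sum = 0;
        for (long value : values) {
            sum += transform.apply(value);  // dispatch target varies by receiver
        }
        return sum;
    }

    public static void main(String[] args)
    {
        long[] data = new long[1024];
        java.util.Arrays.fill(data, 1);
        // warming with a single implementation lets the JIT inline apply();
        // cycling through all of them forces the megamorphic call shape the
        // real reader sees with BIGINT, INTEGER, SMALLINT, TINYINT, and DATE
        LongTransform[] transforms = {v -> v, v -> v + 1, v -> v * 2};
        for (int i = 0; i < 10_000; i++) {
            for (LongTransform transform : transforms) {
                sumWith(transform, data);
            }
        }
        System.out.println(sumWith(transforms[0], data));  // 1024
    }
}
```

&lt;p&gt;Timing the loop after single-implementation warmup versus after this mixed
warmup makes the inlining difference visible even in a crude harness.&lt;/p&gt;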

&lt;p&gt;For more information on this kind of optimization, I suggest reading Aleksey
Shipilëv’s blog posts on JVM performance. Specifically, &lt;a href=&quot;https://shipilev.net/blog/2015/black-magic-method-dispatch&quot;&gt;The Black Magic of (Java)
Method Dispatch&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;improve-null-reading&quot;&gt;Improve null reading&lt;/h3&gt;

&lt;p&gt;With the above improvements, we were getting great performance of 0.5ns to 3ns
per value for most types without nulls, but the benchmarks with nulls were taking
an additional ~6ns per value. Some of that is expected, since we must decode the
additional &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;present&lt;/code&gt; boolean stream, but booleans decode at a rate of ~0.5ns per
value, so that isn’t the problem. &lt;a href=&quot;https://github.com/martint&quot;&gt;Martin Traverso&lt;/a&gt;
and I built and benchmarked many different implementations, but we only found one
with really good performance.&lt;/p&gt;

&lt;p&gt;The first implementation we built was simply to bulk read a null array, bulk read
the values packed into the front of an array, and then spread the nulls across
the array:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;// bulk read and count null values&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;boolean&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;isNull&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;boolean&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;];&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nullCount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;presentStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getUnsetBits&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;isNull&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// bulk read non-null values into an array large enough for the full results&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;];&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;dataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nullCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// copy non-null values into output position (in reverse order)&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nullSuppressedPosition&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nullCount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;outputPosition&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;isNull&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;length&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;outputPosition&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;outputPosition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;--)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;isNull&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputPosition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;])&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputPosition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputPosition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nullSuppressedPosition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;];&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;nullSuppressedPosition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;--;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is better because it always bulk reads the values, but there is still a ~4ns
per value penalty for nulls. We haven’t been able to explain why it happens, but
we’ve observed that the number drops dramatically after we adjusted the code to
assign to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;result[outputPosition]&lt;/code&gt; outside the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;if&lt;/code&gt; block. We can’t do that
in-place, as in the snippet above, so we introduce a temporary buffer:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;// bulk read and count null values&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;boolean&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;isNull&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;boolean&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;];&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nullCount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;presentStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getUnsetBits&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;isNull&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// bulk read non-null values into a temporary array&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;dataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tempBuffer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextBatchSize&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nullCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// copy values into result&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;isNull&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;];&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;position&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;isNull&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tempBuffer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;position&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;];&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(!&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;isNull&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;])&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;position&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With this change, the null penalty drops to ~1.5ns per value, which is reasonable
given that just reading the null flag costs ~0.5ns per value. There are two
downsides to this approach. First, there is an extra temporary buffer, but
since the reader is single-threaded, we can reuse it for the whole file read.
Second, the null values are no longer zero. This should not be a problem for
correctly written code, but it could potentially trigger latent bugs. We did find
another approach that left the nulls unset, but it was a bit slower and required
another temp buffer, so we settled on this approach.&lt;/p&gt;

&lt;h2 id=&quot;how-much-will-my-setup-improve&quot;&gt;How much will my setup improve?&lt;/h2&gt;

&lt;p&gt;We tested the performance using the standard TPC-DS and TPC-H benchmarks on zlib
compressed ORC files:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Benchmark&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;Duration&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;CPU&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;TPC-DS&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;5.6%&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;9.3%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;TPC-H&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;4.5%&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;8.3%&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;There are a number of reasons you may get a larger or smaller win:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The exact queries matter: In the benchmarks above, some queries saved more than
20% CPU and others only saved 1%.&lt;/li&gt;
  &lt;li&gt;The compression matters: In our tests we used zlib, which is the most expensive
compression supported by ORC. Compression algorithms that use less CPU (e.g.,
Zstd, LZ4, or Snappy) will generally see larger relative improvements.&lt;/li&gt;
  &lt;li&gt;This improvement is only in &lt;a href=&quot;https://trino.io/download.html&quot;&gt;Trino 309+&lt;/a&gt;,
so if you are using an earlier version you will need to upgrade. Also, if you are
still using Facebook’s version of Presto, you can either upgrade to Trino 309+ or
wait to see if they backport it.&lt;/li&gt;
&lt;/ul&gt;</content>

      
        <author>
          <name>Dain Sundstrom, Martin Traverso</name>
        </author>
      

      <summary>Trino is known for being the fastest SQL on Hadoop engine, and our custom ORC reader implementation is a big reason for this speed – now it is even faster!</summary>

      
      
        <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://trino.io/assets/blog/orc-speedup.png" />
      
    </entry>
  
    <entry>
      <title>Release 308</title>
      <link href="https://trino.io/blog/2019/04/12/release-308.html" rel="alternate" type="text/html" title="Release 308" />
      <published>2019-04-12T00:00:00+00:00</published>
      <updated>2019-04-12T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/04/12/release-308</id>
      <content type="html" xml:base="https://trino.io/blog/2019/04/12/release-308.html">&lt;p&gt;This version includes significant 
&lt;a href=&quot;/blog/2019/04/23/even-faster-orc.html&quot;&gt;performance improvements&lt;/a&gt;
when reading ORC data, authorization checks for 
&lt;a href=&quot;https://trino.io/docs/current/sql/show-columns.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SHOW COLUMNS&lt;/code&gt;&lt;/a&gt;,
and limit pushdown for JDBC-based connectors.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-308.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version includes significant performance improvements when reading ORC data, authorization checks for SHOW COLUMNS, and limit pushdown for JDBC-based connectors. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 307</title>
      <link href="https://trino.io/blog/2019/04/08/release-307.html" rel="alternate" type="text/html" title="Release 307" />
      <published>2019-04-08T00:00:00+00:00</published>
      <updated>2019-04-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/04/08/release-307</id>
      <content type="html" xml:base="https://trino.io/blog/2019/04/08/release-307.html">&lt;p&gt;This version includes some important security fixes, support for inner and outer
joins involving lateral derived tables (&lt;a href=&quot;https://trino.io/docs/current/sql/select.html#lateral&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LATERAL&lt;/code&gt;&lt;/a&gt;),
new syntax for setting &lt;a href=&quot;https://trino.io/docs/current/sql/comment.html&quot;&gt;table comments&lt;/a&gt;, and performance
improvements.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-307.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version includes some important security fixes, support for inner and outer joins involving lateral derived tables (LATERAL), new syntax for setting table comments, and performance improvements. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto Community Meeting 2019-04-03</title>
      <link href="https://trino.io/blog/2019/04/03/Presto-Community-Meeting.html" rel="alternate" type="text/html" title="Presto Community Meeting 2019-04-03" />
      <published>2019-04-03T00:00:00+00:00</published>
      <updated>2019-04-03T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/04/03/Presto-Community-Meeting</id>
      <content type="html" xml:base="https://trino.io/blog/2019/04/03/Presto-Community-Meeting.html">&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/VQhDBPltUyk&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h3 id=&quot;agenda&quot;&gt;Agenda&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;Memory management&lt;/li&gt;
  &lt;li&gt;Spilling&lt;/li&gt;
&lt;/ul&gt;

&lt;!--more--&gt;</content>

      

      <summary>Agenda Memory management Spilling</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 306</title>
      <link href="https://trino.io/blog/2019/03/16/release-306.html" rel="alternate" type="text/html" title="Release 306" />
      <published>2019-03-16T00:00:00+00:00</published>
      <updated>2019-03-16T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/03/16/release-306</id>
      <content type="html" xml:base="https://trino.io/blog/2019/03/16/release-306.html">&lt;p&gt;This version includes some bug fixes, as well as performance improvements when decoding ORC data.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-306.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version includes some bug fixes, as well as performance improvements when decoding ORC data. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto Community Meeting 2019-03-13</title>
      <link href="https://trino.io/blog/2019/03/13/Presto-Community-Meeting.html" rel="alternate" type="text/html" title="Presto Community Meeting 2019-03-13" />
      <published>2019-03-13T00:00:00+00:00</published>
      <updated>2019-03-13T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/03/13/Presto-Community-Meeting</id>
      <content type="html" xml:base="https://trino.io/blog/2019/03/13/Presto-Community-Meeting.html">&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/hMmFM1MBEB8&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h3 id=&quot;agenda&quot;&gt;Agenda&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;Dynamic Filtering&lt;/li&gt;
  &lt;li&gt;Changes to TIMESTAMP semantics&lt;/li&gt;
&lt;/ul&gt;

&lt;!--more--&gt;</content>

      

      <summary>Agenda Dynamic Filtering Changes to TIMESTAMP semantics</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 305</title>
      <link href="https://trino.io/blog/2019/03/08/release-305.html" rel="alternate" type="text/html" title="Release 305" />
      <published>2019-03-08T00:00:00+00:00</published>
      <updated>2019-03-08T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/03/08/release-305</id>
      <content type="html" xml:base="https://trino.io/blog/2019/03/08/release-305.html">&lt;p&gt;Changes in this version include peak-memory awareness in
&lt;a href=&quot;https://trino.io/docs/current/optimizer/cost-based-optimizations.html&quot;&gt;cost-based optimizer&lt;/a&gt;,
improved handling of CSV output in &lt;a href=&quot;https://trino.io/docs/current/client/cli.html&quot;&gt;CLI&lt;/a&gt;,
and performance improvements for Parquet.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-305.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>Changes in this version include peak-memory awareness in the cost-based optimizer, improved handling of CSV output in the CLI, and performance improvements for Parquet. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto Community Meeting 2019-02-27</title>
      <link href="https://trino.io/blog/2019/02/27/Presto-Community-Meeting.html" rel="alternate" type="text/html" title="Presto Community Meeting 2019-02-27" />
      <published>2019-02-27T00:00:00+00:00</published>
      <updated>2019-02-27T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/02/27/Presto-Community-Meeting</id>
      <content type="html" xml:base="https://trino.io/blog/2019/02/27/Presto-Community-Meeting.html">&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/7bclzfYUfQg&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h3 id=&quot;agenda&quot;&gt;Agenda&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;Pushdown of complex operations (filter, project, join, etc.)&lt;/li&gt;
  &lt;li&gt;Coordinator high availability&lt;/li&gt;
&lt;/ul&gt;

&lt;!--more--&gt;</content>

      

      <summary>Agenda Pushdown of complex operations (filter, project, join, etc.) Coordinator high availability</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 304</title>
      <link href="https://trino.io/blog/2019/02/27/release-304.html" rel="alternate" type="text/html" title="Release 304" />
      <published>2019-02-27T00:00:00+00:00</published>
      <updated>2019-02-27T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/02/27/release-304</id>
      <content type="html" xml:base="https://trino.io/blog/2019/02/27/release-304.html">&lt;p&gt;New features include &lt;a href=&quot;https://trino.io/docs/current/admin/spill.html&quot;&gt;spilling&lt;/a&gt; for queries
that use ORDER BY or window functions, support for PostgreSQL’s json and jsonb types, and a Hive 
&lt;a href=&quot;https://trino.io/docs/current/connector/hive.html#procedures&quot;&gt;procedure&lt;/a&gt; to synchronize 
partition metadata with the file system.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-304.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>New features include spilling for queries that use ORDER BY or window functions, support for PostgreSQL’s json and jsonb types, and a Hive procedure to synchronize partition metadata with the file system. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 303</title>
      <link href="https://trino.io/blog/2019/02/14/release-303.html" rel="alternate" type="text/html" title="Release 303" />
      <published>2019-02-14T00:00:00+00:00</published>
      <updated>2019-02-14T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/02/14/release-303</id>
      <content type="html" xml:base="https://trino.io/blog/2019/02/14/release-303.html">&lt;p&gt;This version includes bug fixes and performance improvements.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-303.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>This version includes bug fixes and performance improvements. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto Community Meeting 2019-02-06</title>
      <link href="https://trino.io/blog/2019/02/06/Presto-Community-Meeting.html" rel="alternate" type="text/html" title="Presto Community Meeting 2019-02-06" />
      <published>2019-02-06T00:00:00+00:00</published>
      <updated>2019-02-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/02/06/Presto-Community-Meeting</id>
      <content type="html" xml:base="https://trino.io/blog/2019/02/06/Presto-Community-Meeting.html">&lt;div class=&quot;video-responsive&quot;&gt;
    &lt;iframe width=&quot;720&quot; height=&quot;405&quot; src=&quot;https://www.youtube.com/embed/YfDe_YVzMyI&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h3 id=&quot;agenda&quot;&gt;Agenda&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;About the Foundation&lt;/li&gt;
  &lt;li&gt;Getting involved&lt;/li&gt;
  &lt;li&gt;Summary of new features&lt;/li&gt;
  &lt;li&gt;Top requested features&lt;/li&gt;
  &lt;li&gt;Release verification&lt;/li&gt;
&lt;/ul&gt;

&lt;!--more--&gt;</content>

      

      <summary>Agenda About the Foundation Getting involved Summary of new features Top requested features Release verification</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 302</title>
      <link href="https://trino.io/blog/2019/02/06/release-302.html" rel="alternate" type="text/html" title="Release 302" />
      <published>2019-02-06T00:00:00+00:00</published>
      <updated>2019-02-06T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/02/06/release-302</id>
      <content type="html" xml:base="https://trino.io/blog/2019/02/06/release-302.html">&lt;p&gt;New features include native support for 
&lt;a href=&quot;https://trino.io/docs/current/connector/hive-gcs-tutorial.html&quot;&gt;Google Cloud Storage&lt;/a&gt; 
and a connector for 
&lt;a href=&quot;https://trino.io/docs/current/connector/elasticsearch.html&quot;&gt;Elasticsearch&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-302.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>New features include native support for Google Cloud Storage and a connector for Elasticsearch. Release notes Download</summary>

      
      
    </entry>
  
    <entry>
      <title>Presto Software Foundation Launch</title>
      <link href="https://trino.io/blog/2019/01/31/presto-software-foundation-launch.html" rel="alternate" type="text/html" title="Presto Software Foundation Launch" />
      <published>2019-01-31T00:00:00+00:00</published>
      <updated>2019-01-31T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/01/31/presto-software-foundation-launch</id>
      <content type="html" xml:base="https://trino.io/blog/2019/01/31/presto-software-foundation-launch.html">&lt;p&gt;We are pleased to &lt;a href=&quot;https://www.prweb.com/releases/prweb16070792.htm&quot;&gt;announce&lt;/a&gt;
the launch of the Presto Software Foundation,
a not-for-profit organization dedicated to the advancement of the Presto
open source distributed SQL engine. The foundation is committed to ensuring
the project remains open, collaborative and independent for decades to come.&lt;/p&gt;

&lt;p&gt;We started the Presto project in 2012 as a small team at Facebook,
with the goals of building a high-performance, standards-compliant, easy-to-use,
and dependable query engine capable of scaling to the largest datasets
(exabyte scale) in the world. From day one, we designed and developed Presto
to be maintained by an independent open source community.&lt;/p&gt;

&lt;p&gt;In 2013, we released Presto under the Apache License and opened development to the public.
Since then, the Presto community has expanded globally, with developers in
Brazil, China, Germany, India, Israel, Japan, Poland, Singapore, the U.S., the U.K.,
and more. In recent years, the center of gravity of the Presto community has shifted,
with the majority of contributions now coming from developers outside of Facebook.&lt;/p&gt;

&lt;p&gt;From the beginning, we stressed the importance of code quality, architectural
extensibility, and open collaboration with the community. With the rapid expansion
of both the Presto user base and Presto developer community over the last several
years, establishing a non-profit to institutionalize these values is the next
logical step to ensure that this project stands the test of time.&lt;/p&gt;

&lt;p&gt;The foundation is dedicated to preserving the vision of high quality, performant
and dependable software developed by an open, collaborative and independent
community of developers throughout the world. Everyone is welcome to participate,
whether it be via code contributions, suggestions for improvements, or bug reports.&lt;/p&gt;</content>

      
        <author>
          <name>Martin Traverso, Dain Sundstrom, David Phillips</name>
        </author>
      

      <summary>We are pleased to announce the launch of the Presto Software Foundation, a not-for-profit organization dedicated to the advancement of the Presto open source distributed SQL engine. The foundation is committed to ensuring the project remains open, collaborative and independent for decades to come.</summary>

      
      
    </entry>
  
    <entry>
      <title>Release 301</title>
      <link href="https://trino.io/blog/2019/01/31/release-301.html" rel="alternate" type="text/html" title="Release 301" />
      <published>2019-01-31T00:00:00+00:00</published>
      <updated>2019-01-31T00:00:00+00:00</updated>
      <id>https://trino.io/blog/2019/01/31/release-301</id>
      <content type="html" xml:base="https://trino.io/blog/2019/01/31/release-301.html">&lt;p&gt;New features include role-based access control and 
&lt;a href=&quot;https://trino.io/docs/current/sql/create-role.html&quot;&gt;role management&lt;/a&gt;, 
&lt;a href=&quot;https://trino.io/docs/current/sql/create-view.html#security&quot;&gt;invoker security&lt;/a&gt;
mode for views, and &lt;a href=&quot;https://trino.io/docs/current/sql/analyze.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ANALYZE&lt;/code&gt;&lt;/a&gt;
syntax for collecting table statistics.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://trino.io/docs/current/release/release-301.html&quot;&gt;Release notes&lt;/a&gt; &lt;br /&gt;
&lt;a href=&quot;https://trino.io/download.html&quot;&gt;Download&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;</content>

      

      <summary>New features include role-based access control and role management, invoker security mode for views, and ANALYZE syntax for collecting table statistics. Release notes Download</summary>

      
      
    </entry>
  

</feed>
